Modern Storage in DBMS (PhD students graduated)

People involved: Manos Athanassoulis, Radu Stoica 

Over the past decade, new solid-state storage technologies, of which flash is the most mature, have become increasingly popular. These technologies store data durably and can alleviate many of the handicaps of hard disk drives (HDDs). However, their characteristics differ substantially from those of HDDs. This makes it challenging to integrate them into data-intensive systems, such as database management systems (DBMS), that rely heavily on the behavior of the underlying storage. In this project we research where and how flash can be exploited in a DBMS. We describe techniques for making effective use of flash (i) as the main data store for transaction processing, and (ii) as an update cache for HDD-resident data warehouses.
Using flash as secondary storage: In this project we investigate how a DBMS can take full advantage of flash memory as persistent storage. We propose a new flash-aware data layout, “append and pack”, which stabilizes device performance by eliminating random writes. We assess the impact of append and pack on OLTP workload performance using both an analytical model and micro-benchmarks. Our results suggest that significant improvements can be achieved for real workloads.
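The following is a minimal, hypothetical sketch of the idea behind append and pack. It assumes a device addressed in fixed-size pages and organized as a single circular log. Every write, including an update to an existing page, is appended at the head of the log. When the log is full, the oldest slot is reclaimed ("packed"), and its page is re-appended at the head if it is still live. The class and method names (AppendPackStore, write, read) are illustrative assumptions, not the project's implementation, and any refinements of the actual layout are not modeled.

```python
from typing import Dict, List, Optional, Tuple


class AppendPackStore:
    """Toy append-and-pack layout over a circular log of flash page slots (hypothetical)."""

    def __init__(self, num_slots: int) -> None:
        self.n = num_slots
        self.slots: List[Optional[Tuple[int, str]]] = [None] * num_slots  # (page_id, payload)
        self.valid = [False] * num_slots   # False for free or stale (superseded) slots
        self.where: Dict[int, int] = {}    # logical page id -> physical slot of its live copy
        self.head = 0                      # append point: next slot to write
        self.tail = 0                      # oldest occupied slot: next slot to pack
        self.occupied = 0                  # slots holding live or stale data
        self.physical_writes: List[int] = []  # physical slots written, in order

    def _append(self, page_id: int, payload: str) -> None:
        # All device writes go through here, always at the head, so they are sequential.
        slot = self.head
        self.slots[slot] = (page_id, payload)
        self.valid[slot] = True
        self.where[page_id] = slot
        self.head = (self.head + 1) % self.n
        self.occupied += 1
        self.physical_writes.append(slot)

    def _pack_one(self) -> None:
        # Reclaim the oldest slot; if its page is still live, re-append it at the head.
        slot = self.tail
        entry = self.slots[slot]
        live = self.valid[slot]
        self.slots[slot] = None
        self.valid[slot] = False
        self.tail = (self.tail + 1) % self.n
        self.occupied -= 1
        if live and entry is not None:
            self._append(*entry)

    def write(self, page_id: int, payload: str) -> None:
        # Keep one slot of headroom so packing always finds a stale slot eventually.
        if page_id not in self.where and len(self.where) >= self.n - 1:
            raise RuntimeError("no room for another live page")
        old = self.where.get(page_id)
        if old is not None:
            self.valid[old] = False  # out-of-place update: the old copy becomes stale
        while self.occupied == self.n:  # log full: pack until a slot is freed
            self._pack_one()
        self._append(page_id, payload)

    def read(self, page_id: int) -> str:
        entry = self.slots[self.where[page_id]]
        assert entry is not None
        return entry[1]


if __name__ == "__main__":
    store = AppendPackStore(num_slots=8)
    for i in range(50):
        store.write(i % 5, f"v{i}")  # 5 logical pages, updated repeatedly
    print(store.read(3))             # v48
```

Because the append point only ever advances by one slot, the device sees purely sequential writes. The price in this sketch is extra physical writes whenever live pages must be relocated during packing, so the number of physical writes can exceed the number of logical writes.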
Flash as a specialized cache: This project presents a novel approach for supporting online updates in data warehouses that overcomes the limitations of prior approaches by making judicious use of available SSDs to cache incoming updates. We model query processing with differential updates as a type of outer join between the data residing on disks and the updates residing on SSDs. We present the MaSM algorithms for performing such joins and periodic migrations with small memory footprints, low query overhead, few SSD writes, efficient in-place migration of updates, and correct ACID support. Our experiments show that MaSM incurs at most 7% overhead, both on synthetic range scans (with range sizes varying from 4KB to 100GB) and in a TPC-H query replay study, while increasing update throughput by orders of magnitude.
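To illustrate the outer-join view of query processing with differential updates, here is a minimal sketch. It assumes that the disk-resident table is scanned in key order and that the SSD holds several runs of updates, each sorted by (key, timestamp). It is a plain sort-merge, not the MaSM algorithms themselves: memory budgeting, migration, and ACID handling are omitted. The update record format (key, timestamp, op, value) and the op names "insert", "modify" and "delete" are hypothetical.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

Row = Tuple[int, str]                          # (key, value) from the disk-resident scan
Update = Tuple[int, int, str, Optional[str]]   # (key, timestamp, op, value) cached on SSD (hypothetical)


def merge_scan(main: Iterable[Row], runs: List[List[Update]]) -> Iterator[Row]:
    """Full outer join on key between a sorted table scan and sorted update runs."""
    # Merge the SSD runs into one stream ordered by (key, timestamp).
    updates = heapq.merge(*runs, key=lambda u: (u[0], u[1]))
    main_it = iter(main)
    m = next(main_it, None)
    u = next(updates, None)
    while m is not None or u is not None:
        # The next output key is the smaller of the two current keys.
        key = min(x[0] for x in (m, u) if x is not None)
        value: Optional[str] = None          # None means "no row for this key"
        if m is not None and m[0] == key:
            value = m[1]
            m = next(main_it, None)
        # Apply every update for this key in timestamp order.
        while u is not None and u[0] == key:
            _, _, op, new_value = u
            value = None if op == "delete" else new_value  # "insert"/"modify" overwrite
            u = next(updates, None)
        if value is not None:
            yield key, value


if __name__ == "__main__":
    table = [(1, "a"), (2, "b"), (4, "d")]
    ssd_runs = [[(2, 10, "modify", "b2"), (3, 11, "insert", "c")],
                [(1, 12, "delete", None), (2, 13, "modify", "b3")]]
    print(list(merge_scan(table, ssd_runs)))  # [(2, 'b3'), (3, 'c'), (4, 'd')]
```

A key that appears only on disk passes through unchanged. A key that appears only in the updates (an insert) is produced from the update side. A key present on both sides has its updates applied in timestamp order. Together, these cases are why the operation behaves like a full outer join on the key.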


This work is partially supported by the European-funded project BIGFOOT.