Learned Query Engines

The ongoing data explosion requires database software to use the available hardware efficiently and to exploit data properties, so that business intelligence remains timely. At the same time, data and hardware are becoming increasingly heterogeneous: modern servers adopt a variety of hardware accelerators to improve energy efficiency, while data scientists query inputs from multiple sources, formats, and even scientific domains. In this line of work, we enable query engines to adapt to the available hardware and data formats and to exploit domain-specific properties automatically, by generating specialized query engines on demand. This achieves the performance of specialized engines without the extra development effort and time they normally require.
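To make "generating specialized query engines on demand" concrete, here is a minimal sketch under simplifying assumptions. It does not depict the engines described on this page. For one query and one schema it emits source code with the column positions and the filter constant baked in, then compiles it. A generic interpreter, by contrast, resolves column names for every row. Real JIT-compiled engines typically emit machine code through a compiler framework. Here Python's built-in `compile()`/`exec()` stand in for that step. The names `generate_query` and `interpret_query` are hypothetical.

```python
import csv
import io


def generate_query(schema, filter_col, threshold, sum_col):
    """Generate and compile a pipeline specialized to one schema and one query:
    SELECT SUM(sum_col) WHERE filter_col > threshold.

    Column positions and the constant are baked into the generated source, so
    the per-row loop does no name lookups or operator dispatch."""
    f_idx = schema.index(filter_col)
    s_idx = schema.index(sum_col)
    threshold = int(threshold)  # only integers are spliced into the source
    src = (
        "def run(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if int(row[{f_idx}]) > {threshold}:\n"
        f"            total += int(row[{s_idx}])\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace["run"]


def interpret_query(schema, rows, filter_col, threshold, sum_col):
    """Generic baseline: resolves column names for every row at run time."""
    total = 0
    for row in rows:
        record = dict(zip(schema, row))
        if int(record[filter_col]) > threshold:
            total += int(record[sum_col])
    return total


# Usage: a tiny CSV input; the header provides the schema.
data = "id,qty,price\n1,5,100\n2,12,200\n3,20,50\n"
reader = csv.reader(io.StringIO(data))
header = next(reader)
rows = list(reader)

query = generate_query(header, "qty", 10, "price")
print(query(rows))  # 250
print(interpret_query(header, rows, "qty", 10, "price"))  # 250
```

Both paths return the same answer. The generated one simply has less work left per row, which is the effect specialization aims for at much larger scale and across hardware targets.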

Data Cleaning:

Data cleaning has become an indispensable part of data analysis because of the growing amount of dirty data: data scientists spend most of their time preparing dirty data before it can be analyzed. We focus on approaches that address the coverage and performance issues of data cleaning operations, while integrating data cleaning tasks seamlessly into the data analysis process.
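As an illustration of folding a cleaning check into an analytical query, the sketch below computes a grouped average and, in the same pass over the data, detects violations of a functional dependency (here `zip -> city`). This is a generic technique chosen for illustration and does not show CleanM's language or optimizer. The function `analyze_with_fd_check` and the sample rows are hypothetical.

```python
from collections import defaultdict


def analyze_with_fd_check(rows, group_col, value_col, fd_lhs, fd_rhs):
    """Compute AVG(value_col) GROUP BY group_col and, in the same pass,
    detect violations of the functional dependency fd_lhs -> fd_rhs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    rhs_values = defaultdict(set)  # fd_lhs value -> distinct fd_rhs values seen
    for row in rows:
        sums[row[group_col]] += row[value_col]
        counts[row[group_col]] += 1
        rhs_values[row[fd_lhs]].add(row[fd_rhs])
    averages = {key: sums[key] / counts[key] for key in sums}
    # The FD holds only if each lhs value maps to exactly one rhs value.
    violations = {key: sorted(vals) for key, vals in rhs_values.items() if len(vals) > 1}
    return averages, violations


rows = [
    {"zip": "1015", "city": "Lausanne", "price": 10.0},
    {"zip": "1015", "city": "Lausane", "price": 20.0},  # misspelled city
    {"zip": "8001", "city": "Zurich", "price": 30.0},
]
averages, violations = analyze_with_fd_check(rows, "zip", "price", "zip", "city")
print(averages)    # {'1015': 15.0, '8001': 30.0}
print(violations)  # {'1015': ['Lausane', 'Lausanne']}
```

Sharing one scan between the analysis and the check avoids a separate cleaning pass over the input. This is one simple way to keep cleaning inside the analysis workflow rather than in a preprocessing stage.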

Elastic & Distributed Query Engines:

We build transactional and analytical engines that leverage native cloud functionality, such as elasticity and distribution. We provide fine-grained elasticity through cross-cutting system designs that span the whole software virtualization stack, and we build our distributed query processing systems on top of Spark and other parallel frameworks.

Findings:

P. Chrysogelos; M. Karpathiotakis; R. Appuswamy; A. Ailamaki : HetExchange: Encapsulating Heterogeneous CPU-GPU Parallelism in JIT Compiled Engines. 2019. 45th International Conference on Very Large Data Bases, Los Angeles, California, USA, August 26-30, 2019.

S. A. Giannakopoulou; M. Karpathiotakis; B. C. D. Gaidioz; A. Ailamaki : CleanM: An Optimizable Query Language for Unified Scale-Out Data Cleaning. 2017. 43rd International Conference on Very Large Data Bases, Munich, Germany, August 28-September 1, 2017.

P. Chrysogelos; P. Sioulas; A. Ailamaki : Hardware-conscious Query Processing in GPU-accelerated Analytical Engines. 2019. 9th Biennial Conference on Innovative Data Systems Research, Asilomar, California, USA, January 13-16, 2019.

M. Karpathiotakis; I. Alagiannis; A. Ailamaki : Fast Queries Over Heterogeneous Data Through Engine Customization. 2016. 42nd International Conference on Very Large Data Bases, New Delhi, India, September 5-9, 2016. p. 972-983.

M. Olma; M. Karpathiotakis; I. Alagiannis; M. Athanassoulis; A. Ailamaki : Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing. 2017-06-01. p. 1106-1117. DOI : 10.14778/3115404.3115415.

M. Karpathiotakis; I. Alagiannis; T. Heinis; M. Branco; A. Ailamaki : Just-In-Time Data Virtualization: Lightweight Data Management with ViDa. 2015. 7th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, California, USA, January 4-7, 2015.

I. Alagiannis; R. Borovica; M. Branco; S. Idreos; A. Ailamaki : NoDB: Efficient Query Execution on Raw Data Files. 2012. ACM SIGMOD International Conference on Management of Data, Scottsdale, Arizona, USA, May 20-24, 2012.