Work Sharing in Data Analytics

People involved: Iraklis Psaroudakis, Manos Athanassoulis

Data warehousing workloads often consist of star queries, which join a large fact table with several smaller dimension tables. Data warehouses optimize and execute each star query in isolation, as if it were the only query running in the system, using a separate execution plan. Each plan is therefore independent of other concurrent star queries, even when they are similar or identical. For a small number of concurrent star queries, this approach keeps the optimization phase fast and produces efficient execution plans. At higher concurrency, however, it cannot exploit opportunities to share work and data across queries, which increases contention for I/O and CPU resources.
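A minimal sketch of the query-at-a-time model described above follows. It assumes a toy in-memory star schema: a fact table of (date_id, product_id, amount) rows and two dimension tables. StarQuery, run_query_at_a_time and the sample data are hypothetical and not taken from any system discussed here. Each query builds its own dimension filters and scans the fact table on its own, so the number of fact-tuple reads grows linearly with the number of concurrent queries, even when two of them are identical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical toy star schema: fact rows are (date_id, product_id, amount).
FACT: List[Tuple[int, int, float]] = [
    (1, 10, 5.0), (1, 11, 3.0), (2, 10, 7.0), (2, 12, 1.0), (3, 11, 4.0),
]
DATE_DIM: Dict[int, int] = {1: 2012, 2: 2013, 3: 2013}            # date_id -> year
PRODUCT_DIM: Dict[int, str] = {10: "book", 11: "toy", 12: "book"}  # product_id -> category


@dataclass
class StarQuery:
    """SELECT SUM(amount) FROM fact JOIN date JOIN product WHERE <dimension predicates>."""
    date_pred: Callable[[int], bool]
    product_pred: Callable[[str], bool]


def run_query_at_a_time(queries: List[StarQuery]) -> Tuple[List[float], int]:
    """Evaluate each query with its own private plan; return results and fact-tuple reads."""
    results: List[float] = []
    fact_reads = 0
    for q in queries:
        # Each plan builds private hash sets of qualifying dimension keys...
        dates = {k for k, year in DATE_DIM.items() if q.date_pred(year)}
        products = {k for k, cat in PRODUCT_DIM.items() if q.product_pred(cat)}
        total = 0.0
        # ...and scans the whole fact table by itself, even if a concurrent
        # (possibly identical) query is scanning the same table right now.
        for date_id, product_id, amount in FACT:
            fact_reads += 1
            if date_id in dates and product_id in products:
                total += amount
        results.append(total)
    return results, fact_reads


if __name__ == "__main__":
    queries = [
        StarQuery(lambda y: y == 2013, lambda c: c == "book"),
        StarQuery(lambda y: y == 2013, lambda c: c == "book"),  # identical to the first
        StarQuery(lambda y: True, lambda c: c == "toy"),
    ]
    print(run_query_at_a_time(queries))  # → ([8.0, 8.0, 7.0], 15)
```

Three concurrent queries over a five-row fact table cost 15 fact-tuple reads here, although the first two queries ask exactly the same question.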

In this project, we begin by comparing two existing approaches that share work among concurrent star queries via pipelined execution: (a) QPipe, a staged, operator-centric query execution engine that supports On-demand Simultaneous Pipelining (OSP) of intermediate results, and (b) CJOIN, a join operator that exploits the semantics of star schemas. We show that work sharing via data pipelining (CJOIN) outperforms work sharing via task pipelining (QPipe) in most cases; the exceptions are workloads in which the submitted star queries are highly similar or the selectivity of the dimension tables is high. Hence, we argue that an execution engine for star queries needs a hybrid approach that decides at run time which form of work sharing is more beneficial, and we continue to work towards building such a hybrid query execution engine.
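To make the idea of data pipelining more concrete, the following minimal sketch mimics the bitmap-based filtering that CJOIN is built around, as we understand its design: a single scan of the fact table shared by all queries, one shared hash table per dimension in which each dimension row carries a bitmap of the queries whose predicates select it, and a final step that routes each surviving fact tuple to the queries whose bits are still set. It rests on simplifying assumptions: the toy tables, StarQuery, build_filter and run_shared_scan are hypothetical names; queries are known up front instead of being admitted and retired during a continuous scan; only SUM aggregates are supported; and QPipe's OSP is not modeled at all.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, TypeVar

A = TypeVar("A")

# Hypothetical toy star schema: fact rows are (date_id, product_id, amount).
FACT: List[Tuple[int, int, float]] = [
    (1, 10, 5.0), (1, 11, 3.0), (2, 10, 7.0), (2, 12, 1.0), (3, 11, 4.0),
]
DATE_DIM: Dict[int, int] = {1: 2012, 2: 2013, 3: 2013}            # date_id -> year
PRODUCT_DIM: Dict[int, str] = {10: "book", 11: "toy", 12: "book"}  # product_id -> category


@dataclass
class StarQuery:
    """SELECT SUM(amount) FROM fact JOIN date JOIN product WHERE <dimension predicates>."""
    date_pred: Callable[[int], bool]
    product_pred: Callable[[str], bool]


def build_filter(dim: Dict[int, A], preds: List[Callable[[A], bool]]) -> Dict[int, int]:
    """Shared dimension hash table: key -> bitmap, bit i set iff query i selects the row."""
    table: Dict[int, int] = {}
    for key, attr in dim.items():
        bits = 0
        for i, pred in enumerate(preds):
            if pred(attr):
                bits |= 1 << i
        table[key] = bits
    return table


def run_shared_scan(queries: List[StarQuery]) -> Tuple[List[float], int]:
    """Answer all queries with ONE fact-table scan; return results and fact-tuple reads."""
    date_filter = build_filter(DATE_DIM, [q.date_pred for q in queries])
    product_filter = build_filter(PRODUCT_DIM, [q.product_pred for q in queries])
    all_queries = (1 << len(queries)) - 1
    totals = [0.0] * len(queries)
    fact_reads = 0
    for date_id, product_id, amount in FACT:  # single scan shared by every query
        fact_reads += 1
        bits = all_queries                      # initially relevant to every active query
        bits &= date_filter.get(date_id, 0)     # drop queries rejecting this date row
        if bits:
            bits &= product_filter.get(product_id, 0)
        # Distributor: route the tuple to every query whose bit survived.
        for i in range(len(queries)):
            if (bits >> i) & 1:
                totals[i] += amount
    return totals, fact_reads


if __name__ == "__main__":
    queries = [
        StarQuery(lambda y: y == 2013, lambda c: c == "book"),
        StarQuery(lambda y: y == 2013, lambda c: c == "book"),
        StarQuery(lambda y: True, lambda c: c == "toy"),
    ]
    print(run_shared_scan(queries))  # → ([8.0, 8.0, 7.0], 5)
```

Here the three example queries are answered with 5 fact-tuple reads instead of one full scan per query. The price is per-tuple bitmap work that grows with the number of concurrent queries, which hints at why sharing strategies can win or lose depending on query similarity and selectivity.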

EXTERNAL FUNDING SOURCES

This work is partially supported by the European-funded project BIGFOOT.

PUBLICATIONS

Iraklis Psaroudakis, Manos Athanassoulis, Anastasia Ailamaki: “Sharing data and work across concurrent analytical queries.” VLDB 2013.

Iraklis Psaroudakis, Manos Athanassoulis, Matthaios Olma, Anastasia Ailamaki: “Reactive and proactive sharing across concurrent analytical queries.” SIGMOD 2014 (demo).