Authors
Wolf Rödiger, Tobias Mühlbauer, Philipp Unterbrunner, Angelika Reiser, Alfons Kemper, Thomas Neumann
Publication date
2014
Conference
Proceedings of the International Conference on Data Engineering
Pages
592-603
Publisher
IEEE
Description
The growth in compute speed has outpaced the growth in network bandwidth over the last decades. This has led to an increasing performance gap between local and distributed processing. A parallel database cluster thus has to maximize the locality of query processing. A common technique to this end is to co-partition relations to avoid expensive data shuffling across the network. However, this is limited to one attribute per relation and is expensive to maintain in the face of updates. Other attributes often exhibit a fuzzy co-location due to correlations with the distribution key but current approaches do not leverage this. In this paper, we introduce locality-sensitive data shuffling, which can dramatically reduce the amount of network communication for distributed operators such as join and aggregation. We present four novel techniques: (i) optimal partition assignment exploits locality to reduce the network phase …
Total citations
[Chart: citations per year, 2013-2024]
Scholar articles
W Rödiger, T Mühlbauer, P Unterbrunner, A Reiser… - 2014 IEEE 30th International Conference on Data …, 2014