Authors
Long Cheng, Spyros Kotoulas, Tomas E Ward, Georgios Theodoropoulos
Publication date
2014/11/3
Book
Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
Pages
1399-1408
Description
The performance of joins in parallel database management systems is critical for data-intensive operations such as querying. Since data skew is common in many applications, poorly engineered join operations result in load imbalance and performance bottlenecks. State-of-the-art methods designed to handle this problem offer significant improvements over naive implementations. However, performance could be further improved by removing the dependency on global skew knowledge and broadcasting. In this paper, we propose PRPQ (partial redistribution & partial query), an efficient and robust join algorithm for processing large-scale joins over distributed systems. We present the detailed implementation and a quantitative evaluation of our method. The experimental results demonstrate that the proposed PRPQ algorithm is indeed robust and scalable under a wide range of skew conditions. Specifically …
Total citations
Chart: citations per year, 2013 to 2024
Scholar articles
L Cheng, S Kotoulas, TE Ward, G Theodoropoulos - Proceedings of the 23rd ACM International Conference …, 2014