Authors
Bingsheng He, Naga K Govindaraju, Qiong Luo, Burton Smith
Publication date
2007/11/10
Book
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing
Pages
1-12
Description
Gather and scatter are two fundamental data-parallel operations, where a large number of data items are read (gathered) from or are written (scattered) to given locations. In this paper, we study these two operations on graphics processing units (GPUs).
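As a minimal illustration of the two operations (a sketch, not code from the paper), the CUDA kernels below show that gather reads from arbitrary source locations into contiguous output positions, while scatter writes contiguous input to arbitrary destinations; the float element type, array names, and kernel signatures are assumptions made for the example.

```cuda
#include <cuda_runtime.h>

// Gather: out[i] = in[idx[i]]  (reads are random, writes are sequential).
__global__ void gather(float *out, const float *in, const int *idx, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[idx[i]];
}

// Scatter: out[idx[i]] = in[i]  (reads are sequential, writes are random).
__global__ void scatter(float *out, const float *in, const int *idx, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[idx[i]] = in[i];
}
```

The random side of each operation is what defeats memory coalescing on the GPU and motivates the locality optimizations studied in the paper.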
With superior computing power and high memory bandwidth, GPUs have become a commodity multiprocessor platform for general-purpose high-performance computing. However, due to the random-access nature of gather and scatter, a naive implementation of the two operations suffers from low utilization of the memory bandwidth and, consequently, long unhidden memory latency. Additionally, architectural details of GPUs, in particular the memory hierarchy design, are unclear to programmers. Therefore, we design multi-pass gather and scatter operations to improve their data access locality, and develop a performance model to help understand …
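A hedged sketch of the multi-pass idea, assuming a simple equal-width partitioning of the source array into regions so that each pass's random accesses stay within a smaller, more cache-friendly range; the region count, kernel signature, and launch configuration are illustrative and not taken from the paper.

```cuda
#include <cuda_runtime.h>

// One pass of a multi-pass gather: only indices that fall inside the
// current region [region_lo, region_hi) are serviced, confining the
// random reads of this pass to a smaller memory range.
__global__ void gather_pass(float *out, const float *in, const int *idx,
                            int n, int region_lo, int region_hi) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = idx[i];
        if (j >= region_lo && j < region_hi)
            out[i] = in[j];
    }
}

// Host driver: split the source array of length in_len into npass
// contiguous regions and launch one gather pass per region.
void multi_pass_gather(float *d_out, const float *d_in, const int *d_idx,
                       int n, int in_len, int npass) {
    int region = (in_len + npass - 1) / npass;
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int p = 0; p < npass; ++p) {
        int lo = p * region;
        int hi = lo + region;
        if (hi > in_len) hi = in_len;
        gather_pass<<<blocks, threads>>>(d_out, d_in, d_idx, n, lo, hi);
    }
}
```

Each element is read and written exactly once across the passes; the trade-off the paper's performance model captures is the extra index traffic per pass versus the improved locality of each pass's accesses.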
Total citations
Per-year citation chart, 2008–2024
Scholar articles
B He, NK Govindaraju, Q Luo, B Smith - Proceedings of the 2007 ACM/IEEE Conference on …, 2007