Authors
Bingsheng He, Naga K Govindaraju, Qiong Luo, Burton Smith
Publication date
2007/11/10
Book
Proceedings of the 2007 ACM/IEEE Conference on Supercomputing
Pages
1-12
Description
Gather and scatter are two fundamental data-parallel operations, where a large number of data items are read (gathered) from or are written (scattered) to given locations. In this paper, we study these two operations on graphics processing units (GPUs).
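As a minimal illustration of the two operations (a sketch, not code from the paper), the CUDA kernels below show that gather reads from arbitrary source locations into contiguous output positions, while scatter writes contiguous input to arbitrary destinations; the float element type, array names, and kernel signatures are assumptions made for the example.

```cuda
#include <cuda_runtime.h>

// Gather: out[i] = in[idx[i]]  (reads are random, writes are sequential).
__global__ void gather(float *out, const float *in, const int *idx, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[idx[i]];
}

// Scatter: out[idx[i]] = in[i]  (reads are sequential, writes are random).
__global__ void scatter(float *out, const float *in, const int *idx, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[idx[i]] = in[i];
}
```

The random side of each operation is what defeats memory coalescing on the GPU and motivates the locality optimizations studied in the paper.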
With superior computing power and high memory bandwidth, GPUs have become a commodity multiprocessor platform for general-purpose high-performance computing. However, due to the random-access nature of gather and scatter, a naive implementation of the two operations suffers from low utilization of the memory bandwidth and, consequently, long unhidden memory latency. Additionally, architectural details of GPUs, in particular the memory hierarchy design, are unclear to programmers. Therefore, we design multi-pass gather and scatter operations to improve their data access locality, and develop a performance model to help understand …
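A hedged sketch of the multi-pass idea, assuming a simple equal-width partitioning of the source array into regions so that each pass's random accesses stay within a smaller, more cache-friendly range; the region count, kernel signature, and launch configuration are illustrative and not taken from the paper.

```cuda
#include <cuda_runtime.h>

// One pass of a multi-pass gather: only indices that fall inside the
// current region [region_lo, region_hi) are serviced, confining the
// random reads of this pass to a smaller memory range.
__global__ void gather_pass(float *out, const float *in, const int *idx,
                            int n, int region_lo, int region_hi) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = idx[i];
        if (j >= region_lo && j < region_hi)
            out[i] = in[j];
    }
}

// Host driver: split the source array of length in_len into npass
// contiguous regions and launch one gather pass per region.
void multi_pass_gather(float *d_out, const float *d_in, const int *d_idx,
                       int n, int in_len, int npass) {
    int region = (in_len + npass - 1) / npass;
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int p = 0; p < npass; ++p) {
        int lo = p * region;
        int hi = lo + region;
        if (hi > in_len) hi = in_len;
        gather_pass<<<blocks, threads>>>(d_out, d_in, d_idx, n, lo, hi);
    }
}
```

Each element is read and written exactly once across the passes; the trade-off the paper's performance model captures is the extra index traffic per pass versus the improved locality of each pass's accesses.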
Total citations
Per-year citation chart, 2008–2024
Scholar articles
B He, NK Govindaraju, Q Luo, B Smith - Proceedings of the 2007 ACM/IEEE Conference on …, 2007