Authors
Sreeram Potluri, Anshuman Goswami, Davide Rossetti, Chris J Newburn, Manjunath Gorentla Venkata, Neena Imam
Publication date
2017/12/18
Conference
2017 IEEE 24th International Conference on High Performance Computing (HiPC)
Pages
253-262
Publisher
IEEE
Description
GPUs have become an essential component for building compute clusters with high compute density and high performance per watt. As such clusters scale to thousands of GPUs, efficiently moving data between the GPUs becomes imperative for maximum performance. NVSHMEM is an implementation of the OpenSHMEM standard for NVIDIA GPU clusters that allows communication to be issued from inside GPU kernels. In earlier work, we showed how NVSHMEM can be used to achieve better application performance on GPUs connected through PCIe or NVLink. As part of this effort, we implement InfiniBand (IB) verbs for Mellanox InfiniBand adapters in CUDA. We evaluate different design alternatives, taking into consideration the relaxed memory model, automatic memory access coalescing, and thread hierarchy on the GPU. We also consider correctness issues that arise in these designs. We take advantage of …
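To make the core idea concrete, below is a minimal sketch of device-initiated communication in the NVSHMEM style described above: a one-sided put issued from inside a GPU kernel to a symmetric buffer on a neighboring PE. It assumes the standard public NVSHMEM host/device API (nvshmem_init, nvshmem_malloc, device-side nvshmem_int_p); it is illustrative only, not the paper's benchmark code or its internal IB verbs implementation.

// Sketch: each PE writes its rank into a symmetric buffer on its
// right neighbor, entirely from inside a GPU kernel (no host MPI/SHMEM
// call on the data path). Error handling omitted for brevity.
#include <cstdio>
#include <nvshmem.h>
#include <nvshmemx.h>

__global__ void put_to_neighbor(int *dest, int my_pe, int n_pes) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int peer = (my_pe + 1) % n_pes;
        // One-sided put issued from device code.
        nvshmem_int_p(dest, my_pe, peer);
    }
}

int main() {
    nvshmem_init();
    int my_pe = nvshmem_my_pe();
    int n_pes = nvshmem_n_pes();

    // Symmetric allocation: the same object exists on every PE.
    int *dest = (int *) nvshmem_malloc(sizeof(int));

    put_to_neighbor<<<1, 32>>>(dest, my_pe, n_pes);
    cudaDeviceSynchronize();
    nvshmem_barrier_all();  // ensure all remote puts have completed

    int received;
    cudaMemcpy(&received, dest, sizeof(int), cudaMemcpyDeviceToHost);
    printf("PE %d received %d\n", my_pe, received);

    nvshmem_free(dest);
    nvshmem_finalize();
    return 0;
}

Because the put is initiated by the kernel itself, the GPU's thread hierarchy and relaxed memory model (discussed in the abstract) determine how such calls must be ordered and completed when issued by many threads concurrently.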
Total citations
35 total (2018: 5, 2019: 8, 2020: 8, 2021: 3, 2022: 2, 2023: 6, 2024: 3)
Scholar articles
S Potluri, A Goswami, D Rossetti, CJ Newburn… - 2017 IEEE 24th International Conference on High …, 2017