Neither more nor less: Optimizing thread-level parallelism for GPGPUs

Authors

Onur Kayıran, Adwait Jog, Mahmut T Kandemir, Chita R Das

Publication date

2013/9/7

Conference

Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Pages

157-166

Publisher

IEEE

Description

General-purpose graphics processing units (GPGPUs) are at their best in accelerating computation by exploiting the abundant thread-level parallelism (TLP) offered by many classes of HPC applications. To facilitate such high TLP, emerging programming models like CUDA and OpenCL allow programmers to create work abstractions in terms of smaller work units, called cooperative thread arrays (CTAs). CTAs are groups of threads and can be executed in any order, thereby providing ample opportunities for TLP. State-of-the-art GPGPU schedulers allocate the maximum possible number of CTAs per core (limited by available on-chip resources) to enhance performance by exploiting TLP. However, we demonstrate in this paper that executing the maximum possible number of CTAs on a core is not always the optimal choice from the performance perspective. A high number of concurrently executing threads might cause more …

Total citations

Cited by 332

Citations per year: 2012: 1, 2013: 5, 2014: 27, 2015: 47, 2016: 49, 2017: 39, 2018: 57, 2019: 31, 2020: 23, 2021: 17, 2022: 11, 2023: 8, 2024: 2
