Authors
Paul Delestrac, Debjyoti Bhattacharjee, Simei Yang, Diksha Moolchandani, Francky Catthoor, Lionel Torres, David Novo
Publication date
2024/3/25
Conference
2024 Design, Automation & Test in Europe Conference & Exhibition (DATE)
Pages
1-6
Publisher
IEEE
Description
Training time has become a critical bottleneck due to the recent proliferation of large-parameter ML models. GPUs continue to be the prevailing architecture for training ML models. However, the complex execution flow of ML frameworks makes it difficult to understand GPU computing resource utilization. Our main goal is to provide a better understanding of how efficiently ML training workloads use the computing resources of modern GPUs. To this end, we first describe an ideal reference execution of a GPU-accelerated ML training loop and identify relevant metrics that can be measured using existing profiling tools. Second, we produce a coherent integration of the traces obtained from each profiling tool. Third, we leverage the metrics within our integrated trace to analyze the impact of different software optimizations (e.g., mixed-precision, various ML frameworks, and execution modes) on the throughput and the …
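To make the methodology in the description concrete, the following is a minimal sketch of the kind of instrumented, mixed-precision training loop the paper analyzes. The paper does not tie its method to a specific framework or profiler, so the PyTorch model, torch.profiler calls, and autocast/GradScaler setup below are illustrative assumptions rather than the authors' actual experimental code; the sketch assumes a CUDA-capable GPU.

```python
# Illustrative sketch (not from the paper): profiling a mixed-precision
# PyTorch training loop with torch.profiler. Requires a CUDA-capable GPU.
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

device = "cuda"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()          # loss scaling for float16 training
loss_fn = nn.CrossEntropyLoss()

# Synthetic batch, standing in for a real training set
x = torch.randn(64, 512, device=device)
y = torch.randint(0, 10, (64,), device=device)

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(10):                       # a few training iterations
        optimizer.zero_grad(set_to_none=True)
        # Mixed precision: run the forward pass in float16 where safe
        with torch.autocast(device_type="cuda", dtype=torch.float16):
            loss = loss_fn(model(x), y)
        scaler.scale(loss).backward()         # backward on the scaled loss
        scaler.step(optimizer)
        scaler.update()

# Per-kernel breakdown of GPU time, the kind of low-level metric that an
# integrated trace makes comparable across software configurations
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
prof.export_chrome_trace("trace.json")        # timeline trace for inspection
```

Run on a GPU, this prints a per-kernel time table and writes a Chrome-format timeline trace (trace.json): raw material of the sort from which GPU utilization metrics like those studied in the paper can be derived and compared across optimizations such as mixed precision.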