Authors
Md Mohsin Ali, Peter E Strazdins, Brendan Harding, Markus Hegland, Jay W Larson
Publication date
2015/7/20
Conference
2015 International Conference on High Performance Computing & Simulation (HPCS)
Pages
499-507
Publisher
IEEE
Description
Applications performing ultra-large scale simulations via solving PDEs require very large computational systems for their timely solution. Studies have shown the rate of failure grows with the system size and these trends are likely to worsen in future machines as less reliable components are used to reduce the energy cost. Thus, as systems, and the problems solved on them, continue to grow, the ability to survive failures is becoming a critical aspect of algorithm development. The sparse grid combination technique (SGCT) is a cost-effective method for solving time-evolving PDEs, especially for higher-dimensional problems. It can also be easily modified to provide algorithm-based fault tolerance for these problems. In this paper, we show how the SGCT can produce a fault-tolerant version of the GENE gyrokinetic plasma application, which evolves a 5D complex density field over time. We use an alternate component …
Total citations
[Chart: citations per year, 2015-2022]