Authors
Sorina Camarasu-Pop, Tristan Glatard, Rafael Ferreira Da Silva, Pierre Gueth, David Sarrut, Hugues Benoit-Cattin
Publication date
2013/3/31
Journal
Future Generation Computer Systems
Volume
29
Issue
3
Pages
728-738
Publisher
North-Holland
Description
This paper introduces an end-to-end framework for efficiently computing and merging Monte Carlo simulations on heterogeneous distributed systems. Simulations are parallelized using a dynamic load-balancing approach and multiple parallel mergers. Checkpointing is used to improve reliability and to enable incremental merging of partial results. A model is proposed to analyze the behavior of the framework and to help tune its parameters. Experimental results obtained on a production grid infrastructure show that the model fits the real makespan with a relative error of at most 10%, that using multiple parallel mergers reduces the makespan by 40% on average, and that checkpointing enables the completion of very long simulations and can be used without penalizing the makespan.
Total citations
[Bar chart: citations per year, 2012 to 2023; per-year counts not recoverable from the extracted text]