Authors
Matt Ezell, David Dillow, S Oral, Feiyi Wang, Devesh Tiwari, Don E Maxwell, D Leverman, J Hill
Publication date
2014/5
Journal
Cray User Group Conference (CUG 2014)
Description
The Oak Ridge Leadership Computing Facility (OLCF) introduced the concept of Fine-Grained Routing in 2008 to improve I/O performance between the Jaguar supercomputer and Spider, OLCF’s center-wide Lustre file system. Fine-grained routing organizes I/O paths to minimize congestion. Jaguar has since been upgraded to Titan, providing more than a ten-fold improvement in peak performance. To support the center’s increased computational capacity and I/O demand, the Spider file system has been replaced with Spider II. Building on the lessons learned from Spider, an improved method for placing LNET routers was developed and implemented for Spider II. The fine-grained routing scripts and configuration have been updated to provide additional optimizations and better match the system setup. This paper presents a brief history of fine-grained routing at OLCF, an introduction to the architectures of Titan and Spider II, methods for placing routers in Titan, and details about the fine-grained routing configuration.
Total citations
[Chart: citations per year, 2014 to 2021]
Scholar articles
M Ezell, D Dillow, S Oral, F Wang, D Tiwari… - Cray User Group Conference (CUG 2014), 2014