Authors
Sarp Oral, James Simmons, Jason Hill, Dustin Leverman, Feiyi Wang, Matt Ezell, Ross Miller, Douglas Fuller, Raghul Gunasekaran, Youngjae Kim, Saurabh Gupta, Devesh Tiwari, Sudharshan S Vazhkudai, James H Rogers, David Dillow, Galen M Shipman, Arthur S Bland
Publication date
2014/11/16
Conference
SC'14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
Pages
217-228
Publisher
IEEE
Description
The Oak Ridge Leadership Computing Facility (OLCF) has deployed multiple large-scale parallel file systems (PFS) to support its operations. During this process, OLCF acquired significant expertise in large-scale storage system design, file system software development, technology evaluation, benchmarking, procurement, deployment, and operational practices. Based on the lessons learned from each new PFS deployment, OLCF improved its operating procedures and strategies. This paper provides an account of our experience and lessons learned in acquiring, deploying, and operating large-scale parallel file systems. We believe that these lessons will be useful to the wider HPC community.
Total citations
[Chart: citations per year, 2014 to 2024]