Center Projects

The National Center for Computational Sciences (NCCS) supports the open-source software process and is pleased to contribute to this effort.

Applications, Software, Tools, & Support

Spider Center-Wide File System

Introduction

As the computing capabilities and the number of platforms at the NCCS have grown, a clear need has emerged for a centralized, unified file system that is available from all platforms. The Spider project was initiated in late 2005 to investigate this center-wide file system approach.

Scope

Early on, Lustre was selected as the file system for the Spider project. Because Lustre was already being used on the Jaguar system, it was a natural choice. Expansions and upgrades to the Spider project are already planned to satisfy the increasing needs for bandwidth and capacity driven by the NCCS road map.

Current Status

The Spider center-wide file system is now deployed. It is the operational work file system on the XT5 partition of Jaguar, Lens, the Smoky development cluster, and dedicated GridFTP servers. It is the largest-scale Lustre file system in the world, with over 26,000 clients, and it is the fastest Lustre file system in the world, with a demonstrated bandwidth of 240 GB/s.

Future Plans

Work continues to improve the metadata performance, resiliency, and performance stability of Spider. In addition to increasing aggregate bandwidth, we have plans to adapt the Lustre configuration for changing workloads, including data analytics and visualization.

IOTA

IOTA is an input/output (I/O) tuning and analysis tool for profiling applications. This work is funded through the National Leadership Computing Facility of the Department of Energy.

Lustre User Toolkit

The Lustre User Toolkit covers two areas. The first area is application programming interfaces. To this end, we are pleased to release libLUT, which aims to provide a simplified interface for critical application needs when communicating with the Lustre file system. The second area is utility applications that return capability functionality to the user. The first offering in this area, spdcp, may be used in batch jobs, or to stage batch jobs from an interactive session, so that the compute capability of the cluster is employed to copy large datasets. The spdcp utility can exploit multiple levels of parallelism in datasets to complete the copy in much less wall-clock time than the Linux cp utility. This work is funded through the National Leadership Computing Facility of the Department of Energy.
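
To illustrate the general idea of copying a dataset in parallel rather than serially, here is a minimal Python sketch. It is not the spdcp implementation and does not use libLUT: spdcp distributes work across the compute nodes of a cluster, while this sketch only splits a single file into byte ranges and copies them with threads on one node. The function names (copy_range, parallel_copy) and the workers and chunk parameters are hypothetical, chosen for illustration only, and the code assumes a POSIX system where os.pread and os.pwrite are available.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def copy_range(src_path, dst_path, offset, length, block=1 << 20):
    """Copy `length` bytes starting at `offset` from src to dst (hypothetical helper)."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY)
    try:
        end = offset + length
        while offset < end:
            # Positional read: no shared file pointer between workers.
            data = os.pread(src, min(block, end - offset), offset)
            if not data:
                break  # source is shorter than expected
            written = 0
            while written < len(data):
                written += os.pwrite(dst, data[written:], offset + written)
            offset += len(data)
    finally:
        os.close(src)
        os.close(dst)


def parallel_copy(src_path, dst_path, workers=4, chunk=4 << 20):
    """Copy one file by handing independent byte ranges to worker threads."""
    size = os.path.getsize(src_path)
    # Pre-size the destination so each worker can write at its own offset.
    with open(dst_path, "wb") as f:
        f.truncate(size)
    ranges = [(off, min(chunk, size - off)) for off in range(0, size, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_range, src_path, dst_path, off, n)
                   for off, n in ranges]
        for fut in futures:
            fut.result()  # re-raise any error from a worker
    return size
```

The same decomposition can be applied at a second level by giving each file in a directory tree to a different worker, which is one way to read "multiple levels of parallelism in datasets". On a striped parallel file system, choosing a chunk size that lines up with the stripe size would be a natural tuning choice, although that is an assumption here and not something the toolkit description states.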

People

Galen Shipman, Group Leader (gshipman@ornl.gov)
Sarp Oral, Testing and Evaluation (oralhs@ornl.gov)
David Dillow, Testing and Evaluation (dillowda@ornl.gov)

Last modified on August 24th, 2009 at 4:46 pm