Scientific Computing at the Cutting Edge

For breakthrough scientific discovery at the cutting edge of knowledge, scientists need extraordinary computational resources. That’s why many of the most exciting research programs in the world are currently taking place at the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL). A Department of Energy Leadership Computing Facility, the NCCS offers the world’s most powerful nonclassified supercomputer to scientists from government, academia, and industry.

The NCCS earned its Leadership Computing Facility designation from the DOE Office of Science with an aggressive five-year plan to build an unmatched national computing resource. Beginning in 2004, the NCCS deployed a series of machines to provide continually greater computational power and performance, starting at 6 teraflops (TF) and expanding to 18, 25, 50, 100, 250, and finally 1,000 TF.

The most powerful computer system in ORNL’s Leadership Computing Facility is Jaguar (figure 1), a Cray XT4 supercomputer that currently boasts more than 23,000 processors and a peak processing speed of 119 TF. Continuing along this leadership roadmap, the NCCS will upgrade the system to 250 TF later this year.

Breakthrough Science
Today, the balanced leadership computing resources available at the NCCS, combined with the large blocks of computing time allocated to research teams, allow scientists across the nation to conduct cutting-edge research, often addressing problems that previously were unapproachable. Breakthrough research now occurring at the NCCS includes:

Climate modeling. SciDAC-funded climate researchers are developing the new Community Climate System Model (CCSM, figure 2, p38) to incorporate more sophisticated simulations of arctic ice, the surface hydrology of land, and the carbon-nitrogen cycle (“Developing Models for Predictive Climate Science,” SciDAC Review, Spring 2007, p44). Using Jaguar, researchers were able to perform 100-year runs in three days, performance that was inconceivable just a few years ago.

Nuclear fusion. Fusion is becoming critically important as a potential alternative energy source, and researchers at the NCCS are simulating the planned multibillion-dollar ITER fusion reactor. While previous plasma modeling was limited to two dimensions, Jaguar has allowed the research team to create the first fully three-dimensional models and gain new insights into reactor design. The fusion application, All-ORders Spectral Algorithm (AORSA), developed under SciDAC, has achieved 87.5 TF on Jaguar—74% of the system’s theoretical peak.

Figure 1. Jaguar currently boasts more than 23,000 processors and peak processing speeds of 119 TF, making it the most powerful supercomputer in the world dedicated to open science.

Biomass energy production. Researchers are investigating the 350,000-atom cellulase enzyme’s process of converting cellulose to sugars (figure 3, p39). Jaguar allows the team to simulate the enzyme’s activity on a time scale of 50 to 100 nanoseconds. The work has already revealed that interior “vibrations” in enzymes influence the rate at which they carry out chemical reactions. This breakthrough might ultimately help scientists design new, more efficient enzymes for a wide range of industrial processes, such as developing lower-cost ethanol.

Combustion. Combustion researchers are using Jaguar to gain a more detailed understanding of the chemistry and behavior of flames. Part of SciDAC’s “Terascale High-Fidelity Simulations of Turbulent Combustion with Detailed Chemistry” project (“Energy Science with Digital Combustors,” SciDAC Review, Fall 2006, p42), ongoing work could provide government and industry with the first-ever viable method for realistically simulating conditions inside engines rather than relying solely on physical experiment.

Astrophysics. Researchers are using the power of Jaguar to gain new insight into one of the Universe’s enduring mysteries by modeling the shock waves created when the cores of massive stars collapse in the early stages of a supernova (figure 4, p39; “Modeling the First Instants of a Star’s Death,” SciDAC Review, Spring 2006, p26). At the NCCS, the team has run simulations of this process—which encompasses scales from the size of the Earth’s orbit around the Sun to the interaction of subatomic particles—that are twice as long as any previous simulation, and identified startling new conclusions about how the initial shock wave is revived in this stellar event.

Industry advances. General Motors (GM) researchers are studying ways to convert waste heat from automobile exhaust (which accounts for 60% of the energy generated by the engine) into electricity. The GM team at the NCCS was able to perform the largest-ever simulation of a 1,000-plus-atom supercell, a simulation that would not have been possible at any other nonclassified facility.
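Two of the figures quoted in the list above can be checked with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the 365.25-day year is an assumption rather than something the article specifies. It converts the 100-year, three-day climate run into a throughput figure and recomputes AORSA's fraction of Jaguar's 119 TF peak.

```python
# Figures quoted in the article.
SIMULATED_YEARS = 100.0       # length of a climate run
WALL_CLOCK_DAYS = 3.0         # time the run took on Jaguar
AORSA_SUSTAINED_TF = 87.5     # AORSA performance on Jaguar
JAGUAR_PEAK_TF = 119.0        # Jaguar's theoretical peak

# Assumption for illustration: an average calendar year.
DAYS_PER_YEAR = 365.25


def simulated_years_per_day(years: float, days: float) -> float:
    """Throughput in simulated years per wall-clock day."""
    return years / days


def speedup_over_real_time(years: float, days: float) -> float:
    """How many times faster than real time the simulation advances."""
    return years * DAYS_PER_YEAR / days


def fraction_of_peak(sustained_tf: float, peak_tf: float) -> float:
    """Sustained performance as a fraction of theoretical peak."""
    return sustained_tf / peak_tf


if __name__ == "__main__":
    rate = simulated_years_per_day(SIMULATED_YEARS, WALL_CLOCK_DAYS)
    ratio = speedup_over_real_time(SIMULATED_YEARS, WALL_CLOCK_DAYS)
    share = fraction_of_peak(AORSA_SUSTAINED_TF, JAGUAR_PEAK_TF)
    print(f"Climate throughput: {rate:.1f} simulated years per day")
    print(f"Speedup over real time: about {ratio:,.0f}x")
    print(f"AORSA fraction of peak: {share:.1%}")
```

Under these assumptions the climate runs advance at roughly 33 simulated years per wall-clock day, and 87.5 TF out of 119 TF works out to about 73.5%, which rounds to the 74% quoted for AORSA.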

Figure 2. Simulated time evolution of the atmospheric carbon dioxide concentration originating from the land’s surface. Climate scientists are using the resources of the NCCS for the prediction of global and regional climate.

Building a Leadership-Class Supercomputer
All of these advances—and many still to come—are fueled by the unprecedented computing resources available at the NCCS. Chief among these resources is Jaguar, the NCCS’ flagship supercomputer, which today is one of just three supercomputers in the world to surpass 100 TF. Jaguar exemplifies the NCCS’ core philosophy of not only providing scientists with extraordinary computational capabilities, but of continually expanding those capabilities to facilitate the next generation of scientific discovery.

Jaguar began when the NCCS partnered with Cray to build a new supercomputer based on its XT3 platform. The center chose the Cray XT3 because the system offered exceptional balance and could perform well on a wide range of applications—allowing it to support a rich diversity of scientific disciplines.

Jaguar arrived at the NCCS in 2005 as a 56-cabinet Cray XT3 system capable of a peak performance of 25 TF. In July of that year, the center more than doubled that capacity to 54 TF by upgrading all 1,303 of the system’s compute nodes from single-core Opteron™ processors to dual-core processors and doubling the memory on each node. The AMD Opteron™ processors provide exceptionally high memory bandwidth and low memory latency, and the architecture uses the HyperTransport™ interconnect to offer a high-speed, low-latency link among the processors.

Figure 3. Cellulase enzyme in action on the cellulose surface. The cellulase-enzyme-based conversion of cellulose present in biomass has shown the potential of lowering the cost of ethanol production. The system was built using Assisted Model Building and Energy Refinement (AMBER) force-field programs, and the simulation runs were performed using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) on 1,024 processor nodes on Jaguar.

In November of 2006, the NCCS added a 68-cabinet Cray XT4 system with a peak performance of 65 TF. Ultimately, the Cray XT3 system was combined with the Cray XT4 to create today’s Jaguar—a massive 11,706-node supercomputer that delivers peak processing speeds of 119 TF with 46 TB of aggregate memory. The system allows users to scale applications up to more than 23,000 processors by maintaining exceptional system balance among the processors, memory, interconnect, and input/output (I/O) system.
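The system figures above are mutually consistent, as a short sanity check suggests. This sketch assumes one dual-core processor per compute node and decimal units (1 TB = 1,000 GB); neither assumption is stated in the article, and the function names are illustrative only.

```python
# Figures quoted in the article for the combined Jaguar system.
XT3_PEAK_TF = 54.0            # upgraded Cray XT3 portion
XT4_PEAK_TF = 65.0            # Cray XT4 portion added in November 2006
COMPUTE_NODES = 11_706
AGGREGATE_MEMORY_TB = 46.0

# Assumptions for illustration (not stated in the article):
CORES_PER_NODE = 2            # one dual-core Opteron per compute node
GB_PER_TB = 1000              # decimal units


def combined_peak_tf() -> float:
    """Peak of the merged system as the sum of its two parts."""
    return XT3_PEAK_TF + XT4_PEAK_TF


def total_cores() -> int:
    """Processor cores available under the dual-core assumption."""
    return COMPUTE_NODES * CORES_PER_NODE


def peak_gflops_per_core() -> float:
    """Average theoretical peak per core, in gigaflops."""
    return combined_peak_tf() * 1000 / total_cores()


def memory_gb_per_node() -> float:
    """Average memory per compute node, in gigabytes."""
    return AGGREGATE_MEMORY_TB * GB_PER_TB / COMPUTE_NODES


if __name__ == "__main__":
    print(f"Combined peak: {combined_peak_tf():.0f} TF")
    print(f"Total cores: {total_cores():,}")
    print(f"Peak per core: {peak_gflops_per_core():.2f} GF")
    print(f"Memory per node: {memory_gb_per_node():.2f} GB")
```

With these assumptions, 54 TF plus 65 TF gives the quoted 119 TF, and 11,706 dual-core nodes give 23,412 cores, in line with "more than 23,000 processors."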

Supporting Systems and Resources
High-performance systems alone are not enough to facilitate groundbreaking science, so the NCCS also provides a suite of supporting resources and infrastructure that allows scientists to make the best use of the systems.

To complement Jaguar, the NCCS also offers a Cray X1E with custom-designed vector processors that deliver high performance on scientific codes. This machine, named Phoenix, has a peak performance of 18.5 TF and is widely used in domains as diverse as climate research, fusion simulation, and modeling of the aerodynamics of aircraft wings.

The NCCS’ massively parallel, high-performance storage system (HPSS, figure 5, p40) data archive allows users to store vast amounts of long-term data and move those data back and forth very quickly between computers. The center is also deploying Spider, a Lustre-based high-performance file system. When fully deployed, the center-wide system will allow users to store data sets from their simulations and immediately perform data analysis and visualization using other NCCS computer resources, without having to explicitly move the data.

To link NCCS resources with users across the United States, the center is connected to every major research network in the country at backbone rate. Ten-gigabit-per-second connections to DOE’s Energy Sciences Network and the Internet2 academic network allow users to send and receive up to 20 billion bits of information per second. Researchers also have high-speed access to the National Science Foundation’s TeraGrid and CHEETAH networks, and DOE’s UltraScienceNet, among others.
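To put the network figures in perspective, the sketch below adds the two 10-gigabit-per-second links and estimates an idealized transfer time. The 1 TB data set is a hypothetical example, and the estimate ignores protocol overhead, disk speed, and whether a single transfer could actually use both networks at once, so it is an upper bound on throughput rather than a prediction.

```python
# Figures quoted in the article.
LINK_RATE_GBPS = 10.0                     # per-connection rate
LINK_NAMES = ("ESnet", "Internet2")       # the two 10 Gb/s connections

# Hypothetical data set size, chosen only for illustration.
EXAMPLE_DATASET_BYTES = 1e12              # 1 TB in decimal units


def aggregate_rate_gbps(link_rate_gbps: float, link_count: int) -> float:
    """Combined line rate of several identical links."""
    return link_rate_gbps * link_count


def ideal_transfer_seconds(size_bytes: float, rate_gbps: float) -> float:
    """Transfer time at full line rate, with no overhead of any kind."""
    bits = size_bytes * 8
    return bits / (rate_gbps * 1e9)


if __name__ == "__main__":
    total = aggregate_rate_gbps(LINK_RATE_GBPS, len(LINK_NAMES))
    seconds = ideal_transfer_seconds(EXAMPLE_DATASET_BYTES, total)
    print(f"Aggregate rate: {total:.0f} Gb/s")
    print(f"Ideal time for 1 TB: {seconds:.0f} s")
```

At the combined 20 Gb/s, a 1 TB data set would take about 400 seconds in this idealized case; real transfers would take longer.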

The NCCS also offers state-of-the-art visualization capabilities. Software is deployed to support visualization on both the user’s local systems and the Exploratory Visualization Environment for Research in Science and Technology (EVEREST, figure 6, p40; figure 7, p41), a 30-foot-wide, 35-million-pixel display wall for large-scale, immersive data exploration and analysis.

A Collaborative Research Environment
In addition to its extraordinary technical capacity, the NCCS provides a research environment that is uniquely oriented toward facilitating scientific breakthroughs, even when the scientists conducting the research are remote users. The NCCS supports just a few dozen projects at a time. The center strives to act as a true collaborator and contributor to the science teams using its resources, and the staff members believe they have a vested interest in the outcome of their users’ work.

The NCCS is committed to continually upgrading the computational capabilities of its systems. However, while these upgrades make increasingly powerful computational tools available to scientists, the NCCS recognizes that research teams must adapt their codes to use new resources to their full potential. To help accelerate this process, the Scientific Computing Group provides a liaison to each project running at the facility. These liaisons are experts in their scientific disciplines as well as expert programmers for massively parallel machines. The liaisons work as part of the science teams to help ensure that their codes make the best use of new computing capabilities.

Figure 4. Entropy in a three-dimensional simulation of the instability of the accretion in a core-collapse supernova.

Working at NCCS
What makes the NCCS a leading scientific research facility is not just the massive computational power, but also the fact that research groups are allocated large blocks of time to use it. In 2006 DOE awarded research teams 75 million hours of processor time on Jaguar and Phoenix.

Time is allocated through the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) process, pioneered by Dr. Raymond L. Orbach at DOE. Through INCITE, users apply for time and undergo a scientific and technical review, both to evaluate the potential of the science and to assess their readiness to make good use of the machine. Proposals are chosen strictly on those bases, and anyone (from DOE, other government agencies, academia, or industry) can compete as an equal.

Figure 5. The NCCS massively parallel high-performance storage system (HPSS) data archive allows users to store vast amounts of long-term data.

Figure 6. The NCCS Exploratory Visualization Environment for Research in Science and Technology (EVEREST) brings breakthrough research to life with a 30-foot-wide display wall for immersive data exploration and analysis.

The Next Generation of NCCS Systems
While Jaguar already delivers extraordinary computational power, the NCCS is continually working to improve the resources available to scientists. The next major upgrade to Jaguar will occur in December 2007, when the center will replace all of the system’s dual-core Opteron™ processors with quad-core processors and double the system memory again. This upgrade will increase Jaguar’s peak performance to 250 TF.

These new capabilities will be unprecedented, but the NCCS will continue pushing the boundaries even farther. The center will be deploying a new supercomputer in 2009 that will break the petaflops barrier—delivering peak performance of 1,000 TF, or one quadrillion floating point operations per second.
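The size of these planned steps can be expressed as ratios of the peak figures quoted in this article. The following sketch uses hypothetical names throughout and compares theoretical peaks only; it says nothing about how real applications will scale.

```python
# Peak figures quoted in the article, in teraflops (TF).
CURRENT_PEAK_TF = 119.0       # Jaguar today
UPGRADED_PEAK_TF = 250.0      # after the quad-core upgrade
PETASCALE_PEAK_TF = 1000.0    # planned 2009 system

FLOPS_PER_TF = 1e12           # one teraflop = 10^12 operations per second


def to_flops(teraflops: float) -> float:
    """Convert teraflops to floating point operations per second."""
    return teraflops * FLOPS_PER_TF


def peak_ratio(new_tf: float, old_tf: float) -> float:
    """Ratio of two theoretical peak figures."""
    return new_tf / old_tf


if __name__ == "__main__":
    up = peak_ratio(UPGRADED_PEAK_TF, CURRENT_PEAK_TF)
    peta = peak_ratio(PETASCALE_PEAK_TF, CURRENT_PEAK_TF)
    print(f"Quad-core upgrade: {up:.1f}x today's peak")
    print(f"Petaflops system: {peta:.1f}x today's peak")
    print(f"1,000 TF = {to_flops(PETASCALE_PEAK_TF):.0e} operations per second")
```

By this measure the 250 TF upgrade roughly doubles today's peak, and the petaflops system would offer about 8.4 times today's peak, or four times the upgraded Jaguar.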

While many NCCS users are already energized by thoughts of the new insights they will be able to achieve with the petaflops system, they are equally excited about the fact that they won’t need to rewrite their applications to use it. The new machine will provide a programming model that is a logical continuation of the one currently used on Jaguar. By providing this programming continuity over the life of its systems, the NCCS is helping to ensure that developers can take advantage of continuously evolving computational capacity as quickly and easily as possible.

Figure 7. NCCS offers researchers state-of-the-art visualization capabilities including EVEREST, a 35-million-pixel display wall.

An Ongoing Commitment to Scientific Discovery
Unparalleled computational scalability, extensive research allocations, extraordinary supporting resources, continuous enhancements in computing capacity—the NCCS has not developed these capabilities through random circumstance. Rather, they all stem from the NCCS’ driving mission: to collaborate with scientists and help them to conduct truly groundbreaking work. It is this commitment to collaboration that is allowing NCCS science teams to advance the boundaries of human knowledge in many areas of research today, and that will enable the scientific breakthroughs of tomorrow.

Contributors: This article was submitted by Buddy Bland, director for the Leadership Computing Facility project at ORNL.