Lawrence Livermore National Laboratory (LLNL) is home to some of the fastest computers in the world (Figure 1). These machines solve extreme-scale problems using many tens of thousands of processing cores. For example, LLNL recently installed the Dawn supercomputing cluster, which can deliver 500 TFLOP/s of performance. In 2012, LLNL expects to have the Sequoia cluster operational, with a projected performance of over 20 PFLOP/s.
Figure 1. LLNL Clusters
These systems will strengthen the foundations of predictive simulation by running very large suites of complex simulations and then comparing model predictions with experimental data. In addition, the machines will be used for weapons-science calculations necessary to build more accurate physical models. This work is a cornerstone of the National Nuclear Security Administration's (NNSA's) Stockpile Stewardship Program to ensure the safety, security, and reliability of the U.S. nuclear weapons stockpile today and into the future without underground testing.
Such large clusters are capable of crunching numbers at a breakneck pace. While attention is often placed on the FLOP/s rate, the amount of data generated by these systems is equally amazing. It is not uncommon to produce multi-terabyte data sets, with tens of billions of zones, thousands of files per time step, and hundreds of time steps.
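For a rough sense of scale, the figures above can be turned into a back-of-the-envelope estimate. The numbers below are hypothetical, chosen only to match the orders of magnitude the text quotes (tens of billions of zones, hundreds of time steps):

```python
# Back-of-the-envelope estimate of simulation output size.
# All figures are illustrative assumptions, not values from the article.
zones = 20_000_000_000        # 20 billion zones
variables_per_zone = 10       # e.g. density, pressure, velocity components
bytes_per_value = 8           # double precision
time_steps = 200

bytes_per_step = zones * variables_per_zone * bytes_per_value
total_bytes = bytes_per_step * time_steps

print(f"per time step: {bytes_per_step / 1e12:.1f} TB")  # 1.6 TB
print(f"whole run:     {total_bytes / 1e15:.2f} PB")     # 0.32 PB
```

Even these conservative assumptions yield multi-terabyte snapshots per time step, which is what drives the memory and I/O requirements discussed below.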
Figure 2. The Graph cluster on the data center floor at LLNL
The need to visualize this data is critically important to the program’s success. One of the challenges facing the LLNL visualization team was the design and procurement of a cluster with enough horsepower to manage the large data sets generated by clusters such as Dawn. As stated by Becky Springmeyer, Visualization Group Leader in LLNL’s Advanced Simulation and Computing Program, “To meet our current and future scientific computing needs requires a visualization cluster with enough memory to generate visualizations of simulation runs on our largest compute platforms, as well as sufficient I/O rates for interactive analysis.”
Although often overlooked in high-performance computing (HPC), visualization resources are an essential part of the HPC environment: they give computational scientists the ability to interactively explore very large data sets and enable large-scale post-processing and data-reduction operations.
To visualize the large amounts of data, LLNL requested an Appro Hyper supercomputing cluster, designed to support interactive data analysis on extreme-scale computing systems. The successful Scalable Unit (SU) concept—pioneered by LLNL, Los Alamos National Laboratory, Sandia National Laboratories, and Appro—was part of the solution. Aptly named Graph (Figure 2), the Appro Hyper cluster was deployed at LLNL using four Scalable Units, with measured performance of 110 TFLOP/s. As in previous procurements, the cluster was based on the proven combination of Appro AMD Opteron–based servers, an InfiniBand interconnect, and a shared software stack developed by the Tri-Labs Project. The common hardware and software platform eased procurement and installation, allowing clusters to grow in incremental units while reducing costs.
The cluster comprises 564 compute nodes, each with four six-core AMD Opteron processors (2.0 GHz), for a total of 2,256 processors and 13,536 processing cores. To achieve effective visualization performance, each node carries a full complement of 128 GB of RAM, for a total of 73 TB of memory. InfiniBand connects the nodes: the switch fabric uses 48 Flextronics InfiniBand (20 Gbps DDR) edge switches and two Voltaire (20 Gbps DDR) spine switches.
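The headline totals follow directly from the per-node figures; a quick sanity check using only the numbers stated above:

```python
# Sanity-check the Graph cluster totals from its per-node configuration.
nodes = 564
processors_per_node = 4       # sockets per node
cores_per_processor = 6       # six-core AMD Opteron
ram_per_node_gb = 128

processors = nodes * processors_per_node
cores = processors * cores_per_processor
total_ram_gb = nodes * ram_per_node_gb

print(processors)             # 2256 processors
print(cores)                  # 13536 cores
print(total_ram_gb)           # 72192 GB from compute nodes alone,
                              # consistent with the quoted ~73 TB total
```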
The largest machine at LLNL to be supported by the Appro Hyper cluster is Dawn. Delivered to the laboratory in January and February, Dawn (an IBM Blue Gene/P system) will lay the applications foundation for multi-PFLOP/s computing on Sequoia. These supercomputers are capable of running very large suites of complex simulations, and the Appro Hyper cluster is capable of supporting complex visualization and analysis tasks on the data sets they generate.
Interactive visualization and analysis applications, often called post-processing, differ from those run on a typical computational (number-crunching) batch cluster. Interactive exploration of large data sets requires a memory-rich environment, with I/O optimized through dedicated file system connections. Post-processing tasks are heavily I/O bound, so specialized visualization servers that optimize I/O rather than CPU speed are better suited for this work. The Appro Hyper cluster has high-speed connections to the Sun Lustre file system to provide users with zero-copy access to their files. Both memory capacity and I/O rates are key enablers: the cluster must have enough memory to generate visualizations of the largest data sets and sufficient I/O bandwidth for interactive analysis.
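To see why post-processing is I/O bound rather than compute bound, compare the time to read a data set from disk against the time to traverse it in memory. The bandwidth figures below are illustrative assumptions, not measured rates for any LLNL system:

```python
# Why visualization servers optimize I/O: for a simple traversal,
# reading the data dominates processing it.
# All bandwidth figures are hypothetical.
dataset_tb = 10
read_bandwidth_gb_s = 5       # assumed aggregate file-system read rate
mem_bandwidth_gb_s = 500      # assumed aggregate memory bandwidth across nodes

read_seconds = dataset_tb * 1000 / read_bandwidth_gb_s
scan_seconds = dataset_tb * 1000 / mem_bandwidth_gb_s

print(f"read from file system: {read_seconds:.0f} s")  # 2000 s
print(f"scan in memory:        {scan_seconds:.0f} s")  # 20 s
```

Under these assumptions the read takes two orders of magnitude longer than the in-memory pass, which is why fast file-system connections and large memory, rather than raw CPU speed, dominate the design.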
LLNL has procured a new visualization cluster geared specifically toward interactive data analysis and visualization at extreme scale. Visualization resources are an essential and critical part of the HPC environment, providing computational scientists with the ability to interactively explore very large data sets and perform large-scale post-processing and data-reduction operations.
LLNL needed a cluster with far more memory, and faster connections to the Lustre file system, than its existing visualization clusters. The Appro Hyper cluster was quickly procured and brought into service. Users can now see a reduction in the time required for multi-day visualization problems and maintain a high level of interactivity while analyzing the data sets generated by the largest LLNL computers.
Lawrence Livermore National Laboratory
Los Alamos National Laboratory
Los Alamos, NM
National Nuclear Security Administration
Sandia National Laboratories
Santa Clara, CA