Server Clusters: Flexible Performance

By Peter Varhol

Intel Cluster Ready systems such as the SGI Altix enable engineering teams to configure cluster servers, interconnects, and software based on a common specification.

Engineers have long faced a tradeoff. The more comprehensive the analysis and simulation they perform on a design, the better the resulting product. Yet getting the most comprehensive and detailed results has meant turning to mainframes and supercomputers, the only individual systems powerful enough to turn around high-fidelity analyses in a reasonable period of time.

The cost of the most powerful computers is enormous, running to tens of millions of dollars. Even renting time on a shared supercomputer can cost a hundred thousand dollars or more. As a result, in the vast majority of design initiatives, engineers make do with less detailed analysis work.

Today, the server cluster is rapidly emerging to fill the gap between the level of detail achievable on an individual workstation and that achievable on a supercomputer. Clusters usually consist of industry-standard server systems with lots of memory and processor cores, linked together with a high-performance interconnect such as InfiniBand.

Clusters are a classic case of the whole being more than the sum of its parts. High-end analysis and simulation software, such as the ANSYS product line, enables problems to be broken apart and run not only on multiple processor cores, but also on multiple processors, even across different physical servers.

The result is that properly designed and configured clusters can cost a fraction of the price of a supercomputer, and perform engineering computations almost as fast. Further, because they use standard hardware and operating systems, corporate IT can use them for general-purpose computing tasks when not running engineering jobs.

Cores Make the Cluster
A good starting place to talk about server clusters is the list of the 500 fastest supercomputers, found at top500.org. The vast majority of the systems listed there are in fact clusters, not traditional supercomputers. In many cases, they are custom-built and more expensive than off-the-shelf systems, but they demonstrate the power of inexpensive industry-standard processors.

A key differentiator that reduces the cost of servers today is multiple processors and processor cores. Individual servers have had multiple processors for a number of years, but most general-purpose applications can’t effectively take advantage of more than a single processor at a time.

Cores are more or less full processors, though several of them share a single processor package. Each core has only one execution pipeline, a sequence of steps through which an instruction or set of instructions is executed. That means each core can take a piece of an engineering computation and execute it independently of the other cores. Common analysis and simulation problems often involve running the same computation many times on different data, making these problems readily adaptable to executing on multiprocessor and multi-core systems.
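The "same computation, different data" pattern can be sketched in a few lines of Python. This is a minimal illustration, not how any particular analysis package works. The function names, beam properties, and load values are placeholder assumptions, and the formula is the textbook cantilever tip deflection, P·L³/(3·E·I). The standard-library process pool spreads the cases over the cores of a single machine only; spanning several servers would need a cluster-aware layer on top.

```python
from concurrent.futures import ProcessPoolExecutor

# Hypothetical beam properties, chosen only for illustration.
LENGTH_M = 2.0   # beam length (m)
E_PA = 200e9     # Young's modulus (Pa), roughly that of steel
I_M4 = 8e-6      # second moment of area (m^4)


def tip_deflection(load_n):
    """Tip deflection of a cantilever under an end load: P*L^3 / (3*E*I)."""
    return load_n * LENGTH_M**3 / (3 * E_PA * I_M4)


def run_sweep(loads, workers=4):
    """Evaluate the same formula for every load case in a pool of processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with `loads`.
        return list(pool.map(tip_deflection, loads))
```

Calling `run_sweep([1000.0, 2000.0, 3000.0])` from a script (under an `if __name__ == "__main__":` guard, which process pools need on some platforms) returns one deflection per load, in input order. Because no case depends on another, the cases can be handed to whichever core is free.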

Fast connectivity between servers, essential for cluster performance, is often provided by InfiniBand components, such as this switch from Mellanox.

Hyperthreading adds still more to the equation. A hyperthreaded core duplicates some parts of the execution pipeline, though not all of it. This allows each core to appear as two logical processors to the operating system, enabling the operating system to schedule two threads or processes on it simultaneously.

The result is that each core can hold multiple thread states at one time. It can’t be actively executing multiple threads, because there’s still only one pipeline, but it can be holding threads that are waiting partway through the execution process. Processors could always do this as part of a context switch, but the additional registers mean the core can hold the entire thread state on-chip while another thread is using the pipeline.

Cluster configurations with high-end server processors are available from the vast majority of system vendors, including those who use industry-standard Intel and AMD processors. In addition, competitive clusters with other common processor families can be found from IBM and Oracle.

Interconnect Speed Matters
The key to running engineering computations across servers is extremely fast connectivity between those servers. That speed makes it possible for processors on different computers to synchronize the execution of an analysis problem without significantly slowing down its completion. The net effect is that clusters can achieve close to the full benefit of all of the processors and cores without the additional expense of putting them in a single custom computer.

The bad news is that most common networks don’t have anywhere near the speed to make this happen, even over short distances. Ultimately, the faster its connectivity, and especially the lower its latency, the better the cluster will perform.
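A rough way to see why latency matters so much is the common first-order model in which each message costs a fixed latency plus its size divided by the link bandwidth. The sketch below applies that model. The function names and both sets of link figures are illustrative placeholders, not measurements of any real interconnect.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order cost of one message: fixed latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def sync_overhead(messages, message_bytes, latency_s, bandwidth_bytes_per_s):
    """Total time spent on `messages` equal-sized exchanges, sent one by one."""
    return messages * transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s)


if __name__ == "__main__":
    # Placeholder link parameters, assumed for illustration only.
    links = {
        "low-latency link": (2e-6, 4e9),      # 2 microseconds, 4 GB/s
        "commodity link": (50e-6, 0.125e9),   # 50 microseconds, 1 Gbit/s
    }
    # Many small messages, the case where latency dominates the cost.
    for name, (latency, bandwidth) in links.items():
        total = sync_overhead(100_000, 1024, latency, bandwidth)
        print(f"{name}: {total:.4f} s for 100,000 messages of 1 KiB")
```

For small messages the size term is tiny, so the total is governed almost entirely by latency. That is why a link with high bandwidth but poor latency can still hold back a tightly synchronized analysis.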

The primary technologies used for cluster interconnects are InfiniBand and Fibre Channel, with a smattering of other choices such as Gigabit Ethernet. Of these, InfiniBand is the most common, primarily because it offers among the lowest latency, or response time, of any interconnect. Many of the fastest clusters, such as those on the TOP500 list, use InfiniBand components from the likes of Mellanox or QLogic. Having fast interconnectivity can make a big difference in the performance of analysis software, and the overall value of the cluster.

Software Provides the Edge
None of this hardware would matter without engineering software that can seamlessly break up an engineering problem into multiple parts that different processors and processor cores can execute. While writing applications that can run on multiple processors and cores is notoriously difficult, the performance advantages of doing so can more than justify the investment in more powerful systems and clusters with multiple processor cores.

While much of the high-end analysis software, such as that from ANSYS and The MathWorks, has been written to run in parallel on these systems, there are still many applications that don’t have that ability. In addition, the completeness of that effort varies from vendor to vendor, anywhere from a few key algorithms parallelized to the entire application.

This is why benchmarking is important. In general, the better the parallelization, the faster the software on clustered systems, so engineers should check the performance on their own cluster, or a similar cluster. While competing software packages may note that they enable parallel execution, one may be significantly better for your work than another. Get published data, run your own tests, or ask other engineers.
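When running your own tests, two simple figures make results comparable across packages and core counts: speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by the number of cores). The helper names below are hypothetical, and `time_run` assumes the analysis can be launched as a Python callable; in practice the timings may come from a solver's own log instead.

```python
import time


def time_run(job):
    """Return the wall-clock seconds taken by one call to `job()`."""
    start = time.perf_counter()
    job()
    return time.perf_counter() - start


def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run finished."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, cores):
    """Fraction of ideal linear scaling achieved on `cores` cores (1.0 is ideal)."""
    return speedup(serial_seconds, parallel_seconds) / cores
```

An efficiency that drops sharply as cores are added suggests the package parallelizes only part of the work, or that the interconnect is limiting it, and either finding is worth knowing before buying more hardware.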

Many engineering groups have also developed their own analysis software in-house, customized for their type of projects. If the expertise exists in-house to parallelize that code, it should also benefit from running on a cluster. Even changing a few key algorithms may dramatically decrease execution time, depending on the problem, so even those with custom software can take advantage of clusters.
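How much a partial parallelization can help is commonly estimated with Amdahl's law, which bounds the overall speedup by the fraction of run time that still executes serially. The sketch below is an idealized estimate that ignores communication overhead; the function name and the sample fractions are illustrative, not figures from any real code.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when `parallel_fraction` of the run time scales over `cores`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative fractions of run time spent in the parallelized algorithms.
    for fraction in (0.5, 0.9, 0.99):
        print(f"{fraction:.0%} parallel on 64 cores: "
              f"{amdahl_speedup(fraction, 64):.1f}x")
```

The estimate cuts both ways: parallelizing the few routines that dominate run time can pay off handsomely, while whatever remains serial sets a ceiling that more cores cannot lift.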

Pre-Configured Clusters Ease the Transition
Cluster computing was given a boost with Intel Cluster Ready, a specification from the chip vendor that enables system buyers to be assured that cluster components, including servers and software, can reliably work together. Platform providers and system integrators use the Intel Cluster Ready architecture and specification to develop interoperable clusters that are straightforward for engineering groups and their IT shops to deploy and manage.

Because many high-end servers use Intel Xeon processors, the chip maker has an interest in ensuring that vendor hardware scales up well, and works with both components and software. Developed with hardware and software vendors, Intel Cluster Ready lets engineering groups match HPC applications to today’s leading platforms and components. This includes servers from Appro, Dell, SGI, or Super Micro, as well as InfiniBand interfaces and engineering analysis software.

If your engineering team needs to take advantage of the power of a cluster to deliver relatively low-cost HPC, there is a wide variety of cluster systems to choose from. Many run Intel processors, but there are also IBM servers using POWER processors and Oracle servers using Sun SPARC processors. IBM and Oracle clusters often have better performance characteristics, but sometimes have less software support.

Lastly, choosing the right application software is critical. You may already have a preferred vendor for engineering analysis and simulation, or you may be looking for solutions that can take advantage of a cluster. In either case, make comparisons of benchmark results. Use published benchmarks if they are available, and if possible run some of your own analyses.

If you need more than just a workstation for the best analysis and simulation, a server cluster can make a lot of sense. Because clusters are typically built from standard servers, an IT department can help configure and manage them if needed. Ultimately, server clusters for fast, high-fidelity engineering analyses can pay for themselves by delivering better product designs more quickly, without the added expense of supercomputing services.

More Info:
Intel

AMD

IBM

Oracle

Mellanox

QLogic

ANSYS

MathWorks

Appro

Dell

SGI

Super Micro


Contributing Editor Peter Varhol covers the HPC and IT beat for Desktop Engineering. His expertise is software development, math systems, and systems management. You can reach him at DE-Editors@deskeng.com.
