NVIDIA’s Tesla GPU processor architecture has the capability to quickly execute computations that are important in engineering analysis and simulation.
Graphics processing units (GPUs) have for many years powered the rendering of images and motion on computer screens. GPUs are now powerful enough to do more than just move images across the screen. They are capable of performing the high-end computations that are the staple of many engineering activities.
Benchmarks that focus on floating point arithmetic, those most often used in these engineering computations, show that GPUs can perform such computations much faster than the traditional central processing units (CPUs) used in today’s workstations—sometimes as much as 20 times faster, depending on the computation.
But the performance advantage in these benchmarks doesn’t automatically make GPUs a slam dunk for running engineering applications. Comparing CPUs with GPUs is like comparing apples with oranges.
GPU Challenges—and Rewards
The GPU remains a specialized processor, and its performance in graphics computation belies a host of difficulties in performing true general-purpose computing. Any software must be recompiled to run on these processors, the programming tools are rudimentary, and the supported programming languages and features are limited.
These difficulties mean applications are limited to those that commercial software vendors develop and make available to engineering customers or, in some cases, to code that the engineering firm owns and ports to the GPU itself. Vendors have to perceive that a market for a GPU version of their software exists, while engineering groups have to determine that it will pay for them to make the investment in hardware, software and expertise.
GPU Systems Available Today
Commercial GPU-based systems are becoming increasingly common. NVIDIA, in addition to providing processors to third parties, also builds its own systems and clusters under the Tesla brand. These include the Tesla Personal Supercomputer, which has up to 448 cores in a multiprocessor configuration, with up to 6GB of memory per processor, in a deskside configuration for under $10,000. The cluster systems include either straight GPU or GPU-CPU systems in 1U configurations for the data center. A 1U NVIDIA unit with a quad processor configuration can do four teraflops of single precision operations, and about 340 gigaflops of double precision.
In addition, third-party systems are available from engineering system vendors such as Appro, Microway, Supermicro and Tyan. These systems typically provide multiple processors and cores, and deliver high levels of computational power for specific uses.
GPU computing is a long way from the industry-standard Intel and AMD CPUs, which power the majority of workstations (and even high-end supercomputers). Changing that would be an expensive and time-consuming affair for software vendors.
Nevertheless, the cost and performance of GPUs can make a difference in how design engineering is done. Imagine being able to run an analysis on your design 20 times faster than you can today, for example.
Benchmarks Are Not Real Life
But it’s not a simple matter. First of all, “20 times faster” is highly problematic: Just because some computations can be sped up by that much doesn’t mean that the entire analysis would be. In fact, the overall analysis could even be slower than on a CPU, if the CPU can compute other parts of the analysis faster.
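The gap between a kernel speedup and a whole-analysis speedup can be estimated with a simple Amdahl-style calculation. The Python sketch below is illustrative only: the function name, the fractions, and the 1.3x slowdown on the remaining work are assumptions chosen to show the shape of the problem, not measurements from any real application.

```python
def overall_speedup(accelerated_fraction, kernel_speedup, remainder_slowdown=1.0):
    """Estimate whole-run speedup when only part of the work moves to the GPU.

    accelerated_fraction: share of the original CPU run time spent in the
        part that is ported to the GPU (between 0 and 1).
    kernel_speedup: how many times faster that part runs on the GPU.
    remainder_slowdown: factor by which the rest of the run gets slower
        on the GPU-based setup (1.0 means unchanged). Assumed parameter.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0 or remainder_slowdown <= 0:
        raise ValueError("speedup and slowdown factors must be positive")
    # New run time, expressed as a fraction of the original run time.
    new_time = (accelerated_fraction / kernel_speedup
                + (1.0 - accelerated_fraction) * remainder_slowdown)
    return 1.0 / new_time


# Half of the run is accelerated 20x: the whole run is not even 2x faster.
print(f"{overall_speedup(0.5, 20):.2f}")       # 1.90
# Even with 90% of the run accelerated, the result is far short of 20x.
print(f"{overall_speedup(0.9, 20):.2f}")       # 6.90
# Only 20% accelerated, and the rest runs 1.3x slower: a net loss.
print(f"{overall_speedup(0.2, 20, 1.3):.2f}")  # 0.95
```

The last case is the one the paragraph above warns about: a large speedup on a small part of the job can be wiped out by a modest slowdown everywhere else.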
Second, it would be a significant software development effort to run even fairly common code on a GPU. Some types of code may require modification, while other types may not be able to run on the GPU at all. Many engineering software vendors aren’t yet convinced that the effort can pay for itself and make a profit.
So it turns out that you still need the traditional CPU after all. You need it because that is where the vast majority of engineering and office software runs, because that is where the primary software development skill set resides, and because its all-around performance is at least good enough to keep it in that role for the foreseeable future.
Intel hasn’t been sitting still as GPU performance has increased. Up until the beginning of this year, the company had been working on its own multi-core processor, codenamed Larrabee. While it ultimately canceled the initial release of a Larrabee processor, the technology still exists, and will likely find its way into either an Intel-designed GPU or a hybrid CPU.
Such technology may ultimately provide the best of both worlds: compatibility with most applications, and high performance on engineering computations.
A Future with Both Processors
To their credit, NVIDIA and AMD are expanding both the sophistication of their processors and the software development tools for developing, porting, and debugging GPU code. NVIDIA has an intriguing software tool called Nexus that should go a long way toward helping software developers to trace and debug application code from the CPU running on Windows into the GPU, including parallel applications on the GPU, and back to the CPU. These enhancements mean it will be easier to get existing software running on GPUs, although it will still require a software development effort.
A ‘Jacket’ that Fits
The lack of engineering applications that run on the GPU is a problem that isn’t going away soon. Still, there may be an easier way of getting code to run on GPUs.
A startup company called AccelerEyes is working to ease the burden of moving code over to GPUs with a product called Jacket. It has started doing so with MATLAB, the special-purpose language from The MathWorks used by scientists and engineers.
Here’s how it works: Engineers examine their code and tag data structures that might execute more quickly on a GPU. Jacket takes those tags and automatically generates GPU-executable code for those data structures. When functions use those data structures, it compiles the functions to GPU code and fetches the data into GPU memory space. When the computation is complete, the data is returned to the CPU space.
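The tag-and-dispatch pattern described above can be sketched in a few lines of plain Python. This is not Jacket’s API or MATLAB code: the names FakeGpu, Tagged, and evaluate are hypothetical, and the “GPU” here is only a simulated separate memory space that logs each transfer so the round trip is visible.

```python
class FakeGpu:
    """Stand-in for a GPU: a separate memory space plus a transfer log."""

    def __init__(self):
        self.memory = {}
        self.log = []

    def upload(self, name, values):
        # Copy the data from CPU space into the simulated GPU memory.
        self.memory[name] = list(values)
        self.log.append(f"host->device {name}")

    def run(self, func, name):
        # Apply the function element by element, as a GPU kernel would.
        self.memory[name] = [func(v) for v in self.memory[name]]
        self.log.append(f"compute {name}")

    def download(self, name):
        # Copy the results back into CPU space.
        self.log.append(f"device->host {name}")
        return list(self.memory[name])


class Tagged:
    """Hypothetical tag marking a data set as a candidate for the GPU."""

    def __init__(self, name, values):
        self.name = name
        self.values = values


def evaluate(func, data, gpu):
    """Tagged data takes the device path; untagged data stays on the CPU."""
    if isinstance(data, Tagged):
        gpu.upload(data.name, data.values)
        gpu.run(func, data.name)
        return gpu.download(data.name)
    return [func(v) for v in data]


gpu = FakeGpu()
loads = Tagged("loads", [1.0, 2.0, 3.0])
result = evaluate(lambda x: x * x, loads, gpu)
untagged = evaluate(lambda x: x * x, [4.0], gpu)

print(result)   # [1.0, 4.0, 9.0]
print(gpu.log)  # ['host->device loads', 'compute loads', 'device->host loads']
```

The point of the sketch is that the engineer’s only change is the tag on the data. The function being applied is the same in both calls, and only the tagged data triggers the copy-in, compute, copy-out sequence.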
Because most engineering groups own their own MATLAB source code, this can be a relatively straightforward approach to using GPUs.
No discussion of GPU computing is complete without mention of NVIDIA’s Compute Unified Device Architecture (CUDA) parallel computing architecture. CUDA is a key to getting high performance out of certain computations that are important in engineering analysis and simulation.
Many systems using GPUs and CUDA have a single industry-standard CPU, usually running Windows or Linux. An application written for a GPU typically has a front end running on one of these operating systems. When a computation is required, the relevant data is passed off to executable code loaded onto the GPUs. When execution is complete, the results are returned to the CPU and displayed.
However, none of the advances and excitement surrounding GPUs means that industry-standard CPU systems are slacking off in running engineering applications. Intel and AMD processors are used in about 80% of the Top 500 supercomputers, and Xeon processors with four cores and two threads per core were released this year.
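Because every offloaded computation involves a copy to the GPU and a copy back, the round trip has a cost of its own. The Python sketch below is a rough timing model of that hand-off, not CUDA code. The function name, the 5 GB/s transfer rate, the 20x speedup, and the job sizes are all assumed values for illustration.

```python
def offload_time(n_bytes, cpu_seconds, gpu_speedup, bus_bytes_per_second):
    """Estimate the time for one offloaded computation.

    Models the three steps described above: copy data to the GPU,
    compute there, and copy the results back to the CPU.
    Assumes the results are the same size as the inputs.
    """
    transfer = 2 * n_bytes / bus_bytes_per_second  # copy in plus copy out
    compute = cpu_seconds / gpu_speedup
    return transfer + compute


BUS = 5e9  # assumed CPU-to-GPU transfer rate, in bytes per second

# Small job: 8 MB of data that the CPU alone handles in one millisecond.
small_pays_off = offload_time(8e6, 0.001, 20, BUS) < 0.001
# Large job: 80 MB of data that takes the CPU ten seconds.
large_pays_off = offload_time(80e6, 10.0, 20, BUS) < 10.0

print(small_pays_off)  # False
print(large_pays_off)  # True
```

Under these assumptions the small job loses, because the transfer alone takes longer than the CPU needs for the whole computation, while the large job comes out well ahead.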
In addition, if Intel incorporates Larrabee features into future processors, it could minimize the performance advantage of GPUs for engineering applications.
With or without Larrabee, industry-standard CPUs continue to advance. Moreover, the majority of commercial software development targets these processor families.
Because of these caveats, it’s unlikely that any engineering team is going to be able to work strictly on GPUs in the foreseeable future. However, that doesn’t mean that systems using GPUs can’t be useful for more than rendering images on the display. The performance in certain computations involving analysis and simulation can make the difference between a design that is good enough and one that is optimal.
An ideal configuration is one with one or more CPUs and a set of GPUs that use CUDA or a similar parallel computation architecture. All support applications, such as email, web browsing, and word processing, use the CPU. And with tools such as AccelerEyes Jacket (see “A ‘Jacket’ that Fits”) and NVIDIA Nexus, engineering software will eventually take advantage of both to speed up complex computations.
Contributing Editor Peter Varhol covers the HPC and IT beat for DE. His expertise is software development, math systems, and systems management. You can reach him at DE-Editors@deskeng.com.