Almost since its inception, one of the major hallmarks of desktop computing has been the quest for ever-faster, smoother, and more realistic graphics processing and display. Obsolete names such as Hercules and Number Nine are part of the history and folklore of this quest for faster and more realistic graphics.
AMD’s ATI Radeon GPUs support the company’s Stream architecture, which enables parallel computation for high-performance computing.
Today, fast graphics are primarily the province of NVIDIA (Santa Clara, CA) and ATI as part of AMD (Sunnyvale, CA). Any serious gamer, design engineer, or simulation specialist knows which graphics cards are best for their applications. But a funny thing happened in the drive to build better and faster graphics processors. These same processors became good at other types of processing, including to some extent general-purpose processing—that is, the ability to execute any application built for them.
But where they really excel is in the mathematics underlying graphics: floating-point processing. The same hardware accelerates any computation-intensive numerical work, including graphics processing and rendering, structural analysis, fluid dynamics, and simulation, as well as fields outside engineering such as financial modeling and chemical analysis.
How much of a speedup are we talking about? It varies with the application, but on a parallel graphics processing unit (GPU) system running a highly parallel set of computations, it is possible to see an improvement of a hundred times over a single Intel processor. For general-purpose computations, the overall performance increase will be less impressive.
It’s important to remember, however, that GPU systems are inexpensive compared to similar traditional CPU systems. This is perhaps best brought into perspective by the highly parallel Tesla systems introduced more than a year ago. The 960-processor configuration was priced at just under $10,000. The system is rated at 36 TeraFLOPS, making it theoretically possible to solve all but the most computationally intensive problems. It is, in effect, a supercomputer for many types of applications.
There is a catch, of course. The GPU uses a different instruction set than standard Intel processors, so applications compiled for the Intel instruction set won’t run on any GPU. To take advantage of GPU computing, your software vendors must build their applications for that processor, or you need access to your own source code so you can port it yourself.
NVIDIA’s Tesla system family provides for multiple GPUs to work with a single industry-standard CPU in a single computer, making it possible to run Windows or Linux while taking advantage of parallel GPU operation.
Commercial GPUs for General-Purpose Computing
Probably the best-known GPU family geared toward general-purpose applications is from NVIDIA. For parallel operations, NVIDIA supports the CUDA (compute unified device architecture) parallel-computing architecture. CUDA gives developers access to the native instruction set and memory of the parallel computational elements in CUDA GPUs. Using CUDA, the latest NVIDIA GPUs effectively become open architectures like CPUs. Unlike CPUs, however, GPUs have a parallel multi-core architecture whose many cores can together run thousands of threads simultaneously.
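CUDA kernels themselves are written in C and need NVIDIA hardware and tools to run, but the execution model is easy to picture: one lightweight thread per data element, each thread deriving its own index. Purely as an illustration of that one-thread-per-element idea, here is a sketch in portable Python (the `saxpy` function and its thread pool are stand-ins for a real kernel, not NVIDIA’s API):

```python
from concurrent.futures import ThreadPoolExecutor

def saxpy(a, x, y):
    """Compute a*x + y elementwise, one task per element.

    In a real CUDA kernel, each GPU thread would derive its index i from
    blockIdx.x * blockDim.x + threadIdx.x; here a thread pool plays that
    role purely for illustration.
    """
    out = [0.0] * len(x)

    def kernel(i):
        # The per-"thread" body: each logical thread touches one element.
        out[i] = a * x[i] + y[i]

    with ThreadPoolExecutor() as pool:
        # Launch one logical thread per element, as on the GPU.
        list(pool.map(kernel, range(len(x))))
    return out

print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 10.0, 10.0]))  # [12.0, 14.0, 16.0]
```

Because each element is independent, the GPU can schedule the threads in any order across its cores, which is exactly what makes this style of code scale.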
Last fall, NVIDIA introduced its new GPU architecture, code-named Fermi, supporting up to 512 CUDA cores and implementing the IEEE 754-2008 floating-point standard. Fermi, which is scheduled to be available in production quantities shortly after this is printed, makes it easier to port existing applications, both commercial and custom, as it natively supports C++ as well as several other languages. It also adds on-chip hardware error checking, something that traditional CPUs have provided for a long time.
For developers, NVIDIA also announced a development tool that operates as a plug-in to the ubiquitous Microsoft Visual Studio. This tool, called Nexus, is unique in that it enables developers to trace and debug application code from the CPU running on Windows into the GPU, including parallel applications on the GPU, and back to the CPU. While writing parallel-execution GPU code that interacts with both Windows and the Intel CPU remains a significant technical challenge, Nexus goes a long way toward easing the process.
AMD also supports general-purpose computing on its GPU families, the ATI FireStream and Radeon series. ATI’s technology for using the Radeon for computation is called Stream. ATI Stream is a set of hardware and software technologies that enable AMD GPUs, working in concert with the computer system’s CPU, to accelerate many applications beyond graphics.
Most GPU systems have a single industry-standard CPU, usually running Windows or Linux. An application written for a GPU typically has a front end running on one of these operating systems. When a computation is required, the relevant data is passed off to executable code loaded onto the GPUs. When that execution has been completed, the results are returned to the CPU and displayed.
Of course, if displaying those results involves rendering highly detailed graphics, that code may also continue through the graphics processor for rendering on the screen. Depending on the architecture of the system, the rendering can be handled by the parallel GPUs or by a separate GPU-powered graphics card.
If you have your own custom analysis code, you would write a small program running on the CPU that would kick off the code on the GPU, and receive its results for display. Mostly these would be text results, unless your programmers have built a complete front end for your code.
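The round trip just described (CPU front end hands data to the GPU, waits, then displays the returned results) can be sketched generically. In the sketch below, `offload` is a hypothetical stand-in for whatever kernel-launch call your GPU toolkit actually provides; it is not a real GPU API:

```python
def offload(data):
    """Stand-in for a GPU kernel launch: in a real system this would copy
    `data` to device memory, run the kernel there, and copy results back."""
    return [v * v for v in data]  # pretend the "kernel" squares each value

def main():
    # 1. The CPU front end prepares the input data.
    samples = [1.0, 2.0, 3.0, 4.0]
    # 2. Hand the data off to the (simulated) GPU and wait for results.
    results = offload(samples)
    # 3. Back on the CPU, display the results as plain text.
    for s, r in zip(samples, results):
        print(f"{s:.1f} -> {r:.1f}")

main()
```

The point of the pattern is the division of labor: the host program does orchestration and display, while all of the heavy arithmetic lives inside the offloaded routine.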
Parallel All the Way
Many engineering applications, especially those that do analysis or simulation, spend much of their time performing the same sets of computations on different sets of data. Such applications can be broken into parts that run on separate processors, with the partial results combined at the end to produce the final result. This is broadly known as parallel computation, and it is possible whenever parts of an application have no dependencies on one another.
The GPU is especially good at parallel computations. In addition to architectures such as CUDA that support large numbers of processors, individual GPUs can also support large numbers of independently executing threads. This means that computations can be done more efficiently, improving overall performance still more.
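That split-compute-combine pattern can be made concrete with a minimal sketch (in Python for portability; nothing here is GPU-specific, and the function names are illustrative only):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # The same computation, applied independently to one chunk of data.
    return sum(v * v for v in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Split: divide the data into independent chunks, roughly one per worker.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Compute: run the identical function on each chunk concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(partial_sum, chunks)
    # Combine: merge the partial results into the final answer.
    return sum(partials)

print(parallel_sum_of_squares(list(range(10))))  # 285
```

The combine step is safe only because no chunk depends on another chunk’s result, which is precisely the independence condition described above.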
Here, too, there is a catch. Applications can’t break themselves into parts and reassemble those parts automatically; they have to be programmed to do so. And such programming is difficult and not well understood by most software developers. While new techniques are being developed and developers are acquiring new skills, this will remain the biggest obstacle to taking full advantage of GPUs.
If you have your own source code, ideally it is in the C programming language (many C++ preprocessors convert that language to C before compilation, making C++ also feasible). However, the number of supported languages is expanding. The Portland Group of Portland, OR, recently released Fortran compilers for GPUs, making it possible for engineering groups with their own high-performance analysis code to easily convert it to run on GPUs. And there are some Java bindings available for specific libraries, enabling Java developers to use C interfaces to execute on the NVIDIA processors. Also, recent improvements to the processor architecture make porting existing code easier.
If you are dependent upon commercial software vendors for design, analysis, rendering, and simulation, find out what those vendors’ plans are for supporting GPUs. No doubt many are at least considering it, but they need to hear from their users. Today, at least Autodesk’s AutoCAD is available for the NVIDIA GPU, as is the iray rendering solution from mental images of Berlin, Germany, and Adobe’s Flash graphics development toolkit.
Get Ready for GPU Computing
You should be looking into GPU computing if your important software vendors offer versions that support it, or if you have source code that you are willing to parallelize and rebuild for GPUs. Without one or both, you can’t run anything on the GPU.
Fortunately, both paths are becoming easier. Even so, if you are porting your own code to a GPU, you still have to manually identify opportunities for parallel computation and insert the appropriate code into your program. This is the difficult part, and it requires skilled programmers.
It’s important to note that general-purpose GPU computing is in its infancy. There’s no guarantee that it will move into the mainstream; many promising computing technologies remain stuck in a niche rather than breaking out. Intel, for example, is attempting to prove that its CPU standard, implemented in the upcoming Larrabee architecture, is sufficient for the types of engineering and scientific applications currently being targeted by GPUs. (Intel is having technical difficulties finalizing Larrabee chips, but is committed to delivering processors with multiple pipelines for graphics and floating point applications.)
But the cost and computation advantages, especially for heavy mathematics or graphical applications, are likely to prove compelling. And with an increasing number of commercial software vendors either supporting the popular GPUs, or announcing future support, it is definitely on the upswing. Many engineering applications can use this computational boost to make you more productive on the desktop.
Contributing Editor Peter Varhol covers the HPC and IT beat for DE. His expertise is software development, math systems, and systems management. You can reach him at DE-Editors@deskeng.com.