A grand title, but with consumers adopting mobile computing devices en masse (phones and tablets alike), high-end compute performance goes largely unnoticed. The focus is on the number of end users, not on the uses of computing that are pushing the boundaries. There are many compute-intensive problems in the engineering and simulation world that need to be solved, and they require the right hardware, software, and infrastructure to be in place.
One area that has received a lot of attention in the last few years is GPGPU (general-purpose computing on graphics processing units), which is the use of the graphics card for more than just rendering real-time visuals on the screen.
Modern GPUs are great when you have large data-parallel computations to execute. In fact, we have seen performance improvements in the parts of engineering software applications that rely on intensive computation, such as DGEMM and other BLAS subroutines or linear algebra system solvers. These tests also showed that, because GPUs can run asynchronously with other processors, GPU computing is a good introduction to heterogeneous computing languages like OpenCL. OpenCL was designed to program heterogeneous systems, which contain several specialized processors running asynchronously from each other. Programming the CPUs and GPUs in a system to reach the best system and software efficiency is what OpenCL is aimed at.
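To make "data parallel" concrete, here is a minimal sketch of the DGEMM operation (C = alpha*A*B + beta*C) in plain Python. It is not how a BLAS library or an OpenCL kernel is implemented; the function names and the thread pool are assumptions chosen for illustration. The point is only that every output element can be computed without reference to any other output element, which is the property a GPU exploits.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def dgemm_element(alpha, a, b, beta, c, i, j):
    """Compute one element of alpha*A*B + beta*C.

    The result depends only on row i of A, column j of B and C[i][j],
    never on another output element, so all elements are independent.
    """
    dot = sum(a[i][k] * b[k][j] for k in range(len(b)))
    return alpha * dot + beta * c[i][j]


def dgemm(alpha, a, b, beta, c, workers=4):
    """Illustrative (hypothetical) DGEMM: one independent task per output element."""
    rows, cols = len(a), len(b[0])
    indices = list(product(range(rows), range(cols)))
    # The thread pool stands in for the many compute units of a GPU.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flat = list(
            pool.map(lambda ij: dgemm_element(alpha, a, b, beta, c, *ij), indices)
        )
    # Reshape the flat list of results back into rows.
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
result = dgemm(2.0, A, B, 0.5, C)
print(result)
```

On a real GPU each of these independent tasks would map to a work item; the Python threads here only illustrate the structure and give no speedup.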
There are some limitations for algorithms that are still partly sequential, for example reductions. Latency is another: when the GPU accesses system memory many times during a computation and the data has a small granularity (i.e., a block size of a few kilobytes) relative to the computation, running many small compute tasks adds a high latency cost. The good news is that Heterogeneous System Architecture (HSA) and APUs solve this issue, because the CPU and GPU are on the same die and access a shared, unified memory space. This eliminates the need to communicate via the PCI Express bus, so latency is minimized.
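The latency cost of small blocks described above can be illustrated with a simple back-of-the-envelope model. This is a sketch under stated assumptions, not a measurement: the fixed per-transfer latency (10 microseconds) and the bus bandwidth (8 GB/s) are hypothetical round numbers, and the function names are invented for this example.

```python
def transfer_time(total_bytes, block_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to move total_bytes over a bus in blocks of block_bytes.

    Each block pays a fixed latency; the payload itself streams at the
    given bandwidth. Both parameters are illustrative assumptions.
    """
    blocks = -(-total_bytes // block_bytes)  # ceiling division
    return blocks * latency_s + total_bytes / bandwidth_bytes_per_s


def latency_share(total_bytes, block_bytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the total transfer time spent on per-block latency."""
    blocks = -(-total_bytes // block_bytes)
    total = transfer_time(total_bytes, block_bytes, latency_s, bandwidth_bytes_per_s)
    return blocks * latency_s / total


LATENCY_S = 10e-6          # assumed fixed cost per transfer
BANDWIDTH = 8e9            # assumed bus bandwidth in bytes per second
TOTAL = 64 * 1024 * 1024   # 64 MiB of data to move

small_blocks = latency_share(TOTAL, 4 * 1024, LATENCY_S, BANDWIDTH)
large_blocks = latency_share(TOTAL, 16 * 1024 * 1024, LATENCY_S, BANDWIDTH)
print(f"4 KiB blocks:  {small_blocks:.1%} of transfer time is latency")
print(f"16 MiB blocks: {large_blocks:.1%} of transfer time is latency")
```

With these assumed figures, the same 64 MiB costs far more to move in 4 KiB pieces than in a few large ones, because the fixed latency is paid thousands of times. A shared memory space removes the per-transfer cost from the picture altogether.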
HSA is an open-standard system architecture that provides a unified view of common computing elements. It allows programmers to write applications that seamlessly integrate CPUs, GPUs and other programmable compute elements while benefiting from the best attributes of each. AMD’s APU combines the processor (CPU) and GPU together onto a single piece of silicon.
You can get major gains from using pure GPGPU today as long as the code is optimized for parallel computing. Most of the early gains have been in rendering technology, finite element analysis codes, and ray-tracing technology. With HSA and APUs there are further potential gains from changing the code to take advantage of the shared memory, but the core architecture is already there to run existing x86 programs today.
This is very important, because you can run your code now and modify only sections of it, when your resources allow, to take advantage of the enhanced memory access. Undertaking any major code change in modern computer-aided design and engineering software is a complicated process. With HSA and APUs, you can focus in the short term on just the areas of the code that offer the most benefit.
Dedicated discrete graphics cards come with a fixed amount of memory. Many analysis and simulation software programs rely on large memory footprints to process masses of data at the same time. The current crop of professional AMD FirePro cards, for example, comes with 6GB of GDDR5 memory, which, although very fast, has a physical limit of 6GB. With HSA and APUs, the shared memory is user configurable. In the near future this would allow very large amounts of physical memory to be accessed not only by the CPU, but also by the compute engines of a GPU!
This could change the face of intensive computing forever. It would allow the GPU to become a complete and dedicated co-processor that can be used anytime and, in many cases, with better performance and energy efficiency. Combine the ease of adopting HSA and APUs, which run current x86 code, with the massive physical memory potentially available without any time lost swapping data, and the future of computing looks very interesting.
Antoine Reymond is senior strategic alliances manager at AMD. Comment on this article via firstname.lastname@example.org.