NVIDIA has announced CUDA 6, the latest version of its parallel computing platform and programming model. According to the company, the release lets developers accelerate applications by up to eight times simply by replacing existing CPU-based libraries with GPU-accelerated ones.
Unified memory simplifies programming by letting applications access CPU and GPU memory without manually copying data from one to the other, and it makes it easier to add GPU acceleration to a wide range of programming languages. Drop-in libraries speed up applications' BLAS and FFTW calculations by up to eight times: the existing CPU libraries are swapped for GPU-accelerated equivalents.
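The sketch below illustrates the unified memory idea. It assumes a CUDA 6 or later toolkit and a GPU that supports managed memory. A single `cudaMallocManaged` allocation is written by the CPU, modified by a GPU kernel, and read back by the CPU, with no `cudaMalloc`/`cudaMemcpy` pair anywhere. The kernel `scale` and the helper `scale_and_verify` are hypothetical names chosen for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: multiply each of the n elements by `factor`, one thread per element.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;  // guard: the last block may be only partly used
}

// Hypothetical helper: fills a managed buffer on the CPU, scales it on the
// GPU, then checks every element on the CPU. No explicit copies anywhere.
bool scale_and_verify(int n, float init, float factor) {
    float *data = nullptr;
    // One allocation addressable from both host code and device code.
    if (cudaMallocManaged(&data, n * sizeof(float)) != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) data[i] = init;      // CPU writes

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    scale<<<blocks, threads>>>(data, n, factor);     // GPU reads and writes

    // Check the launch, then wait for the kernel before the CPU reads again.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    bool ok = (err == cudaSuccess);
    const float expected = init * factor;
    for (int i = 0; ok && i < n; ++i) ok = (data[i] == expected);  // CPU reads

    cudaFree(data);
    return ok;
}

int main() {
    bool ok = scale_and_verify(1 << 20, 1.0f, 2.0f);
    std::printf("unified memory round trip: %s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

The `cudaDeviceSynchronize()` call is not optional. On GPUs of the CUDA 6 generation, the CPU must not touch managed memory while a kernel that may use it is still running, so the host waits for the kernel before reading the results.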
Redesigned BLAS and FFT GPU libraries automatically scale across up to eight GPUs in a single node, delivering more than nine teraflops per node and supporting larger workloads (up to 512 GB). Multi-GPU scaling can also be used with the new BLAS drop-in library.
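Here is a minimal sketch of what multi-GPU BLAS scaling looks like from the caller's side. It assumes that the redesigned BLAS library described here corresponds to NVIDIA's cuBLAS-XT host interface (`cublasXt.h`, linked with `-lcublas`). The matrices live in ordinary host memory, and the library tiles the matrix multiply and spreads the tiles across the selected GPUs. `multi_gpu_dgemm` is a hypothetical helper. Using device IDs 0 through `num_gpus - 1` is also an assumption about how the node is configured.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublasXt.h>

// Hypothetical helper: C = A * B for n x n column-major matrices held in
// ordinary host memory. cuBLAS-XT splits the work across devices
// 0..num_gpus-1. Returns false if any cuBLAS-XT call fails.
bool multi_gpu_dgemm(int n, const std::vector<double> &A,
                     const std::vector<double> &B, std::vector<double> &C,
                     int num_gpus) {
    cublasXtHandle_t handle;
    if (cublasXtCreate(&handle) != CUBLAS_STATUS_SUCCESS) return false;

    // Select which GPUs the library may use.
    std::vector<int> devices(num_gpus);
    for (int d = 0; d < num_gpus; ++d) devices[d] = d;
    bool ok = cublasXtDeviceSelect(handle, num_gpus, devices.data()) ==
              CUBLAS_STATUS_SUCCESS;

    // Standard GEMM semantics: C = alpha * A * B + beta * C.
    const double alpha = 1.0, beta = 0.0;
    if (ok) {
        ok = cublasXtDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                           &alpha, A.data(), n, B.data(), n,
                           &beta, C.data(), n) == CUBLAS_STATUS_SUCCESS;
    }

    cublasXtDestroy(handle);
    return ok;
}

int main() {
    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess || device_count == 0) {
        std::fprintf(stderr, "no CUDA devices found\n");
        return 1;
    }
    // Use every visible GPU, capped at the eight per node cited above.
    const int num_gpus = device_count < 8 ? device_count : 8;

    const int n = 2048;
    std::vector<double> A(n * n, 1.0), B(n * n, 2.0), C(n * n, 0.0);
    bool ok = multi_gpu_dgemm(n, A, B, C, num_gpus);

    // With these inputs, every entry of C should equal n * (1.0 * 2.0).
    std::printf("%s on %d GPU(s), C[0] = %f\n",
                ok ? "dgemm ok" : "dgemm failed", num_gpus, C[0]);
    return ok ? 0 : 1;
}
```

The calling code is identical whether `num_gpus` is 1 or 8. The distribution of work across devices happens inside the library, which is the sense in which the scaling is automatic.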
"By automatically handling data management, unified memory enables us to quickly prototype kernels running on the GPU and reduces code complexity, cutting development time by up to 50 percent," said Rob Hoekstra, manager of the Scalable Algorithms Department at Sandia National Laboratories. "Having this capability will be very useful as we determine future programming model choices and port more sophisticated, larger codes to GPUs."
In addition to the new features, the CUDA 6 platform offers a full suite of programming tools, GPU-accelerated math libraries, documentation and programming guides.
Version 6 of the CUDA Toolkit is expected to be available in early 2014. Members of the CUDA-GPU Computing Registered Developer Program will be notified when it is available for download.
For more information, visit the NVIDIA website.
Sources: Press materials received from the company and additional information gleaned from the company's website.