NVIDIA has announced CUDA Toolkit 8, the next release of its tools for high-performance computing on NVIDIA GPUs, although the toolkit itself has not been released yet.
The announcement hints at the expanded features in store for this release. The current version is CUDA 7.5.
What’s Expected in CUDA Toolkit 8
First, we now know that CUDA Toolkit 8 will support the NVIDIA Pascal architecture, which should bring performance improvements on the Tesla P100. The release adds new data migration APIs and improved support for large datasets, concurrent data access, and atomics. Also in store are native FP16 computation and faster deep learning through an optimized cuBLAS.
The nvGRAPH library will ship with CUDA Toolkit 8, providing accelerated graph analytics algorithms. cuBLAS also gains new matrix multiply optimizations for smaller matrices (512 or less).
On the development side, there is a new critical path analysis feature, and compilation with the NVIDIA CUDA Compiler Driver (NVCC) is expected to be more than twice as fast.
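For readers unfamiliar with the term, the general idea behind critical path analysis can be sketched in a few lines of C++. This is only an illustration of the concept (finding the longest chain of dependent tasks), not how NVIDIA's tools implement it; the announcement does not describe that. The Task struct, the task names, and the durations below are all made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical task record: a name, a duration in milliseconds, and the
// indices of the tasks that must finish before this one can start.
// Assumption: tasks are listed in dependency order, so every dependency
// index is smaller than the task's own index.
struct Task {
    std::string name;
    double duration_ms;
    std::vector<std::size_t> deps;
};

// Returns the names of the tasks on the longest (critical) dependency chain.
std::vector<std::string> critical_path(const std::vector<Task>& tasks) {
    const std::size_t n = tasks.size();
    const std::size_t none = n;              // sentinel meaning "no predecessor"
    std::vector<double> finish(n, 0.0);      // earliest finish time of each task
    std::vector<std::size_t> prev(n, none);  // dependency that finished last

    for (std::size_t i = 0; i < n; ++i) {
        double start = 0.0;
        for (std::size_t d : tasks[i].deps) {
            if (finish[d] > start) {         // a task waits for its slowest dependency
                start = finish[d];
                prev[i] = d;
            }
        }
        finish[i] = start + tasks[i].duration_ms;
    }

    std::vector<std::string> path;
    if (n == 0) return path;

    // Start from the task that finishes last and walk back through the
    // dependencies that determined each start time.
    std::size_t cur = static_cast<std::size_t>(
        std::max_element(finish.begin(), finish.end()) - finish.begin());
    while (cur != none) {
        path.push_back(tasks[cur].name);
        cur = prev[cur];
    }
    std::reverse(path.begin(), path.end());
    return path;
}

// Made-up timeline: one upload, two kernels that depend on it, one download.
std::vector<Task> sample_tasks() {
    return {
        {"copy_in",  4.0,  {}},
        {"kernel_a", 10.0, {0}},
        {"kernel_b", 3.0,  {0}},
        {"copy_out", 2.0,  {1, 2}},
    };
}
```

Calling critical_path(sample_tasks()) returns copy_in, kernel_a, copy_out. The shorter kernel_b is not on that chain, so in this toy timeline speeding it up would not shorten the total run. Pointing out where optimization effort actually pays off is what this kind of analysis is for.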
We will discuss CUDA Toolkit 8 in more detail after the official release. Meanwhile, you can get the current CUDA 7.5 by going to this location and signing up for the CUDA developer program.