NVIDIA has announced general availability of CUDA Toolkit 11, which introduces support for the new NVIDIA A100 GPU based on the NVIDIA Ampere architecture, Arm server processors, performance-optimized libraries, and new developer tools and improvements for A100.
This release comes with a host of new features and improvements. These include the ability to develop for the NVIDIA Ampere GPU architecture, such as the new NVIDIA A100 GPU and multi-GPU systems based on A100 like DGX A100 and HGX A100.
CUDA Toolkit 11 also supports the new third-generation Tensor Cores to accelerate mixed-precision matrix operations on different data types, including TF32 and Bfloat16.
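As a sketch of what the new TF32 mode looks like in practice, the snippet below opts a standard FP32 GEMM into TF32 Tensor Core math via cuBLAS. The function name `sgemm_tf32` and the device pointers `d_A`, `d_B`, `d_C` are illustrative placeholders; the cuBLAS calls themselves are from the CUDA 11 API.

```cuda
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Sketch: letting cuBLAS use TF32 Tensor Cores for an FP32 GEMM on A100.
// d_A, d_B, d_C are assumed to be device buffers allocated and filled
// by the caller (column-major, as cuBLAS expects).
void sgemm_tf32(cublasHandle_t handle, int m, int n, int k,
                const float *d_A, const float *d_B, float *d_C)
{
    // New in CUDA 11: allow TF32 Tensor Core math for FP32 routines.
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);

    const float alpha = 1.0f, beta = 0.0f;
    // C = alpha * A * B + beta * C
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n, k, &alpha, d_A, m, d_B, k, &beta, d_C, m);
}
```

TF32 keeps FP32's 8-bit exponent range but rounds the mantissa to 10 bits inside the Tensor Core, so existing FP32 code can benefit without source changes beyond the math-mode opt-in.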
Multi-Instance GPU (MIG) virtualization and GPU partitioning for improved GPU utilization are now supported, as are library performance optimizations for linear algebra, FFTs, matrix multiplication, JPEG decoding, and more.
Other features include programming and API improvements for task graphs, asynchronous data movement, fine-grained synchronization, L2 cache residency control, and enhancements to the Nsight developer tools family for tracing, profiling, debugging, and roofline analysis.
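To illustrate the task-graph and asynchronous data-movement improvements, here is a minimal sketch that captures a copy–kernel–copy sequence from a stream into a CUDA graph and replays it. The kernel `scale` and the buffer names are hypothetical; the graph and capture APIs are the CUDA 11 runtime calls.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for real work.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Sketch: capture async work into a task graph, then launch the graph.
void run_as_graph(float *d_buf, float *h_buf, int n, cudaStream_t stream)
{
    cudaGraph_t graph;
    cudaGraphExec_t exec;

    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    cudaMemcpyAsync(d_buf, h_buf, n * sizeof(float),
                    cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n);
    cudaMemcpyAsync(h_buf, d_buf, n * sizeof(float),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamEndCapture(stream, &graph);

    cudaGraphInstantiate(&exec, graph, NULL, NULL, 0);
    cudaGraphLaunch(exec, stream);  // cheap to relaunch many times
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
}
```

The payoff of graphs is that the launch overhead of the whole dependency chain is paid once at instantiation; repeated launches of `exec` skip per-call setup work.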
CUDA C++ also gets a boost with compiler performance and usability improvements, new link-time optimization capabilities, and support for new host compilers and language standards, including C++17.
The Toolkit also adds parallel C++ standard library support through libcu++ and integrates CUB as a CUDA C++ core library.
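Since CUB now ships with the Toolkit, a device-wide reduction needs only a header include. The sketch below shows CUB's standard two-call pattern; the function name `device_sum` and the buffers are illustrative, while `cub::DeviceReduce::Sum` is the library's actual entry point.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>

// Sketch: device-wide sum with CUB. d_in holds n floats on the device;
// the scalar result is written to d_out (also on the device).
void device_sum(const float *d_in, float *d_out, int n)
{
    void *d_temp = NULL;
    size_t temp_bytes = 0;

    // First call with a NULL workspace only computes temp_bytes...
    cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);
    cudaMalloc(&d_temp, temp_bytes);

    // ...second call performs the reduction using that workspace.
    cub::DeviceReduce::Sum(d_temp, temp_bytes, d_in, d_out, n);
    cudaFree(d_temp);
}
```

The same two-phase query-then-run convention applies across CUB's device-wide algorithms (scan, sort, histogram), which keeps temporary-storage management in the caller's hands.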
You can find out more about NVIDIA CUDA Toolkit 11 from the official CUDA page here.