CUDA

Updated: 06/30/2019 by Computer Hope

CUDA is an architecture for GPUs developed by NVIDIA that was introduced on June 23, 2007. The name "CUDA" was originally an acronym for "Compute Unified Device Architecture," but the acronym has since been discontinued from official use.

CUDA improves the performance of computing tasks that benefit from parallel processing. These workloads, such as rendering 3D images in real time, are often called "embarrassingly parallel" because they can be split into many independent pieces of work, each handled by a separate core. CUDA GPUs contain many such cores, called CUDA cores, which may number in the thousands on a single video card. Software must be written specifically for the architecture using the low-level CUDA libraries and APIs provided by NVIDIA. The native programming language of these libraries is C++, but wrappers have been written for other languages, enabling CUDA processing in a wide array of applications.
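
For illustration, the following minimal sketch (not part of the original article) shows what CUDA-specific C++ code looks like: a kernel adds two vectors, with each GPU thread handling one element. It compiles with NVIDIA's nvcc compiler; names such as addVectors are illustrative only.

// vector_add.cu - minimal CUDA C++ sketch (compile with: nvcc vector_add.cu)
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements.
__global__ void addVectors(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;                    // one million elements
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);             // unified memory, visible to CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    addVectors<<<blocks, threads>>>(a, b, c, n);   // launch across many GPU threads
    cudaDeviceSynchronize();                       // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);              // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

The triple angle bracket syntax <<<blocks, threads>>> is the CUDA language extension that launches the kernel across many cores at once.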

Although originally designed to perform graphics-specific tasks, in 2012 the CUDA architecture transitioned to handling more general types of computation, such as mining cryptocurrencies.

CUDA low-level APIs

Low-level APIs for performing specific tasks on the CUDA architecture include the following (a brief cuBLAS usage sketch appears after the table):

API Description More info
cuBLAS Basic linear algebra subroutines, accelerated for image analysis and machine learning. https://developer.nvidia.com/cublas
cudart The CUDA Runtime API, providing simplified management of device initialization, contexts, and module loading for CUDA applications. https://docs.nvidia.com/cuda/cuda-runtime-api/
cuFFT Fast Fourier Transforms, applicable to a wide range of scientific disciplines, accelerated to run up to 10x as fast as on a CPU. https://developer.nvidia.com/cufft
cuRAND Pseudo-random number generation in bulk quantities. https://developer.nvidia.com/curand
cuSOLVER Accelerated "direct solvers," efficient algorithms for solving dense and sparse linear algebra problems. https://developer.nvidia.com/cusolver
cuSPARSE Subroutines for working with sparse matrices, which contain many zero-value elements. Accelerated to operate up to 5x faster than CPU implementations. https://developer.nvidia.com/cusparse
NPP NVIDIA Performance Primitives library for processing images, video, and other digital signals, up to 30x as fast as on a CPU. https://developer.nvidia.com/npp
nvGRAPH Accelerated implementations of graph analytics algorithms, including Google's famous PageRank algorithm. https://developer.nvidia.com/nvgraph
NVML NVIDIA Management Library, enabling supervision and administration of multiple GPUs performing CUDA tasks. https://developer.nvidia.com/nvidia-management-library-nvml
NVRTC Runtime compilation library, which compiles strings of CUDA C++ source code into PTX at run time. https://docs.nvidia.com/cuda/nvrtc/index.html
PhysX A scalable physics engine, supporting devices ranging from smartphones to high-end workstations. Integrated with existing third-party game engines such as Unreal Engine, Unity3D, and Stingray. https://developer.nvidia.com/gameworks-physx-overview
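
As an example of calling one of these libraries, the sketch below (illustrative only, not taken from the table above) uses cuBLAS to compute a single-precision SAXPY operation, y = alpha * x + y, on the GPU. Error checking is omitted for brevity.

// saxpy_cublas.cu - sketch of using a CUDA library (compile with: nvcc saxpy_cublas.cu -lcublas)
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main()
{
    const int n = 4;
    const float alpha = 2.0f;
    std::vector<float> x = {1, 2, 3, 4};
    std::vector<float> y = {10, 20, 30, 40};

    // Copy the input vectors to GPU memory.
    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_x, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);                            // initialize the cuBLAS context
    cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);   // y = alpha * x + y on the GPU
    cublasDestroy(handle);

    // Copy the result back and print it.
    cudaMemcpy(y.data(), d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++)
        printf("%g ", y[i]);                          // expect: 12 24 36 48
    printf("\n");

    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}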

Languages with CUDA wrappers

Programming languages (other than C++) that can create software for CUDA GPUs include, among others:

Fortran (CUDA Fortran)
Python (e.g., PyCUDA, Numba)
Java (JCuda)
Julia (CUDA.jl)
MATLAB (Parallel Computing Toolbox)
C# (ManagedCuda)

Examples of CUDA GPUs

The following are examples of a range of NVIDIA GPUs, compared by number of CUDA cores, maximum frequency in MHz, memory in GB, and MSRP of each when first released (a sketch for querying comparable properties on a running system follows the table).

GPU name CUDA cores Max Frequency (MHz) Memory (GB) MSRP
GeForce GTX TITAN Z 5760 876 12 $1420
NVIDIA TITAN Xp 3840 1582 12 $1200
GeForce GTX 1080 2560 1733 8 $499
GeForce GTX 980 2048 1216 4 $550
GeForce GTX 960 1024 1178 2 $230
GeForce GTX 750 512 1085 1 $120
GeForce GT 430 96 700 1 $60
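
Figures like these can also be read from a running system through the CUDA Runtime API. The following sketch (illustrative only) queries the first GPU's name, streaming multiprocessor count, maximum clock rate, and memory; the number of CUDA cores per multiprocessor depends on the GPU architecture and is not reported directly.

// device_query.cu - sketch that prints basic properties of GPU 0 (compile with: nvcc device_query.cu)
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "No CUDA-capable device found\n");
        return 1;
    }
    printf("GPU name:                  %s\n", prop.name);
    printf("Streaming multiprocessors: %d\n", prop.multiProcessorCount);
    printf("Max clock rate (MHz):      %d\n", prop.clockRate / 1000);   // clockRate is reported in kHz
    printf("Memory (GB):               %.1f\n",
           prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    return 0;
}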

Related terms: 3D, Graphic, Hardware terms, Library, Video card, Workstation