Updated: 11/16/2019 by Computer Hope

CUDA is an architecture for GPUs developed by NVIDIA that was introduced on June 23, 2007. The name "CUDA" was originally an acronym for "Compute Unified Device Architecture," but the acronym has since been discontinued from official use.

CUDA improves the performance of computing tasks that benefit from parallel processing. These workloads, such as rendering 3D images in real time, are often called "embarrassingly parallel" because they split naturally into independent pieces, each of which can be computed by a separate core. CUDA GPUs contain many of these cores, called CUDA cores, which may number in the thousands on a single video card. Software must be written specifically for the architecture using the low-level CUDA libraries and APIs provided by NVIDIA. The native programming language of these libraries is C++, but wrappers exist for other languages, enabling CUDA processing in a wide array of applications.
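To make the idea of an embarrassingly parallel workload concrete, here is a minimal CPU-only sketch in Python. It does not use CUDA or a GPU. It only emulates the way a CUDA C++ kernel typically assigns one element to each thread, using the index formula blockIdx.x * blockDim.x + threadIdx.x. The function names saxpy_thread and launch are hypothetical, chosen for illustration.

```python
# CPU-only emulation of a CUDA-style launch: every (block, thread) pair
# handles one array element, and no element depends on any other.

def saxpy_thread(block_idx, thread_idx, block_dim, a, x, y, out):
    """Work done by one emulated GPU thread (hypothetical helper)."""
    i = block_idx * block_dim + thread_idx  # global element index
    if i < len(x):                          # guard: the last block may overshoot
        out[i] = a * x[i] + y[i]


def launch(n, block_dim, a, x, y):
    """Run every emulated thread in turn; a GPU would run them concurrently."""
    grid_dim = (n + block_dim - 1) // block_dim  # blocks needed to cover n items
    out = [0.0] * n
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            saxpy_thread(block_idx, thread_idx, block_dim, a, x, y, out)
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
result = launch(len(x), block_dim=4, a=2.0, x=x, y=y)
print(result)
```

Because each call to saxpy_thread reads and writes only its own element, the calls can run in any order or all at once, which is the property that lets thousands of cores share the work.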

Although first designed to perform graphics-specific tasks, in 2012, the CUDA architecture transitioned to handling more general types of computation, such as mining cryptocurrency blockchains.

CUDA low-level APIs

Low-level APIs for performing specific tasks on the CUDA architecture include:

API        Description
cuBLAS     Basic linear algebra subroutines, accelerated for image analysis and machine learning.
cudart     The runtime API, providing simplified management of initialization, threading contexts, and modules for CUDA applications.
cuFFT      Fast Fourier Transforms, applicable to a wide range of scientific disciplines, accelerated to run up to 10x as fast as on a CPU.
cuRAND     Pseudorandom number generation in bulk quantities.
cuSOLVER   Accelerated "direct solvers," efficient algorithms for solving certain linear algebra problems.
cuSPARSE   Subroutines for working with sparse matrices, which contain many zero-value elements. Accelerated to operate up to 5x as fast as CPU implementations.
NPP        NVIDIA Performance Primitives library for processing images, video, and other digital signals, up to 30x as fast as on a CPU.
nvGRAPH    Accelerated implementations of graph analytics algorithms, including Google's famous PageRank algorithm.
NVML       NVIDIA Management Library, enabling supervision and administration of multiple GPUs performing CUDA tasks.
NVRTC      Runtime compilation library, which compiles strings of CUDA C++ source code into PTX (the GPU's intermediate code) at run time.
PhysX      A scalable physics engine, supporting devices ranging from smartphones to high-end workstations. Integrated with existing third-party game engines such as Unreal Engine, Unity3D, and Stingray.
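As an illustration of why sparse matrices get their own subroutines, the sketch below stores a mostly-zero matrix in compressed sparse row (CSR) form, one common sparse layout, and multiplies it by a vector. This is plain Python, not a call into cuSPARSE. The helper names to_csr and csr_matvec are hypothetical.

```python
def to_csr(dense):
    """Convert a dense list-of-rows matrix to CSR arrays (illustrative helper)."""
    values, col_indices, row_offsets = [], [], [0]
    for row in dense:
        for col, v in enumerate(row):
            if v != 0:                   # store only the non-zero entries
                values.append(v)
                col_indices.append(col)
        row_offsets.append(len(values))  # where the next row starts in `values`
    return values, col_indices, row_offsets


def csr_matvec(values, col_indices, row_offsets, x):
    """Compute y = A @ x, touching only the stored non-zero entries of A."""
    y = []
    for r in range(len(row_offsets) - 1):
        total = 0
        for k in range(row_offsets[r], row_offsets[r + 1]):
            total += values[k] * x[col_indices[k]]
        y.append(total)
    return y


dense = [
    [5, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 2],
]
values, col_indices, row_offsets = to_csr(dense)
y = csr_matvec(values, col_indices, row_offsets, [1, 2, 3, 4])
print(values, col_indices, row_offsets, y)
```

Only 4 of the 16 entries are stored and multiplied here. Each output row is also independent of the others, so a GPU implementation could assign rows to separate threads.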

Languages with CUDA wrappers

Programming languages (other than C++) that can create software for CUDA GPUs include:

Examples of CUDA GPUs

The following are examples of a range of NVIDIA GPUs, compared by number of CUDA cores, maximum frequency in MHz, memory in GB, and MSRP when released.

GPU name               CUDA cores   Max frequency (MHz)   Memory (GB)   MSRP
GeForce GTX TITAN Z    5760         876                   12            $1420
NVIDIA TITAN Xp        3840         1582                  12            $1200
GeForce GTX 1080       2560         1733                  8             $499
GeForce GTX 980        2048         1216                  4             $550
GeForce GTX 960        1024         1178                  2             $230
GeForce GTX 750        512          1085                  1             $120
GeForce GT 430         96           700                   1             $60
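Core count and frequency can be combined into a rough upper bound on arithmetic throughput. The sketch below assumes each CUDA core completes one fused multiply-add per clock cycle, counted as two floating-point operations, and that the listed frequency is the clock the cores actually run at. Neither assumption is guaranteed for every GPU generation, so treat the results as estimates, not measured performance. The function name peak_gflops is hypothetical.

```python
# (name, CUDA cores, max frequency in MHz), copied from the table above
gpus = [
    ("GeForce GTX TITAN Z", 5760, 876),
    ("NVIDIA TITAN Xp", 3840, 1582),
    ("GeForce GTX 1080", 2560, 1733),
    ("GeForce GTX 980", 2048, 1216),
    ("GeForce GTX 960", 1024, 1178),
    ("GeForce GTX 750", 512, 1085),
    ("GeForce GT 430", 96, 700),
]


def peak_gflops(cores, mhz, ops_per_core_per_clock=2):
    """Theoretical peak in GFLOPS; the factor of 2 is an assumption (one FMA)."""
    # cores * MHz gives millions of core-cycles per second; /1000 converts to giga
    return cores * mhz * ops_per_core_per_clock / 1000.0


for name, cores, mhz in gpus:
    print(f"{name:<22}{peak_gflops(cores, mhz):>10.1f} GFLOPS")
```

For example, the GeForce GTX 1080 row works out to 2560 x 1733 x 2 / 1000 = 8872.96 GFLOPS, or roughly 8.9 TFLOPS, under these assumptions.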
