CUDA

Updated: 06/30/2019 by Computer Hope

CUDA is an architecture for GPUs developed by NVIDIA that was introduced on June 23, 2007. The name "CUDA" was originally an acronym for "Compute Unified Device Architecture," but the acronym has since been discontinued from official use.

CUDA improves the performance of computing tasks that benefit from parallel processing. These workloads, such as rendering 3D images in real time, are often called "embarrassingly parallel" because they can be split into many independent pieces of work, each handled by a separate core. CUDA GPUs contain many such cores, called CUDA cores, which may number in the thousands on a single video card. Software must be written specifically for the architecture using the low-level CUDA libraries and APIs provided by NVIDIA. The native programming language of these libraries is C++, but wrappers have been written for other languages, enabling CUDA processing in a wide array of applications.
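
For illustration, the following minimal sketch (not part of the original article) shows what CUDA-specific C++ code looks like: a kernel adds two vectors, with each GPU thread handling one element. It compiles with NVIDIA's nvcc compiler; names such as addVectors are illustrative only.

// vector_add.cu - minimal CUDA C++ sketch (compile with: nvcc vector_add.cu)
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread adds one pair of elements.
__global__ void addVectors(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;                    // one million elements
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);             // unified memory, visible to CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    addVectors<<<blocks, threads>>>(a, b, c, n);   // launch across many GPU threads
    cudaDeviceSynchronize();                       // wait for the GPU to finish

    printf("c[0] = %f\n", c[0]);              // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

The triple angle bracket syntax <<<blocks, threads>>> is the CUDA language extension that launches the kernel across many cores at once.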

Although originally designed to perform graphics-specific tasks, in 2012 the CUDA architecture transitioned to handling more general types of computation, such as mining cryptocurrencies.

CUDA low-level APIs

Low-level APIs for performing specific tasks on the CUDA architecture include the following (a brief cuBLAS usage sketch appears after the table):

API Description More info
cuBLAS Basic linear algebra subroutines, accelerated for image analysis and machine learning. https://developer.nvidia.com/cublas
cudart The CUDA Runtime API, providing simplified management of device initialization, contexts, and module loading for CUDA applications. https://docs.nvidia.com/cuda/cuda-runtime-api/
cuFFT Fast Fourier Transforms, applicable to a wide range of scientific disciplines, accelerated to run up to 10x as fast as on a CPU. https://developer.nvidia.com/cufft
cuRAND Pseudo-random number generation in bulk quantities. https://developer.nvidia.com/curand
cuSOLVER Accelerated "direct solvers," efficient algorithms for solving dense and sparse linear algebra problems. https://developer.nvidia.com/cusolver
cuSPARSE Subroutines for working with sparse matrices, which contain many zero-value elements. Accelerated to operate up to 5x faster than CPU implementations. https://developer.nvidia.com/cusparse
NPP NVIDIA Performance Primitives library for processing images, video, and other digital signals, up to 30x as fast as on a CPU. https://developer.nvidia.com/npp
nvGRAPH Accelerated implementations of graph analytics algorithms, including Google's famous PageRank algorithm. https://developer.nvidia.com/nvgraph
NVML NVIDIA Management Library, enabling supervision and administration of multiple GPUs performing CUDA tasks. https://developer.nvidia.com/nvidia-management-library-nvml
NVRTC Runtime compilation library, which compiles strings of CUDA C++ source code into PTX at run time. https://docs.nvidia.com/cuda/nvrtc/index.html
PhysX A scalable physics engine, supporting devices ranging from smartphones to high-end workstations. Integrated with existing third-party game engines such as Unreal Engine, Unity3D, and Stingray. https://developer.nvidia.com/gameworks-physx-overview
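
As an example of calling one of these libraries, the sketch below (illustrative only, not taken from the table above) uses cuBLAS to compute a single-precision SAXPY operation, y = alpha * x + y, on the GPU. Error checking is omitted for brevity.

// saxpy_cublas.cu - sketch of using a CUDA library (compile with: nvcc saxpy_cublas.cu -lcublas)
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main()
{
    const int n = 4;
    const float alpha = 2.0f;
    std::vector<float> x = {1, 2, 3, 4};
    std::vector<float> y = {10, 20, 30, 40};

    // Copy the input vectors to GPU memory.
    float *d_x, *d_y;
    cudaMalloc(&d_x, n * sizeof(float));
    cudaMalloc(&d_y, n * sizeof(float));
    cudaMemcpy(d_x, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);                            // initialize the cuBLAS context
    cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);   // y = alpha * x + y on the GPU
    cublasDestroy(handle);

    // Copy the result back and print it.
    cudaMemcpy(y.data(), d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++)
        printf("%g ", y[i]);                          // expect: 12 24 36 48
    printf("\n");

    cudaFree(d_x);
    cudaFree(d_y);
    return 0;
}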

Languages with CUDA wrappers

Programming languages (other than C++) that can create software for CUDA GPUs include, among others:

Fortran (CUDA Fortran)
Python (e.g., PyCUDA, Numba)
Java (JCuda)
Julia (CUDA.jl)
MATLAB (Parallel Computing Toolbox)
C# (ManagedCuda)

Examples of CUDA GPUs

The following are examples of a range of NVIDIA GPUs, compared by number of CUDA cores, maximum frequency in MHz, memory in GB, and MSRP of each when first released (a sketch for querying comparable properties on a running system follows the table).

GPU name CUDA cores Max Frequency (MHz) Memory (GB) MSRP
GeForce GTX TITAN Z 5760 876 12 $1420
NVIDIA TITAN Xp 3840 1582 12 $1200
GeForce GTX 1080 2560 1733 8 $499
GeForce GTX 980 2048 1216 4 $550
GeForce GTX 960 1024 1178 2 $230
GeForce GTX 750 512 1085 1 $120
GeForce GT 430 96 700 1 $60
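
Figures like these can also be read from a running system through the CUDA Runtime API. The following sketch (illustrative only) queries the first GPU's name, streaming multiprocessor count, maximum clock rate, and memory; the number of CUDA cores per multiprocessor depends on the GPU architecture and is not reported directly.

// device_query.cu - sketch that prints basic properties of GPU 0 (compile with: nvcc device_query.cu)
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "No CUDA-capable device found\n");
        return 1;
    }
    printf("GPU name:                  %s\n", prop.name);
    printf("Streaming multiprocessors: %d\n", prop.multiProcessorCount);
    printf("Max clock rate (MHz):      %d\n", prop.clockRate / 1000);   // clockRate is reported in kHz
    printf("Memory (GB):               %.1f\n",
           prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    return 0;
}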

Related terms: 3D, Graphic, Hardware terms, Library, Video card, Workstation