GPGPU

GPGPU stands for general-purpose computing on graphics processing units. GPGPU APIs include the vendor-independent OpenCL, SYCL, HIP, OpenMP and Vulkan compute shaders, as well as Nvidia's CUDA. Each API can have multiple implementations targeting different hardware or software: GPUs, CPUs, NPUs, FPGAs, or simply another GPGPU API (through a shim or transpiler).

OpenCL

OpenCL (Open Computing Language) is an open, royalty-free parallel programming specification developed by the Khronos Group, a non-profit consortium.

The OpenCL specifications describe a C-based programming language, a general environment that is required to be present, and a C API that lets programmers call into this environment. There are also bindings that let other languages call into this environment, as well as alternative languages for writing OpenCL computation kernels.

Tip The clinfo utility can be used to list OpenCL platforms, devices present and ICD loader properties.

An OpenCL environment includes at least one of each of the following:

A copy of libOpenCL.so, the ICD loader presenting a full OpenCL API interface.
Platform-dependent drivers that are loaded by the ICD loader.

ICD loader (libOpenCL.so)

It is very common for a system to have several OpenCL-capable devices and hence several OpenCL runtime implementations. The "OpenCL ICD loader" is a platform-agnostic library that provides the means to load device-specific drivers through the OpenCL API. Most OpenCL vendors provide their own implementation of an OpenCL ICD loader, and these should all work with the other vendors' OpenCL implementations. Unfortunately, most vendors do not provide completely up-to-date ICD loaders, so Arch Linux provides this library from a separate project (ocl-icd), which offers a functioning implementation of the current OpenCL API. As of 2025, the current OpenCL version is 3.0.

The other ICD loader libraries are installed as part of each vendor's SDK. If you want to ensure the ICD loader from the ocl-icd package is used, you can create a file in /etc/ld.so.conf.d which adds /usr/lib to the dynamic program loader's search directories:

/etc/ld.so.conf.d/00-usrlib.conf

/usr/lib

This is necessary because all the SDKs add their runtime's lib directories to the search path through ld.so.conf.d files.

The available packages containing various OpenCL ICDs are:

ocl-icd: recommended, most up-to-date
intel-opencl^AUR by Intel. Provides OpenCL 2.0, deprecated in favour of intel-compute-runtime.

Note An ICD loader's vendor is functionally irrelevant: loaders should be vendor-agnostic and may be used interchangeably as long as they are implemented correctly. The vendor name only serves to tell them apart.

Managing implementations

To see which OpenCL implementations are currently active on your system, use the following command:

$ ls /etc/OpenCL/vendors
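
As a complement to the listing above, the following minimal Python sketch prints which driver library each .icd file points to. It assumes that every .icd file is a short text file whose first non-empty line names the library to load; the function name list_icd_files is made up for this illustration.

```python
from pathlib import Path


def list_icd_files(vendors_dir="/etc/OpenCL/vendors"):
    """Map each *.icd file name to the driver library named inside it."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}  # no ICD registry on this system
    found = {}
    for icd in sorted(directory.glob("*.icd")):
        # Assumption: the first non-empty line holds the library name or path.
        lines = [ln.strip() for ln in icd.read_text().splitlines() if ln.strip()]
        found[icd.name] = lines[0] if lines else ""
    return found


if __name__ == "__main__":
    for name, library in list_icd_files().items():
        print(f"{name}: {library}")
```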

To find out all possible (known) properties of the OpenCL platform and devices available on the system, install clinfo.

Note With ocl-icd 2.3.3, setting OCL_ICD_VENDORS to a single file appears to work without issue and clinfo -l produces correct results, so it is questionable whether the OCL_ICD_VENDORS bug mentioned below still exists.

You can specify which implementations your application should see using ocl-icd-choose^AUR. For example:

$ ocl-icd-choose amdocl64.icd:mesa.icd davinci-resolve-checker

ocl-icd-choose is a simple wrapper script that sets the OCL_ICD_FILENAMES environment variable. It would not be needed if OCL_ICD_VENDORS could handle single icd files like the manual suggests; this is a known bug being tracked as OCL-dev/ocl-icd#7.
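
The idea behind such a wrapper can be sketched in Python. This is not the actual ocl-icd-choose script: it assumes that OCL_ICD_FILENAMES takes a colon-separated list of driver libraries and that each .icd file in /etc/OpenCL/vendors names its library on the first line. The helper names icd_environment and run_with_icds are hypothetical.

```python
import os
import subprocess
from pathlib import Path


def icd_environment(icd_names, vendors_dir="/etc/OpenCL/vendors", base_env=None):
    """Return an environment with OCL_ICD_FILENAMES naming the chosen ICD libraries."""
    libraries = []
    for name in icd_names:
        # Assumption: the first line of an .icd file is the library to load.
        lines = (Path(vendors_dir) / name).read_text().strip().splitlines()
        if lines:
            libraries.append(lines[0].strip())
    env = dict(os.environ if base_env is None else base_env)
    env["OCL_ICD_FILENAMES"] = ":".join(libraries)
    return env


def run_with_icds(icd_names, command):
    """Run command (a list of arguments) with the selected ICDs in its environment."""
    return subprocess.run(command, env=icd_environment(icd_names)).returncode
```

With these helpers, run_with_icds(["amdocl64.icd", "mesa.icd"], ["davinci-resolve-checker"]) would mirror the command shown above.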

Runtime

To execute programs that use OpenCL, a runtime that can be loaded by libOpenCL.so needs to be installed.

OpenCL on generic GPU

For any GPU supported by Mesa, you can use rusticl by installing opencl-mesa. It can be enabled by using the environment variable RUSTICL_ENABLE=driver, where driver is a Gallium driver, such as radeonsi or iris. It works on the broadest set of hardware but does not necessarily provide the best performance. If you have trouble setting up other drivers, try using it: it takes a very small amount of configuration to enable.

OpenCL on AMD/ATI GPU

rocm-opencl-runtime: Part of AMD's ROCm GPU compute stack, officially supporting a small range of GPU models (other cards may work with unofficial or partial support). To support cards older than Vega, you need to set the runtime variable ROC_ENABLE_PRE_VEGA=1. This is similar, but not quite equivalent, to specifying opencl=rocr in Ubuntu's amdgpu-install, because this package's ROCm version differs from the version used by Ubuntu's installer.
opencl-legacy-amdgpu-pro^AUR: Legacy Orca OpenCL repackaged from AMD's Ubuntu releases. Equivalent to specifying opencl=legacy in Ubuntu's amdgpu-install.
opencl-amd^AUR, opencl-amd-dev^AUR: ROCm components repackaged from AMD's Ubuntu releases. Equivalent to specifying opencl=rocr,legacy in Ubuntu's amdgpu-install.

OpenCL image support

Recent ROCm versions include OpenCL image support, which is used by GPGPU-accelerated software such as Darktable. ROCm together with the open source AMDGPU graphics driver is all that is required; AMDGPU PRO is not needed.

$ /opt/rocm/bin/clinfo | grep -i "image support"

Image support                                   Yes

OpenCL on NVIDIA GPU

opencl-nvidia: official NVIDIA runtime

OpenCL on Intel GPU

intel-compute-runtime: a.k.a. the Neo OpenCL runtime, the open-source implementation for Intel HD Graphics GPU on Gen12 (Alder Lake) and beyond.
intel-compute-runtime-legacy^AUR: the same runtime for Gen11 (Rocket Lake) and older.
beignet^AUR: the open-source implementation for Intel HD Graphics GPUs on Gen7 (Ivy Bridge) and beyond. Deprecated by Intel in favour of the Neo OpenCL driver, it remains the recommended solution for legacy hardware platforms (e.g. Ivy Bridge, Haswell).
intel-opencl^AUR: the proprietary implementation for Intel HD Graphics GPUs on Gen7 (Ivy Bridge) and beyond. Deprecated by Intel in favour of the Neo OpenCL driver, it remains the recommended solution for legacy hardware platforms (e.g. Ivy Bridge, Haswell).

OpenCL on CPU

The following allow OpenCL to be run on a CPU:

intel-opencl-runtime^AUR: Intel's LLVM and oneAPI-based implementation for x86_64 processors. Officially intended for Intel Core and Xeon processors, it actually works on all x86_64 processors with SSE4.1. Even on AMD Zen processors it outperforms pocl.
pocl: LLVM-based OpenCL implementation, works for all CPU architectures supported by LLVM and some GPUs (Nvidia through libCUDA, Intel through Level Zero) as well as OpenASIP. Despite this broad coverage, its performance leaves a lot to be desired.
amdapp-sdk^AUR: AMD CPU runtime, abandoned

OpenCL on Vulkan

The following allow OpenCL to be run on top of a Vulkan runtime:

clspv-git^AUR: Clspv is a prototype compiler for a subset of OpenCL C to Vulkan compute shaders.
clvk-git^AUR: clvk is a prototype implementation of OpenCL 3.0 on top of Vulkan using clspv as the compiler.

OpenCL on FPGA

pocl for OpenASIP, see above.
xrt-bin^AUR: Xilinx Run Time (XRT) for FPGAs
fpga-runtime-for-opencl: Intel FPGA Runtime

32-bit runtime

To execute 32-bit programs that use OpenCL, a compatible 32-bit hardware runtime needs to be installed.

Tip The clinfo utility can only list 64-bit OpenCL platforms, devices and ICD loader properties. For 32-bit, you need to compile clinfo for 32-bit or use the 32-bit clinfo from the archlinux32 project.

OpenCL on Generic GPU (32-bit)

lib32-opencl-mesa: OpenCL support for Mesa drivers (32-bit)

OpenCL on NVIDIA GPU (32-bit)

lib32-opencl-nvidia: OpenCL implementation for NVIDIA (32-bit)

Development

For OpenCL development, the bare minimum of additional packages is:

ocl-icd: OpenCL ICD loader implementation, up to date with the latest OpenCL specification.
opencl-headers: OpenCL C/C++ API headers.

The vendors' SDKs provide a multitude of tools and support libraries:

intel-opencl-sdk^AUR: Intel OpenCL SDK (old version, new OpenCL SDKs are included in the INDE and Intel Media Server Studio)
amdapp-sdk^AUR: This package is installed as /opt/AMDAPP and apart from SDK files it also contains a number of code samples (/opt/AMDAPP/SDK/samples/). It also provides the clinfo utility which lists OpenCL platforms and devices present in the system and displays detailed information about them. As the SDK itself contains a CPU OpenCL driver, no extra driver is needed to execute OpenCL on CPU devices (regardless of its vendor).
rocm-opencl-sdk: Develop OpenCL-based applications for AMD platforms.
cuda: Nvidia's GPU SDK which includes support for OpenCL 3.0.

Language bindings

These bindings allow other languages to call into the OpenCL environment. They generally do not alter the requirement to write the kernel in OpenCL C.

JavaScript/HTML5: WebCL
Python: python-pyopencl
D: cl4d or DCompute
Java: Aparapi or JOCL (a part of JogAmp)
Mono/.NET: Open Toolkit
Go: OpenCL bindings for Go
Racket: Racket has a native interface on PLaneT that can be installed via raco.
Rust: ocl
Julia: OpenCL.jl
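
To illustrate that the kernel itself stays in OpenCL C even when the host side uses a binding, here is a minimal sketch using python-pyopencl to add two vectors. It assumes that pyopencl and NumPy are installed and that at least one OpenCL runtime is present; the kernel name add and the function name vector_add are chosen for this example only.

```python
# The device code is ordinary OpenCL C, passed to the driver as a string.
KERNEL_SOURCE = """
__kernel void add(__global const float *a,
                  __global const float *b,
                  __global float *out)
{
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""


def vector_add(a, b):
    """Add two equally sized float vectors on the first available OpenCL device."""
    # Imported lazily so that this module can be loaded without OpenCL installed.
    import numpy as np
    import pyopencl as cl

    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
    program = cl.Program(ctx, KERNEL_SOURCE).build()
    # One work item per vector element; the driver picks the work-group size.
    program.add(queue, a.shape, None, a_buf, b_buf, out_buf)
    out = np.empty_like(a)
    cl.enqueue_copy(queue, out, out_buf)
    return out
```

Calling vector_add([1, 2, 3], [10, 20, 30]) on a system with a working runtime should return the element-wise sums as a float32 array.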

SYCL

According to Wikipedia:SYCL:

SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators. It is a single-source embedded domain-specific language (eDSL) based on pure C++17.

SYCL is [...] inspired by OpenCL that enables code for heterogeneous processors to be written in a “single-source” style using completely standard C++. SYCL enables single-source development where C++ template functions can contain both host and device code to construct complex algorithms that use hardware accelerators, and then re-use them throughout their source code on different types of data.

While the SYCL standard started as the higher-level programming model sub-group of the OpenCL working group and was originally developed for use with OpenCL and SPIR, SYCL is a Khronos Group workgroup independent from the OpenCL working group [...] starting with SYCL 2020, SYCL has been generalized as a more general heterogeneous framework able to target other systems. This is now possible with the concept of a generic backend to target any acceleration API while enabling full interoperability with the target API, like using existing native libraries to reach the maximum performance along with simplifying the programming effort. For example, the OpenSYCL implementation targets ROCm and CUDA via AMD's cross-vendor HIP.

In other words, SYCL defines a programming environment based on C++17. This environment is intended to be combined with compilers that produce code for both CPUs and GPGPUs. The language intended for the GPGPU side used to be SPIR, but compilers that target other intermediate representations also exist.

Implementations

trisycl-git^AUR: Open source implementation mainly driven by Xilinx.
adaptivecpp^AUR: Compiler for multiple programming models (SYCL, C++ standard parallelism, HIP/CUDA) for CPUs and GPUs from all vendors.
intel-oneapi-dpcpp-cpp: Intel's Data Parallel C++: the LLVM/oneAPI Implementation of SYCL.
computecpp^AUR: Codeplay's proprietary implementation of SYCL 1.2.1. Can target SPIR, SPIR-V and experimentally PTX (NVIDIA) as device targets. Support ended on 1 September 2023, and the project is to be merged into Intel's LLVM implementation.

oneAPI

oneAPI is a marketing name used by many of Intel's high-performance computing libraries. The package intel-oneapi-dpcpp-cpp provides /opt/intel/oneapi/setvars.sh that can be used to set up the environment for compilation and linking of SYCL programs. Some parts of the components like the compiler are open-sourced by Intel on GitHub while others are not.

Some packages can be installed for additional functionality. For example, intel-oneapi-mkl-sycl adds GPU offloading support for the Math Kernel Library (MKL).

Checking for SPIR support

SYCL was originally intended to be compiled to SPIR or SPIR-V. Both are intermediate languages designed by Khronos that can be consumed by an OpenCL driver. SPIR is available as an extension (cl_khr_spir) to OpenCL 1.2 and 2.0, while SPIR-V is part of the OpenCL core from version 2.1 onwards (clCreateProgramWithIL). To check whether SPIR or SPIR-V is supported, use clinfo:

$ clinfo | grep -i spir

Platform Extensions                             cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer cl_intel_vec_len_hint 
  IL version                                    SPIR-V_1.0
  SPIR versions                                 1.2
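
When this check needs to be scripted, the same three properties can be picked out of clinfo's text output. The following Python sketch assumes the layout shown above, where a property name is separated from its value by a run of spaces; the function name spir_support and the shape of the returned dictionary are made up for this example.

```python
import re


def spir_support(clinfo_output):
    """Collect SPIR and SPIR-V hints from text printed by clinfo."""
    summary = {"cl_khr_spir": False, "il_versions": [], "spir_versions": []}
    for line in clinfo_output.splitlines():
        # Assumption: name and value are separated by two or more spaces.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) != 2:
            continue
        key, value = parts
        if key.endswith("Extensions") and "cl_khr_spir" in value.split():
            summary["cl_khr_spir"] = True
        elif key == "IL version":
            summary["il_versions"].extend(value.split())
        elif key == "SPIR versions":
            summary["spir_versions"].extend(value.split())
    return summary
```

The function takes the output as a string, so it can be fed from subprocess.run(["clinfo"], capture_output=True, text=True).stdout or from a saved log.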

ComputeCpp additionally ships with a tool that summarizes the relevant system information:

$ computecpp_info

Device 0:

  Device is supported                     : UNTESTED - Untested OS
  CL_DEVICE_NAME                          : Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz
  CL_DEVICE_VENDOR                        : Intel(R) Corporation
  CL_DRIVER_VERSION                       : 18.1.0.0920
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_CPU

Drivers known to at least partially support SPIR or SPIR-V include intel-compute-runtime, intel-opencl-runtime^AUR and pocl. The AMD driver formerly listed here, amdgpu-pro-opencl^AUR, is no longer available as a package; whether opencl-legacy-amdgpu-pro^AUR or opencl-amd^AUR takes its place has not been confirmed.

Other uses of SPIR/SPIR-V

SPIR-V covers not only compute kernels, but also graphics shaders. It is able to serve as a shader intermediate language for OpenGL and Vulkan as well as a compute intermediate language for OpenCL, Vulkan, and SYCL. It can also be decompiled as a way to convert a kernel or shader from one language to another.

The C++ for OpenCL language (not to be confused with the short-lived "OpenCL C++") combines C++17 with OpenCL C. A handful of OpenCL drivers support loading it directly with clBuildProgram() like one would do with OpenCL C, but the main expected usage is through clang converting it to SPIR-V first. It can be an option for programmers who want to use C++17 in writing OpenCL kernels but do not want to embrace the whole model of SYCL.

Development

SYCL requires a working C++ environment to be set up: C++11 for SYCL 1.2.1 and C++17 for SYCL 2020.

There are a few open source libraries available for making use of the parallel capabilities of SYCL from C++17:

ComputeCpp SDK: Collection of code examples, cmake integration for ComputeCpp
SYCL-DNN: Neural network performance primitives
SYCL-BLAS: Linear algebra performance primitives
VisionCpp: Computer Vision library
SYCL Parallel STL: GPU implementation of the C++17 parallel algorithms

CUDA

CUDA (Compute Unified Device Architecture) is NVIDIA's proprietary, closed-source parallel computing architecture and framework. It requires an NVIDIA GPU, and consists of several components:

Required:
- Proprietary NVIDIA kernel module
- CUDA "driver" and "runtime" libraries
Optional:
- Additional libraries: CUBLAS, CUFFT, CUSPARSE, etc.
- CUDA toolkit, including the nvcc compiler
- CUDA SDK, which contains many code samples and examples of CUDA and OpenCL programs

The kernel module and CUDA "driver" library are shipped in nvidia and opencl-nvidia. The "runtime" library and the rest of the CUDA toolkit are available in cuda. cuda-gdb needs ncurses5-compat-libs^AUR to be installed, see FS#46598.

Development

The cuda package installs all components in the directory /opt/cuda. The script in /etc/profile.d/cuda.sh sets the relevant environment variables so all build systems that support CUDA can find it.

To check whether the installation was successful and CUDA is up and running, you can compile the CUDA samples and run one of them, for example deviceQuery.
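
If the samples are not at hand, a quicker check is to ask the CUDA driver library directly. The sketch below uses Python's ctypes to call the driver API functions cuInit and cuDeviceGetCount. The library name libcuda.so.1 is an assumption about how the driver library is installed, and the function name cuda_device_count is made up for this example.

```python
import ctypes


def cuda_device_count():
    """Return the number of CUDA devices, or None if the driver is unusable."""
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")  # assumed soname of the driver library
    except OSError:
        return None  # driver library not installed
    if libcuda.cuInit(0) != 0:  # 0 is CUDA_SUCCESS
        return None  # driver present but failed to initialise
    count = ctypes.c_int(0)
    if libcuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return None
    return count.value


if __name__ == "__main__":
    print("CUDA devices:", cuda_device_count())
```

This only exercises the driver library, not the runtime library or nvcc, so it does not replace compiling a sample.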

Language bindings

Fortran: PGI CUDA Fortran Compiler
Haskell: The accelerate package lists available CUDA backends
Java: JCuda
Mathematica: CUDAlink
Mono/.NET: CUDAfy.NET, managedCuda
Perl: KappaCUDA, CUDA-Minimal
Python: python-pycuda
Ruby: rbcuda
Rust: Rust-CUDA. Unmaintained: cuda-sys (bindings), RustaCUDA (high-level wrapper).

ROCm

ROCm (Radeon Open Compute Platform) is AMD's answer to CUDA. Much like CUDA, it includes many pieces of software, from the driver (AMDGPU) to the compiler and runtime library. Some parts are specific to select AMD GPUs, while other parts are completely hardware-agnostic. See the ROCm for Arch Linux repository for more information.

ROCm includes a single software stack for implementing compute capabilities on the AMDGPU driver. On top of this stack, it implements HIP, OpenMP, and OpenCL. It also includes some parts built on top of HIP as well as an implementation of HIP using NVIDIA's CUDA stack.

ROCm-enabled models

ROCm should support AMD GPUs from the Polaris architecture (RX 500 series) and above. The official list of GPU models is very short and consists mostly of professional models. However, consumer GPUs and APUs of the equivalent generations are known to work. Other generations may work with unofficial or partial support. To support Polaris, you need to set the runtime variable ROC_ENABLE_PRE_VEGA=1.

It takes some time for newer AMD GPU architectures to be added to ROCm; see Wikipedia:ROCm#Consumer-grade GPUs for an up-to-date support matrix. See also #OpenCL on AMD/ATI GPU above.

HIP

The Heterogeneous Interface for Portability (HIP) is AMD's dedicated GPU programming environment for designing high performance kernels on GPU hardware. HIP is a C++ runtime API and programming language that allows developers to create portable applications on different platforms.

HIP's specification is managed in the rocm-systems repository, but HIP itself is hardware-agnostic.

HIP runtime library packages:

rocm-hip-runtime: The high-level runtime library, analogous to the OpenCL ICD loader.
hip-runtime-amd: Implementation of HIP for AMD GPUs.
hip-runtime-nvidia: The Heterogeneous Interface for NVIDIA GPUs in ROCm. This is really just a bunch of header files for the CUDA C++ compiler mostly consisting of #define renames.

The rocm-hip-sdk package includes the HIP SDK. All components are installed in the directory /opt/rocm. The script that sets the relevant environment variables is provided in /etc/profile.d/rocm.sh. Some software might also check for HIP_PATH, which can be manually set to the same value as ROCM_PATH.

miopen-hip is the HIP backend for AMD's open source deep learning library.

ROCm troubleshooting

First check if your GPU shows up in /opt/rocm/bin/rocminfo. If it does not, it might mean that ROCm does not support your GPU or it is built without support for your GPU.
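
The same check can be scripted. The sketch below extracts GPU target names from rocminfo output, assuming that rocminfo reports each GPU agent with a gfx target name (for example gfx1030) in a Name: field. The function names are made up for this example.

```python
import re
import subprocess


def gpu_targets(rocminfo_output):
    """Return the sorted set of gfx target names found in Name: fields."""
    targets = set()
    for line in rocminfo_output.splitlines():
        if line.strip().startswith("Name:"):
            # Assumption: GPU targets look like gfx900, gfx90a or gfx1030.
            targets.update(re.findall(r"\bgfx[0-9a-z]+\b", line))
    return sorted(targets)


def detected_gpu_targets(rocminfo="/opt/rocm/bin/rocminfo"):
    """Run rocminfo and return its gfx targets, or [] if it cannot be run."""
    try:
        result = subprocess.run([rocminfo], capture_output=True, text=True)
    except OSError:
        return []
    return gpu_targets(result.stdout)


if __name__ == "__main__":
    print(detected_gpu_targets() or "no GPU target reported by rocminfo")
```

An empty result corresponds to the case described above: either ROCm does not support the GPU or it was built without support for it.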

Also check the ROCm HIP environment variables for debugging, GPU isolation, etc.

OpenMP and OpenACC

OpenMP is better-known for its use in CPU multiprocessing, but it also supports some offloading for moving some work to GPGPUs. OpenACC is in a similar position: both are based on inserting pragmas into ordinary C/C++/Fortran code and having the compiler split out the marked part for offloading or multiprocessing.

AMD provides an implementation of OpenMP for ROCm-capable AMD GPUs. The openmp-extras^AUR package provides AOMP - an open source Clang/LLVM based compiler with added support for the OpenMP API on AMD GPUs.
Nvidia's nvhpc provides an OpenMP implementation with GPU offloading on their GPUs. [1]
GCC can generate Nvidia (nvptx) and AMD (gfx9, gfx10, gfx11) code for offloading OpenMP and OpenACC [2].
Clang/LLVM can generate Nvidia (nvptx) and AMD (amdgpu) code for offloading OpenMP and OpenACC [3] [4].

List of GPGPU accelerated software

Bitcoin
Blender – CUDA support for Nvidia GPUs and HIP support for AMD GPUs. More information in [5].
BOINC – some projects provide OpenCL and/or CUDA programs
cuda_memtest^AUR – a GPU memtest. Despite its name, it supports both CUDA and OpenCL.
darktable – OpenCL feature requires at least 1 GB RAM on GPU and Image support (check output of clinfo command).
DaVinci Resolve - a non-linear video editor. Can use both OpenCL and CUDA.
FFmpeg – more information in [6].
HandBrake
Folding@home – uses OpenCL and CUDA versions of the molecular simulation software GROMACS
GIMP – experimental – more information in [7].
Hashcat
imagemagick
LibreOffice Calc – more information in [8].
mpv - See mpv#Hardware video acceleration.
lc0^AUR - Used for searching the neural network (supports tensorflow, OpenCL, CUDA, and openblas)
opencl-benchmark - simple FP64/FP32/FP16/INT64/INT32/INT16/INT8 and memory/PCIe bandwidth benchmarking tool for OpenCL
opencv
ollama - LLM inference software
llama.cpp-cuda^AUR, llama.cpp-hip^AUR - Port of Facebook's LLaMA model in C/C++
pyrit^AUR
python-pytorch-cuda, python-pytorch-rocm
tensorflow-cuda, tensorflow-computecpp^AUR
xmrig - High Perf CryptoNote CPU and GPU (OpenCL, CUDA) miner

PyTorch on ROCm

To use PyTorch with ROCm install python-pytorch-rocm.

$ python -c 'import torch; print(torch.cuda.is_available())'

True

ROCm pretends to be CUDA, so this should return True. If it does not, either PyTorch was not compiled with support for your GPU or you have conflicting dependencies. You can check for the latter with ldd /usr/lib/libtorch.so: there should be no missing .so files and no multiple versions of the same .so.
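
A slightly more detailed check can tell a ROCm build from a CUDA build. The sketch below relies on torch.version.hip, which is expected to be a version string on ROCm builds and None otherwise; the function name describe_torch_backend is made up for this example.

```python
def describe_torch_backend():
    """Describe which GPU backend the installed PyTorch build can use."""
    try:
        import torch  # imported lazily so the check degrades gracefully
    except ImportError:
        return "PyTorch is not installed"
    # Assumption: ROCm builds set torch.version.hip, other builds leave it None.
    hip_version = getattr(torch.version, "hip", None)
    build = f"ROCm/HIP {hip_version}" if hip_version else "non-ROCm"
    if not torch.cuda.is_available():
        return f"{build} build, but no usable GPU was found"
    return f"{build} build, using {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(describe_torch_backend())
```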
