Dec 18, 2025 · 3 min read

GPU

A processor optimized for massively-parallel numeric work, typically used for graphics and data-parallel computation.

Definition

A GPU (graphics processing unit) is a processor designed to perform the same kind of computation over many data elements in parallel.

GPUs shine when a problem can be expressed as data-parallel work: apply a similar operation across a large array, image, tensor, or set of vectors.

  • Related: kernel (GPU work unit), driver, device memory, throughput
  • Often contrasted with: CPU
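The data-parallel shape of the work can be sketched in plain Python: one per-element function applied independently across a whole array. (This is an illustrative sketch, not GPU code; on a GPU each element would be handled by its own thread.)

```python
# A data-parallel operation: the same scalar function applied
# independently to every element. There is no dependence between
# elements, so all of them could run at once.

def scale_and_shift(x, a=2.0, b=1.0):
    """The per-element "kernel": reads one input, writes one output."""
    return a * x + b

data = [0.0, 1.0, 2.0, 3.0]
result = [scale_and_shift(x) for x in data]  # embarrassingly parallel
```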

How GPUs show up in systems

Even when a workload “uses the GPU”, there is typically still a program on the CPU orchestrating:

  • moving data to/from GPU memory
  • launching GPU kernels (units of GPU work)
  • scheduling, batching, and handling errors/timeouts
  • integrating results back into the rest of the system

So “GPU acceleration” is usually a split: CPU for control flow and coordination, GPU for high-throughput numeric kernels.
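That split can be sketched as a CPU-side control loop. The `device_*` functions below are hypothetical stand-ins simulated in plain Python, not a real GPU API; real code would go through a library such as CUDA, Metal, or Vulkan.

```python
# A sketch of the CPU-side orchestration pattern. Each device_* function
# is a simulated placeholder for a real driver/runtime call.

def device_upload(host_data):           # "copy data to GPU memory"
    return list(host_data)              # simulated device buffer

def device_kernel_square(dev_buf):      # "launch a GPU kernel"
    return [x * x for x in dev_buf]     # data-parallel on real hardware

def device_download(dev_buf):           # "copy results back to the host"
    return list(dev_buf)

def run_on_gpu(host_data):
    dev_in = device_upload(host_data)        # 1. move data to the device
    dev_out = device_kernel_square(dev_in)   # 2. launch the kernel
    return device_download(dev_out)          # 3. integrate results back
```

The CPU owns steps 1–3 (control flow, error handling, batching); only the kernel body is GPU work.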

What “branchy” means on a GPU

Branchy code on a GPU can be expensive for a different reason than on a CPU. GPUs are built to run the same instruction across many lanes/threads efficiently. When different lanes want to take different branches (some lanes take the if branch, others the else branch), the GPU may have to execute both paths and mask off the inactive lanes, reducing effective parallelism. This effect is often called branch divergence.

The practical rule is not “avoid branches” but “avoid data-dependent branches inside tight, massively-parallel kernels when possible.”
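The rewrite that avoids a data-dependent branch is often a "select": compute both candidate values and pick one arithmetically with a 0/1 mask. The sketch below contrasts the two forms in plain Python; on real hardware the branchless form maps to a single predicated instruction stream with no divergence.

```python
# Two ways to write a per-element conditional.

def branchy(xs):
    # Data-dependent branch: on a GPU, lanes that disagree on the
    # condition would serialize (both paths run, lanes masked off).
    return [x * 2 if x > 0 else x * -3 for x in xs]

def branchless(xs):
    # Branchless "select": compute both values, blend with a 0/1 mask.
    out = []
    for x in xs:
        mask = int(x > 0)                          # predicate as 0 or 1
        out.append(mask * (x * 2) + (1 - mask) * (x * -3))
    return out
```

Both produce identical results; only the control-flow shape differs.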

Compilation, runtimes, and GPU code

GPU workloads often involve additional compilation layers:

  • AOT compilation: ship precompiled GPU kernels (or ship code that can be compiled during install/build).
  • JIT compilation: compile kernels at runtime based on the specific GPU and shapes of data.
  • A runtime may handle compilation, caching, and dispatching under the hood. The runtime environment still matters (drivers, permissions, device availability, container setup).
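The caching a JIT runtime does can be sketched as a dictionary keyed on whatever the kernel is specialized for. Everything here is an illustrative assumption: real runtimes key on more (GPU architecture, compiler flags) and "compilation" is simulated by building a Python closure.

```python
# Sketch of the compile-once-then-reuse pattern in a JIT runtime:
# kernels are specialized per (operation, dtype, shape) and cached.

_kernel_cache = {}

def get_kernel(op_name, dtype, shape):
    key = (op_name, dtype, shape)
    if key not in _kernel_cache:
        # Stand-in for real compilation, specialized to the key.
        if op_name == "double":
            _kernel_cache[key] = lambda buf: [x * 2 for x in buf]
        else:
            raise ValueError(f"unknown op: {op_name}")
    return _kernel_cache[key]  # later lookups skip compilation
```

The first `get_kernel` call for a given key pays the compile cost; repeats are cache hits, which is why warm-up runs are common in JIT-heavy GPU stacks.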

Common failure modes

GPU-related issues often look like “the code is correct but something is off”, because the runtime environment is part of the stack:

  • missing or mismatched drivers
  • insufficient GPU memory or fragmentation
  • performance cliffs due to data transfer overhead
  • differences between dev/prod hardware

CPU vs GPU (the boundary)

Compared to the CPU, the GPU is less about flexible branching and more about running a large number of similar operations efficiently. If the workload is small, highly branchy, or dominated by I/O, a GPU may not help: the fixed overhead of transfers and kernel launches can dominate the actual compute.
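"Overhead can dominate" has a simple break-even model: fixed per-call cost plus per-element cost on each side. All numbers below are illustrative assumptions, not measurements.

```python
# Toy cost model: the GPU pays a fixed overhead per call (launch +
# transfer setup) but is much cheaper per element; the CPU has no
# fixed cost but is slower per element. Times in microseconds.

GPU_OVERHEAD_US = 50.0     # assumed fixed cost per GPU call
GPU_PER_ELEM_US = 0.001    # assumed per-element cost on the GPU
CPU_PER_ELEM_US = 0.05     # assumed per-element cost on the CPU

def gpu_time(n):
    return GPU_OVERHEAD_US + n * GPU_PER_ELEM_US

def cpu_time(n):
    return n * CPU_PER_ELEM_US

# Below this size the CPU wins; above it the GPU wins.
break_even = GPU_OVERHEAD_US / (CPU_PER_ELEM_US - GPU_PER_ELEM_US)
```

With these assumed numbers the break-even point is around a thousand elements; real numbers vary wildly by hardware and workload, but the shape of the trade-off is the same.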

Mini-scenario

An image-processing service might decode images on the CPU, then batch pixel transforms on the GPU, then encode results back on the CPU. If requests arrive one-by-one with tiny images, the GPU overhead can dominate. If requests are batched or images are large, the GPU’s throughput advantage becomes visible.
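The batching effect in this scenario is just amortization of a fixed per-launch cost. The numbers are illustrative assumptions, not measurements.

```python
# Why batching helps: the fixed per-launch overhead is spread
# across the whole batch, so per-image cost falls as batches grow.
# Times in microseconds.

LAUNCH_OVERHEAD_US = 200.0   # assumed fixed cost per GPU launch
PER_IMAGE_US = 5.0           # assumed GPU time per image

def per_image_cost(batch_size):
    return (LAUNCH_OVERHEAD_US + batch_size * PER_IMAGE_US) / batch_size

# per_image_cost(1) == 205.0, per_image_cost(100) == 7.0
```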