CPU vs GPU

A CPU (Central Processing Unit) is built for general-purpose work: a small number of powerful cores that handle whatever the operating system, your apps, and your users throw at them. A GPU (Graphics Processing Unit) is built for parallel work: thousands of small cores that all do the same operation on different data at once. Picking between them is mostly about whether your workload is "many different things" or "the same thing, lots of times."

Last reviewed on 2026-04-27.

Quick Comparison

  • Core count. CPU: a handful to a few dozen powerful cores. GPU: hundreds to thousands of simpler cores.
  • Per-core power. CPU: high, with sophisticated branch prediction, large caches, and high clocks. GPU: lower per core, but enormous total throughput.
  • Optimised for. CPU: latency on a single task, and many small, varied tasks. GPU: throughput on one operation replicated across data.
  • Programming model. CPU: general sequential code with threads (C, Java, Python, etc.). GPU: parallel kernels (CUDA, Metal, OpenCL, Vulkan, DirectX, ROCm).
  • Memory. CPU: system RAM, large but slower relative to GPU memory. GPU: dedicated VRAM with very high bandwidth.
  • Common workloads. CPU: operating system, web browsing, compilation, databases, business logic. GPU: 3D graphics, video encoding/decoding, machine learning, scientific simulation.
  • Where it lives. CPU: in every computer; even integrated GPUs need a CPU to drive them. GPU: integrated into the CPU package, or as a separate (discrete) card.

Key Differences

1. Few smart cores vs many simple cores

A modern CPU has somewhere between four and sixty-four cores, each capable of running independent threads, doing complex out-of-order execution, and predicting branches deep into the future. Each core is expensive in silicon, but the result is excellent latency — a single instruction stream finishes quickly.

A modern GPU has thousands of cores, and its threads execute in groups called "warps" (NVIDIA) or "wavefronts" (AMD); each group typically executes the same instruction in lock-step on different data. Per-core, they're simpler. In aggregate, the GPU's throughput is enormous, but only when the workload looks the same across many data items.
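
To make lock-step execution concrete, here is a toy pure-Python simulation of a single 32-lane warp: one shared instruction stream, applied to every lane's private data before the next instruction runs. (The 32-lane width matches NVIDIA's warp size; this is a mental model, not how the hardware is implemented.)

    # Toy SIMT model: one "warp" of 32 lanes, each holding its own data,
    # all applying the same instruction stream in lock-step.
    WARP_SIZE = 32

    def run_warp(instructions, lanes):
        """Apply each instruction to every lane before moving to the next."""
        for op in instructions:             # one shared instruction stream...
            lanes = [op(x) for x in lanes]  # ...applied to all lanes at once
        return lanes

    # Same three instructions, 32 different data items.
    program = [lambda x: x * 2, lambda x: x + 1, lambda x: x * x]
    data = list(range(WARP_SIZE))
    print(run_warp(program, data)[:4])      # [1, 9, 25, 49]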

2. Latency vs throughput

This is the cleanest one-line summary. The CPU is a latency machine: it tries to finish each individual task as quickly as possible, even at the cost of leaving execution units idle while it waits for memory. Hence enormous caches, branch predictors, and out-of-order execution.

The GPU is a throughput machine: it doesn't try to make any single task fast. It tries to keep as many threads in flight as possible so that, while one warp is waiting on memory, another is computing. Across the whole batch, the work-per-second is huge.
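
One way to feel the split is to time the same large, uniform operation on both devices. A minimal sketch, assuming PyTorch is installed and a CUDA GPU is present (the matrix size is arbitrary):

    import time
    import torch

    n = 4096
    a = torch.randn(n, n)
    b = torch.randn(n, n)

    # CPU: roughly 2 * n^3, about 137 billion floating-point operations.
    t0 = time.perf_counter()
    a @ b
    cpu_s = time.perf_counter() - t0

    # GPU: the same work, spread across thousands of threads.
    ag, bg = a.cuda(), b.cuda()
    ag @ bg
    torch.cuda.synchronize()   # warm-up, so timing excludes CUDA start-up
    t0 = time.perf_counter()
    ag @ bg
    torch.cuda.synchronize()   # launches are asynchronous; wait for the result
    gpu_s = time.perf_counter() - t0

    print(f"CPU: {cpu_s:.3f}s   GPU: {gpu_s:.3f}s")

On this shape of work the GPU typically wins by a wide margin; run the same comparison on a tiny matrix and the result often flips, because launch overhead dominates.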

3. The kinds of code that suit each

CPU-friendly code looks like ordinary programs: lots of branches, lots of small data structures, lots of "do this, then check that, then maybe do something different." Compilers, web servers, browsers, databases, business logic — all CPU territory.

GPU-friendly code looks like the same operation applied to a wide array: shading every pixel of a frame, multiplying two large matrices, encoding every block of a video, running the same neural-network layer over a batch of inputs. The more uniform the work and the bigger the data, the better it suits a GPU.
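
The same contrast shows up inside a single program. A minimal NumPy sketch (the array size and the 0.5 threshold are arbitrary): the first function is branchy, item-at-a-time work in the CPU's comfort zone; the second expresses the identical result as one uniform operation over the whole array, the shape that GPU libraries accelerate.

    import numpy as np

    xs = np.random.rand(1_000_000)

    # CPU-shaped: per-item control flow, one decision at a time.
    def branchy(xs):
        out = []
        for x in xs:
            out.append(x * 2 if x > 0.5 else x / 2)
        return np.array(out)

    # GPU-shaped: the same decision as one uniform whole-array operation.
    def uniform(xs):
        return np.where(xs > 0.5, xs * 2, xs / 2)

    assert np.allclose(branchy(xs), uniform(xs))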

4. Memory architecture

CPUs use the system's main memory (DDR RAM). It's large, comparatively cheap per gigabyte, and shared across the whole computer, but its bandwidth is modest by GPU standards.

Discrete GPUs ship their own memory (GDDR or HBM) on the card. Capacity is smaller (a GPU might have 12 GB or 24 GB while a system has 32 GB or 64 GB of RAM), but bandwidth is many times higher, often well over 500 GB/s. That bandwidth is what keeps thousands of cores fed with data.

Integrated GPUs and Apple Silicon use a unified-memory architecture where CPU and GPU share the same pool. That sidesteps the data-copy cost between system RAM and VRAM, at the cost of the two processors contending for the same memory bandwidth.
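
On a discrete card, that copy is explicit in code. A minimal PyTorch sketch, assuming a CUDA device (the tensor size is arbitrary):

    import torch

    x = torch.randn(256, 1024, 1024)   # ~1 GiB of float32 in system RAM
    xg = x.to("cuda")                  # explicit copy across PCIe into VRAM
    y = (xg * 2).sum()                 # arithmetic runs against VRAM bandwidth
    result = y.item()                  # tiny copy of the result back to the CPU

    # Every .to()/.item() crosses the bus, so keeping tensors resident on the
    # GPU between steps is a standard optimisation. On unified memory there is
    # a single pool and no equivalent copy.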

5. Programming differences

You write CPU code in any general-purpose language — C, C++, Java, Python, JavaScript, Go, Rust. The OS schedules threads onto cores. Hardware details are mostly hidden.

You write GPU code in a parallel programming model: CUDA (NVIDIA), Metal (Apple), Vulkan (cross-platform graphics and compute), DirectX 12 (Windows), OpenCL (cross-vendor compute), ROCm (AMD), or higher-level libraries like PyTorch and TensorFlow that hide the GPU calls. You think in terms of "kernels" launched across thousands of threads, with explicit data movement to and from VRAM.
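
To make "kernel" concrete, here is a minimal sketch using Numba's CUDA support in Python (assuming numba and a CUDA toolkit are installed; the array size and block size are arbitrary). The same structure appears in raw CUDA C: a function compiled for the device, launched across a grid of threads, with explicit copies to and from VRAM.

    from numba import cuda
    import numpy as np

    @cuda.jit
    def scale(out, xs, factor):
        i = cuda.grid(1)            # this thread's global index
        if i < xs.size:             # guard: the grid may overshoot the array
            out[i] = xs[i] * factor

    xs = np.arange(1_000_000, dtype=np.float32)
    d_xs = cuda.to_device(xs)                    # host RAM -> VRAM
    d_out = cuda.device_array_like(d_xs)

    threads = 256
    blocks = (xs.size + threads - 1) // threads  # enough blocks to cover xs
    scale[blocks, threads](d_out, d_xs, 2.0)     # launch ~1M threads at once

    out = d_out.copy_to_host()                   # VRAM -> host RAM

Each thread handles exactly one element; the parallelism comes from the launch configuration, not from a loop in the kernel.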

6. Power, cost, and physical shape

A high-end CPU draws tens of watts to a couple of hundred and fits in a roughly square package on the motherboard. A high-end discrete GPU can draw 300–600 watts, occupies multiple slots, and often demands its own auxiliary power connectors. That's why builds for ML or 3D work tend to be physically large and require careful cooling and PSU planning.
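
The PSU planning itself is back-of-envelope arithmetic. A sketch with illustrative wattages only (modern GPUs can also spike briefly above their rated draw):

    # Rough PSU sizing for a GPU-heavy build (illustrative numbers only).
    cpu_w, gpu_w, rest_w = 200, 450, 100   # CPU, discrete GPU, board/drives/fans
    headroom = 1.3                         # ~30% margin for spikes and ageing
    psu_w = (cpu_w + gpu_w + rest_w) * headroom
    print(f"Suggested PSU: {psu_w:.0f} W")  # 975 W, so round up to 1000 W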

Per dollar of throughput on parallel workloads, a GPU is usually the cheapest option. Per dollar of single-threaded responsiveness on the kinds of code you run all day, a CPU is unbeatable. Most computers need both.

Worked Example: Training a Small Image Model

Suppose you're training a small image classifier on a few thousand photos.

  • The CPU handles the program itself — running Python, parsing arguments, loading files from disk, decoding JPEGs, splitting into training/validation, and orchestrating the whole pipeline.
  • The GPU handles the actual training step: matrix multiplications inside each layer, the activation functions, the backward pass. Those operations are the same arithmetic done across millions of weights and many images at once — exactly the GPU's strength.
  • Without a GPU, the same training would still complete on the CPU; it would just take many times longer, because the CPU does the matrix math one operation (or a handful, via SIMD) at a time rather than thousands in parallel.
  • Without a CPU, the GPU does nothing useful: there's no operating system, no file I/O, no orchestration.

That's the typical pattern: the CPU runs the program; the GPU runs the hot inner loop.
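
Here is that pattern as a minimal PyTorch sketch (the model, the fake data, and the hyperparameters are placeholders, not a real training recipe):

    import torch
    from torch import nn

    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 10)).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Stand-in for a real DataLoader: ten batches of 64 fake 64x64 RGB images.
    batches = [(torch.randn(64, 3, 64, 64), torch.randint(0, 10, (64,)))
               for _ in range(10)]

    for images, labels in batches:                # CPU: orchestration, loading
        images, labels = images.to(device), labels.to(device)  # RAM -> VRAM
        loss = loss_fn(model(images), labels)     # GPU: forward pass
        opt.zero_grad()
        loss.backward()                           # GPU: backward pass
        opt.step()                                # GPU: weight update

Everything outside the tensor math, including the loop control itself, runs on the CPU; only the operations on device tensors land on the GPU.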

Common Misconceptions

  • "More cores is always faster." Only for parallelisable work. A workload that can't be split (a single-threaded compiler pass, a single web request) doesn't benefit from extra cores. That's why CPU clock speed and IPC still matter.
  • "GPUs are just for graphics." Modern GPUs spend much of their silicon on general-purpose compute. Machine-learning training and inference, scientific simulation, and crypto mining all use the same parallel architecture as 3D rendering.
  • "You can replace a CPU with a GPU." No — even a GPU-heavy workload like training a neural network needs a CPU to run the program around it. The GPU is an accelerator, not a substitute.
  • "Integrated GPUs are useless." Integrated GPUs have improved dramatically and now handle 1080p gaming, light video editing, and most everyday GPU work. They share system memory, which limits the absolute top end but is fine for many use cases.

Decision Rules

Use these rules when planning what hardware a task needs (a toy code summary follows the list):

  • Is the work mostly varied, branchy, sequential code? The CPU sets the pace; prioritise fast cores.
  • Is the work the same operation applied to a large batch of data? GPU will likely dominate.
  • Is memory size more important than memory speed? Lean CPU + system RAM.
  • Is memory bandwidth the bottleneck? GPU's VRAM is the right tool.
  • Does the task already have a mature CUDA/Metal/Vulkan path? Almost always faster on a GPU.
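
The same rules as a toy Python sketch (the boolean inputs are deliberate simplifications; real capacity planning needs measurement, not a lookup):

    def likely_winner(uniform_batch: bool, branchy: bool,
                      needs_capacity: bool, needs_bandwidth: bool,
                      has_gpu_path: bool) -> str:
        """Toy encoding of the decision rules above."""
        if uniform_batch and (has_gpu_path or needs_bandwidth):
            return "GPU"    # same-op-over-big-data with a mature kernel path
        if branchy or needs_capacity:
            return "CPU"    # varied control flow, or RAM-sized working sets
        return "CPU"        # default: general-purpose work stays on the CPU

    print(likely_winner(uniform_batch=True, branchy=False,
                        needs_capacity=False, needs_bandwidth=True,
                        has_gpu_path=True))   # -> GPU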