Imagine an automotive assembly line running at full speed. Station 4 is installing the engine. But suddenly, the engine blocks aren't ready — the casting machine upstream is still machining them. Does Station 4 keep working? It can't. The entire line stalls, waiting for that one part.

In a CPU pipeline, the same problem emerges constantly. An instruction entering the Execute stage might need the result of the instruction right before it, which has not yet been written back to the register file. The instruction can't proceed. The pipeline stalls. These blocking conditions are called hazards, and handling them is one of the central engineering challenges of CPU design.

Understanding hazards isn't just academic — they determine the difference between a CPU with CPI 1.0 and one with CPI 1.3. They're also responsible for two of the most significant security vulnerabilities in computing history: Meltdown and Spectre.

Why Hazards Happen: The Root Cause

The pipeline's entire value comes from overlapping instructions. But instructions aren't independent — they share resources, depend on each other's results, and change the program's flow. Whenever the overlap assumption breaks down, you have a hazard.

There are exactly three categories:

Structural Hazards

A structural hazard occurs when two instructions in different pipeline stages need the same hardware resource simultaneously.

Classic example: Early computers had a single memory with one port. The IF stage (fetching instructions) and the MEM stage (loading/storing data) both need memory access — at the same time. Only one can win.

Solution: Separate instruction and data caches. Modern CPUs have distinct L1 Instruction Cache and L1 Data Cache — two independent memory systems that can be accessed simultaneously. This is why the Harvard architecture (separate instruction and data paths) influenced modern CPU design even though the von Neumann model (unified memory) dominates at the system level.

Another common structural hazard: a single write port on the register file when two instructions both want to write back simultaneously. Solution: add a second write port (costs silicon area but eliminates the stall).
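The single-port memory conflict can be made concrete with a small sketch. It assumes an ideal 5-stage pipeline in which one instruction enters per cycle; the function names and the sample program are hypothetical and exist only for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_at(instr_index, cycle):
    """Stage occupied by instruction `instr_index` (0-based) in `cycle` (1-based), or None."""
    pos = cycle - 1 - instr_index  # one new instruction enters the pipeline per cycle
    return STAGES[pos] if 0 <= pos < len(STAGES) else None

def memory_port_conflicts(uses_mem):
    """List (cycle, fetching_instr, mem_instr) for every cycle in which an
    instruction fetch and a data access would collide on a single-ported memory."""
    conflicts = []
    n = len(uses_mem)
    for cycle in range(1, n + len(STAGES)):
        fetching = [i for i in range(n) if stage_at(i, cycle) == "IF"]
        in_mem = [i for i in range(n)
                  if stage_at(i, cycle) == "MEM" and uses_mem[i]]
        if fetching and in_mem:
            conflicts.append((cycle, fetching[0], in_mem[0]))
    return conflicts

# Hypothetical program: instruction 0 is a load, the other four are ALU operations.
program_uses_mem = [True, False, False, False, False]
print(memory_port_conflicts(program_uses_mem))  # [(4, 3, 0)]
```

In cycle 4 the load (instruction 0) is in MEM while instruction 3 is being fetched, so a single-ported memory would force one of them to wait. With separate instruction and data caches the two accesses go to different structures and the conflict disappears.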

Data Hazards

Data hazards are the most common type. They occur when an instruction depends on the result of a previous instruction that hasn't finished yet.

RAW — Read After Write (True Dependency)

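For example, consider ADD R1, R2, R3 followed immediately by SUB R4, R1, R5. Instruction 2 (the SUB) needs R1, which instruction 1 (the ADD) won't write until WB (cycle 5). But instruction 2 tries to read R1 in ID (cycle 3). The value isn't there yet: the read is two cycles too early.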

WAR — Write After Read (Anti-Dependency)

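Instruction 2 writes a register that instruction 1 still needs to read. If the write lands first, instruction 1 reads the new value instead of the old one. This cannot happen in a classic 5-stage pipeline, where registers are read in ID long before a later instruction reaches WB, but it is critical in out-of-order execution.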

WAW — Write After Write (Output Dependency)

Two instructions both write the same register. The second write must happen after the first to preserve correct behavior. Again, more relevant in out-of-order execution.
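The three dependency types can be told apart purely by comparing register names. The following is a minimal sketch of that comparison; the `Instr` tuple and the sample instructions are hypothetical and are not taken from any real instruction set tooling.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read

def classify(first, second):
    """Return the set of dependencies that `second` has on the earlier `first`."""
    hazards = set()
    if first.dest is not None and first.dest in second.srcs:
        hazards.add("RAW")    # second reads what first writes
    if second.dest is not None and second.dest in first.srcs:
        hazards.add("WAR")    # second overwrites what first still reads
    if first.dest is not None and first.dest == second.dest:
        hazards.add("WAW")    # both write the same register
    return hazards

add_i = Instr("ADD R1, R2, R3", "R1", ("R2", "R3"))
sub_i = Instr("SUB R4, R1, R5", "R4", ("R1", "R5"))
or_i = Instr("OR  R2, R6, R7", "R2", ("R6", "R7"))
and_i = Instr("AND R1, R8, R9", "R1", ("R8", "R9"))

print(classify(add_i, sub_i))  # {'RAW'}
print(classify(add_i, or_i))   # {'WAR'}
print(classify(add_i, and_i))  # {'WAW'}
```

Only RAW reflects a real flow of data. WAR and WAW arise from reusing a register name, which is why renaming registers can remove them.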

Solution 1: Stalling (Inserting Bubbles)

The simplest solution: freeze the pipeline. Insert NOP (no-operation) instructions — called bubbles — to delay instruction 2 until instruction 1's result is available.

A stall injects one or more empty pipeline cycles.
The hardware detects the hazard with a hazard detection unit, which compares the source registers of the instruction in ID with the destination registers of the instructions further down the pipeline.
Cost: each bubble is 1 wasted clock cycle, so a 2-cycle stall costs 2 cycles of throughput.
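As a rough sketch of how the bubble count comes about, the function below tracks when each pending register write reaches WB and delays a dependent instruction's ID stage until then. It assumes a 5-stage pipeline without forwarding whose register file is written in the first half of a cycle and read in the second half, so a read in the same cycle as the write sees the new value. The function name and program encoding are hypothetical.

```python
def count_bubbles(program):
    """program: list of (dest, srcs) in program order. Returns the total bubbles inserted."""
    wb_cycle = {}     # register -> cycle in which its pending write reaches WB
    id_cycle = 1      # chosen so the first instruction's ID stage lands in cycle 2
    bubbles = 0
    for dest, srcs in program:
        earliest = id_cycle + 1                       # ID right behind the previous instruction
        needed = max([wb_cycle.get(r, 0) for r in srcs], default=0)
        actual = max(earliest, needed)                # wait until every source has been written
        bubbles += actual - earliest
        id_cycle = actual
        if dest is not None:
            wb_cycle[dest] = actual + 3               # ID -> EX -> MEM -> WB
    return bubbles

back_to_back = [("R1", ("R2", "R3")),    # ADD R1, R2, R3
                ("R4", ("R1", "R5"))]    # SUB R4, R1, R5
one_apart = [("R1", ("R2", "R3")),
             ("R6", ("R7", "R8")),       # independent instruction in between
             ("R4", ("R1", "R5"))]

print(count_bubbles(back_to_back))  # 2
print(count_bubbles(one_apart))     # 1
```

Each independent instruction placed between producer and consumer removes one bubble, which is the effect instruction scheduling aims for.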

Solution 2: Forwarding (Bypassing)

The much better solution. Instead of waiting for the result to be written to the register file and then read back, forward it directly from where it's computed to where it's needed.

The ALU result is available at the end of EX stage. A forwarding path routes it directly to the input of the next instruction's EX stage. No waiting. No wasted cycles.

Forwarding paths needed:

EX/MEM → EX: Forward from end of Execute to start of Execute (1-cycle gap)
MEM/WB → EX: Forward from end of Memory to start of Execute (2-cycle gap)

Load-use hazard — the one case forwarding can't fully fix: A load instruction doesn't have its data until the end of MEM stage. If the very next instruction needs that data in EX, even forwarding arrives one cycle late. One bubble is still required. This is the load-use hazard, and compilers try hard to reorder instructions to avoid it.
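The difference between an ALU result and a load result can be sketched with the same kind of bookkeeping. This is a simplified model that assumes full forwarding into EX: an ALU result can be forwarded at the end of its EX cycle and a load result at the end of its MEM cycle. The names and program encoding are hypothetical.

```python
def count_bubbles_with_forwarding(program):
    """program: list of (op, dest, srcs), where op is "alu" or "load"."""
    available_after = {}   # register -> cycle at whose end its value can be forwarded
    ex_cycle = 2           # chosen so the first instruction's EX stage lands in cycle 3
    bubbles = 0
    for op, dest, srcs in program:
        earliest = ex_cycle + 1
        # An operand is usable in the cycle after it becomes available.
        needed = max([available_after.get(r, 0) + 1 for r in srcs], default=0)
        actual = max(earliest, needed)
        bubbles += actual - earliest
        ex_cycle = actual
        if dest is not None:
            # ALU: ready at the end of EX. Load: ready one cycle later, at the end of MEM.
            available_after[dest] = actual + (1 if op == "load" else 0)
    return bubbles

alu_pair = [("alu", "R1", ("R2", "R3")), ("alu", "R4", ("R1", "R5"))]
load_use = [("load", "R1", ("R2",)), ("alu", "R4", ("R1", "R5"))]
load_gap = [("load", "R1", ("R2",)), ("alu", "R6", ("R7", "R8")), ("alu", "R4", ("R1", "R5"))]

print(count_bubbles_with_forwarding(alu_pair))  # 0
print(count_bubbles_with_forwarding(load_use))  # 1
print(count_bubbles_with_forwarding(load_gap))  # 0
```

The third program shows what a compiler is trying to achieve: one independent instruction placed after the load hides the bubble completely.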

Hazard Type	Cause	Detection	Solution	Performance Impact
Structural	Two instructions need same hardware	Static design analysis	Duplicate resources (separate I/D caches)	None (design-time fix)
RAW	Read before write completes	Compare pipeline register IDs	Forwarding / 1-2 cycle stall	0–2 cycles per hazard
WAR	Write before dependent read	Out-of-order scheduling	Register renaming	Minimal in-order; significant OOO
WAW	Two writes to same register	Out-of-order scheduling	Register renaming	Minimal in-order; significant OOO
Load-Use	Load result needed immediately	Detect load in EX stage	1 bubble + forwarding	1 cycle per occurrence
Control	Branch changes PC	Branch instruction in ID/EX	Branch prediction	1–20 cycles on misprediction

Control Hazards

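Control hazards arise from branch and jump instructions. Until a branch is resolved, the CPU does not know which instruction comes next, yet it must keep fetching. The instructions it fetches after the branch might be inside an if block that won't execute, in which case the CPU has already fetched and started decoding the wrong instructions.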

The Problem

A conditional branch like BEQ R1, R2, target doesn't resolve until the EX stage (when the comparison happens). By that time, two more instructions have already entered the pipeline — instructions that might be from the wrong path.

If those instructions are allowed to complete, the program produces incorrect results. The CPU must flush them — discard their work — and start fetching from the correct address.

Flush cost = number of pipeline stages that come before the stage in which the branch resolves

5-stage pipeline: 2 wasted cycles per mispredicted branch
Modern 19-stage pipeline: up to 15–18 wasted cycles per misprediction
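The effect of the flush penalty on overall performance can be estimated with a simple model: each misprediction adds its penalty to the base CPI, weighted by how often it happens. The branch fraction and misprediction rate below are hypothetical numbers chosen only to show the arithmetic, and the 17-cycle penalty is just one value from the deep-pipeline range quoted above.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average cycles per instruction once misprediction flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty

# Assumed workload: 20% of instructions are branches, 5% of those are mispredicted.
shallow = effective_cpi(1.0, 0.20, 0.05, flush_penalty=2)    # 5-stage pipeline
deep = effective_cpi(1.0, 0.20, 0.05, flush_penalty=17)      # deep pipeline

print(f"{shallow:.2f}")  # 1.02
print(f"{deep:.2f}")     # 1.17
```

The same misprediction rate hurts a deep pipeline far more than a shallow one, which is why prediction accuracy matters so much in modern designs.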

Solution: Branch Prediction

Rather than stalling, the CPU guesses which path the branch will take and continues fetching from that path speculatively.

Static Prediction:

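Always not-taken: assume branches fall through; roughly 40–60% accurate depending on the workload, since loop branches are usually taken
Predict backward taken, forward not-taken: better for loops, because a backward branch is usually a loop back-edge; ~65% accurate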

Dynamic Prediction (the real solution):

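1-bit predictor: Remember the last outcome (taken/not-taken) and use it as the prediction. Works well inside loops but mispredicts twice per loop execution: once on the final iteration (the exit) and once more on the first iteration the next time the loop runs.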

2-bit saturating counter: Four states: Strongly Taken, Weakly Taken, Weakly Not-Taken, Strongly Not-Taken. From a strong state, it takes two consecutive mispredictions to flip the prediction. A loop exit therefore costs one misprediction instead of two, because the counter still predicts taken when the loop is entered again.
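The difference between the two predictors shows up clearly on a loop that runs several times. The sketch below counts mispredictions for a single branch; the loop shape (9 taken iterations followed by one exit, executed 3 times) and the initial predictor states are assumptions made for the example.

```python
def one_bit_mispredictions(outcomes, last_taken=True):
    """Predict that the branch repeats its most recent outcome."""
    misses = 0
    for taken in outcomes:
        if last_taken != taken:
            misses += 1
        last_taken = taken
    return misses

def two_bit_mispredictions(outcomes, counter=3):
    """Counter states: 0 strongly not-taken, 1 weakly not-taken,
    2 weakly taken, 3 strongly taken. Predict taken when counter >= 2."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        # Saturating update: move one step toward the actual outcome.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses

# Loop branch: taken 9 times, then not taken once (exit); the loop runs 3 times.
loop_outcomes = ([True] * 9 + [False]) * 3

print(one_bit_mispredictions(loop_outcomes))  # 5
print(two_bit_mispredictions(loop_outcomes))  # 3
```

The 1-bit predictor pays at every exit and again at every re-entry after the first run, while the 2-bit counter only drops to Weakly Taken at the exit and so pays once per run.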

Branch History Table (BHT): Indexed by lower bits of branch PC. Each entry holds a 2-bit counter. Thousands of entries in hardware.
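A branch history table can be sketched as an array of 2-bit counters indexed by PC bits. The version below assumes 4-byte aligned instructions (so the two lowest PC bits are dropped), a 1024-entry table, and counters that start at Weakly Not-Taken; all three are illustrative choices rather than properties of any particular CPU.

```python
class BranchHistoryTable:
    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.counters = [1] * (1 << index_bits)   # 1 = weakly not-taken (assumed start state)

    def _index(self, pc):
        return (pc >> 2) & self.mask   # drop alignment bits, keep the low index bits

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2   # True means "predict taken"

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)

bht = BranchHistoryTable()
bht.update(0x1000, taken=True)
bht.update(0x1000, taken=True)

print(bht.predict(0x1000))  # True
print(bht.predict(0x1004))  # False
print(bht.predict(0x2000))  # True
```

The last line shows aliasing: 0x1000 and 0x2000 share the same low index bits, so the second branch inherits the first branch's history. Untagged tables accept this interference in exchange for simplicity.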

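Tournament predictors: Use multiple predictors and a meta-predictor to choose between them; the Alpha 21264 is the classic example. A more recent design is TAGE (TAgged GEometric history length), proposed by André Seznec and Pierre Michaud, which combines several tagged tables indexed with geometrically increasing history lengths. Predictors in this family achieve 95–99% accuracy in modern CPUs.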

Out-of-Order Execution (OOE)

Modern CPUs don't just solve hazards defensively — they reorganize instruction execution to avoid them proactively.

In out-of-order execution (OOE), instructions enter the pipeline in program order but execute in whatever order their inputs are ready:

Instructions are fetched and decoded in order
Placed into a Reservation Station to wait for their operands, and allocated an entry in the Reorder Buffer (ROB), which records program order
An instruction executes as soon as its operands are available — regardless of program order
Results are committed to the register file in program order (to maintain correctness)

This means a long-latency division instruction doesn't block a sequence of independent additions that follow it. OOE effectively hides latency by finding independent work to do.
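The latency-hiding effect can be illustrated with a deliberately simplified timing model. It assumes one instruction issued per cycle, unlimited execution units, a hypothetical 20-cycle divide, and no WAR or WAW conflicts; in-order commit is not modeled. The blocking in-order variant simply waits for the previous instruction to finish.

```python
def finish_cycles(program, out_of_order):
    """program: list of (name, dest, srcs, latency). Returns {name: finish cycle}."""
    ready = {}          # register -> cycle in which its value becomes available
    prev_finish = 0
    finish = {}
    for slot, (name, dest, srcs, latency) in enumerate(program):
        operands_ready = max([ready.get(r, 0) for r in srcs], default=0)
        if out_of_order:
            start = max(operands_ready, slot)          # start as soon as the inputs exist
        else:
            start = max(operands_ready, prev_finish)   # wait behind the previous instruction
        prev_finish = start + latency
        ready[dest] = prev_finish
        finish[name] = prev_finish
    return finish

program = [
    ("div",  "R1", ("R2", "R3"), 20),   # long-latency divide (latency is an assumption)
    ("add1", "R4", ("R5", "R6"), 1),    # independent of the divide
    ("add2", "R7", ("R4", "R8"), 1),    # depends on add1 only
    ("add3", "R9", ("R1", "R7"), 1),    # needs the divide result
]

print(finish_cycles(program, out_of_order=False))
# {'div': 20, 'add1': 21, 'add2': 22, 'add3': 23}
print(finish_cycles(program, out_of_order=True))
# {'div': 20, 'add1': 2, 'add2': 3, 'add3': 21}
```

In the out-of-order run the two independent additions finish while the divide is still executing, and only the instruction that truly needs the quotient waits for it.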

Intel's ROB (Reorder Buffer) on 13th Gen performance cores: 512 entries, meaning up to 512 micro-operations can be "in flight" simultaneously, out of order.

Speculative Execution and Security

When the CPU predicts a branch and fetches speculatively, it's executing instructions that might not even belong in the program flow. This is called speculative execution.

In January 2018, researchers publicly disclosed that speculative execution can be exploited:

Spectre (CVE-2017-5753/5715): Tricks the CPU into speculatively executing instructions across security boundaries. The speculative access leaves traces in the cache (a side channel), which an attacker can measure to infer secret data. Affects virtually all modern processors.
Meltdown (CVE-2017-5754): Exploits the window between a speculative memory access and the permission check. An unprivileged program can speculatively read kernel memory before the CPU realizes the access was illegal. Primarily affected Intel processors.

Both vulnerabilities stem from the same root cause: speculative work leaves traces in the cache even when its architectural results are later discarded. The cache state is observable through timing, leaking information.
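The root cause can be illustrated with a toy model. This is not an exploit and does not touch real hardware: the "cache" is a Python set, the "timing measurement" is a membership test, and every name in it is hypothetical. It only shows why rolling back registers is not enough when cache contents survive the rollback.

```python
class ToyCore:
    """Toy model: registers are rolled back after a misprediction, the cache is not."""

    def __init__(self, secret_byte):
        self._secret = secret_byte     # stands in for data behind a security boundary
        self.registers = {}
        self.cached_lines = set()      # stands in for the data cache

    def run_wrong_path(self):
        checkpoint = dict(self.registers)
        # Speculative, wrong-path work: read the secret, then touch a cache
        # line whose index depends on it.
        self.registers["tmp"] = self._secret
        self.cached_lines.add(self.registers["tmp"])
        # Misprediction detected: squash the instructions and restore registers.
        self.registers = checkpoint
        # Nothing here evicts the line that was loaded during speculation.

    def probe(self, line):
        """Stand-in for timing a load: True means fast, i.e. the line is cached."""
        return line in self.cached_lines

core = ToyCore(secret_byte=42)
core.run_wrong_path()
recovered = [line for line in range(256) if core.probe(line)]

print(core.registers)  # {}
print(recovered)       # [42]
```

The architectural state looks untouched after the rollback, yet probing all 256 candidate lines reveals which one was loaded, and with it the secret value.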

Mitigations include microcode updates, OS patches (KPTI, Kernel Page Table Isolation), and hardware fixes in newer CPU generations. They come at a performance cost, reported at 5–30% for affected workloads.

Summary

Pipeline hazards are the price of pipeline performance. The three categories (structural, data, and control) each require different solutions: hardware duplication, forwarding/stalling, and branch prediction. Modern CPUs combine all these techniques with out-of-order and speculative execution, and issue several instructions per cycle (superscalar execution), to achieve IPC values far above 1.0, the theoretical maximum of a single-issue pipeline.

The same speculative execution that enables today's high instruction throughput also created the Spectre and Meltdown vulnerabilities, a reminder that performance and security are permanently in tension at the hardware level.
