AiTechWorlds
Imagine an automotive assembly line running at full speed. Station 4 is installing the engine. But suddenly, the engine blocks aren't ready — the casting machine upstream is still machining them. Does Station 4 keep working? It can't. The entire line stalls, waiting for that one part.
In a CPU pipeline, the same problem emerges constantly. An instruction in the Execute stage might need the result of the instruction right before it, which has not yet been written back to the register file. The instruction can't proceed. The pipeline stalls. These blocking conditions are called hazards, and handling them is one of the central engineering challenges of CPU design.
Understanding hazards isn't just academic: they determine the difference between a CPU with a CPI (cycles per instruction) of 1.0 and one with a CPI of 1.3. The speculative techniques used to hide them are also behind two of the most significant security vulnerabilities in computing history: Meltdown and Spectre.
The pipeline's entire value comes from overlapping instructions. But instructions aren't independent — they share resources, depend on each other's results, and change the program's flow. Whenever the overlap assumption breaks down, you have a hazard.
There are exactly three categories:
A structural hazard occurs when two instructions in different pipeline stages need the same hardware resource simultaneously.
Classic example: Early computers had a single memory with one port. The IF stage (fetching instructions) and the MEM stage (loading/storing data) both need memory access — at the same time. Only one can win.
Solution: Separate instruction and data caches. Modern CPUs have distinct L1 Instruction Cache and L1 Data Cache — two independent memory systems that can be accessed simultaneously. This is why the Harvard architecture (separate instruction and data paths) influenced modern CPU design even though the von Neumann model (unified memory) dominates at the system level.
Another common structural hazard: a single write port on the register file when two instructions both want to write back simultaneously. Solution: add a second write port (costs silicon area but eliminates the stall).
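The single-port memory conflict can be made concrete with a small sketch. It assumes an idealized 5-stage pipeline (IF, ID, EX, MEM, WB) that issues one instruction per cycle with no stalls; the function name and the list-of-strings program format are illustrative, not taken from any real tool.

```python
# Toy model: instruction i (0-based) is in IF during cycle i + 1
# and in MEM during cycle i + 4 (assumed ideal 5-stage pipeline).

def memory_port_conflicts(program):
    """Return (cycle, mem_instr_index, fetch_instr_index) for every cycle in
    which a load/store in MEM and an instruction fetch need the same port."""
    conflicts = []
    for i, op in enumerate(program):
        if op not in ("load", "store"):
            continue
        mem_cycle = i + 4             # cycle in which instruction i is in MEM
        fetch_index = mem_cycle - 1   # instruction that is in IF in that cycle
        if fetch_index < len(program):
            conflicts.append((mem_cycle, i, fetch_index))
    return conflicts

program = ["load", "add", "sub", "and", "or"]
print(memory_port_conflicts(program))  # [(4, 0, 3)]
```

In this model the load's data access in cycle 4 collides with the fetch of the fourth instruction, which is exactly the collision that separate instruction and data caches remove.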
Data hazards are the most common type. They occur when an instruction depends on the result of a previous instruction that hasn't finished yet.
RAW (Read After Write): Instruction 2 needs R1 (for example, ADD R1, R2, R3 followed by SUB R4, R1, R5), which instruction 1 won't write until WB (cycle 5). But instruction 2 tries to read R1 in ID (cycle 3). The value isn't there yet: the read comes two cycles too early.
WAR (Write After Read): Instruction 2 writes a register that instruction 1 still needs to read. Less common in classic 5-stage pipelines but critical in out-of-order execution.
WAW (Write After Write): Two instructions both write the same register. The second write must happen after the first to preserve correct behavior. Again, more relevant in out-of-order execution.
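The three dependence types reduce to simple comparisons of register names. The following is a minimal sketch; the `Instr` record and the sample instructions are hypothetical and only serve to illustrate the checks.

```python
from collections import namedtuple

# Hypothetical instruction record: one destination register, a tuple of sources.
Instr = namedtuple("Instr", ["text", "dest", "srcs"])

def classify_hazards(first, second):
    """Return the data dependences `second` has on the earlier `first`."""
    hazards = []
    if first.dest is not None and first.dest in second.srcs:
        hazards.append("RAW")   # second reads what first writes
    if second.dest is not None and second.dest in first.srcs:
        hazards.append("WAR")   # second writes what first still reads
    if first.dest is not None and first.dest == second.dest:
        hazards.append("WAW")   # both write the same register
    return hazards

i1 = Instr("ADD R1, R2, R3", "R1", ("R2", "R3"))
i2 = Instr("SUB R4, R1, R5", "R4", ("R1", "R5"))
i3 = Instr("OR  R2, R6, R7", "R2", ("R6", "R7"))
i4 = Instr("AND R1, R8, R9", "R1", ("R8", "R9"))

print(classify_hazards(i1, i2))  # ['RAW']
print(classify_hazards(i1, i3))  # ['WAR']
print(classify_hazards(i1, i4))  # ['WAW']
```

Whether a detected dependence actually costs cycles depends on the pipeline: in a classic in-order 5-stage design only RAW does.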
Stalling is the simplest solution: freeze the pipeline. Insert NOP (no-operation) instructions, called bubbles, to delay instruction 2 until instruction 1's result is available.
Forwarding (also called bypassing) is the much better solution. Instead of waiting for the result to be written to the register file and then read back, forward it directly from where it's computed to where it's needed.
The ALU result is available at the end of EX stage. A forwarding path routes it directly to the input of the next instruction's EX stage. No waiting. No wasted cycles.
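Forwarding paths needed: one from the EX/MEM pipeline register to the ALU inputs (for a result produced by the immediately preceding instruction) and one from the MEM/WB pipeline register to the ALU inputs (for a result produced two instructions earlier).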
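The selection logic that drives forwarding can be sketched as a priority check per ALU operand. This follows the textbook 5-stage formulation; the pipeline register names EX/MEM and MEM/WB, the `(reg_write, dest_reg)` tuple format, and the hard-wired zero register are assumptions of the sketch.

```python
def forward_select(src_reg, ex_mem, mem_wb):
    """Pick where one ALU operand should come from.

    ex_mem and mem_wb are (reg_write, dest_reg) pairs for the instructions one
    and two slots ahead. Register 0 is assumed hard-wired to zero (as in MIPS
    or RISC-V), so it is never forwarded.
    """
    ex_mem_write, ex_mem_dest = ex_mem
    mem_wb_write, mem_wb_dest = mem_wb
    if ex_mem_write and ex_mem_dest != 0 and ex_mem_dest == src_reg:
        return "EX/MEM"    # the most recent producer takes priority
    if mem_wb_write and mem_wb_dest != 0 and mem_wb_dest == src_reg:
        return "MEM/WB"
    return "REGFILE"       # no hazard: use the value read in ID

print(forward_select(1, (True, 1), (True, 1)))    # EX/MEM
print(forward_select(1, (True, 2), (True, 1)))    # MEM/WB
print(forward_select(1, (False, 1), (False, 1)))  # REGFILE
```

Checking EX/MEM first matters when both older instructions write the same register: the operand must receive the newer value.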
Load-use hazard — the one case forwarding can't fully fix: A load instruction doesn't have its data until the end of MEM stage. If the very next instruction needs that data in EX, even forwarding arrives one cycle late. One bubble is still required. This is the load-use hazard, and compilers try hard to reorder instructions to avoid it.
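The bubble counts discussed above can be summarized in one function. This is a sketch for the classic 5-stage pipeline only; it assumes the register file writes in the first half of a cycle and reads in the second half, and the function and parameter names are illustrative.

```python
def bubbles_needed(distance, producer_is_load, forwarding):
    """Bubbles before a consumer that follows its producer by `distance`
    instructions (1 = immediately after) in a 5-stage pipeline.

    Assumes a value written in WB can be read in ID during the same cycle.
    """
    if not forwarding:
        # Producer writes in WB (stage 5); consumer reads in ID (stage 2).
        return max(0, 3 - distance)
    if producer_is_load:
        # Load data exists only after MEM, one cycle too late for the
        # EX stage of a back-to-back consumer.
        return max(0, 2 - distance)
    return 0  # ALU results are forwarded straight into the next EX

print(bubbles_needed(1, producer_is_load=False, forwarding=False))  # 2
print(bubbles_needed(1, producer_is_load=False, forwarding=True))   # 0
print(bubbles_needed(1, producer_is_load=True, forwarding=True))    # 1
print(bubbles_needed(2, producer_is_load=True, forwarding=True))    # 0
```

The last line shows why compilers reorder code: placing one independent instruction between a load and its consumer removes the bubble.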
| Hazard Type | Cause | Detection | Solution | Performance Impact |
|---|---|---|---|---|
| Structural | Two instructions need same hardware | Static design analysis | Duplicate resources (separate I/D caches) | None (design-time fix) |
| RAW | Read before write completes | Compare pipeline register IDs | Forwarding / 1-2 cycle stall | 0–2 cycles per hazard |
| WAR | Write before dependent read | Out-of-order scheduling | Register renaming | Minimal in-order; significant OOO |
| WAW | Two writes to same register | Out-of-order scheduling | Register renaming | Minimal in-order; significant OOO |
| Load-Use | Load result needed immediately | Detect load in EX stage | 1 bubble + forwarding | 1 cycle per occurrence |
| Control | Branch changes PC | Branch instruction in ID/EX | Branch prediction | 1–20 cycles on misprediction |
Control hazards arise from branch and jump instructions. When the CPU fetches the instructions that follow a branch, they might be inside an if block that won't execute. By the time the branch outcome is known, the CPU has already fetched and started decoding the wrong instructions.
A conditional branch like BEQ R1, R2, target doesn't resolve until the EX stage (when the comparison happens). By that time, two more instructions have already entered the pipeline — instructions that might be from the wrong path.
If those instructions are allowed to complete, the program produces incorrect results. The CPU must flush them — discard their work — and start fetching from the correct address.
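Flush cost = number of instructions fetched behind the branch before it resolves (2 when the branch resolves in EX, the third stage; far more in deep pipelines)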
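The effect of flushes on average CPI can be estimated with a one-line model. The formula below is the standard first-order approximation; the branch frequency and misprediction rates in the example are illustrative assumptions, not measurements.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, flush_penalty):
    """Average CPI when only mispredicted branches add stall cycles."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty

# Assumed workload: 20% of instructions are branches.
# Shallow pipeline, half of all branches guessed wrong, 2-cycle flush.
print(round(effective_cpi(1.0, 0.20, 0.50, 2), 3))   # 1.2
# Deep pipeline, 5% misprediction rate, 15-cycle flush.
print(round(effective_cpi(1.0, 0.20, 0.05, 15), 3))  # 1.15
```

The second case shows why deep pipelines depend on accurate prediction: even a 5% miss rate costs nearly as much as guessing wrong half the time on a shallow pipeline.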
Rather than stalling, the CPU guesses which path the branch will take and continues fetching from that path speculatively.
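Static Prediction: Apply a fixed rule that ignores runtime behavior, such as always predicting not-taken, or predicting backward branches (typically loops) as taken and forward branches as not-taken.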
Dynamic Prediction (the real solution):
1-bit predictor: Remember the last outcome (taken/not-taken). Use it as prediction. Works well inside loops but mispredicts twice per loop execution: once at the exit, and again on the first iteration the next time the loop runs.
2-bit saturating counter: Four states: Strongly Taken, Weakly Taken, Weakly Not-Taken, Strongly Not-Taken. From a strong state, it takes two consecutive mispredictions to flip the prediction. A loop exit therefore costs one misprediction instead of two.
Branch History Table (BHT): Indexed by lower bits of branch PC. Each entry holds a 2-bit counter. Thousands of entries in hardware.
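Tournament predictors: Use multiple predictors and a meta-predictor to choose between them. A related modern design is the TAGE predictor (TAgged GEometric history length), which selects among several tagged tables indexed with geometrically increasing history lengths. Predictors of this class achieve roughly 95–99% accuracy in modern CPUs.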
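The difference between the 1-bit and 2-bit schemes shows up clearly on a loop branch. The following sketch simulates a single branch in isolation (no table indexing or aliasing); the function names and the loop pattern are illustrative assumptions.

```python
def mispredictions_1bit(outcomes, state=True):
    """1-bit predictor: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken
    return misses

def mispredictions_2bit(outcomes, counter=3):
    """2-bit saturating counter: 0-1 predict not-taken, 2-3 predict taken."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses

# A loop branch taken 9 times and not taken at exit; the loop runs 10 times.
outcomes = ([True] * 9 + [False]) * 10
print(mispredictions_1bit(outcomes))  # 19
print(mispredictions_2bit(outcomes))  # 10
```

The 2-bit counter only drops from Strongly Taken to Weakly Taken at each loop exit, so it still predicts taken when the loop starts again.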
Modern CPUs don't just solve hazards defensively — they reorganize instruction execution to avoid them proactively.
In out-of-order execution (OOE), instructions enter the pipeline in program order but execute in whatever order their inputs become ready, then retire in program order so that results appear as if they had executed sequentially.
This means a long-latency division instruction doesn't block a sequence of independent additions that follow it. OOE effectively hides latency by finding independent work to do.
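A rough scheduling model illustrates the latency hiding. It is a deliberately idealized sketch: the out-of-order case assumes unlimited issue width and perfect register renaming, the in-order case issues at most one instruction per cycle in program order, and the instruction tuples and latencies are made up for the example.

```python
def schedule(program, in_order):
    """Return the cycle in which each instruction starts executing.

    program: list of (dest, srcs, latency) tuples in program order.
    """
    ready_at = {}    # register -> first cycle its value can be consumed
    starts = []
    prev_start = 0
    for dest, srcs, latency in program:
        # An instruction can start once all of its source values exist.
        start = max([ready_at.get(r, 1) for r in srcs] + [1])
        if in_order:
            # It also cannot start before the instruction ahead of it.
            start = max(start, prev_start + 1)
        starts.append(start)
        ready_at[dest] = start + latency
        prev_start = start
    return starts

program = [
    ("R1", ("R2", "R3"), 20),  # DIV R1, R2, R3 (assumed 20-cycle latency)
    ("R4", ("R1", "R5"), 1),   # ADD R4, R1, R5 depends on the DIV
    ("R6", ("R7", "R8"), 1),   # ADD R6, R7, R8 is independent
    ("R9", ("R6", "R10"), 1),  # ADD R9, R6, R10 depends only on the ADD above
]
print(schedule(program, in_order=True))   # [1, 21, 22, 23]
print(schedule(program, in_order=False))  # [1, 21, 1, 2]
```

In the in-order run the two independent additions queue behind the stalled dependent ADD; in the out-of-order run they finish while the division is still in progress.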
Intel's 13th Gen performance cores have a reorder buffer (ROB) of 512 entries, meaning up to 512 instructions can be "in flight" simultaneously, out of order.
When the CPU predicts a branch and fetches speculatively, it's executing instructions that might not even belong in the program flow. This is called speculative execution.
In January 2018, researchers publicly disclosed that speculative execution can be exploited:
Spectre (CVE-2017-5753/5715): Tricks the CPU into speculatively executing instructions across security boundaries. The speculative access leaves traces in the cache (a side channel), which an attacker can measure to infer secret data. Affects virtually all modern processors.
Meltdown (CVE-2017-5754): Exploits the window between a speculative memory access and the permission check. An unprivileged program can speculatively read kernel memory before the CPU realizes the access was illegal. Primarily affected Intel processors.
Both vulnerabilities stem from the same root cause: speculative work leaves traces in the cache even when its architectural results are later discarded. The cache state is observable through access timing, leaking information.
Mitigations include microcode updates, OS patches such as KPTI (Kernel Page Table Isolation), and hardware fixes in newer CPU generations. The software mitigations carry a performance cost, commonly reported as 5–30% for affected workloads.
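The root cause can be illustrated with a purely abstract model. The class below is a toy simulation, not an exploit: `ToyCPU`, its methods, and the set standing in for the cache are all hypothetical, and a real attack would infer cache state by timing memory accesses rather than by querying a set.

```python
class ToyCPU:
    """Abstract model: squashed speculative work still warms a cache line."""

    def __init__(self, secret):
        self._secret = secret
        self.cache = set()       # indices of probe lines currently cached

    def speculative_read(self, allowed):
        value = self._secret     # transient load runs before the check resolves
        self.cache.add(value)    # dependent access touches probe line `value`
        if not allowed:
            return None          # architectural result is discarded...
        return value             # ...but the cache state is not rolled back

    def probe(self, line):
        """Stand-in for timing a load: True means fast, i.e. cached."""
        return line in self.cache

cpu = ToyCPU(secret=42)
print(cpu.speculative_read(allowed=False))                # None
print([line for line in range(256) if cpu.probe(line)])   # [42]
```

The program never receives the secret as a return value, yet the pattern of fast and slow probe lines reveals it.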
Pipeline hazards are the price of pipeline performance. The three categories (structural, data, and control) each require different solutions: hardware duplication, forwarding/stalling, and branch prediction. Modern CPUs combine all these techniques with out-of-order execution, speculative execution, and multiple-issue hardware to achieve IPC values far above 1.0, the theoretical maximum of a single-issue pipeline.
The same speculative execution that enables today's multi-GHz performance also created the Spectre and Meltdown vulnerabilities — a reminder that performance and security are permanently in tension at the hardware level.