AiTechWorlds
Right now — this exact moment as you read these words — your CPU is executing billions of operations.
It is rendering the pixels of each letter on your screen, checking whether your mouse has moved, watching for keyboard input, managing your Wi-Fi connection, running background security scans, keeping the clock updated, and tracking memory usage.
Not one at a time. All simultaneously.
And in the time it took you to read that paragraph, your CPU completed roughly 20 billion of those individual operations.
How? How does a sliver of silicon the size of a postage stamp do what thousands of human mathematicians could not do in a lifetime?
The answer starts with understanding what a CPU really is — not as a buzzword on a spec sheet, but as a physical, brilliantly engineered object.
CPU stands for Central Processing Unit. It is the primary component that executes all program instructions — every calculation, every decision, every data movement that your computer makes flows through here.
A modern CPU is a silicon chip — a thin wafer of purified silicon crystal, typically measuring between 100 and 300 square millimetres (roughly the size of a postage stamp to a large thumbnail). On that chip, etched with beams of ultraviolet light using a process called photolithography, are billions of microscopic transistors.
| CPU | Year | Transistors | Process Node | Die Size |
|---|---|---|---|---|
| Intel 4004 (first microprocessor) | 1971 | 2,300 | 10,000 nm | 12 mm² |
| Intel Pentium | 1993 | 3,100,000 | 800 nm | 296 mm² |
| Intel Core i7 (Nehalem) | 2008 | 731,000,000 | 45 nm | 263 mm² |
| AMD Ryzen 9 7950X | 2022 | ~13,000,000,000 | 5 nm | ~71 mm² per compute die (×2) |
| Apple M2 | 2022 | 20,000,000,000 | 5 nm | 208 mm² |
| Apple M4 | 2024 | ~28,000,000,000 | 3 nm | ~205 mm² |
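Twenty billion transistors, each one a switch that can be on or off, packed into a chip you can cover with your thumb. Taking the 5-nanometre node name at face value, each transistor is about 5 nanometres across (node names are marketing labels, and real transistor features are somewhat larger). A human hair is roughly 70,000 nanometres wide, so at that scale you could line up about 14,000 transistors across a single hair.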
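To put the table's growth in perspective, the short sketch below works out how often the transistor count doubled between the first and last rows. It uses only the figures from the table; the variable names are illustrative.

```python
import math

# Transistor counts and years taken from the first and last table rows.
first = {"name": "Intel 4004", "year": 1971, "transistors": 2_300}
latest = {"name": "Apple M4", "year": 2024, "transistors": 28_000_000_000}

growth = latest["transistors"] / first["transistors"]
doublings = math.log2(growth)              # how many times the count doubled
years = latest["year"] - first["year"]
doubling_period = years / doublings        # average years per doubling

print(f"Growth factor: {growth:,.0f}x")
print(f"Doublings: {doublings:.1f}")
print(f"One doubling roughly every {doubling_period:.2f} years")
```

On these two data points the count doubles a little more often than once every two and a half years, which is an average over the whole period rather than a steady rate.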
The ALU is where all actual computation happens. Every mathematical operation and every logical decision runs through here.
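Arithmetic operations: addition, subtraction, multiplication, and division.
Logic operations: AND, OR, NOT, XOR, and comparisons such as equal to, less than, and greater than.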
Every if-statement in every program, every pixel colour calculation, every financial transaction, every game physics calculation — all of it reduces to ALU operations.
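As a rough illustration of that claim, here is a toy ALU in Python. It is a sketch, not a model of any real chip: the operation names, the 8-bit width, and the two flags are assumptions chosen to keep the example small.

```python
def alu(op: str, a: int, b: int = 0, width: int = 8) -> tuple[int, dict[str, bool]]:
    """Toy ALU: apply one operation to unsigned integers of the given bit width."""
    mask = (1 << width) - 1
    operations = {
        "ADD": lambda: a + b,
        "SUB": lambda: a - b,
        "AND": lambda: a & b,
        "OR": lambda: a | b,
        "XOR": lambda: a ^ b,
        "NOT": lambda: ~a,
    }
    raw = operations[op]()
    result = raw & mask  # keep only the bits that fit in the register
    flags = {
        "zero": result == 0,
        # Carry/borrow is only meaningful for the arithmetic operations here.
        "carry": op in ("ADD", "SUB") and not (0 <= raw <= mask),
    }
    return result, flags


# An `if x == y` test reduces to a subtraction plus a look at the zero flag.
_, compare_flags = alu("SUB", 42, 42)
values_equal = compare_flags["zero"]

# Results wrap around at the register width: 200 + 100 does not fit in 8 bits.
wrapped, wrap_flags = alu("ADD", 200, 100)

print(values_equal, wrapped, wrap_flags["carry"])
```

The comparison case is the interesting one: the program never sees the subtraction result, only the flag it leaves behind.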
The Control Unit (CU) runs the Fetch-Decode-Execute cycle. It fetches instructions from memory, interprets what they mean, and orchestrates the ALU, registers, and memory to carry them out.
The CU contains the Program Counter (PC) — a register that holds the memory address of the next instruction. After each fetch, it increments automatically to point to the next instruction.
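The cycle described above can be sketched as a small loop. The instruction names (`LOAD`, `ADD`, `STORE`, `HALT`), the two registers, and the memory addresses are all invented for this example; real instruction sets are binary-encoded and far richer.

```python
def run(program: list[tuple], memory: dict[int, int]) -> dict[str, int]:
    """Run a toy program; the instruction format is hypothetical."""
    registers = {"A": 0, "B": 0}
    pc = 0  # program counter: index of the next instruction

    while True:
        instruction = program[pc]          # fetch
        pc += 1                            # PC now points at the following instruction
        opcode, *operands = instruction    # decode

        # execute
        if opcode == "LOAD":               # LOAD reg, address
            reg, address = operands
            registers[reg] = memory[address]
        elif opcode == "ADD":              # ADD dest, src  (dest = dest + src)
            dest, src = operands
            registers[dest] += registers[src]
        elif opcode == "STORE":            # STORE reg, address
            reg, address = operands
            memory[address] = registers[reg]
        elif opcode == "HALT":
            return registers
        else:
            raise ValueError(f"unknown opcode: {opcode}")


memory = {100: 7, 101: 5, 102: 0}
program = [
    ("LOAD", "A", 100),
    ("LOAD", "B", 101),
    ("ADD", "A", "B"),
    ("STORE", "A", 102),
    ("HALT",),
]
final_registers = run(program, memory)
print(memory[102])
```

A jump or branch instruction would simply assign a new value to `pc` instead of letting it advance by one, which is how loops and if-statements change the flow of a program.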
Registers are tiny storage locations built directly into the CPU — not separate chips like RAM, but actual flip-flop circuits woven into the CPU's core.
Registers hold the values the ALU is currently working on — like a calculator's display. The CPU constantly loads values from RAM into registers, operates on them, then writes results back.
Analogy: Registers are like the chef's hands — you can only hold a few things at once, but what you're holding you can work with instantly.
This is where modern CPUs gain their real performance edge.
RAM is fast, but not fast enough. A CPU operating at 3 GHz can request a new value roughly every 0.33 nanoseconds. RAM takes 60–80 nanoseconds to respond. That is a mismatch of roughly 200×. Without cache, a CPU that went to RAM for every value would spend about 99% of its time waiting for memory.
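The arithmetic behind those figures is worth seeing once. The sketch below assumes a deliberately crude model in which every value costs one cycle of useful work plus a full trip to RAM; real CPUs overlap many requests, so treat the result as an upper bound on the problem.

```python
clock_hz = 3e9                 # 3 GHz, as in the text
cycle_ns = 1e9 / clock_hz      # duration of one clock cycle in nanoseconds

results = []
for ram_ns in (60, 80):
    stall_cycles = ram_ns / cycle_ns                    # cycles spent waiting per RAM access
    waiting_share = stall_cycles / (stall_cycles + 1)   # 1 cycle of work per access (assumed)
    results.append((ram_ns, stall_cycles, waiting_share))
    print(f"RAM at {ram_ns} ns: {stall_cycles:.0f} cycles of waiting, "
          f"{waiting_share:.1%} of time stalled")
```

At 60 to 80 ns the CPU waits roughly 180 to 240 cycles per access, which is where the "about 200×" and "99%" figures come from.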
Cache is small, ultra-fast memory built on the CPU die itself. It stores copies of recently used data and instructions so the CPU can access them without waiting for RAM.
Modern CPUs have three levels:
| Cache Level | Location | Typical Size | Access Time | Shared? |
|---|---|---|---|---|
| L1 Cache | Inside each core | 32–128 KB per core | ~1 ns | Per core |
| L2 Cache | Inside each core | 256 KB – 1 MB per core | ~5 ns | Per core |
| L3 Cache | On the CPU die | 8 MB – 64 MB | ~20 ns | All cores share |
| (RAM) | (Off-chip) | 8 GB – 64 GB | ~60–80 ns | All cores |
When the CPU needs data, it checks L1 first. If the data is there (a cache hit), it proceeds almost immediately. If not (a cache miss), it checks L2, then L3, and only then goes to RAM. A central goal of CPU design is to maximise cache hits.
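The lookup order can be sketched in a few lines. This is a simplified model under stated assumptions: latencies come from the table (with RAM taken as 70 ns, the middle of the 60–80 ns range), each cache is just a set of addresses, and nothing is filled or evicted on a miss as a real cache would do.

```python
LATENCY_NS = {"L1": 1, "L2": 5, "L3": 20, "RAM": 70}


def lookup(address: int, caches: dict[str, set[int]]) -> tuple[str, int]:
    """Return (level that answered, access time in ns) for one memory access."""
    for level in ("L1", "L2", "L3"):
        if address in caches[level]:       # cache hit at this level
            return level, LATENCY_NS[level]
    return "RAM", LATENCY_NS["RAM"]        # missed every cache level


# Hypothetical cache contents: each larger level also holds what the smaller ones do.
caches = {"L1": {0x10}, "L2": {0x10, 0x20}, "L3": {0x10, 0x20, 0x30}}
accesses = [0x10, 0x10, 0x20, 0x30, 0x40]

results = [lookup(address, caches) for address in accesses]
average_ns = sum(ns for _, ns in results) / len(results)

for address, (level, ns) in zip(accesses, results):
    print(f"{address:#x}: served by {level} in ~{ns} ns")
print(f"Average access time: {average_ns} ns")
```

Even in this tiny trace, the single access that falls through to RAM accounts for most of the total time, which is why hit rate matters so much.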
Clock speed (measured in GHz — gigahertz) tells you how many times per second the CPU's internal clock ticks. Each tick is one clock cycle — the basic unit of CPU time.
In each clock cycle, a core can complete one operation, or more than one on modern cores that combine pipelining with superscalar execution. A 3.5 GHz rating does not mean the CPU does 3.5 billion useful things per second, because some operations take multiple cycles. But in general, a higher clock speed means faster per-core performance on tasks that must run in sequence.
Analogy: Clock speed is how fast each chef moves. A chef working at 5 GHz is faster than one at 3 GHz — but only on tasks where one chef's speed is the bottleneck.
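A common way to reason about this is that run time equals instruction count times average cycles per instruction, divided by clock rate. The sketch below applies that relationship to two made-up CPUs; the instruction count and cycles-per-instruction values are hypothetical, chosen only to show that clock speed is one factor among several.

```python
def run_time_seconds(instructions: float, cycles_per_instruction: float,
                     clock_ghz: float) -> float:
    """Estimate run time for a purely sequential workload on one core."""
    cycles = instructions * cycles_per_instruction
    return cycles / (clock_ghz * 1e9)


workload = 7e9  # hypothetical: 7 billion instructions

# Same design at two clock speeds: the faster clock wins, as the text says.
same_design_3ghz = run_time_seconds(workload, 1.0, 3.0)
same_design_5ghz = run_time_seconds(workload, 1.0, 5.0)

# A lower-clocked core that finishes two instructions per cycle can still win.
efficient_core = run_time_seconds(workload, 0.5, 3.5)

print(f"3.0 GHz, 1.0 cycles/instr: {same_design_3ghz:.2f} s")
print(f"5.0 GHz, 1.0 cycles/instr: {same_design_5ghz:.2f} s")
print(f"3.5 GHz, 0.5 cycles/instr: {efficient_core:.2f} s")
```

This is why GHz figures are only directly comparable between chips of the same design.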
A core is a complete, independent processing unit — its own ALU, Control Unit, registers, and L1/L2 cache. A multi-core CPU is multiple complete CPUs on one chip, sharing the L3 cache and memory controller.
| Core Count | Typical Use |
|---|---|
| 2 cores | Basic computing, budget laptops (2010s era) |
| 4 cores | Standard for general computing (2015–2020) |
| 6–8 cores | Mainstream performance laptops and desktops (2020–2025) |
| 10–16 cores | High-end workstations, creative professionals |
| 24–64 cores | Server CPUs (AMD EPYC, Intel Xeon) |
Analogy: Cores are like workers in a restaurant kitchen. One chef (1 core) can make one dish at a time. Eight chefs (8 cores) can make eight dishes simultaneously. Clock speed is how fast each chef moves. More chefs help parallel work; faster chefs help sequential work.
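The limit on what extra chefs can do is usually expressed with Amdahl's law, which the text does not name but which formalises the same idea: the part of a job that must run in sequence caps the overall speedup. The parallel fractions below are illustrative assumptions, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when only part of the work can be split up."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


mostly_parallel = amdahl_speedup(0.9, 8)   # 90% of the work can be shared out
half_parallel = amdahl_speedup(0.5, 8)     # only half of it can

print(f"90% parallel on 8 cores: {mostly_parallel:.2f}x")
print(f"50% parallel on 8 cores: {half_parallel:.2f}x")
```

With eight cores, a job that is 90% parallel speeds up by less than five times, and a job that is half sequential does not even double. For that second kind of job, a faster chef helps more than additional chefs.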
Hyper-Threading (Intel's brand name) and Simultaneous Multi-Threading, or SMT (the generic term, which AMD uses), allow each physical core to present itself as two logical processors (threads) to the operating system.
This works because a single core is rarely 100% utilised in every cycle. While one thread on a core is stalled waiting for data from cache or RAM, the core's second thread can use the execution units that would otherwise sit idle.
In practice, Hyper-Threading adds roughly 15–30% performance improvement for workloads that can use it.
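You can see the logical processor count on your own machine with Python's standard library. The second half of the sketch is a back-of-the-envelope estimate using the 15–30% range above; the `core_equivalents` helper and the 8-core example are assumptions for illustration.

```python
import os

# os.cpu_count() reports logical processors (threads), not physical cores.
logical = os.cpu_count()
print(f"Logical processors visible to the OS: {logical}")


def core_equivalents(physical_cores: int, smt_gain: float) -> float:
    """Rough throughput of an SMT chip, measured in 'cores without SMT'."""
    return physical_cores * (1 + smt_gain)


low = core_equivalents(8, 0.15)
high = core_equivalents(8, 0.30)
print(f"8 cores / 16 threads behave like roughly {low:.1f} to {high:.1f} cores")
```

The point of the estimate is that 16 threads on 8 cores should not be read as 16 cores' worth of performance.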
Architecture refers to the fundamental design of how the CPU understands instructions. Different architectures use different instruction sets.
The shift of Apple's Macs from Intel (x86-64) to Apple Silicon (ARM) in 2020 was a major industry moment — Apple's M-series chips proved ARM could match or beat x86-64 in performance while using dramatically less power.
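To check which architecture your own machine uses, Python's standard `platform` module reports the machine type. The mapping table below is an assumption covering the most common strings; other systems may report different values, or an empty string if the type cannot be determined.

```python
import platform

machine = platform.machine()  # e.g. "x86_64", "AMD64", "arm64", "aarch64"

# Illustrative mapping from common reported strings to architecture families.
FAMILIES = {
    "x86_64": "x86-64",
    "amd64": "x86-64",
    "arm64": "ARM (64-bit)",
    "aarch64": "ARM (64-bit)",
}
family = FAMILIES.get(machine.lower(), "other or unknown")

print(f"Reported machine type: {machine!r}")
print(f"Architecture family: {family}")
```

Software compiled for one family will not run natively on the other, which is why Apple's transition relied on a translation layer for existing Intel applications.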
| Specification | Intel Core i9-14900K | AMD Ryzen 9 7950X | Apple M4 Pro |
|---|---|---|---|
| Year | 2023 | 2022 | 2024 |
| Architecture | x86-64 (Raptor Lake) | x86-64 (Zen 4) | ARM (3nm) |
| Cores / Threads | 24 cores / 32 threads | 16 cores / 32 threads | 14 cores / 14 threads |
| Base / Boost Clock | 3.2 / 6.0 GHz | 4.5 / 5.7 GHz | ~4.4 GHz |
| L3 Cache | 36 MB | 64 MB | 24 MB |
| TDP (Power) | 125–253 W | 170 W | ~30 W |
| Best For | Gaming, overclocking | Workstations, content creation | Battery life, macOS, efficiency |
Notice that the Apple M4 Pro draws only about 30 watts, against up to 253 watts for the Intel chip under load, yet delivers comparable or superior performance for many tasks. This reflects the efficiency-first design of Apple's ARM chips.
| CPU Spec | What It Measures | Good Value (2026) | Why It Matters |
|---|---|---|---|
| Clock Speed | Clock cycles per second per core | 3.5–5.0 GHz | Faster single-core speed = snappier app response |
| Core Count | Parallel processing capacity | 8–12 cores (mainstream) | More cores = better multitasking and multi-threaded apps |
| Thread Count | Logical processors seen by OS | 2× core count (with HT/SMT) | Higher thread count helps video encoding, compilation |
| L1/L2 Cache | Per-core fast memory | L1: 64+ KB; L2: 512 KB+ | More cache = fewer slow RAM trips |
| L3 Cache | Shared fast memory | 16–32 MB (mainstream) | Large L3 helps with large datasets |
| TDP (Thermal Design Power) | Heat and power consumption | 15 W (laptop) – 125 W (desktop) | Lower TDP = better battery; higher TDP = more peak performance |
| Process Node | Manufacturing generation (nominal feature size) | 3–5 nm (cutting edge) | Smaller node = more transistors, better efficiency |
Let us bring everything together with a restaurant:
A fast restaurant has fast chefs, a well-stocked immediate shelf, and enough chefs to handle multiple orders at once. A fast CPU has high clock speed, large cache, and many cores.
You now understand the foundation of computing hardware. From the stored-program concept to the billions of transistors executing your every click — the machine is no longer a mystery. Next, we explore how software instructs this hardware to do your bidding.