AiTechWorlds
In 1971, Intel released the 4004 microprocessor. It was the first commercially available CPU on a single chip, a remarkable achievement that put a complete 4-bit central processing unit on a piece of silicon smaller than a fingernail (program ROM, RAM, and I/O lived on companion chips). It ran at up to 740 kHz, processed 4 bits at a time, and contained about 2,300 transistors.
In 2024, Apple's M4 chip contains 28 billion transistors. Intel's Core Ultra 9 285K contains over 23 billion. AMD's EPYC Genoa, designed for servers, contains 75 billion transistors across multiple chiplets.
From the 4004 to the M4, that's roughly a 12-million-fold increase in transistor count in 53 years, a rate of sustained improvement that few other technologies have matched. Understanding how we got here, and what the three dominant modern architectures (x86-64, ARM, and RISC-V) actually look like, is the final piece of the computer architecture puzzle.
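The growth figure can be checked with a few lines of arithmetic. The sketch below uses only the transistor counts quoted above (4004 vs. M4); the function names are illustrative, and the "doubling time" is simply what those two data points imply, not a measured industry trend.

```python
import math

# Figures quoted in the text above.
TRANSISTORS_4004 = 2_300           # Intel 4004, 1971
TRANSISTORS_M4 = 28_000_000_000    # Apple M4, 2024
YEARS = 2024 - 1971                # 53 years


def growth_factor(old: int, new: int) -> float:
    """How many times larger `new` is than `old`."""
    return new / old


def doubling_time_years(old: int, new: int, years: int) -> float:
    """Average time per doubling implied by two data points."""
    doublings = math.log2(new / old)
    return years / doublings


factor = growth_factor(TRANSISTORS_4004, TRANSISTORS_M4)
doubling = doubling_time_years(TRANSISTORS_4004, TRANSISTORS_M4, YEARS)

print(f"Growth factor: {factor:,.0f}x")
print(f"Implied doubling time: {doubling:.2f} years")
```

With these inputs the factor comes out a little above 12 million, and the implied doubling time is a bit over two years, in line with the usual statement of Moore's Law.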
Every advance in CPU capability ultimately rests on the ability to etch smaller transistors onto silicon:
TSMC N3 (3nm, 2022): 292 million transistors per mm² — an almost incomprehensible density. The "nm" node numbers are now marketing labels rather than literal transistor gate lengths, but the density improvements they represent are real.
Why smaller is better:
The x86 architecture began with the Intel 8086 in 1978 — a 16-bit processor designed for the emerging personal computer market. When IBM chose it for the IBM PC in 1981, x86's dominance was cemented by market forces that would prove nearly impossible to dislodge.
x86 accumulated 45 years of backward compatibility baggage: 16-bit real mode, 32-bit protected mode (IA-32, added with the 80386 in 1985), and finally 64-bit mode (AMD64/x86-64), introduced by AMD in 2003 with the Opteron server processor — later adopted by Intel as Intel 64.
The modern x86-64 ISA is famously complex: variable-length instructions (1 to 15 bytes), hundreds of legacy instruction forms, and a CISC encoding that the hardware decoders (with microcode handling the most complex instructions) translate into RISC-like micro-operations (µops) internally. This is why x86 CPUs contain a "front-end" that decodes complex instructions into simpler internal forms before the out-of-order execution engine sees them.
| Extension | Year | Purpose | Width |
|---|---|---|---|
| MMX | 1997 | First SIMD — integer vectors | 64-bit |
| SSE/SSE2 | 1999–2001 | Floating point SIMD, new XMM regs | 128-bit |
| SSE4.2 | 2008 | String processing, CRC32, POPCNT | 128-bit |
| AVX/AVX2 | 2011–2013 | 256-bit SIMD, FMA, gather | 256-bit |
| AVX-512 | 2016 | 512-bit SIMD, 32 ZMM registers | 512-bit |
| AES-NI | 2010 | Hardware AES encryption/decryption | — |
| SHA-NI | 2018 | Hardware SHA-1/SHA-256 hashing | — |
| AMX | 2023 | Matrix multiply accelerator (AI workloads) | 2D tiles |
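To make the "Width" column concrete, here is a minimal sketch of how many elements fit in one SIMD register at each width. The register widths come from the table; the `lanes` helper and the choice of 32-bit and 64-bit element sizes are illustrative assumptions.

```python
# Register widths in bits, taken from the table above.
REGISTER_BITS = {"MMX": 64, "SSE": 128, "AVX2": 256, "AVX-512": 512}


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given size that one register holds."""
    if register_bits % element_bits != 0:
        raise ValueError("element size must divide the register width")
    return register_bits // element_bits


for name, bits in REGISTER_BITS.items():
    print(f"{name:8s} {bits:3d}-bit: "
          f"{lanes(bits, 32):2d} x 32-bit or {lanes(bits, 64)} x 64-bit elements")
```

Each doubling of register width doubles the number of elements one instruction can process, which is where most of the throughput gain from wider SIMD comes from.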
Intel's Alder Lake (2021) introduced a radical change: two types of cores on the same die.
The Intel Thread Director hardware communicates with the OS scheduler, telling it which tasks are CPU-intensive vs. background, so the right tasks land on the right cores.
Intel Core i9-14900K (Raptor Lake Refresh, 2023):
Arm (formerly ARM Holdings, now Arm Holdings plc, majority-owned by SoftBank) designs CPU architectures but does not manufacture chips. It licenses its architecture to semiconductor companies that then design their own implementations. This model has made ARM the most widely deployed CPU architecture on Earth, found in billions of smartphones, tablets, embedded controllers, and increasingly, laptops and servers.
ARMv8-A (introduced 2011) brought 64-bit computing to ARM with the AArch64 execution state:
ARMv9-A (2021) adds:
| Company | Product | Use Case | Notable Feature |
|---|---|---|---|
| Apple | A18 Pro, M4 | iPhone, iPad, Mac | Highest single-thread perf; unified memory |
| Qualcomm | Snapdragon X Elite | Windows PCs | 12 Oryon cores, up to 3.8 GHz |
| Samsung | Exynos 2500 | Galaxy phones | In-house design with AMD GPU |
| AWS | Graviton4 | Cloud servers | 96 Neoverse V2 cores, 30% faster than G3 |
| NVIDIA | Grace (GB200) | AI supercomputers | 72 Neoverse V2 cores + Blackwell GPU |
| Apple | M4 (2024) | MacBook Pro, iMac, Mac mini | 28B transistors, 3nm, up to 10 CPU cores |
Apple's M-series chips (M1 through M4, 2020–2024) represent a fundamental architectural departure from traditional CPU+GPU systems:
RISC-V (pronounced "risk-five") is an open, free instruction set architecture developed at UC Berkeley starting in 2010. Unlike x86 (Intel/AMD proprietary) and ARM (licensed), RISC-V is free to implement — no licensing fees, no royalties, no restrictions.
RISC-V is a clean-slate RISC design incorporating 50 years of ISA lessons:
| Company | Product | Application |
|---|---|---|
| SiFive | HiFive Unmatched | RISC-V development boards |
| Western Digital | SweRV EH1 | HDD/SSD controllers |
| NVIDIA | Falcon cores, GSP | GPU firmware/management processors |
| Google | Titan M2 | Pixel security chip |
| Alibaba | XuanTie C910 | Cloud computing |
| Espressif | ESP32-C3 | IoT microcontrollers |
| RISC-V International | Various | Academic research CPUs worldwide |
| Attribute | x86-64 | ARMv9-A | RISC-V (RV64GC) |
|---|---|---|---|
| ISA Type | CISC (complex, variable-length) | RISC (fixed 32-bit, mostly) | RISC (fixed 32/16-bit compressed) |
| License | Proprietary (Intel/AMD) | Licensed (royalties to Arm) | Open standard (royalty-free) |
| General Registers | 16 × 64-bit | 31 × 64-bit | 32 × 64-bit |
| Instruction Width | 1–15 bytes (variable) | 32-bit in AArch64 (16/32-bit Thumb-2 only in AArch32) | 32-bit (or 16-bit compressed) |
| SIMD | SSE/AVX/AVX-512 | NEON/SVE2 | V extension (scalable) |
| Primary Market | Desktop, server, laptop | Mobile, embedded, server | IoT, embedded, growing server |
| Performance/Watt | Good (Intel 7, TSMC 3nm) | Excellent (Apple M4) | Varies by implementation |
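The "Instruction Width" row has a practical consequence for decoders. In RISC-V, the length of an instruction is encoded in the lowest bits of its first 16-bit parcel, so a decoder can find instruction boundaries almost immediately; an x86-64 decoder has to work through prefixes and opcode bytes first. The sketch below covers only the standard 16-bit and 32-bit RISC-V encodings; the function name is illustrative.

```python
def riscv_instruction_length(first_halfword: int) -> int:
    """Return the instruction length in bytes, given its first 16-bit parcel.

    Covers only the standard 16-bit (compressed) and 32-bit encodings.
    """
    if first_halfword & 0b11 != 0b11:
        # Lowest two bits are not 11: a 16-bit compressed instruction.
        return 2
    if (first_halfword >> 2) & 0b111 != 0b111:
        # Lowest two bits are 11 and bits [4:2] are not 111: a 32-bit instruction.
        return 4
    # Encodings longer than 32 bits are outside the scope of this sketch.
    raise ValueError("instruction longer than 32 bits; not handled here")


# c.nop is encoded as 0x0001; the 32-bit nop (addi x0, x0, 0) is 0x00000013,
# whose low halfword is 0x0013.
print(riscv_instruction_length(0x0001))
print(riscv_instruction_length(0x0013))
```

Two bit tests are enough here, which is one reason a wide RISC-V front-end is comparatively simple to build.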
Modern CPUs no longer require all transistors on a single monolithic die. Chiplet architectures connect multiple smaller dies via high-speed interconnects:
Chiplets solve a fundamental yield problem: a 500mm² monolithic die has much lower yield than eight 60mm² dies — defects that would fail the whole large die now fail only one small chiplet.
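The yield argument can be made concrete with the simple Poisson yield model, in which the probability that a die has zero defects is exp(-area × defect density). This is a textbook approximation, and the defect density of 0.1 defects per cm² used below is an assumed value for illustration, not a figure for any real process.

```python
import math

# Assumed defect density, for illustration only (defects per cm^2).
DEFECT_DENSITY_PER_CM2 = 0.1


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under the Poisson model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


monolithic = poisson_yield(500, DEFECT_DENSITY_PER_CM2)
chiplet = poisson_yield(60, DEFECT_DENSITY_PER_CM2)

print(f"500 mm^2 monolithic die yield: {monolithic:.1%}")
print(f"60 mm^2 chiplet yield:         {chiplet:.1%}")
```

Under these assumptions roughly 61% of the large dies are usable versus about 94% of the small chiplets. Because each chiplet is tested before packaging, a defect costs one 60 mm² die rather than the whole 500 mm² part.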
Three architectures dominate modern computing: x86-64 (complex, backward-compatible, dominant in desktops and servers), ARM (lean, power-efficient, dominant in mobile and rapidly gaining in laptops and servers), and RISC-V (open, royalty-free, growing in embedded systems and academic research). All three trace their lineage to fundamental architectural choices made in the 1970s–1990s and have evolved through silicon process node improvements from 10 microns to 3 nanometers. The chiplet revolution is extending Moore's Law-era progress beyond what's achievable on monolithic dies, enabling CPUs with tens of billions of transistors that deliver unprecedented performance at competitive power levels.