Your keyboard raises an interrupt every time a key is pressed or released. Your network interface card raises an interrupt for arriving Ethernet frames: one per frame in the simplest case, although modern NICs usually coalesce several frames into a single interrupt. Your CPU's local timer fires periodically, up to 1,000 times per second depending on kernel configuration, to give the scheduler a chance to preempt the current process. On a busy server, the interrupt rate reaches hundreds of thousands per second.

Without a mechanism to handle this efficiently, the CPU would need to constantly poll every device to check for activity, burning cycles asking "anything for me?" thousands of times per second. Early computers worked exactly this way. Polling held up until device counts and event rates made it untenable as the default.

Interrupts inverted the relationship: instead of the CPU asking devices if they need attention, devices tell the CPU when they do. This asynchronous notification mechanism is the foundation of responsive, efficient I/O. But building an interrupt handling system that is both fast and correct is one of the harder problems in systems programming — and Linux's solution, refined over three decades, is worth understanding in detail.

Interrupt Types

Hardware Interrupts (IRQs)

Hardware Interrupt Requests are asynchronous signals generated by physical devices. They arrive at the CPU with no relationship to what code is currently executing. There are two sub-categories:

Maskable interrupts can be temporarily disabled by the CPU. When the kernel executes cli (Clear Interrupt Flag) on x86-64, the core that executed it stops accepting maskable interrupts; they remain pending until sti (Set Interrupt Flag) re-enables them. Other cores are unaffected. The kernel uses this, through helpers such as local_irq_save() and local_irq_restore(), to protect short critical sections in which a re-entrant interrupt would corrupt state.
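The pending-until-re-enabled behaviour can be illustrated with a toy model. This is a sketch only: the class name ToyCPU and its methods are hypothetical, and real hardware delivers pending interrupts by priority rather than in arrival order.

```python
from collections import deque


class ToyCPU:
    """Toy model of one core's interrupt flag (IF). Not a hardware simulator."""

    def __init__(self):
        self.interrupts_enabled = True  # models IF = 1
        self.pending = deque()          # vectors raised while IF = 0
        self.handled = []               # vectors delivered, in delivery order

    def cli(self):
        """Model `cli`: stop accepting maskable interrupts on this core."""
        self.interrupts_enabled = False

    def sti(self):
        """Model `sti`: accept interrupts again and deliver anything pending."""
        self.interrupts_enabled = True
        while self.pending:
            self.handled.append(self.pending.popleft())

    def raise_irq(self, vector):
        """A device asserts a maskable interrupt with the given vector."""
        if self.interrupts_enabled:
            self.handled.append(vector)
        else:
            self.pending.append(vector)  # held back, not lost
```

In this model an interrupt raised between cli() and sti() is never dropped; it is simply delivered late, which is why long interrupt-disabled sections show up as latency rather than as lost events.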

Non-maskable interrupts (NMIs) are not affected by cli. They are reserved for genuinely critical hardware events such as uncorrectable memory errors reported by the platform, hardware watchdog timeouts, and power failure signals. On modern x86 systems, the Machine Check Architecture (MCA) reports most CPU and memory errors through a separate machine-check exception (vector 18) rather than through the NMI. The NMI handler must be extraordinarily careful: it can interrupt code that holds locks or has interrupts disabled, so it cannot assume any kernel state is consistent.

Software Interrupts (Traps)

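Traps are synchronous: they occur at a specific point in the executing instruction stream. Breakpoints (int3) and explicit software interrupts (int 0x80, the legacy 32-bit syscall mechanism) are traps delivered through the interrupt table. The syscall instruction used for 64-bit system calls enters the kernel through a dedicated entry point configured in a model-specific register rather than through the interrupt table, but it serves the same purpose. In every case the transition is deliberately triggered by the running code.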

Exceptions

Exceptions are synchronous signals from the CPU itself, generated when the processor cannot complete an instruction normally:

Divide Error (Vector 0): division by zero; the kernel delivers SIGFPE
Page Fault (Vector 14): access to an unmapped or protected page; triggers demand paging or segfault
General Protection Fault (Vector 13): privilege violation; user code attempted a Ring 0 instruction
Invalid Opcode (Vector 6): attempt to execute an unrecognized instruction

Page faults are the most frequent exception in a normal Linux system and are performance-critical — they occur every time a process accesses memory that has been swapped out or has not yet been faulted in (demand paging).
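The exceptions listed above can be collected into a small lookup table. The vector numbers come from the x86 architecture; the signal column is a simplification showing the usual outcome when user code triggers a fault the kernel cannot resolve, and the function name describe_exception is hypothetical.

```python
import signal

# vector -> (name, signal usually delivered to the offending user process)
EXCEPTIONS = {
    0: ("Divide Error", signal.SIGFPE),
    6: ("Invalid Opcode", signal.SIGILL),
    13: ("General Protection Fault", signal.SIGSEGV),
    # A page fault normally ends in demand paging, not a signal;
    # SIGSEGV applies only when the access is genuinely invalid.
    14: ("Page Fault", signal.SIGSEGV),
}


def describe_exception(vector):
    """Return a one-line description of a CPU exception vector."""
    if vector not in EXCEPTIONS:
        raise KeyError(f"vector {vector} is not in this illustrative table")
    name, sig = EXCEPTIONS[vector]
    return f"vector {vector}: {name} -> {sig.name}"
```

For example, describe_exception(0) produces a line naming the Divide Error and SIGFPE.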

The Interrupt Controller Hardware

APIC Architecture

On modern x86-64 systems, interrupts are managed by the Advanced Programmable Interrupt Controller (APIC) subsystem.

Local APIC: each CPU core has its own Local APIC. It handles per-core interrupts: the local timer (generates scheduling ticks), performance monitoring interrupts, and Inter-Processor Interrupts (IPIs — how one core signals another to flush TLB entries or trigger a task migration).

I/O APIC: one or more I/O APICs sit on the motherboard and receive interrupt lines from devices (keyboard, USB controller, NVMe drive). The I/O APIC's redirection table maps each device interrupt to a specific vector number and routes it to one or more CPU Local APICs. cat /proc/interrupts shows the routing: which IRQ went to which CPU, how many times.
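As a rough sketch of how the /proc/interrupts layout can be read programmatically, the function below splits the text into per-IRQ, per-CPU counts. The sample text is made up for illustration, and the function name parse_interrupts is an assumption, not part of any standard tool.

```python
# Illustrative sample only; real output has more columns and rows.
SAMPLE_TEXT = """\
           CPU0       CPU1
  0:         36          0   IO-APIC   2-edge      timer
 24:       1200      98000   PCI-MSI 524288-edge      eth0
NMI:          3          4   Non-maskable interrupts
ERR:          0
"""


def parse_interrupts(text):
    """Parse /proc/interrupts-style text.

    Returns (cpus, table) where table maps each row label to a dict
    holding the per-CPU counts and the free-form description.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    cpus = lines[0].split()
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Rows such as ERR carry fewer counts than there are CPUs.
        for field in fields[:len(cpus)]:
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = {
            "counts": counts,
            "description": " ".join(fields[len(counts):]),
        }
    return cpus, table
```

On a Linux machine the same function could be fed the contents of /proc/interrupts directly. In the sample, IRQ 24 shows almost all of its interrupts landing on CPU1, which is the kind of imbalance the routing discussion here is about.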

The IRQ balancing daemon (irqbalance), a user-space service shipped by most distributions, periodically adjusts IRQ affinity to distribute interrupt load across cores. On a heavily loaded server, a single core receiving all network interrupts becomes a bottleneck; irqbalance spreads them out.
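IRQ affinity is expressed as a hexadecimal CPU bitmask, with one bit per CPU and comma-separated 32-bit groups on larger machines. The helpers below are a minimal sketch of that encoding; the function names are hypothetical, and the code only builds and decodes strings rather than changing any system setting.

```python
def cpus_to_affinity_mask(cpus):
    """Encode a list of CPU numbers as an smp_affinity-style hex mask."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        mask |= 1 << cpu
    digits = f"{mask:x}"
    # Split into groups of 8 hex digits (32 bits), most significant first.
    groups = []
    while digits:
        groups.append(digits[-8:])
        digits = digits[:-8]
    return ",".join(reversed(groups))


def affinity_mask_to_cpus(mask):
    """Decode an smp_affinity-style hex mask into a sorted list of CPUs."""
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]
```

For instance, CPUs 0 and 2 encode to the mask 5, and CPU 32 needs a second group. Applying a mask means writing it to the IRQ's smp_affinity file as root, and some interrupts do not allow their affinity to be changed.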

The Interrupt Descriptor Table (IDT)

The IDT is an array of 256 gate descriptors. Each entry describes how to handle one interrupt vector. The CPU's IDTR register holds the base address and limit of this table.

Vectors 0–31: reserved for CPU exceptions (defined by Intel/AMD architecture)
Vector 2: NMI
Vector 14: Page Fault
Vectors 32–255: available for hardware interrupts and software use

Each IDT entry contains: the handler address, the code segment selector, the privilege level required to trigger this gate (prevents user code from issuing arbitrary software interrupts), and an interrupt stack table index (IST — allows using a separate known-good stack for NMI and double faults).
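To make the fields concrete, here is a sketch that packs and unpacks one 16-byte gate descriptor in the 64-bit layout. The helper names pack_gate and unpack_gate are hypothetical, and the handler address and selector used in any example are illustrative values, not taken from a real kernel.

```python
import struct

GATE_INTERRUPT = 0xE  # interrupt gate: the CPU clears IF on entry
GATE_TRAP = 0xF       # trap gate: IF is left unchanged

# offset[15:0], selector, IST, type/attributes, offset[31:16],
# offset[63:32], reserved. Little-endian, no padding, 16 bytes total.
_GATE_FORMAT = "<HHBBHII"


def pack_gate(handler, selector, dpl=0, ist=0, gate_type=GATE_INTERRUPT):
    """Build one 64-bit IDT gate descriptor as 16 raw bytes."""
    if not 0 <= handler < 1 << 64:
        raise ValueError("handler must fit in 64 bits")
    if not 0 <= dpl <= 3 or not 0 <= ist <= 7:
        raise ValueError("dpl must be 0..3 and ist 0..7")
    type_attr = 0x80 | (dpl << 5) | gate_type  # bit 7 is the present bit
    return struct.pack(
        _GATE_FORMAT,
        handler & 0xFFFF,
        selector,
        ist,
        type_attr,
        (handler >> 16) & 0xFFFF,
        (handler >> 32) & 0xFFFFFFFF,
        0,
    )


def unpack_gate(raw):
    """Decode 16 raw bytes back into the gate's fields."""
    low, selector, ist, type_attr, mid, high, _ = struct.unpack(_GATE_FORMAT, raw)
    return {
        "handler": low | (mid << 16) | (high << 32),
        "selector": selector,
        "ist": ist & 0x7,
        "dpl": (type_attr >> 5) & 0x3,
        "present": bool(type_attr & 0x80),
        "gate_type": type_attr & 0xF,
    }
```

Note how the handler address is scattered across three fields, and how the privilege level sits in two bits of the attribute byte: a gate with dpl=3 can be triggered from user mode, while one with dpl=0 cannot.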

Linux's Two-Half Interrupt Architecture

Linux divides interrupt processing into two distinct phases. This is the most important design decision in Linux's interrupt handling and the source of its scalability.

Top Half: The Hardirq Context

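When a hardware interrupt fires, the CPU looks up the vector in the IDT and jumps to the entry stub with further maskable interrupts disabled on that core. Linux's common entry code saves registers, masks or acknowledges the interrupt line at the controller as needed (to prevent re-entrancy), and calls the registered handler, a function of type irq_handler_t.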

The top half executes in interrupt context — a special state with strict rules:

Cannot sleep (no blocking operations, no memory allocation with GFP_KERNEL)
Cannot be preempted by the scheduler
Must execute with minimal latency

Because other interrupts may be masked during this window, a slow top half increases overall interrupt latency for the entire system. The canonical top half does exactly two things: saves the data the hardware has ready (a network packet buffer, a keyboard scancode), and signals the hardware that the interrupt has been received (the "ACK" — required or the device will not generate future interrupts).

Bottom Half: Deferred Work

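All actual processing (parsing network packets, updating disk I/O accounting, running timer callbacks) happens in the bottom half, which runs with hardware interrupts enabled. How relaxed the rules are depends on the mechanism: softirqs and tasklets still may not sleep, while workqueues run in ordinary process context.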

Linux has three mechanisms for bottom-half processing:

Softirqs: the oldest and fastest mechanism. There are 10 fixed softirq types, compiled into the kernel. NET_RX_SOFTIRQ processes received network packets. BLOCK_SOFTIRQ handles block I/O completions. TIMER_SOFTIRQ runs expired timers. Softirqs can run on multiple CPUs simultaneously (they are designed for this) and run in ksoftirqd kernel threads or inline after the hardirq handler returns.
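Softirqs are tracked with a per-CPU pending bitmask: raising one sets a bit, and the pending handlers later run in index order. The toy model below sketches that idea for a single CPU. The class SoftirqCPU is hypothetical; its method names mirror the kernel's open_softirq() and raise_softirq(), but the real implementation also limits restarts and hands leftover work to ksoftirqd.

```python
# Softirq indices in the order the kernel defines them.
SOFTIRQ_NAMES = [
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
]
TIMER = SOFTIRQ_NAMES.index("TIMER")
NET_RX = SOFTIRQ_NAMES.index("NET_RX")


class SoftirqCPU:
    """Toy model of one CPU's softirq state."""

    def __init__(self):
        self.pending = 0    # one bit per softirq type
        self.handlers = {}  # softirq index -> callable

    def open_softirq(self, nr, handler):
        """Register the handler for a softirq type."""
        self.handlers[nr] = handler

    def raise_softirq(self, nr):
        """Mark a softirq pending; raising it twice sets the same bit."""
        self.pending |= 1 << nr

    def do_softirq(self):
        """Run every pending handler once, lowest index first."""
        pending, self.pending = self.pending, 0  # snapshot, then clear
        nr = 0
        while pending:
            if pending & 1:
                self.handlers[nr]()
            pending >>= 1
            nr += 1
```

Because the state is a bitmask, raising NET_RX three times before the handlers run results in one handler invocation, which is why softirq handlers are written to drain a queue rather than to process a single event.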

Tasklets: built on top of softirqs. Unlike softirqs, a given tasklet only runs on one CPU at a time (serialized), making them easier to write correctly. They are considered legacy; newer code is steered toward workqueues and threaded interrupt handlers.

Workqueues: run in kernel thread context (a real schedulable task, with a task_struct). Work items can sleep, block on I/O, and allocate memory with GFP_KERNEL. The system_wq workqueue is the default. Drivers use schedule_work() to queue functions for deferred execution.
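The workqueue idea, a queue of functions drained by a worker thread that is allowed to block, can be sketched in user space. ToyWorkqueue and log_packet are hypothetical names for illustration; the schedule_work method only borrows the kernel function's name and is not its API.

```python
import queue
import threading
import time


class ToyWorkqueue:
    """User-space analogy of a workqueue with a single worker thread."""

    def __init__(self):
        self._items = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            func, args = self._items.get()
            try:
                func(*args)  # may sleep: this is an ordinary thread
            finally:
                self._items.task_done()

    def schedule_work(self, func, *args):
        """Queue a function for deferred execution and return immediately."""
        self._items.put((func, args))

    def flush(self):
        """Block until every queued item has finished."""
        self._items.join()


def log_packet(store, packet):
    """Hypothetical deferred work that sleeps, which a hardirq handler may not."""
    time.sleep(0.001)  # stands in for blocking I/O
    store.append(packet)
```

The caller of schedule_work() returns at once, which is the property an interrupt handler needs; the cost is that the work runs only when the scheduler gets around to the worker thread.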

The split exists because interrupt latency is a cascading problem: if a top half takes 500µs, interrupts of that type (and often others) are delayed by 500µs systemwide. The two-half design keeps the critical path minimal.
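The division of labour can be sketched with a toy driver. Everything here is hypothetical (ToyNic, ToyDriver, and the uppercase "parsing" step); the point is only the shape: the top half copies data out and acknowledges the device, and the bottom half does the slow part later.

```python
from collections import deque


class ToyNic:
    """Hypothetical device with a single receive register."""

    def __init__(self):
        self.rx_register = None
        self.irq_asserted = False

    def receive(self, frame):
        """A frame arrives: latch it and assert the interrupt line."""
        self.rx_register = frame
        self.irq_asserted = True


class ToyDriver:
    def __init__(self, nic):
        self.nic = nic
        self.backlog = deque()  # filled by the top half
        self.parsed = []        # filled by the bottom half

    def top_half(self):
        """Hardirq work only: save the data, then acknowledge the device."""
        self.backlog.append(self.nic.rx_register)
        self.nic.irq_asserted = False  # the "ACK"

    def bottom_half(self):
        """Deferred work: the comparatively slow processing happens here."""
        while self.backlog:
            frame = self.backlog.popleft()
            self.parsed.append(frame.decode("ascii").upper())
```

Two frames can be accepted back to back by top_half() before any parsing happens, so the device is never kept waiting on the expensive step.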

Real-Time Linux and Interrupt Latency

On a standard Linux kernel, interrupt latency typically stays within roughly 100µs under load, but this is not a guaranteed bound: spikes into the milliseconds are possible when the kernel holds spinlocks or executes in non-preemptible code paths.

The PREEMPT_RT patch set (merged into mainline incrementally over many releases, with the final pieces landing in Linux 6.12) converts most spinlocks into sleeping locks built on RT-mutexes (so their holders can be preempted), runs hardirq handlers in kernel threads (allowing preemption), and moves most softirq processing into preemptible thread context. On well-tuned hardware this typically brings worst-case latency down to well under 100µs.

PREEMPT_RT is used in:

Professional audio production (low-latency JACK setups are commonly run on real-time kernels)
Industrial robots (KUKA uses RT Linux)
Surgical systems (DaVinci surgical robot)
Automotive (some Linux-based infotainment systems with safety partitioning)

The cyclictest tool measures scheduling latency by running a high-priority thread that measures the difference between requested and actual wakeup time. A well-configured RT Linux system shows maximum latency under 100µs; a standard kernel may show 1ms+ spikes.
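The measurement cyclictest performs can be approximated in a few lines: sleep until an absolute deadline, then record how late the wakeup was. This is a sketch of the idea only. The function names are hypothetical, and a Python loop at normal priority includes interpreter and scheduler overhead that the real tool avoids, so the numbers are not comparable to cyclictest output.

```python
import time


def measure_wakeup_latency(interval_us=1000, loops=50):
    """Return a list of wakeup latencies in microseconds."""
    interval_ns = interval_us * 1000
    latencies = []
    deadline = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        remaining_ns = deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        woke_at = time.monotonic_ns()
        # Latency is the gap between requested and actual wakeup time.
        latencies.append((woke_at - deadline) / 1000)
        deadline += interval_ns  # absolute deadlines, so error does not pile up
    return latencies


def summarize(latencies):
    """Return (min, average, max) of the measured latencies."""
    return min(latencies), sum(latencies) / len(latencies), max(latencies)
```

The maximum is the figure that matters for real-time work: a system with a low average but an occasional multi-millisecond spike still misses deadlines.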

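Interrupt Handling Flow Diagram

Device raises IRQ → I/O APIC maps it to a vector and routes it → Local APIC delivers it to the core → CPU looks up the vector in the IDT → top half (hardirq) saves the data and acknowledges the device → bottom half (softirq, tasklet, or workqueue) does the remaining processing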

Interrupt Type Reference Table

Interrupt Type	Source	Maskable?	Handler Context	Linux Mechanism	Typical Latency
Hardware IRQ (maskable)	NIC, keyboard, USB, timer	Yes (`cli`/`sti`)	Hardirq (interrupt context)	`request_irq()`, `request_threaded_irq()`	1–100 µs
NMI	Hardware watchdog, ECC error	No	NMI context (strict)	`register_nmi_handler()`	< 1 µs (priority)
Page Fault (exception)	MMU on bad memory access	N/A (synchronous)	Exception context	`exc_page_fault()` (`do_page_fault()` in older kernels) → `handle_mm_fault()`	1–100 µs (+ disk I/O if swap)
General Protection Fault	Privilege violation	N/A (synchronous)	Exception context	Deliver `SIGSEGV` to process	< 1 µs
Softirq	Raised by top half	Effectively (disabled per-CPU)	Softirq context (no sleep)	`raise_softirq()`, `ksoftirqd`	10–200 µs
Workqueue	Kernel code	N/A (thread)	Process context (can sleep)	`schedule_work()`	Milliseconds (thread-scheduled)

Key Takeaways

The interrupt subsystem is where hardware asynchrony meets software concurrency, and where the costs of "handling everything" become visible. Every interrupt that fires displaces whatever was executing, evicts part of its working set from the caches, and forces the CPU to save and later restore state. At 100,000 interrupts per second on a network-heavy server, this overhead is measurable.
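A back-of-the-envelope calculation shows why. The per-interrupt cost used below (2,000 ns for entry, handler, exit, and cache refill) is an assumed figure for illustration, not a measurement, and the function name is hypothetical.

```python
def interrupt_cpu_fraction(rate_per_second, cost_ns):
    """Fraction of one core spent servicing interrupts."""
    return rate_per_second * cost_ns / 1e9


# Assumed cost: 2,000 ns per interrupt.
fraction = interrupt_cpu_fraction(100_000, 2_000)
print(f"{fraction:.0%} of one core")
```

Under that assumption, 100,000 interrupts per second consume about a fifth of a core before any useful processing happens, which is the motivation for techniques such as interrupt coalescing and spreading IRQs across cores.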

The top-half / bottom-half split is the architectural response to this: keep the latency-critical path minimal, defer everything else. The further distinction between softirqs (fast, no sleep), tasklets (serialized), and workqueues (full process context) gives driver authors a spectrum of options matched to their actual needs.

Knowing how to read /proc/interrupts, how to identify interrupt storms with watch -n 1 cat /proc/interrupts, and how to tune IRQ affinity with irqbalance or manual /proc/irq/N/smp_affinity settings is practical knowledge for any serious Linux systems work.
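Spotting a storm comes down to comparing two snapshots of the counters and turning the difference into a rate. The sketch below assumes each snapshot has already been reduced to a mapping from IRQ label to total count; the function names and the threshold are hypothetical and would need tuning for a real workload.

```python
def interrupt_rates(before, after, interval_s):
    """Per-IRQ interrupts per second between two count snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for label, count in after.items():
        delta = count - before.get(label, 0)
        rates[label] = delta / interval_s
    return rates


def find_storms(rates, threshold_per_s):
    """Labels whose rate exceeds the chosen threshold, sorted by name."""
    return sorted(label for label, rate in rates.items() if rate > threshold_per_s)
```

An IRQ that jumps from a few hundred to several hundred thousand interrupts per second between samples is the pattern to look for.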
