AiTechWorlds
AiTechWorlds
Before you ever touched a calculator, a teacher showed you how to add two numbers on paper:
47
+ 35
────
82
You added the ones column: 7 + 5 = 12. You wrote the 2, carried the 1 to the tens column. Then 4 + 3 + 1 (carry) = 8. Done.
You followed a procedure — a physical algorithm carried out by your hand and brain. A half adder performs exactly this carry-and-sum operation in hardware, using just two logic gates. A full adder handles the carry-in too. And a chain of full adders is the addition circuit inside every processor running on the planet right now.
Combinational circuits are circuits whose outputs depend only on the current inputs — no memory, no history. Build them right and they compute instantly (well, within a few hundred picoseconds).
Before a CPU can execute 5 + 3, that addition must be physically realized in silicon. Combinational circuits are the computational fabric of a processor:
if (a > b)Understanding these building blocks shows you how abstract arithmetic becomes physical electron flow.
A half adder adds two single bits (A and B) and produces:
| A | B | Sum | Carry |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 1 |
Looking at the truth table: Sum = A XOR B, Carry = A AND B.
Two gates. That's it. A half adder is an XOR gate and an AND gate working in parallel on the same inputs.
Why "half"? It can only add two bits, not three. When chaining multiple adder stages, each stage must also accept the carry from the previous stage — requiring a full adder.
A full adder adds A + B + Carry-In and produces Sum + Carry-Out.
| A | B | Cin | Sum | Cout |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 |
Implementation: Two half adders chained together plus an OR gate:
S1 = A XOR B, C1 = A AND BSum = S1 XOR Cin, C2 = S1 AND CinCout = C1 OR C2Total: 5 gates (2 XOR, 2 AND, 1 OR).
Chain N full adders together and you have an N-bit adder. The carry-out of each stage becomes the carry-in of the next — it ripples forward.
A 4-bit ripple carry adder adds A[3:0] + B[3:0]:
FA[0]: A0, B0, Cin=0 → S0, C0
FA[1]: A1, B1, Cin=C0 → S1, C1
FA[2]: A2, B2, Cin=C1 → S2, C2
FA[3]: A3, B3, Cin=C2 → S3, Cout
The problem: The final result isn't valid until the carry ripples through all stages. For a 32-bit adder, that's 32 × (full adder delay) — too slow for high-performance CPUs operating at 5+ GHz.
The CLA solves the ripple delay by pre-computing carries in parallel.
For each bit position, define:
G_i = A_i AND B_i — this stage will always generate a carryP_i = A_i XOR B_i — this stage will pass a carry through if receivedThen carries can be computed directly:
C1 = G0 + P0·C0C2 = G1 + P1·G0 + P1·P0·C0C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0All carries are now computed in 2 gate delays regardless of bit width. Intel's x86 ALUs use CLA structures (organized in 4-bit blocks) to keep addition fast.
A multiplexer selects one of N data inputs and forwards it to the output, based on select control lines.
A 2:1 MUX (2 inputs, 1 output, 1 select line):
Y = (A AND NOT S) OR (B AND S)
| S | Y |
|---|---|
| 0 | A |
| 1 | B |
A 4:1 MUX needs 2 select lines (2² = 4 combinations). An 8:1 MUX needs 3 select lines.
CPU use: Multiplexers sit everywhere in the datapath — selecting between the ALU output and memory data, choosing which register to read, routing operands to functional units.
A DEMUX does the opposite of a MUX: takes one input and routes it to one of N outputs based on select lines.
A 1:4 DEMUX (1 input, 4 outputs, 2 select lines):
Use: Routing a shared bus to multiple devices.
A 2-to-4 decoder takes a 2-bit binary input and activates exactly one of four outputs:
| A | B | Y0 | Y1 | Y2 | Y3 |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 | 0 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 | 0 | 1 |
A 3-to-8 decoder decodes 3 bits into 8 outputs (used in RAM address decoding — 3 address bits select 1 of 8 memory banks).
CPU use: The instruction decoder reads the opcode field (e.g., 6 bits → 64 possible instructions) and activates the corresponding control signal.
The inverse of a decoder. Priority encoder: if multiple inputs are active, the highest-priority input wins.
Use: Interrupt controllers — when multiple devices request CPU attention simultaneously, the priority encoder determines which gets serviced first.
A 1-bit comparator checks equality: A XNOR B (output 1 if equal).
An N-bit comparator cascades 1-bit equality checks. For A > B:
Flags produced: Equal (=), Greater Than (>), Less Than (<)
CPU use: Branch instructions (if a > b goto label) use comparator outputs to decide whether to jump.
| Circuit | Inputs | Outputs | Core Function | Real CPU Use |
|---|---|---|---|---|
| Half Adder | A, B | Sum, Carry | Adds 2 bits | Building block for full adder |
| Full Adder | A, B, Cin | Sum, Cout | Adds 3 bits | ALU integer addition |
| Ripple Carry Adder | A[n], B[n], Cin | S[n], Cout | N-bit addition (slow) | Simple embedded CPUs |
| Carry Lookahead Adder | A[n], B[n], Cin | S[n], Cout | N-bit addition (fast) | Intel/AMD x86 ALU |
| Multiplexer (MUX) | D[n], Select | Y | Route one of N inputs | Datapath routing, register selection |
| Demultiplexer (DEMUX) | D, Select | Y[n] | Route input to one of N outputs | Bus-to-device routing |
| Decoder | A[n] | Y[2^n] | Binary → one-hot | Instruction decode, memory select |
| Encoder | Y[2^n] | A[n] | One-hot → binary | Interrupt priority |
| Comparator | A[n], B[n] | EQ, GT, LT | Numerical comparison | Branch decisions, sorting |
==, >, < at the hardware level — used by every conditional branch instructionGet this course's notes on Telegram!
Free cheat sheets, summaries & practice exercises