Intel 8086
The Intel 8086 is a landmark 16‑bit microprocessor introduced by Intel in 1978. It was the first in the x86 family and established many architectural conventions that continue to influence modern personal computing.
With a hybrid internal design—featuring 16‑bit registers and a 16‑bit arithmetic logic unit (ALU) paired with a 20‑bit external address bus—the 8086 could directly address 1 megabyte of memory, a significant leap over its 8‑bit predecessors.
Many CPUs were in use in the 1980s, but the vast majority of home and office microcomputers used one of four: a MOS 6502 (or one of its variants), a Zilog Z80, an early member of the Intel 8086 family, or a Motorola 68000.
History
After the release of the Intel 8080 CPU, Intel began working on the iAPX 432 project. It was an ambitious 32‑bit design—aimed at supporting advanced, high‑level programming features in hardware—which took several years and a large team to develop, partly because it awaited further improvements in chip density per Moore’s Law.
Meanwhile, to quickly counter the competition, Intel rushed a simpler, lower‑risk design: the 8086. This chip, developed as an incremental evolution of the 8080 and managed by a separate team, was ready for mass market in 1978.
The chip’s design was partly influenced by the need to maintain some backward compatibility with 8‑bit software while also providing a richer instruction set for high‑level languages such as Pascal and PL/M.
Although the IBM PC later used the nearly identical 8088 (which featured an 8‑bit external data bus for cost savings), the 8086 itself became the architectural blueprint for the x86 family, directly influencing later processors.
As for the iAPX 432, it turned out to be a commercial failure and was discontinued in 1986. Intel then ventured into RISC CPUs in the late 1980s with the i860 and i960, but that effort was ultimately unsuccessful.
Architecture
Most sources state that the 8086 has about 29,000 transistors, but it actually has only 19,618. Source
To put it into perspective, 64KB of DRAM contains 524,288 transistors, as 1 bit of DRAM needs 1 transistor.
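The comparison can be checked with a few lines of arithmetic. This is only a sketch: both transistor counts and the one-transistor-per-bit figure come from the text above, and the variable names are illustrative.

```python
# Figures taken from the text above; the names are illustrative only.
CPU_TRANSISTORS = 19_618       # transistor count quoted for the 8086
DRAM_BYTES = 64 * 1024         # 64KB of DRAM
TRANSISTORS_PER_BIT = 1        # one transistor per DRAM bit

dram_transistors = DRAM_BYTES * 8 * TRANSISTORS_PER_BIT
ratio = dram_transistors / CPU_TRANSISTORS

print(dram_transistors)        # 524288
print(round(ratio, 1))         # 26.7
```

On these figures, 64KB of DRAM holds roughly 27 times as many transistors as the CPU itself.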
Fun fact: The original IBM PC came with 16KB of memory. Source
Microcode
Whereas the Z80 and the 6502 CPUs use a Decode ROM (PLA), the 8086 uses microcode instead.
To execute a machine instruction, the computer internally executes several simpler micro-instructions, specified by the microcode. In other words, microcode forms another layer between the machine instructions and the hardware.
The 8086's microcode ROM holds 512 micro-instructions, each 21 bits wide. The microcode engine is assisted by two smaller ROMs: the "Group Decode ROM" to categorize machine instructions, and the "Translation ROM" to branch to microcode subroutines for address calculation and other roles.
See: Group Decode ROM viewer; How the 8086 processor's microcode engine works; 8086 microcode disassembled
Reverse-engineering the: multiplication algorithm; division microcode; string operations; conditional jumps; register codes; ModR/M addressing microcode; instruction length; flags circuitry; interrupt circuitry; HALT circuitry; ALU circuitry (all in the Intel 8086 processor)
Block Diagrams
Internally, the 8086 features a 16‑bit Execution Unit (EU) that performs arithmetic, logic, and control functions, while simultaneously a separate Bus Interface Unit (BIU) handles all data transfers and external communications.
The BIU includes a 6‑byte prefetch queue (4-byte for 8088). The EU fetches instructions from the prefetch queue (not directly from memory). It has no direct connection to the external system bus, relying entirely on the BIU for data and instruction access.
Because the EU and BIU operate independently, the BIU keeps fetching further instruction bytes to keep the queue filled while the EU decodes and executes the instructions already fetched.
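The split between the two units can be pictured with a toy model. The sketch below is not how the hardware is implemented: it ignores bus timing and fetch width entirely, and the class and method names (`PrefetchQueue`, `biu_fill`, `eu_next_byte`) are hypothetical. It only illustrates that the EU takes bytes from the queue while the BIU is the sole reader of memory.

```python
from collections import deque


class PrefetchQueue:
    """Toy model of the BIU instruction queue (6 bytes on the 8086, 4 on the 8088)."""

    def __init__(self, memory: bytes, capacity: int = 6):
        self.memory = memory
        self.capacity = capacity
        self.fetch_addr = 0      # next memory address the BIU will read
        self.queue = deque()

    def biu_fill(self) -> None:
        """BIU side: read ahead until the queue is full or memory runs out."""
        while len(self.queue) < self.capacity and self.fetch_addr < len(self.memory):
            self.queue.append(self.memory[self.fetch_addr])
            self.fetch_addr += 1

    def eu_next_byte(self) -> int:
        """EU side: take the next instruction byte from the queue, never from memory."""
        if not self.queue:
            self.biu_fill()      # stands in for the EU waiting on the BIU
        return self.queue.popleft()


code = bytes([0xB8, 0x34, 0x12, 0x01, 0xD8, 0x90, 0x90, 0xF4])
q = PrefetchQueue(code)
q.biu_fill()
print(len(q.queue))              # 6
print(hex(q.eu_next_byte()))     # 0xb8
q.biu_fill()
print(len(q.queue))              # 6 again: the BIU refilled the freed slot
```

Passing `capacity=4` gives the 8088-sized queue; everything on the EU side stays the same.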
Only the BIU differs between the 8088 and 8086. As the EU is the same for both, the programming instructions are exactly the same for each. Programs written for the 8086 can be run on the 8088 without any changes.
See: The 8086 processor's microcode pipeline from die analysis; Intel 8088 processor's instruction prefetch circuitry; Inside the Intel 8088 processor's bus interface state machine
Memory Segmentation
To overcome the 16‑bit limitation of its registers while still addressing 1 MB of memory, the 8086 employs a segmented memory model.
In this scheme, the BIU forms memory addresses by shifting a 16‑bit segment register four bits to the left and then adding a 16‑bit offset. This results in a 20‑bit physical address.
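The shift-and-add calculation can be sketched in a few lines. The function name `physical_address` is hypothetical. Masking the result to 20 bits reflects the assumption that a sum past 1 MB wraps around, since the chip has only 20 address lines.

```python
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a 16-bit segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment left by four bits (multiply by 16), add the offset,
    # and keep only the low 20 bits (assumed wraparound past 1 MB).
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0x1235, 0x0000)))  # 0x12350
print(hex(physical_address(0xFFFF, 0x0010)))  # 0x0
```

The first two calls show that different segment:offset pairs can name the same physical byte, because segments overlap every 16 bytes.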
Although the model is often seen as complex, it allowed a small program (one fitting within a single 64KB segment) to be loaded at any 16-byte boundary simply by setting its segment registers, with its offsets left unchanged, which simplified relocation in many cases.
See: Reverse-engineering the 8086 processor's address and data pin circuits
Register File
Register | Size | Description | Notes |
---|---|---|---|
AX (Accumulator) | 16-bit | Primary register for arithmetic, logic, I/O. | Can be accessed as two 8-bit registers: AH (High) and AL (Low). Often an implied operand. |
BX (Base) | 16-bit | General-purpose, often used as a base pointer for memory addressing. | Can be accessed as BH and BL. The only one of AX, BX, CX, and DX usable as an address register in memory addressing (e.g., `[BX]`). |
CX (Count) | 16-bit | General-purpose, often used as a loop counter (`LOOP` instruction) and for string operations (`REP` prefixes). | Can be accessed as CH and CL. |
DX (Data) | 16-bit | General-purpose, used for I/O port addressing (`IN`, `OUT`), and holds high word in 16x16 multiplication / 32/16 division. | Can be accessed as DH and DL. |
SP (Stack Pointer) | 16-bit | Points to the top of the current stack (offset within SS). | Used implicitly by `PUSH`, `POP`, `CALL`, `RET`, interrupts. |
BP (Base Pointer) | 16-bit | Points to data within the stack segment (offset within SS). | Often used to access function parameters and local variables on the stack. |
SI (Source Index) | 16-bit | Used as a source pointer offset (usually within DS) for string operations. Can be used as a general-purpose index register. | Default segment is DS, can be overridden. |
DI (Destination Index) | 16-bit | Used as a destination pointer offset (within ES) for string operations. Can be used as a general-purpose index register. | For string ops the segment is always ES and cannot be overridden. As an ordinary index register the default segment is DS, which can be overridden. |
IP (Instruction Pointer) | 16-bit | Holds the offset address of the next instruction to be executed within the current Code Segment (CS). | Analogous to Program Counter (PC). Cannot be directly manipulated by most instructions (modified by jumps, calls, etc.). Physical address = (CS * 16) + IP. |
FLAGS | 16-bit | Contains status and control flags. Status flags: bit 0 CF (Carry), bit 2 PF (Parity), bit 4 AF (Auxiliary Carry), bit 6 ZF (Zero), bit 7 SF (Sign), bit 11 OF (Overflow). Control flags: bit 8 TF (Trap), bit 9 IF (Interrupt Enable), bit 10 DF (Direction). Other bits are undefined/reserved in the 8086. | AF used for BCD arithmetic. DF controls string op direction (inc/dec SI/DI). TF enables single-stepping. IF enables maskable interrupts. |
CS (Code Segment) | 16-bit | Points to the base address of the current code segment. | Used with IP to find the next instruction. |
DS (Data Segment) | 16-bit | Points to the base address of the current data segment. | Default segment for most data access. |
SS (Stack Segment) | 16-bit | Points to the base address of the current stack segment. | Used with SP and BP. |
ES (Extra Segment) | 16-bit | Points to the base address of an extra data segment. | Often used as the destination segment for string operations (with DI). |
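The bit positions listed in the FLAGS row can be turned into a small decoder. This is a minimal sketch: the names `FLAG_BITS` and `decode_flags` are hypothetical, and the reserved bits are simply ignored.

```python
# Bit positions as listed in the FLAGS row of the register table.
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,   # status flags
    "TF": 8, "IF": 9, "DF": 10,                    # control flags
    "OF": 11,                                      # status flag
}


def decode_flags(flags: int) -> dict:
    """Map each flag name to True/False for a 16-bit FLAGS value."""
    return {name: bool((flags >> bit) & 1) for name, bit in FLAG_BITS.items()}


value = 0x0245  # binary 0000 0010 0100 0101
print([name for name, is_set in decode_flags(value).items() if is_set])
# ['CF', 'PF', 'ZF', 'IF']
```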
Instruction Set
As a complex instruction set computer (CISC), the 8086 supports a rich array of operations, including multiple addressing modes such as register, immediate, and memory addressing.
The 8086's instruction set was designed with a new concept, the "ModR/M" byte, which usually follows the opcode byte. The ModR/M byte specifies the memory addressing mode and the register (or registers) to use, allowing that information to be moved out of the opcode.
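To make the ModR/M byte concrete, here is a sketch of how its fields can be split out. The layout is the standard encoding: a 2-bit `mod` field, a 3-bit `reg` field, and a 3-bit `r/m` field. The function names are hypothetical, and the register names assume a 16-bit (word) operand; byte-sized instructions use 8-bit register names instead.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
RM_BASE = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]


def split_modrm(byte: int) -> tuple:
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def describe_modrm(byte: int) -> tuple:
    """Return (register operand, register-or-memory operand) as text."""
    mod, reg, rm = split_modrm(byte)
    if mod == 0b11:
        operand = REG16[rm]                  # r/m names a register
    elif mod == 0b00 and rm == 0b110:
        operand = "[disp16]"                 # special case: direct address
    else:
        disp = {0b00: "", 0b01: "+disp8", 0b10: "+disp16"}[mod]
        operand = f"[{RM_BASE[rm]}{disp}]"
    return REG16[reg], operand


print(describe_modrm(0xD8))  # ('BX', 'AX')
print(describe_modrm(0x47))  # ('AX', '[BX+disp8]')
print(describe_modrm(0x06))  # ('AX', '[disp16]')
```

Which of the two operands is the source and which the destination is decided by the opcode, not by the ModR/M byte itself.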
Although most operations execute on 16‑bit operands, the chip allows manipulation of 8‑bit data as well—an important feature for compatibility with legacy 8‑bit software.
See: Complete 8086 instruction set; Tracing the roots of the 8086 instruction set to the Datapoint 2200 minicomputer
Flags affected (* = changed according to the result, 0 = cleared, 1 = set, U = undefined, – = unchanged):

Mnemonic | Description | Operation | OF | SF | ZF | AF | PF | CF |
---|---|---|---|---|---|---|---|---|
AAA | ASCII Adjust After Addition | Adjust AL after BCD addition | U | U | U | * | U | * |
AAD | ASCII Adjust Before Division | Adjust AX before BCD division | U | * | * | U | * | U |
AAM | ASCII Adjust After Multiply | Adjust AX after BCD multiplication | U | * | * | U | * | U |
AAS | ASCII Adjust After Subtraction | Adjust AL after BCD subtraction | U | U | U | * | U | * |
ADC | Add with Carry | Destination + Source + CF → Destination | * | * | * | * | * | * |
ADD | Add | Destination + Source → Destination | * | * | * | * | * | * |
AND | Logical AND | Destination ∧ Source → Destination | 0 | * | * | U | * | 0 |
CALL | Call Procedure | Push IP (and CS); Target → IP (and CS) | – | – | – | – | – | – |
CBW | Convert Byte to Word | Sign extend AL into AH | – | – | – | – | – | – |
CLC | Clear Carry Flag | 0 → CF | – | – | – | – | – | 0 |
CLD | Clear Direction Flag | 0 → DF | – | – | – | – | – | – |
CLI | Clear Interrupt Flag | 0 → IF | – | – | – | – | – | – |
CMC | Complement Carry Flag | ¬CF → CF | – | – | – | – | – | * |
CMP | Compare | Destination - Source (Flags set, result discarded) | * | * | * | * | * | * |
CMPSB | Compare String Byte | Compare byte [DS:SI] with [ES:DI]; Update SI, DI | * | * | * | * | * | * |
CMPSW | Compare String Word | Compare word [DS:SI] with [ES:DI]; Update SI, DI | * | * | * | * | * | * |
CWD | Convert Word to Double Word | Sign extend AX into DX:AX | – | – | – | – | – | – |
DAA | Decimal Adjust After Addition | Adjust AL after packed BCD addition | U | * | * | * | * | * |
DAS | Decimal Adjust After Subtraction | Adjust AL after packed BCD subtraction | U | * | * | * | * | * |
DEC | Decrement by 1 | Destination - 1 → Destination | * | * | * | * | * | – |
DIV | Unsigned Divide | AX / Src(Byte) → AL (Q), AH (R); DX:AX / Src(Word) → AX (Q), DX (R) | U | U | U | U | U | U |
ESC | Escape (to coprocessor) | Used for floating-point/coprocessor instructions | – | – | – | – | – | – |
HLT | Halt | Halt processor until interrupt or reset | – | – | – | – | – | – |
IDIV | Signed Divide | AX / Src(Byte) → AL (Q), AH (R); DX:AX / Src(Word) → AX (Q), DX (R) | U | U | U | U | U | U |
IMUL | Signed Multiply | AL * Src(Byte) → AX; AX * Src(Word) → DX:AX | * | U | U | U | U | * |
IN | Input from Port | Port → AL or AX | – | – | – | – | – | – |
INC | Increment by 1 | Destination + 1 → Destination | * | * | * | * | * | – |
INT | Interrupt | Push Flags, CS, IP; clear TF, IF; Vector → CS:IP | – | – | – | – | – | – |
INTO | Interrupt on Overflow | If OF=1 then INT 4 (clears TF, IF if taken) | – | – | – | – | – | – |
IRET | Interrupt Return | Pop IP, CS, Flags | * | * | * | * | * | * |
Jcc | Conditional Jump (e.g., JE, JNE, JG...) | If condition is met then IP + disp → IP | – | – | – | – | – | – |
JMP | Unconditional Jump | Target → IP (and possibly CS) | – | – | – | – | – | – |
LAHF | Load AH from Flags | Low byte of Flags → AH | – | – | – | – | – | – |
LDS | Load Pointer using DS | mem → reg; mem+2 → DS | – | – | – | – | – | – |
LEA | Load Effective Address | Effective Address of Source → Destination Register | – | – | – | – | – | – |
LES | Load Pointer using ES | mem → reg; mem+2 → ES | – | – | – | – | – | – |
LOCK | Lock Bus Prefix | Assert LOCK# signal during next instruction | – | – | – | – | – | – |
LODSB | Load String Byte | [DS:SI] → AL; Update SI | – | – | – | – | – | – |
LODSW | Load String Word | [DS:SI] → AX; Update SI | – | – | – | – | – | – |
LOOP | Loop | CX - 1 → CX; If CX ≠ 0 then Jump | – | – | – | – | – | – |
LOOPE / LOOPZ | Loop while Equal / Zero | CX - 1 → CX; If CX ≠ 0 and ZF=1 then Jump | – | – | – | – | – | – |
LOOPNE / LOOPNZ | Loop while Not Equal / Not Zero | CX - 1 → CX; If CX ≠ 0 and ZF=0 then Jump | – | – | – | – | – | – |
MOV | Move | Source → Destination | – | – | – | – | – | – |
MOVSB | Move String Byte | Move byte [DS:SI] to [ES:DI]; Update SI, DI | – | – | – | – | – | – |
MOVSW | Move String Word | Move word [DS:SI] to [ES:DI]; Update SI, DI | – | – | – | – | – | – |
MUL | Unsigned Multiply | AL * Src(Byte) → AX; AX * Src(Word) → DX:AX | * | U | U | U | U | * |
NEG | Negate (Two's Complement) | 0 - Destination → Destination | * | * | * | * | * | * |
NOP | No Operation | No operation | – | – | – | – | – | – |
NOT | Logical NOT (One's Complement) | ¬Destination → Destination | – | – | – | – | – | – |
OR | Logical OR | Destination ∨ Source → Destination | 0 | * | * | U | * | 0 |
OUT | Output to Port | AL or AX → Port | – | – | – | – | – | – |
POP | Pop Word from Stack | [SS:SP] → Destination; SP + 2 → SP | – | – | – | – | – | – |
POPF | Pop Flags from Stack | [SS:SP] → Flags; SP + 2 → SP | * | * | * | * | * | * |
PUSH | Push Word onto Stack | SP - 2 → SP; Source → [SS:SP] | – | – | – | – | – | – |
PUSHF | Push Flags onto Stack | SP - 2 → SP; Flags → [SS:SP] | – | – | – | – | – | – |
RCL | Rotate Left through Carry | Rotate Destination left, CF fills LSB, MSB fills CF | * | – | – | – | – | * |
RCR | Rotate Right through Carry | Rotate Destination right, CF fills MSB, LSB fills CF | * | – | – | – | – | * |
REP | String Repeat Prefix | Repeat following string op while CX ≠ 0 | – | – | – | – | – | – |
REPE / REPZ | Repeat While Equal / Zero Prefix | Repeat following string op while CX ≠ 0 and ZF=1 | – | – | – | – | – | – |
REPNE / REPNZ | Repeat While Not Equal / Not Zero Prefix | Repeat following string op while CX ≠ 0 and ZF=0 | – | – | – | – | – | – |
RET | Return from Procedure | Pop IP (and CS) from stack | – | – | – | – | – | – |
ROL | Rotate Left | Rotate Destination left, MSB fills LSB and CF | * | – | – | – | – | * |
ROR | Rotate Right | Rotate Destination right, LSB fills MSB and CF | * | – | – | – | – | * |
SAHF | Store AH into Flags | AH → Low byte of Flags | – | * | * | * | * | * |
SAL / SHL | Shift Arithmetic/Logical Left | Shift Destination left, 0 fills LSB, MSB fills CF | * | * | * | U | * | * |
SAR | Shift Arithmetic Right | Shift Destination right, MSB preserved, LSB fills CF | * | * | * | U | * | * |
SBB | Subtract with Borrow | Destination - Source - CF → Destination | * | * | * | * | * | * |
SCASB | Scan String Byte | Compare AL with byte [ES:DI]; Update DI | * | * | * | * | * | * |
SCASW | Scan String Word | Compare AX with word [ES:DI]; Update DI | * | * | * | * | * | * |
SHR | Shift Logical Right | Shift Destination right, 0 fills MSB, LSB fills CF | * | 0 | * | U | * | * |
STC | Set Carry Flag | 1 → CF | – | – | – | – | – | 1 |
STD | Set Direction Flag | 1 → DF | – | – | – | – | – | – |
STI | Set Interrupt Flag | 1 → IF | – | – | – | – | – | – |
STOSB | Store String Byte | AL → [ES:DI]; Update DI | – | – | – | – | – | – |
STOSW | Store String Word | AX → [ES:DI]; Update DI | – | – | – | – | – | – |
SUB | Subtract | Destination - Source → Destination | * | * | * | * | * | * |
TEST | Logical Compare (AND) | Destination ∧ Source (Flags set, result discarded) | 0 | * | * | U | * | 0 |
WAIT | Wait | Wait for TEST# pin active (for coprocessor sync) | – | – | – | – | – | – |
XCHG | Exchange | Source ↔ Destination | – | – | – | – | – | – |
XLAT / XLATB | Translate Byte | [DS:BX + AL] → AL | – | – | – | – | – | – |
XOR | Logical Exclusive OR | Destination ⊕ Source → Destination | 0 | * | * | U | * | 0 |
Note: Some instructions like LOOPE and LOOPZ are mnemonics for the same opcode. They are provided to match different programming contexts: LOOPE when thinking in terms of equality (e.g., a comparison was equal), LOOPZ when thinking in terms of zero (e.g., result was zero).
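The word-sized forms of MUL and DIV in the table, where DX:AX acts as a 32-bit register pair, can be sketched as follows. The function names `mul16` and `div16` are hypothetical. The Python exceptions stand in for the divide-error interrupt (type 0) that the processor raises on division by zero or when the quotient does not fit in 16 bits.

```python
def mul16(ax: int, src: int) -> tuple:
    """Unsigned word multiply: AX * src -> DX:AX. Returns (dx, ax, carry)."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    dx, ax = product >> 16, product & 0xFFFF
    carry = dx != 0          # CF and OF are set when the upper half is nonzero
    return dx, ax, carry


def div16(dx: int, ax: int, src: int) -> tuple:
    """Unsigned word divide: DX:AX / src -> (quotient for AX, remainder for DX)."""
    dividend = ((dx & 0xFFFF) << 16) | (ax & 0xFFFF)
    if src == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    quotient, remainder = divmod(dividend, src & 0xFFFF)
    if quotient > 0xFFFF:
        raise OverflowError("divide error: quotient does not fit in AX")
    return quotient, remainder


dx, ax, carry = mul16(0x1234, 0x0100)
print(hex(dx), hex(ax), carry)   # 0x12 0x3400 True
print(div16(dx, ax, 0x0100))     # (4660, 0), i.e. quotient 0x1234, remainder 0
```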
Secret Instruction
The secret instruction is SALC (Set AL register to Carry). Its opcode is 0xD6. Intel put this in all its x86 processors but didn't document it, using it as a trap. If a manufacturer cloned an Intel processor, the presence of the SALC instruction would prove that the clone stole Intel's microcode.
Intel sued NEC for making 8086 clones, claiming that NEC ripped off Intel's microcode. NEC claimed they wrote their own microcode. NEC's chip didn't have the secret SALC instruction and Intel lost the case.
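For reference, what SALC does is simple to state: it fills AL with ones if the carry flag is set and with zeros otherwise. A minimal sketch, with a hypothetical function name:

```python
def salc(carry_flag: bool) -> int:
    """Return the value SALC leaves in AL: 0xFF if CF is set, else 0x00."""
    return 0xFF if carry_flag else 0x00


print(hex(salc(True)))   # 0xff
print(hex(salc(False)))  # 0x0
```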
See: Undocumented 8086 instructions, explained by the microcode
8087 Floating Point Unit
Intel introduced the 8087 chip in 1980 to improve floating-point performance on 8086/8088 computers.
Since early microprocessors were designed to operate on integers, floating-point arithmetic was slow, and transcendental operations such as trigonometric functions or logarithms were even slower. The 8087 coprocessor made floating-point operations up to 100 times faster.
The benefits of floating-point hardware proved so great that Intel began integrating the floating-point unit into the processor itself, starting with the 80486DX in 1989.
See: Inside the die; High-density ROM; Extracting ROM constants; Fast bit shifter; 8087 FPU reverse engineered
Links
- Intel 8086 at the English-language Wikipedia
- Intel 386 oral history panel
- Learn Assembly Programming with ChibiAliens Multi-platform 8086 tutorial
- Intel Microprocessors Practical Reference
- 8086 Programmer's Pocket Reference Guide
- Micro Chart 8086 CPU reference card
- Pin diagram of 8086
- Tom Harte's SingleStepTests