Ken Shirriff's blog

Reverse-engineering the ModR/M addressing microcode in the Intel 8086 processor

One interesting aspect of a computer's instruction set is its addressing modes, how the computer determines the address for a memory access. The Intel 8086 (1978) used the ModR/M byte, a special byte following the opcode, to select the addressing mode.1 The ModR/M byte has persisted into the modern x86 architecture, so it's interesting to look at its roots and original implementation.

In this post, I look at the hardware and microcode in the 8086 that implements ModR/M2 and how the 8086 designers fit multiple addressing modes into the 8086's limited microcode ROM. One technique was a hybrid approach that combined generic microcode with hardware logic that filled in the details for a particular instruction. A second technique was modular microcode, with subroutines for various parts of the task.

I've been reverse-engineering the 8086 starting with the silicon die. The die photo below shows the chip under a microscope. The metal layer on top of the chip is visible, with the silicon and polysilicon mostly hidden underneath. Around the edges of the die, bond wires connect pads to the chip's 40 external pins. I've labeled the key functional blocks; the ones that are important to this discussion are darker and will be discussed in detail below. Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below. The BIU handles bus and memory activity as well as instruction prefetching, while the Execution Unit (EU) executes instructions and microcode. Both units play important roles in memory addressing.

The 8086 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath. Click on this image (or any other) for a larger version.

8086 addressing modes

Let's start with an addition instruction, ADD dst,src, which adds a source value to a destination value and stores the result in the destination.3 What are the source and destination? Memory? Registers? The addressing mode answers this question.

You can use a register as the source and another register as the destination. The instruction below uses the AX register as the destination and the BX register as the source. Thus, it adds BX to AX and puts the result in AX.

ADD AX, BX           Add the contents of the BX register to the AX register

A memory access is indicated with square brackets around the "effective address"4 to access. For instance, [1234] means the memory location with address 1234, while [BP] means the memory location that the BP register points to. For a more complicated addressing mode, [BP+SI+1234] means the memory location is determined by adding the BP and SI registers to the constant 1234 (known as the displacement). On the 8086, you can use memory for the source or the destination, but not both. Here are some examples of using memory as a source:

ADD AX, [1234]       Add the contents of memory location 1234 to AX register
ADD CX, [BP]         Add memory pointed to by BP register to CX register
ADD DX, [BX+SI+1234] Source memory address is BX + SI + constant 1234

Here are examples with memory as the destination:

ADD [1234], AX       Add AX to the contents of memory location 1234
ADD [BP], CX         Add CX to memory pointed to by BP register
ADD [BX+SI+1234], DX Destination memory address is BX + SI + constant 1234

You can also operate on bytes instead of words, using a byte register and accessing a memory byte:

ADD AL, [SI+1234]    Add to the low byte of AX register
ADD AH, [BP+DI+1234] Add to the high byte of AX register

As you can see, the 8086 supports many different addressing schemes. To understand how they are implemented, we must first look at how instructions encode the addressing schemes in the ModR/M byte.

The ModR/M byte

The ModR/M byte follows many opcodes to specify the addressing mode. This byte is fairly complicated but I'll try to explain it in this section. The diagram below shows how the byte is split into three fields:5 mod selects the overall mode, reg selects a register, and r/m selects either a register or memory mode.

mod		reg			r/m
7	6	5	4	3	2	1	0

I'll start with the register-register mode, where the mod bits are 11 and the reg and r/m fields each select one of eight registers, as shown below. The instruction ADD AX,BX would use reg=011 to select BX and r/m=000 to select AX, so the ModR/M byte would be 11011000. (The register assignment depends on whether the instruction operates on words, bytes, or segment registers. For instance, in a word instruction, 001 selects the CX register, while in a byte instruction, 001 selects the CL register, the low byte of CX.)

The register assignments, from MCS-86 Assembly Language Reference Guide.

The next addressing mode specifies a memory argument and a register argument. In this case, the mod bits are 00, the reg field specifies a register as described above, and the r/m field specifies a memory address according to the table below. For example, the instruction ADD [SI],CX would use reg=001 to select CX and r/m=100 to select [SI], so the ModR/M byte would be 00001100.

r/m	Operand Address
000	[BX+SI]
001	[BX+DI]
010	[BP+SI]
011	[BP+DI]
100	[SI]
101	[DI]
110	[BP]
111	[BX]

The next mode, 01, adds an 8-bit signed displacement to the address. This displacement consists of one byte following the ModR/M byte. This supports addressing modes such as [BP+5]. The mode 10 is similar except the displacement is two bytes long, for addressing modes such as [BP+DI+0x1234].

The table below shows the meaning of all 256 values for the ModR/M byte. The mod bits are colored red, the reg bits green, and the r/m bits blue. Note the special case "disp16" to support a 16-bit fixed address.

The ModR/M values. Note that this table would be trivial if it used octal rather than hexadecimal. Based on Table 6-13 in the ASM386 Assembly Language Reference.

The register combinations for memory accesses may seem random but they were designed to support the needs of high-level languages, such as arrays and data structures. The idea is to add a base register, an index register, and/or a fixed displacement to determine the address.6 The base register can indicate the start of an array, the index register holds the offset in the array, and the displacement provides the offset of a field in the array entry. The base register is BX for data or BP for information on the stack. The index registers are SI (Source Index) and DI (Destination Index).7

Some addressing features are handled by the opcode, not the ModR/M byte. For instance, the ModR/M byte doesn't distinguish between ADD AX,[SI] and ADD [SI],AX. Instead, the two variants are distinguished by bit 1 of the instruction, the D or "direction" bit.8 Moreover, many instructions have one opcode that operates on words and another that operates on bytes, distinguished by bit 0 of the opcode, the W or word bit.

The D and W bits are an example of orthogonality in the 8086 instruction set, allowing features to be combined in various combinations. For instance, the addressing modes combine 8 types of offset computation with three sizes of displacements and 8 target registers. Arithmetic instructions combine these addressing modes with eight ALU operations, each of which can act on a byte or a word, with two possible memory directions. All of these combinations are implemented with one block of microcode, implementing a large instruction set with a small amount of microcode. (The orthogonality of the 8086 shouldn't be overstated, though; it has many special cases and things that don't quite fit.)

An overview of 8086 microcode

Most people think of machine instructions as the basic steps that a computer performs. However, many processors (including the 8086) have another layer of software underneath: microcode. With microcode, instead of building the control circuitry from complex logic gates, the control logic is largely replaced with code. To execute a machine instruction, the computer internally executes several simpler micro-instructions, specified by the microcode.

The 8086 uses a hybrid approach: although it uses microcode, much of the instruction functionality is implemented with gate logic. This approach removed duplication from the microcode and kept the microcode small enough for 1978 technology. In a sense, the microcode is parameterized. For instance, the microcode can specify a generic Arithmetic/Logic Unit (ALU) operation and a generic register. The gate logic examines the instruction to determine which specific operation to perform and the appropriate register.

A micro-instruction in the 8086 is encoded into 21 bits as shown below. Every micro-instruction has a move from a source register to a destination register, each specified with 5 bits. The meaning of the remaining bits depends on the type field. A "short jump" is a conditional jump within the current block of 16 micro-instructions. An ALU operation sets up the arithmetic-logic unit to perform an operation. Bookkeeping operations are anything from flushing the prefetch queue to ending the current instruction. A memory operation triggers a bus cycle to read or write memory. A "long jump" is a conditional jump to any of 16 fixed microcode locations (specified in an external table called the Translation ROM). Finally, a "long call" is a conditional subroutine call to one of 16 locations. For more about 8086 microcode, see my microcode blog post.

The encoding of a micro-instruction into 21 bits. Based on NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright?

Some examples of microcode for addressing

In this section, I'll take a close look at a few addressing modes and how they are implemented in microcode. In the next section, I'll summarize all the microcode for addressing modes.

A register-register operation

Let's start by looking at a register-to-register instruction, before we get into the complications of memory accesses: ADD BX,AX which adds AX to BX, storing the result in BX. This instruction has the opcode value 01 and ModR/M value C3 (hex).

Before the microcode starts, the hardware performs some decoding of the opcode. The Group Decode ROM (below) classifies an instruction into multiple categories: this instruction contains a D bit, a W bit, and an ALU operation, and has a ModR/M byte. Fields from the opcode and ModR/M bytes are extracted and stored in various internal registers. The ALU operation type (ADD) is stored in the ALU opr register. From the ModR/M byte, the reg register code (AX) is stored in the N register, and the r/m register code (BX) is stored in the M register. (The M and N registers are internal registers that are invisible to the programmer; each holds a 5-bit register code that specifies a register.9)

This diagram shows the Group Decode ROM. The Group Decode ROM is more of a PLA (programmable logic array) with two layers of NOR gates. Its input lines are at the lower left and its outputs are at the upper right.

Once the preliminary decoding is done, the microcode below for this ALU instruction is executed.10 (There are three micro-instructions, so the instruction takes three clock cycles.) Each micro-instruction contains a move and an action. First, the register specified by M (i.e. BX) is moved to the ALU's temporary A register (tmpA). Meanwhile, the ALU is configured to perform the appropriate operation on tmpA; XI indicates that the ALU operation is specified by the instruction bits, i.e. ADD).

The second instruction moves the register specified by N (i.e. AX) to the ALU's tmpB register. The action NX indicates that this is the next-to-last micro-instruction so the microcode engine can start processing the next machine instruction. The last micro-instruction stores the ALU's result (Σ) in the register indicated by M (i.e. BX). The status flags are updated because of the F. WB,RNI (Run Next Instruction) indicates that this is the end and the microcode engine can process the next machine instruction. The WB prefix would skip the actions if a memory writeback were pending (which is not the case).

  move       action
M → tmpA     XI tmpA   ALU rm↔r: BX to tmpA
N → tmpB     WB,NX      AX to tmpB
Σ → M        WB,RNI F   result to BX, run next instruction.

This microcode packs a lot into three micro-instructions. Note that it is very generic: the microcode doesn't know what ALU operation is being performed or which registers are being used. Instead, the microcode deals with abstract registers and operations, while the hardware fills in the details using bits from the instructions. The same microcode is used for eight different ALU operations. And as we'll see, it supports multiple addressing modes.

Using memory as the destination

Memory operations on the 8086 involve both microcode and hardware. A memory operation uses two internal registers: IND (Indirect) holds the memory address, while OPR (Operand) holds the word that is read or written. A typical memory micro-instruction is R DS,P0, which starts a read from the Data Segment with a "Plus 0" on the IND register afterward. The Bus Interface Unit carries out this operation by adding the segment register to compute the physical address, and then running the memory bus cycles.

With that background, let's look at the instruction ADD [SI],AX, which adds AX to the memory location indexed by SI. As before, the hardware performs some analysis of the instruction (hex 01 04). In the ModR/M byte, mod=00 (memory, no displacement), reg=000 (AX), and R/M=100 ([SI]). The N register is loaded with the code for AX as before. The M register, however, is loaded with OPR (the memory data register) since the Group Decode ROM determines that the instruction has a memory addressing mode.

The microcode below starts in an effective address microcode subroutine for the [SI] mode. The first line of the microcode subroutine computes the effective address simply by loading the tmpA register with SI. It jumps to the micro-routine EAOFFSET which ends up at EALOAD (for reasons that will be described below), which loads the value from memory. Specifically, EALOAD puts the address in IND, reads the value from memory, puts the value into tmpB, and returns from the subroutine.

SI → tmpA   JMP EAOFFSET [SI]: put SI in tmpA
tmpA → IND  R DS,P0      EALOAD: read memory
OPR → tmpB  RTN  
M → tmpA    XI tmpA      ALU rm↔r: OPR to tmpA
N → tmpB    WB,NX         AX to tmpB
Σ → M       WB,RNI F      result to BX, run next instruction.
            W DS,P0 RNI   writes result to memory

Microcode execution continues with the ALU rm↔r routine described above, but with a few differences. The M register indicates OPR, so the value read from memory is put into tmpA. As before, the N register specifies AX, so that register is put into tmpB. In this case, the WB,NX determines that the result will be written back to memory so it skips the NXT operation. The ALU's result (Σ) is stored in OPR as directed by M. The WB,RNI is skipped so microcode execution continues. The W DS,P0 micro-instruction writes the result (in OPR) to the memory address in IND. At this point, RNI terminates the microcode sequence.

A lot is going on here to add two numbers! The main point is that the same microcode runs as in the register case, but the results are different due to the M register and the conditional WB code. By running different subroutines, different effective address computations can be performed.

Using memory as the source

Now let's look at how the microcode uses memory as a source, as in the instruction ADD AX,[SI]. This instruction (hex 03 04) has the same ModR/M byte as before, so the N register holds AX and the M register holds OPR. However, because the opcode has the D bit set, the M and N registers are swapped when accessed. Thus, when the microcode uses M, it gets the value AX from N, and vice versa. (Yes, this is confusing.)

The microcode starts the same as the previous example, reading [SI] into tmpB and returning to the ALU code. However, since the meaning of M and N are reversed, the AX value goes into tmpA while the memory value goes into tmpB. (This switch doesn't matter for addition, but would matter for subtraction.) An important difference is that there is no writeback to memory, so WB,NX starts processing the next machine instruction. In the last micro-instruction, the result is written to M, indicating the AX register. Finally, WB,RNI runs the next machine instruction.

SI → tmpA   JMP EAOFFSET [SI]: put SI in tmpA
tmpA → IND  R DS,P0      EALOAD: read memory
OPR → tmpB  RTN  
M → tmpA    XI tmpA      ALU rm↔r: AX to tmpA
N → tmpB    WB,NX         OPR to tmpB
Σ → M       WB,RNI F      result to AX, run next instruction.

The main point is that the same microcode handles memory as a source and a destination, simply by setting the D bit. First, the D bit reverses the operands by swapping M and N. Second, the WB conditionals prevent the writeback to memory that happened in the previous case.

Using a displacement

The memory addressing modes optionally support a signed displacement of one or two bytes. Let's look at the instruction ADD AX,[SI+0x1234]. In hex, this instruction is 03 84 34 12, where the last two bytes are the displacement, reversed because the 8086 uses little-endian numbers. The mod bits are 10, indicating a 16-bit displacement, but the other bits are the same as in the previous example.

Microcode execution again starts with the [SI] subroutine. However, the jump to EAOFFSET goes to [i] this time, to handle the displacement offset. (I'll explain how, shortly.) This code loads the offset as two bytes from the instruction prefetch queue (Q) into the tmpB register. It adds the offset to the previous address in tmpA and puts the sum Σ in tmpA, computing the effective address. Then it jumps to EAFINISH (EALOAD). From there, the code continues as earlier, reading an argument from memory and computing the sum.

SI → tmpA   JMP EAOFFSET [SI]: put SI in tmpA
Q → tmpBL   JMPS MOD1 12 [i]: load from queue, conditional jump
Q → tmpBH     
Σ → tmpA    JMP EAFINISH 12:
tmpA → IND  R DS,P0      EALOAD: read memory
OPR → tmpB  RTN  
M → tmpA    XI tmpA      ALU rm↔r: AX to tmpA
N → tmpB    WB,NX         OPR to tmpB
Σ → M       WB,RNI F      result to AX, run next instruction.

For the one-byte displacement case, the conditional MOD1 will jump over the fetch of the second displacement byte. When the first byte is loaded into the low byte of tmpB, it was sign-extended into the high byte.14 Thus, the one-byte displacement case uses the same microcode but ends up with a sign-extended 1-byte displacement in tmpB.

The Translation ROM

Now let's take a closer look at the jumps to EAOFFSET, EAFINISH, and the effective address subroutines, which use something called the Translation ROM. The Translation ROM converts the 5-bit jump tag in a micro-instruction into a 13-bit microcode address. It also provides the addresses of the effective address subroutines. As will be seen below, there are some complications.11

The Translation ROM as it appears on the die. The metal layer has been removed to expose the silicon and polysilicon underneath. The left half decodes the inputs to select a row. The right half outputs the corresponding microcode address.

The effective address micro-routines

Register calculations

The Translation ROM has an entry for the addressing mode calculations such as [SI] and [BP+DI], generally indicated by the r/m bits, the three low bits of the ModR/M byte. Each routine computes the effective address and puts it into the ALU's temporary A register and jumps to EAOFFSET, which adds any displacement offset. The microcode below shows the four simplest effective address calculations, which just load the appropriate register into tmpA.

SI → tmpA   JMP EAOFFSET   [SI]: load SI into tmpA
DI → tmpA   JMP EAOFFSET   [DI]: load SI into tmpA
BP → tmpA   JMP EAOFFSET   [BP]: load BP into tmpA
BX → tmpA   JMP EAOFFSET   [BX]: load BX into tmpA

For the cases below, an addition is required, so the registers are loaded into the ALU's temporary A and temporary B registers. The effective address is the sum (indicated by Σ), which is moved to temporary A.12 These routines are carefully arranged in memory so [BX+DI] and [BP+SI] each execute one micro-instruction and then jump into the middle of the other routines, saving code.13

BX → tmpA         [BX+SI]: get regs
SI → tmpB         1:
Σ → tmpA   JMP EAOFFSET  

BP → tmpA         [BP+DI]: get regs
DI → tmpB         4:
Σ → tmpA   JMP EAOFFSET  

BX → tmpA  JMPS 4 [BX+DI]: short jump to 4
BP → tmpA  JMPS 1 [BP+SI]: short jump to 1

The `EAOFFSET` and `EAFINISH` targets

After computing the register portion of the effective address, the routines above jump to EAOFFSET, but this is not a fixed target. Instead, the Translation ROM selects one of three target microcode addresses based on the instruction and the ModR/M byte:
If there's a displacement, the microcode jumps to [i] to add the displacement value.
If there is no displacement but a memory read, the microcode otherwise jumps to EALOAD to load the memory contents.
If there is no displacement and no memory read should take place, the microcode jumps to EADONE.
In other words, the microcode jump is a three-way branch that is implemented by the Translation ROM and is transparent to the microcode.

For a displacement, the [i] immediate code below loads a 1-byte or 2-byte displacement into the tmpB register and adds it to the tmpA register, as described earlier. At the end of a displacement calculation, the microcode jumps to the EAFINISH tag, which is another branching target. Based on the instruction, the Translation ROM selects one of two microcode targets: EALOAD to load from memory, or EADONE to skip the load.

Q → tmpBL   JMPS MOD1 12 [i]: get byte(s)
Q → tmpBH         
Σ → tmpA    JMP EAFINISH 12: add displacement

The EALOAD microcode below reads the value from memory, using the effective address in tmpA. It puts the result in tmpB. The RTN micro-instruction returns to the microcode that implements the original machine instruction.

tmpA → IND  R DS,P0   EALOAD: read from tmpA address
OPR → tmpB  RTN        store result in tmpB, return

The EADONE routine puts the effective address in IND, but it doesn't read from the memory location. This supports machine instructions such as MOV (some moves) and LEA (Load Effective Address) that don't read from memory

tmpA → IND  RTN   EADONE: store effective address in IND

To summarize, the microcode runs different subroutines and different paths, depending on the addressing mode, executing the appropriate code. The Translation ROM selects the appropriate control flow path.

Special cases

There are a couple of special cases in addressing that I will discuss in this section.

Supporting a fixed address

It is common to access a fixed memory address, but the standard addressing modes use a base or index register. The 8086 replaces the mode of [BP] with no displacement with 16-bit fixed addressing. In other words, a ModR/M byte with the pattern 00xxx110 is treated specially. (This special case is the orange disp16 line in the ModR/M table earlier.) This is implemented in the Translation ROM which has additional rows to detect this pattern and execute the immediate word [iw] microcode below instead. This microcode fetches a word from the instruction prefetch queue (Q) into the tmpA register, a byte at a time. It jumps to EAFINISH instead of EAOFFSET because it doesn't make sense to add another displacement.

Q → tmpAL          [iw]: get bytes
Q → tmpAH  JMP EAFINISH

Selecting the segment

Memory accesses in the 8086 are relative to one of the 64-kilobyte segments: Data Segment, Code Segment, Stack Segment, or Extra Segment. Most addressing modes use the Data Segment by default. However, addressing modes that use the BP register use the Stack Segment by default. This is a sensible choice since the BP (Base Pointer) register is intended for accessing values on the stack.

This special case is implemented in the Translation ROM. It has an extra output bit that indicates that the addressing mode should use the Stack Segment. Since the Translation ROM is already decoding the addressing mode to select the right microcode routine, adding one more output bit is straightforward. This bit goes to the segment register selection circuitry, changing the default segment. This circuitry also handles prefixes that change the segment. Thus, segment register selection is handled in hardware without any action by the microcode.

Conclusions

I hope you have enjoyed this tour through the depths of 8086 microcode. The effective address calculation in the 8086 uses a combination of microcode and logic circuitry to implement a variety of addressing methods. Special cases make the addressing modes more useful, but make the circuitry more complicated. This shows the CISC (Complex Instruction Set Computer) philosophy of x86, making the instructions complicated but highly functional. In contrast, the RISC (Reduced Instruction Set Computer) philosophy takes the opposite approach, making the instructions simpler but allowing the processor to run faster. RISC vs. CISC was a big debate of the 1980s, but isn't as relevant nowadays.

People often ask if microcode could be updated on the 8086. Microcode was hardcoded into the ROM, so it could not be changed. This became a big problem for Intel with the famous Pentium floating-point division bug. The Pentium chip turned out to have a bug that resulted in rare but serious errors when dividing. Intel recalled the defective processors in 1994 and replaced them at a cost of $475 million. Starting with the Pentium Pro (1995), microcode could be patched at boot time, a useful feature that persists in modern CPUs.

I've written multiple posts on the 8086 so far and plan to continue reverse-engineering the 8086 die so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff.

Notes and references

There are additional addressing modes that don't use a ModR/M byte. For instance, immediate instructions use a constant in the instruction. For instance ADD AX,42 adds 42 to the AX register. Other instructions implicitly define the addressing mode. I'm ignoring these instructions for now. ↩
The 8086 supports more addressing modes than the ModR/M byte provides, by using separate opcodes. For instance, arithmetic instructions can take an "immediate" value, an 8- or 16-bit value specified as part of the instruction. Other instructions operate on specific registers rather than memory or access memory through the stack. For this blog post, I'm focusing on the ModR/M modes and ignoring the other instructions. Also, although I'm discussing the 8086, this blog post applies to the Intel 8088 processor as well. The 8088 has an 8-bit bus, a smaller prefetch queue, and some minor internal changes, but for this post you can consider them to be the same. ↩
My assembly code examples are based on Intel ASM86 assembly syntax. There's a completely different format of x86 assembly language known as AT&T syntax. Confusingly, it reverses the source and destination. For example, in AT&T syntax, addw %bx, %cx% stores the result in CX. AT&T syntax is widely used, for instance in Linux code. The AT&T syntax is based on earlier PDP-11 assembly code. ↩
The term "effective address" dates back to the 1950s, when computers moved beyond fixed memory addresses and started using index registers. The earliest uses that I could find are from 1955 for the IBM 650 data processing machine and the IBM 704 mainframe. The "Load Effective Address" instruction, which provides the effective address as a value instead of performing the memory access, was perhaps introduced in the IBM System/360 (1964) under the name "Load Address". It has been a part of many subsequent processors including the 8086. ↩
Note that the ModR/M byte has the bits grouped in threes (as do many instructions). This is due to the octal heritage of the 8086, dating back through the 8080 and the 8008 to the Datapoint 2200 (which used TTL chips to decode groups of three bits). Although the 8086 instruction set is invariably described in hexadecimal, it makes much more sense when viewed in octal. See x86 is an octal machine for details. ↩
The 8086's addressing schemes are reminiscent of the IBM System/360 (1964). In particular, System/360 had a "RX" instruction format that accessed memory through a base register plus an index register plus a displacement, using another register for the other argument. This is very similar to the 8086's base + index + displacement method. The System/360's "RR" (register-register) instruction format accessed two registers, much like the register mode of the ModR/M byte. The details are very different, though, between the two systems. See the IBM System/360 Principles of Operation for more details. ↩
The motivation behind the ModR/M options is discussed in The 8086/8088 Primer by 8086 designer Steve Morse, pages 23-33. ↩
The D bit is usually called the register direction bit, but the designer of the 8086 instruction set calls it the destination field; see The 8086/8088 Primer, Steve Morse, page 28. To summarize:
If the bit is 0, the result is stored into the location indicated by the mod and r/m fields while the register specified by reg is the source.
If the bit is 1, the result is stored into the register indicated by the reg field.

For the W word bit, 0 indicates a byte operation and 1 indicates a word operation.

One curious side-effect of the D bit is that an instruction like ADD AX,BX can be implemented in two ways since both arguments are registers. The reg field can specify AX while the r/m field specifies BX or vice versa, depending on the D bit. Different 8086 assemblers can be "fingerprinted" based on their decisions in these ambiguous cases. ↩
The M and N registers hold a 5-bit code. This code indicates a 16-bit register (e.g. AX or IND), an 8-bit register (e.g. AL), or a special value (e.g. Σ, the ALU result; ZEROS, all zero bits; or F, the flags). The 3-bit register specification is mapped onto the 5-bit code depending on whether the W bit is set (byte or word register), or if the operation is specifying a segment register. ↩
The microcode listings are based on Andrew Jenner's disassembly. I have made some modifications to (hopefully) make it easier to understand. ↩
You can also view the Translation ROM as a PLA (Programmable Logic Array) constructed from two layers of NOR gates. The conditional entries make it seem more like a PLA than a ROM. Technically, it can be considered a ROM since a single row is active at a time. I'm using the name "Translation ROM" because that's what Intel calls it in the patents. ↩
Normally, an ALU operation requires a micro-instruction to specify the desired ALU operation and temporary register. For the address addition, the ALU operation is not explicitly specified because it uses the ALU's default, of adding tmpA and tmpB. The ALU is reset to this default at the beginning of each machine instruction. ↩
A microcode jump takes an extra clock cycle for the microcode address register to get updated. This is why, for instance, [BP+DI] takes 7 clock cycles but [BX+DI] takes 8 clock cycles. Thus, the 8086 implementers took the tradeoff of slowing down some addressing modes by a clock cycle in order to save a few micro-instructions in the small microcode ROM.

This table shows the clock cycles required for effective address calculations. From MCS-86 Assembly Language Reference Guide.

↩
A one-byte signed number can be sign-extended into a two-byte signed number. This is done by copying the top bit (the sign) from the low byte and filling the top byte with that bit. For example, 0x64 is sign-extended to 0x0064 (+100), while 0x9c is sign-extended to 0xff9c (-100). ↩

Reverse-engineering the interrupt circuitry in the Intel 8086 processor

Interrupts have been an important part of computers since the mid-1950s,1 providing a mechanism to interrupt a program's execution. Interrupts allows the computer to handle time-critical tasks such as I/O device operations. In this blog post, I look at the interrupt features in the Intel 8086 (1978) and how they are implemented in silicon, a combination of interesting circuitry and microcode.

I've been reverse-engineering the 8086 starting with the silicon die. The die photo below shows the chip under a microscope. The metal layer on top of the chip is visible, with the silicon and polysilicon mostly hidden underneath. Around the edges of the die, bond wires connect pads to the chip's 40 external pins; relevant pins are marked in yellow. I've labeled the key functional blocks; the ones that are important to this discussion are darker and will be discussed in detail below. Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below. The BIU handles bus activity, while the Execution Unit (EU) executes instructions and microcode. Both parts are extensively involved in interrupt handling.

Interrupts in the 8086

The idea behind an interrupt is to stop the current flow of execution, run an interrupt handler to perform a task, and then continue execution where it left off. An interrupt is like a subroutine call in some ways; it pushes the current segment register and program counter on the stack and continues at a new address. However, there are a few important differences. First, the address of the interrupt handler is obtained indirectly, through an interrupt vector table. Interrupts are numbered 0 through 255, and each interrupt has an entry in the vector table that gives the address of the code to handle the interrupt. Second, an interrupt pushes the processor flags to the stack, so they can be restored after the interrupt. Finally, an interrupt clears the interrupt and trap flags, blocking more interrupts while handling the interrupt.

The 8086 provides several types of interrupts, some generated by hardware and some generated by software. For hardware interrupts, the INTR pin on the chip generates a maskable interrupt when activated, while the NMI pin on the chip generates a higher-priority non-maskable interrupt.2 Typically, most interrupts use the INTR pin, signaling things such as a timer, keyboard request, real-time clock, or a disk needing service. The NMI interrupt is designed for things such as parity error or an impending power failure, which are so critical they can't be delayed. The 8086 also has a RESET pin that resets the CPU. Although not technically an interrupt, the RESET action has many features in common with interrupts, so I'll discuss it here.

On the software side, the 8086 has multiple types of interrupts generated by different instructions. The INT n instruction creates an interrupt of the specified type (0 to 255). These software interrupts were used in the IBM PC to execute a function in the BIOS, the layer underneath the operating system. These functions could be everything from a floppy disk operation to accessing the printer. The one-byte INT 3 instruction creates a breakpoint interrupt for debugging. The divide instructions generate an interrupt if a divide-by-zero or overflow occurs. The INTO instruction (Interrupt if Overflow) generates an interrupt if the overflow flag is set. To support single-step mode in debuggers, the Trap flag generate an interrupt on every instruction.

The diagram below shows how the vector table is implemented. Each of the 256 interrupt types has an entry holding the address of the interrupt handler (the code segment value and the instruction pointer (program counter) value). In the next section, I'll show below how the microcode loads the vector from the table and switches execution to that interrupt handler.

This diagram shows where the interrupt vectors are stored in memory. From iAPX 86/88 User's Manual, Figure 4-18.

Microcode

Most of the operations in the 8086 are implemented in microcode, a low-level layer of code that sits between the machine code instructions and the chip's hardware. I'll explain a few features of the 8086's microcode that are important for the interrupt code. Each micro-instruction is 21 bits long, as shown below. The first part of the micro-instruction specifies a move between a source register and a destination register; these may be special-purpose internal registers, not just the registers visible to the programmer. The meaning of the remaining bits depends on the type of micro-instruction, but includes jumps within the microcode, ALU (Arithmetic/Logic Unit) operations, and memory operations.

The encoding of a micro-instruction into 21 bits. Based on NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright?

For a memory access, microcode issues a memory read or write micro-instruction. Memory accesses use two internal registers: the IND (Indirect) register holds the address in the segment, while the OPR (Operand) register holds the word that is read or written. A micro-instruction such as W SS,P2 writes the OPR register to the memory address specified by the IND register and the segment register (SS indicates the stack segment). The IND register can also be incremented or decremented (P2 indicates "Plus 2").

The 8086's Bus Interface Unit (BIU) handles the memory request in hardware, while the microcode waits. The BIU has an adder to combine the segment address and the offset to obtain the "absolute" address. It also has a constant ROM to increment or decrement the IND register. Memory accesses are complicated in the 8086 and take at least four clock cycles,3 called T1, T2, T3, and T4. An interrupt acknowledge is almost the same as a memory read, except the IAK bit is set in the microcode, causing some behavior changes.

The interaction between microcode and the ALU (Arithmetic/Logic Unit) will also be important. The ALU has three temporary registers that hold the arguments for operations, called temporary A, B, and C. These registers are invisible to the programmer. The first argument for, say, an addition can come from any of the three registers, while the second argument is always from the temporary B register. Performing an ALU operation takes at least two micro-instructions. First, the ALU is configured to perform an operation, for example, ADD or DEC2 (decrement by 2). The result is then read from the ALU, denoted as the Σ register.

Software interrupts

The main microcode for interrupt handling is shown below.4 Each line specifies a move operation and an action, with my comments in green. On entry to INTR interrupt handler, the OPR operand register holds the interrupt type. This chunk of microcode looks up the interrupt handler in the vector table, pushes the status flags onto the stack, and then branches to a routine FARCALL2 to perform a subroutine call the interrupt handler.

       move            action
19d  OPR → tmpBL     SUSP         INTR: OPR to tmpB(low), suspend prefetch
19e  0 → tmpbH       ADD tmpB      0 to tmpB(high), add tmpB to tmpB
19f  Σ → tmpB                      ALU sum to tmpB, add tmpB to tmpB
1a0  Σ → IND         R S0,P2       ALU sum to IND, read from memory, IND+=2
1a1  OPR → tmpB      DEC2 tmpC     memory to tmpB, set up decrement tmpC
1a2  SP → tmpC       R S0,P0       SP to tmpC, read from memory
1a3  OPR → tmpA                    memory to tmpA
1a4  F → OPR         CITF          Flags to OPR, clear interrupt and trap flags
1a5  Σ → IND         W SS,P0       ALU dec to IND, Write to memory
1a6  IND → tmpC      JMP FARCALL2  IND to tmpC, branch to FARCALL2

In more detail, the microcode routine starts at 19d by moving the interrupt number from the OPR register to the low byte of the ALU's temporary B register. The SUSP action suspends instruction prefetching since we'll start executing instructions from a new location. Next, line 19e zeros out the top byte of the temporary B register and tells the ALU to add temporary B to itself. The next micro-instruction puts the ALU result (indicated by Σ) into temporary B, doubling the value.

Line 1a0 calculates another sum (doubling) from the ALU and stores it in the IND register. In other words, the interrupt number has been multiplied by 4, yielding an address into the vector table. The interrupt handle address is read from the vector table: R S0,P2 operation reads from memory, segment 0, and performs a "Plus 2" on the IND register. Line 1a1 puts the result (OPR) into the temporary B register.

Line 1a2 stores the current stack pointer register into temporary C. It also performs a second read to get the handler code segment from the vector table. Line 1a3 stores this in the temporary A register. Line 1a4 puts the flags (F) into the OPR register. It also clears the interrupt and trap flags (CITF), blocking further interrupts.

Line 1a5 puts the ALU result (the decremented stack pointer) into the IND register. (This ALU operation was set up back in line 1a1.) To push the flags on the stack, W SS,P0 writes OPR to the Stack segment and does a "Plus 0" on the IND register. Finally, line 1a6 stores the IND register (the new top-of-stack) into the temporary C register and jumps to the FARCALL2 micro-routine.5

Understanding microcode can be tricky, since it is even more low-level than machine instructions, but hopefully this discussion gives you a feel for it. Everything is broken down into very small steps, even more basic than machine instructions. Microcode is a bit like a jigsaw puzzle, carefully fit together to ensure everything is at the right place at the right time, as efficiently as possible.

Subroutine call microcode: FARCALL2

Next, I'll describe the FARCALL2 microcode. Because of its segment registers, the 8086 has two types of calls (and jumps): a near call is a subroutine call within the same code segment, while a far call is a subroutine call to a different code segment. A far call target is specified with two words: the new code segment register and the new program counter.

The FARCALL2 micro-routine performs a subroutine call to a particular segment and offset. At entry to FARCALL2, the target code segment in temporary A, the offset is in temporary B, and the decremented stack pointer will be provided by the ALU. The microcode below pushes the code segment register to the stack, updates the code segment register with the new value, and then jumps to NEARCALL to finish the subroutine call.

06c  Σ → IND      CORR        FARCALL2: ALU (IND-2) to IND, correct PC
06d  CS → OPR     W SS,M2      CS to OPR, write to memory, IND-=2
06e  tmpA → CS    PASS tmpC    tmpA to CS, ALU passthrough
06f  PC → OPR     JMP NEARCALL PC to OPR, branch to NEARCALL

For a subroutine call, the program counter is saved so execution can resume where it left off. But because of prefetching, the program counter in the 8086 points to the next instruction to fetch, not the next instruction to execute. To fix this, the CORR (Correction) micro-instruction corrects the program counter value by subtracting the length of the prefetch queue. Line 06c also puts the decremented stack location into IND using the ALU decrement operation set up way back at line 1a1.

Line 06d puts the code segment value (CS) into the OPR register and then writes it to the stack segment, performing a "Minus 2" on IND. In other words, the CS register is pushed onto the stack. Line 06e stores the new value (from temporary A) into the CS register. It also sets up the ALU to pass the value of the temporary C register as its result. Finally, line 06f puts the (corrected) program counter into the OPR register and jumps to the NEARCALL routine.

Subroutine call microcode: NEARCALL

The NEARCALL micro-routine does a near subroutine call, updating the program counter but not the segment register. At entry, the target address is in temporary B, the IND register indicates the top of the stack, and OPR holds the program counter.

077  tmpB → PC    FLUSH      NEARCALL: tmpB to PC, restart prefetch
078  IND → tmpC               IND to tmpC
079  Σ → IND                  ALU to IND
07a  Σ → SP       W SS,P0 RNI ALU to SP, write PC to memory, run next instruction

Line 077 puts temporary B into the program counter. The FLUSH operation flushes the stale instructions from the prefetch queue and starts prefetching from the new PC address. Line 078 puts IND (i.e. the new stack pointer value) into temporary C. Line 079 puts this value into the IND register and line 07a puts this value into the SP register. (The ALU was configured at line 06e to pass the temporary C value unmodified.)

Line 07a pushes the PC to the stack by writing OPR (the old program counter) to the stack segment. Finally, RNI (Run Next Instruction) ends this microcode sequence and causes the 8086 to run the next machine instruction, the first instruction of the interrupt handler.

Starting an interrupt

The above microcode handles a generic interrupt. But there's one more piece: setting up the interrupt type for the instruction. For instance, the INT ib machine instruction has the interrupt type in the second byte of the opcode. This machine instruction has the two micro-instructions below. The microcode loads the type from the instruction prefetch queue (Q) and puts it into temporary B and then OPR. Then it jumps to the INTR microcode discussed earlier.

1a8  Q → tmpB             INT ib: load a byte from the queue
1a9  tmpB → OPR  JMP INTR  Put the byte in OPR and jump to INTR

Several instructions require specific interrupt numbers, and the microcode uses a tricky technique to obtain these numbers. The numbers are obtained from a special pseudo-register called CR, which is all zeros except the three low bits come from the microcode address.6 The microcode is carefully arranged in memory so the micro-instruction is at the right address to generate the necessary value. For instance, in the microcode below, entry point INT1 will load the number 1 into OPR, entry point INT2 will load the number 2 into OPR, and INT0 will load 0 into OPR. Each line then jumps to the main INTR interrupt microcode.

198  CR → OPR     JMP INTR      INT1: num to OPR, branch to INTR
199  CR → OPR     JMP INTR      INT2: num to OPR, branch to INTR
...
1a7  CR → OPR     JMP INTR      INT0: num to OPR, branch to INTR

The microcode for the INT 3 and INTO (not to be confused with INT0) machine instructions has some wasted micro-instructions to ensure that the CR → OPR is at the right address. This wastes a couple of cycles and a couple of micro-instructions.7

Return from interrupt

The IRET interrupt is used to return from interrupts. It pops the program counter, code segment register, and flags from the stack, so execution can continue at the point where the interrupt happened. It calls the microcode subroutine FARRET to pop the code segment register and the PC from the stack. (I won't go into FARRET in this post.) Then it pops the flags from the stack, updates the Stack Pointer, and runs the next instruction.

0c8               CALL FARRET IRET: call Far Return
0c9               R SS,P2      read from stack, IND+=2
0ca  OPR → F                   mem to Flags
0cb  IND → SP     RNI          IND to stack pointer, run next instruction

External hardware interrupts

As well as software interrupts, the 8086 has hardware interrupts. The 8086 chip has pins for INTR and NMI; pulling the pin high causes a hardware interrupt. This section discusses the hardware circuitry and the microcode that handles these interrupts.

The interrupt pin circuit

The schematic below shows the input circuitry for the INTR pin; the NMI, RESET, and TEST pins use the same circuit. The function of this circuit is to clean up the input and ensure that it is synchronized with the clock. The chip's INTR pin is connected to a protection diode to drain a negative voltage to ground. Next, the signal goes through three inverters, probably to force a marginal voltage to either 0 or 1. Finally, the signal goes through an edge-triggered flip-flop to synchronize it with the clock. The flip-flop is constructed from two set-reset latches, the first gated by clk' and the second gated by clk. At the output of each stage is a "superbuffer", two transistors that produce a higher-current output than a regular inverter. This flip-flop circuit is unusual for the 8086; most flip-flops and latches are constructed from dynamic logic with pass transistors, which is much more compact. The more complicated circuitry on the INTR input probably protects against metastability and other problems that could occur with poor-quality input signals.

Schematic of the input circuitry for the INTR pin.

The interrupt logic circuitry

The chip has a block of interrupt logic to receive interrupts, handle interrupt priorities, and execute an interrupt at the appropriate time. This circuitry is in the top right part of the chip, on the opposite side of the chip from the interrupt pins. The schematic below shows this circuitry.

The interrupt logic circuitry activates the microcode interrupt code at the appropriate time.

The top chunk of logic latches an NMI (non-maskable interrupt) until it runs or it is cleared by a reset.8 The first flip-flop helps convert an NMI input into a one-clock pulse. The second flip-flop holds the NMI until it runs.

The middle chunk of logic handles traps. If the trap flag is high, the latch will hold the trap request until it can take place. The latch is loaded on First Clock (FC), which indicates the start of a new instruction. The NOR gate blocks the trap if there is an interrupt or NMI, which has higher priority.9

The third chunk of logic schedules the interrupt. Three things can delay an interrupt: an interrupt delay micro-instruction, an instruction that modifies a segment register, or an instruction prefix.10 If not delayed, the interrupt (NMI, trap, or INTR pin) will run at the start of the next instruction (i.e. FC).11 The microcode interrupt code is run for these cases as well as a reset. Note that the reset is not gated by First Clock, but can run at any time.

The interrupt signal from this circuitry loads a hardcoded interrupt address into the microcode address latches, depending on the type of interrupt.12 This happens for an interrupt at First Clock, while a reset can happen any time in the instruction cycle. A trap goes to the INT1 microcode routine described earlier, while an NMI interrupt goes to INT2 microcode routine. The microcode for the INTR interrupt will be discussed in the next section.

The interrupt signal also goes to the Group Decode ROM (via the instruction register), where it blocks regular instruction decoding. Finally, the interrupt signal goes to a circuit called the loader, where it prevents fetching of the next instruction from the prefetch queue.

The external INTR interrupt

The INTR interrupt has some special behavior to communicate with the device that triggered the interrupt: the 8086 performs two bus cycles to acknowledge the interrupt and to obtain the interrupt number from the device. This is implemented with a combination of microcode and the bus interface logic. The bus cycles are similar to memory read cycles, but with some behavior specific to interrupts.

The INTR interrupt has its own microcode, shown below. The first micro-instruction zeros the IND memory address register and then performs a special IAK bus cycle.13 This is similar to a memory read but asserts the INTA interrupt acknowledge line so the device knows that its interrupt has been received. Next, prefetching is suspended. The third line performs a second IAK bus cycle and the external device puts the interrupt number onto the bus. The interrupt number is read into the ORD register, just like a memory read. At this point, the code falls through into the interrupt microcode described previously.

19a  0 → IND   IAK S0,P0  IRQ: 0 to IND, run interrupt bus cycle
19b            SUSP        suspend prefetch
19c            IAK S0,P0   run interrupt bus cycle
19d            ...        The INTR routine discussed earlier

The bus cycle

The diagram below provides timing details of the two interrupt acknowledge bus cycles. Each cycle is similar to a memory read bus cycle, going through the T1 through T4 states, starting with an ALE (Address Latch Enable) signal. The main difference is the interrupt acknowledge bus cycle also raises the INTA (Interrupt Acknowledge) to let the requesting device know that its interrupt has been acknowledged.14 On the second cycle, the device provides an 8-bit type vector that provides the interrupt number. The 8086 also issues a LOCK signal to lock the bus from other uses during this sequence. The point of this is that the 8086 goes through a fairly complex bus sequence when handling a hardware interrupt. The microcode triggers these two bus cycles with the IAK micro-operation, but the bus interface circuitry goes through the various states of the bus cycle in hardware, without involving the microcode.

This diagram shows the interrupt acknowledge sequence. From Intel 8086 datasheet.

The circuitry to control the bus cycle is complicated with many flip-flops and logic gates; the diagram below shows the flip-flops. I plan to write about the bus cycle circuitry in detail later, but for now, I'll give an extremely simplified description. Internally, there is a T0 state before T1 to provide a cycle to set up the bus operation. The bus timing states are controlled by a chain of flip-flops configured like a shift register with additional logic: the output from the T0 flip-flop is connected to the input of the T1 flip-flop and likewise with T2 and T3, forming a chain. A bus cycle is started by putting a 1 into the input of the T0 flip-flop. When the CPU's clock transitions, the flip-flop latches this signal, indicating the (internal) T0 bus state. On the next clock cycle, this 1 signal goes from the T0 flip-flop to the T1 flip-flop, creating the externally-visible T1 state. Likewise, the signal passes to the T2 and T3 flip-flops in sequence, creating the bus cycle. Some special-case logic changes the behavior for an interrupt versus a read.15

The read/write control circuitry on the die with the flip-flops labeled. Metal and polysilicon were removed to show the underlying silicon.

Reset

The reset pin resets the CPU to an initial state. This isn't an interrupt, but much of the circuitry is the same, so I'll describe it for completeness. The reset microcode below initializes the segment registers, program counter, and flags to 0, except the code segment is initialized to 0xffff. Thus, after a reset, instruction execution will start at absolute address 0xffff0. The reset line is also connected to numerous flip-flops and circuits in the 8086 to ensure that they are initialized to the proper state. These initializations happen outside of the microcode.

1e4  0 → DS     SUSP   RESET: 0 to DS, suspend prefetch
1e5  ONES → CS          FFFF to CS
1e6  0 → PC     FLUSH   0 to PC, start prefetch
1e7  0 → F              0 to Flags
1e8  0 → ES             0 to ES
1e9  0 → SS     RNI     0 to SS, run next instruction

A bit of history

The 8086's interrupt system inherits a lot from the Intel 8008 processor. Interrupts were a bit of an afterthought on the 8008 so the interrupt handling was primitive and designed to simplify implementation.17 In particular, an interrupt response acts like an instruction fetch except the interrupting device "jams" an instruction on the bus. To support this, the 8008 provided one-byte RST (Restart) instructions that would call a fixed location. The Intel 8080 improved the 8008, but kept this model of performing an instruction fetch cycle that received a "jammed" instruction for an interrupt. With more pins available, the 8080 added the INTA Interrupt Acknowledge pin.

The approach of "jamming" an instruction onto the bus for an interrupt is rather unusual. Other contemporary microprocessors such as the 6800, 6502, or Intel 8048 used an interrupt vector approach, which is much more standard: an interrupt vector table held pointers to the interrupt service routines.

The 8086 switched to an interrupt vector table, but retained some 8080 interrupt characteristics for backward compatibility. In particular, the 8086 performs a memory cycle very much like an instruction fetch, but instead of an instruction, it receives an interrupt number. The 8086 performs two interrupt ack bus cycles but ignores the first one, which lets the same hardware work with the 8080 and 8086.16

Conclusions

This is another blog post that I expected would be quick and easy, but there's a lot going on in the 8086's interrupt system, both in hardware and microcode. The 8086 has some strange characteristics, such as acknowledging interrupts twice, but these features make more sense when looking at the 8086's history and what it inherited from the 8008 and 8080.

Notes and references

The first computer to provide interrupts was probably the UNIVAC 1103A (1956). The book "Computer Architecture" by Blaauw and Brooks discusses different approaches to interrupts in great detail, pages 418-434. A history of interrupts is in this article. ↩
The maskable interrupt can be blocked in software, while the non-maskable interrupt cannot be blocked. ↩
A typical memory access takes four clock cycles. However, slow memory can add wait states, adding as many clock cycles as necessary. Moreover, accessing a word from an unaligned (odd) address results in two complete bus cycles to access the two bytes, since the bus can only read an aligned word at a time. Thus, memory accesses can take much more than four cycles. ↩
The 8086's microcode was disassembled by Andrew Jenner (link) from my die photos, so the microcode listings are based on his disassembly. ↩
The microcode jumps use a level of indirection because there isn't room in the micro-instruction for the full micro-address. Instead, the micro-instruction has a four-bit tag specifying the desired routine. The Translation ROM holds the corresponding micro-address for the routine, which is loaded into the microcode address register. ↩
The CR transfer source loads the low three bits of the microcode address. Its implementation is almost the same as the ZERO source, which loads zero. A signal zeroes bits 15-3 for both sources. The bottom 3 bits are pulled low for the ZERO source or if the corresponding microcode bit is 0. By the time this load happens, the microcode counter has incremented, so the value is one more than the address with the micro-instruction. Also note that it uses the raw 13-bit microcode address which is 9 bits plus four counter bits. The address decoder converts this to the 9-bit "physical" microcode address that I'm showing. The point is that the 3 lower bits from my microcode listing won't give the right value. ↩
The jump in the microcode is why the one-byte INT 3 instruction takes 52 clocks while the two-byte INT nn instruction takes 51 clocks. You'd expect INT nn to be slower because it has an extra byte to execute, but the microcode layout for INT 3 makes it slower. ↩
There's a subtle difference between the NMI and the INTR interrupts. Once the NMI pin goes high, the interrupt is scheduled, even if the pin goes low. For a regular interrupt, the INTR pin must be high at the start of an instruction. Thus, NMI is latched but INTR is not. ↩
Since the 8086 has multiple interrupt sources, you might wonder how multiple interrupts are handled at the same time. The chip makes sure the interrupts are handled correctly, according to their priorities. The diagram below shows what happens if trapping (single-step) is enabled, a divide causes a divide-by-0 exception, and an external interrupt arrives.

Processing simultaneous interrupts. From iAPX 86/88 User's Manual, Figure 2-31.

↩
The interrupt delay micro-instruction is used for the WAIT machine instruction. I think that a long string of prefix instructions will delay an interrupt (even an NMI) for an arbitrarily long time. ↩
Interrupts usually happen after the end of a machine instruction, rather than interrupting an instruction during execution. There are a couple of exceptions, however, for instructions that can take a very long time to complete (block copies) or could take forever (WAIT). The solution is that the microcode for these instructions checks to see if an interrupt is pending, so the instruction can explicitly stop and the interrupt can be handled. ↩
The microcode address is 13 bits long: a special bit, 8 instruction bits, and four counter bits. For an interrupt, it is set to 1r0000000.00ab, where r indicates a reset and ab indicate an interrupt of various types:
Trap: goes to vector 1, INT1 addr 198 10000000.00
NMI: goes to vector 2, INT2 addr 199 100000000.01 (Bit b is NMI)
INTR: goes to IRQ addr 19a 100000000.10, vector specified by device. Bit a is intr-enable', blocked by NMI.
This logic takes into account the relative priorities of the different interrupts. This address is initialized through a special multiplexer path for interrupts that loads bits directly into the microcode address latch. ↩
The IAK micro-instruction is the same as a read micro-instruction except the IAK (Interrupt Acknowledge) bit is set. This bit controls the differences between a read micro-instruction and an interrupt acknowledge micro-instruction.

The logic that makes the bus cycle an interrupt acknowledge rather than a read is a bit strange. A signal indicates that the cycle is an I/O operation or an interrupt acknowledge. This is determined by the instruction decoder (for I/O) or bit P of the microcode (for an interrupt acknowledge). This signal is used for the S2 status pin. Later, this signal is ANDed with the read/write signal to determine that it is an interrupt acknowledge and not an I/O. This probably optimized signal generation, but it seems inconvenient to merge two signals together and then split them out later. ↩
The 8086 has two different modes, giving its pins different meanings in the different modes. In "Minimum" mode, the control pins have simple functions. In particular, the interrupt is acknowledged using the INTA pin. In "Maximum" mode, the control pins are encoded to provide more state information to a bus controller. In this mode, the interrupt acknowledge state is encoded and signaled over the S2, S1, and S0 state pins. I'm discussing minimum mode; the sequence is the same in maximum mode but uses different pins. ↩
To prevent another device from grabbing the bus during interrupt acknowledgement, the LOCK pin is activated. The hardware for this LOCK signal toggles the internal lock latch on each of the interrupt ALE signals, so the lock is enabled on the first one and disabled on the second. The interrupt ack signal also prevents the address lines from being enabled during the interrupt ack. ↩
An 8086/8088 system will typically use an external chip, the 8259A Programmable Interrupt Controller. This chip extends the interrupt functionality of the 8086 by providing 8 priority-based hardware interrupts.

The Intel 8259A Programmable Interrupt Controller chip was designed to receive interrupt requests from multiple devices, prioritize interrupts, and direct the 8080 or 8086 processor accordingly. When used with the 8080, the interrupt controller chip will issue a CALL instruction to execute the specified interrupt handler. In particular, when the 8080 responds to an interrupt with INTA pulses, the interrupt controller chip will first put a CALL instruction on the bus, and then will respond to the next two INTA pulses with the address. For the 8086, the interrupt controller ignores the first INTA pulse and responds to the second INTA pulse with the 8-bit pointer. The point of this is that for both processors, the interrupt controller freezes its state on the first INTA and sends the interrupt-specific byte on the second INTA. Thus, even though the interrupt controller responds to the 8080 with an instruction and responds to the 8086 with an interrupt code, the underlying timing and logic are mostly the same. ↩
The article Intel Microprocessors: 8008 to 8086 provides some history on interrupts in the 8008. Also see Intel 8008 Microprocessor Oral History Panel, pages 5-6. Most of the 8008's features were inherited from the Datapoint 2200 desktop computer, but the interrupts were not part of the Datapoint 2200. Instead, Intel decided to add interrupt functionality. ↩

Reverse-engineering an electromechanical Central Air Data Computer

Determining the airspeed and altitude of a fighter plane is harder than you'd expect. At slower speeds, pressure measurements can give the altitude, air speed, and other "air data". But as planes approach the speed of sound, complicated equations are needed to accurately compute these values. The Bendix Central Air Data Computer (CADC) solved this problem for military planes such as the F-101 and the F-111 fighters, and the B-58 bomber.1 This electromechanical marvel was crammed full of 1955 technology: gears, cams, synchros, and magnetic amplifiers. In this blog post I look inside the CADC, describe the calculations it performed, and explain how it performed these calculations mechanically.

The Bendix MG-1A Central Air Data Computer with the case removed, showing the complex mechanisms inside. Click this image (or any other) for a larger version.

This analog computer performs calculations using rotating shafts and gears, where the angle of rotation indicates a numeric value. Differential gears perform addition and subtraction, while cams implement functions. The CADC is electromechanical, with magnetic amplifiers providing feedback signals and three-phase synchros providing electrical outputs. It is said to contain 46 synchros, 511 gears, 820 ball bearings, and a total of 2,781 major parts. The photo below shows a closeup of the gears.

A closeup of the complex gears inside the CADC,

What it does

For over a century, aircraft have determined airspeed from air pressure. A port in the side of the plane provides the static air pressure,2 which is the air pressure outside the aircraft. A pitot tube points forward and receives the "total" air pressure, a higher pressure due to the speed of the airplane forcing air into the tube. (In the photo below, you can see the long pitot tube sticking out from the nose of a F-101.) The airspeed can be determined from the ratio of these two pressures, while the altitude can be determined from the static pressure.

The F-101 "Voodoo", USAF photo.

But as you approach the speed of sound, the fluid dynamics of air change and the calculations become very complicated. With the development of supersonic fighter planes in the 1950s, simple mechanical instruments were no longer sufficient. Instead, an analog computer to calculate the "air data" (airspeed, altitude, and so forth) from the pressure measurements. One option would be for each subsystem (instruments, weapons control, engine control, etc) to compute the air data separately. However, it was more efficient to have one central system perform the computation and provide the data electrically to all the subsystems that need it. This system was called a Central Air Data Computer or CADC.

The Bendix CADC has two pneumatic inputs through tubes: the static pressure3 and the total pressure. It also receives the total temperature from a platinum temperature probe. From these, it computes many outputs: true air speed, Mach number, log static pressure, differential pressure, air density, air density × the speed of sound, total temperature, and log true free air temperature.

The CADC implemented a surprisingly complex set of functions derived from fluid dynamics equations describing the behavior of air at various speeds and conditions. First, the Mach number is computed from the ratio of total pressure to static pressure. Different equations are required for subsonic and supersonic flight. Although this equation looks difficult to solve mathematically, fundamentally M is a function of one variable ($P_t / P_s$), and this function is encoded in the shape of a cam. (You are not expected to understand the equations below. They are just to illustrate the complexity of what the CADC does.)

\[M<1:\] \[~~~\frac{P_t}{P_s} = ( 1+.2M^2)^{3.5}\]

\[M > 1:\]

\[~~~\frac{P_t}{P_s} = \frac{166.9215M^7}{( 7M^2-1)^{2.5}}\]

Next, the temperature is determined from the Mach number and the temperature indicated by a temperature probe.

\[T = \frac{T_{ti}}{1 + .2 M^2} \]

The indicated airspeed and other outputs are computed in turn, but I won't go through all the equations. Although these equations may seem ad hoc, they can be derived from fluid dynamics principles. These equations were standardized in the 1950s by various government organizations including the National Bureau of Standards and NACA (the precursor of NASA). While the equations are complicated, they can be computed with mechanical means.

How it is implemented

The Air Data Computer is an analog computer that determines various functions of the static pressure, total pressure and temperature. An analog computer was selected for this application because the inputs are analog and the outputs are analog, so it seemed simplest to keep the computations analog and avoid conversions. The computer performs its computations mechanically, using the rotation angle of shafts to indicate values. For the most part, values are represented logarithmically, which allows multiplication and division to be implemented by adding and subtracting rotations. A differential gear mechanism provides the underlying implementation of addition and subtraction. Specially-shaped cams provide the logarithmic and exponential conversions as necessary. Other cams implement various arbitrary functions.

The diagram below, from patent 2,969,210, shows some of the operations. At the left, transducers convert the pressure and temperature inputs from physical quantities into shaft rotations, applying a log function in the process. Subtracting the two pressures with a differential gear mechanism (X-in-circle symbol) produces the log of the pressure ratios. Cam "CCD 12" generates the Mach number from this log pressure ratio, still expressed as a shaft rotation. A synchro transmitter converts the shaft rotation into a three-phase electrical output from the CADC. The remainder of the diagram uses more cams and differentials to produce the other outputs. Next, I'll discuss how these steps are implemented.

A diagram showing how values are computed by the CADC. Source: Patent 2969910.

The pressure transducer

The CADC receives the static and total pressure through tubes connected to the front of the CADC. (At the lower right, one of these tubes is visible.) Inside the CADC, two pressure transducers convert the pressures into rotational signals. The pressure transducers are the black domed cylinders at the top of the CADC.

The pressure transducers are the two black domes at the top. The circuit boards next to each pressure transducer are the amplifiers. The yellowish transformer-like devices with three windings are the magnetic amplifiers.

Each pressure transducer contains a pair of bellows that expand and contract as the applied pressure changes. They are connected to opposite sides of a shaft so they cause small rotations of the shaft.

Inside the pressure transducer. The two disc-shaped bellows are connected to opposite sides of a shaft so the shaft rotates as the bellows expand or contract.

The pressure transducer has a tricky job: it must measure tiny pressure changes, but it must also provide a rotational signal that has enough torque to rotate all the gears in the CADC. To accomplish this, the pressure transducer uses a servo loop. The bellows produce a small shaft motion that is detected by an inductive pickup. This signal is amplified and drives a motor with enough power to move all the gears. The motor is also geared to counteract the movement of the bellows. This creates a feedback loop so the motor's rotation tracks the air pressure, but provides much more force. A cam is used so the output corresponds to the log of the input pressure.

This diagram shows the structure of the transducer. From "Air Data Computer Mechanization."

Each transducer signal is amplified by three circuit boards centered around a magnetic amplifier, a transformer-like amplifier circuit that was popular before high-power transistors came along. The photo below shows how the amplifier boards are packed next to the transducers. The boards are complex, filled with resistors, capacitors, germanium transistors, diodes, relays, and other components.

This end-on view of the CADC shows the pressure transducers, the black cylinders. Next to each pressure transducer is a complex amplifier consisting of multiple boards with transistors and other components. The magnetic amplifiers are the yellowish transformer-like components.

Temperature

The external temperature is an important input to the CADC since it affects the air density. A platinum temperature probe provides a resistance4 that varies with temperature. The resistance is converted to rotation by an electromechanical transducer mechanism. Like the pressure transducer, the temperature transducer uses a servo mechanism with an amplifier and feedback loop. For the temperature transducer, though, the feedback signal is generated by a resistance bridge using a potentiometer driven by the motor. By balancing the potentiometer's resistance with the platinum probe's resistance, a shaft rotation is produced that corresponds to the temperature. The cam is configured to produce the log of the temperature as output.

This diagram shows the structure of the temperature transducer. From "Air Data Computer Mechanization."

The temperature transducer section of the CADC is shown below. The feedback potentiometer is the red cylinder at the lower right. Above it is a metal-plate adjustment cam, which will be discussed below. The CADC is designed in a somewhat modular way, with the temperature section implemented as a removable wedge-shaped unit, the lower two-thirds of the photo. The temperature transducer, like the pressure transducer, has three boards of electronics to implement the feedback amplifier and drive the motor.

The temperature transducer section of the CADC.

The differential

The differential gear assembly is a key component of the CADC's calculations, as it performs addition or subtraction of rotations: the rotation of the output shaft is the sum or difference of the input shafts, depending on the direction of rotation.5 When rotations are expressed logarithmically, addition and subtraction correspond to multiplication and division. This differential is constructed as a spur-gear differential. It has inputs at the top and bottom, while the body of the differential rotates to produce the sum. The two visible gears in the body mesh with the internal input gears, which are not visible. The output is driven by the body through a concentric shaft.

A closeup of a differential mechanism.

The cams

The CADC uses cams to implement various functions. Most importantly, cams perform logarithms and exponentials. Cams also implement more complex functions of one variable such as ${M}/{\sqrt{1 + .2 M^2}}$. The photo below shows a cam (I think exponential) with the follower arm in front. As the cam rotates, the follower moves in and out according to the cam's radius, providing the function value.

A cam inside the CADC implements a function.

The cams are combined with a differential in a clever way to make the cam shape more practical, as shown below.6 The input (23) drives the cam (30) and the differential (37-41). The follower (32) tracks the cam and provides a second input (35) to the differential. The sum from the differential produces the output (26).

This diagram, from Patent 2969910, shows how the cam and follower are connected to a differential.

The warped plate cam

Some functions are implemented by warped metal plates acting as cams. This type of cam can be adjusted by turning the 20 setscrews to change the shape of the plate. A follower rides on the surface of the cam and provides an input to a differential underneath the plate. The differential adds the cam position to the input rotation, producing a modified rotation, as with the solid cam. The pressure transducer, for instance, uses a cam to generate the desired output function from the bellows deflection. By using a cam, the bellows can be designed for good performance without worrying about its deflection function.

A closeup of a warped-plate cam.

The synchro outputs

Most of the outputs from the CADC are synchro signals.7 A synchro is an interesting device that can transmit a rotational position electrically over three wires. In appearance, a synchro is similar to an electric motor, but its internal construction is different, as shown below. In use, two synchros have their stator windings connected together, while the rotor windings are driven with AC. Rotating the shaft of one synchro causes the other to rotate to the same position. I have a video showing synchros in action here.

Cross-section diagram of a synchro showing the rotor and stators.

Internally, a synchro has a moving rotor winding and three fixed stator windings. When AC is applied to the rotor, voltages are developed on the stator windings depending on the position of the rotor. These voltages produce a torque that rotates the synchros to the same position. In other words, the rotor receives power (26 V, 400 Hz in this case), while the three stator wires transmit the position. The diagram below shows how a synchro is represented schematically, with rotor and stator coils.

The schematic symbol for a synchro.

Before digital systems, synchros were very popular for transmitting signals electrically through an aircraft. For instance, a synchro could transmit an altitude reading to a cockpit display or a targeting system. For the CADC, most of the outputs are synchro signals, which convert the rotational values of the CADC to electrical signals. The three stator windings from the synchro inside the CADC are wired to an external synchro that receives the rotation. For improved resolution, many of these outputs use two synchros: a coarse synchro and a fine synchro. The two synchros are typically geared in an 11:1 ratio, so the fine synchro rotates 11 times as fast as the coarse synchro. Over the output range, the coarse synchro may turn 180°, providing the approximate output, while the fine synchro spins multiple times to provide more accuracy.

The front of the CADC has multiple output synchros with anti-backlash springs.

The air data system

The CADC is one of several units in the system, as shown in the block diagram below.8 The outputs of the CADC go to another box called the Air Data Converter, which is the interface between the CADC and the aircraft systems that require the air data values: fire control, engine control, navigation system, cockpit display instruments, and so forth. The motivation for this separation is that different aircraft types have different requirements for signals: the CADC remains the same and only the converter needed to be customized. Some aircraft required "up to 43 outputs including potentiometers, synchros, digitizers, and switches."

This block diagram shows how the Air Data Computer integrates with sensors and other systems. The unlabeled box on the right is the converter. From MIL-C-25653C(USAF).

The CADC was also connected to a cylindrical unit called the "Static pressure and angle of attack compensator." This unit compensates for errors in static pressure measurements due to the shape of the aircraft by producing the "position error correction". Since the compensation factor depended on the specific aircraft type, the compensation was computed outside the Central Air Data Computer, again keeping the CADC generic. This correction factor depends on the Mach number and angle of attack, and was implemented as a three-dimensional cam. The cam's shape (and thus the correction function) was determined empirically, rather than from fundamental equations.

The CADC was wired to other components through five electrical connectors as shown in the photo below.9 At the bottom are the pneumatic connections for static pressure and total pressure. At the upper right is a small elapsed time meter.

The front of the CADC has many mil-spec round connectors.

Conclusions

The Bendix MG-1A Central Air Data Computer is an amazingly complex piece of electromechanical hardware. It's hard to believe that this system of tiny gears was able to perform reliable computations in the hostile environment of a jet plane, subjected to jolts, accelerations, and vibrations. But it was the best way to solve the problem at the time,10 showing the ingenuity of the engineers who developed it.

The CADC inside its case. From the outside, its mechanical marvels are hidden.

I plan to continue reverse-engineering the Bendix CADC and hope to get it operational,11 so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff. Until then, you can check out CuriousMarc's video below to see more of the CADC. Thanks to Joe for providing the CADC. Thanks to Nancy Chen for obtaining a hard-to-find document for me.

Notes and references

I haven't found a definitive list of which planes used this CADC. Based on various sources, I believe it was used in the F-86, F-101, F-104, F-105, F-106, and F-111, and the B-58 bomber. ↩
The static air pressure can also be provided by holes in the side of the pitot tube. I couldn't find information indicating exactly how these planes received static pressure. ↩
The CADC also has an input for the "position error correction". This provides a correction factor because the measured static pressure may not exactly match the real static pressure. The problem is that the static pressure is measured from a port on the aircraft. Distortions in the airflow may cause errors in this measurement. A separate box, the "compensator", determines the correction factor based on the angle of attack. ↩
The platinum temperature probe is type MA-1, defined by specification MIL-P-25726. It apparently has a resistance of 50 Ω at 0 °C. ↩
Strictly speaking, the output of the differential is the sum of the inputs divided by two. I'm ignoring the factor of 2 because the gear ratios can easily cancel it out. ↩
Cams are extensively used in the CADC to implement functions of one variable, including exponentiation and logarithms. The straightforward way to use a cam is to read the value of the function off the cam directly, with the radius of the cam at each angle representing the value. This approach encounters a problem when the cam wraps around, since the cam's profile will suddenly jump from one value to another. This poses a problem for the cam follower, which may get stuck on this part of the cam unless there is a smooth transition zone. Another problem is that the cam may have a large range between the minimum and maximum outputs. (Consider an exponential output, for instance.) Scaling the cam to a reasonable size will lose accuracy in the small values. The cam will also have a steep slope for the large values, making it harder to track the profile.

The solution is to record the difference between the input and the output in the cam. A differential then adds the input value to the cam value to produce the desired value. The clever part is that by scaling the input so it matches the output at the start and end of the range, the difference function drops to zero at both ends. Thus, the cam profile matches when the angle wraps around, avoiding the sudden transition. Moreover, the difference between the input and the output is much smaller than the raw output, so the cam values can be more accurate. (This only works because the output functions are increasing functions; this approach wouldn't work for a sine function, for instance.)

This diagram, from Patent 2969910, shows how a cam implements a complex function.

The diagram above shows how this works in practice. The input is $log~ dP/P_s$ and the output is $log~M / \sqrt{1+.2KM^2}$. (This is a function of Mach number used for the temperature computation; K is 1.) The small humped curve at the bottom is the cam correction. Although the input and output functions cover a wide range, the difference that is encoded in the cam is much smaller and drops to zero at both ends. ↩
The US Navy made heavy use of synchros for transmitting signals throughout ships. The synchro diagrams are from two US Navy publications: US Navy Synchros (1944) and Principles of Synchros, Servos, and Gyros (2012). These are good documents if you want to learn more about synchros. The diagram below shows how synchros could be used on a ship.

A Navy diagram illustrating synchros controlling a gun on a battleship.

↩
To summarize the symbols, the outputs are: log T_FAT: true free air temperature (the ambient temperature without friction and compression); log P_s: static pressure; M: Mach number; Q_c: differential pressure; ρ: air density; ρa: air density times the speed of sound; V_t: true airspeed. T_t: total temperature (higher due to compression of the air). Inputs are: T_T: total temperature (higher due to compression of the air). P_ti: indicated total pressure (higher due to velocity); P_si: indicated static pressure; log P_si/P_s: the position error correction from the compensator. The compensator uses input α_i: angle of attack; and produces α_T: true angle of attack; a_T: speed of sound. ↩
The electrical connectors on the CADC have the following functions: J614: outputs to the converter, J601: outputs to the converter, J603: AC power (115 V, 400 Hz), J602: to/from the compensator, and J604: input from the temperature probe. ↩
An interesting manual way to calculate air data was with a circular slide rule, designed for navigation and air data calculation. It gave answers for various combinations of pressure, temperature, Mach number, true airspeed, and so forth. See the MB-2A Air Navigation Computer instructions for details. Also see patent 2528518. I'll also point out that from the late 1800s through the 1940s and on, the term "computer" was used for any sort of device that computed a value, from an adding machine to a slide rule (or even a person). The meaning is very different from the modern usage of "computer". ↩
It was very difficult to find information about the CADC. The official military specification is MIL-C-25653C(USAF). After searching everywhere, I was finally able to get a copy from the Technical Reports & Standards unit of the Library of Congress. The other useful document was in an obscure conference proceedings from 1958: "Air Data Computer Mechanization" (Hazen), Symposium on the USAF Flight Control Data Integration Program, Wright Air Dev Center US Air Force, Feb 3-4, 1958, pp 171-194. ↩

Reverse-engineering the ModR/M addressing microcode in the Intel 8086 processor

8086 addressing modes

The ModR/M byte

An overview of 8086 microcode

Some examples of microcode for addressing

A register-register operation

Using memory as the destination

Using memory as the source

Using a displacement

The Translation ROM

The effective address micro-routines

Register calculations

The EAOFFSET and EAFINISH targets

Special cases

Supporting a fixed address

Selecting the segment

Conclusions

Notes and references

Reverse-engineering the interrupt circuitry in the Intel 8086 processor

Interrupts in the 8086

Microcode

Software interrupts

Subroutine call microcode: FARCALL2

Subroutine call microcode: NEARCALL

Starting an interrupt

Return from interrupt

External hardware interrupts

The interrupt pin circuit

The interrupt logic circuitry

The external INTR interrupt

The bus cycle

Reset

A bit of history

Conclusions

Notes and references

Reverse-engineering an electromechanical Central Air Data Computer

What it does

How it is implemented

The pressure transducer

Temperature

The differential

The cams

The warped plate cam

The synchro outputs

The air data system

Conclusions

Notes and references

The `EAOFFSET` and `EAFINISH` targets