Reverse-engineering the interrupt circuitry in the Intel 8086 processor

Interrupts have been an important part of computers since the mid-1950s,1 providing a mechanism to interrupt a program's execution. Interrupts allows the computer to handle time-critical tasks such as I/O device operations. In this blog post, I look at the interrupt features in the Intel 8086 (1978) and how they are implemented in silicon, a combination of interesting circuitry and microcode.

I've been reverse-engineering the 8086 starting with the silicon die. The die photo below shows the chip under a microscope. The metal layer on top of the chip is visible, with the silicon and polysilicon mostly hidden underneath. Around the edges of the die, bond wires connect pads to the chip's 40 external pins; relevant pins are marked in yellow. I've labeled the key functional blocks; the ones that are important to this discussion are darker and will be discussed in detail below. Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below. The BIU handles bus activity, while the Execution Unit (EU) executes instructions and microcode. Both parts are extensively involved in interrupt handling.

The 8086 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath. Click on this image (or any other) for a larger version.

The 8086 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath. Click on this image (or any other) for a larger version.

Interrupts in the 8086

The idea behind an interrupt is to stop the current flow of execution, run an interrupt handler to perform a task, and then continue execution where it left off. An interrupt is like a subroutine call in some ways; it pushes the current segment register and program counter on the stack and continues at a new address. However, there are a few important differences. First, the address of the interrupt handler is obtained indirectly, through an interrupt vector table. Interrupts are numbered 0 through 255, and each interrupt has an entry in the vector table that gives the address of the code to handle the interrupt. Second, an interrupt pushes the processor flags to the stack, so they can be restored after the interrupt. Finally, an interrupt clears the interrupt and trap flags, blocking more interrupts while handling the interrupt.

The 8086 provides several types of interrupts, some generated by hardware and some generated by software. For hardware interrupts, the INTR pin on the chip generates a maskable interrupt when activated, while the NMI pin on the chip generates a higher-priority non-maskable interrupt.2 Typically, most interrupts use the INTR pin, signaling things such as a timer, keyboard request, real-time clock, or a disk needing service. The NMI interrupt is designed for things such as parity error or an impending power failure, which are so critical they can't be delayed. The 8086 also has a RESET pin that resets the CPU. Although not technically an interrupt, the RESET action has many features in common with interrupts, so I'll discuss it here.

On the software side, the 8086 has multiple types of interrupts generated by different instructions. The INT n instruction creates an interrupt of the specified type (0 to 255). These software interrupts were used in the IBM PC to execute a function in the BIOS, the layer underneath the operating system. These functions could be everything from a floppy disk operation to accessing the printer. The one-byte INT 3 instruction creates a breakpoint interrupt for debugging. The divide instructions generate an interrupt if a divide-by-zero or overflow occurs. The INTO instruction (Interrupt if Overflow) generates an interrupt if the overflow flag is set. To support single-step mode in debuggers, the Trap flag generate an interrupt on every instruction.

The diagram below shows how the vector table is implemented. Each of the 256 interrupt types has an entry holding the address of the interrupt handler (the code segment value and the instruction pointer (program counter) value). In the next section, I'll show below how the microcode loads the vector from the table and switches execution to that interrupt handler.

This diagram shows where the interrupt vectors are stored in memory.  From iAPX 86/88 User's Manual, Figure 4-18.

This diagram shows where the interrupt vectors are stored in memory. From iAPX 86/88 User's Manual, Figure 4-18.

Microcode

Most of the operations in the 8086 are implemented in microcode, a low-level layer of code that sits between the machine code instructions and the chip's hardware. I'll explain a few features of the 8086's microcode that are important for the interrupt code. Each micro-instruction is 21 bits long, as shown below. The first part of the micro-instruction specifies a move between a source register and a destination register; these may be special-purpose internal registers, not just the registers visible to the programmer. The meaning of the remaining bits depends on the type of micro-instruction, but includes jumps within the microcode, ALU (Arithmetic/Logic Unit) operations, and memory operations.

The encoding of a micro-instruction into 21 bits. Based on NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright?

The encoding of a micro-instruction into 21 bits. Based on NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright?

For a memory access, microcode issues a memory read or write micro-instruction. Memory accesses use two internal registers: the IND (Indirect) register holds the address in the segment, while the OPR (Operand) register holds the word that is read or written. A micro-instruction such as W SS,P2 writes the OPR register to the memory address specified by the IND register and the segment register (SS indicates the stack segment). The IND register can also be incremented or decremented (P2 indicates "Plus 2").

The 8086's Bus Interface Unit (BIU) handles the memory request in hardware, while the microcode waits. The BIU has an adder to combine the segment address and the offset to obtain the "absolute" address. It also has a constant ROM to increment or decrement the IND register. Memory accesses are complicated in the 8086 and take at least four clock cycles,3 called T1, T2, T3, and T4. An interrupt acknowledge is almost the same as a memory read, except the IAK bit is set in the microcode, causing some behavior changes.

The interaction between microcode and the ALU (Arithmetic/Logic Unit) will also be important. The ALU has three temporary registers that hold the arguments for operations, called temporary A, B, and C. These registers are invisible to the programmer. The first argument for, say, an addition can come from any of the three registers, while the second argument is always from the temporary B register. Performing an ALU operation takes at least two micro-instructions. First, the ALU is configured to perform an operation, for example, ADD or DEC2 (decrement by 2). The result is then read from the ALU, denoted as the Σ register.

Software interrupts

The main microcode for interrupt handling is shown below.4 Each line specifies a move operation and an action, with my comments in green. On entry to INTR interrupt handler, the OPR operand register holds the interrupt type. This chunk of microcode looks up the interrupt handler in the vector table, pushes the status flags onto the stack, and then branches to a routine FARCALL2 to perform a subroutine call the interrupt handler.

       move            action
19d  OPR → tmpBL     SUSP         INTR: OPR to tmpB(low), suspend prefetch
19e  0 → tmpbH       ADD tmpB      0 to tmpB(high), add tmpB to tmpB
19f  Σ → tmpB                      ALU sum to tmpB, add tmpB to tmpB
1a0  Σ → IND         R S0,P2       ALU sum to IND, read from memory, IND+=2
1a1  OPR → tmpB      DEC2 tmpC     memory to tmpB, set up decrement tmpC
1a2  SP → tmpC       R S0,P0       SP to tmpC, read from memory
1a3  OPR → tmpA                    memory to tmpA
1a4  F → OPR         CITF          Flags to OPR, clear interrupt and trap flags
1a5  Σ → IND         W SS,P0       ALU dec to IND, Write to memory
1a6  IND → tmpC      JMP FARCALL2  IND to tmpC, branch to FARCALL2

In more detail, the microcode routine starts at 19d by moving the interrupt number from the OPR register to the low byte of the ALU's temporary B register. The SUSP action suspends instruction prefetching since we'll start executing instructions from a new location. Next, line 19e zeros out the top byte of the temporary B register and tells the ALU to add temporary B to itself. The next micro-instruction puts the ALU result (indicated by Σ) into temporary B, doubling the value.

Line 1a0 calculates another sum (doubling) from the ALU and stores it in the IND register. In other words, the interrupt number has been multiplied by 4, yielding an address into the vector table. The interrupt handle address is read from the vector table: R S0,P2 operation reads from memory, segment 0, and performs a "Plus 2" on the IND register. Line 1a1 puts the result (OPR) into the temporary B register.

Line 1a2 stores the current stack pointer register into temporary C. It also performs a second read to get the handler code segment from the vector table. Line 1a3 stores this in the temporary A register. Line 1a4 puts the flags (F) into the OPR register. It also clears the interrupt and trap flags (CITF), blocking further interrupts.

Line 1a5 puts the ALU result (the decremented stack pointer) into the IND register. (This ALU operation was set up back in line 1a1.) To push the flags on the stack, W SS,P0 writes OPR to the Stack segment and does a "Plus 0" on the IND register. Finally, line 1a6 stores the IND register (the new top-of-stack) into the temporary C register and jumps to the FARCALL2 micro-routine.5

Understanding microcode can be tricky, since it is even more low-level than machine instructions, but hopefully this discussion gives you a feel for it. Everything is broken down into very small steps, even more basic than machine instructions. Microcode is a bit like a jigsaw puzzle, carefully fit together to ensure everything is at the right place at the right time, as efficiently as possible.

Subroutine call microcode: FARCALL2

Next, I'll describe the FARCALL2 microcode. Because of its segment registers, the 8086 has two types of calls (and jumps): a near call is a subroutine call within the same code segment, while a far call is a subroutine call to a different code segment. A far call target is specified with two words: the new code segment register and the new program counter.

The FARCALL2 micro-routine performs a subroutine call to a particular segment and offset. At entry to FARCALL2, the target code segment in temporary A, the offset is in temporary B, and the decremented stack pointer will be provided by the ALU. The microcode below pushes the code segment register to the stack, updates the code segment register with the new value, and then jumps to NEARCALL to finish the subroutine call.

06c  Σ → IND      CORR        FARCALL2: ALU (IND-2) to IND, correct PC
06d  CS → OPR     W SS,M2      CS to OPR, write to memory, IND-=2
06e  tmpA → CS    PASS tmpC    tmpA to CS, ALU passthrough
06f  PC → OPR     JMP NEARCALL PC to OPR, branch to NEARCALL

For a subroutine call, the program counter is saved so execution can resume where it left off. But because of prefetching, the program counter in the 8086 points to the next instruction to fetch, not the next instruction to execute. To fix this, the CORR (Correction) micro-instruction corrects the program counter value by subtracting the length of the prefetch queue. Line 06c also puts the decremented stack location into IND using the ALU decrement operation set up way back at line 1a1.

Line 06d puts the code segment value (CS) into the OPR register and then writes it to the stack segment, performing a "Minus 2" on IND. In other words, the CS register is pushed onto the stack. Line 06e stores the new value (from temporary A) into the CS register. It also sets up the ALU to pass the value of the temporary C register as its result. Finally, line 06f puts the (corrected) program counter into the OPR register and jumps to the NEARCALL routine.

Subroutine call microcode: NEARCALL

The NEARCALL micro-routine does a near subroutine call, updating the program counter but not the segment register. At entry, the target address is in temporary B, the IND register indicates the top of the stack, and OPR holds the program counter.

077  tmpB → PC    FLUSH      NEARCALL: tmpB to PC, restart prefetch
078  IND → tmpC               IND to tmpC
079  Σ → IND                  ALU to IND
07a  Σ → SP       W SS,P0 RNI ALU to SP, write PC to memory, run next instruction

Line 077 puts temporary B into the program counter. The FLUSH operation flushes the stale instructions from the prefetch queue and starts prefetching from the new PC address. Line 078 puts IND (i.e. the new stack pointer value) into temporary C. Line 079 puts this value into the IND register and line 07a puts this value into the SP register. (The ALU was configured at line 06e to pass the temporary C value unmodified.)

Line 07a pushes the PC to the stack by writing OPR (the old program counter) to the stack segment. Finally, RNI (Run Next Instruction) ends this microcode sequence and causes the 8086 to run the next machine instruction, the first instruction of the interrupt handler.

Starting an interrupt

The above microcode handles a generic interrupt. But there's one more piece: setting up the interrupt type for the instruction. For instance, the INT ib machine instruction has the interrupt type in the second byte of the opcode. This machine instruction has the two micro-instructions below. The microcode loads the type from the instruction prefetch queue (Q) and puts it into temporary B and then OPR. Then it jumps to the INTR microcode discussed earlier.

1a8  Q → tmpB             INT ib: load a byte from the queue
1a9  tmpB → OPR  JMP INTR  Put the byte in OPR and jump to INTR

Several instructions require specific interrupt numbers, and the microcode uses a tricky technique to obtain these numbers. The numbers are obtained from a special pseudo-register called CR, which is all zeros except the three low bits come from the microcode address.6 The microcode is carefully arranged in memory so the micro-instruction is at the right address to generate the necessary value. For instance, in the microcode below, entry point INT1 will load the number 1 into OPR, entry point INT2 will load the number 2 into OPR, and INT0 will load 0 into OPR. Each line then jumps to the main INTR interrupt microcode.

198  CR → OPR     JMP INTR      INT1: num to OPR, branch to INTR
199  CR → OPR     JMP INTR      INT2: num to OPR, branch to INTR
...
1a7  CR → OPR     JMP INTR      INT0: num to OPR, branch to INTR

The microcode for the INT 3 and INTO (not to be confused with INT0) machine instructions has some wasted micro-instructions to ensure that the CR → OPR is at the right address. This wastes a couple of cycles and a couple of micro-instructions.7

Return from interrupt

The IRET interrupt is used to return from interrupts. It pops the program counter, code segment register, and flags from the stack, so execution can continue at the point where the interrupt happened. It calls the microcode subroutine FARRET to pop the code segment register and the PC from the stack. (I won't go into FARRET in this post.) Then it pops the flags from the stack, updates the Stack Pointer, and runs the next instruction.

0c8               CALL FARRET IRET: call Far Return
0c9               R SS,P2      read from stack, IND+=2
0ca  OPR → F                   mem to Flags
0cb  IND → SP     RNI          IND to stack pointer, run next instruction

External hardware interrupts

As well as software interrupts, the 8086 has hardware interrupts. The 8086 chip has pins for INTR and NMI; pulling the pin high causes a hardware interrupt. This section discusses the hardware circuitry and the microcode that handles these interrupts.

The interrupt pin circuit

The schematic below shows the input circuitry for the INTR pin; the NMI, RESET, and TEST pins use the same circuit. The function of this circuit is to clean up the input and ensure that it is synchronized with the clock. The chip's INTR pin is connected to a protection diode to drain a negative voltage to ground. Next, the signal goes through three inverters, probably to force a marginal voltage to either 0 or 1. Finally, the signal goes through an edge-triggered flip-flop to synchronize it with the clock. The flip-flop is constructed from two set-reset latches, the first gated by clk' and the second gated by clk. At the output of each stage is a "superbuffer", two transistors that produce a higher-current output than a regular inverter. This flip-flop circuit is unusual for the 8086; most flip-flops and latches are constructed from dynamic logic with pass transistors, which is much more compact. The more complicated circuitry on the INTR input probably protects against metastability and other problems that could occur with poor-quality input signals.

Schematic of the input circuitry for the INTR pin.

Schematic of the input circuitry for the INTR pin.

The interrupt logic circuitry

The chip has a block of interrupt logic to receive interrupts, handle interrupt priorities, and execute an interrupt at the appropriate time. This circuitry is in the top right part of the chip, on the opposite side of the chip from the interrupt pins. The schematic below shows this circuitry.

The interrupt logic circuitry activates the microcode interrupt code at the appropriate time.

The interrupt logic circuitry activates the microcode interrupt code at the appropriate time.

The top chunk of logic latches an NMI (non-maskable interrupt) until it runs or it is cleared by a reset.8 The first flip-flop helps convert an NMI input into a one-clock pulse. The second flip-flop holds the NMI until it runs.

The middle chunk of logic handles traps. If the trap flag is high, the latch will hold the trap request until it can take place. The latch is loaded on First Clock (FC), which indicates the start of a new instruction. The NOR gate blocks the trap if there is an interrupt or NMI, which has higher priority.9

The third chunk of logic schedules the interrupt. Three things can delay an interrupt: an interrupt delay micro-instruction, an instruction that modifies a segment register, or an instruction prefix.10 If not delayed, the interrupt (NMI, trap, or INTR pin) will run at the start of the next instruction (i.e. FC).11 The microcode interrupt code is run for these cases as well as a reset. Note that the reset is not gated by First Clock, but can run at any time.

The interrupt signal from this circuitry loads a hardcoded interrupt address into the microcode address latches, depending on the type of interrupt.12 This happens for an interrupt at First Clock, while a reset can happen any time in the instruction cycle. A trap goes to the INT1 microcode routine described earlier, while an NMI interrupt goes to INT2 microcode routine. The microcode for the INTR interrupt will be discussed in the next section.

The interrupt signal also goes to the Group Decode ROM (via the instruction register), where it blocks regular instruction decoding. Finally, the interrupt signal goes to a circuit called the loader, where it prevents fetching of the next instruction from the prefetch queue.

The external INTR interrupt

The INTR interrupt has some special behavior to communicate with the device that triggered the interrupt: the 8086 performs two bus cycles to acknowledge the interrupt and to obtain the interrupt number from the device. This is implemented with a combination of microcode and the bus interface logic. The bus cycles are similar to memory read cycles, but with some behavior specific to interrupts.

The INTR interrupt has its own microcode, shown below. The first micro-instruction zeros the IND memory address register and then performs a special IAK bus cycle.13 This is similar to a memory read but asserts the INTA interrupt acknowledge line so the device knows that its interrupt has been received. Next, prefetching is suspended. The third line performs a second IAK bus cycle and the external device puts the interrupt number onto the bus. The interrupt number is read into the ORD register, just like a memory read. At this point, the code falls through into the interrupt microcode described previously.

19a  0 → IND   IAK S0,P0  IRQ: 0 to IND, run interrupt bus cycle
19b            SUSP        suspend prefetch
19c            IAK S0,P0   run interrupt bus cycle
19d            ...        The INTR routine discussed earlier

The bus cycle

The diagram below provides timing details of the two interrupt acknowledge bus cycles. Each cycle is similar to a memory read bus cycle, going through the T1 through T4 states, starting with an ALE (Address Latch Enable) signal. The main difference is the interrupt acknowledge bus cycle also raises the INTA (Interrupt Acknowledge) to let the requesting device know that its interrupt has been acknowledged.14 On the second cycle, the device provides an 8-bit type vector that provides the interrupt number. The 8086 also issues a LOCK signal to lock the bus from other uses during this sequence. The point of this is that the 8086 goes through a fairly complex bus sequence when handling a hardware interrupt. The microcode triggers these two bus cycles with the IAK micro-operation, but the bus interface circuitry goes through the various states of the bus cycle in hardware, without involving the microcode.

This diagram shows the interrupt acknowledge sequence. From Intel 8086 datasheet.

This diagram shows the interrupt acknowledge sequence. From Intel 8086 datasheet.

The circuitry to control the bus cycle is complicated with many flip-flops and logic gates; the diagram below shows the flip-flops. I plan to write about the bus cycle circuitry in detail later, but for now, I'll give an extremely simplified description. Internally, there is a T0 state before T1 to provide a cycle to set up the bus operation. The bus timing states are controlled by a chain of flip-flops configured like a shift register with additional logic: the output from the T0 flip-flop is connected to the input of the T1 flip-flop and likewise with T2 and T3, forming a chain. A bus cycle is started by putting a 1 into the input of the T0 flip-flop. When the CPU's clock transitions, the flip-flop latches this signal, indicating the (internal) T0 bus state. On the next clock cycle, this 1 signal goes from the T0 flip-flop to the T1 flip-flop, creating the externally-visible T1 state. Likewise, the signal passes to the T2 and T3 flip-flops in sequence, creating the bus cycle. Some special-case logic changes the behavior for an interrupt versus a read.15

The read/write control circuitry on the die with the flip-flops labeled. Metal and polysilicon were removed to show the underlying silicon.

The read/write control circuitry on the die with the flip-flops labeled. Metal and polysilicon were removed to show the underlying silicon.

Reset

The reset pin resets the CPU to an initial state. This isn't an interrupt, but much of the circuitry is the same, so I'll describe it for completeness. The reset microcode below initializes the segment registers, program counter, and flags to 0, except the code segment is initialized to 0xffff. Thus, after a reset, instruction execution will start at absolute address 0xffff0. The reset line is also connected to numerous flip-flops and circuits in the 8086 to ensure that they are initialized to the proper state. These initializations happen outside of the microcode.

1e4  0 → DS     SUSP   RESET: 0 to DS, suspend prefetch
1e5  ONES → CS          FFFF to CS
1e6  0 → PC     FLUSH   0 to PC, start prefetch
1e7  0 → F              0 to Flags
1e8  0 → ES             0 to ES
1e9  0 → SS     RNI     0 to SS, run next instruction

A bit of history

The 8086's interrupt system inherits a lot from the Intel 8008 processor. Interrupts were a bit of an afterthought on the 8008 so the interrupt handling was primitive and designed to simplify implementation.17 In particular, an interrupt response acts like an instruction fetch except the interrupting device "jams" an instruction on the bus. To support this, the 8008 provided one-byte RST (Restart) instructions that would call a fixed location. The Intel 8080 improved the 8008, but kept this model of performing an instruction fetch cycle that received a "jammed" instruction for an interrupt. With more pins available, the 8080 added the INTA Interrupt Acknowledge pin.

The approach of "jamming" an instruction onto the bus for an interrupt is rather unusual. Other contemporary microprocessors such as the 6800, 6502, or Intel 8048 used an interrupt vector approach, which is much more standard: an interrupt vector table held pointers to the interrupt service routines.

The 8086 switched to an interrupt vector table, but retained some 8080 interrupt characteristics for backward compatibility. In particular, the 8086 performs a memory cycle very much like an instruction fetch, but instead of an instruction, it receives an interrupt number. The 8086 performs two interrupt ack bus cycles but ignores the first one, which lets the same hardware work with the 8080 and 8086.16

Conclusions

This is another blog post that I expected would be quick and easy, but there's a lot going on in the 8086's interrupt system, both in hardware and microcode. The 8086 has some strange characteristics, such as acknowledging interrupts twice, but these features make more sense when looking at the 8086's history and what it inherited from the 8008 and 8080.

I've written multiple posts on the 8086 so far and plan to continue reverse-engineering the 8086 die so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff. Thanks to pwg on HN for suggesting interrupts as a topic.

Notes and references

  1. The first computer to provide interrupts was probably the UNIVAC 1103A (1956). The book "Computer Architecture" by Blaauw and Brooks discusses different approaches to interrupts in great detail, pages 418-434. A history of interrupts is in this article

  2. The maskable interrupt can be blocked in software, while the non-maskable interrupt cannot be blocked. 

  3. A typical memory access takes four clock cycles. However, slow memory can add wait states, adding as many clock cycles as necessary. Moreover, accessing a word from an unaligned (odd) address results in two complete bus cycles to access the two bytes, since the bus can only read an aligned word at a time. Thus, memory accesses can take much more than four cycles. 

  4. The 8086's microcode was disassembled by Andrew Jenner (link) from my die photos, so the microcode listings are based on his disassembly. 

  5. The microcode jumps use a level of indirection because there isn't room in the micro-instruction for the full micro-address. Instead, the micro-instruction has a four-bit tag specifying the desired routine. The Translation ROM holds the corresponding micro-address for the routine, which is loaded into the microcode address register. 

  6. The CR transfer source loads the low three bits of the microcode address. Its implementation is almost the same as the ZERO source, which loads zero. A signal zeroes bits 15-3 for both sources. The bottom 3 bits are pulled low for the ZERO source or if the corresponding microcode bit is 0. By the time this load happens, the microcode counter has incremented, so the value is one more than the address with the micro-instruction. Also note that it uses the raw 13-bit microcode address which is 9 bits plus four counter bits. The address decoder converts this to the 9-bit "physical" microcode address that I'm showing. The point is that the 3 lower bits from my microcode listing won't give the right value. 

  7. The jump in the microcode is why the one-byte INT 3 instruction takes 52 clocks while the two-byte INT nn instruction takes 51 clocks. You'd expect INT nn to be slower because it has an extra byte to execute, but the microcode layout for INT 3 makes it slower. 

  8. There's a subtle difference between the NMI and the INTR interrupts. Once the NMI pin goes high, the interrupt is scheduled, even if the pin goes low. For a regular interrupt, the INTR pin must be high at the start of an instruction. Thus, NMI is latched but INTR is not. 

  9. Since the 8086 has multiple interrupt sources, you might wonder how multiple interrupts are handled at the same time. The chip makes sure the interrupts are handled correctly, according to their priorities. The diagram below shows what happens if trapping (single-step) is enabled, a divide causes a divide-by-0 exception, and an external interrupt arrives.

    Processing simultaneous interrupts. From iAPX 86/88 User's Manual, Figure 2-31.

    Processing simultaneous interrupts. From iAPX 86/88 User's Manual, Figure 2-31.

     

  10. The interrupt delay micro-instruction is used for the WAIT machine instruction. I think that a long string of prefix instructions will delay an interrupt (even an NMI) for an arbitrarily long time. 

  11. Interrupts usually happen after the end of a machine instruction, rather than interrupting an instruction during execution. There are a couple of exceptions, however, for instructions that can take a very long time to complete (block copies) or could take forever (WAIT). The solution is that the microcode for these instructions checks to see if an interrupt is pending, so the instruction can explicitly stop and the interrupt can be handled. 

  12. The microcode address is 13 bits long: a special bit, 8 instruction bits, and four counter bits. For an interrupt, it is set to 1r0000000.00ab, where r indicates a reset and ab indicate an interrupt of various types:
    Trap: goes to vector 1, INT1 addr 198 10000000.00
    NMI: goes to vector 2, INT2 addr 199 100000000.01 (Bit b is NMI)
    INTR: goes to IRQ addr 19a 100000000.10, vector specified by device. Bit a is intr-enable', blocked by NMI.
    This logic takes into account the relative priorities of the different interrupts. This address is initialized through a special multiplexer path for interrupts that loads bits directly into the microcode address latch. 

  13. The IAK micro-instruction is the same as a read micro-instruction except the IAK (Interrupt Acknowledge) bit is set. This bit controls the differences between a read micro-instruction and an interrupt acknowledge micro-instruction.

    The logic that makes the bus cycle an interrupt acknowledge rather than a read is a bit strange. A signal indicates that the cycle is an I/O operation or an interrupt acknowledge. This is determined by the instruction decoder (for I/O) or bit P of the microcode (for an interrupt acknowledge). This signal is used for the S2 status pin. Later, this signal is ANDed with the read/write signal to determine that it is an interrupt acknowledge and not an I/O. This probably optimized signal generation, but it seems inconvenient to merge two signals together and then split them out later. 

  14. The 8086 has two different modes, giving its pins different meanings in the different modes. In "Minimum" mode, the control pins have simple functions. In particular, the interrupt is acknowledged using the INTA pin. In "Maximum" mode, the control pins are encoded to provide more state information to a bus controller. In this mode, the interrupt acknowledge state is encoded and signaled over the S2, S1, and S0 state pins. I'm discussing minimum mode; the sequence is the same in maximum mode but uses different pins. 

  15. To prevent another device from grabbing the bus during interrupt acknowledgement, the LOCK pin is activated. The hardware for this LOCK signal toggles the internal lock latch on each of the interrupt ALE signals, so the lock is enabled on the first one and disabled on the second. The interrupt ack signal also prevents the address lines from being enabled during the interrupt ack. 

  16. An 8086/8088 system will typically use an external chip, the 8259A Programmable Interrupt Controller. This chip extends the interrupt functionality of the 8086 by providing 8 priority-based hardware interrupts.

    The Intel 8259A Programmable Interrupt Controller chip was designed to receive interrupt requests from multiple devices, prioritize interrupts, and direct the 8080 or 8086 processor accordingly. When used with the 8080, the interrupt controller chip will issue a CALL instruction to execute the specified interrupt handler. In particular, when the 8080 responds to an interrupt with INTA pulses, the interrupt controller chip will first put a CALL instruction on the bus, and then will respond to the next two INTA pulses with the address. For the 8086, the interrupt controller ignores the first INTA pulse and responds to the second INTA pulse with the 8-bit pointer. The point of this is that for both processors, the interrupt controller freezes its state on the first INTA and sends the interrupt-specific byte on the second INTA. Thus, even though the interrupt controller responds to the 8080 with an instruction and responds to the 8086 with an interrupt code, the underlying timing and logic are mostly the same. 

  17. The article Intel Microprocessors: 8008 to 8086 provides some history on interrupts in the 8008. Also see Intel 8008 Microprocessor Oral History Panel, pages 5-6. Most of the 8008's features were inherited from the Datapoint 2200 desktop computer, but the interrupts were not part of the Datapoint 2200. Instead, Intel decided to add interrupt functionality. 

Reverse-engineering an electromechanical Central Air Data Computer

Determining the airspeed and altitude of a fighter plane is harder than you'd expect. At slower speeds, pressure measurements can give the altitude, air speed, and other "air data". But as planes approach the speed of sound, complicated equations are needed to accurately compute these values. The Bendix Central Air Data Computer (CADC) solved this problem for military planes such as the F-101 and the F-111 fighters, and the B-58 bomber.1 This electromechanical marvel was crammed full of 1955 technology: gears, cams, synchros, and magnetic amplifiers. In this blog post I look inside the CADC, describe the calculations it performed, and explain how it performed these calculations mechanically.

The Bendix MG-1A Central Air Data Computer with the case removed, showing the complex mechanisms inside. Click this image (or any other) for a larger version.

The Bendix MG-1A Central Air Data Computer with the case removed, showing the complex mechanisms inside. Click this image (or any other) for a larger version.

This analog computer performs calculations using rotating shafts and gears, where the angle of rotation indicates a numeric value. Differential gears perform addition and subtraction, while cams implement functions. The CADC is electromechanical, with magnetic amplifiers providing feedback signals and three-phase synchros providing electrical outputs. It is said to contain 46 synchros, 511 gears, 820 ball bearings, and a total of 2,781 major parts. The photo below shows a closeup of the gears.

A closeup of the complex gears inside the CADC,

A closeup of the complex gears inside the CADC,

What it does

For over a century, aircraft have determined airspeed from air pressure. A port in the side of the plane provides the static air pressure,2 which is the air pressure outside the aircraft. A pitot tube points forward and receives the "total" air pressure, a higher pressure due to the speed of the airplane forcing air into the tube. (In the photo below, you can see the long pitot tube sticking out from the nose of a F-101.) The airspeed can be determined from the ratio of these two pressures, while the altitude can be determined from the static pressure.

The F-101 "Voodoo", USAF photo.

The F-101 "Voodoo", USAF photo.

But as you approach the speed of sound, the fluid dynamics of air change and the calculations become very complicated. With the development of supersonic fighter planes in the 1950s, simple mechanical instruments were no longer sufficient. Instead, an analog computer to calculate the "air data" (airspeed, altitude, and so forth) from the pressure measurements. One option would be for each subsystem (instruments, weapons control, engine control, etc) to compute the air data separately. However, it was more efficient to have one central system perform the computation and provide the data electrically to all the subsystems that need it. This system was called a Central Air Data Computer or CADC.

The Bendix CADC has two pneumatic inputs through tubes: the static pressure3 and the total pressure. It also receives the total temperature from a platinum temperature probe. From these, it computes many outputs: true air speed, Mach number, log static pressure, differential pressure, air density, air density × the speed of sound, total temperature, and log true free air temperature.

The CADC implemented a surprisingly complex set of functions derived from fluid dynamics equations describing the behavior of air at various speeds and conditions. First, the Mach number is computed from the ratio of total pressure to static pressure. Different equations are required for subsonic and supersonic flight. Although this equation looks difficult to solve mathematically, fundamentally M is a function of one variable ($P_t / P_s$), and this function is encoded in the shape of a cam. (You are not expected to understand the equations below. They are just to illustrate the complexity of what the CADC does.)

\[M<1:\] \[~~~\frac{P_t}{P_s} = ( 1+.2M^2)^{3.5}\]

\[M > 1:\]

\[~~~\frac{P_t}{P_s} = \frac{166.9215M^7}{( 7M^2-1)^{2.5}}\]

Next, the temperature is determined from the Mach number and the temperature indicated by a temperature probe.

\[T = \frac{T_{ti}}{1 + .2 M^2} \]

The indicated airspeed and other outputs are computed in turn, but I won't go through all the equations. Although these equations may seem ad hoc, they can be derived from fluid dynamics principles. These equations were standardized in the 1950s by various government organizations including the National Bureau of Standards and NACA (the precursor of NASA). While the equations are complicated, they can be computed with mechanical means.

How it is implemented

The Air Data Computer is an analog computer that determines various functions of the static pressure, total pressure and temperature. An analog computer was selected for this application because the inputs are analog and the outputs are analog, so it seemed simplest to keep the computations analog and avoid conversions. The computer performs its computations mechanically, using the rotation angle of shafts to indicate values. For the most part, values are represented logarithmically, which allows multiplication and division to be implemented by adding and subtracting rotations. A differential gear mechanism provides the underlying implementation of addition and subtraction. Specially-shaped cams provide the logarithmic and exponential conversions as necessary. Other cams implement various arbitrary functions.

The diagram below, from patent 2,969,210, shows some of the operations. At the left, transducers convert the pressure and temperature inputs from physical quantities into shaft rotations, applying a log function in the process. Subtracting the two pressures with a differential gear mechanism (X-in-circle symbol) produces the log of the pressure ratios. Cam "CCD 12" generates the Mach number from this log pressure ratio, still expressed as a shaft rotation. A synchro transmitter converts the shaft rotation into a three-phase electrical output from the CADC. The remainder of the diagram uses more cams and differentials to produce the other outputs. Next, I'll discuss how these steps are implemented.

A diagram showing how values are computed by the CADC. Source: Patent 2969910A">Patent 2969910.

A diagram showing how values are computed by the CADC. Source: Patent 2969910.

The pressure transducer

The CADC receives the static and total pressure through tubes connected to the front of the CADC. (At the lower right, one of these tubes is visible.) Inside the CADC, two pressure transducers convert the pressures into rotational signals. The pressure transducers are the black domed cylinders at the top of the CADC.

The pressure transducers are the two black domes at the top. The circuit boards next to each pressure transducer are the amplifiers. The yellowish transformer-like devices with three windings are the magnetic amplifiers.

The pressure transducers are the two black domes at the top. The circuit boards next to each pressure transducer are the amplifiers. The yellowish transformer-like devices with three windings are the magnetic amplifiers.

Each pressure transducer contains a pair of bellows that expand and contract as the applied pressure changes. They are connected to opposite sides of a shaft so they cause small rotations of the shaft.

Inside the pressure transducer. The two disc-shaped bellows are connected to opposite sides of a shaft so the shaft rotates as the bellows expand or contract.

Inside the pressure transducer. The two disc-shaped bellows are connected to opposite sides of a shaft so the shaft rotates as the bellows expand or contract.

The pressure transducer has a tricky job: it must measure tiny pressure changes, but it must also provide a rotational signal that has enough torque to rotate all the gears in the CADC. To accomplish this, the pressure transducer uses a servo loop. The bellows produce a small shaft motion that is detected by an inductive pickup. This signal is amplified and drives a motor with enough power to move all the gears. The motor is also geared to counteract the movement of the bellows. This creates a feedback loop so the motor's rotation tracks the air pressure, but provides much more force. A cam is used so the output corresponds to the log of the input pressure.

This diagram shows the structure of the transducer. From "Air Data Computer Mechanization."

This diagram shows the structure of the transducer. From "Air Data Computer Mechanization."

Each transducer signal is amplified by three circuit boards centered around a magnetic amplifier, a transformer-like amplifier circuit that was popular before high-power transistors came along. The photo below shows how the amplifier boards are packed next to the transducers. The boards are complex, filled with resistors, capacitors, germanium transistors, diodes, relays, and other components.

This end-on view of the CADC shows the pressure transducers, the black cylinders. Next to each pressure transducer is a complex amplifier consisting of multiple boards with transistors and other components. The magnetic amplifiers are the yellowish transformer-like components.

This end-on view of the CADC shows the pressure transducers, the black cylinders. Next to each pressure transducer is a complex amplifier consisting of multiple boards with transistors and other components. The magnetic amplifiers are the yellowish transformer-like components.

Temperature

The external temperature is an important input to the CADC since it affects the air density. A platinum temperature probe provides a resistance4 that varies with temperature. The resistance is converted to rotation by an electromechanical transducer mechanism. Like the pressure transducer, the temperature transducer uses a servo mechanism with an amplifier and feedback loop. For the temperature transducer, though, the feedback signal is generated by a resistance bridge using a potentiometer driven by the motor. By balancing the potentiometer's resistance with the platinum probe's resistance, a shaft rotation is produced that corresponds to the temperature. The cam is configured to produce the log of the temperature as output.

This diagram shows the structure of the temperature transducer. From "Air Data Computer Mechanization."

This diagram shows the structure of the temperature transducer. From "Air Data Computer Mechanization."

The temperature transducer section of the CADC is shown below. The feedback potentiometer is the red cylinder at the lower right. Above it is a metal-plate adjustment cam, which will be discussed below. The CADC is designed in a somewhat modular way, with the temperature section implemented as a removable wedge-shaped unit, the lower two-thirds of the photo. The temperature transducer, like the pressure transducer, has three boards of electronics to implement the feedback amplifier and drive the motor.

The temperature transducer section of the CADC.

The temperature transducer section of the CADC.

The differential

The differential gear assembly is a key component of the CADC's calculations, as it performs addition or subtraction of rotations: the rotation of the output shaft is the sum or difference of the input shafts, depending on the direction of rotation.5 When rotations are expressed logarithmically, addition and subtraction correspond to multiplication and division. This differential is constructed as a spur-gear differential. It has inputs at the top and bottom, while the body of the differential rotates to produce the sum. The two visible gears in the body mesh with the internal input gears, which are not visible. The output is driven by the body through a concentric shaft.

A closeup of a differential mechanism.

A closeup of a differential mechanism.

The cams

The CADC uses cams to implement various functions. Most importantly, cams perform logarithms and exponentials. Cams also implement more complex functions of one variable such as ${M}/{\sqrt{1 + .2 M^2}}$. The photo below shows a cam (I think exponential) with the follower arm in front. As the cam rotates, the follower moves in and out according to the cam's radius, providing the function value.

A cam inside the CADC implements a function.

A cam inside the CADC implements a function.

The cams are combined with a differential in a clever way to make the cam shape more practical, as shown below.6 The input (23) drives the cam (30) and the differential (37-41). The follower (32) tracks the cam and provides a second input (35) to the differential. The sum from the differential produces the output (26).

This diagram, from Patent 2969910, shows how the cam and follower are connected to a differential.

This diagram, from Patent 2969910, shows how the cam and follower are connected to a differential.

The warped plate cam

Some functions are implemented by warped metal plates acting as cams. This type of cam can be adjusted by turning the 20 setscrews to change the shape of the plate. A follower rides on the surface of the cam and provides an input to a differential underneath the plate. The differential adds the cam position to the input rotation, producing a modified rotation, as with the solid cam. The pressure transducer, for instance, uses a cam to generate the desired output function from the bellows deflection. By using a cam, the bellows can be designed for good performance without worrying about its deflection function.

A closeup of a warped-plate cam.

A closeup of a warped-plate cam.

The synchro outputs

Most of the outputs from the CADC are synchro signals.7 A synchro is an interesting device that can transmit a rotational position electrically over three wires. In appearance, a synchro is similar to an electric motor, but its internal construction is different, as shown below. In use, two synchros have their stator windings connected together, while the rotor windings are driven with AC. Rotating the shaft of one synchro causes the other to rotate to the same position. I have a video showing synchros in action here.

Cross-section diagram of a synchro showing the rotor and stators.

Cross-section diagram of a synchro showing the rotor and stators.

Internally, a synchro has a moving rotor winding and three fixed stator windings. When AC is applied to the rotor, voltages are developed on the stator windings depending on the position of the rotor. These voltages produce a torque that rotates the synchros to the same position. In other words, the rotor receives power (26 V, 400 Hz in this case), while the three stator wires transmit the position. The diagram below shows how a synchro is represented schematically, with rotor and stator coils.

The schematic symbol for a synchro.

The schematic symbol for a synchro.

Before digital systems, synchros were very popular for transmitting signals electrically through an aircraft. For instance, a synchro could transmit an altitude reading to a cockpit display or a targeting system. For the CADC, most of the outputs are synchro signals, which convert the rotational values of the CADC to electrical signals. The three stator windings from the synchro inside the CADC are wired to an external synchro that receives the rotation. For improved resolution, many of these outputs use two synchros: a coarse synchro and a fine synchro. The two synchros are typically geared in an 11:1 ratio, so the fine synchro rotates 11 times as fast as the coarse synchro. Over the output range, the coarse synchro may turn 180°, providing the approximate output, while the fine synchro spins multiple times to provide more accuracy.

The front of the CADC has multiple output synchros with anti-backlash springs.

The front of the CADC has multiple output synchros with anti-backlash springs.

The air data system

The CADC is one of several units in the system, as shown in the block diagram below.8 The outputs of the CADC go to another box called the Air Data Converter, which is the interface between the CADC and the aircraft systems that require the air data values: fire control, engine control, navigation system, cockpit display instruments, and so forth. The motivation for this separation is that different aircraft types have different requirements for signals: the CADC remains the same and only the converter needed to be customized. Some aircraft required "up to 43 outputs including potentiometers, synchros, digitizers, and switches."

This block diagram shows how the Air Data Computer integrates with sensors and other systems. The unlabeled box on the right is the converter. From MIL-C-25653C(USAF).

This block diagram shows how the Air Data Computer integrates with sensors and other systems. The unlabeled box on the right is the converter. From MIL-C-25653C(USAF).

The CADC was also connected to a cylindrical unit called the "Static pressure and angle of attack compensator." This unit compensates for errors in static pressure measurements due to the shape of the aircraft by producing the "position error correction". Since the compensation factor depended on the specific aircraft type, the compensation was computed outside the Central Air Data Computer, again keeping the CADC generic. This correction factor depends on the Mach number and angle of attack, and was implemented as a three-dimensional cam. The cam's shape (and thus the correction function) was determined empirically, rather than from fundamental equations.

The CADC was wired to other components through five electrical connectors as shown in the photo below.9 At the bottom are the pneumatic connections for static pressure and total pressure. At the upper right is a small elapsed time meter.

The front of the CADC has many mil-spec round connectors.

The front of the CADC has many mil-spec round connectors.

Conclusions

The Bendix MG-1A Central Air Data Computer is an amazingly complex piece of electromechanical hardware. It's hard to believe that this system of tiny gears was able to perform reliable computations in the hostile environment of a jet plane, subjected to jolts, accelerations, and vibrations. But it was the best way to solve the problem at the time,10 showing the ingenuity of the engineers who developed it.

The CADC inside its case. From the outside, its mechanical marvels are hidden.

The CADC inside its case. From the outside, its mechanical marvels are hidden.

I plan to continue reverse-engineering the Bendix CADC and hope to get it operational,11 so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff. Until then, you can check out CuriousMarc's video below to see more of the CADC. Thanks to Joe for providing the CADC. Thanks to Nancy Chen for obtaining a hard-to-find document for me.

Notes and references

  1. I haven't found a definitive list of which planes used this CADC. Based on various sources, I believe it was used in the F-86, F-101, F-104, F-105, F-106, and F-111, and the B-58 bomber. 

  2. The static air pressure can also be provided by holes in the side of the pitot tube. I couldn't find information indicating exactly how these planes received static pressure. 

  3. The CADC also has an input for the "position error correction". This provides a correction factor because the measured static pressure may not exactly match the real static pressure. The problem is that the static pressure is measured from a port on the aircraft. Distortions in the airflow may cause errors in this measurement. A separate box, the "compensator", determines the correction factor based on the angle of attack. 

  4. The platinum temperature probe is type MA-1, defined by specification MIL-P-25726. It apparently has a resistance of 50 Ω at 0 °C. 

  5. Strictly speaking, the output of the differential is the sum of the inputs divided by two. I'm ignoring the factor of 2 because the gear ratios can easily cancel it out. 

  6. Cams are extensively used in the CADC to implement functions of one variable, including exponentiation and logarithms. The straightforward way to use a cam is to read the value of the function off the cam directly, with the radius of the cam at each angle representing the value. This approach encounters a problem when the cam wraps around, since the cam's profile will suddenly jump from one value to another. This poses a problem for the cam follower, which may get stuck on this part of the cam unless there is a smooth transition zone. Another problem is that the cam may have a large range between the minimum and maximum outputs. (Consider an exponential output, for instance.) Scaling the cam to a reasonable size will lose accuracy in the small values. The cam will also have a steep slope for the large values, making it harder to track the profile.

    The solution is to record the difference between the input and the output in the cam. A differential then adds the input value to the cam value to produce the desired value. The clever part is that by scaling the input so it matches the output at the start and end of the range, the difference function drops to zero at both ends. Thus, the cam profile matches when the angle wraps around, avoiding the sudden transition. Moreover, the difference between the input and the output is much smaller than the raw output, so the cam values can be more accurate. (This only works because the output functions are increasing functions; this approach wouldn't work for a sine function, for instance.)

    This diagram, from Patent 2969910, shows how a cam implements a complex function.

    This diagram, from Patent 2969910, shows how a cam implements a complex function.

    The diagram above shows how this works in practice. The input is \(log~ dP/P_s\) and the output is \(log~M / \sqrt{1+.2KM^2}\). (This is a function of Mach number used for the temperature computation; K is 1.) The small humped curve at the bottom is the cam correction. Although the input and output functions cover a wide range, the difference that is encoded in the cam is much smaller and drops to zero at both ends. 

  7. The US Navy made heavy use of synchros for transmitting signals throughout ships. The synchro diagrams are from two US Navy publications: US Navy Synchros (1944) and Principles of Synchros, Servos, and Gyros (2012). These are good documents if you want to learn more about synchros. The diagram below shows how synchros could be used on a ship.

    A Navy diagram illustrating synchros controlling a gun on a battleship.

    A Navy diagram illustrating synchros controlling a gun on a battleship.

     

  8. To summarize the symbols, the outputs are: log TFAT: true free air temperature (the ambient temperature without friction and compression); log Ps: static pressure; M: Mach number; Qc: differential pressure; ρ: air density; ρa: air density times the speed of sound; Vt: true airspeed. Tt: total temperature (higher due to compression of the air). Inputs are: TT: total temperature (higher due to compression of the air). Pti: indicated total pressure (higher due to velocity); Psi: indicated static pressure; log Psi/Ps: the position error correction from the compensator. The compensator uses input αi: angle of attack; and produces αT: true angle of attack; aT: speed of sound. 

  9. The electrical connectors on the CADC have the following functions: J614: outputs to the converter, J601: outputs to the converter, J603: AC power (115 V, 400 Hz), J602: to/from the compensator, and J604: input from the temperature probe. 

  10. An interesting manual way to calculate air data was with a circular slide rule, designed for navigation and air data calculation. It gave answers for various combinations of pressure, temperature, Mach number, true airspeed, and so forth. See the MB-2A Air Navigation Computer instructions for details. Also see patent 2528518. I'll also point out that from the late 1800s through the 1940s and on, the term "computer" was used for any sort of device that computed a value, from an adding machine to a slide rule (or even a person). The meaning is very different from the modern usage of "computer". 

  11. It was very difficult to find information about the CADC. The official military specification is MIL-C-25653C(USAF). After searching everywhere, I was finally able to get a copy from the Technical Reports & Standards unit of the Library of Congress. The other useful document was in an obscure conference proceedings from 1958: "Air Data Computer Mechanization" (Hazen), Symposium on the USAF Flight Control Data Integration Program, Wright Air Dev Center US Air Force, Feb 3-4, 1958, pp 171-194. 

Silicon reverse-engineering: the Intel 8086 processor's flag circuitry

Status flags are a key part of most processors, indicating if an arithmetic result is negative, zero, or has a carry, for instance. In this post, I take a close look at the flag circuitry in the Intel 8086 processor (1978), the chip that launched the PC revolution.1 Looking at the silicon die of the 8086 reveals how its flags are implemented. The 8086's flag circuitry is surprisingly complicated, full of corner cases and special handling. Moreover, I found an undocumented zero register that is used by the microcode.

The die photo below shows the 8086 microprocessor under a microscope. The metal layer on top of the chip is visible, with the silicon and polysilicon mostly hidden underneath. Around the edges of the die, bond wires connect pads to the chip's 40 external pins. I've labeled the key functional blocks; the ones that are important to this discussion are darker and will be discussed in detail below. The Arithmetic/Logic Unit (ALU, lower left) is split in two. The circuitry for the flags is in the middle, giving it access to the ALU's results for the low byte and the high byte. I've marked each flag latch in red in the diagram below. They appear to be randomly scattered, but there are reasons for this layout.

The 8086 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath. Click on this image (or any other) for a larger version.

The 8086 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath. Click on this image (or any other) for a larger version.

Flags and arithmetic operations

The 8086 supports three types of arithmetic: unsigned arithmetic, signed arithmetic, and BCD (Binary-Coded Decimal) and this is a key to understanding the flags. Unsigned arithmetic uses standard binary values: a byte holds an integer value from 0 to 255, while a 16-bit word holds a value from 0 to 65535. When adding, a carry indicates that the result is too big to fit in a byte or word. (I'll use byte operations to keep the examples small; operations on words are similar.) For instance, suppose you add hex 0x60 + 0x30. The result, 0x90, fits in a byte so there is no carry. But adding 0x90 + 0x90 yields 0x120. This result doesn't fit in a byte, so the result is 0x20 with the carry flag set to 1. The carry allows additions to be chained together, like doing long decimal addition on paper. For subtraction, the carry bit indicates a borrow.

The second type of arithmetic is 2's complement, which supports negative numbers. In a signed byte, 0x00 to 0x7f represent 0 to 127, while 0x80 to 0xff represent -128 to -1. If the top bit of a signed value is set, the value is negative; this is what the sign flag indicates. The clever thing about 2's complement arithmetic is that the same instructions are used for unsigned arithmetic and 2's complement arithmetic. The only thing that changes is the interpretation. As an example of signed arithmetic, 0xff + 0x05 = 0x04 corresponds to -1 + 5 = 4. Signed arithmetic can result in overflow, though. For example, suppose you add 112 + 112: 0x70 + 0x70 = 0xe0. Although that is fine in unsigned arithmetic, in signed arithmetic that result is unexpectedly -32. The problem is that the result doesn't fit in a single signed byte. In this case, the overflow flag is set to indicate that the result overflowed. In other words, the carry flag indicates that an unsigned result doesn't fit in a byte or word, while the overflow flag indicates that a signed result doesn't fit.

The third type of arithmetic is BCD (Binary-Coded Decimal), which stores a decimal digit as a 4-bit binary value. Thus, two digits can be packed into a byte. For instance, adding 12 + 34 = 46 corresponds to 0x12 + 0x34 = 0x46 with BCD. After adding or subtracting BCD values, a special instruction is needed to perform any necessary adjustment.2 This instruction needs to know if there was a carry from the lower digit to the upper digit, i.e. a carry from bit 4 to bit 3. Many systems call this a half-carry, since it is the carry out of a half-byte, but Intel calls it the auxiliary carry flag.

The diagram below summarizes the 8086's flags. The overflow, sign, auxiliary carry, and carry flags were discussed above. The zero flag simply indicates that the result of an operation was zero. The parity flag counts the number of 1 bits in a result byte and the flag is set if the number of 1 bits is even. At the left are the three control flags. The trap flag turns on single-stepping mode. The direction flag controls the direction of string operations. Finally, the interrupt flag enables interrupts.

The control and status flags in the 8086. Diagram from iAPX 86/88 Users Manual fig 2.9.

The control and status flags in the 8086. Diagram from iAPX 86/88 Users Manual fig 2.9.

The status flags are often used with the CMP (Compare) instruction, which performs a subtraction without storing the result. Although this may seem pointless, the status flags show the relationship between the values. For instance, the zero flag will be set if the two values are equal. Other flag combinations indicate "less than", "greater than", and other useful conditions. Loops and if statements use conditional jump instructions that test these flags. (I wrote more about 8086 conditional jumps here.)

Microcode and flags

Most people think of machine instructions as the basic steps that a computer performs. However, many processors (including the 8086) have another layer of software underneath: microcode. Instead of building the processor's control logic out of flip-flops and gates, microcode replaces much of the control logic with code. To execute a machine instruction, the computer internally executes several simpler micro-instructions, specified by the microcode. The main advantage of microcode is that it turns the design of control circuitry into a programming task instead of a difficult logic design task.

An 8086 micro-instruction is encoded into 21 bits as shown below. Every micro-instruction contains a move from a source register to a destination register, each specified with 5 bits. The meaning of the remaining bits depends on the type field, which is two or three bits long. For the current discussion, the most relevant part of the microcode is the Flag bit F at the end, which indicates that the micro-instruction will update the flags.3

The encoding of a micro-instruction into 21 bits. Based on NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright?

The encoding of a micro-instruction into 21 bits. Based on NEC v. Intel: Will Hardware Be Drawn into the Black Hole of Copyright?

As an example, the microcode below implements the INC (increment) and DEC (decrement) instructions. The first micro-instruction moves a word from the register specified by the instruction (indicated by M) to the ALU's temporary B register. It sets up the ALU to perform the operation specified by the instruction (indicated by XI), and indicates that the next micro-instruction (NX) is the last for this machine instruction. The second micro-instruction moves the ALU result (Σ) to the specified register (M), tells the system to run the next instruction RNI, and causes the flags (F) to be updated from the ALU result. Thus, the flags are updated with the results of the increment or decrement.

   move       action
1 M→tmpb   XI    tmpb, NX
2 Σ→M      RNI   F

This microcode is rather generic: it doesn't explicitly specify the register or the ALU operation. Instead, the gate logic determines them from the machine instruction. This illustrates the 8086's hybrid approach: although the 8086 uses microcode, the microcode is parameterized and much of the instruction functionality is implemented with gate logic. When the microcode specifies a generic Arithmetic/Logic Unit (ALU) operation, the gate logic determines from the instruction which ALU (Arithmetic/Logic Unit) operation to perform (in this case, increment or decrement). The gate logic also determines from the instruction bits which register to modify. Finally, the microcode says to update the flags, but the ALU determines how to update the flags. This hybrid approach kept the microcode small enough for 1978 technology; the microcode above supports 16 different increment and decrement instructions.

Microcode can also read or write the flags as a whole, treating the flags as a register. The first micro-instruction below stores the flags to memory (via the OPerand Register), while the second micro-instruction below loads the flags from memory. The first micro-instruction is part of the microcode for PUSHF (push flags to the stack) and interrupt handling. The second micro-instruction is used for POPF (pop flags from the stack), the interrupt return code, and the reset code. Similar micro-instructions are used for LAHF (Load AH from Flags) and SAHF (Store AH to Flags).

  F→OPR
  OPR→F

Microcode can also modify some flags directly with the micro-operations CCOF (Clear Carry and Overflow Flags), SCOF (Set Carry and Overflow Flags), and CITF (Clear Interrupt and Trap Flags). The first two are used in the microcode for multiplication and division, while the third is used in the interrupt handler.

Finally, some machine instructions are implemented directly in logic and do not use microcode at all. The CMC (Complement Carry), CLC (Clear Carry), STC (Set Carry), CLI (Clear Interrupt), STI (Set Interrupt), CLD (Clear Direction), and STD (Set Direction) instructions modify the flags directly without running any microcode. (During instruction decoding, the Group Decode ROM indicates that these instructions are implemented with logic, not microcode.)

The latch circuit that stores flags

Each flag is stored in a latch circuit that holds the flag's value until it is updated. A typical flag latch has two inputs for updates: the flag value generated by the ALU, and a value from the bus when storing to all the flags. The latch also has a "hold" input to keep the existing value. (Some flags, such as carry, have more inputs, as will be described below.) A multiplexer (built from pass transistors) selects one of the inputs for the latch.

A typical latch to hold a flag. The latch is constructed from NMOS transistors and inverters. A "1" input turns on a transistor, letting its input pass through it.

A typical latch to hold a flag. The latch is constructed from NMOS transistors and inverters. A "1" input turns on a transistor, letting its input pass through it.

The latch is based on pass transistors and two inverters forming a loop. To see how it works, suppose select 1 is high. This turns on the transistor letting the in 1 value flow through the transistor and the first inverter. When clk' is high, the signal will flow through the second inverter and the output. While hold is high, the output is fed back from the output to the input, causing the latch to "remember" its value. The latch is controlled by the CPU's clock and it will only update the output when clk' is high. While clk' is low, the output will remain unchanged; the capacitance of the wire is enough to provide an input to the second inverter, a bit like dynamic RAM.4

The diagram below shows how one of these latches looks on the die. The pinkish regions are doped silicon, while the brownish lines are polysilicon. A transistor gate is formed where polysilicon crosses over doped silicon. Each inverter consists of two transistors. The signal flows through the latch in roughly a counterclockwise circle, starting with one of the inputs on the right.

The latch for the Sign Flag. The metal layer was removed for this image.

The latch for the Sign Flag. The metal layer was removed for this image.

Implementation of the flags

In this section, I'll discuss each flag in detail. But first, I'll explain the circuitry common to all the flags. As explained above, microcode can treat the flags as a register, reading or writing all the flags in parallel. When the microcode specifies flags as the destination for a move, a signal is generated that I call flags-load. This signal enables the multiplexer inputs (described above) that connect the ALU bus to the flag latches, loading the bits into the latches. Conversely, when microcode specifies the flags as the source for a move, a signal is generated that I call flags-read.5 This signal connects the outputs of the flag latches to the ALU bus through pass transistors, loading the value of the flags onto the bus.

Sign flag

The sign flag is pretty simple: it stores the top bit of the ALU result, indicating a negative result. For a byte operation, this is bit 7 and for a word operation, bit 15, so some logic selects the right bit based on the instruction. (This is another example of how logic circuitry looks after the details that microcode ignores.) The output from the sign flag goes to the condition evaluation circuitry to support conditional jumps, as do the other arithmetic flags. I wrote about that recently, so I won't go into details here.

The six arithmetic status flags are updated by arithmetic operations when the microcode F bit is set. This bit generates a signal that I call arith-flag-load, indicating that the flags should be updated based on the ALU result. This signal enables the multiplexer inputs between the ALU circuitry and the flag latches. There is an inconvenient special case: rotate instructions only update the overflow and carry flags for compatibility with the 8080 processor.6 To support this, a rotate instruction blocks the arith-flag-load signal for the sign, parity, zero, and auxiliary carry flags. Again, this is handled by gates, rather than microcode.

Zero flag

The zero flag is also straightforward. It indicates that the result byte or word is all zeros, for a byte or word operation respectively. An 8-input NOR gate at the top of the flags circuitry determines if the lower byte is all zeros, while an 8-input NOR gate at the bottom of the flags circuitry tests the upper byte. These NOR gates are spread out and span the width of the ALU, essentially a wire that is pulled low by any result bits that are high. The zero flag is set based on the low byte or the whole word, for a byte instruction or word instruction respectively.

There is a second zero flag, hidden from the programmer. This zero flag always tests the full 16-bit result, so I'll call it Z16. The other key difference is that the Z16 flag is updated on every ALU micro-operation, rather than under the control of the F bit. Thus, the Z16 flag can be updated without interfering with the programmer-visible zero flag. This makes it useful for internal microcode operations, such as loops.

Parity flag

The parity flag is conceptually simple, but it is fairly expensive to implement in hardware as it requires exclusive-oring the eight bits of the result byte together. This is implemented with seven XOR circuits.7 Since each XOR circuit is implemented with two logic gates, the raw parity calculation requires 14 gates. Only 8-bit parity is supported, even if a word operation is performed.8

The schematic below shows how an XOR circuit is implemented. It uses two gates; due to the properties of NMOS transistors, the AND-NOR gate is implemented as a single gate. To see how it works, suppose A and B are 0. The first NOR gate will output 1, forcing the output to 0. If A and B are both 1, the AND gate will force the output to 0. Otherwise the output is 1, providing the XOR function. The key point is that XOR is fairly costly compared to other logic functions.

Schematic of an XOR circuit.

Schematic of an XOR circuit.

Auxiliary carry flag

The auxiliary carry starts off simple, but is complicated by the decimal adjust instructions. In most cases, the auxiliary carry is carry-out from bit 3 of the ALU (i.e. the half-carry). For subtraction, the flag must be inverted to indicate a borrow, so the half-carry is exclusive-or'd with a subtraction signal.

However, the decimal adjust instructions (DAA, DAS, AAA, AAS) use the auxiliary carry and also modify the auxiliary carry when performing a decimal adjust. After an addition or subtraction, the decimal adjust instructions produce a correction value if necessary. If the lower digit is more than 9 or the auxiliary carry is set, the value 6 is added (or subtracted) from the accumulator.9 The DAA and AAA instructions also test if a correction of 0x60 is needed for the upper digit. The correction signals are wired to the ALU bus to generate the correction factor of 0x06, 0x60, or 0x66 for an adjustment ALU operation. The correction signal for the low digit is stored as the auxiliary carry flag.

Carry flag

The carry flag is surprisingly complex, with five inputs to the carry flag input multiplexer. The first input is the carry value for an ALU operation:10 the top bit of the ALU result (bit 7 or 15 for a byte or word operation). However, for a subtraction the carry is inverted to form the borrow. But for a DAA or DAS decimal adjust operation, the carry comes from the high-digit correction signal. And for an AAA or AAS ASCII adjust operation, the carry comes from the low-digit correction signal. These cases are determined with logic gates and fed into a single multiplexer input.

Another multiplexer input supports the CMC (Complement Carry) instruction by feeding in the current flag value but inverted. The STC and CLC (Set Carry and Clear Carry) instructions are implemented by feeding the low bit of the instruction into a different multiplexer input. This input also supports the micro-instructions SCOF (Set Carry, Overflow Flags), CCOF (Clear Carry, Overflow Flags), and RCY (Reset Carry).

The rotate and shift instructions have complex interactions with the carry flag, since bits are shifted in and out of the carry flag. For a shift or rotate, a separate multiplexer input provides the bit for the carry flag latch. For a right shift or rotate, the lowest bit of the ALU argument is fed into the carry flag. For a left shift or rotate, the carry out of bit 15 or bit 7 is fed into the carry flag; this was the highest bit for a word or byte operation respectively.

The output from the carry flag is fed into the ALU's carry-in for the ADC (Add with Carry), SBB (Subtract with Borrow), and RCL (Rotate through Carry, Left) instructions; the carry is inverted for SBB to form the borrow. For an RCR (Rotate through Carry, Right), the carry is fed into the ALU's output bit 7 or 15 (for a byte or word operation respectively).

Overflow flag

The circuitry for the overflow flag is fairly complicated, as there are multiple cases. For an arithmetic operation, the overflow flag indicates a signed overflow. The overflow is computed as the exclusive-or of the carry-in to the top bit and the carry-out from the top bit, selected for a byte or word operation. (I explained the mathematics behind this earlier.)

For a shift or rotate, however, the overflow flag indicates that the shifted value changed sign. The ALU implements left shifts and rotates by passing bits as carries so the old sign bit is the carry-out from the top bit, while the new sign bit is the carry-in to the top bit. Thus, the standard arithmetic overflow circuit also handles left shifts and rotates. On the other hand, for a shift or rotate right, the top two bits of the result are exclusive-or'd together to see if they are different: bits 6 and 7 for a byte shift and bits 14 and 15 for a word shift. (The second-from-the-top bit was the sign bit before the shift.)

Finally, two micro-instructions affect the flag: CCOF (Clear Carry and Overflow Flags) and SCOF (Set Carry and Overflow Flags). All these different sources for the overflow flag are combined in logic gates, rather than a complex multiplexer like the carry flag used.

Direction flag

The three remaining flags are "control" flags: rather than storing the status of an ALU operation, these flags control the CPU's behavior. The direction flag controls the direction of string operations that scan through memory: auto-incrementing or auto-decrementing. This is implemented by feeding the direction flag into the Constant ROM to determine the increment value applied to the SI and DI registers. The direction flag is set or cleared through the STD and CLD instructions (Set Direction and Clear Direction). For these instructions, the low bit of the instruction is passed into the flag to set or clear it as appropriate.

Interrupt flag

The output from the interrupt flag goes to the interrupt handling circuitry to enable or disable interrupts. This flag is set or cleared by a programmer through the STI and CLI instructions. For the STI and CLI instructions, the low bit of the instruction is passed into the flag to set or clear it as appropriate. Microcode can clear the interrupt flag and the trap flag (discussed below) with the CITF (Clear Interrupt and Trap Flag) micro-instruction. This is used in the interrupt handler to disable subsequent interrupts and traps. The CITF micro-instruction is implemented with a separate input to the flag latch.

Trap flag

The trap flag turns on single-stepping for debugging. With the trap flag on, every instruction generates an interrupt. This flag doesn't have machine instructions to modify it directly. Instead, the programmer must mess around with the PUSHF and POPF instructions to put all the flags on the stack and modify the flag bit there (details). Like the interrupt flag, the trap flag has an input to clear it if the CITF micro-instruction is active.

Layout of the flag circuitry

The diagram below shows the circuitry for the flags on the die, with the approximate location of each flag indicated. ALU bits 7 through 0 are above this circuitry and ALU bits 15 through 8 are below. The zero gates stretch the length of the ALU at the top and bottom, while the parity gates are near the low byte of the ALU. The flag circuitry appears highly irregular on the die because each flag has different circuitry. However, the circuitry for a flag is generally near the appropriate bit that receives the flag, so the layout is not as arbitrary as it may seem. For instance, the sign flag is affected by bit 7 or 15 of the ALU result and is loaded or stored to bit 7, so it is at the left. The trap and interrupt flags are outside the ALU, to the right of this image.

Closeup of the circuitry on the die that implements the flags. The metal layer has been removed to show the polysilicon and silicon underneath.

Closeup of the circuitry on the die that implements the flags. The metal layer has been removed to show the polysilicon and silicon underneath.

The history behind the 8086 flags11

The Datapoint 2200 (1970) is a desktop computer that was sold as a "programmable terminal". Although mostly forgotten now, the Datapoint 2200 is one of the most influential computers ever, as it led to the 8086 processor and thus the modern x86 architecture. For flags, the Datapoint 2200 had four "control flip-flops": carry/borrow,12 zero, sign, and parity. These were not bits in a register and could not be accessed directly. Instead, conditional jumps, subroutine calls, or subroutine returns could be performed based on the status of one of these flip-flops. Because the Datapoint 2200 was used as a terminal, and terminal protocols often used parity, implementing parity in hardware was a useful feature.

But how did the Datapoint 2200 lead to the 8086? The Datapoint 2200 was created before the microprocessor, so its processor was a large board of TTL chips. Datapoint asked Intel and Texas Instruments if they could replace this TTL processor with a single chip. Texas Instruments created the TMX 1795, the first 8-bit microprocessor. Intel created the 8008 shortly after. Both chips copied the instruction set and architecture of the 2200. Datapoint didn't like either chip and stuck with TTL. Texas Instruments couldn't find a customer for the TMX 1795 and abandoned it. Intel, on the other hand, marketed the 8008 as a general-purpose microprocessor, essentially creating the microprocessor industry. Since the 8008 copied the Datapoint 2200, it kept the four status flip-flops.

In 1974, Intel created the 8080 microprocessor, an improvement of the 8008. The 8080 kept the flags from the 8008 and added the auxiliary carry. Moreover, the flags could be accessed as a byte, making the flags appear like a register. The 8080 defined specific values for the unused flag bits. These decisions have persisted into the modern x86 architecture.13

Structure of the 8080 flags when saved on the stack. From 8080 Assembly Language Programming Manual.

Structure of the 8080 flags when saved on the stack. From 8080 Assembly Language Programming Manual.

The 8086 was designed to be backward compatible with the 8080, at least at the assembly language level.14 To support this, the 8086 kept the 8080's flag byte unchanged, putting additional flags in the high byte, as shown below. Thus, the selection, layout, and behavior of the 8086 flags (and thus x86) are largely historical accidents going back to the 8080, 8008, and Datapoint 2200 processors.

Arrangement of the 8086 flags in the word. The shaded flags match the 8080/8085 flags. Diagram from iAPX 86/88 Users Manual fig 2.10.

Arrangement of the 8086 flags in the word. The shaded flags match the 8080/8085 flags. Diagram from iAPX 86/88 Users Manual fig 2.10.

Conclusions

You might expect flags to be a simple part of a CPU, but the 8086's flags are surprisingly complex. About 1/3 of the ALU is devoted to flag computation and storage. Each flag is implemented with completely different circuitry. The 8086 is a CISC processor (Complex Instruction Set Computer), where the instruction set is designed to be powerful and to minimize the gap between machine language and high-level languages.15 This can be seen in the implementation of the flags, which are full of special cases to increase their utility with different instructions.16 In contrast, a RISC (Reduced Instruction Set Computer) simplifies the instruction set to make each instruction faster. This philosophy also affects the flags: for example, the ARM-1 processor (1985) has four arithmetic flags compared to the 8086's six flags. The behavior of the ARM flags is simpler, and the ARM doesn't deal with byte versus word operations. It also doesn't have instructions like decimal adjust that have complex flag behavior. This simplicity is reflected in the simpler and more regular circuitry of the ARM-1 flags, which I reverse-engineered here.

I've written multiple posts on the 8086 so far and plan to continue reverse-engineering the 8086 die so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff.

Notes and references

  1. Strictly speaking, the Intel 8088 launched the PC revolution as it was the processor in the first IBM PC. But internally the 8086 and 8088 are almost identical, so everything in this post applies to the 8088 as well. (The 8088 has an 8-bit bus compared to the 8086's 16-bit bus. As a result, the bus interface circuitry is different. The 8088 has a 4-byte prefetch queue compared to the 8086's 6-byte prefetch queue. And there are a few microcode changes. Apart from these changes, the dies are essentially identical.) 

  2. Since BCD arithmetic is performed using the binary addition and subtraction instructions, an adjustment may be required. For instance, consider adding 19 + 18 = 37 using BCD: 0x19 + 0x18 = 0x31 rather than the desired 0x37. Adding an adjustment factor of 6 yields the desired answer, taking into account the carry from the low digit. The BCD adjustment instructions are DAA (Decimal Adjust after Addition), DAS (Decimal Adjust after Subtraction), AAA (ASCII Adjust after Addition), and AAS (ASCII Adjust after Subtraction). (I wrote about the DAA instruction in detail here.) 

  3. Unlike other arithmetic and logic instructions, the NOT instruction does not change any of the flags. The designer of the 8086 states that this was an oversight. (See page 98 in "The 8086/8088 Primer".) Looking at the microcode shows that the microcode F bit was omitted in the implementation of NOT. I think that this "goof" also prevented the NOT and NEG microcode from being merged, wasting four micro-instructions. 

  4. Most of the latches in the 8086 have two pass transistors: one driven by clk and one driven by clk'. This makes the circuit function like an edge-triggered flip-flop, only transitioning on the edge of the clock. The flag latches, on the other hand, gate the multiplexer input controls so they are only active when clk is high. Thus, the two inverters are connected alternately during clk and clk'

  5. The connection from flag outputs to the ALU bus is more complex than simple pass transistors. For performance reasons, the ALU bus is charged high during the clock' phase of the clock. Then, any bits that should be 0 are pulled low during the high clock phase. (The motivation is that NMOS transistors can pull a line low faster than they can pull it high.) To support this, each inverted flag output drives a transistor connected to ground, and the output from this transistor is connected to the ALU bus through a pass transistor. 

  6. The 8080 processor has four rotate instructions, while the 8086 adds three shift instructions. The new shift instructions update the arithmetic flags according to the result. However, the 8080's rotate instructions only updated the carry flag, leaving the other flags unchanged. For backward compatibility, the 8086 preserves this behavior for the rotate instructions, not modifying the other flags inherited from the 8080. Since the 8086's overflow flag didn't exist in the 8080, the 8086 can update the overflow flag for rotate instructions without breaking compatibility, even though it's not obvious what "overflow" means in the case of a rotate. (The 8080's behavior of only updating the carry flag for shifts dates back to the Datapoint 2200.)

    Curiously, The 8086 Family User's Manual shows SHR and SAL/SHL as updating just the overflow and carry flags (pages 2-265 and 2-66), contradicting the text (page 2-39). 

  7. The 8086 implements the parity computation by XORing pairs of bits. The pairs are then combined in sequence: (((bit0⊕bit1)⊕(bit2⊕bit3))⊕(bit4⊕bit5))⊕(bit6⊕bit7). Combining the terms in a tree-like arrangement would have saved gate delays, but apparently wasn't necessary. 

  8. The parity flag only examines the low byte of the result, even for a 16-bit operation, making it unusual compared to the other flags. The motivation is probably that the parity flag was only supported for backward compatibility and not considered particularly useful. Even in modern 64-bit Intel processors, the parity flag only examines the least-significant byte. 

  9. The decimal adjust circuitry uses a gate circuit to test if the lower digit is greater than nine. Specifically, it uses the expression: bit3•(bit2+bit1). In other words, if the ALU input has 8 and either 4 or 2 or both.

    The logic to determine if the upper digit needs a correction is more complex: carry+bit7•(bit6+bit5+bit4•af9), where af9 indicates that AF is not set and the lower digit is more than 9. This tests if the upper digit is greater than nine, but also handles the case where the upper digit is 9 and adjusting the lower digit will increase it.

    The DAA instruction on the 8086 has slightly different behavior from the DAA instruction on x86 in a few cases. For example, 0x9a + 0x02 = 0x9c; DAA converts this to 0xa2 on the 8086, but 0x02 on x86. Since 0x9a is not a valid BCD value, this is technically an undefined case, but it is interesting that there is a difference. Perhaps this behavior was inherited from the 8080; if anyone has an 8080 available, perhaps they can test this case. (I wrote about the x86 DAA behavior in detail here.) 

  10. One special case is that the increment and decrement instructions affect all the arithmetic flags except for carry. This is implemented by blocking the carry-flag update for an increment or decrement instruction. The motivation is to allow a loop counter to be updated without disturbing the carry flag. This behavior was first implemented in the 8008 processor. 

  11. The book "Computer Architecture", Blaauw and Brooks, contains a detailed discussion of different approaches for condition flags, pages 353-358. Some processors, such as the IBM 704 (1954), don't explicitly store flags, but test and branch in a single instruction. Storing conditions as 1-bit values (as in the 8086) is called an "indicator". An alternative is the "condition code", which encodes mutually-exclusive condition values into a smaller number of bits, as in System/360 (1964). For example, addition stores four conditions (zero, negative, positive, or overflow) encoded into two bits, rather than separate zero, sign, and overflow flags. Other alternatives are where to store the conditions: in "working store" (i.e. a regular register), in memory, in a unique indicator (i.e. a flags register), or in a shared condition register (e.g. System/360). The point is that while the typical microprocessor approach of storing flags in a flag register may seem natural, many alternatives have been tried in different systems. 

  12. For subtraction, a borrow flag can be defined in different ways. The Datapoint 2200 and descendants store the borrow bit in the carry flag. This approach was also used by the 6800 and 68000 processors. The alternative is to store the complement of the borrow bit in the carry flag, since this maps more naturally onto twos-complement arithmetic. This approach was used by the IBM System/360 mainframe and the 6502 and ARM processors. 

  13. The positions of the 8080's flags in the byte are not arbitrary but have some logic. When performing multi-byte additions, the carry flag gets added into the low bit of the next byte, so it makes sense to put the carry flag in bit 0. Likewise, the auxiliary carry flag is in bit 4, since that is the bit it is added into. The sign bit is bit 7 of the result, so it makes sense to put the sign bit in bit 7 of the flags. As for the zero and parity flags, and the values of the unused flag bits, I don't have an explanation for those. 

  14. The 8086 was designed to provide an upgrade path from the 8080, so it inherited many instructions and architectural features along with the change from 8 bits to 16 bits. The two processors were not binary compatible or even directly compatible at the assembly code level. Instead, assembly code for the 8080 could be converted to 8086 assembly via a program called CONV-86, which would usually require manual cleanup afterward. Many of the early programs for the 8086 were conversions of 8080 programs. 

  15. The terms RISC and CISC are vague, and there are many different definitions. I'm not looking to debate definitions. 

  16. The motivation behind how 8086 instructions affect the flags is given in The 8086/8088 Primer, by Stephen Morse, the creator of the 8086 instruction set. It turns out that there are good reasons for the flags to have special-case behavior for various instructions.