Reverse-engineering the surprisingly advanced ALU of the 8008 microprocessor

A computer's arithmetic-logic unit (ALU) is the heart of the processor, performing arithmetic and logic operations on data. If you've studied digital logic, you've probably learned how to combine simple binary adder circuits to build an ALU. However, the 8008's ALU uses clever logic circuits that can perform multiple operations efficiently. And unlike most 1970's microprocessors, the 8008 uses a complex carry-lookahead circuit to increase its performance.

The 8008 was Intel's first 8-bit microprocessor, introduced 45 years ago.1 While primitive by today's standards, the 8008 is historically important because it essentially started the microprocessor revolution and is the ancestor of the x86 processor family that you are probably using right now.2 I recently took some die photos of the 8008, which I described earlier. In this article, I reverse-engineer the 8008's ALU circuits from these die photos and explain how the ALU functions.

Inside the 8008 chip

The image below shows the 8008's tiny silicon die, highly magnified. Around the outside of the die, you can see the 18 wires connecting the die to the chip's external pins. The rest of the chip contains the chip's circuitry, built from about 3500 tiny transistors (yellow) connected by a metal wiring layer (white).

Die photo of the 8008 microprocessor, showing important functional blocks.

Die photo of the 8008 microprocessor, showing important functional blocks.

Many parts of the chip work together to perform an arithmetic operation. First, two values are copied from the registers (on the right side of the chip) to the ALU's temporary registers (left side of the chip) via the 8-bit data bus. The ALU computes the result, which is stored back into the accumulator register via the data bus. (Note that the data bus splits and goes around both sides of the ALU to simplify routing.) The carry lookahead circuit generates the carry bits for the sum in parallel for higher performance.3 This is all controlled by the instruction decode logic in the center of the chip that examines each machine instruction and generates signals that control the ALU (and other parts of the chip).

The Arithmetic-Logic Unit

The 8008's ALU implements four functions: Sum, AND, XOR and OR. The Sum operation adds two 8-bit numbers. The remaining three operations are standard Boolean logic operations. The AND operation sets an output bit if the bit is set in the first AND the second number. OR checks if a bit is set in the first OR the second number (or both). XOR (exclusive-or) checks if a bit is set in the first OR the second number (but not both).

The concept of carries during addition is a key part of the ALU. Binary addition in a processor is similar to grade-school long addition, except with binary numbers instead of decimal. Starting at the right, each column of two numbers is added and there can be a carry to the next column. Thus, in each column, the ALU adds two bits as well as a carry bit.

In most early microprocessors, addition of each column needs to wait until the column to the right has been added and the carry is available. The carry "ripples" through the bits, right to left, slowing the addition. The 8008, however, uses a fast carry-lookahead circuit3 to generate the carries for all 8 columns in parallel before the addition happens. Then all the columns can all be added in parallel without waiting for the carry to "ripple" through the sum. This carry-lookahead circuit is an unusual feature to see in an early microprocessor due to its complexity.

Since the 8008 is an 8-bit processor, the ALU operates on two eight-bit arguments. Most 8-bit processors (including the 8008) use a "bit-slice" construction for the ALU, with a one-bit ALU slice repeated eight times. Each one-bit ALU slice takes two input bits and the carry-in bit, and produces the output bit. In most 8-bit processors, the bit-slice ALU is arranged by stacking 8 rectangular ALU slices to form a compact, regular block. However, the 8008 has its eight ALU slices arranged in an irregular fashion—some blocks are even sideways—as shown in the diagram below. The motivation for this is that the carry lookahead circuit takes up a triangular space on the chip. To fit the remaining space better, the 8008's ALU is arranged into its unusual triangular layout.

Arrangement of the eight ALU slices on the 8008 microprocessor die. Unlike most processors, the 8008's ALU slices are arranged in a haphazard triangular arrangement. This fits better with the triangular carry-lookahead circuit above the ALU.

Arrangement of the eight ALU slices on the 8008 microprocessor die. Unlike most processors, the 8008's ALU slices are arranged in a haphazard triangular arrangement. This fits better with the triangular carry-lookahead circuit above the ALU.

Zooming in on the die photo, we can look at one of the ALU slices and see how the circuitry is constructed. The chip is built from three layers (to simplify slightly). The topmost layer is the metal wiring. It is the most visible feature, and looks metallic (not surprisingly). In the detail below, you can see the horizontal and vertical metal traces. The polysilicon layer is underneath the metal layer and appears yellow/orange under the microscope. Polysilicon can act as wiring, but more importantly it forms the gates of the transistors, switching them on and off. The bottom layer is the grayish silicon die itself, but it is hard to see under the other layers.

Die photo of the 8008 processor, zoomed in on the circuit for one bit of the ALU.

Die photo of the 8008 processor, zoomed in on the circuit for one bit of the ALU.

In the diagram above, the carry c and the complemented a and b inputs enter through the metal wires at the top. The ALU output is at the bottom. The control signals are horizontal metal lines. The circuit is powered by the Vcc (+5 volts) and Vdd (-9 volts) metal lines. The brighter yellow polysilicon regions are transistors. Each gate in the circuit requires a "load resistor" connected to Vdd to pull its output low; for improved performance, these are implemented with transistors rather than resistors.

Removing the metal layer with acid makes the silicon and polysilicon layers more visible, as shown below.6 The chip is formed on a silicon wafer with regions of it "doped" with impurities to create regions of semiconducting silicon. You can see dark lines along the border between doped silicon and undoped silicon. A transistor is formed where a yellowish polysilicon wire crosses the doped silicon. The transistor forms a switch between the two silicon sides, controlled by the polysilicon gate. Each ALU slice contains 20 transistors; the diagram below points out two of them.5

With the metal layer removed from the 8008 processor die, the underlying silicon is visible. The photo shows bit 1 of the 8008's ALU.

With the metal layer removed from the 8008 processor die, the underlying silicon is visible. The photo shows bit 1 of the 8008's ALU.

Simulating one slice of the ALU

By examining the die photos carefully, you can map out the ALU slice's 20 transistors and their connections. From this, you can reverse-engineer the gates that make up the circuit. I explained in my previous article how PMOS gates are structured, so I won't go into the details here. The result is the schematic below, showing one bit of the ALU. Each ALU slice takes two inputs (a and b) and the input carry c, and outputs one result bit. There are three mode lines (m1, m2 and m3) that select one of the four ALU operations.7

The schematic below is interactive. First, select an operation and the table will update with results for the eight different inputs. Next, click a row in the table, and the schematic will update, showing how the ALU computes that row. (Note that the a and b inputs to the ALU are inverted, indicated by an overbar.)

Operation:

While this ALU slice looks like it is made of many gates, physically it is only three gates: two large, multilevel AND-OR-NAND gates and one NAND gate. The AND-OR-NAND logic is implemented on the chip as a single complex gate, rather than by combining simpler gates, since a single large gate provides better performance with less circuitry than multiple small gates. One feature of MOS logic is it's just as easy to form an AND-OR-NAND gate (for instance) as a plain NAND gate.

Understanding the ALU logic

The 8008's ALU circuit above looks like a mysterious collection of gates, but eventually I figured out the structure behind it. The starting point is a full adder that handles the Sum operation. (A full adder adds three input bits (a, b and c) and outputs the (low-order) sum bit and a carry bit.) The full adder is then heavily modified to support the logic operations, yielding the ALU from the previous section. The logic operations are implemented by using the mode lines to block parts of the circuit, yielding XOR, AND or OR, rather than the more complex Sum.

The diagram below strips down the 8008's ALU circuit to reveal the full adder "hidden" inside. The gate in red generates the carry-out from the three inverted inputs, using relatively straightforward logic. (Since the 8008 uses carry-lookahead, this carry-out signal isn't passed to the next ALU slice, but just used to generate the ALU output.) If you examine the possible sum cases, you will see that the sum bit is almost always just the carry-out inverted, except for the 0+0+0 and 1+1+1 cases. Thus, the sum bit can be generated by inverting the carry-out and handling the two exceptional cases.8 The two gates indicated below handle the exceptions by forcing the sum output to the correct value.

Simplified 8008 ALU slice, showing the full adder circuit.

Simplified 8008 ALU slice, showing the full adder circuit.

Comparing the full adder with the full ALU circuit earlier shows how the mode lines support the logic operations. Once you have a full adder, generating XOR is simply a matter of setting the carry-in to 0, which is done by the m3 control line. For the OR and AND operations, mode lines m3 and m2 respectively disable all of the circuit except the gates labeled in green.9 Thus, if you start with a full-adder and extend it to support XOR, AND and OR, the 8008's ALU circuit is a logical result.

Intel's earlier 4004 microprocessor had a simple ALU that only supported addition and subtraction, not any logic operations.10 Interestingly, the 4004's ALU circuit is almost identical to the full adder circuit shown above. So it's very likely that Intel designed the 8008 ALU by extending the 4004 ALU as described above. This would explain why the 8008's ALU generates carries internally, even though the carry lookahead circuit made this redundant.11

The 8008's ALU logic is very similar to the Z80's ALU,12 although the Z80's ALU is (surprisingly) 4 bits (details). The 8085 uses a different complex gate arrangement. The 6502 on the other hand, uses an entirely different approach: straightforward circuits for addition, AND, OR, XOR and shift-right, using pass-transistor multiplexers to select the operation.

Instruction decoding: how the ALU knows what operation to do

The 8008 executes 8-bit instructions, which move data, perform I/O, branch, call subroutines, and so forth. The instruction decoding logic examines the instruction and determines what operation to perform, generating about 30 control signals.13 Over a quarter of the instructions perform ALU operations, and the instruction set is carefully designed so three bits of the instruction specify which of the eight operations to perform.14 By examining these bits, the instruction decoder generates the ALU's mode control lines m1, m2 and m3.

Looking at AND instructions illustrates how this works. All AND instructions have the bit pattern xx100xxx (where x is either 0 or 1). For instance, the instruction to AND with memory is 10100111 and the instruction to AND with a constant is 00100100. When the instruction decode circuit matches this pattern, it pulls the m1 control line low, which causes the ALU to perform an AND operation.7 Other bit patterns generate the other ALU control signals.15

Part of the 8008's instruction decode PLA. The three indicated transistors match opcode pattern XX100XXX, indicating an AND instruction.

Part of the 8008's instruction decode PLA. The three indicated transistors match opcode pattern XX100XXX, indicating an AND instruction.

The diagram above shows part of the instruction decode circuit. The instruction bits (and their complements) are on yellow polysilicon wires running vertically through the circuit. Each row matches a bit pattern, with a transistor connected to each instruction bit to be matched. (The doped silicon regions forming transistors are the black outlines. Circles are connections between a transistor and the row's metal line.) For example, the three transistors marked with arrows match bit 3 low, bit 4 low, and bit 5 high, detecting the AND instruction pattern. Thus, the processor uses the grid of transistors in the instruction decoder to determine the meaning of each instruction.

Loose ends: Subtraction and rotating

The ALU implements a Sum operation, so you might wonder how subtraction is implemented. By using two's complement arithmetic, the CPU can perform subtraction by simply flipping all the bits on a value and then adding it. The ALU uses two temporary registers to hold the two operands since the ALU can't read the operands from the register file and write the result back simultaneously. One of the temporary registers has the feature that its value can be fed to the ALU directly or inverted. The subtraction instructions generate a signal causing the temporary register to provide the inverted value to the ALU, causing the ALU to perform subtraction.

One important operation in most processors is rotating or shifting the bits in a value, to the left or to the right. In most of the microprocessors I've examined, shifting is performed by the ALU.16 The 8008, on the other hand, implements the rotate logic in the register access circuit, on the opposite side of the chip from the ALU. When reading a register, the bits can be shifted one position left or right by a simple circuit before going onto the data bus.

History of the 8008

The Intel 8008 is important historically since it is the ancestor of the dominant Intel x86 architecture that you're probably using right now.2 I wrote a detailed article for the IEEE Spectrum on early microprocessor history, so I'll just give the outline of the 8008's complicated history here.

The 8008 copies the instruction set and architecture of the Datapoint 2200, a popular minicomputer introduced in 1970 as a programmable terminal.17 As was typical for minicomputers, the Datapoint 2200 contained a CPU build from individual TTL chips, filling up a circuit board. Datapoint contracted with both Intel and Texas Instruments to build a single-chip CPU that would replace this processor board, but keeping the same architecture and instruction set.

The Datapoint 2200 computer. The 8008 microprocessor was built to implement the Datapoint 2200's architecture and instruction set. Photo courtesy of Austin Roche.

The Datapoint 2200 computer. The 8008 microprocessor was built to implement the Datapoint 2200's architecture and instruction set. Photo courtesy of Austin Roche.

Texas Instruments was first to build a 2200-compatible microprocessor, creating the TMC 1795 chip. Intel got their version, the 8008, working a bit later, around the end of 1971. Datapoint rejected both processors, instead updating the Datapoint 2200 to use the 74181 TTL ALU chip. Texas Instruments couldn't find a new customer for the TMC 1795 and abandoned it. Intel, on the other hand, came up with the idea of selling the 8008 as a well-supported general-purpose processor. The 8008 led to the 8080, the 8085, 8086, and Intel's x86 line, which still retains some features of the 8008.

Conclusion

Although the 8008 was a very early microprocessor, its ALU was more advanced than you might expect. In particular, it used a complex carry-lookahead circuit for higher performance. Unfortunately, even with the carry-lookahead circuit, the 8008 was slower than the TTL-based Datapoint 2200 processor it was supposed to replace; addition took 20µs on the 8008, compared to 16µs on the original Datapoint 2200 and just 3.2µs on the upgraded Datapoint 2200. This illustrates the speed advantage that TTL had over MOS in the early 1970s. To us, a microprocessor may seem obviously better than a board of chips, but this wasn't always the case.

If you're interested in the 8008, my previous article has a detailed discussion of the architecture, more die photos and information on how to take them, and information on semiconductor history, so take a look.

I announce my latest blog posts on Twitter, so follow me at kenshirriff. I also have an RSS feed.

Notes and references

  1. The 8008 chip was publicly announced in an article in Electronics on March 13, 1972, entitled "8-bit parallel processor offered on a single chip", offering the chips for $200 each. 

  2. If you're not using an x86 processor right now, you're probably using an ARM processor. Don't feel neglected, though, since I've reverse-engineered the ARM-1 too. (Although there are many more ARM chips out there than x86, analytics show 71% of my readers are on x86.)  

  3. Using a carry look ahead circuit avoids the delay from a standard ripple-carry adder, where the carries propagate through the sum. The 8008's carry-lookahead is based on the Manchester carry chain, but with a separate carry chain for each carry, yielding the triangular structure you see on the die. For performance, the carry chain is implemented with dynamic logic, depending on wire capacitance, rather than with standard Boolean gates. The 74181 ALU chip in comparison, uses a different carry lookahead scheme implemented with standard logic. I plan to write more about the 8008's carry lookahead later. 

  4. The 8008 implements eight different arithmetic/logic functions: Add, Add with carry, Subtract, Subtract with borrow, AND, XOR, OR, and Compare.14 These are implemented in terms of the ALU's four basic operations. Subtraction is performed by inverting the second argument. The operations without carry/borrow clear the carry-in bit. Compare is simply a subtraction that doesn't store the result; it just sets the flags with the status. Thus, the four fundamental operations of the ALU are used to implement eight different arithmetic/logic operations. 

  5. Note that the 8008 uses PMOS transistors, rather than the faster NMOS transistors in later microprocessors such as the 8080, 6502 and Z80. If you're familiar with NMOS circuits, PMOS can be confusing since everything is backwards. PMOS transistors turn on if the gate is low, and typically pull the output high. Vdd in PMOS is negative, and "ground" is positive. The "pull-up resistor" in a PMOS gate pulls the output down. A PMOS NAND gate has transistors in parallel (compared to serial for an NMOS NAND gate). A PMOS NOR gate has transistors in serial (compared to parallel for an NMOS NOR gate). 

  6. The metal layer of the chip is protected by silicon dioxide passivation layer. The professional way to remove this layer is with dangerous hydrofluoric acid. Instead, I used Armour Etch glass etching cream, which is slightly safer and can be obtained at craft stores. I applied the etching cream to the die and wiped it for four minutes with a Q-tip. (Since the cream is designed for frosting glass, it only etches in spots. It must be moved around to obtain a uniform etch.) After this, I soaked the die in hydrochloric acid (pool acid from the hardware store) overnight to dissolve the metal. This was probably too long, since the edges of the polysilicon were eaten away in places. 

  7. The following values are used for the three mode lines to select the ALU function:

    Operationm1m2m3
    Sum111
    And010
    Or100
    Xor110
     

  8. A more straightforward way of generating the sum bit is by xoring the three inputs: a⊕b⊕c. Unfortunately, an XOR gate is relatively difficult to implement with Boolean logic, so designers will often try to avoid XOR. 

  9. You might wonder why the OR operation is implemented with an AND gate, and vice versa. Since the inputs and the output of the OR gate are inverted, this is equivalent to an AND gate (by De Morgan's laws), and similarly for the AND gate. 

  10. Strictly speaking, the 4004 microprocessor has an AU (arithmetic unit), not an ALU (arithmetic/logic unit), since it doesn't do logical operations. Since the 4004 was designed for a calculator, logical operations weren't required. 

  11. The 8008's full adder generates the carry-out first, and generates the sum from that. In contrast, the typical full adder circuit combines two half adders to generate the sum and carry-out separately. If the typical full adder circuit had been used in the 8008, the carry-out logic could easily be omitted. 

  12. To see the similarity between the Z80's ALU circuit and the 8008's, you need to swap AND and OR gates. (Apply De Morgan's laws since the 8008's ALU inputs are inverted.) In the Z80, the carry-out comes from the ALU rather than a carry-lookahead circuit, so the control lines are somewhat different. But the fundamental ALU circuit is otherwise the same between the 8008 and Z80, which is not surprising since Federico Faggin worked on both chips. 

  13. Instruction decoding is based on a Programmable Logic Array (PLA), an arrangement of transistors that efficiently implements logic gates. These gates match bit patterns and generate the appropriate control signals for the rest of the chip. The 8008's PLA has 16 input lines flow vertically through the PLA. Each row in the PLA matches a bit pattern and generates a control signal output.

    In more detail, each row output line is pulled low by a load resistor/transistor to Vdd. The transistors are connected between the row line and Vcc (+5V). The bit lines are connected to the transistor's gate. If any bit line is low (indicating a mismatch), the PMOS transistor turns on, pulling the row line high. Thus, if there is no mismatch, the control line is low, and if there is a mismatch, the control line is high. In other words, each row is a NAND gate with instruction bit inputs.

    The input lines are ordered as follows: bit 3, bit 3 complement, 4, 4', 5, 5', 0, 0', 1, 1', 2, 2', 6, 6', 7, 7'. This order may seem strange, but there's a reason for it. In the 8008, the ALU operation is selected by bits 3, 4 and 5 of the instruction. By putting those bits on the left side of the PLA, they are closer to the ALU. Some rows of the PLA actually decode two instructions: bits 3, 4 and 5 are decoded on the left side, generating an ALU control signal, while the remaining bits are decoded on the right side generating a different control signal. This increases the PLA density and saves space on the chip. 

  14. The 8008's instruction set is designed around octal. Among other things, there are 8 ALU operations, 8 registers and 8 conditionals. In octal, the ALU instructions have the value 2ar, where a is the ALU operation to perform (0 through 7) and r is the register to use (0 through 7, where 7 indicates memory). The octal structure originates with the Datapoint 2200, which decoded instructions with TTL 7442 BCD chips that decoded groups of three bits. This octal structure persisted in descendants of the 8008, including the Z80 and x86. Unfortunately, these instruction sets are almost always presented in hexadecimal, which hides the underlying structure. 

  15. The instruction decoder generates all the signals required by the ALU. As described above, AND matches xx100xxx, pulling the m1 control signal low. An OR opcode has the bit pattern xx110xxx, which causes the instruction decode circuit to pull the m2 control line low. An XOR instruction has the bit pattern xx101xxx. The m3 control line is pulled low for patterns xx10xxxx or xx1x0xxx, matching AND, OR or XOR instructions. The subtract (with and without borrow) instructions match xx01xxxx, generating a signal that inverts the second argument. 

  16. Different processors use a variety of techniques for shifting. In the Z80, shifting is performed as data enters the ALU. The 6502 performs a left shift with "A plus A", and has a path inside the ALU for right shifts; the 8085 is similar. The ARM-1 has a barrel shifter next to the ALU that performs arbitrary shifts. 

  17. The instruction set of the Datapoint 2200 is described in the Reference Manual. The 8008 has a couple minor changes. For instance, the 8008 has increment and decrement instructions that are not present in the 2200. 

Inside the 74181 ALU chip: die photos and reverse engineering

What's inside a TTL chip? To find out, I opened up a 74181 ALU chip, took high-resolution die photos, and reverse-engineered the chip.1 Inside I found several types of gates, implemented with interesting circuitry and unusual transistors. The 74181 was a popular chip in the 1970s used to perform calculations in the arithmetic-logic unit (ALU) of minicomputers. It is a moderately complex chip containing about 67 gates and 170 transistors3, implemented using fast and popular TTL (transistor-transistor logic) circuitry.

The 74181 die photo is below. (Click the image for a high-resolution version.) The golden stripes are the metal layer that interconnects the circuitry of the chip. (It's not gold, just aluminum that looks golden from the lighting.) The white squares around the edge of the die are the pads that are connected by tiny bond wires to the external pins. Under the metal layer is the silicon that makes up the chip. Faint lines show the doped silicon regions that make up the transistors and resistors. While the chip may appear impossibly complex at first, with careful examination it is possible to understand how it works.

Die photo of the 74181 ALU chip.

Die photo of the 74181 ALU chip.

The 74181 chip is important because of its key role in minicomputer history. Before the microprocessor era, minicomputers built their processors from boards of individual chips. The arithmetic operations (addition, subtraction) and logical operations (AND, OR, XOR) were performed by the arithmetic/logic unit (ALU) in the processor. Early minicomputers built ALUs out of a large number of simple gates. But in March 1970, Texas Instruments introduced the 74181 Arithmetic / Logic Unit (ALU) chip, which put a full 4-bit ALU on one fast TTL chip.4 This chip provided 32 arithmetic5 and logic functions2, as well as fast carry lookahead.7 Using the 74181 chip simplified the design of a minicomputer processor and made it more compact, so it was used in many minicomputers. Computers using the 74181 ranged from the popular PDP-11 and NOVA minicomputers to the powerful VAX-11/780 to the Datapoint 2200 desktop computer. The 74181 is still used today in retro hacker projects.6

A brief guide to NPN transistors

The 74181 is built from bipolar NPN transistors, a different technology from the MOS transistors in modern processors. The diagram below shows how an NPN transistor appears in an integrated circuit, along with a cross section. The transistor has three connections: the collector, base and emitter, with metal lines for each. The collector is connected to N-type silicon, the base to P silicon, and the emitter to N silicon (giving it the NPN structure). On the chip, you can recognize the emitter from its nested squares, the base because its silicon region surrounds the emitter, and the collector because it is the largest contact.

Structure of an NPN transistor appears in an IC.

Structure of an NPN transistor appears in an IC.

The key idea of the NPN transistor is it acts as a switch between the collector and emitter, controlled by the base. Normally there is no current flow between the collector and the emitter, so it's like a switch in the "off" position. But if you pass a small current from base to emitter, the transistor allows a large current from collector to emitter, like a switch in the "on" position. (This is vastly oversimplified—bipolar transistors are much more "analog"—but should be enough to understand how the 74181 works.) At the right is the symbol for an NPN transistor with the collector, base and emitter labeled.

Inverter

The fundamental component of TTL logic is the inverter, and other gates are modifications of the inverter circuit. Thus, it's important to understand the basic construction of the inverter, even though it is a bit complicated. I'll explain how it works in an oversimplified way.9

The diagram below shows an inverter in the 74181 chip. The 5V and ground lines run vertically along the left, powering the inverter. The transistors are highlighted with boxes. The resistors are visible as long strips of doped silicon snaking around.11 An input pin (A0) is wired to the pad. On the right is the schematic for a TTL inverter10, with components highlighted to match the die photo.

An inverter in the 74181 ALU chip, along with a schematic showing the components of the inverter.

An inverter in the 74181 ALU chip, along with a schematic showing the components of the inverter.

The input is connected to transistor Q1 (red). This transistor is used in an unusual way, acting as a "current-steering" transistor. If the input is low, R1's current is steered through Q1's emitter to the input, leaving Q2 off. If the input is high, R1's current flows "backwards" out Q1's collector to Q2's base, turning on Q2. Transistor Q2 (orange) can be considered a "phase splitter transistor", which makes sure that exactly one of the output transistors (Q3 and Q4) is activated. (That is, they turn on in opposite phases.) If Q2 is off, R2 provides current to turn on Q3 (yellow), which pulls the output high. Meanwhile, R3 turns off Q4. On the other hand, if Q2 turns on, it provides enough current to turn on Q4 (green), which pulls the output low. I'll explain the diodes in a footnote.12

The 74181 schematic

The schematic below13 shows the circuitry of the 74181. If you've taken a digital logic course, you've probably seen how to build a full adder circuit. But if you look at the schematic of the 74181, it's implemented in a very different way, to provide higher speed and more flexibility.8 The main reason for its complexity is it computes everything in parallel, rather than waiting for the carry to ripple from bit to bit, and this requires a lot more logic.

The different types of gates are highlighted. There are a few inverters (red) to invert input signals. Most of the logic consists of AND-OR-INVERT gates. The AND stages are shown in blue, and the OR-INVERT (NOR) stages in green. (Some of the OR-INVERT stages are not explicit on the schematic and are empty boxes.) The chip uses a few XOR gates (purple) to compute sums. Finally, there are a couple unique gates shown in yellow.

Schematic of the 74181 ALU.

Schematic of the 74181 ALU.

The schematic can be matched up with the labeled die image below. Conveniently, the layout of the die largely matches the schematic. The AND-OR-INVERT gates make up the majority of the chip. Also notice the large chip real estate used for resistors. The chip pins are labeled with blue text. (The metal layer was removed for this photo, to make the underlying circuitry more visible.)

The 74181 ALU die, with main gate types outlined.

The 74181 ALU die, with main gate types outlined.

AND-OR-INVERT

Most of the 74181's logic is implemented with AND-OR-INVERT gates, which consist of AND gates connected to a NOR gate as shown below. After seeing the inverter, you may expect that an AND-OR-INVERT gate is very complex. But as the schematic below14 shows, the AND-OR-INVERT is not much more complex than an inverter, requiring just a few more transistors. An AND gate is implemented by adding more emitters to the current-steering input transistor (red). (This may seem very strange, but transistors with multiple emitters are common in TTL circuits.) If all inputs are high, the base current will be steered to the collector. Otherwise, the base current will flow out the emitter. Thus, the AND of the inputs is generated. The NOR gate is implemented by putting phase splitter transistors in parallel (orange). If any of the bases are high, the corresponding transistor (Q2A or Q2B) will conduct, pulling the output low. While the circuit below has two AND gates, it can easily be extended to as many gates and inputs as desired.

The AND-OR-INVERT circuit from the 7451 chip. The multiple-emitter transistors that implement AND are highlighted in red. The transistors that implement OR are highlighted in orange.

The AND-OR-INVERT circuit from the 7451 chip. The multiple-emitter transistors that implement AND are highlighted in red. The transistors that implement OR are highlighted in orange.

The diagram below shows how these multiple-emitter transistors are implemented on the chip. Three of these transistors are shown, each with four or five emitters (the dark squares), creating 4-input or 5-input AND gates. Each transistor's base is at the top and each collector is at the bottom. The signal lines run horizontally, with emitters connected as needed. With this structure, multiple AND gates can arranged efficiently on the chip (similar to a Programmable Logic Array or PLA). Note that the base resistors take up a significant amount of space.

Three AND gates in the 74181 ALU chip. Each one is a single transistor with multiple emitters.

Three AND gates in the 74181 ALU chip. Each one is a single transistor with multiple emitters.

The diagram below shows how the OR-INVERT part of the circuit appears on the chip. Note that Q2A and Q2B (orange) share a collector, so the two transistors don't take up much space on the die. Their inputs come from AND circuits such as the ones above. 3-input and 4-input OR gates are implemented similarly, by adding more transistors.

The OR-INVERT stage of the AND-OR-INVERT gate in the 74181, compared with the 7451 AND-OR-INVERT gate.

The OR-INVERT stage of the AND-OR-INVERT gate in the 74181, compared with the 7451 AND-OR-INVERT gate.

Exclusive-OR

The chip uses a clever, compact circuit to compute XOR with two transistors wired in an unusual way: the emitters and bases are tied together and there is no connection to ground. The way it works is if the first input is high and the second is low, the first transistor turns on due to the base-emitter current. This pulls the output low through the transistor, with the second input acting as ground. Likewise, if the first input is low and the second is high, the second transistor turns on and pulls the output low. If both inputs are the same, there is no base-emitter current, both transistors remain off, and the output is pulled high by the resistor. The output from the transistor pair goes to the standard inverter stage, so the resulting signal is the XOR of the two inputs. 15 As with OR-INVERT, the two transistors share a collector, making the layout more compact.

The circuit used in the 74181 to compute XOR. Layout inspired by userbinator.

The circuit used in the 74181 to compute XOR. Layout inspired by userbinator.

A few things to note about the photo. The two transistors share a collector, which is equivalent to wiring their collectors together. The pull-up resistor doesn't appear in the photo; it is off to the right. The inputs to the XOR are from AND-OR-INVERT gates; their output transistors are at the top of the photo.

NOT-AND

The chip uses four AND gates that have one inverted input.17 On the die, it appears at first that the gates are implemented with the standard AND transistors, but an interesting trick is used to invert one of the inputs. Transistor Q1 is wired in the normal current-steering way, with R1 providing a base current. But transistor Q2 has its resistor connected to the collector, not the base.16 Normally R2 will pull the output high. But if input X is high and input Y is low, R1's current will go through Q1's collector and Q2's emitter, turning on Q2 and pulling the output low. Thus, the result after the inverter stage will be X AND NOT Y.

The 74181 uses an interesting circuit to generate NOT-AND. It uses the multi-emitter transistors but in a subtly different way from the AND gates.

The 74181 uses an interesting circuit to generate NOT-AND. It uses the multi-emitter transistors but in a subtly different way from the AND gates.

Getting the die photos

To create die photos, the integrated circuit package must be opened to expose the silicon die inside. Most chips have an epoxy package, which can be dissolved in boiling sulfuric acid. Since I don't like boiling acid, I obtained the 74181 chip in a ceramic package, which is much easier to open.

The 74181 ALU chip in a ceramic package.

The 74181 ALU chip in a ceramic package.

I tapped the chip along the seam with a chisel, splitting the two layers apart. Below, you can see how the metal pins are mounted between the layers, and are connected to the silicon die with tiny bond wires.

By tapping the 74181 chip with a chisel, the ceramic package can be popped open.

By tapping the 74181 chip with a chisel, the ceramic package can be popped open.

To photograph the die, I used a metallurgical microscope, a special type of microscope that shines light down through the lens to illuminate the chip from above. I took 22 photographs and then used the Hugin stitching software to combine them into a high-resolution image (details). Then, I removed the metal layer from the chip with hydrochloric acid and took more images, resulting in the image below. Removing the metal makes it easier to see the structure of the silicon layer and determine how the chip works. (Click for high-resolution version.)

Removing the metal layer of the 74181 chip with HCl reveals the silicon layer underneath.

Removing the metal layer of the 74181 chip with HCl reveals the silicon layer underneath.

Conclusion

The 74181 ALU chip is a complex, high-performance TTL chip that was a key component in the processor of many minicomputers. I took detailed die photos of the 74181 ALU that reveal how the chip works internally. It uses several different logic gates, primarily AND-OR-INVERT gates that have an efficient layout on the chip. These gates are implemented by extending an inverter circuit in different ways, but are more complex than their MOS equivalents. I plan to explain how the 74181 implements its 32 functions and fast carry in a future article, so keep watching.

I announce my latest blog posts on Twitter, so follow me at kenshirriff. I also have an RSS feed.

Notes and references

  1. To understand what's inside a TTL chip, it might be more sensible to start with a simple chip such as a NAND gate. But why take the easy way when there's a complex chip to explore? 

  2. Many of the 74181's 32 functions are strange, but there is actually a system behind it. Note that there are exactly 16 possible functions on two (one-bit) binary inputs A and B. (There are 4 lines in the truth table, and two choices for each output, so 2^4 possible functions.) The 74181's 16 logic functions are simply these 16 functions (extended to 4 bits). The 74181's 16 arithmetic functions are A PLUS (one of the 16 possible functions of A and B) PLUS carry-in. 

  3. Various sources say the 74181 has 61 or 75 gates. The schematic shows 67 gates. If you omit the five 1-input AND gates, you get 62 gates, i. On the die, I counted 169 transistors, but it's quite possible I missed some. 

  4. The history of the 74181 chip is described in detail on this site. The 74181 is apparently the first ALU chip created. In 1968, Fairchild introduced the 3800, an 8-bit accumulator chip, but it didn't have logical functions so technically it's just an AU (Arithmetic Unit) not an ALU like the 74181. Before the 74181 was the 7483 4-bit adder chip (1968); internally, the 7483 is similar to the lower half of the 74181. The 7483 was used in minicomputers such as the PDP 8/E. 

  5. ALU chips of this era didn't perform multiplication or division, let alone floating point operations. Multiplication and division operations were common in computers of that era, but were typically performed with multiple cycles of addition or subtraction. The one operation that seems missing from the 74181 is "shift right"; it can do a shift left with "A PLUS A". 

  6. Retro projects using the 74181 include the APOLLO181 CPU, Fourbit CPU, 4 Bit TTL CPU, Magic-1 (using the 74F381), TREX, Mark 1 FORTH and Big Mess o' Wires

  7. When multiple 74181 chips are connected together for larger words, you can simply feed the carry-out of one chip into the carry-in of the next. For higher performance, the 74182 look-ahead carry generator could be used to compute the carries across multiple chips in parallel. Some minicomputers (such as the Xerox Alto) didn't use the 74182, while others (such as the Interdata 7/16) did. 

  8. I'll give a brief overview of the chip's implementation here. The chip is build around the idea of carry lookahead. In particular, the upper and-or-invert gates create the carry P (propagate) and G (generate) signals for each bit of A PLUS f(A,B). The lower and-or-invert gates use these signals to compute the carry for each bit of the sum. Finally, the xor gates add the P, G and carry to compute each final sum. The point of this implementation is to compute the four bits in parallel and avoid a slow ripple carry. In a later post I'll explain this circuitry in full detail. 

  9. I've simplified the discussion of the TTL logic circuit, since most people probably don't care about saturation, β, biasing, and so forth. If you want the full analysis, see Logic gates: the NOT gate or Transistor-Transistor Logic. This presentation shows schematics for the different gates and TTL logic families. 

  10. The inverter schematic is from the datasheet of the common 7404 inverter chip. Interestingly, the basic circuit used in an inverter chip is almost identical to the circuit used inside the 74181. This turns out to be true for most of the 74181 circuits—they are similar to individual TTL parts. The 74181's transistors are a bit smaller because not as much current is required inside the chip, but there is much less scaling than you might expect. 

  11. In the 74181's inverter, R4 is not used. R1 takes its place. This is probably because an inverter in the chip doesn't need to provide as much current as a 7404 inverter chip. 

  12. The tricky part of the inverter circuit is that if Q2 turns on, there's enough voltage to turn on Q4 but not Q3, thanks to diode D2. The purpose of diode D2 isn't to conduct current in one direction, like you'd expect from a diode. Instead, its purpose is to raise Q3's emitter voltage by one diode drop (about 0.7V). As a consequence, Q3 requires 0.7V more at the base to turn on. Thus, when Q2 is active, there is enough voltage to turn Q4 on, but not Q3. And diode D1 simply protects the chip by shunting any negative input voltage to ground. 

  13. The schematic is based on a diagram by Poil on WikiMedia, CC By-SA 3.0, with labeling changes and gates highlighted. 

  14. The AND-OR-INVERT schematic is slightly modified from the 7451 AND-OR-INVERT chip to match the 74181's circuit. The 74181's AND-OR-INVERT circuits in the lower half of the chip omit the pull-up output transistor found in the 7451 since the 74181 doesn't require as much output current internally. (The output diode remains to drop the phase splitter transistor's collector voltage by one diode drop, or else the low output voltage will be too high.) 

  15. The circuit used in the 74181 for exclusive-OR is similar to the 7486 TTL XOR chip

  16. On the die, Q2 appears to have two collectors. But these are just two contacts to the same collector, to simplify routing of the wiring. This is unlike the multiple-emitter transistors, which genuinely have multiple emitters. 

  17. Some datasheets (example) show an XOR gate instead of NOT-AND. You might wonder how this could possibly work since XOR and NOT-AND are different. The answer is that one of the four input combinations never happens in the 74181, and the gates are equivalent across the other three inputs. The physical implementation is NOT-AND rather than XOR. 

Die photos and analysis of the revolutionary 8008 microprocessor, 45 years old

Intel's groundbreaking 8008 microprocessor was first produced 45 years ago.1 This chip, Intel's first 8-bit microprocessor, is the ancestor of the x86 processor family that you may be using right now. I couldn't find good die photos of the 8008, so I opened one up and took some detailed photographs. These new die photos are in this article, along with a discussion of the 8008's internal design.

Die photograph of the 8008 microprocessor

Die photograph of the 8008 microprocessor

The photo above shows the tiny silicon die inside the 8008 package. (Click the image for a higher resolution photo.) You can barely see the wires and transistors that make up the chip. The squares around the outside are the 18 pads that are connected to the external pins by tiny bond wires. You can see the text "8008" on the right edge of the chip and "© Intel 1971" on the lower edge. The initials HF appear on the top right for Hal Feeney, who did the chip's logic design and physical layout. (Other key designers of the 8008 were Ted Hoff, Stan Mazor, and Federico Faggin.)

Inside the chip

The diagram below highlights some of the major functional blocks of the chip. On the left is the 8-bit Arithmetic/Logic Unit (ALU), which performs the actual data computations.3 The ALU uses two temporary registers to hold its input values. These registers take up significant area on the chip, not because they are complex, but because they need large transistors to drive signals through the ALU circuitry.

Die of the 8008 microprocessor showing major components.

Die of the 8008 microprocessor showing major components.

Below the registers is the carry look ahead circuitry. For addition and subtraction, this circuit computes all eight carry values in parallel to improve performance.2 Since the low-order carry depends on just the low-order bits, while the higher-order carries depend on multiple bits, the circuit block has a triangular shape.

The triangular layout of the ALU is unusual. Most processors stack the circuitry for each bit into a regular rectangle (a bit-slice layout). The 8008, however, has eight blocks (one for each bit) arranged haphazardly to fit around the space left by the triangular carry generator. The ALU supports eight simple operations.3

In the center of the chip is the instruction register and the instruction decoding logic that determines the meaning of each 8-bit machine instruction. Decoding is done with a Programmable Logic Array (PLA), an arrangement of gates that matches bit patterns and generates the appropriate control signals for the rest of the chip. On the right are the storage blocks. The 8008's seven registers are in the upper right. In the lower right is the address stack, which consists of eight 14-bit address words. Unlike most processors, the 8008's call stack is stored on the chip instead of in memory. The program counter is just one of these addresses, making subroutine calls and returns very simple. The 8008 uses dynamic memory for this storage

The physical structure of the chip is very close to the block diagram in the 8008 User's Manual (below), with blocks located on the chip in nearly the same positions as in the block diagram.

Block diagram of the 8008 microprocessor, from the User's Manual.

Block diagram of the 8008 microprocessor, from the User's Manual.

The structure of the chip

What does the die photo show? For our purposes, the chip can be thought of as three layers. The diagram below shows a closeup of the chip, pointing out these layers. The topmost layer is the metal wiring. It is the most visible feature, and looks metallic (not surprisingly). In the detail below, these wires are mostly horizontal. The polysilicon layer is below the metal and appears orange under the microscope.

A closeup of the 8008 die, showing the metal layer, the polysilicon, and the doped silicon.

A closeup of the 8008 die, showing the metal layer, the polysilicon, and the doped silicon.

The foundation of the chip is the silicon wafer, which appears purplish-gray in the photo. Pure silicon is effectively an insulator. Regions of it are "doped" with impurities to create semiconducting silicon. Being on the bottom, the silicon layer is difficult to distinguish, but you can see black lines along the border between doped silicon and undoped silicon. A few vertical silicon "wires" are visible in the photo.4

Transistors are the key component of the chip, and a transistor is formed where a polysilicon wire crosses doped silicon. In the photo, the polysilicon appears as a brighter orange where it forms a transistor.

Why an 18 pin chip?

One inconvenient feature of the 8008 is it only has 18 pins, which makes the chip slower and much more difficult to use. The 8008 uses 14 address bits and 8 data bits so with 18 pins there aren't enough pins for each signal. Instead, the chip has 8 data pins that are reused in three cycles to transmit the low address bits, high address bits, and data bits. A computer using the 8008 requires many support chips to interact with this inconvenient bus architecture.5

There was no good reason to force the chip into 18 pins. Packages with 40 or 48 pins were common with other manufacturers, but 16 pins was "a religion at Intel".6 Only with great reluctance did they move to 18 pins. By the time the 8080 processor came out a few years later, Intel had come to terms with 40-pin chips. The 8080 was much more popular, in part because it had a simpler bus design permitted by the 40-pin package.

Power and data paths in the chip

The data bus provides data flow through the chip. The diagram below shows the 8-bit data bus of the 8008 with rainbow colors for the 8 data lines. The data bus connects to the 8 data pins along the outside of the upper half of the chip. The bus runs between the ALU on the left, the instruction register (upper center), and the registers and stack on the right. The bus is split on the left with half along each side of the ALU.

Die photo of the 8008 microprocessor. The power bus is shown in red and blue. The data bus is shown with 8 rainbow colors.

Die photo of the 8008 microprocessor. The power bus is shown in red and blue. The data bus is shown with 8 rainbow colors.

The red and blue lines show power routing. Power routing is an under-appreciated aspect of microprocessors. Power is routed in the metal layer due to its low resistance. But since there is only one metal layer in early microprocessors, power distribution must be carefully planned so the paths don't cross.7 The diagram above shows Vcc lines in blue and Vdd lines in red. Power is supplied through the Vcc pin on the left and the Vdd pin on the right, then branches out into thin, interlocking wires that supply all parts of the chip.

The register file

To show what the chip looks like in detail, I've zoomed in on the 8008's register file in the photo below. The register file consists of an 8 by 7 grid of dynamic RAM (DRAM) storage cells, each using three transistors to hold one bit.8 (You can see the transistors as the small rectangles where the orange polysilicon takes on a slightly more vivid color.) Each row is one of the 8008's seven 8-bit registers (A, B, C, D, E, H, L). On the left, you can see seven pairs of horizontal wires: the read select and write select lines for each register. At the top, you can see eight vertical wires to read or write the contents of each bit, along with 5 thicker wires to supply Vcc. Using DRAM for registers (rather than the more common static latches) is an interesting choice. Since Intel was primary a memory company at the time, I expect they chose DRAM due to their expertise in the area.

The register file in the 8008. The chip has seven 8-bit registers: A, B, C, D, E, H, L

The register file in the 8008. The chip has seven 8-bit registers: A, B, C, D, E, H, L

How PMOS works

The 8008 uses PMOS transistors. To simplify slightly, you can think of a PMOS transistor as a switch between two silicon wires, controlled by a gate input (of polysilicon). The switch closes when its gate input is low and it can pull its output high. If you're familiar with the NMOS transistors used in microprocessors like the 6502, PMOS may be a bit confusing because everything is backwards.

A simple PMOS NAND gate can be constructed as shown below. When both inputs are high, the transistors are off and the resistor pulls the output low. When any input is low, the transistor will conduct, connecting the output to +5. Thus, the circuit implements a NAND gate. For compatibility with 5-volt TTL circuits, the PMOS gate (and thus the 8008) is powered with unusual voltages: -9V and +5V.

A NAND gate implemented with PMOS logic.

A NAND gate implemented with PMOS logic.

For technical reasons, the resistor is actually implemented with a transistor. The diagram below shows how the transistor is wired to act as a pull-down resistor. The detail on the right shows how this circuit appears on the chip. The -9V metal wire is at the top, the transistor is in the middle, and the output is the silicon wire at the bottom.

In PMOS, a pull-down resistor (left) is implemented with a transistor (center). The photo on the right shows an actual pull-down in the 8008 microprocessor.

In PMOS, a pull-down resistor (left) is implemented with a transistor (center). The photo on the right shows an actual pull-down in the 8008 microprocessor.

History of the 8008

The 8008's complicated story starts with the Datapoint 2200, a popular computer introduced in 1970 as a programmable terminal. (Some people consider the Datapoint 2200 to be the first personal computer.) Rather than using a microprocessor, the Datapoint 2200 contained a board-sized CPU build from individual TTL chips. (This was the standard way to build a CPU in the minicomputer era.) Datapoint and Intel decided that it would be possible to replace this board with a single MOS chip, and Intel started the 8008 project to build this chip. A bit later, Texas Instruments also agreed to build a single-chip processor for Datapoint. Both chips were designed to be compatible with the Datapoint 2200's 8-bit instruction set and architecture.

The 8008 processor was first described publicly in "Electronic Design", Oct 25, 1970. Although Intel claimed the chip would be delivered in January 1971, actual delivery was more than a year later in April, 1972.

The 8008 processor was first described publicly in "Electronic Design", Oct 25, 1970. Although Intel claimed the chip would be delivered in January 1971, actual delivery was more than a year later in April, 1972.

Around March 1971, Texas Instruments completed their processor chip, calling it the TMC 1795. After delaying the project, Intel finished the 8008 chip later, around the end of 1971. For a variety of reasons, Datapoint rejected both microprocessors and built a faster CPU based on newer TTL chips including the 74181 ALU chip. TI tried unsuccessfully to market the TMC 1795 processor to companies such as Ford, but ended up abandoning the processor, focusing on highly-profitable calculator chips instead. Intel, on the other hand, marketed the 8008 as a general-purpose microprocessor, which eventually led to the x86 architecture you're probably using right now. Although TI was first with the 8-bit processor, it was Intel who made their chip a success, creating the microprocessor industry.

A family tree of the 8008 and some related processors. Black arrows indicate backwards compatibility. Light arrows indicate significant architecture changes.

A family tree of the 8008 and some related processors. Black arrows indicate backwards compatibility. Light arrows indicate significant architecture changes.

The diagram above summarizes the "family tree" of the 8008 and some related processors.10 The Datapoint 2200's architecture was used in the TMC 1795, the Intel 8008, and the next version Datapoint 220011. Thus, four entirely different processors were built using the Datapoint 2200's instruction set and architecture. The Intel 8080 processor was a much-improved version of the 8008. It significantly extended the 8008's instruction set and reordered the machine code instructions for efficiency. The 8080 was used in groundbreaking early microcomputers such as the Altair and the Imsai. After working on the 4004 and 8080, designers Federico Faggin and Masatoshi Shima left Intel to build the Zilog Z-80 microprocessor, which improved on the 8080 and became very popular.

The jump to the 16-bit 8086 processor was much less evolutionary. Most 8080 assembly code could be converted to run on the 8086, but not trivially, as the instruction set and architecture were radically changed. Nonetheless, some characteristics of the Datapoint 2200 still exist in today's x86 processors. For instance, the Datapoint 2200 had a serial processor, processing bytes one bit at a time. Since the lowest bit needs to be processed first, the Datapoint 2200 was little-endian. For compatibility, the 8008 was little-endian, and this is still the case in Intel's processors. Another feature of the Datapoint 2200 was the parity flag, since parity calculation was important for a terminal's communication. The parity flag has continued to the x86 architecture.

The 8008 is architecturally unrelated to Intel's 4-bit 4004 processor12. The 8008 is not an 8-bit version of the 4-bit 4004 in any way. The similar names are purely a marketing invention; during its design phase the 8008 had the unexciting name "1201".

If you want more early microprocessor history, I wrote a detailed article for the IEEE Spectrum. I also wrote a post about TI's TMC 1795.

How the 8008 fits into the history of semiconductor technology

The 4004 and 8008 both used silicon-gate enhancement-mode PMOS, a semiconductor technology that was only used briefly. This puts the chips at an interesting point in chip fabrication technology.

The 8008 (and modern processors) uses MOS transistors. These transistors had a long path to acceptance, being slower and less reliable than the bipolar transistors used in most computers of the 1960s. By the late 1960s, MOS integrated circuits were becoming more common; the standard technology was PMOS transistors with metal gates. The gates of the transistor consisted of metal, which was also used to connect components of the chip. Chips essentially had two layers of functionality: the silicon itself, and the metal wiring on top. This technology was used in many Texas Instruments calculator chips, as well as the TMC 1795 chip (the chip that had the same instruction set as the 8008).

A key innovation that made the 8008 practical was the self-aligned gate—a transistor using a gate of polysilicon rather than metal. Although this technology was invented by Fairchild and Bell Labs, it was Intel that pushed the technology ahead. Polysilicon gate transistors had much better performance than metal gate (for complex semiconductor reasons). In addition, adding a polysilicon layer made routing of signals in the chip much easier, making the chips denser. The diagram below shows the benefit of self-aligned gates: the metal-gate TMC 1795 is bigger than the 4004 and 8008 chips combined.

Intel's 4004 and 8008 processors are much denser than Texas Instruments' TMC 1795 chip, largely due to their use of self-aligned gates.

Intel's 4004 and 8008 processors are much denser than Texas Instruments' TMC 1795 chip, largely due to their use of self-aligned gates. TMC 1795 die photo courtesy of Computer History Museum.

Shortly afterwards, semiconductor technology improved again with the use of NMOS transistors instead of PMOS transistors. Although PMOS transistors were easier to manufacture initially, NMOS transistors are faster, so once NMOS could be fabricated reliably, they were a clear win. NMOS led to more powerful chips such as the Intel 8080 and the Motorola 6800 (both 1974). Another technology improvement of this time was ion-implantation to change the characteristics of transistors. This allowed the creation of "depletion-mode" transistors for use as pull-up resistors. These transistors improved chip performance and reduced power consumption. They also allowed the creation of chips that ran on standard five-volt supplies.13 The combination of NMOS transistors and depletion-mode pull-ups was used for most of the microprocessors of the late 1970s and early 1980s, such as the 6502 (1975), Z-80 (1976), 68000 (1979), and Intel chips from the 8085 (1976) to the 80286 (1982).

In the mid 1980s, CMOS took over, using NMOS and PMOS transistors together to dramatically reduce power consumption, with chips such as the 80386 (1986), 68020 (1984) and ARM1 (1985). Now almost all chips are CMOS.14

As you can see, the 1970s were a time of large changes in semiconductor chip technology. The 4004 and 8008 were created when the technological capability intersected with the right market.

How to take die photos

In this section, I explain how I got the photos of the 8008 die. The first step is to open the chip package to expose the die. Most chips come in epoxy packages, which can be dissolved with dangerous acids.

The 8008 microprocessor in a ceramic package

The 8008 microprocessor in a ceramic package

Since I would rather avoid boiling nitric acid, I took a simpler approach. The 8008 is also available in a ceramic package (above), which I got on eBay. Tapping the chip along the seam with a chisel pops the two ceramic layers apart. The photo below shows the lower half of the ceramic package, with the die exposed. Most of the metal pins have been removed, but their positions in the package are visible. To the right of the die is a small square; this connects ground (Vcc) to the substrate. A couple of the tiny bond wires are still visible, connected to the die.

Inside the package of the 8008 microprocessor, the silicon die is visible.

Inside the package of the 8008 microprocessor, the silicon die is visible.

Once the die is exposed, a microscope can be used to take photographs. A standard microscope shines the light from below, which doesn't work well for die photographs. Instead, I used a metallurgical microscope, which shines the light from above to illuminate the chip.

I took 48 photographs through the microscope and then used the Hugin stitching software to combine them into one high-resolution image (details). Finally, I adjusted the image contrast to make the chip's structures more visible. The original image (which is approximately what you see through the microscope) is below for comparison.

Die photograph of the 8008 microprocessor

Die photograph of the 8008 microprocessor

Conclusion

I took detailed die photos of the 8008 that reveal the circuitry it used. While the 8008 wasn't the first microprocessor or even the first 8-bit microprocessor, it was truly revolutionary, triggering the microprocessor revolution and leading to the x86 architecture that dominates personal computers today. In future posts, I plan to explain the 8008's circuits in detail to provide a glimpse into the roots of todays computers.

I announce my latest blog posts on Twitter, so follow me at kenshirriff. Or you can use the RSS feed.

Notes and references

  1. According to the oral history of the 8008, photos of the 8008 were obtained in October / November 1971 (page 6). Chip designer Federico Faggin mentions that toward the end of 1971, "everything was working except for a few errors." Faggin then debugged a problem with the dynamic memory losing data, making it ready for production (page 9). 

  2. Using the carry look ahead circuit avoids the delay from a standard ripple-carry adder, where the carries propagate through the sum. 

  3. The 8008's ALU supports eight operations: add, subtract, add with carry, subtract with carry, AND, OR, XOR, and compare. It also implements left and right shift and rotate operations. The 8008 also has increment and decrement instructions, extending the Datapoint 2200's instruction set

  4. Because silicon has higher resistance than polysilicon, most chips use the polysilicon and metal layers for wiring, not the silicon layer. The 4004 and 8008 chips are unusual in that they prefer to use the silicon layer for wiring rather than polysilicon. I expect this was due to the recent introduction of polysilicon: before polysilicon, routing needed to be done in the silicon layer and perhaps the chip designers were sticking with the older layout techniques. 

  5. The 8008 required 20 support chips according to chip architect Federico Faggin. In contrast, the 4004 and earlier MOS computers such as the Four Phase and CADC were designed with a small number of MOS chips that worked together without extra "glue chips". In this sense, the 8008 was a step backwards architecturally, saying "here's the CPU, you figure out how to make a computer out of it." 

  6. For details on Intel's insistence on 16 pins, see Oral History of Federico Faggin, page 55-56. It was only when the 1103 memory chip required 18 pins that Intel reluctantly moved beyond 16 pins. And that was treated by Intel like "the sky had dropped from heaven," resulting in "so many long faces". 

  7. If two metal lines need to cross, one of them can be routed under the other by using the polysilicon layer. To be low resistance, this cross-under must be relatively wide, so cross-unders are avoided if possible. 

  8. The 8008 registers use the "3T1C" cell: three transistors and one capacitor (details). The circuit doesn't physically contain a separate capacitor, but uses the gate capacitance of the transistor. One unusual feature of the 8008 cell is it uses one wire for both reading and writing the bit, while the typical 3T cell has separate wires for reading and writing. The 4004 had separate wires, but the design changed slightly in the 8008. 

  9. Pull-up resistors in later chips such as the 6502 were implemented using depletion-mode NMOS transistors. These yielded more faster, more efficient logic. They were also wired differently, with the gate connected to the output rather than the power rail. 

  10. The 8008 architecture and the evolution of Intel's microprocessors are discussed in detail in Intel Microprocessors: 8008 to 8086

  11. The second version of the Datapoint 2200 had a totally new implementation of the processor, still built from TTL chips. While the first version had a serial ALU (processing one bit at a time), the second version operated in parallel using 74181 ALU chips. As a result, the second version was much faster. 

  12. The extensive 4004 Anniversary Project has reverse-engineered the 4004 processor. The 4004 schematic is here

  13. The Motorola 6800 microprocessor originally used enhancement-mode transistors. To operate off a single +5V supply, it had a voltage-doubler circuit on the chip. 

  14. Interestingly, in 2007 Intel started using metal gates again in order to scale transistors further (details). In a way, semiconductor technology has gone full circle, back to metal gates, although now unusual metals such as hafnium are used.