Inside an unusual 7400-series chip implemented with a gate array

When I look inside a chip from the popular 7400 series, I know what to expect: a fairly simple die, implemented in a straightforward, cost-effective way. However, when I looked inside a military-grade chip built by Integrated Device Technology (IDT),4 I found a very unexpected layout: over 1500 transistors in an orderly matrix. Even stranger, most of the die is wasted: fewer than 20% of these transistors are used, forming scattered circuits connected by thin metal wires.

In this blog post, I look at this chip in detail, describe its gates, and explain how it implements the "1-of-4" decoder function. I also discuss why it sometimes makes sense to build chips with a gate array design such as this, despite the inefficiency.

A photo of the tiny silicon die in its package. This chip is the IDT 54FCT139ALB dual 1-of-4 decoder. Click this image (or any other) for a larger version.

In the photo below, you can see the silicon die in more detail, with the silicon appearing pink. The main circuitry is implemented in the nine rows that form the gate array, a grid of 1584 transistors. The tiny dark rectangles are transistors of two types, NMOS and PMOS, that work together to implement CMOS logic circuits. At this scale, the metal wiring is visible as faint gray lines and smudges, but most of the transistors are unconnected. Surrounding the gate array are 22 input/output (I/O) blocks each with a square bond pad. As with the transistors, many of these I/O blocks are unused. Fourteen of these bond pads have tiny metal bond wires (the thick black lines) that connect the silicon die to the chip's external pins. Finally, the pairs of bond wires at the center left and center right provide ground and power connections for the chip.

Closeup die photo.

The photo below zooms in on three rows of circuitry in the chip. The large dark rectangles are pairs of transistors, with two lines of transistors in each row of circuitry. At the top and bottom of each row, the thick horizontal white lines are metal wiring that provides power and ground. In each row, one line of transistors holds PMOS transistors, next to the power wiring, while the other line holds NMOS transistors, next to the ground wiring. (The orientation flips in each successive row, so it isn't obvious which transistors are which unless you check the power connections at the end of the row.)

A closeup of the die.

The transistors are wired into gates by the metal layers, the white lines. The gates are connected by horizontal and vertical wiring using the wiring channels between the rows. This wiring style is very similar to standard-cell logic. However, unlike standard-cell logic, the underlying transistor grid is fixed, resulting in wasted transistors. In the image above, most of the transistors in the middle row are used, while the top row is unused and the bottom row is mostly unused.

The diagram below shows the structure of one of the transistor blocks, which contains two tall, thin MOS transistors. The vertical metal contacts connect to the sources and drains of the transistors, with the two transistors sharing the middle contact. (On an integrated circuit, the source and drain of a transistor are identical, so it is arbitrary which side is the source and which is the drain.) The short horizontal metal contacts at the top connect to the gates of the two transistors; the gates are made of polysilicon, which is barely visible in the die photo. The gates partition the active silicon (green), forming the transistors. The gate width is approximately 1 µm.

A block of two transistors as they appear on the die, along with a diagram showing the structure. The bar indicates a length of 10 µm.

NAND gate

In this section, I'll explain the construction of one of the NAND gates on the die. The NAND gate below uses four transistors, two NMOS transistors on the top and two PMOS transistors on the bottom. The white lines are the metal wiring, forming two layers. Most of the wiring (including power and ground) is in the lower (M1) layer. The slightly wider and darker vertical segments are the upper (M2) layer. The circles connect the metal layers when they join, or connect the metal layer to the underlying silicon or polysilicon. With two metal layers, it's a bit tricky to see how the wiring is connected. The A and B inputs each connect to two transistor gates. The transistor group at the top is connected to ground on the right, with the output on the left. The transistor group on the bottom is connected to Vcc on the left and right, with the output in the middle. This has the effect of putting the upper transistors in series and the lower transistors in parallel.

A NAND gate on the die.

Below, I've drawn the schematic of the NAND gate. On the left, the layout of the schematic matches the die layout above. On the right, I've redrawn the schematic with a more traditional layout. To understand its operation, note that a PMOS transistor (top on the right schematic) turns on when the input is low, while an NMOS transistor (bottom on the right) turns on when the input is high. When both inputs are high, the two NMOS transistors turn on, connecting ground to the output, pulling it low. When either input is low, one of the PMOS transistors turns on, pulling the output high. Thus, the circuit implements the NAND function. The NMOS and PMOS transistors operate in a complementary fashion, giving CMOS (Complementary MOS) its name.

Schematic of a NAND gate.
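
The behavior described above can be checked with a small switch-level sketch in Python. This is only an illustrative model, not something extracted from the chip: the function name `cmos_nand` and the True/False encoding of high/low are my own assumptions, and each transistor is treated as an ideal switch.

```python
def cmos_nand(a: bool, b: bool) -> bool:
    """Switch-level model of a two-input CMOS NAND gate (illustrative)."""
    # An NMOS transistor conducts when its gate input is high.
    # The two NMOS transistors are in series, so both must conduct
    # to connect the output to ground.
    pull_down = a and b
    # A PMOS transistor conducts when its gate input is low.
    # The two PMOS transistors are in parallel, so either one
    # can connect the output to Vcc.
    pull_up = (not a) or (not b)
    # In a valid CMOS gate exactly one network conducts at a time.
    assert pull_up != pull_down
    return pull_up  # True means the output is pulled high


for a in (False, True):
    for b in (False, True):
        print(int(a), int(b), int(cmos_nand(a, b)))
```

The internal assertion captures the "complementary" part of CMOS: for every input combination, one network conducts and the other does not, so the output is never floating and never shorted.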

NOR gate

In this section, I'll explain the layout of one of the NOR gates on the die, shown below. This gate is twice as large as the previous NAND gate so it can provide twice the output current.1 The NOR gate uses eight transistors, four PMOS transistors in the upper half and four NMOS transistors in the lower half. (Note that Vcc and ground are flipped compared to the previous gate, as are the NMOS and PMOS transistors.) The two transistors in each block are wired in parallel to produce more current for the output. (A out is the same signal as A in, exiting the block at the top to connect to other circuitry.)

A NOR gate on the die.

The schematic below shows the wiring of the eight transistors. The schematic layout corresponds to the physical layout to make it easier to map between the image and the schematic. The upper transistor groups are wired in series, while the lower transistor groups are wired in parallel.

Schematic corresponding to the gate above.

The schematic below has been redrawn to make the functionality clearer, and the parallel transistors have been removed. If either input is high, one of the NMOS transistors on the bottom will turn on and pull the output low. If both inputs are low, the two PMOS transistors will turn on and pull the output high. This provides the desired NOR function.

Simplified NOR gate schematic.

Note that the NAND and NOR gates have similar but opposite schematics. In the NAND gate, the NMOS transistors are in series while the PMOS transistors are in parallel. In the NOR gate, the roles of the transistors are swapped.
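
This series/parallel duality can be made explicit with a short Python sketch. The helpers `series`, `parallel`, `nmos`, and `pmos` are hypothetical names chosen only for illustration, and the model treats each transistor as an ideal switch.

```python
from itertools import product


def series(*switches: bool) -> bool:
    """A series chain conducts only if every switch conducts."""
    return all(switches)


def parallel(*switches: bool) -> bool:
    """A parallel group conducts if any switch conducts."""
    return any(switches)


def nmos(gate: bool) -> bool:
    return gate          # NMOS conducts when its gate is high


def pmos(gate: bool) -> bool:
    return not gate      # PMOS conducts when its gate is low


def nand(a: bool, b: bool) -> bool:
    pull_down = series(nmos(a), nmos(b))     # NMOS in series
    pull_up = parallel(pmos(a), pmos(b))     # PMOS in parallel
    assert pull_up != pull_down              # exactly one network conducts
    return pull_up


def nor(a: bool, b: bool) -> bool:
    pull_down = parallel(nmos(a), nmos(b))   # NMOS in parallel
    pull_up = series(pmos(a), pmos(b))       # PMOS in series
    assert pull_up != pull_down              # exactly one network conducts
    return pull_up


for a, b in product((False, True), repeat=2):
    print(f"a={int(a)} b={int(b)} nand={int(nand(a, b))} nor={int(nor(a, b))}")
```

The only difference between the two functions is which network uses `series` and which uses `parallel`, mirroring the swap in the schematics.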

The chip's circuit

The chip I examined is a "dual 1-of-4 decoder with enable".2 The decoding function takes a two-bit input and selects one of four output lines depending on the binary value. The enable line must be low to activate this operation; otherwise all four output lines are disabled. The chip contains two of these decoders, which is why it is called a dual decoder. In total, the chip contains 18 logic gates,3 so it is very simple, even by 1990s standards.

I reverse-engineered the chip and created the schematic below, showing one of the dual units. Each NAND gate matches one of the four input possibilities to drive one of the four outputs. The NOR gates support the ENABLE signal, blocking the outputs unless ENABLE is active (i.e. low).

Reverse-engineered schematic of half the chip.
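
As a cross-check of the decoder's function, here is a behavioral sketch in Python. It models one half of the chip at the level of the functional description, not the gate-level implementation I traced. The function and argument names are my own, and the active-low outputs (selected output low, all others high) follow the usual '139 convention.

```python
def decode_1_of_4(enable_n: int, a1: int, a0: int) -> list[int]:
    """Behavioral model of one half of a '139 decoder (illustrative).

    enable_n is the active-low enable input; a1 and a0 are the select bits.
    Returns the four active-low outputs [O0, O1, O2, O3].
    """
    selected = (a1 << 1) | a0
    outputs = []
    for i in range(4):
        # Output i goes low only when the chip is enabled and i is selected;
        # this is the NAND of "enabled" with the address match.
        active = (enable_n == 0) and (selected == i)
        outputs.append(0 if active else 1)
    return outputs


print(decode_1_of_4(0, 1, 0))  # enabled, input value 2 selected
print(decode_1_of_4(1, 1, 0))  # disabled: no output is active
```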

The chip uses a general-purpose I/O block (below) for each pin, which can be used as an input or an output depending on how it is wired. Each block contains two large drive transistors: an NMOS transistor to pull the output low and a PMOS transistor to pull the output high. The I/O block has separate control lines for the two output transistors. (At the bottom of the image below, two thin metal wires drive the high-side and low-side transistors.) This permits tri-state logic: if neither transistor is energized, the output is left floating. The gate array drives the output transistors with high-current inverters, each constructed from multiple transistors in parallel. (This is why the schematic shows more inverters than may seem necessary.)

One of the 22 I/O blocks on the die. Each I/O block is associated with a bond pad, where a bond wire can be connected to an external pin.
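
The effect of the two separate control lines can be summarized with a small Python sketch. This is a simplified model under my own assumptions: the function name is hypothetical, and the arguments say whether each drive transistor is conducting rather than giving the actual voltage on its gate.

```python
def io_pad_output(high_side_on: bool, low_side_on: bool) -> str:
    """State of an output pad given its two drive-transistor controls."""
    if high_side_on and low_side_on:
        # Both transistors on would short Vcc to ground; the driving
        # logic is expected never to produce this combination.
        raise ValueError("both drive transistors on")
    if high_side_on:
        return "high"      # PMOS transistor pulls the pad to Vcc
    if low_side_on:
        return "low"       # NMOS transistor pulls the pad to ground
    return "floating"      # neither transistor on: tri-state


for controls in ((True, False), (False, True), (False, False)):
    print(controls, io_pad_output(*controls))
```

With a single shared control line, only the "high" and "low" cases would be reachable; the third, floating state is what the separate lines make possible.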

When used as an input, the pad is wired to the surrounding circuitry slightly differently, connecting to input protection diodes (not shown on the schematic). Thus, the functionality of the I/O blocks can be changed by modifying the metal layers, without changing the underlying silicon.

Some 7400-series history

The earliest logic integrated circuits used resistors and transistors internally, so they were called RTL (Resistor Transistor Logic), but RTL had significant performance problems. RTL was rapidly replaced by Diode Transistor Logic (DTL) and then Transistor Transistor Logic (TTL). In 1964, Texas Instruments created a line of TTL integrated circuits for military applications called the SN5400 series. This was shortly followed by the commercial-grade SN7400 series.

The 7400 series of integrated circuits was inexpensive, fast, and easy to use. The line started with simple logic circuits such as four NAND gates on a chip, and moved into more complex chips such as counters, shift registers, and ALUs. The 7400 series became very popular in the 1970s and 1980s, used by electronics hobbyists and high-performance minicomputers alike. These chips became essential building blocks and "glue" logic for microcomputers, heavily used in the Apple II for instance.

The original 7400 series branched into dozens of families with different performance characteristics but the same functionality. The 74LS (low-power Schottky) family, for instance, became very popular as it both improved speed and reduced power consumption. In the mid-1970s, 7400-series chips were introduced that used CMOS circuitry instead of TTL for dramatically lower power consumption. This CMOS family, the 74C series, was followed by numerous other CMOS families.

That brings us to the chip I examined, a member of IDT's 74FCT (Fast CMOS TTL-compatible) line of chips, introduced in the mid-1980s. (Specifically, it is in the 54FCT family because it handles a wider temperature range.) These chips used advanced CMOS technology to provide high speed, low power consumption, and as a military option, radiation tolerance.

Conclusions

Why would you make a chip in this inefficient way, using a gate array that wastes most of the die area? The motivation is that most of the design cost can be shared across many different part types. Each step of integrated circuit processing requires an expensive mask for photolithography. With a gate array, all chip types use the same underlying silicon and transistors, with custom masks just for the two metal layers. In comparison, a fully custom chip might require eight custom masks, which costs much more. The tradeoff is that gate array chips are larger so the manufacturing cost is higher per device.5 Thus, a gate array design is better when selling chips in relatively small quantities, while a custom design is cheaper when mass-producing chips.6 IDT focused on the high-performance and military market rather than the commodity chip market, so gate arrays were a good fit.
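
The tradeoff can be written as a simple break-even calculation. The sketch below is a toy cost model in Python: it assumes total cost is a one-time mask cost plus a constant per-chip cost. The mask counts in the example (two versus eight) come from the text above, but the per-chip costs are made up purely to exercise the formula and are not taken from IDT or any real process.

```python
def break_even_volume(mask_cost_gate_array: float, unit_cost_gate_array: float,
                      mask_cost_custom: float, unit_cost_custom: float) -> float:
    """Volume at which a custom design becomes cheaper than a gate array.

    Toy model: total cost = one-time mask cost + volume * per-chip cost.
    Assumes the gate array has cheaper masks but a costlier (larger) die.
    """
    extra_mask_cost = mask_cost_custom - mask_cost_gate_array
    extra_unit_cost = unit_cost_gate_array - unit_cost_custom
    if extra_mask_cost <= 0 or extra_unit_cost <= 0:
        raise ValueError("model assumes cheaper masks and costlier dies for the gate array")
    return extra_mask_cost / extra_unit_cost


# Mask costs are in units of "one custom mask"; the per-chip costs are
# hypothetical values in the same arbitrary unit, for illustration only.
volume = break_even_volume(mask_cost_gate_array=2.0, unit_cost_gate_array=0.003,
                           mask_cost_custom=8.0, unit_cost_custom=0.001)
print(f"custom design wins above about {volume:.0f} chips")
```

Below the break-even volume the gate array's lower up-front cost dominates; above it, the smaller custom die wins.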

One last thing. The packaging of this chip is very interesting since it is mounted on a multi-chip module. The module also contains two Atmel EEPROMs. Presumably the decoder chip decodes address bits to select one of the EEPROMs.

The multi-chip module containing the decoder chip along with an AT28HC64 EEPROM on either side.

Thanks to Don S. for providing the chip. Follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff.

Notes and references

  1. Properly sizing the transistors in a gate is important for performance. Since the transistors in the gate array are all the same size, multiple transistors are used in parallel to get the desired current. The 1999 book Logical Effort describes a methodology for maximizing the performance of CMOS circuits by correctly sizing the transistors. 

  2. The part number is "IDT 54FCT139ALB". "54" indicates the chip operates under an enhanced temperature range of -55°C to +125°C. The "A" indicates the chip is 35% faster than the base series (but not as fast as "C"). "L" indicates the chip is packaged in a leadless chip carrier, the square package shown at the top of the article. Finally, "B" indicates the chip was tested according to military standards: MIL-STD-883, Class B. 

  3. The chip contains 18 logic gates according to the functional schematic in the datasheet (below). The implementation actually uses 52 logic gates by my count (2×26) because the implementation doesn't exactly match the schematic. In particular, the datasheet shows three-input NAND gates, but the chip uses a NAND gate and a NOR gate along with inverters. The chip also has additional inverters to drive the output transistors in each I/O block.

    Schematic of the chip from the datasheet.

  4. Integrated Device Technology was a spinoff from Hewlett Packard that started in 1980. IDT built advanced CMOS chips including fast static RAM and microprocessors (bit-slice and MIPS). It became part of Renesas in 2019. A very detailed 1986 profile of IDT is here. IDT's logo is pretty cool, combining a chip wafer and calculus.

    The logo of Integrated Device Technology.

    Here's how the logo looks on the die:

    Closeup of the die showing the IDT logo.

    The die also has the initials of the designers, along with some mysterious symbols. One looks like the Chinese character "正". (Update: based on a Twitter comment, these symbols are probably tally marks, indicating the revision count for each mask.)

    Closeups of two parts of the die.

  5. Integrated circuit manufacturing is partitioned into the "front end of line", where the transistors are created on the silicon wafer, and the "back end of line", where the metal wiring is put on top to connect the transistors. With a gate array construction, the front end of line steps create generic gate array wafers. The back end of line steps then connect the transistors as desired for a particular component. The gate array wafers can be produced in large quantities and stored, and then customized for specific products in smaller quantities as needed. This reduces the time to supply a particular chip type since only the back end of line process needs to take place. 

  6. The IDT High-Speed CMOS Logic Design Guide briefly mentions the gate array design. The FCT family was built from two sizes of gate arrays, "4004" for smaller chips and "8000" for larger chips. Later, IDT shrunk the original "Z-step" gate arrays to smaller, higher-performance "Y-step" arrays. They then customized some of the devices to create the "W-step" devices. Looking at the markings on the die, we see that this chip uses the "4004Y" gate array.

    The die shows gate slice 4004Y and part 4139Y (indicating 54139 or 74139). The numbers are slightly obscured by a bond wire.

The Intel 8088 processor's instruction prefetch circuitry: a look inside

In 1979, Intel introduced the 8088 microprocessor, a variant of the 16-bit 8086 processor. IBM's decision to use the 8088 processor in the IBM PC (1981) was a critical point in computer history, leading to the dominance of the x86 architecture that continues to the present.1 One way that the 8086 and 8088 increased performance was by prefetching: the processor fetches instructions from memory before they are needed, so the processor can execute them without waiting on the relatively slow memory. I've been reverse-engineering the 8088 from die photos and this blog post discusses what I've uncovered about the prefetch circuitry.

The die photo below shows the 8088 microprocessor under a microscope. The metal layer on top of the chip is visible, with the silicon and polysilicon mostly hidden underneath. Around the edges of the die, bond wires connect pads to the chip's 40 external pins. I've labeled the key functional blocks; this article focuses on the prefetch queue components highlighted in red. The components in purple also play a role, and will be discussed below. Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below. The BIU handles memory accesses, while the Execution Unit (EU) executes instructions. In particular, the BIU fetches instructions, which are transferred from the prefetch queue to the Execution Unit via the queue bus.

The 8088 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath. Click on this image (or any other) for a larger version.

The 8086 and 8088 processors present the same 16-bit architecture to the programmer. The key difference is that the 8088 has an 8-bit data bus for communication with memory and I/O, rather than the 16-bit bus of the 8086. The 8088's narrower bus reduced performance, since the processor only transfers one byte at a time rather than two. However, the 8-bit bus enabled cheaper computer hardware. The 8-bit bus was also a better match for hardware based on the older but popular 8-bit Intel 8080 and 8085 processors, allowing the reuse of 8-bit I/O circuitry for instance. Much of the IBM PC was based on the little-known IBM DataMaster, a computer built around the Intel 8085. Thus, selecting the 8088 processor was a natural choice for the IBM PC.

For the most part, the 8086 and 8088 are very similar internally, apart from trivial but numerous layout changes on the die. The biggest differences are in the Bus Interface Unit, the circuitry that communicates with memory and I/O devices, since this circuitry handles 16 bits in the 8086 versus 8 bits in the 8088. There are a few microcode differences between the two chips. One interesting change is that for performance reasons the 8088 has a smaller prefetch queue than the 8086 (four bytes instead of six). (I wrote about the 8086's prefetch circuitry earlier.)

Prefetching and the architecture of the 8086 and 8088

The 8086 and 8088 were introduced at an interesting point in microprocessor history, when memory was becoming slower than the CPU. For the first microprocessors, the speed of the CPU and the speed of memory were comparable.2 However, as processors became faster, the speed of memory failed to keep up. The 8086 was probably the first microprocessor to prefetch instructions to improve performance. While modern microprocessors have megabytes of fast cache3 to act as a buffer between the CPU and much slower main memory, the 8088 has just 4 bytes of prefetch queue. However, this was enough to substantially increase performance.

Prefetching had a major impact on the design of the 8086 and thus the 8088. Earlier processors such as the 6502, 8080, or Z80 were deterministic: the processor fetched an instruction, executed the instruction, and so forth. Memory accesses corresponded directly to instruction fetching and execution and instructions took a predictable number of clock cycles. This all changed with the introduction of the prefetch queue. Memory operations became unlinked from instruction execution since prefetches happen as needed and when the memory bus is available.

To handle memory operations and instruction execution independently, the implementors of the 8086 and 8088 divided the processors into two processing units: the Bus Interface Unit (BIU) that handles memory accesses, and the Execution Unit (EU) that executes instructions. The Bus Interface Unit contains the instruction prefetch queue; it supplies instructions to the Execution Unit via the Q (queue) bus. The BIU also contains an adder (Σ) for address calculation, adding the segment register base to an address offset, among other things. The Execution Unit is what comes to mind when you think of a processor: it has most of the registers, the arithmetic/logic unit (ALU), and the microcode that implements instructions. The segment registers (CS, DS, SS, ES) and the Instruction Pointer (IP) are in the Bus Interface Unit since they are directly involved in memory accesses, while the general-purpose registers are in the Execution Unit.

Block diagram of the 8088 processor. This diagram differs from most 8088 block diagrams because it shows the actual physical implementation, rather than the programmer's view of the processor. The "Internal Communication Registers" consist of the Indirect Register (IND) and the Operand Register (OPR). These hold a memory address and memory data value respectively. From The 8086 Family User's Manual page 243.

It may seem inefficient for the Bus Interface Unit to have its own adder instead of using the ALU, but there are reasons for the separate adder. First, every memory access uses the adder at least once to add the segment base and offset. The adder is also used to increment the PC or index registers. Since these operations are so frequent, they would create a bottleneck if they used the ALU. Second, since the Execution Unit and the Bus Interface Unit run asynchronously with respect to each other, it would be complicated to share the ALU without conflicts.
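
For reference, the segment-plus-offset calculation that the address adder performs on each memory access can be sketched in Python as follows. The function name is hypothetical. The arithmetic (segment shifted left four bits, added to a 16-bit offset, yielding a 20-bit address) is the standard 8086/8088 addressing scheme; the sketch says nothing about how the adder hardware is actually organized.

```python
def physical_address(segment: int, offset: int) -> int:
    """20-bit physical address from a 16-bit segment and 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # The segment base is the segment register times 16 (shifted 4 bits).
    # The sum wraps around at 1 MB because there are only 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(hex(physical_address(0xFFFF, 0x0010)))  # wraps around to 0x0
```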

Prefetching had another major but little-known effect on the 8086 architecture: the designers were considering making the 8086 a two-chip microprocessor. Prefetching, however, required a one-chip design because the number of control signals required to synchronize prefetching across two chips exceeded the package pins available. This became a compelling argument for the one-chip design that was used for the 8086.4 (The unsuccessful Intel iAPX 432, which was under development at the same time, ended up being a two-chip processor: one to fetch and decode instructions, and one to execute them.)

Implementing the queue

The 8088's instruction prefetch queue is implemented with four 8-bit queue registers along with two hardware "pointers" into the queue. One two-bit counter keeps track of the current read position from 0 to 3, i.e. the queue register that will provide the next instruction byte. The second counter keeps track of the current write position, i.e. the queue register that will receive the next instruction byte from memory.5 As bytes are fetched from the queue, the read pointer advances. As bytes are added to the queue, the write pointer advances.

The diagram below shows an example queue configuration with two prefetched bytes. The middle two queue registers (Q1 and Q2) hold data. The read pointer indicates that the Execution Unit will get its next byte from Q1. The write pointer indicates that the next prefetched byte will go into Q3.

A queue configuration with two bytes in the prefetch queue. Bytes in blue hold prefetched data.

The diagram below shows how the queue pointers can wrap around. In this configuration, two more bytes have been written to the queue (Q3 and Q0), so the queue is full. The write pointer now points to Q1, the same as the read pointer.

A queue configuration with four bytes in the prefetch queue.

There is an important ambiguity, however. Suppose that four bytes are read from the queue, so the read pointer advances four positions, wrapping around back to Q1. The queue is now empty, as shown below, but the pointers have the same position as the full case above. Thus, if the read pointer and the write pointer both point to the same position, the queue may be empty or full. To distinguish these cases, a flip-flop is set if the queue enters the empty state. This flip-flop generates a signal that Intel called MT (empty).

A queue configuration with the queue empty.
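
The pointer behavior and the full/empty ambiguity can be reproduced with a small software model in Python. This is a sketch under my own assumptions, not a description of the actual circuit: the class and method names are hypothetical, and the rule used to update the MT flag (set when a read makes the pointers equal, cleared on any write) is a plausible simplification.

```python
class PrefetchQueue:
    """Four-entry circular queue with 2-bit pointers and an MT (empty) flag."""

    SIZE = 4

    def __init__(self) -> None:
        self.regs = [0] * self.SIZE   # queue registers Q0..Q3
        self.read_ptr = 0             # next register to read
        self.write_ptr = 0            # next register to write
        self.mt = True                # set when the queue is empty

    def is_full(self) -> bool:
        # Equal pointers mean full unless the MT flag says empty.
        return self.read_ptr == self.write_ptr and not self.mt

    def write(self, byte: int) -> None:
        if self.is_full():
            raise OverflowError("queue full")
        self.regs[self.write_ptr] = byte & 0xFF
        self.write_ptr = (self.write_ptr + 1) % self.SIZE  # 2-bit wraparound
        self.mt = False

    def read(self) -> int:
        if self.mt:
            raise IndexError("queue empty")
        byte = self.regs[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.SIZE
        # If the read pointer catches up to the write pointer,
        # the queue has just become empty.
        self.mt = self.read_ptr == self.write_ptr
        return byte


q = PrefetchQueue()
for b in (0x11, 0x22, 0x33, 0x44):
    q.write(b)
print(q.read_ptr == q.write_ptr, q.is_full(), q.mt)   # pointers equal, full
for _ in range(4):
    q.read()
print(q.read_ptr == q.write_ptr, q.is_full(), q.mt)   # pointers equal, empty
```

In both printed cases the pointers are equal; only the MT flag tells the two situations apart.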

To determine how many bytes are in the queue, the queue circuitry uses a two-bit queue length value, along with the MT flip-flop value to distinguish the empty state. Conceptually, the queue length is generated by subtracting the read position from the write position. However, the implementation does not use a standard subtraction circuit, but instead uses hardcoded logic to determine the two bits of the length, as shown below.

The circuitry to determine the queue length.

The low bit of the length is the XOR of the low bits of the two positions. In NMOS logic (used by the 8088), an AND-NOR gate is easy to implement, while an XOR gate is difficult. Thus, XOR is implemented as shown in the top circuit. (You can verify that if one input is 1 and the other is 0, the output is 1.) The high-order bit of the length is also based on an AND-NOR gate, one with six inputs. Each input is a combination of read and write positions that yields an output bit 1; each input is computed by a NOR gate, which I haven't drawn.6 As a result, the amount of logic circuitry to compute the length is fairly large.
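
The length computation can be checked with a short Python sketch. The function names are hypothetical and the exact gate inputs on the die are not modeled. The sketch only shows that an XOR built from an AND-NOR gate gives the low bit of the conceptual subtraction (write position minus read position, modulo 4), and how the MT flag resolves the case where the two-bit length is zero.

```python
def and_nor(*and_terms: bool) -> bool:
    """AND-NOR gate: the NOR of several AND terms (each term precomputed)."""
    return not any(and_terms)


def xor_via_and_nor(a: bool, b: bool) -> bool:
    # The output is 0 if both inputs are 1 or both are 0, and 1 otherwise.
    return and_nor(a and b, (not a) and (not b))


def queue_length(read_pos: int, write_pos: int, mt: bool) -> int:
    """Bytes in a four-byte queue from 2-bit positions and the MT flag."""
    low = xor_via_and_nor(bool(read_pos & 1), bool(write_pos & 1))
    two_bit_length = (write_pos - read_pos) % 4   # conceptual subtraction
    assert (two_bit_length & 1) == int(low)       # low bit matches the XOR
    if two_bit_length == 0:
        return 0 if mt else 4                     # MT resolves empty vs. full
    return two_bit_length


print(queue_length(read_pos=1, write_pos=3, mt=False))  # two bytes queued
print(queue_length(read_pos=1, write_pos=1, mt=False))  # full
print(queue_length(read_pos=1, write_pos=1, mt=True))   # empty
```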

The diagram below zooms in on the queue control circuitry on the die, with the main flip-flops and circuitry labeled. The circuitry in the middle computes the queue length with the 6-input NOR gate stretched across the whole region. The flip-flops for the read and write positions are in the lower region. Despite the relative simplicity of the queue circuits, they take up a substantial part of the die. Compared to modern chips, the density of the 8088 is very low; you can almost see the flip-flops with the naked eye. But this isn't all the circuitry as prefetching also required queue registers and memory cycle control circuitry. Thus, prefetching was a moderately expensive feature for the 8088 in terms of die area.

The queue and prefetch circuitry on the die. The metal layer has been removed for the closeup to show the silicon of the underlying transistors.

The loader

To decode and execute an instruction, the Execution Unit must get instruction bytes from the Bus Interface Unit, but this is not entirely straightforward. The main problem is that the queue can be empty, in which case instruction decoding must block until a byte is available from the queue. The second problem is that instruction decoding is relatively slow so it is pipelined. For maximum performance, the decoder needs a new byte before the current instruction is finished. A circuit called the "loader" solves these problems by providing synchronization between the prefetch queue and the instruction decoder. The loader uses a small state machine to efficiently fetch bytes from the queue at the right time and to provide timing signals to the decoder and microcode engine.

In more detail, as the loader requests the first two instruction bytes from the prefetch queue, it generates two timing signals that control the microcode execution. The FC (First Clock) indicates that the first instruction byte is available, while the SC (Second Clock) indicates the second instruction byte. Note that the First Clock and Second Clock are not necessarily consecutive clock cycles because the prefetch queue could be empty or contain just one byte, in which case the First Clock and/or Second Clock would be delayed. The instruction decoding circuitry and the microcode engine are controlled by the First Clock and Second Clock signals, so they remain synchronized with the bytes supplied by the prefetch queue.

At the end of a microcode sequence, the Run Next Instruction (RNI) micro-operation causes the loader to fetch the next machine instruction. However, fetching and decoding the next instruction is a bit slow so microcode execution would be blocked for a cycle. In many cases, this slowdown can be avoided: if the microcode knows that it is one micro-instruction away from finishing, it issues a Next-to-last (NXT) micro-operation so the loader can start loading the next instruction. This achieves a degree of pipelining in most cases; fetching the next instruction is overlapped with finishing the execution of the previous instruction.

The state machine for the 8086/8088 "loader" circuit. The 1BL signal indicates a 1-byte instruction implemented in logic rather than microcode. From patent US4449184.

The diagram above shows the state machine for the loader. I won't explain it in detail, but essentially it keeps track of whether it is waiting for a First Clock byte or a Second Clock byte, and if it is performing a fetch in advance (NXT) or at the end of an instruction (RNI). The state machine is implemented with two flip-flops to support its four states.

Microcode and the prefetch queue

The loader takes care of fetching an instruction that consists of an opcode byte and a Mod R/M (addressing mode) byte. However, many instructions have additional bytes or don't follow this format. For example, an opcode such as "ADD AX" can be followed by an 8- or 16-bit immediate value, adding that value to the AX register. Or a "move memory to AX" instruction can be followed by a 16-bit memory address. The microcode uses a separate mechanism for fetching these instruction bytes from the queue. Specifically, each micro-instruction contains a source register and a destination register that specify a data move. By specifying "Q" (the queue) as the source, a byte is fetched from the prefetch queue. If the queue is empty, microcode execution blocks until the Bus Interface Unit loads a byte into the prefetch queue. Thus, the complexity of instruction fetching and the prefetch queue is invisible to the microcode.7

A jump, subroutine call, or other control flow change causes the prefetch queue to be flushed since the queue contents are no longer useful. This is accomplished in microcode with the FLUSH micro-instruction, which resets the queue read and write pointers and sets the MT (empty) flip-flop. Note that the queue is flushed even if the target address is in the queue, for example if you jump one byte ahead.

One complication due to the prefetch queue is that the processor's Instruction Pointer points to the next instruction to be fetched, not the next instruction to be executed. This becomes a problem for a subroutine call, which needs to push the return address. It is also a problem for a relative jump, which is computed from the current instruction. The solution is the CORR micro-instruction, which corrects the Instruction Pointer by subtracting the queue length to determine the current execution position. This is implemented by the Bus Interface Unit, which holds correction constants in the Constant ROM, and subtracts them using the address adder (not the ALU).8
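To make the queue operations concrete, here is a minimal Python sketch of a four-byte queue with read and write pointers, an empty (MT) flag, a flush, and an Instruction Pointer correction. It is a toy model, not the actual circuitry: the class and method names (`PrefetchQueue`, `ToyCpu`, `corr`, and so on) are invented for illustration, segments are ignored, and an empty read returns `None` where the real microcode would block.

```python
class PrefetchQueue:
    """Toy model of the 8088's four-byte prefetch queue (illustrative names)."""

    def __init__(self, size=4):
        self.size = size
        self.regs = [0] * size
        self.read_ptr = 0
        self.write_ptr = 0
        self.empty = True  # stands in for the MT (empty) flip-flop

    def length(self):
        if self.empty:
            return 0
        diff = (self.write_ptr - self.read_ptr) % self.size
        return diff or self.size  # equal pointers while not empty means full

    def write(self, byte):
        assert self.length() < self.size, "queue is full"
        self.regs[self.write_ptr] = byte
        self.write_ptr = (self.write_ptr + 1) % self.size
        self.empty = False

    def read(self):
        if self.empty:
            return None  # the real microcode would block here instead
        byte = self.regs[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % self.size
        self.empty = self.read_ptr == self.write_ptr
        return byte

    def flush(self):
        # Reset both pointers and mark the queue empty
        self.read_ptr = self.write_ptr = 0
        self.empty = True


class ToyCpu:
    def __init__(self, memory):
        self.memory = memory
        self.ip = 0  # address of the next byte to prefetch, not to execute
        self.queue = PrefetchQueue()

    def prefetch(self):
        # Load one byte into the queue if there is room
        if self.queue.length() < self.queue.size:
            self.queue.write(self.memory[self.ip])
            self.ip += 1

    def corr(self):
        # Subtract the queue length so IP points at the next byte to execute
        self.ip -= self.queue.length()


cpu = ToyCpu(bytes(range(0x10, 0x20)))
for _ in range(3):
    cpu.prefetch()          # IP is now 3
opcode = cpu.queue.read()   # a "Q" source read consumes the first byte
cpu.corr()                  # two bytes remain queued, so IP becomes 1
```

After three prefetches and one read, two bytes are still in the queue, so the correction moves the Instruction Pointer from 3 back to 1, the address of the next unexecuted byte.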

The queue registers

The 8086 and 8088 partition the registers into upper registers (in the Bus Interface Unit) and lower registers (in the Execution Unit). The upper registers are the registers associated with memory accesses (e.g. Instruction Pointer, segment registers) while the lower registers are more general purpose (e.g. AX, BX, SI, SP). The upper registers are connected to two 16-bit internal buses: the B bus and the C bus.

The queue registers are physically part of the upper registers, but are wired into the buses slightly differently, as shown below. In particular, the 8088's queue registers are written 8 bits at a time from the C bus. (In contrast, the 8086's queue registers can be written 16 bits at a time to support two-byte prefetches.) When accessing the queue, the queue registers are read 16 bits at a time, but only one byte is transferred to the Q bus for instruction processing.9

The queue registers in the 8088.

The diagram below shows how the queue registers appear on the die, comparing the six-byte prefetch queue in the 8086 (top) to the four-byte 8088 queue (bottom). The 8086 prefetch registers are structured as three rows of 16-bit registers, while the 8088 prefetch registers are structured as four rows of 8-bit registers. In both cases, each bit is stored in a cross-coupled pair of inverters. The bit lines (not visible) are vertical, while the control lines to select a register are horizontal. The layout is different between the processors to support 16-bit versus 8-bit writes. Note the empty space at the bottom of the 8088 registers. Because the rest of the chips are mostly the same, the 8088 couldn't be "compacted" to avoid this wasted space.

The prefetch registers in the 8086 (top) and 8088 (bottom). For the 8086, the metal and polysilicon layers were removed, exposing the underlying silicon. For the 8088, the polysilicon and silicon are visible.

Intel used simulations to determine the best queue sizes for the 8086 and 8088, balancing the performance cost of prefetching against the benefit. (The cost is that prefetching makes the bus unavailable for other memory or I/O operations.) The prefetch queue is discarded on a jump instruction or other change of control flow, causing the prefetched bytes to be wasted. Thus, as the queue gets longer, the chance of discarding a prefetched byte becomes larger, so the potential benefit of prefetching becomes smaller. Since the 8088 prefetches one byte at a time, compared to two bytes at a time on the 8086, prefetching on the 8088 costs twice as much as on the 8086 in terms of bus cycles used per byte. This changes the tradeoffs in favor of a shorter queue.
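The bus-cost side of this tradeoff can be put into numbers with a short sketch. This is only back-of-the-envelope arithmetic under the assumptions stated in the text (one byte per prefetch on the 8088, two on the 8086); the function names are made up for illustration, and it is not the simulation Intel ran.

```python
def bus_cycles_per_byte(bytes_per_fetch):
    """Bus cycles spent on each prefetched byte."""
    return 1 / bytes_per_fetch


def bus_cycles_to_fill(queue_bytes, bytes_per_fetch):
    """Bus cycles needed to fill an empty queue (ceiling division).

    This is also the worst-case number of bus cycles wasted when a
    full queue is flushed by a jump.
    """
    return -(-queue_bytes // bytes_per_fetch)


configs = [
    ("8086, 6-byte queue", 6, 2),
    ("8088, 4-byte queue", 4, 1),
    ("hypothetical 8088 with a 6-byte queue", 6, 1),
]
for name, queue_bytes, width in configs:
    print(name,
          "| bus cycles per byte:", bus_cycles_per_byte(width),
          "| bus cycles to fill:", bus_cycles_to_fill(queue_bytes, width))
```

With these assumptions, a full 8086 queue represents three bus cycles of prefetching, while a full 8088 queue represents four, and a hypothetical six-byte 8088 queue would represent six.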

Because of the difference in queue lengths, the queue control circuitry is different between the 8086 and 8088. In particular, the 8086 needs three-bit counters for the read and write positions, while the 8088 uses two-bit counters. Because of this, the length computation circuitry is also different between the processors.
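A small sketch shows why the counter widths differ and hints at why the length computation can't be shared. It assumes the positions simply count 0 to 5 on the 8086 and 0 to 3 on the 8088 and ignores the empty/full ambiguity; the helper names are hypothetical and this is not a description of the real circuits.

```python
def counter_bits(positions):
    """Bits needed to count positions 0 .. positions-1."""
    return (positions - 1).bit_length()


def length_by_masking(write_pos, read_pos, bits):
    """Pointer difference when the number of positions is a power of two."""
    return (write_pos - read_pos) & ((1 << bits) - 1)


def length_by_modulo(write_pos, read_pos, positions):
    """Pointer difference for an arbitrary number of positions."""
    return (write_pos - read_pos) % positions


print(counter_bits(6), counter_bits(4))   # six positions vs. four positions
print(length_by_masking(1, 3, 2))         # four positions: wraparound is free
print(length_by_modulo(1, 5, 6))          # six positions: needs a mod-6 wrap
print(length_by_masking(1, 5, 3))         # plain 3-bit masking gets this wrong
```

With four positions, a two-bit subtraction wraps around naturally. With six positions held in three bits, a plain three-bit subtraction gives the wrong answer on wraparound, so some extra logic is needed.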

I plan to continue reverse-engineering the 8088 die so follow me on Twitter @kenshirriff or RSS for updates. I've also started experimenting with Mastodon recently as @oldbytes.space@kenshirriff. If you're interested in the 8086, I wrote about the 8086 die, its die shrink process and the 8086 registers earlier.

Notes and references

  1. Whenever I mention x86's domination of the computing market, people bring up ARM, but ARM has a lot more market share in people's minds than in actual numbers. One research firm says that ARM has 15% of the laptop market share in 2023, expected to increase to 25% by 2027. (Surprisingly, Apple only has 90% of the ARM laptop market.) In the server market, just an estimated 8% of CPU shipments in 2023 were ARM. See Arm-based PCs to Nearly Double Market Share by 2027 and Digitimes. (Of course, mobile phones are almost entirely ARM.) 

  2. Steve Furber, co-creator of the ARM chip, mentions that "The first integrated CPUs were coincidentally quite well matched to semiconductor memory speeds, and were therefore built without caches. This can now be seen as a temporary aberration." See VLSI Risc Architecture and Organization p77. To make this concrete, the Apple II (1977) used a MOS 6502 processor running at about 1 megahertz while its 4116 DRAM chips could perform an access in 250 nanoseconds (4 times the clock speed). The 8088 processor ran at 5-10 MHz which meant that 250 ns DRAM chips were slower than the clock speed. Nowadays, processors run at 4 GHz but DRAM access speed is about 50 nanoseconds (1/200 the clock speed). 

  3. Modern processors use caches to improve memory performance. Accessing data from a cache is faster than accessing it from main memory, but the tradeoff is that caches are much smaller than main memory. The prefetch queues in the 8086 and 8088 are similar to a cache in some ways, but there are some key differences. First, the prefetch queue is strictly sequential. If you jump ahead two bytes, even if the prefetch queue has those instruction bytes, the processor can't use them. Second, the prefetch queue can't reuse bytes. If you have a 6-byte loop, even though all the code fits in the prefetch queue, it will be reloaded every time. Third, the prefetch queue doesn't provide any consistency. If you modify an instruction in memory a couple of bytes ahead of the PC, the 8086 or 8088 will run the old instruction if it's in the queue. 

  4. The design decisions for the 8086 prefetch queue (and many other aspects of the chip) are described in: J. McKevitt and J. Bayliss, "New options from big chips," in IEEE Spectrum, vol. 16, no. 3, pp. 28-34, March 1979, doi: 10.1109/MSPEC.1979.6367944. Prefetch provided a 50% performance benefit to the 8086. 

  5. The queue read process doesn't use an explicit read operation. Instead, the selected queue register continuously puts its value onto the queue bus. When the Execution Unit uses this byte, it sends an increment signal to the queue to advance the read pointer. If the queue empty (MT) flip-flop is set, the Execution Unit will wait until a byte is ready. 

  6. The NOR gates are used as AND gates, following DeMorgan's laws. For example, to produce a 1 output for write position 00 and read position 01, the logic is: NOR(write bit 1, write bit 0, read bit 1, read bit 0'). Note that the bits into the NOR gate are all inverted from the "desired" values; if they are all 0, the NOR output is 1. Thus, there are also some inverters on the inputs. 

  7. Arbitrary memory reads and writes are performed directly on memory, bypassing the prefetch queue. The 8086/8088 do not provide consistency; if you modify an instruction byte in memory and the byte is in the queue, the processor will execute the old byte. (This type of self-modifying code can be used to determine the queue length, distinguishing the 8086 from the 8088 in software.) 

  8. The Constant ROM is used for more than just address correction. For example, it is also used to increment the Instruction Pointer after a prefetch. Other constants are used for the 8088's string operations, which act on a block of memory. The index registers are incremented or decremented by 1 for bytes or 2 for words. When popping a value from the stack, the stack pointer is incremented using the Constant ROM. 

  9. Are the 8088's queue registers 16 bits wide or 8 bits wide? It's ambiguous, since the registers are written 8 bits at a time, but read 16 bits at a time. This implementation was probably selected to support the 8088's 8-bit bus while reusing as much of the 8086 design as possible. In particular, the 8088 can only prefetch one byte at a time, so writes need to happen a byte at a time. Thus, there are four control lines selecting which queue byte is written. (The 8088 could write to half of a 16-bit register but that would require moving the prefetched byte to the correct half of a 16-bit bus.) On the read side, it would make sense to have four read lines, selecting one byte from the 8088's queue. However, since the 8086 already had a multiplexer to select one byte from two, the 8088 designers probably felt it was easier to keep that circuit. And with the smaller queue on the 8088, there was no need to try to save space by removing the circuit. Thus, the queue has two read-select lines and a multiplexer control line. All these lines are controlled by the write position and read position flip-flops. 
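The NOR-as-AND trick from note 6 can be checked exhaustively. The sketch below models a single gate that should output 1 only for write position 00 and read position 01: every input that should be 1 is inverted before it reaches the NOR. The function names are illustrative, and this models the logic only, not the actual transistor circuit.

```python
from itertools import product


def nor(*bits):
    """NOR gate: 1 only when every input is 0."""
    return int(not any(bits))


def match_write00_read01(w1, w0, r1, r0):
    # Desired values are w1=0, w0=0, r1=0, r0=1, so only r0 needs an
    # inverter; then all four NOR inputs are 0 exactly for the desired case.
    return nor(w1, w0, r1, 1 - r0)


def and_form(w1, w0, r1, r0):
    # The equivalent AND of the "desired" literals, by DeMorgan's laws
    return int((1 - w1) and (1 - w0) and (1 - r1) and r0)


hits = [bits for bits in product((0, 1), repeat=4)
        if match_write00_read01(*bits)]
print(hits)  # the only matching (w1, w0, r1, r0) combination
```

Of the sixteen input combinations, the gate fires only for (0, 0, 0, 1), and the NOR form agrees with the AND form on every input.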

The first microcomputer: The transfluxor-powered Arma Micro Computer from 1962

What would you say is the first microcomputer?1 The Apple I from 1976? The Altair 8800 from 1974? Perhaps the lesser-known Micral N (1973) or Q1 (1972)? How about the Arma Micro Computer from way back in 1962? The Arma Micro Computer was a compact 20-pound transistorized computer, designed for applications in space such as inertial or celestial navigation, steering, radar, or engine control.

Obviously, the Arma Micro Computer is not a microcomputer according to modern definitions, since its processor was made from discrete components. But it's an interesting computer in many ways. First, it is an example of the aerospace computers of the 1960s, advanced systems that are now almost entirely forgotten. People think of 1960s computers as room-filling mainframes, but there was a whole separate world of cutting-edge miniaturized aerospace computers. (Taking up just 0.4 cubic feet, the Arma Micro Computer was smaller than an Apple II.) Second, the Arma Micro Computer used strange components such as transfluxors and had an unusual 22-bit serial architecture. Finally, the Arma Micro Computer evolved into a series of computers used on Navy ships and submarines, the E-2C Hawkeye airborne early warning plane, the Concorde, and even Air Force One.

The Arma Micro Computer

The Arma Micro Computer, with a circuit board on top. Click this image (or any other) for a larger version. Photo courtesy of Daniel Plotnick.

The Micro Computer used 22-bit words, which may seem like a strange size from the modern perspective. But there's no inherent need for a word size to be a power of 2. In particular, the Micro Computer was designed for mathematical calculations, not dealing with 8-bit characters. The word size was selected to provide enough accuracy for its navigational tasks.

Another strange aspect of the Micro Computer is that it was a serial machine, sequentially operating on one bit of a word at a time.2 This approach was often used in early machines because it substantially reduced the amount of hardware required: it only needs a 1-bit data bus and a 1-bit ALU. The downside is that a serial machine is much slower because each 22-bit word takes 22 clock cycles (plus 5 cycles of overhead). As a result, the Micro Computer executed just 36000 operations per second, despite its 1 megahertz clock speed.
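The throughput figure follows from simple arithmetic, sketched below. The function name is made up for illustration, and the clock is taken as exactly 1 MHz, which is an assumption since the text only gives the nominal "1 megahertz".

```python
WORD_BITS = 22         # one clock cycle per bit in a serial machine
OVERHEAD_CYCLES = 5    # per-word overhead quoted in the text
CLOCK_HZ = 1_000_000   # nominal clock; the exact frequency is an assumption


def serial_ops_per_second(clock_hz, word_bits, overhead_cycles):
    """Operations per second when each operation takes one serial word time."""
    cycles_per_op = word_bits + overhead_cycles
    return clock_hz / cycles_per_op


cycles_per_op = WORD_BITS + OVERHEAD_CYCLES
rate = serial_ops_per_second(CLOCK_HZ, WORD_BITS, OVERHEAD_CYCLES)
print(cycles_per_op, "cycles per operation")
print(round(rate), "operations per second")
```

At 27 cycles per word, an exact 1 MHz clock would give roughly 37,000 operations per second, in the same range as the quoted 36000. The small gap presumably comes from the clock being a bit under 1 MHz or from rounding, but that is a guess.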

Ad for the Arma Micro Computer (called the MICRO here). Source: Electronics, July 27, 1962.

The Micro Computer had a small instruction set of 19 instructions.3 It included multiply, divide, and square root, instructions that weren't implemented in early microprocessors. This illustrates how early microprocessors were a significant step backward in functionality. Moreover, the multiply, divide, and square root instructions used a separate arithmetic unit, so they could execute in parallel with other arithmetic instructions. Because the Micro Computer needed to interact with spacecraft systems, it had a focus on I/O, with 120 digital inputs or outputs, configured as needed for a particular mission.

Circuits

The Micro Computer was built from silicon transistors and diodes, using diode-transistor logic. The construction technique was somewhat unusual. The basic circuits were the flip-flop, the complementary buffer (i.e. an inverter), and the diode gate. Each basic circuit was constructed on a small wafer, .77 inches on a side.5 The photo below shows wafers for a two-transistor flip-flop and two diode gates. Each wafer had up to 16 connection tabs on the edges. These wafers are analogous to integrated circuits, but constructed from discrete components.

Three circuit modules from the Arma Micro Computer. Image from "The Arma Micro Computer for Space Applications".

The wafers were mounted on printed circuit boards, with up to 22 wafers on a board. Pairs of boards were mounted back to back with polyurethane foam between the boards to form a "sandwich", which was conformally coated. The result was a module that was protected against the harsh environment of a missile or spacecraft. The computer could handle a shock of 100 g's and temperatures of 0°C to 85°C as well as 100% humidity or a vacuum.

Because the Micro Computer was a serial machine, its bits were constantly moving. For register storage such as the accumulator, it used six magnetostrictive torsional delay lines, storing a sequence of bits as physical twists that formed pulses racing through a long coil of wire.

The photo below shows the Arma Micro Computer with the case removed. If you look closely, you can see the 22 small circuit wafers mounted on each printed circuit board. The memory driver boards and delay lines are towards the back, spaced more widely than the other printed circuit boards. The cable harness underneath the boards provides the connections between boards.4

Circuit boards inside the Arma Micro Computer. Photo courtesy of Daniel Plotnick.

Transfluxors

One of the most unusual parts of the Micro Computer was its storage. Computers at the time typically used magnetic core memory, with each bit stored in a tiny ferrite ring, magnetized either clockwise or counterclockwise to store a 0 or 1. One drawback of standard core memory was that the process of reading a core also cleared the core, requiring data to be written back after a read.

Diagram of Arma's memory system. From patent 3048828.

The Micro Computer used ferrite cores, but these were "two-aperture" cores, with a larger hole and a smaller hole, as shown above. Data is written to the "major aperture" and read from the "minor aperture". Although the minor aperture switches state and is erased during a read, the major aperture retains the bit, allowing the minor aperture to be switched back to its original state. Thus, unlike regular core memory, transfluxors don't lose their data when reading.

The resulting system is called non-destructive readout (NDRO), compared to the destructive readout (DRO) of regular core memory.6 The Micro Computer used non-destructive readout memory to ensure that the program memory remained uncorrupted. In contrast, if a program is stored in regular core memory, each instruction must be written back as it is executed, creating the possibility that a transient could corrupt the software. By using transfluxors, this possibility of error is eliminated. (In either case, core memory has the convenient property that data is preserved when power is removed, since data is stored magnetically. With modern semiconductor memory, you lose data when the power goes off.)
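The difference between the two readout styles can be illustrated with a toy model. This is a deliberately simplified sketch: the class names and the `glitch` parameter are invented, a "transient" is modeled as a lost write-back, and the real magnetic behavior is far more subtle.

```python
class DestructiveCore:
    """Regular core: reading erases the bit, so it must be written back."""

    def __init__(self, bit):
        self.bit = bit

    def read(self, glitch=False):
        value = self.bit
        self.bit = 0           # the read itself clears the core
        if not glitch:
            self.bit = value   # normal write-back restores the bit
        return value


class Transfluxor:
    """Two-aperture core: the major aperture keeps the bit during a read."""

    def __init__(self, bit):
        self.major = bit       # written through the major aperture
        self.minor = bit       # sensed through the minor aperture

    def read(self, glitch=False):
        value = self.minor
        self.minor = 0             # the read disturbs the minor aperture
        self.minor = self.major    # restored from the retained major state
        return value               # 'glitch' has no write-back to disrupt


dro = DestructiveCore(1)
ndro = Transfluxor(1)
dro.read(glitch=True)    # a transient hits during the write-back
ndro.read(glitch=True)
print(dro.read(), ndro.read())  # stored bits after the transient
```

In this model, a transient during the write-back leaves the regular core holding 0, while the transfluxor still returns the original 1.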

The photo below shows a compact transfluxor-based storage module used in the Micro Computer, holding 512 words. In total, the computer could hold up to 7808 words of program memory and 256 words of data memory. It appears that transfluxors didn't live up to their promise, since most computers used regular core memory until semiconductor memory took over in the early 1970s.

Transfluxor-based core memory module from the Arma Micro Computer. Image from "The Arma Micro Computer for Space Applications".

Arma's history and the path to the Micro Computer

The Arma Engineering Company was founded in 1918 and built advanced military equipment.7 Its first product was a searchlight for the Navy, followed by a gyroscopic compass and analog computers for naval gun targeting. In 1939, Arma produced the Torpedo Data Computer, a remarkable electromechanical analog computer. US submarines used this computer to track target ships and automatically aim torpedoes. The Torpedo Data Computer performed complex trigonometric calculations and integration to account for the motion of the target ship and the submarine. While the Torpedo Data Computer performed well, the Navy's Mark 14 torpedo had many problems (running too deep, exploding too soon, or failing to explode), making torpedoes often ineffectual even with a perfect hit.

The Torpedo Data Computer Mark III in the USS Pampanito.

Arma underwent major corporate changes due to World War II. Before the war, the German-owned Bosch Company built vehicle starters and aircraft magnetos in the United States. When the US entered World War II in 1941, the government was concerned that a German-controlled company was manufacturing key military hardware so the Office of Alien Property Custodian took over the Bosch plant. In 1948, the banking group that controlled Arma bought Bosch from the Office of the Alien Property Custodian, merging them into the American Bosch Arma Corporation (AMBAC).8 (Arma had earlier received the rights to gyrocompass technology from the German Anschutz company, seized by the Navy after World War I, so Arma benefitted twice from wartime government seizures.)

In the mid-1950s, Arma moved into digital computers, building an inertial guidance computer for the Atlas nuclear missile program. America's first ICBM was the Atlas missile, which became operational in 1959. The first Atlas missiles used radio guidance from the launch site to direct the missile. Since radio signals could be jammed by the enemy, this wasn't a robust solution.

The solution to missile guidance was an inertial navigation system. By using sensitive gyroscopes and accelerometers, a missile could continuously track its position and velocity without any external input, making it unjammable. A key developer of this system was Arma's Wen Tsing Chow, one of the driving forces behind digital aviation computers. He faced extreme skepticism in the 1950s for the idea of putting a computer in a missile. One general mocked him, asking "Where are you going to put the five Harvard professors you'll need to keep it running?" But computerized navigation was successful and in 1961, the Atlas missile was updated to use the Arma inertial guidance computer. It was said to be the first production airborne digital computer.9 Wen Tsing Chow also invented the programmable read-only memory (PROM), allowing missile targeting information to be programmed into a computer outside the factory.

Wen Tsing Chow, computer engineer, with Arma Micro Computer. From Control Engineering, January 1963, page 19. Courtesy of Daniel Plotnick.

The photo below shows the Atlas ICBM's guidance system. The Arma W-107A computer is at the top and the gyroscopes are in the middle. This computer was an 18-bit serial machine running at 143.36 kHz. It ran a hard-wired program that integrated the accelerometer information and solved equations for the crossrange error function, range error function, and gravity, making these computations every half second.10 The computer weighed 240 pounds and consumed 1000 watts. The computer contained about 36,000 components: discrete transistors, diodes, resistors, and capacitors mounted on 9.5" × 6.5" printed-circuit boards. On the ground, the computer was air-cooled to 55 °F, but there was no cooling after launch as the computer only operated for five minutes of powered flight and wouldn't overheat during that time.

Guidance system for Atlas ICBM. From "Atlas Inertial Guidance System" by John Heiderstadt. Photo unclassified in 1967.

The Atlas wasn't originally designed for a computerized guidance system so there wasn't room inside the missile for the computer. To get around this, a large pod was stuck on the side of the missile to hold the computer and gyroscopes, as indicated in the photo below. This doesn't look aerodynamic, but I guess it worked.

Atlas missile. Arrow indicates the pod containing the Arma guidance computer and inertial navigation system. Original photo by Robert DuHamel, CC BY-SA 3.0.

The Atlas guidance computer (left, below) consisted of three aluminum sections called "decks". The top deck held two replaceable target constant units, each providing 54 navigation constants that specified a target. The constants were stored in a stack of printed circuit boards 16" × 8" × 1.5", covered in over a thousand diodes, Wen Tsing Chow's PROM memory. A target was programmed into the stack by a rack of equipment that would selectively burn out diodes, changing the corresponding bit to a 1. (This is why programming a PROM is referred to as "burning the PROM".11) The diode matrix was later replaced with a transfluxor memory array, which had the advantage that it could be reprogrammed as necessary. The top deck also had connectors for the accelerometer inputs, the outputs, and connections for ground support equipment. The bottom deck had power connectors for 28 volts DC and 115V 400 Hz 3-phase AC. In the bottom deck, quartz delay lines were used for storage, representing bits as acoustic waves. Twelve circuit cards, each with a faceted quartz block four inches in diameter, provided a total of 32 words of storage.

Three generations of Arma Computers: the W-107A Atlas ICBM guidance computer, the Lightweight Airborne Digital Computer, and the Arma Micro Computer (perhaps a prototype). Photo courtesy of Daniel Plotnick.

Arma considered the Micro Computer the third generation of its airborne computers. The first generation was the Atlas guidance computer, constructed from germanium transistors and diodes (in the pre-silicon era). The second-generation computer moved to silicon transistors and diodes. The third-generation computers still used discrete components, but mounted on the small square wafers. The third generation also had a general-purpose architecture and programmable transfluxor memory instead of a hard-wired program.

After the Micro Computer

Arma continued to develop computers, improving the Arma Micro Computer. The Micro C computer (1965) was developed for Navy ships and submarines. Much like the original Micro, the Micro C used transfluxor storage, but increased the clock frequency to 972 kHz. The computer was much larger: 3.87 cubic feet and 150 pounds. This description states that "the machine is an outgrowth of the ARMA product line of micro computers and is logically and electrically similar to micro-computers designed for missile environments."

Module from the Arma Micro-C Computer. Photo courtesy of Daniel Plotnick.

In mid-1966, Arma introduced the Micro D computer, built from TTL integrated circuits. Like the original Micro, this computer was serial, but the Micro D had a word length of 18 bits and ran at 1.5 MHz. It weighed 5.25 pounds and was very compact, just 0.09 cubic feet. Instead of transfluxors, the Micro D used regular magnetic core memory, 4K to 31K words.

The Arma Micro-D 1801 computer. The 1808 was a slightly larger model. Photo courtesy of Daniel Plotnick.

The widely-used Litton LTN-51 inertial navigation system was built around the Arma Micro-D computer.12 This navigation system was designed for commercial aircraft, but was also used for military applications, ships, and NASA aircraft. Aircraft from early Concordes to Air Force One used the LTN-51 for navigation. The photo below shows a navigation unit with the Arma Micro-D computer in the lower left and the gyroscope unit on the right.

Litton LTN-51 inertial navigation system. Photo courtesy of pascal mz, concordescopia.com.

In early 1968, the Arma Portable Micro D was introduced, a 14-pound battery-powered computer also called the Celestial Data Processor. This handheld computer was designed for navigation in crewed earth orbital flight, determining orbital parameters from stadimeter and sextant measurements performed by astronauts. As far as I can tell, this computer never made it beyond the prototype stage.

The Arma Celestial Data Processor (source).

Conclusions

The Arma Micro Computer is just one of the dozens of compact aerospace computers of the 1960s, a category that is mostly forgotten and ignored. Another example is the Delco MAGIC I (1961), said to be the "first complete airborne computer to have its logic functions mechanized exclusively with integrated circuits". IBM's 4 Pi series started in 1966 and was used in many systems from the F-15 to the Space Shuttle. By 1968, denser MOS/LSI chips were used in general-purpose aerospace computers such as the Rockwell MOS GP and the Texas Instruments Model 2502 LSI Computer.13

Arma also illustrates that a company can be on the cutting edge of technology for decades and then suddenly go out of business and be forgotten. After some struggles, Arma was acquired by United Technologies in 1978 for $210 million and was then shut down in 1982. (The German Bosch corporation remains, now a large multinational known for products such as dishwashers, auto parts, and power tools.) Looking at a list of aerospace computers shows many innovative but vanished companies: Univac, Burroughs, Sperry (now all Unisys), AC Electronics (now part of Raytheon), Autonetics (acquired by Boeing), RCA (bought by GE), and TRW (acquired by Northrop Grumman).

Finally, the Micro Computer illustrates that terms such as "microcomputer" are not objective categories but are social constructs. At first, it seems obvious that the Arma Micro Computer is not a real microcomputer. If you consider a microcomputer to be a computer built around a microprocessor, that's true. (Although "microprocessor" is also not as clear as you might think.) But a microcomputer can also be defined as "A small computer that includes one or more input/output units and sufficient memory to execute instructions" (according to the IBM Dictionary of Computing, 1994)14 and the Arma Micro Computer meets that definition. The "microcomputer" is a shifting concept, changing from the 1960s to the 1990s to today.

For more, follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon as @kenshirriff@oldbytes.space. Thanks to Daniel Plotnick for providing a great deal of information and photos. Thanks to John Hartman for obtaining an obscure conference proceedings for me.

Notes and references

  1. I should mention the danger of "firsts" from a historical perspective. Historian Michael Williams advised "not to use the word 'first'" and said, "If you add enough adjectives to a description you can always claim your own favorite." (See ENIAC in Action, p7.)

    The first usage of "micro-computer" that I could find is from 1956. In Isaac Asimov's short story "The Dying Night", he mentions a "micro-computer" in passing: "In recent years, it [the handheld scanner] had become the hallmark of the scientist, much as the stethoscope was that of the physician and the micro-computer that of the statistician."

    Another interesting example of a "micro-computer" is the Texas Instruments Semiconductor Network Computer. This palm-sized computer is often considered the first integrated-circuit computer. It was an 11-bit serial computer running at 100 kHz, built out of RS flip-flops, NOR gates, and logic drivers. The 1961 article below described this computer as a "micro-computer", although this was a one-off use of the term, not the computer's name. This brochure describes the Semiconductor Network Computer in more detail and Semiconductor Networks are described in detail in this article. Unlike modern ICs, these integrated circuits used flying wires for internal connections rather than a deposited metal layer, making their design a dead end.

    The Texas Instruments Semiconductor Network Computer. From Computers and Automation, Dec. 1961.

  2. Most of the information on the Arma Micro Computer in this article is from "The Arma Micro Computer for Space Applications", by E. Keonjian and J. Marx, Spaceborne Computing Engineering Conference, 1962, pages 103-116. 

  3. The Arma Micro Computer's instruction set consisted of 19 22-bit instructions, shown below.

    Instruction set of the Arma Micro Computer. Figure from "The Arma Micro Computer for Space Applications".

  4. This block diagram shows the structure of the Micro Computer. The accumulator register (AC) is used for all data transfers as well as addition and subtraction. The multiply-divide register is used for multiplication, division, and square roots. The product register (PR), quotient register (QR), and square root register (SR) are used by the corresponding instructions. The data buffer register (S) holds data moving in or out of storage; it is shown with two 11-bit parts.

    Block diagram of the Arma Micro Computer. Figure from "The Arma Micro Computer for Space Applications".

    For control logic, the location counter (L) is the 13-bit program counter. For a subroutine call, the current address can be stored in the recall register (RR), which acts as a link register to hold the return address. (The RR is not shown on the diagram because it is held in memory.) Instruction decoding uses the instruction register (I), with the next instruction in the instruction buffer (B). The operand register (P) contains the 13-bit address from an instruction, while the remaining register (R) is used for I/O addressing. 

  5. Arma's original plan was to mount circuits on ceramic wafers. Resistors would be printed onto the wafer and wiring silk-screened. (This is similar to IBM's SLT modules (1964), although IBM mounted diodes and transistors as bare dies rather than components.) However, the Micro Computer ended up using epoxy-glass wafers with small but discrete components: standard TO-46 transistors, "fly-speck" diodes, and 1/10 watt resistors. I don't see much advantage to these wafers over mounting the components directly on the printed-circuit board; maybe standardization is the benefit. 

  6. The Micro Computer used an unusual mechanism to select a word to read or write. Most computers used a grid of selection wires; by energizing an X and a Y wire at the same time, the corresponding core was selected. The key idea of this "coincident-current" approach is that each wire carries half the current necessary to flip a core, so only the core at the intersection of the energized X and Y wires receives enough current to flip. This puts tight constraints on the current level: too much current will flip all the cores along the wire, while too little will fail to flip the selected core. What makes this difficult is that the properties of a core change with temperature, so either the cores need to be temperature-stabilized or the current needs to be adjusted based on the temperature.

    The Micro Computer instead used a separate wire for each word, so as long as the current is large enough, the cores will flip. This approach avoids the issues with temperature sensitivity, an important concern for a computer that needs to handle the large temperature swings of a spacecraft, not an air-conditioned data center. Unfortunately, it requires much more wiring. Specifically, the large advantage of the coincident-current approach is that an N×N grid of wires lets you select N² words. With the Micro Computer approach, N wires only select N words, so the scalability is much worse.

    For more on Arma's memory systems, see the patents Memory Device, 3048828, and Multiaperture Core Memory Matrix, 3289181.
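
    The current constraint described in this note can be sketched in a few lines of Python. This is a toy model, not Arma's circuitry: the function names, the arbitrary current units, and the single flip threshold shared by every core are all assumptions made for illustration.

```python
def coincident_flipped(n, x, y, drive, threshold):
    """Cores that flip in an n-by-n coincident-current array (toy model).

    X wire `x` and Y wire `y` each carry `drive` units of current.
    A core flips when the total current through it reaches `threshold`.
    """
    flipped = []
    for row in range(n):
        for col in range(n):
            current = (drive if row == x else 0) + (drive if col == y else 0)
            if current >= threshold:
                flipped.append((row, col))
    return flipped


def word_select_flipped(n, word, drive, threshold):
    """Words that flip when each of the n words has its own select wire."""
    return [w for w in range(n) if w == word and drive >= threshold]


if __name__ == "__main__":
    THRESHOLD = 100  # arbitrary units, assumed for illustration

    # Half current on each wire: only the core at the intersection flips.
    print(coincident_flipped(4, 1, 2, drive=50, threshold=THRESHOLD))
    # Too much current: every core along both energized wires flips.
    print(coincident_flipped(4, 1, 2, drive=100, threshold=THRESHOLD))
    # Too little current: even the selected core fails to flip.
    print(coincident_flipped(4, 1, 2, drive=40, threshold=THRESHOLD))
    # One wire per word: any sufficiently large current selects just that word.
    print(word_select_flipped(4, 2, drive=500, threshold=THRESHOLD))
```

    In this model the coincident-current array works only when the drive current is at least half the threshold but below the full threshold, a window that shifts whenever the threshold drifts with temperature. The word-select array has only a lower bound on the current, at the cost of one wire per word.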

  7. The capitalization of Arma vs. ARMA is inconsistent. It often appears in all-caps, but both forms are used, sometimes in the same article. "Arma" is not an acronym; the name came from the names of its founders: Arthur Davis and David Mahood (source: Between Human and Machine, p54). I suspect a 1960s corporate branding effort was responsible for the use of all-caps. 

  8. For more on the corporate history of Arma, see IRE Pulse, March 1958, p9-10. Details of corporate politics and what went wrong are here. More information on the financial ups and downs of Arma is in "Charles Perelle's Spacemanship", Fortune, January 1959, an article that focused on Charles Perelle, the president of American Bosch Arma. 

  9. Wikipedia says that Arma's guidance computer was "the first production airborne digital computer". However, the Hughes Digitair (1958) has also been called "the first airborne digital computer in actual production." Another source says the Arma computer was the "first all-solid-state, high-reliability, space-borne digital computer." The TRADIC (Transistorized Airborne Digital Computer) (1954) was earlier, but was a prototype system, not a production system. In turn, the TRADIC is said by some to be the first fully transistorized computer, but that depends on exactly how you interpret "fully".

    This is another example of how the "first" depends on the specific adjectives used. 

  10. The information on the Arma W-107A computer is from "Atlas Inertial Guidance System: As I Remember It" by Principal Engineer John Heiderstadt. 

  11. Chow Wen Tsing's PROM patent discusses the term "burning", explaining that it refers to burning out the diodes electrically. To widen the patent, he clarifies that "The term 'blowing out' or 'burning out' further includes any process which, by means less drastic than actual destruction of the non-linear elements, effects a change of the circuit impedance to a level which makes the particular circuit inoperative." This description prevented someone from trying to get around the patent by stating that nothing was really burning. 

  12. Details on the LTN-51 navigation system and its uses are in this document.

  13. For more information on early aerospace computers, see State-of-the-art of Aerospace Digital Computers (1967), updated as Trends in Aerospace Digital Computer Design (1969). Also see the 1970 Survey of Military CPUs. Efficient partitioning for the batch-fabricated fourth generation computer (1968) discusses how "The computer industry is on the verge of an upheaval" from new hardware including LSI and fast ROMs, and describes various LSI aerospace computers. 

  14. The "IBM Dictionary of Computing" (1994) has two definitions of "microcomputer": "(1) A digital computer whose processing unit consists of one or more microprocessors, and includes storage and input/output facilities. (2) A small computer that includes one or more input/output units and sufficient memory to execute instructions; for example a personal computer. The essential components of a microcomputer are often contained within a single enclosure." The latter definition was from an ISO/IEC draft standard for terminology so it is somewhat "official".