The adder at the heart of Intel's 8087 floating-point chip

In 1980, Intel released the Intel 8087 floating-point coprocessor, a chip that could make math up to 100 times faster. As well as arithmetic and square roots, the 8087 computed transcendental functions including tangent, exponentiation, and logarithms. But it all depended on a 69-bit adder: "The arithmetic heart of the floating-point execution unit is centered about a nanomachine comprised of the adder and its related registers, shifters and control circuitry," as the patent describes it. In this article, I explain the circuitry of this adder.

The photo below shows the 8087 die under a microscope. Around the edges of the die, hair-thin bond wires connect the chip to its 40 external pins. The complex patterns on the die are formed by its metal wiring, as well as the polysilicon and silicon underneath. At the top of the chip, the Bus Interface Unit connects to the rest of the system: coordinating with the main 8086 processor and memory. The chip's instructions are defined by the large microcode ROM in the middle.

Die of the Intel 8087 floating-point unit chip, with relevant functional blocks labeled. The die is 5mm×6mm. Click for a larger image.

The bottom half of the die is the "datapath", the circuitry that performs calculations; it is split into the exponent datapath, which handles the exponent of a floating-point number, and the fraction datapath, which handles the fractional part (or significand). The adder (red) sits in the middle of the fraction datapath; to perform addition on the exponent, the exponent must be copied over to the fraction datapath.

Structure of the adder

Building a binary adder is easy; the hard part is making it fast. The key problem is how to handle the carries from one bit position to the next. Each carry potentially depends on all the lower carries, but you don't want to wait as a carry ripples through the logic for all 69 bits. (It's similar to doing 999999+1 with long addition: you need to carry the one, carry the one, ...)

The 8087's adder improves performance by breaking addition into 4-bit blocks, using two techniques to make computation inside each block fast. The carry still needs to ripple from block to block, but this reduces the number of carry steps by a factor of four.

Simplified diagram of a four-bit block in the 8087's adder.

The diagram above shows the structure of one 4-bit block, with the carry generation circuits abstracted out for now. The adder takes two inputs: one (F) is from the chip's fraction bus, a bus that connects the components of the fraction datapath. The second input (B) comes from a register called the B register. Each bit of the sum is produced by XORing an F input, a B input, and the carry into that bit position.1 For reasons that will be explained below, the intermediate value (F XOR B) is called "propagate". The carry-out from each block is tied to the carry-in of the next block. But what happens inside the carry circuits?

In 1959, researchers at the University of Manchester developed a fast carry technique for a computer called Atlas. This technique, named the Manchester carry chain, computes the carry values by setting up switches in parallel and then letting the carry quickly propagate through the wires, controlled by the switches. Although the carry still needs to travel from bit to bit, it travels at the speed of a signal in a wire, not slowed by logic gates.2

The Manchester carry chain is built around the concepts of Generate, Propagate, and Delete (also known as Kill), which arise when adding two bits and a carry. If you add 1+1, a carry-out is generated, whether there is a carry-in or not. In contrast, if you add 0+0, there is no carry-out, regardless of the carry-in; any carry-in is deleted. The interesting case is if you add 0+1: a carry-out results only if there is a carry-in; that is, the carry-in is propagated to the carry-out. In logic terms, the generate signal is the AND of the two input bits, the delete signal is the NOR, and the propagate signal is the XOR. The important thing is that these signals can be computed for all bit positions in parallel, in constant time.
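
As a rough illustration of these three cases, here is a short Python sketch. The function names `classify` and `classify_all` are made up for this example; the code only models the logic, not the 8087's circuitry.

```python
def classify(f_bit: int, b_bit: int) -> str:
    """Classify one bit position of an addition as generate, propagate, or delete."""
    generate = f_bit & b_bit        # AND: 1+1 always produces a carry-out
    propagate = f_bit ^ b_bit       # XOR: 0+1 or 1+0 passes the carry-in along
    delete = 1 - (f_bit | b_bit)    # NOR: 0+0 never produces a carry-out
    # Exactly one of the three signals is set for any pair of bits.
    assert generate + propagate + delete == 1
    if generate:
        return "generate"
    return "propagate" if propagate else "delete"


def classify_all(f: int, b: int, width: int) -> list[str]:
    """Classify every bit position independently, low bit first."""
    return [classify((f >> i) & 1, (b >> i) & 1) for i in range(width)]
```

Each position looks only at its own two input bits, so no position has to wait for any other; that is why these signals can all be ready at the same time.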

The idea behind the Manchester carry chain. Note that the low bit is on the left, so the carry flows left to right.

The Manchester carry chain is constructed as above, with the switches at each bit set according to the Generate/Propagate/Delete values. Once the switches are set, the carry status quickly flows through the circuit, producing the carry value at each position without any logic delays. If the propagate switch is closed, the previous carry passes through. But if the generate or delete switch is closed, the carry is set or cleared, respectively. Once the carry values are available, the final sum can be computed in parallel with XORs.
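
The behavior of the chain can be modeled in a few lines of Python. This is only a logical sketch under my own naming (`manchester_add` is hypothetical): the loop stands in for the carry flowing along the wire, which in hardware is not a sequence of logic steps.

```python
def manchester_add(f: int, b: int, width: int, carry_in: int = 0) -> tuple[int, int]:
    """Add two width-bit values by modeling the switch settings at each bit.

    Returns (sum, carry_out). The low bit is processed first.
    """
    carry = carry_in
    total = 0
    for i in range(width):
        f_bit = (f >> i) & 1
        b_bit = (b >> i) & 1
        propagate = f_bit ^ b_bit
        # Sum bit: XOR of the two inputs and the carry into this position.
        total |= (propagate ^ carry) << i
        if propagate:
            pass                 # propagate switch closed: carry passes through
        elif f_bit & b_bit:
            carry = 1            # generate switch closed: carry is set
        else:
            carry = 0            # delete switch closed: carry is cleared
    return total, carry
```

For instance, `manchester_add(0b1111, 0b0001, 4)` has the propagate switch closed at every position above the low bit, so the carry generated at bit 0 travels the whole length of the chain.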

The 8087 uses an optimized circuit for the Manchester carry chain, combining the Generate and Delete cases. One stage of the adder's carry chain is shown below. For the propagate case, the carry-in Cin passes through the top switch, propagated to the carry out Cout. For the generate and delete cases, the bottom switch is closed, passing the input bit F. The trick is that the generate case corresponds to 1+1, so F is 1, resulting in Cout getting set. The delete case corresponds to 0+0, so F is 0, and Cout is cleared. (Note that both inputs, F and B, are the same in these cases, so using F instead of B is arbitrary.)

One stage of the Manchester carry chain.

The middle of the diagram shows how the switches correspond to a multiplexer (mux) selecting the top signal Cin if prop is set, or the bottom signal F if prop is clear. The right side of the diagram shows the physical implementation with two NMOS transistors. These transistors function as switches (pass transistors), controlled by the prop signals on the gate.
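
To check that this two-way selection really gives the right carry, here is a small Python sketch. The names `carry_stage` and `majority` are mine; `majority` is the textbook full-adder carry, used only as a reference.

```python
def carry_stage(f_bit: int, b_bit: int, carry_in: int) -> int:
    """One carry-chain stage modeled as a 2-input multiplexer."""
    prop = f_bit ^ b_bit
    # prop set: pass the carry-in. prop clear: pass F, which is 1 for
    # generate (1+1) and 0 for delete (0+0).
    return carry_in if prop else f_bit


def majority(f_bit: int, b_bit: int, carry_in: int) -> int:
    """Reference carry-out of a full adder: 1 if at least two inputs are 1."""
    return (f_bit & b_bit) | (f_bit & carry_in) | (b_bit & carry_in)
```

The two functions agree on all eight input combinations, so the multiplexer is a complete replacement for the three separate Generate, Propagate, and Delete switches.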

The problem is that pass transistors aren't perfect switches, but lose a bit of voltage at each step. To fix this, the carry chain is broken into blocks of four bits (as shown earlier) and each block produces a "fresh" carry. This refresh is done by a "carry-skip" circuit, which can skip the carry processing inside the block. Specifically, the carry-skip mechanism checks if all positions inside the block are Propagate. In this case, the carry-out will have the same value as the carry-in (since the carry-in propagates through all the bit positions of the block). The carry-skip circuit detects this case and produces a carry-out signal matching the carry-in.
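
The carry-skip idea can be sketched in Python as follows. The function names are hypothetical, and the code models only the logic; the voltage-restoring role of the circuit has no equivalent in software.

```python
def chain_carry_out(f_bits: list[int], b_bits: list[int], carry_in: int) -> int:
    """Carry-out of a block computed by stepping through every bit (low bit first)."""
    carry = carry_in
    for f_bit, b_bit in zip(f_bits, b_bits):
        carry = carry if (f_bit ^ b_bit) else f_bit
    return carry


def skip_carry_out(f_bits: list[int], b_bits: list[int], carry_in: int) -> int:
    """Carry-out of a block with a carry-skip shortcut."""
    if all(f_bit ^ b_bit for f_bit, b_bit in zip(f_bits, b_bits)):
        # Every position is Propagate, so the carry-out equals the carry-in
        # and the block's internal chain can be bypassed.
        return carry_in
    return chain_carry_out(f_bits, b_bits, carry_in)
```

The shortcut never changes the answer: when all four positions propagate, stepping through the chain would also return the carry-in. The skip path just produces that value directly.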

Putting this all together, the schematic below shows the adder circuitry for a typical block of four bits. The four multiplexers form the Manchester carry chain, while the NOR gate detects the carry-skip case.

Reverse-engineered schematic for a 4-bit block of the adder.

The actual circuit has a complication for electrical reasons.3 The 8087 uses NMOS transistors, which are much faster at pulling a signal low than at pulling a signal high. To improve performance, the carry lines are precharged to 5V at the start of an addition, and then the circuitry pulls the lines low if needed. In order to start in the no-carry state, the carry lines are all negated, so the initial 5V state corresponds to no carry, and the ground state corresponds to a carry.

The last multiplexer in the block has four inputs instead of two4. The third input pulls the (inverted) carry line low for the carry skip case.5 The fourth input is the precharge signal; it puts 5V on the carry line to precharge it. (A control circuit activates the precharge signal at the start of an addition cycle.) Note that this only precharges one of the carry lines; to precharge the rest, the propagate signal is forced high during precharge.

Reverse-engineered schematic for the propagate circuit. This shows an arbitrary bit n.

The circuit to generate the propagate signal (above) is conceptually the XOR of the two inputs, but there are (of course) complications. When the precharge signal is high, propagate is forced high, tying all the carry lines together so the precharge can propagate to all of them. The second feature is that the B inputs can be blocked by the forceZero signal, so the value 0 is added instead of the B value.
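
In logic terms, the behavior described above amounts to something like the following Python sketch. The function and parameter names are my own, and this captures only the truth table, not the gate-level implementation in the schematic.

```python
def propagate_signal(f_bit: int, b_bit: int, precharge: int, force_zero: int) -> int:
    """Logical model of the propagate signal for one bit position."""
    if precharge:
        # During precharge, propagate is forced high so the carry lines
        # are tied together and the precharge reaches all of them.
        return 1
    # forceZero blocks the B input, so 0 is added instead of B.
    effective_b = 0 if force_zero else b_bit
    return f_bit ^ effective_b
```

With forceZero active, the propagate signal is simply the F bit, so the adder computes F plus 0 (plus any carry-in).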

To summarize, the adder is divided into blocks of four bits. Each block uses a Manchester carry chain and a carry-skip circuit to optimize the performance. Even with these optimizations, though, the large number of blocks requires the 8087 to take two clock cycles to complete an addition.

The adder in silicon

The image below shows how the circuitry for a block of four bits appears on the die. These blocks are stacked vertically to create the complete adder as seen in the earlier die photo. In this image, the metal layer is visible as white lines, mostly obscuring the circuitry underneath. The 8087 has a single metal layer, which constrains the layout. Note that metal wiring is tightly packed, occupying almost the complete area. The thick vertical metal trace at the left is ground, while the thick metal trace at the right is power, supplying the adder circuitry. The horizontal traces provide wiring inside the adder block, as well as allowing the fraction bus to pass across the adder. The vertical lines on either side are control signals for the adder (precharge and forceZero) as well as connections to circuitry at the bottom of the chip.

A block of four bits in the adder.

The photo below shows the silicon and polysilicon circuitry underneath the metal layer. (To take this photo, I dissolved the metal layer with acid.) The thin lines are polysilicon wiring, while the pinkish areas that appear raised are doped silicon. A transistor is formed when polysilicon crosses doped silicon. The circuitry is complex and irregular, connected by the horizontal metal wires above. The white circles are contacts between the silicon and the metal wiring, while the white squares are contacts between the polysilicon and metal. Roughly speaking, if you divide the circuitry in the photo into quarters, each quarter adds one bit. The carry-skip circuitry is in the middle.

A block of four bits in the adder with the metal layer removed.

The left and the right sides of the image don't have any transistors, just polysilicon lines that pass under the vertical metal wiring. Many of these polysilicon lines are widened to reduce their resistance and thus tune performance. The silicon in these regions is "wasted", just providing a channel for the vertical wiring.

The size of the adder

Although the 8087 nominally has 64-bit values for the fraction (significand), the adder is slightly larger: it takes 69 bits as input and generates 70 output bits. One reason is that the 8087 uses three extra low-order bits for rounding, called Guard, Round, and Sticky. These bits ensure that a value is always rounded in the right direction. Handling of the rounding bits is fairly complicated, with multiple modes, but from the adder's perspective they are just three input bits.6

As will be explained below, the value from the B register can be doubled, requiring one more bit. Finally, the fraction bus and the B value can be negated. (This is used for subtraction, among other things.) A negative value is represented in two's complement, requiring one more bit. In total, the inputs to the adder are 69 bits wide.

When adding two large numbers, the result can require one additional bit. Thus, the output of the adder is 70 bits wide. The Sum Shifter (explained below) can shift the output two bits to the right, cutting the result down to 68 bits. This is still one bit larger than 64 bits with 3 rounding bits; the "extra" bit is supported by a few special-purpose registers, such as the tmpC register7 and the Skip Shifter.
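
The bit counts in this section can be tallied in a few lines of Python. The constant names are my own labels for the quantities described above, and the last check treats the inputs as plain unsigned values for simplicity.

```python
FRACTION_BITS = 64   # nominal size of the fraction (significand)
ROUNDING_BITS = 3    # Guard, Round, and Sticky
DOUBLING_BITS = 1    # room for the doubled B value
SIGN_BITS = 1        # room for a negated (two's complement) value

INPUT_BITS = FRACTION_BITS + ROUNDING_BITS + DOUBLING_BITS + SIGN_BITS
OUTPUT_BITS = INPUT_BITS + 1            # one more bit for the carry out of the top
AFTER_SUM_SHIFTER = OUTPUT_BITS - 2     # Sum Shifter drops two bits

# Largest possible result: two all-ones inputs plus a carry-in.
largest_input = (1 << INPUT_BITS) - 1
largest_sum = largest_input + largest_input + 1
```

Here `INPUT_BITS` comes out to 69, `OUTPUT_BITS` to 70, and `AFTER_SUM_SHIFTER` to 68, while `largest_sum.bit_length()` confirms that 70 bits are enough for any sum of two 69-bit inputs.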

The surrounding circuitry

The inputs and outputs of the adder are tied to some special registers and circuits. I'll leave a detailed explanation of this circuitry to another post, but I'll provide a brief description here.8 The adder has two inputs: one input is from the fraction bus and the other input is from the B register. The adder's output is stored in the Sum Register. To make multiplication faster, the 8087 uses radix-4 Booth multiplication, which multiplies by two bits at a time. The multiplier is stored in the Skip Shifter, a register that allows two bits to be shifted out at a time. Based on these bits, one of the values 2B, B, 0, or -B is added. (The -B path is also used for subtraction.) The adder's output is shifted right two bits by the Sum Shifter (not to be confused with the Skip Shifter) and stored in the Sum Register.
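
To give a feel for multiplying two bits at a time with the values 2B, B, 0, and -B, here is a Python sketch of one common recoding scheme. This is an assumption for illustration, not the 8087's confirmed algorithm: a bit pair of 3 is handled as -B with a carry into the next pair, and the function name and the `bits` parameter are made up. The sketch also shifts each addend left, whereas the hardware described above shifts the sum right.

```python
def radix4_multiply(b: int, multiplier: int, bits: int = 64) -> int:
    """Multiply b by an unsigned multiplier, consuming two multiplier bits per step."""
    assert bits % 2 == 0, "this sketch needs an even number of multiplier bits"
    product = 0
    carry = 0
    for step in range(bits // 2):
        pair = (multiplier >> (2 * step)) & 0b11
        digit = pair + carry          # ranges from 0 to 4
        carry = 0
        if digit >= 3:
            digit -= 4                # 3 becomes -1, 4 becomes 0 ...
            carry = 1                 # ... with the difference carried to the next pair
        # digit is now one of -1, 0, 1, 2, selecting -B, 0, B, or 2B.
        product += (digit * b) << (2 * step)
    # A carry left over after the last pair counts as one more B at the top.
    product += (carry * b) << bits
    return product
```

For example, `radix4_multiply(13, 11, bits=8)` works through four steps instead of eight and produces 143.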

The adder and associated registers. Based on the patent.

Division is implemented by repeated subtraction, addition, and shifting. The bits of the result are accumulated in the quotient register. The implementation of square root is similar to the pencil-and-paper long square root, except in binary. The skip shifter provides two bits from the left, which are appended to the right side of the adder input. A subtract or add takes place, similar to division, and the square root is formed in the B register.
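
One classic scheme that divides by repeated subtraction, addition, and shifting is non-restoring division. The Python sketch below shows that textbook algorithm on unsigned integers; whether the 8087 follows it step for step is an assumption on my part, and the function name is hypothetical.

```python
def nonrestoring_divide(dividend: int, divisor: int, bits: int) -> tuple[int, int]:
    """Divide unsigned integers, producing one quotient bit per step.

    Returns (quotient, remainder). The dividend must fit in `bits` bits.
    """
    assert divisor > 0
    remainder = 0
    quotient = 0
    for i in range(bits - 1, -1, -1):
        next_bit = (dividend >> i) & 1
        if remainder >= 0:
            # Shift in the next dividend bit and subtract the divisor.
            remainder = 2 * remainder + next_bit - divisor
        else:
            # The previous subtraction went too far: shift and add instead.
            remainder = 2 * remainder + next_bit + divisor
        # The quotient bit is 1 if the partial remainder is non-negative.
        quotient = (quotient << 1) | (1 if remainder >= 0 else 0)
    if remainder < 0:
        remainder += divisor      # final correction step
    return quotient, remainder
```

The point of the add-or-subtract choice is that an overshoot is never undone in a separate step; the next step compensates for it, so each quotient bit costs exactly one pass through the adder.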

Multiplication, division, and square root require multiple steps to process all the bits. For performance, this looping is implemented in hardware, not in microcode. These instructions require a lot of microcode to prepare the arguments, handle exponents, handle special cases, and store the results, but the inner loop is hardware.

Conclusions

The 8087 patent expresses the importance of the adder: "Ultimately, all arithmetical operations are reduced at one point to a binary addition." Thus, the performance of the adder is vital to the performance of the 8087. There are faster ways to add, such as the Kogge-Stone adder in the Pentium, but these approaches require much more hardware, too much for the constrained transistor count of the 8087. The 8087 balanced complexity against performance, using the Manchester carry chain with a carry-skip adder.

I plan to write more about the 8087; for updates, follow me on Bluesky (@righto.com), Mastodon (@[email protected]), or RSS. Thanks to the members of the "Opcode Collective" for their hard work, especially Smartest Blob and Gloriouscow.

AI statement: I didn't use AI to write this article; the em-dashes are natural (details).

Notes and references

  1. I hope it's clear how the XOR of the two input bits and the carry in each position produces the corresponding sum bit. It's similar to long addition with pencil-and-paper: in each column, you have the two digits that you're adding, along with the carry (0 or 1) from the column to the right. XOR—exclusive or—functions like one-bit addition but discarding the carry out. 

  2. The Intel 386 processor also uses a Manchester carry chain, which I described here.

  3. The 8087 uses NMOS transistors, unlike modern CMOS processors that use both NMOS and PMOS transistors. An NMOS transistor is much better at pulling a signal low than pulling a signal high. Thus, a frequent NMOS trick is to precharge a line high and then pull it low with a transistor; this is considerably faster than precharging a line low and pulling it high. This often requires a signal to be inverted, if 0 is the desired default value. 

  4. Strictly speaking, the 4-input carry-skip multiplexer isn't exactly a multiplexer since it is possible to have two inputs selected at the same time, such as propagate and skip. You might worry about a conflict if one selected input is 0 and the other selected input is 1. If the carry-skip input is selected, the carry from the carry chain will have the same value, since carry-skip is just an optimization. In the precharge case, both the Propagate and the +5V inputs are active; the Propagate inputs are rapidly pulled high, so again there is no conflict. 

  5. The carry-skip circuit uses a 5-input NOR gate. Since the inputs are all inverted, this is logically equivalent to a 5-input AND gate, testing if the four propagate signals are high and the carry-in is high. It's faster, however, to use a NOR gate in NMOS logic because the transistors are in parallel. This is another example of how the low level (using NMOS transistors) affects the higher-level circuitry. 

  6. Carry-skip is not used for the bottom three bits. The carry-in to the adder is controlled by bits in the microcode instruction; it can either be explicitly set or be set based on the B register sign to handle subtraction properly. 

  7. The fraction datapath has three temporary registers that are almost identical but have different sizes. tmpA and tmpB hold 64 bits, but tmpC holds 68 bits (including three rounding bits and one high-order bit).

    The tmpC register has circuitry for bit 63, but tmpA and tmpB do not.

    You can see the extra tmpC bits on the die. The photo above shows the high-order bits for the three registers. For the most part, the registers are mirror images of each other. But looking at the yellow box, tmpC has a NAND gate for bit 68, which is missing from tmpB and tmpA. At the low end (not shown), tmpC has three bits for rounding that are missing from the other registers. 

  8. The patent describes the arithmetic operations in some detail. See Section III (page 13). 

Powering up a module from the IBM 604: an electronic calculator from 1948

1948 was an interesting time for computing. For decades, businesses had used punch card equipment that added and sorted electromechanically. Now these electromechanical relays and counting wheels were being used to build room-filling general-purpose computers such as Harvard Mark I (1944) and IBM's SSEC (1948). But slow electromechanical mechanisms were already becoming obsolete. World War II had fostered the development of electronics and vacuum tubes for radio, radar, and navigation. Electronic technology was being used in massive electronic computers, such as Colossus (1943) and ENIAC (1946). The first stored-program computer, the Manchester Baby, was built in 1948.

The IBM 604 Electronic Calculating Punch behind a Type 521 Card Reader/Punch. Photo from IBM. Note the panels in the side of the 604 and in the front of the 521 to hold plugboards.

In the midst of these technological advances, IBM introduced the Electronic Calculating Punch, type 604.1 This system may seem like a step backward: it wasn't a computer, but a programmable calculator that performed a fixed set of operations.2 However, it was much smaller3 than a computer—about the size of a double refrigerator—and much cheaper: renting for $550 a month, it was affordable by businesses and universities. Since it used vacuum tubes, it was much more powerful than electromechanical equipment; it could do 60 operations in under a second, including multiplication and division. As a result, the IBM 604 became very popular, with over 5600 units produced. Moreover, IBM's experience with electronics in the 604 led to the success of its vacuum-tube computers in the 1950s.

One of the innovations of the 604 was the pluggable module, which combined a tube and its associated circuitry as shown below. The insulated handle was used to remove and install modules in the calculator. The nine pins at the bottom of the module plugged into a socket in the 604, with the sockets connected with backplane wiring. The tube was also socketed, so a bad tube could be quickly replaced. At the right, the resistors and capacitors are mounted on insulating wafers in the module.4

A thyratron tube module from the IBM 604 Electronic Calculating Punch.

The 604 used several different types of modules. This module has a thyratron tube, a special type of tube that acts as a high-current switch. I put this module in a circuit and powered it up. The video below shows the module controlling a light bulb. The first button sends a small signal to the module (center), turning it on and illuminating the bulb. As I'll explain below, a thyratron tube stays on until its power is cut off, which I did with the second button.

Pluggable modules may seem trivial, but they were an important innovation. Previously, vacuum tube equipment was typically built from a metal chassis with tubes mounted on the top and the other components, such as resistors and capacitors, mounted underneath. IBM developed a different approach: pluggable modules, where each module held a vacuum tube along with its associated components. These patented modules were dense, since they packed components in three dimensions. Moreover, by using a small set of standardized modules, the modules could be mass-produced and the computers assembled on a production line. Maintenance and repair were simplified; modules could be swapped to find the bad module, which was replaced with a spare. These modules were so important that IBM featured them in ads for the 604. IBM used tube modules in later vacuum tube computers, using larger eight-tube modules in the high-end 700-series computers.

An ad for the IBM 604, highlighting the pluggable modules. From Time magazine, March 31, 1952, page 65. Click this image (or any other) for a larger version.

Vacuum tubes and the thyratron

The IBM 604 used about 1250 vacuum tubes. While vacuum tubes come in many different types, a typical type is the triode. A triode is analogous to a transistor: a small input signal is amplified to control a much larger current. In a transistor, the control signal is applied to the gate, controlling the current between the source and drain. In a triode tube, the control signal is applied to the grid, controlling the current between the cathode and the plate.

The components of a triode vacuum tube. From IBM 604 Customer Engineering manual.

The diagram above shows the construction of a vacuum tube. The heater is a filament, very similar to an incandescent light bulb, that heats up the cathode to roughly 750 °C. At this high temperature, the cathode emits electrons. When a large positive voltage (say, 100 volts) is put on the plate, the negatively-charged electrons are attracted. The stream of electrons from the cathode to the plate causes a current to flow through the tube. The current is controlled by the grid: if a small negative voltage is placed on the grid, it repels the negative electrons, preventing them from reaching the plate and blocking the current through the tube.

A thyratron tube is similar to a vacuum tube, except it has a tiny bit of xenon gas inside, allowing it to handle higher current.7 Like a triode, the thyratron is controlled by the grid. However, when current starts to flow through the thyratron, the xenon ionizes and the xenon plasma carries current. Unlike a vacuum tube, the grid cannot stop the flow of current. Once the gas is ionized, a thyratron tube stays on until you remove its power5 and the gas deionizes in microseconds.6

You can see this behavior in the video. When I pushed the first button, a small control signal ionized the gas, turning the tube on. The large current through the ionized gas illuminated the light bulb. The light stayed on until I briefly cut the power with the second button; the gas deionized, turning off the tube.
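
This latching behavior can be summarized as a tiny state machine. The Python class below is only a conceptual model with names I made up; it ignores voltages, currents, and timing entirely.

```python
class Thyratron:
    """Conceptual model of a thyratron: the grid can turn it on but not off."""

    def __init__(self) -> None:
        self.plate_power = True
        self.conducting = False

    def set_grid(self, trigger: bool) -> None:
        # A trigger on the grid ionizes the gas, but only if the plate has power.
        if trigger and self.plate_power:
            self.conducting = True
        # Removing the trigger has no effect: the ionized gas keeps conducting.

    def set_plate_power(self, on: bool) -> None:
        self.plate_power = on
        if not on:
            # Without power, the gas deionizes and the tube turns off.
            self.conducting = False
```

In terms of the video, the first button corresponds to `set_grid(True)` followed by `set_grid(False)`, and the second button corresponds to `set_plate_power(False)` followed by `set_plate_power(True)`.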

The thyratron tube, type 2D21.

The photo above shows the thyratron tube, type 2D21, a miniature 7-pin tube.8 The plate is visible inside the tube, with the other components hidden by the plate. The dark stain at the top of the tube is the "getter", a reactive substance such as barium that absorbs impurities inside the tube.

In the 604, thyratron tubes drove relay coils and powered the electromagnets that punched holes in cards. Other IBM systems also used these thyratron tubes. For instance, the IBM 83 Card Sorter used thyratron tubes as short-term storage to keep track of which holes had been detected in a card.

Conclusion

The IBM 604 occupies an interesting position between electromechanical accounting machines and electronic computers. Although it had the speed of an electronic computer, it was still a calculator, lacking computer features such as loops, memory, and stored programs. Despite these limitations, the 604 was highly successful and led to other important IBM products.

IBM extended the 604 in 1949 so it could be programmed by punch cards in combination with plugboards; this was called the Card-Programmed Electronic Calculator. This system was still not quite a computer, but was very useful for scientific calculation at places such as Los Alamos National Labs (link). In 1953, IBM announced the successor to the 604, the IBM 650. Unlike the 604, the 650 was a programmable, general-purpose computer; it became the most popular computer of the 1950s.

Eric Schlaepfer (TubeTime) has a box of IBM 650 modules, which we hope to power up soon. For updates, follow me on Bluesky (@righto.com), Mastodon (@[email protected]), or RSS. Thanks to CuriousMarc for extensive milling work to build the socket and colorful breakout box to hold the module.

AI statement: Despite the presence of the em dash, no AI was used in the writing of this article (details).

Notes and references

  1. For information on the IBM 604, see the Operating Manual. The Customer Engineering Manual of Instruction explains the circuitry. See IBM's Early Computers for information on the development of the 604. For a detailed description of an application, see this petroleum engineering article, using the 604 to predict the profitability of an oil property. 

  2. The IBM 604 operated by reading numbers from a punch card, performing up to 60 operations, and punching the result onto the punch card. This was repeated for each card, processing 100 cards per minute. The IBM 604 was not a stored-program computer, so it didn't have code. Instead, the IBM 604 was programmed by plugging wires into plugboards. The plugboard below was inserted into the 604, while a second plugboard, twice as large, went in the card punch unit to control which columns of the 80-column punch card were read and punched.

    An IBM 604 plugboard. Photo from National Museum of American History, CI.328576. (Click for a larger image.)

    Looking at the plugboard above, the column on the left with the heading "PROGRAM" had a row for each programming step. A wire from that row was connected to the function to be performed on that step. The system supported conditionals: the operation that was performed on a step could be changed or skipped with the calculator selectors ("CALC. SEL.") on the right. (A selector was a relay that could send a signal along one of two paths, Normal or Transfer, based on a Control input.) For more information on the plugboards, see the Operator's Manual.

  3. The IBM 604 weighed 1310 pounds, while the attached 521 Card Reader/Punch weighed 670 pounds. The system used 5.5 kW of power. (Vacuum tubes are power-hungry; the module that I used required 3.75 watts for the heater alone.) 

  4. I reverse-engineered the MD7A thyratron module to create the schematic below. Black pin numbers are module pins (1-9), while red pin numbers are tube pins (1-7).

    Schematic of the IBM MD7A module, reverse-engineered.

    For my experiment, I powered the module with about 100 volts on the plate (pin 5). I used pin 3 of the module for the input, using about 8 volts to trigger the thyratron. Pin 4 is the output, pulled high when the thyratron fires. I connected the light bulb between pin 4 and ground (pin 6). I ignored pins 7, 8, and 9. 

  5. One disadvantage of a thyratron is that you need to remove its power to turn it off. In the 604, a mechanical cam in the card reader/punch activated a microswitch to turn off the power (details). Since the card reader/punch used cams on a rotating shaft for its timings, one more cam wasn't an inconvenience. 

  6. The behavior of a thyratron is very similar to the silicon-controlled rectifier (SCR). This semiconductor device is also called a thyristor, short for thyratron transistor. 

  7. The xenon pressure in the thyratron tube is very small, just 0.05 Torr, less than 1/10,000 of atmospheric pressure (source). Vacuum tubes, in comparison, have a much harder vacuum, with a pressure that is orders of magnitude lower, around 10⁻⁶ Torr.

    Some high-power thyratron tubes use mercury vapor, such as the ones inside a 1940s power supply that we examined. These tubes give off a blue glow when active. The xenon tube, in comparison, didn't emit any light that I could see, apart from the orange glow from the filament. 

  8. The pinout for the 2D21 thyratron tube is shown below, and the datasheet is here. Thyratrons use the same symbols as vacuum tubes, except the large black dot indicates the presence of gas in the tube.

    Symbol for the 2D21 thyratron tube. From IBM 604 Customer Engineering manual.

    As the symbol shows, the 2D21 tube has two grids, so it is technically a tetrode (four active elements). The second grid improves performance by screening the control grid from the cathode and the plate, reducing capacitance. (See Thyratrons for modern industry.) For my experiment, I ignored the screen grid. (The 604 also used some pentagrid tubes with a whopping five grids: two control grids, two screen grids, and a suppressor grid.) 

Microcode inside the Intel 8087 floating-point chip: register exchange

In 1980, Intel introduced the 8087 floating-point chip, a co-processor that made floating-point operations up to 100 times faster. This chip was highly influential, and today most processors use the floating-point standard introduced by the 8087.

The 8087 uses complicated algorithms to accurately compute functions such as square roots, tangents, and exponentials. These algorithms are implemented inside the chip in low-level code called microcode. I'm part of a group, the Opcode Collective, that is reverse-engineering this microcode. In this post, I take a close look at the microcode for one of the 8087's instructions—FXCH—and explain how the microcode works. The FXCH (Floating-point Exchange) instruction exchanges two floating-point registers. You might expect this instruction to be trivial, but there's more going on than you might expect; the microcode uses 14 micro-instructions to implement the exchange instruction.

The Intel 8087 chip is packaged in a 40-pin DIP (dual in-line package).

To explore the microcode, I opened up an 8087 chip and created a high-resolution image with a microscope. The large microcode ROM occupies a central position, holding the micro-instructions that control the chip. The microcode engine on the left steps through the microcode, handling jumps and subroutine calls. The bottom half of the chip is the "datapath", the circuitry that performs floating-point calculations; it is split into a 16-bit datapath for the number's exponent and a 64-bit datapath for the number's fractional part (also known as the significand).

Die of the Intel 8087 floating-point unit chip, with main functional blocks labeled. The die is 5mm×6mm. Click for a larger image.

This post focuses on the temporary registers and stack registers that are highlighted in red. The chip has two temporary registers and eight stack registers, each holding a number's exponent and fraction. Each register also has two tag bits that label the type of value in the register. The stack control circuitry at the right manages the stack, keeping track of the top-of-stack position as values are pushed onto the stack or popped off the stack.

The 8087's microcode

Executing an 8087 instruction such as arctan requires hundreds of internal steps to compute the result. These steps are implemented in microcode with micro-instructions specifying each step of the algorithm. (Keep in mind the two levels of instructions: the assembly language instructions used by a programmer and the undocumented low-level micro-instructions inside the chip.) The microcode ROM holds 1648 micro-instructions, implementing the 8087's instruction set. Each micro-instruction is 16 bits long and performs a simple operation such as moving data inside the chip, adding two values, or shifting data. I'm working with the Opcode Collective to reverse-engineer the micro-instructions and fully understand the microcode (link).

The 8087's micro-instructions are complicated, with many corner cases and ad hoc functions, but I'll provide a simplified overview. Each micro-instruction consists of 16 bits, as shown below. The first three bits specify the type of the micro-instruction, which controls the meaning of the remaining bits. The first type indicates a transfer operation, transferring data from one internal register to another. The two fields specify the source and destination for the data. The three unspecified bits are used for various special cases. Next is a shift operation, which uses the barrel shifter to shift a value left or right. The third type of micro-instruction uses the adder/subtractor. It can also be used in a loop for multiplication or division. Fourth are various arithmetic control micro-instructions that configure the adder, set rounding modes, and so forth. The far jump and far call micro-instructions perform a jump or subroutine call to a target micro-address in a fixed list. The condition field allows conditional jumps/calls based on numerous conditions, while the last bit inverts the condition. A local jump allows a conditional jump to a nearby micro-instruction. Finally, the miscellaneous micro-instructions range from returning from a subroutine or raising an exception to ending the microcode execution.

Structure of an 8087 micro-instruction.

How values are stored inside the 8087 chip

The 8087 supports a variety of data types: floating-point numbers of various sizes, integers, and binary-coded decimal. But internally, everything is stored as an 80-bit floating-point number. A number has three parts: a 64-bit significand (the fractional part), a 15-bit exponent, and a sign bit. The chip has two separate data paths: one for the significand, and one for the exponent and sign.
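As a rough illustration of the three fields, here is a minimal Python sketch that packs and unpacks an 80-bit value. It assumes the programmer-visible ordering (sign on top, then the exponent, then the significand); the function names are made up for this example, and the exponent bias of 16383 comes from the published extended-precision format rather than from anything above. How the chip physically arranges these bits is a separate question.

```python
SIGNIFICAND_BITS = 64
EXPONENT_BITS = 15

def pack80(sign, exponent, significand):
    """Combine sign, exponent, and significand into one 80-bit integer."""
    assert sign in (0, 1)
    assert 0 <= exponent < (1 << EXPONENT_BITS)
    assert 0 <= significand < (1 << SIGNIFICAND_BITS)
    return (sign << 79) | (exponent << 64) | significand

def unpack80(value):
    """Split an 80-bit integer into (sign, exponent, significand)."""
    sign = (value >> 79) & 1
    exponent = (value >> 64) & 0x7FFF
    significand = value & 0xFFFFFFFFFFFFFFFF
    return sign, exponent, significand

# The value 1.0: biased exponent 16383 and only the top significand bit set.
one = pack80(0, 16383, 1 << 63)
```

The fields add up to 1 + 15 + 64 = 80 bits, which matches the 80-bit register size.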

The chip has eight registers to store numbers during calculations, the top registers in the diagram below. However, the registers are organized in an unusual way: as a stack, with numbers pushed to the stack and popped from the stack. Instead of accessing, say, register #3, you might access the third register from the top of the stack, denoted ST(3); as values are pushed or popped, ST(3) changes. The stack-based architecture was intended to improve the instruction set, simplify compiler design, and make function calls more efficient, although it didn't work as well as hoped.
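The stack-relative addressing can be sketched in a few lines of Python. This is only a model: the class and method names are hypothetical, and it assumes the documented x87 convention that a push decrements the top-of-stack pointer, wrapping around modulo 8.

```python
class RegisterStack:
    """Toy model of eight physical registers addressed relative to a top pointer."""

    def __init__(self):
        self.regs = [None] * 8   # physical registers 0..7
        self.top = 0             # physical index of ST(0)

    def physical(self, i):
        """Map the stack-relative index ST(i) to a physical register number."""
        return (self.top + i) % 8

    def push(self, value):
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def pop(self):
        value = self.regs[self.top]
        self.regs[self.top] = None
        self.top = (self.top + 1) % 8
        return value

    def st(self, i):
        return self.regs[self.physical(i)]

stack = RegisterStack()
for v in ("a", "b", "c"):
    stack.push(v)
```

After the three pushes, ST(0) holds "c" and ST(2) holds "a"; popping one value makes "a" reachable as ST(1) instead, which is the shifting behavior described above.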

The register set of the 8087, as seen by the programmer. From 8086 Family Numerics Supplement.

Many 8087 instructions act on the top of the stack. For instance, the square root instruction replaces the value on the top of the stack with its square root. But what if you want to take the square root of a value in the middle of the stack? The solution is the FXCH instruction, the focus of this article. This instruction exchanges the value on the top of the stack with a specified stack position, providing access to values inside the stack.

One more feature of the 8087 is important to this discussion: each value in the register stack has an associated "tag" value, labeling it as valid, special, zero, or empty. A "normal" floating-point value is tagged as valid. If the floating-point value is infinity, Not a Number, or a denormalized value, then it is tagged as special. A zero value is tagged as zero. Finally, if a register is empty (e.g., its value has been popped off the stack), the register is tagged as empty. The 8087 uses tags to optimize performance and detect errors.1 For instance, if a programmer pops too many values from the stack and tries to read a stack register that is tagged empty, the 8087 raises an "invalid operation" exception.
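To make the four tag values concrete, the sketch below decodes the 16-bit tag word that the 8087 writes to memory when its state is dumped. The two-bit codes (00 valid, 01 zero, 10 special, 11 empty) are the ones Intel documents for the tag word, with two bits per physical register; the function name is hypothetical, and I am not claiming that the internal tag latches use the same encoding.

```python
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}

def decode_tag_word(tag_word):
    """Return the tag name for each of the eight physical registers.

    Register 0 is taken from the two low-order bits of the tag word.
    """
    return [TAG_NAMES[(tag_word >> (2 * reg)) & 0b11] for reg in range(8)]

# A freshly initialized chip has every register tagged empty.
tags_after_init = decode_tag_word(0xFFFF)
```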

The eight stack registers are visible to the programmer, but the 8087 also has temporary registers that it uses internally. Two of these temporary registers are important for this article: tmpA and tmpB. Like the stack registers, each temporary register is an 80-bit register, along with two tag bits.

The FXCH microcode

In this section, I'll explain how the microcode for the FXCH exchange instruction works. This instruction exchanges the top-of-stack register with the register at a specified position in the stack. If either register is empty, the instruction will raise an "invalid operation" exception and replace the missing value(s) with the special value "Not a Number" (NaN).

The microcode for the instruction is below, consisting of 14 micro-instructions.2 The first micro-instruction is a transfer, where the source is the top of stack value ST(0) and the destination is the temporary A register. The source specification causes the 64-bit significand to be placed on the fraction bus, the 16-bit exponent and sign to be placed on the exponent bus, and the two tag bits to be sent to the tag circuitry. The destination tmpA causes the bus values to be stored into the temporary register. Thus, the bits in the micro-instruction cause the desired transfer to take place. The third micro-instruction is similar, but uses a register inside the stack, ST(i), with the index specified in the machine instruction.

FXCH entry point:
#0200 ST(0) -> tmpA           Read top of stack
#0201 nop                     Wait a cycle
#0202 ST(i) -> tmpB           Read specified stack register
#0203 if !(tmpA or tmpB empty) jmp #0210 Jump if both registers exist
#0204 set invalid exception   Raise an invalid exception
#0205 if (unmasked) jmp #0213 If interrupt, end
#0206 if !(tmpA empty) jmp #0208 Check if tmpA is empty
#0207 NaN -> tmpA             If so, move NaN to tmpA
#0208 if !(tmpB empty) jmp #0210 Check if tmpB is empty
#0209 NaN -> tmpB             If so, move NaN to tmpB
The happy path and error path continue here:
#0210 tmpB -> ST(0)           Save tmpB to the top of stack
#0211 nop                     Wait a cycle
#0212 tmpA -> ST(i)           Save tmpA to the specified stack register
#0213 RNI                     End of routine: Run Next Instruction
#0214 nop                     Unused
#0215 nop                     Unused
#0216 nop                     Unused

Next, the relative jump at micro-address #0203 illustrates a different type of micro-instruction: the conditional jump. The micro-instruction specifies a condition, in this case, testing if either temporary register is empty. (That is, the hardware tests the tag bits associated with the temporary registers to see if either is the "empty" tag.) The micro-instruction has a bit set to invert the condition. Finally, the micro-instruction has an offset of +6, yielding the jump target #0210. The advantage of a relative offset over specifying a full micro-address is that the offset only requires six bits. (For more information on how conditions are evaluated, see my article Conditions in the Intel 8087 floating-point chip's microcode.)

If either register is empty, the next micro-instruction raises an "invalid" exception. As I'll explain in the next section, you can program the 8087 to either generate an interrupt on an exception or continue processing. The next instruction is a conditional jump that tests if the exception was "unmasked", indicating that an interrupt was generated. In this case, the microcode ends while the main 8086 processor handles the interrupt.

Assuming the interrupt was masked, the microcode now replaces empty values with the special Not a Number value, first checking tmpA and then tmpB. The source NaN causes circuitry to pull the exponent bus to all 1's and the fraction bus to all 0's, except for the top two bits. This particular bit pattern represents Not a Number.3
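The resulting bit pattern can be written out with a short Python sketch. It assumes that the 16 lines of the exponent bus carry the sign bit on top of the 15 exponent bits and that the result is read as a programmer-visible 80-bit value; the function name is made up for illustration.

```python
def real_indefinite_80():
    """Build the 80-bit NaN pattern: exponent bus all 1's, top two fraction bits set."""
    exponent_bus = 0xFFFF        # sign bit plus 15 exponent bits, all 1's
    fraction_bus = 0b11 << 62    # only the top two of the 64 fraction bits set
    return (exponent_bus << 64) | fraction_bus

nan_bits = real_indefinite_80()
```

In hexadecimal this is FFFF C000 0000 0000 0000: a nonzero significand with an all-1's exponent, which is what marks the value as Not a Number.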

At micro-address #0210, the empty-register path and the normal path join up to store the temporary registers in the stack registers. This is where the actual exchange operation happens, since tmpA and tmpB are written to the opposite stack positions from where they were read. Finally, RNI (Run Next Instruction) indicates the end of the microcode routine. This stops the microcode engine and gets the 8087 ready for the next instruction.
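The control flow of the routine can be mirrored in ordinary code. The Python sketch below is a behavioral model, not a translation of the micro-instruction bits: the stack is a plain list indexed from ST(0), each entry is a (value, tag) pair, the NaN placeholder stands in for the real bit pattern, and the function and constant names are invented for this example. The comments give the micro-address that each line corresponds to.

```python
EMPTY, VALID, SPECIAL = "empty", "valid", "special"
NAN = ("NaN", SPECIAL)   # placeholder for the Not a Number value

def fxch(stack, i, invalid_masked=True):
    """Exchange ST(0) with ST(i). Returns True if the invalid exception was set."""
    tmp_a = stack[0]                                  # #0200
    tmp_b = stack[i]                                  # #0202
    invalid = False
    if tmp_a[1] == EMPTY or tmp_b[1] == EMPTY:        # #0203 falls through
        invalid = True                                # #0204
        if not invalid_masked:                        # #0205
            return invalid                            # #0213, nothing written back
        if tmp_a[1] == EMPTY:                         # #0206
            tmp_a = NAN                               # #0207
        if tmp_b[1] == EMPTY:                         # #0208
            tmp_b = NAN                               # #0209
    stack[0] = tmp_b                                  # #0210
    stack[i] = tmp_a                                  # #0212
    return invalid                                    # #0213

stack = [(1.0, VALID), (2.0, VALID)] + [(None, EMPTY)] * 6
raised = fxch(stack, 1)
```

With two valid registers, the call swaps them and raises nothing. Exchanging with an empty register while the exception is masked leaves NaN in ST(0) and moves the old top-of-stack value into ST(i).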

The nop (no-operation) microcode instructions are interesting. Each pair of stack reads or writes has a nop in the middle, probably due to timing constraints on the registers. The end of the microcode routine has three nop instructions before the next microcode routine starts. These instructions appear to be wasted space in the microcode; maybe the FXCH microcode was shortened by three words during development, causing this gap.

Exceptions

The 8087 has a complicated exception system to handle a variety of problems. Exceptions fall into six categories: invalid operation, denormalized operation, zero divide, overflow, underflow, or precision. An invalid operation occurs, for instance, if you take the square root of a negative number or try to perform an operation on an empty register. An overflow exception occurs if a value is too large to be represented, while an underflow exception occurs if a value is too small. A zero divide exception happens if you divide by zero.4 A precision exception occurs if a number cannot be exactly represented as a floating-point number (which is extremely common). Finally, a denormalized exception occurs if a value is too close to zero to be represented with full accuracy.

What happens if an exception occurs? The 8087 allows the programmer to select exception behavior for each exception type. The first option is for an exception to trigger a CPU interrupt, so the software can handle the problem. For instance, the software could attempt to work around the problem, log an error, or simply terminate the program. Alternatively, the programmer can "mask" an exception. In this case, the 8087 continues the operation in a "reasonable" way. For instance, an overflowed value would be set to infinity, while an invalid value would be set to the special value: "Not a Number" (NaN). For a precision exception (e.g., 1/3), the value is rounded. The designers of the 8087 put a lot of effort into continuing after a masked exception in the best way; the manual has pages of details on all the special cases.5

Handling of exception conditions is split between microcode and hardware. For example, if the FXCH microcode detects an empty register, it executes a set invalid exception micro-instruction. This micro-instruction sets a latch indicating the invalid exception. The 8087's control register includes six mask bits, one for each type of exception, blocking interrupts for that exception type. The hardware combines the exception flip-flop signals with the mask bits in the control register and the exception flags in the status register to see if a new, unmasked interrupt has been triggered. If so, the 8087 circuitry sends an interrupt to the 8086 processor.

On the other hand, if the interrupt is masked, execution of the microcode continues. In the case of FXCH, the microcode replaces empty registers with the Not a Number value. Finally, the microcode routine ends with RNI (Run Next Instruction). This triggers many hardware activities, but the relevant one is copying the state of the exception flip-flops into the status register. This sets the exception bit if the programmer wants to check it. The exception flip-flops are cleared when the next 8087 instruction starts. Since the hardware manages the flip-flops, status register, control register, and interrupt line, the microcode can be simpler and smaller.
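The split between flip-flops, mask bits, and status flags can be summarized with some bit arithmetic. This Python sketch uses the documented bit positions of the six exception types in the control and status words (invalid in bit 0 through precision in bit 5). It is a simplification: it ignores whatever role the existing status flags play in deciding whether an interrupt is "new", and it models the copy at RNI as a bitwise OR because the status flags are sticky. The function names are hypothetical.

```python
# Bit positions shared by the control word masks and the status word flags.
INVALID, DENORMAL, ZERO_DIVIDE, OVERFLOW, UNDERFLOW, PRECISION = (
    1 << n for n in range(6)
)

def unmasked_exceptions(pending, mask_bits):
    """Exceptions latched during this instruction that are not masked."""
    return pending & ~mask_bits & 0x3F

def status_after_rni(status_flags, pending):
    """Model of RNI folding the exception flip-flops into the status register."""
    return status_flags | pending

# FXCH on an empty register with every exception masked: no interrupt,
# but the invalid flag shows up in the status register afterwards.
interrupt = unmasked_exceptions(INVALID, 0x3F) != 0
status = status_after_rni(0, INVALID)
```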

Extracting the microcode

The 8087's microcode ROM contains 26,368 bits, specifying 1648 16-bit micro-instructions. At the time, this was a very large ROM; in order to fit it on the die, Intel used a special type of ROM that held two bits per transistor, twice the capacity of a standard ROM. This ROM is semi-analog, using four sizes of transistors to produce four voltage levels. Comparators convert the voltage level to a pair of bits.
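The arithmetic and the two-bits-per-transistor idea can be checked with a small Python sketch. The assignment of transistor sizes to bit pairs below is purely hypothetical (the real assignment depends on the comparator circuitry), as are the names; only the counts come from the text.

```python
# Hypothetical mapping from transistor size class (0 = smallest) to a bit pair.
SIZE_TO_BITS = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}

def decode_transistors(sizes):
    """Expand a sequence of size classes into twice as many bits."""
    bits = []
    for size in sizes:
        bits.extend(SIZE_TO_BITS[size])
    return bits

MICRO_INSTRUCTIONS = 1648
BITS_PER_WORD = 16
total_bits = MICRO_INSTRUCTIONS * BITS_PER_WORD   # 26,368 bits
transistor_sites = total_bits // 2                # two bits per transistor
```

Under these assumptions, the ROM needs 13,184 transistor sites rather than the 26,368 that a one-bit-per-transistor ROM would need.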

A close-up of the 8087's microcode ROM, showing 77 transistors. A transistor is formed where a vertical polysilicon line crosses a horizontal stripe of doped silicon.

To extract the microcode, I took high-resolution images of the ROM after dissolving the metal layer. Gloriouscow used a neural network to categorize the size of each transistor. (You can explore the full image and transistors here.) The next step was determining how to map the transistors to bits. You might expect that the grid of transistors corresponds to the grid of microcode bits. But due to various hardware optimizations, rows and columns are shuffled and mirrored, which I sorted out by studying the circuitry. The result was the microcode, expressed as a table of 0's and 1's.

The next step was assigning meaning to the microcode. For the 8086 processor, the patent provided a lot of detail on the structure of the microcode and the hardware, but the 8087 patent didn't explain the microcode. Instead, we figured out the micro-instructions through a combination of examining the circuitry, looking for patterns in the microcode, and thinking about how instructions might be implemented.

Microcode is usually complicated, and the 8087 is worse than most. The 8087 was on the edge of what was possible at the time, so the designers resorted to special cases and hacks where necessary. For instance, some conditional jumps have side effects such as updating registers. Other instructions set flip-flops that change the behavior of later operations. We're still working to completely understand the micro-instructions at the hardware level.

I plan to continue reverse-engineering the 8087 microcode; for updates, follow me on Bluesky (@righto.com), Mastodon (@[email protected]), or RSS. I've been working on this with the members of the "Opcode Collective", especially Smartest Blob and Gloriouscow, who converted the ROM images to microcode data and extensively analyzed the contents. See the 8087 repository on GitHub for more.

AI statement: Despite the presence of the em dash, no AI was used in the writing of this article (details).

Notes and references

  1. Tags are normally invisible to the programmer, but can be accessed through special operations. A programmer can access the 8087 tags by dumping the 8087's state to memory; the tags are stored in a 16-bit "tag word". 

  2. The raw 8087 microcode is available here, decoded by Smartest Blob. I've modified the microcode format for clarity. 

  3. The 8087 indicates a bad value with a special "Not a Number" (NaN) value. The system allows many different representations of NaN: any value with an exponent of all 1's, a nonzero significand (a zero significand indicates infinity), and either sign. For an invalid operation, the 8087 uses one particular NaN value, called real indefinite. For an internal 80-bit real number, this value has the top two bits of the significand set internally, and the rest zero, while the exponent bits and sign are all set. (See pages 87 (S-73) and 90 (S-76).) A 32-bit or 64-bit real uses a slightly different bit pattern for NaN; these number formats have an implied "1" bit for the significand, so only one bit of the significand is explicitly set for the real indefinite NaN. 

  4. Dividing by zero usually causes a zero divide exception, but 0 ÷ 0 causes an invalid operation exception, while infinity ÷ 0 is valid, yielding infinity. Just one reason why the microcode is so complicated. 

  5. For more information on the 8087's exceptions, see 8086 Family Numerics Supplement. The exception system is described on page 32 (S-18). The exception flags and exception masks are described on page 24 (S-10). The details of exception handling are described on page 89 (S-75). 

Reverse engineering circuitry in a Spacelab computer from 1980

Spacelab was a reusable laboratory that could be carried in the cargo bay of the Space Shuttle, providing lab space for astronauts and experiments. Spacelab was controlled by a French-built minicomputer, called the Mitra 125 MS. Unlike modern computers, this computer didn't contain a microprocessor chip. Instead, its 16-bit processor was constructed from several boards of chips. In this article, I reverse-engineer one of the processor boards, shown below, part of the computer's Arithmetic/Logic Unit (ALU).

The Mitra 125 MS computer, built by CIMSA, with one of the ALU/register cards shown.

Spacelab consisted of a pressurized cylindrical laboratory that held experiments, computers, and work areas for researchers. A tunnel connected the laboratory to the Shuttle, allowing researchers to move between the Shuttle and Spacelab. Spacelab also supported up to five unpressurized "pallets" that were exposed to space, holding experiments such as telescopes and sensors. The illustration below shows the tunnel, the Spacelab laboratory, and a pallet installed in the Shuttle's cargo bay.1

Illustration of the Spacelab-3 mission. From NASA.

Because Spacelab was a European project, it used a European computer, the Mitra 125 MS. The Mitra line started in 1971 when a French company called CII introduced the Mitra 15 minicomputer, a 16-bit computer that used magnetic core memory. Mitra is a French acronym2 that translates as "Mini-machine for Real-Time and Automatic Computing." As the name suggests, Mitra was both small and designed for real-time computing, making it suitable for controlling experiments. The Mitra 15 was a popular computer, with almost 8000 units sold.

In 1975, CII produced a successor called the Mitra 125. The Mitra 125 improved on the Mitra 15 by adding memory management, I/O processors, higher performance, and additional instructions. Spacelab used the Mitra 125 MS minicomputer,3 a militarized variant of the Mitra 125 that was produced by a company called CIMSA. A Spacelab mission had three of these computers: the Subsystem Computer controlled and managed Spacelab itself, while the Experiment Computer handled the experiments. A Backup Computer could take over if either computer failed.1 These computers were part of Spacelab's Command and Data Management Subsystem, which controlled experiments and collected data.4

The three computers were normally mounted in the Spacelab laboratory underneath the Work Bench Rack (details). The computers were controlled through a keyboard and a color CRT display, called the Data Display System (DDS). The computer installation and a DDS are visible in the photo below.

This photo shows astronauts inside Spacelab (but not in space). The Spacelab computers were mounted under the Work Bench (right arrow). The Data Display System (left arrow) provided the interface to the computers. Photo is STS-51B Crew Portrait, 1984.

For some Spacelab missions, the laboratory was omitted entirely, providing more room for experiment pallets. In this case, the computers were mounted in a small pressurized cylinder called the igloo. The researchers remained in the Shuttle, controlling experiments through two Data Display Systems that were mounted in the Shuttle's rear flight deck (photo).

The 74181 ALU chip

The Spacelab computer didn't use a microprocessor chip. Instead, like most minicomputers at the time, it was built from simple integrated circuits that were combined to implement the computer's circuitry. Unlike modern CMOS integrated circuits, these chips contained bipolar transistors, which were fast, but large and power-hungry, a technology known as TTL (transistor-transistor logic). Electronics hobbyists of a certain age will recall the popular 7400 series of TTL chips. The Spacelab computer was built from the military grade of these chips, the 5400 series.

The most complex chip in the computer was probably the '181 Arithmetic/Logic Unit (ALU) chip, containing about 170 transistors. The arithmetic/logic unit is the heart of a computer, performing arithmetic operations as well as Boolean logic operations. In 1970, Texas Instruments put a complete 4-bit arithmetic/logic unit on a single chip, called the 74181. Since the chip was fast, compact, and inexpensive, it was widely used, providing the ALU in computers from the popular PDP-11 and Xerox Alto to the powerful VAX-11/780 "superminicomputer".

The 74181 provides a full set of binary logical operations, including AND, OR, XOR, and complement. For arithmetic, it includes addition, subtraction, incrementing, and decrementing.5 Inconveniently, the 74181 doesn't support shifting right. Moreover, multiplication and division were much too complicated to be included in the 74181. Instead, a processor implemented multiplication and division through repeated addition or subtraction, combined with shifting. Likewise, floating-point operations were way beyond the capability of the 74181, but a processor could use the 74181 when performing the steps of a floating-point operation.
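As an illustration of multiplication through repeated addition and shifting, here is the textbook shift-and-add algorithm in Python. This is a generic sketch with a made-up function name, not the Mitra's actual microcode; the only operations it needs from the ALU are an add and a way to shift the operands.

```python
def multiply_16x16(a, b):
    """Multiply two unsigned 16-bit values with 16 add-and-shift steps."""
    assert 0 <= a < (1 << 16) and 0 <= b < (1 << 16)
    product = 0
    multiplicand = a
    for _ in range(16):
        if b & 1:                       # low multiplier bit selects an add
            product = (product + multiplicand) & 0xFFFFFFFF
        multiplicand <<= 1              # shift multiplicand left for the next bit
        b >>= 1                         # shift multiplier right
    return product

largest = multiply_16x16(0xFFFF, 0xFFFF)
```

The largest possible product, 0xFFFF times 0xFFFF, is 0xFFFE0001, which needs all 32 bits.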

Although the 74181 only handled four bits, multiple 74181 chips could be combined to handle larger words, such as 16 bits or 32 bits. To handle carries, the chips could be chained together, with the carry-out from one chip fed into the carry-in of the next chip. This approach was simple but slow, since the carry had to "ripple" through all the chips before the answer could be obtained. The carry process could be sped up by using a carry-lookahead chip called the 74182, which speeds up addition by computing the carries from four 74181 chips (i.e., 16 bits) in parallel.
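The difference between rippling and lookahead can be sketched in Python. Each 4-bit slice reports two signals: "generate" (the slice produces a carry on its own) and "propagate" (the slice passes an incoming carry through). The lookahead logic then computes every slice's carry-in directly from those signals, without waiting for the slices' sums. This is a logical model with hypothetical names; the real '181 and '182 use active-low signals and gate-level logic that I am not reproducing here.

```python
def slice_signals(a4, b4):
    """Generate and propagate signals for one 4-bit slice."""
    generate = (a4 + b4) > 0xF          # carry out even with no carry in
    propagate = (a4 | b4) == 0xF        # an incoming carry passes straight through
    return generate, propagate

def lookahead_carries(g, p, c0):
    """Compute the carry into each of four slices, plus the final carry out."""
    c1 = g[0] or (p[0] and c0)
    c2 = g[1] or (p[1] and g[0]) or (p[1] and p[0] and c0)
    c3 = (g[2] or (p[2] and g[1]) or (p[2] and p[1] and g[0])
          or (p[2] and p[1] and p[0] and c0))
    c4 = (g[3] or (p[3] and g[2]) or (p[3] and p[2] and g[1])
          or (p[3] and p[2] and p[1] and g[0])
          or (p[3] and p[2] and p[1] and p[0] and c0))
    return [c0, c1, c2, c3], c4

def add16(a, b, carry_in=False):
    """Add two 16-bit values using four 4-bit slices and lookahead carries."""
    na = [(a >> (4 * i)) & 0xF for i in range(4)]
    nb = [(b >> (4 * i)) & 0xF for i in range(4)]
    signals = [slice_signals(x, y) for x, y in zip(na, nb)]
    g = [s[0] for s in signals]
    p = [s[1] for s in signals]
    carries, carry_out = lookahead_carries(g, p, carry_in)
    result = 0
    for i in range(4):
        result |= ((na[i] + nb[i] + int(carries[i])) & 0xF) << (4 * i)
    return result, carry_out
```

For example, add16(0xFFFF, 1) gives a zero sum with a carry out: the lowest slice generates a carry and the other three slices propagate it, so all four carries are known at once.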

The Mitra's ALU/register boards

The Spacelab computer used eight '181 ALU chips to implement a 32-bit adder.6 (Specifically, these chips are the 54S181, a variant of the 74181: "54" indicates that the chips handle the military temperature range, and "S" indicates that the chip is built from high-speed Schottky logic.) However, the ALU boards required numerous additional chips. Depending on the instruction, eight different inputs could be selected for the ALU. Chips called multiplexers selected the desired value, requiring 32 multiplexer chips. Three 32-bit registers provided storage for ALU inputs and outputs, requiring 24 chips. Two 54S182 carry-lookahead chips provided fast carry computation. Finally, some simple logic chips (inverters and NAND gates) tied things together.

Due to the number of chips required, the ALU/register circuitry was spread across three boards, as shown below. (I reverse-engineered the board on the right.7) The '181 chips are immediately visible as they are much larger than the other chips; they have 24 pins, compared to 14 or 16 pins for the other chips. The first board has two '181 chips, while the last two boards each have three '181 chips. The last two boards are similar, but not identical.

The three ALU/register boards from the Spacelab computer. Click this image (or any other) for a larger version.

Finding a 32-bit ALU was a surprise to me, since the computer is a 16-bit computer. The expanded ALU was probably implemented to improve performance. Multiplying two 16-bit numbers yields a 32-bit result, so a 32-bit ALU makes multiplication faster. Moreover, the computer supports 32-bit floating-point numbers, so the 32-bit ALU presumably makes floating-point operations faster.

The diagram below shows the architecture of the computer's 32-bit ALU system. In the middle is the ALU itself, operating on two 32-bit operands: A and B. At the left, multiplexers ("mux") select one of four values for A and one of four values for B. At the right, the output of the ALU can be stored in three 32-bit registers, or sent to the rest of the computer via the bus. The first two registers are shift registers, allowing the value to be shifted left or right, while the third register simply holds the value in flip-flops. The first two registers are connected by buses to the rest of the computer, while the value of the third register can only be accessed by using it for another arithmetic operation.8 I suspect that the shift registers are used for multiplication and division to shift the arguments at each step.

Block diagram of the ALU/register board.

The inputs to the multiplexers provide flexibility. For instance, you can add register 1 to a number from the bus, or add register 2 shifted to the right to register 3. (Note that this shifting is implemented by wiring the inputs to the multiplexer shifted left or right, completely separate from the shift register's shifting.) The "all 1's" input is either a zero input (with negative logic), or -1 (in two's-complement). The B input can be taken from the bus, allowing the value to come from memory or from a general-purpose register. The mix input is a jumble of signal lines, register bits, a shift register input, and a pull-up with no apparent pattern. I describe a few more mysteries in the footnote;9 presumably, the mysteries would be resolved if I reverse-engineered the whole computer.

The functions of the multiplexers, ALU chips, and registers depend on what instruction is being executed. Specifically, the computer's microcode engine generates control signals for the computer, including the ALU/register boards. Some of these control signals select which multiplexer inputs are used. Other control signals select the ALU's function. Finally, control signals select which register receives the ALU's output.

The board that I reverse engineered implements 12 of the 32 bits of the ALU and registers. The image below shows the role of each chip on the board. The three 4-bit ALU chips are indicated 2, 1, and 0. Each ALU chip has two multiplexer chips to select the four A input bits and two multiplexer chips to select the four B input bits.10 Thus, there are 12 multiplexer chips on the board. The three 12-bit registers A, B, and C are each implemented with three 4-bit chips. Three hex inverter chips and a 4-input NAND chip complete the board.11

The ALU/register board with the chips labeled.

These printed-circuit boards (PCBs) have some interesting features. In most electronics, circuit boards have holes only where they are needed, but the Spacelab boards have holes in a fixed grid pattern. (IBM used similar boards in its System/360 computers in the 1960s.12) A hole can hold an IC pin or other component. Or a hole can be used as a via, connecting PCB traces on different layers. Another interesting feature of the boards is the vertical metal bars underneath the integrated circuits. These bars carry heat away from the integrated circuits.

The PCB traces are more visible on the back of the board (below). The traces are thin enough that two traces can pass between a pair of holes. Note the yellow "bodge" wires, correcting errors on the circuit board. I assume that these errors were fixed for the computers used in flight.

Back of an ALU/register board. This is a different board from the one I reverse engineered, since I wanted to show the yellow wires.

Each board has a 96-pin connector at the bottom, which plugs into the computer's motherboard. Note the three cylindrical pins sticking out of the connector. These pins are keyed to ensure that a board can only be plugged into the correct slot. That is, each pin has a metal tab oriented in one of six directions. On the motherboard, the connectors have corresponding notches. If the tabs and the notches don't match up, the board can't be plugged in.

A close-up of the connector, showing the keying. Also note that the zig-zag pin numbering on the left changes to an irregular numbering on the right. Unexpectedly, pin 52 is between pins 49 and 51, for example.

The boards in the Spacelab computer are dense, tightly packing integrated circuits to minimize the size of the computer. However, the boards are considerably less dense than American aerospace computers. In particular, the Spacelab computer used the same integrated circuit packages that were used in consumer electronics: through-hole DIPs (dual in-line packages with two rows of pins). In contrast, IBM's line of 4 Pi aerospace computers used "flat-pack" integrated circuits that were considerably smaller and thinner (details). As a result, IBM's double-sided circuit boards could hold 156 integrated circuits compared to 30 on a single-sided Mitra board of roughly the same size.

A brief history of the French computer industry leading up to this computer

Bull is one of France's earliest computing companies, created in 1931. Bull initially sold punch-card equipment, competing with IBM. By the 1960s, Bull was a major computer company with products such as the transistorized Gamma 60 computer, a large-scale mainframe that was said to be the first system specifically designed for parallel and multiprogramming. Unfortunately, Bull had difficulty competing with IBM, its stock collapsed, and Bull was acquired by General Electric in 1964, forming Bull-GE. The collapse and controversial takeover were a blow to the French computer industry, and the incident was dubbed the Affaire Bull. To make things worse, GE soon canceled two of Bull's computers, focusing instead on GE's computer line.

The Affaire Bull was not only an affront to French pride, but an indication that France was largely dependent on the US for computer technology. A second incident revealed the critical military consequences of France's weakness. In the early 1960s, France was attempting to improve its nuclear strength by developing a hydrogen bomb. The mathematics of fusion is computationally intense, so France attempted to buy powerful American computers: the CDC 6600 supercomputer and the IBM 360/92.13 However, the US government blocked the export of these computers to France in an attempt to limit nuclear proliferation.

These problems led French president Charles de Gaulle to decide that France needed a strong computer industry of its own. In 1966, he developed a plan for computing (Plan Calcul)14, where the French government would reorganize the computer industry, picking companies to lead in each sector from minicomputers to semiconductors.

In the minicomputer sector, the government created a company called CII by combining three French computer companies: SEA, CAE, and SETI. CII was primarily owned by a large French company called Thomson-CSF (now Thales).15 CII played a key role in the Spacelab computer, since CII developed the Mitra line of computers. In the mid-1970s, CII and the American company Honeywell merged, with the computer division spun off to form a new company called SEMS, with majority shareholder Thomson. Another Thomson subsidiary, CIMSA, focused on military electronics and produced the militarized versions of the Mitra line. In particular, CIMSA produced the computer for Spacelab.16

France's Plan Calcul is generally viewed as a failure. Despite expensive subsidies, the French computer industry remained weak and unable to escape American dominance. When Giscard d'Estaing was elected president of France in 1974, he ended Plan Calcul. There are various interpretations, such as the failure of government planning versus the free market, but my view is that in the 1960s and 1970s, IBM crushed most challengers in the computer industry, both American and foreign, so Plan Calcul didn't have a chance. As for Bull, the company went through a dizzying sequence of American takeovers and nationalizations by France.17 Just two months ago (March 2026), the company was reacquired by the French government.

Replacement by the IBM AP-101SL computer

Since Spacelab was a European project, using a European computer was a point of pride. Unfortunately, the French computers were eventually replaced by IBM computers due to performance needs and, undoubtedly, political factors.

During the Space Shuttle program, the computers on the Shuttle and in Spacelab became obsolete as computer technology rapidly advanced. Although the computers were originally considered powerful, their performance and memory capacity became problems over time. The Space Shuttle's IBM AP-101 computers were upgraded to IBM AP-101S computers, first flying in 1991. The AP-101S was half the size, three times faster, and had more than twice the memory, using semiconductor memory instead of magnetic core memory.

The Spacelab computer system needed a similar upgrade, and in 1991, the CIMSA computers on Spacelab were replaced with IBM AP-101SL computers. The AP-101SL was based on the Shuttle's upgraded AP-101S computer, but modified to support the Mitra's hardware architecture, instruction set, and I/O capabilities. The packaging of IBM's computer was slightly changed to match the dimensions of the CIMSA computer and to use an external heat exchanger rather than an internal heat exchanger.

The IBM AP-101SL Spacelab computer. The circuit boards are much larger than the original Spacelab computer boards or the original AP-101B boards. Note the flat-pack ICs on the boards. Photo courtesy of Kyle Owen.

Changing the Shuttle's 32-bit AP-101S computer to run the 16-bit Mitra instruction set was easier than you might expect, since the AP-101S already supported multiple instruction sets: a 32-bit instruction set derived from the IBM System/360 and a 16-bit instruction set called 1750A that was an Air Force Standard. Because the AP-101S implemented its instructions in microcode—low-level software that specified the steps of a machine instruction—the instruction set could be modified by updating the microcode.

I compared the circuit boards in an AP-101S with the boards in an AP-101SL to quantify the changes. The semiconductor memory boards and power supplies were essentially identical. The CPU boards had minor changes. Unsurprisingly, the I/O boards were completely different, and the complex I/O Processor (IOP) in the Shuttle's AP-101S was omitted. For more on the IBM AP-101 line, see my History of IBM's 4 Pi computers.

Conclusions

The Spacelab computer provides an interesting look at how computers were built before microprocessors took over. The components of a computer, such as the ALU, registers, and control circuitry, were constructed from simple chips. Since each chip didn't do much, the computer required 36 boards full of chips. Even so, the computer was compact enough to go into space. By modern standards, these computers aren't much—each computer had a memory capacity of just 128 KB of magnetic core memory—but they played a critical part in the space program.

I'm not going to reverse-engineer the full computer, but I may write some more about it. For updates, follow me on Bluesky (@righto.com), Mastodon (@[email protected]), or RSS.

Credits: Thanks to Steve Jurvetson for providing the Spacelab computer for examination. Thanks to Poul-Henning Kamp for comments.

AI statement: Despite the presence of the em dash, no AI was used in the writing of this article (details).

Notes and references

  1. For details on Spacelab, see Spacelab News Reference

  2. To avoid cluttering the main article, I'll summarize the French acronyms and companies in this footnote.

    • CAE: Compagnie européenne d'automatisme électronique (European Electronic Automation Company). A French computer company founded in 1960, selling versions of American computers such as TRW's RW-300. Part of the 1966 merger that formed CII.
    • CII: Compagnie internationale d'informatique (International Computer Company): the company that created the Mitra line of minicomputers. CII also sold computers designed by the American company SDS (Scientific Data Systems), which was bought by Xerox in 1969 and became XDS (Xerox Data Systems). XDS was shut down in 1975, costing Xerox hundreds of millions of dollars.
    • CIMSA: Compagnie d'informatique militaire, spatiale et aéronautique (Military, Space, and Aeronautical Computing Company): the company that manufactured the Spacelab computer.
    • CSF: Compagnie Générale de Télégraphie Sans Fil (General Wireless Telegraphy Company). A radio company dating back to 1918. It merged with Thomson in 1968 to form Thomson-CSF.
    • MATRA: Mécanique Aviation Traction (Mechanics-Aviation-Traction). An electronics company that was the contractor for Spacelab's data systems.
    • Mitra: Mini-machine pour l'Informatique Temps Réel et Automatique ("Mini-machine for Real-Time and Automatic Computing"). A line of minicomputers.
    • SEA: Société d'électronique et d'automatisme (Electronics and Automation Company): a French computer manufacturer, started in 1947 and merged into CII in 1966.
    • SEMS: Société Européenne de Mini-informatique et de Systèmes (European Society for Minicomputers and Systems). A subsidiary of Thomson, created by the French government in 1976 during the merger of CII and Honeywell. SEMS took over the manufacturing of Mitra computers from CII.
    • SETI: Société européenne de traitement de l'information (European Information Processing Society). SETI was a French computer company formed in 1961. The American computer company Packard Bell owned a quarter of SETI, and SETI sold the desk-sized Packard Bell 250 computer.
     

  3. On the ground, the Spacelab project used Mitra 125 S computers that were functionally identical to the Mitra 125 MS (details). 

  4. Spacelab's Command and Data Management Subsystem (CDMS) is surprisingly complicated because of the data communication paths between Spacelab, the Shuttle, and the ground. Moreover, multiple units store, encode, and decode data. In the CDMS block diagram below, I've highlighted the three computers; they are just a small part of the CDMS. See Section 3.5 of Spacelab News Reference or The Command and Data Management System of Spacelab for details on CDMS.

    A block diagram of Spacelab's Command and Data Management Subsystem. From The Command and Data Management System of Spacelab. Click for a larger version.


  5. I reverse-engineered the 74181 ALU chip in one article and explained the motivation for its quirky set of operations in another.

  6. Another board in the Spacelab computer has four 74S181 chips implementing a 16-bit ALU. My guess is that this board is part of the I/O processor. The board has the cryptic label "HMSG". 

  7. My reverse-engineering process was straightforward but tedious. I used a multimeter to beep out the connections between the integrated circuits as well as the connections to the connector. (Unlike many systems that I look at, these boards didn't have conformal coating, which made beeping out the connections practical.) I created a schematic in KiCad from this data; this schematic was "physical", with the layout of the chips and pins matching their physical location on the board. Next, I converted the integrated circuit symbols from physical rectangles to logical symbols. Finally, I moved the symbols around on the schematic to make a reasonable schematic. (I had to go back and beep out more connections as I discovered errors or missing connections.) Theoretically, I could reverse-engineer the entire computer, but reverse-engineering one of the 36 boards is enough for me. 

  8. My full reverse-engineered schematic of the ALU/register board is below. Click for a larger version.

    Schematic of the ALU/register board.


  9. A few mysteries remain in the ALU/register board. The three registers probably act as an accumulator, a temporary register, and an extra register for multiplication/division, but it's not clear which register is which. I don't understand why the inputs are organized as they are; for instance, you can't add register 1 to register 2 shifted. The mix input seems very random; maybe these signals are part of a self test? On the board, I expected to see 12 bits out of a uniform 32-bit ALU. However, the top two 4-bit "nibbles" have different control lines and different zero-detection from the third. Perhaps this is because the Mitra floating-point numbers have 24 bits of mantissa and 8 bits of exponent. It would make sense for the ALU/register board to handle these parts separately. Another mystery is that the board has a circuit to test two hardwired bits and two external bits to see if they are all 0 or all 1, for some reason. 

  10. The multiplexer chips are dual 4-to-1 multiplexers. Thus, two multiplexer chips are required to support four bits. 
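    The chip count can be illustrated with a minimal sketch. It assumes that both chips are wired to the same two select signals and it ignores the strobe (enable) inputs; the function names are hypothetical and are not taken from the board.

```python
def mux4(inputs, select):
    """One 4-to-1 multiplexer: route one of four input bits to the output."""
    if len(inputs) != 4 or select not in range(4):
        raise ValueError("need four inputs and a select value of 0-3")
    return inputs[select]


def dual_mux_chip(inputs_a, inputs_b, select):
    """A dual 4-to-1 multiplexer chip: two multiplexers sharing the select lines."""
    return mux4(inputs_a, select), mux4(inputs_b, select)


def select_nibble(sources, select):
    """Choose one of four 4-bit sources using two dual-multiplexer chips.

    Each chip handles two bit positions, so two chips cover four bits.
    """
    # bits[i] holds bit i of each of the four sources.
    bits = [[(s >> i) & 1 for s in sources] for i in range(4)]
    bit0, bit1 = dual_mux_chip(bits[0], bits[1], select)  # first chip
    bit2, bit3 = dual_mux_chip(bits[2], bits[3], select)  # second chip
    return bit0 | (bit1 << 1) | (bit2 << 2) | (bit3 << 3)
```

    For example, `select_nibble([0b1010, 0b0101, 0b1111, 0b0000], 1)` passes the second source through unchanged.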

  11. The chips in the Spacelab computer use a variety of part number systems. A few chips have standard industry part numbers such as "SNJ5483" (equivalent to a 7483 adder). Most of the chips are labeled with military part numbers such as JM38510/07801 BJB, using the MIL-M-38510 standard. These part numbers can be cross-referenced using the MIL-HDBK-983 handbook. Other chips, like the ones below, have Fairchild part numbers that are a mystery to me. The first line is presumably the part number, "929 567" and "929 705", but I can't find these numbers anywhere. If you know what these numbers mean, please let me know! (07263 is the CAGE code for Fairchild, and the last line is the date code.)

    Two Fairchild ICs with mysterious part numbers.

    The ALU/register board that I examined uses the following JM38510 part numbers, which I have converted to standard parts:
    /01403 = 54153 dual 4-to-1 multiplexer
    /07003 = 54S04 hex inverter
    /07006 = 54S20 4-input NAND
    /07601 = 54S194 4-bit shift register
    /07801 = 54S181 4-bit ALU
    /30107 = 54LS175 quad flip-flop 
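
    The cross-reference above can be written as a small lookup table. This is only a sketch of the six parts on this one board, not a general MIL-M-38510 decoder; the function name is hypothetical, and it simply ignores the letters that follow the slash number (such as "BJB").

```python
# JM38510 slash numbers found on the ALU/register board, mapped to
# (standard part, description) as listed in the footnote.
JM38510_PARTS = {
    "01403": ("54153", "dual 4-to-1 multiplexer"),
    "07003": ("54S04", "hex inverter"),
    "07006": ("54S20", "4-input NAND"),
    "07601": ("54S194", "4-bit shift register"),
    "07801": ("54S181", "4-bit ALU"),
    "30107": ("54LS175", "quad flip-flop"),
}


def decode_marking(marking):
    """Return (part, description) for a marking like 'JM38510/07801 BJB'.

    Returns None if the marking is not a JM38510 number or is not in the table.
    """
    prefix, slash, rest = marking.partition("/")
    if not slash or not prefix.strip().upper().endswith("M38510"):
        return None
    tokens = rest.split()
    if not tokens:
        return None
    # The first token is the slash number; trailing suffix letters are ignored.
    return JM38510_PARTS.get(tokens[0])
```

    For example, `decode_marking("JM38510/07801 BJB")` gives the 54S181 entry, while a standard industry marking such as "SNJ5483" returns `None` because it isn't a JM38510 number.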

  12. The photo below compares an IBM board (top) with a Spacelab board (bottom), both from the early 1980s. It's interesting how similar the boards are. Both use a 0.1" grid of holes, unlike most printed-circuit boards, which only use holes where needed. Both boards are multi-layer with integrated circuits on one side. The IBM board is denser; the chips are spaced 0.1" apart rather than 0.3" apart.

    An IBM computer board (top) and a board from the Spacelab computer (bottom).

    I don't know which IBM system used this board, but it was a commercial system, not an aerospace system. This board is a bit unusual for IBM, since most of the chips are standard DIPs rather than the square metal cans that IBM typically used. 

  13. The US blocked computer exports to France with NSAM 294, a 1964 National Security Action Memorandum. The US later allowed sales of the CDC 6600 and IBM 360/91 computers to France on the condition that France not use the computers for atomic weapon development, a condition that France apparently violated. See A.E.C. Bids Industry Avoid Sales Aiding French Tests (1964) and Paris Promises Not to Use Equipment for Atomic Weapons (1966). The CDC 6600 supercomputer executed up to 10 million instructions per second (MIPS) while the IBM 360/91 executed about 17 MIPS. (In comparison, a 1995 Pentium Pro or a 2012 cell phone is faster than these computers.) In 1971, Henry Kissinger was still blocking computer exports to France, as shown in this transcript. (One confusing issue in these articles is that IBM announced the 360/92 computer in 1964, but renamed it as the 360/91 before it shipped in 1967.) 

  14. Some contemporary articles on Plan Calcul are France Entering Computer Battle: Starts All-French Company to Compete (New York Times, 1967) and France: First the Bomb, Then the "Plan Calcul" (Science, 1967). See History of Computing in France: A Brief Sketch for an overview of the French computer industry. 

  15. Thomson has a complicated history. In 1883, two Americans, Thomson and Houston, started the Thomson-Houston Electric Company. A decade later, this company became General Electric, with a French subsidiary: Thomson Houston International. After various mergers, the French subsidiary became Thomson-CSF, a major defense and electronics firm.

    In a sense, Thomson-Houston both created and destroyed GE. The Thomson-Houston Electric Company became GE, but the French subsidiary of Thomson-Houston ended up being a key part of GE's collapse almost a century later. Specifically, the French rail transport company Alsthom (later Alstom) was formed from the French heavy engineering subsidiary of Thomson-Houston in 1928; the "thom" in "Alsthom" comes from "Thomson". In 2014, General Electric acquired Alstom for $10.1 billion. The acquisition was a disaster, and in 2018, GE wrote off $23 billion. This loss, along with other financial problems, led to GE's announcement in 2021 that it would break up into three companies. 

  16. One more company should be mentioned: MATRA. MATRA was the contractor for Spacelab's data systems, so the Spacelab computer was produced under a contract from MATRA. People often confuse Mitra (the name of the computer line) with MATRA. 

  17. Due to financial difficulties, Bull was acquired by General Electric in 1964. Bull was then acquired by Honeywell, nationalized by France, partnered with NEC, acquired Zenith, was privatized by France, and was acquired by Atos. Less than two months ago, France acquired Bull, continuing the series of reorganizations.