Ken Shirriff's blog

Hammer time: fixing the printer on a vintage IBM 1401 mainframe

The Computer History Museum has two operational IBM 1401 computers used for demos but one of the printers stopped working a few weeks ago. This blog post describes how the 1401 restoration team diagnosed and repaired the printer. After a lot of tricky debugging (as well as smoke coming out of the printer) we fixed a broken trace on a circuit board. (This printer repair might sound vaguely familiar because I wrote in September about an entirely different printer failure due to a failed transistor.)

The IBM 1401 business computer was announced in 1959, and went on to become the best-selling computer of the mid-1960s, with more than 10,000 systems in use. A key selling point of the IBM 1401 was its high-speed line printer, the IBM 1403. It printed 10 lines per second with excellent print quality, said to be the best printing until laser printers were introduced in the 1970s.

$The IBM 1401 mainframe computer (left) at the Computer History Museum printing the Mandelbrot fractal on the 1403 printer (right).$

The IBM 1401 mainframe computer (left) at the Computer History Museum printing the Mandelbrot fractal on the 1403 printer (right).

To print characters, the printer used a chain of type slugs (below) that rotated at high speed in front of the paper, with an inked ribbon between the paper and the chain. Each of the 132 print columns had a hammer and an electromagnet. At the right moment, when the desired character passed the hammer, an electromagnet drove the hammer against the back of the paper, causing the paper and ribbon to hit the type slug, printing the character.1

The type chain from the IBM 1401's printer. The chain has 48 different characters, repeated five times.

The printer required careful timing to make this process work. The chain spun around at 7.5 feet per second and each hammer had to fire at exactly the right time to print the right character perfectly aligned without smearing. Every 11.1 µs, a print slug lined up with a hammer, and the control circuitry checked if the slug matches the character to be printed. If so, the electromagnet was energized for 1.5 ms, printing the character.

Printing mechanism of the IBM 1401 line printer. From 1401 Reference Manual, p11.

While the printer is usually reliable, a few weeks ago the printer stopped working and displayed a "sync check" error on the console. The computer needs to know the exact position of the chain in order to fire the hammers at the right time. If something goes wrong with this synchronization, the computer stops with "sync check" rather than printing the wrong characters.

When the sync check light on the printer is illuminated, you have a problem.

To track the chain position, the computer receives a sequence of pulses from the printer: a pulse when the first hammer is lined up with a type slug2 and a double pulse when the chain is in its "home" position with the first character lined up. The pulses are created by a slotted metallic timing disk inside the printer. A magnetic pickup detects these slots and produces a small 100 millivolt signal.7 This signal is amplified inside the printer by two differential amplifier cards to produce a stronger square wave signal. (This is the only electronic part of the printer. Everything else inside the printer is electromechanical or hydraulic; a high-speed hydraulic motor feeds paper through the printer and drips oil on the floor.)

The computer receives these pulses from the printer and generates a logic signal that increments counters to keep track of the chain's position. The schematic below shows part of the circuitry inside the computer, starting with the sense amplifier signal from the printer at the left. Don't try to understand this circuit; I just want to show the strange schematic symbols that IBM used in the 1960s. The box with "I" is an inverter. The triangle is an AND gate. The semicircle that looks like an AND gate is actually an OR gate. The large box with a "T" is a trigger, IBM's name for a flip-flop. The "SS" box is a "single shot" that creates a 400µs pulse; this detects the double pulse that indicates the chain's "home" position.

Excerpt from the 1401 Intermedia Level Diagrams (ILD) showing the chain detection circuitry.

To track down the problem, we removed the printer's side panel to access the two amplifier circuit boards, which are visible below. We probed the boards with an oscilloscope. The first amplifier stage (on the right) looked okay, but the second stage (on the left) had problems. In the photo below, the computer is at the back, mostly hidden by the printer.

We took the side panel off the 1403 printer to reveal the circuit boards. We hooked an oscilloscope up to the front board to test it.

The trace below shows what should happen. The board receives a differential signal at the bottom, with alternating cyan and pink signals. The difference between these signals (middle) is amplified to produce the clean, uniform pulse train at the top. Note the double pulse in the middle indicating the chain's home position.

Oscilloscope trace from a working printer.

But when we measured the signal, we saw signals that were entirely garbled. The differential signals at the bottom are a mess, and track each other rather than alternating. The output signal (top) is basically random. With this signal from the printer, the computer couldn't keep track of the chain position and the sync check error resulted.

Oscilloscope trace from the faulty printer.

We swapped the board with the board from the other, working printer, and verified that the board was the problem. The museum has a filing cabinet full of replacement circuit boards, but unfortunately not a replacement for this "WW" amplifier board. Instead, we had to diagnose the problem with this card and repair it. On the board below you can see the diodes (small gray cylinders), capacitors (silver cylinders), resistors (striped cylinders and large tan cylinders), and germanium transistors (round metal cans). The transistors are germanium transistors as the 1401 predated silicon transistors.

The "WW" differential amplifier card used by the printer.

We suspected a failed transistor, so we used Marc's vintage ~~Hewlett-Packard~~ Tektronix curve tracer (below) to test the transistors. One transistor was much weaker than the others. Since the performance of a differential amplifier depends on having transistors with closely matched characteristics, we searched through a couple dozen transistors to find a matching pair and replaced the transistors. (We later determined that these transistors were not part of the differential pair—they were "emitter follower" buffers, so our effort was wasted.)3

We used a vintage HP transistor curve tracer to test the transistors.

Back at the Computer History Museum, we tested out the repaired board and the printer still didn't work. Even worse, smoke started coming from the back of the printer! I quickly shut off the system as an acrid smell surrounded the printer. I expected to see a blackened transistor on the board, but it was fine. I examined the printer but couldn't find the source of the smoke.

I decided to test the board outside the printer by feeding in a 2kHz test signal, but the measurements didn't make sense. The board seemed to be ignoring one of the inputs, so I tested that input transistor but it was fine. Next I checked the diodes, capacitors and resistors again; all the components tested okay, but the board still mysteriously failed. I started carefully measuring voltages at various points in the circuit but the signals didn't make sense and weren't consistent. Since all the components were fine but the board didn't work, I was starting to losing confidence in electronics. Eventually, I nailed down a signal that randomly jumped between 10 volts and 1 volt. After wiggling all the components, I finally noticed that the voltage jumped if I flexed the board. Finally, I had an answer: a cracked trace on the circuit board between the input and the transistor was making intermittent contact.

The board had a cracked trace in the upper left, connecting the upper gold contact. Carl put a wire jumper across the bad section.

To fix the board, Carl put a wire bridge across the bad trace6. We put the board in the printer, and the printer mostly worked. However, when the printer tried to print in column 85, the column failed to print and the printer stopped with an error.5 More testing revealed four columns of the printer were failing to print due to hammer problems. Each electromagnetic hammer coil is driven by a 60 volt, 5 amp pulse for 1.5 milliseconds. This is a lot of power (300 watts), so if anything goes wrong, hammer coils can easily burn up.

We swung open one of the computer's "gates" (lower left), revealing the cards that drive the printer.

We looked at the printer driver cards inside the computer. Each card generates pulses for two hammers, so there are 66 of these cards. In the photo below, you can see the two large high-current transistors at the left that generate the pulses. (Note the felt insulators on top of these transistors. Due to their height, the transistors pose a risk of shorting against the bottom of the neighboring card.) Just to the right of these transistors are two colorful purple and yellow fuses. In the event of a fault, these fuses are supposed to burn out and protect the hammer coils. We checked the cards associated with the four bad columns on the printer and found four burnt-out fuses.

The "AEC" Alloy-Hammer Driver Latch card produces high-current pulses to drive the printer hammer coils.

Why did the fuses blow? The circuit to drive the hammer coils is a bit tricky. Every 11 microseconds, a hammer lines up with a character slug and can be fired. But when a hammer is fired, the coil needs to be activated for about 1.5 ms, a much longer time interval. To accomplish this, the hammer driver latches on when a hammer is fired. Later in the print cycle, the hammer driver is turned off. This process is controlled by the chain position counters, which are driven by the pulses from the chain sensor, the same pulses that were intermittent. Thus, if the computer received enough pulses to start printing a line, but then the pulses dropped out in the middle of the line, hammer drivers could be left in the on state until the fuse blows. This explained the problem that we saw.

After Carl replaced the fuses, the printer worked fine except for two problems. First, characters in column 85 were shifted slightly so the text was slightly crooked. Frank explained that the hammer in this column must be moving a bit too slow, hitting the chain after it had moved past its position. This explained the smoke: in the time it took the fuse to blow, the coil must have overheated and been slightly damaged. We'll look into replacing this coil next week. The second problem was that the printer's Ready light didn't go on. This turned out to be simply a bad light bulb, unrelated to the rest of the problems. In any case, the printer was working well enough for demos so the repair was a success.

Closeup of the type chain (upside down) for an IBM 1403 line printer.

I announce my latest blog posts on Twitter, so follow me at @kenshirriff for future articles. I also have an RSS feed. The Computer History Museum in Mountain View runs demonstrations of the IBM 1401 on Wednesdays and Saturdays so if you're in the area you should definitely check it out (schedule).

Notes and references

You might expect that the 132 hammers align with 132 type slugs, so the matching hammers all fire at once, but that's not what happens. Instead, the hammers and type slugs are spaced slightly differently, so only one hammer is aligned at a time, and a tiny movement of the chain lines up a different hammer and type slug. (Essentially they form a vernier.) Specifically, every 11.1 microseconds, the chain moves 0.001 inches. This causes a new hammer / type slug alignment. For mechanical reasons, every third hammer lines up in sequence (1, 4, 7, ...) until the end of the line is reached; this is called a "subscan" and takes 555 microseconds. Two more subscans give each hammer in the line an option to fire, forming a print scan of 1.665 milliseconds. If you want more information on how the print chain works, I have an animation here. ↩
To be precise, the printer generates a pulse if hammer 1, 2, or 3 lines up with a type slug. This is due to the three "subscans", each using every third hammer. ↩
I'll explain how the differential amplifier works in this footnote, since most readers may not want this much detail. The computer uses two differential amplifier boards in series, first a WV board and then a WW board. They use similar principles, except the WV uses NPN transistors and the WW uses PNP. The differential output from the WW board is transmitted to the computer where a third differential amplifier (an NT card) converts the signal to a logic output. Each board is a differential amplifier, which takes two inputs and amplifies the difference, essentially an op amp with two outputs.4

A differential pair circuit.

The basic differential pair circuit for a differential amplifier is shown above. (Op amps contain a similar differential pair.) The resistor at the top sets a fixed current I. If the two inputs are equal, the current will be split, with half going through each transistor and branch resistor. But if one if the inputs is slightly lower, that transistor will conduct more and most of the current will go through that branch. Thus, the difference between the inputs steers the current down one side or the other, yielding an amplified signal across the lower resistors.

Schematic of the WW amplifier board from the SMS documentation.

The IBM 1401 documentation provides the schematic above for the board, but it's hard to follow what's happening. (Note the unusual transistor symbol, three boxes with an emitter arrow in or out.) I redrew the main part of the circuit below, so it resembles the simple differential pair. It has the same resistors at top and bottom as the differential pair, but there is an R-C circuit in each branch. To simplify, if there is a DC offset or low-frequency input, the capacitor will charge and counteract this offset. Thus, the amplifier operates as a high-pass amplifier; it cuts out low-frequency noise while amplifying the 1800 Hz sync pulses. The diodes clip the output, yielding a square wave. The differential output goes through emitter-follower buffers (omitted below) so the signal is strong enough to be transmitted through an under-floor cable from the printer to the computer.

The differential amplifier circuit of the WW card.

↩
An op amp with positive and negative outputs is known as a "fully differential op amp". ↩
The IBM 1403 printer has multiple error checks to avoid printing incorrect data. For a business machine, it would be bad to drop digits in, say, payroll checks or tax records. To detect hammer failures, the printer has 132 wires from the hammers back to the computer, to verify that each hammer fired when it was supposed to. If the computer doesn't get a pulse back from a hammer, the computer stops immediately, as we saw. ↩
We noticed that there was solder smeared across the broken part of the trace. My suspicion is that the same problem happened a few years ago and was repaired by bridging the broken trace with solder. Eventually the heavy vibrations inside the printer caused a hairline crack in the solder, causing the problem to recur. By bridging the break with wire rather than just solder, we hope we have fixed the problem permanently. We also noticed the transistor connected to the broken trace had been replaced, so they must have tried that first in the previous repair. ↩
The 1403 printer is documented in IBM 1403 Printer Component Description and 1403 Printers Field Engineering Maintenance Manual. See also this brief article about the 1403 printer in the IEEE Spectrum. For details on how the timing pulses work, see the 1403 Manual of Instruction, page 42. ↩

Two bits per transistor: high-density ROM in Intel's 8087 floating point chip

The 8087 chip provided fast floating point arithmetic for the original IBM PC and became part of the x86 architecture used today. One unusual feature of the 8087 is it contained a multi-level ROM (Read-Only Memory) that stored two bits per transistor, twice as dense as a normal ROM. Instead of storing binary data, each cell in the 8087's ROM stored one of four different values, which were then decoded into two bits. Because the 8087 required a large ROM for microcode1 and the chip was pushing the limits of how many transistors could fit on a chip, Intel used this special technique to make the ROM fit. In this article, I explain how Intel implemented this multi-level ROM.

Intel introduced the 8087 chip in 1980 to improve floating-point performance on the 8086 and 8088 processors. Since early microprocessors operated only on integers, arithmetic with floating point numbers was slow and transcendental operations such as trig or logarithms were even worse. Adding the 8087 co-processor chip to a system made floating point operations up to 100 times faster. The 8087's architecture became part of later Intel processors, and the 8087's instructions (although now obsolete) are still a part of today's x86 desktop computers.

I opened up an 8087 chip and took die photos with a microscope yielding the composite photo below. The labels show the main functional blocks, based on my reverse engineering. (Click here for a larger image.) The die of the 8087 is complex, with 40,000 transistors.2 Internally, the 8087 uses 80-bit floating point numbers with a 64-bit fraction (also called significand or mantissa), a 15-bit exponent and a sign bit. (For a base-10 analogy, in the number 6.02x10²³, 6.02 is the fraction and 23 is the exponent.) At the bottom of the die, "fraction processing" indicates the circuitry for the fraction: from left to right, this includes storage of constants, a 64-bit shifter, the 64-bit adder/subtracter, and the register stack. Above this is the circuitry to process the exponent.

Die of the Intel 8087 floating point unit chip, with main functional blocks labeled.

An 8087 instruction required multiple steps, over 1000 in some cases. The 8087 used microcode to specify the low-level operations at each step: the shifts, adds, memory fetches, reads of constants, and so forth. You can think of microcode as a simple program, written in micro-instructions, where each micro-instruction generated control signals for the different components of the chip. In the die photo above, you can see the ROM that holds the 8087's microcode program. The ROM takes up a large fraction of the chip, showing why the compact multi-level ROM was necessary. To the left of the ROM is the "engine" that ran the microcode program, essentially a simple CPU.

The 8087 operated as a co-processor with the 8086 processor. When the 8086 encountered a special floating point instruction, the processor ignored it and let the 8087 execute the instruction in parallel.3 I won't explain in detail how the 8087 works internally, but as an overview, floating point operations were implemented using integer adds/subtracts and shifts. To add or subtract two floating point numbers, the 8087 shifted the numbers until the binary points (i.e. the decimal points but in binary) lined up, and then added or subtracted the fraction. Multiplication, division, and square root were performed through repeated shifts and adds or subtracts. Transcendental operations (tan, arctan, log, power) used CORDIC algorithms, which use shifts and adds of special constants, processing one bit at a time. The 8087 also dealt with many special cases: infinities, overflows, NaN (not a number), denormalized numbers, and several rounding modes. The microcode stored in ROM controlled all these operations.

Implementation of a ROM

The 8087 chip consists of a tiny silicon die, with regions of the silicon doped with impurities to give them the desired semiconductor properties. On top of the silicon, polysilicon (a special type of silicon) formed wires and transistors. Finally, a metal layer on top wired the circuitry together. In the photo below, the left side shows a small part of the chip as it appears under a microscope, magnifying the yellowish metal wiring. On the right, the metal has been removed with acid, revealing the polysilicon and silicon. When polysilicon crosses silicon, a transistor is formed. The pink regions are doped silicon, and the thin vertical lines are the polysilicon. The small circles are contacts between the silicon and metal layers, connecting them together.

Structure of the ROM in the Intel 8087 FPU. The metal layer is on the left and the polysilicon and silicon layers are on the right.

While there are many ways of building a ROM, a typical way is to have a grid of "cells," with each cell holding a bit. Each cell can have a transistor for a 0 bit, or lack a transistor for a 1 bit. In the diagram above, you can see the grid of cells with transistors (where silicon is present under the polysilicon) and missing transistors (where there are gaps in the silicon). To read from the ROM, one column select line is energized (based on the address) to select the bits stored in that column, yielding one output bit from each row. You can see the vertical polysilicon column select lines and the horizontal metal row outputs in the diagram. The vertical doped silicon lines are connected to ground.

The schematic below (corresponding to a 4×4 ROM segment) shows how the ROM functions. Each cell either has a transistor (black) or no transistor (grayed-out). When a polysilicon column select line is energized, the transistors in that column turn on and pull the corresponding metal row outputs to ground. (For our purposes, an NMOS transistor is like a switch that is open if the input (gate) is 0 and closed if the input is 1.) The row lines output the data stored in the selected column.

Schematic of a 4×4 segment of a ROM.

The column select signals are generated by a decoder circuit. Since this circuit is built from NOR gates, I'll first explain the construction of a NOR gate. The schematic below shows a four-input NOR gate built from four transistors and a pull-up resistor (actually a special transistor). On the left, all inputs are 0 so all the transistors are off and the pull-up resistor pulls the output high. On the right, an input is 1, turning on a transistor. The transistor is connected to ground, so it pulls the output low. In summary, if any inputs are high, the output is low so this circuit implements a NOR gate.

4-input NOR gate constructed from NMOS transistors.

The column select decoder circuit takes the incoming address bits and activates the appropriate select line. The decoder contains an 8-input NOR gate for each column, with one NOR gate selected for the desired address. The photo shows two of the NOR gates generating two of the column select signals. (For simplicity, I only show four of the 8 inputs). Each column uses a different combination of address lines and complemented address lines as inputs, selecting a different address. The address lines are in the metal layer, which was removed for the photo below; the address lines are drawn in green. To determine the address associated with a column, look at the square contacts associated with each transistor and note which address lines are connected. If all the address lines connected to a column's transistors are low, the NOR gate will select the column.

Part of the address decoder. The address decoder selects odd columns in the ROM, counting right to left. The numbers at the top show the address associated with each output.

The photo below shows a small part of the ROM's decoder with all 8 inputs to the NOR gates. You can read out the binary addresses by carefully examining the address line connections. Note the binary pattern: a1 connections alternate every column, a2 connections alternate every two columns, a3 connections every four columns, and so forth. The a0 connection is fixed because this decoder circuit selects the odd columns; a similar circuit above the ROM selects the even addresses. (This split was necessary to make the decoder fit on the chip because each decoder column is twice as wide as a ROM cell.)

Part of the address decoder for the 8087's microcode ROM. The decoder converts an 8-bit address into column select signals.

The last component of the ROM is the set of multiplexers that reduces the 64 output rows down to 8 rows.4 Each 8-to-1 multiplexer selects one of its 8 inputs, based on the address. The diagram below shows one of these row multiplexers in the 8087, built from eight large pass transistors, each one connected to one of the row lines. All the transistors are connected to the output so when the selected transistor is turned on, it passes its input to the output. The multiplexer transistors are much, much larger than the transistors in the ROM to reduce distortion of the ROM signal. A decoder (similar to the one discussed earlier, but smaller) generates the eight multiplexer control lines from three address lines.

One of eight row multiplexers in the ROM. This shows the poly/silicon layers, with metal wiring drawn in orange.

To summarize, the ROM stores bits in a grid. It uses eight address bits to select a column in the grid. Then three address bits select the desired eight outputs from the row lines.

The multi-level ROM

The discussion so far explained of a typical ROM that stores one bit per cell. So how did 8087 store two bits per cell? If you look closely, the 8087's microcode ROM has four different transistor sizes (if you count "no transistor" as a size).6 With four possibilities for each transistor, a cell can encode two bits, approximately doubling the density.7 This section explains how the four transistor sizes generate four different currents, and how the chip's analog and digital circuitry converts these currents into two bits.

A closeup of the 8087's microcode ROM shows four different transistor sizes. This allows the ROM to store two bits per cell.

The size of the transistor controls the current through the transistor.8 The important geometric factor is the varying width of the silicon (pink) where it is crossed by the polysilicon (vertical lines), creating transistors with different gate widths. Since the gate width controls the current through the transistor, the four transistor sizes generate four different currents: the largest transistor passes the most current and no current will flow if there is no transistor at all.

The ROM current is converted to bits in several steps. First, a pull-up resistor converts the current to a voltage. Next, three comparators compare the voltage with reference voltages to generate digital signals indicating if the ROM voltage is lower or higher. Finally, logic gates convert the comparator output signals to the two output bits. This circuitry is repeated eight times, generating 16 output bits in total.

The circuit to read two bits from a ROM cell.

The circuit above performs these conversion steps. At the bottom, one of the ROM transistors is selected by the column select line and the multiplexer (discussed earlier), generating one of four currents. Next, a pull-up resistor12 converts the transistor's current to a voltage, resulting in a voltage depending on the size of the selected transistor. The comparators compare this voltage to three reference voltages, outputting a 1 if the ROM voltage is higher than the reference voltage. The comparators and reference voltages require careful design because the ROM voltages could differ by as little as 200 mV.

The reference voltages are mid-way between the expected ROM voltages, allowing some fluctuation in the voltages. The lowest ROM voltage is lower than all the reference voltages so all comparators will output 0. The second ROM voltage is higher than Reference 0, so the bottom comparator outputs 1. For the third ROM voltage, the bottom two comparators output 1, and for the highest ROM voltage all comparators output 1. Thus, the three comparators yield four different output patterns depending on the ROM transistor. The logic gates then convert the comparator outputs into the two output bits.10

The design of the comparator is interesting because it is the bridge between the analog and digital worlds, producing a 1 or 0 if the ROM voltage is higher or lower than the reference voltage. Each comparator contains a differential amplifier that amplifies the difference between the ROM voltage and the reference voltage. The output from the differential amplifier drives a latch that stabilizes the output and converts it to a logic-level signal. The differential amplifier (below) is a standard analog circuit. A current sink (symbol at the bottom) provides a constant current. If one of the transistors has a higher input voltage than the other, most of the current passes through that transistor. The voltage drop across the resistors will cause the corresponding output to go lower and the other output to go higher.

Diagram showing the operation of a differential pair. Most of the current will flow through the transistor with the higher input voltage, pulling the corresponding output lower. The double-circle symbol at the bottom is a current sink, providing a constant current I.

The photo below shows one of the comparators on the chip; the metal layer is on top, with the transistors underneath. I'll just discuss the highlights of this complex circuit; see the footnote12 for details. The signal from the ROM and multiplexer enters on the left. The pull-up circuit12 converts the current into a voltage. The two large transistors of the differential amplifier compare the ROM's voltage with the reference voltage (entering at top). The outputs from the differential amplifier go to the latch circuitry (spread across the photo); the latch's output is in the lower right. The differential amplifier's current source and pull-up resistors are implemented with depletion-mode transistors. Each output circuit uses three comparators, yielding 24 comparators in total.

One of the comparators in the 8087. The chip contains 24 comparators to convert the voltage levels from the multi-level ROM into binary data.

Each reference voltage is generated by a carefully-sized transistor and a pull-up circuit. The reference voltage circuit is designed as similar as possible to the ROM's signal circuitry, so any manufacturing variations in the chip will affect both equally. The reference voltage and ROM signal both use the same pull-up circuit. In addition, each reference voltage circuit includes a very large transistor identical to the multiplexer transistor, even though there is no multiplexing in the reference circuit, just to make the circuits match. The three reference voltage circuits are identical except for the size of the reference transistor.9

Circuit generating the three reference voltages. The reference transistors are sized between the ROM's transistor sizes. The oxide layer wasn't fully removed from this part of the die, causing the color swirls in the photo.

Putting all the pieces together, the photo below shows the layout of the microcode ROM components on the chip.12 The bulk of the ROM circuitry is the transistors holding the data. The column decoder circuitry is above and below this. (Half the column select decoders are at the top and half are at the bottom so they fit better.) The output circuitry is on the right. The eight multiplexers reduce the 64 row lines down to eight. The eight rows then go into the comparators, generating the 16 output bits from the ROM at the right. The reference circuit above the comparators generates the three reference voltage. At the bottom right, the small row decoder controls the multiplexers.

Microcode ROM from the Intel 8087 FPU with main components labeled.

While you'd hope for the multi-level ROM to be half the size of a regular ROM, it isn't quite that efficient because of the extra circuitry for the comparators and because the transistors were slightly larger to accommodate the multiple sizes. Even so, the multi-level ROM saved about 40% of the space a regular ROM would have taken.

Now that I have determined the structure of the ROM, I could read out the contents of the ROM simply (but tediously) by looking at the size of each transistor under a microscope. But without knowing the microcode instruction set, the ROM contents aren't useful.

Conclusions

The 8087 floating point chip used an interesting two-bit-per-cell structure to fit the microcode onto the chip. Intel re-used the multi-level ROM structure in 1981 in the doomed iAPX 432 system.11 As far as I can tell, interest in ROMs with multiple-level cells peaked in the 1980s and then died out, probably because Moore's law made it easier to gain ROM capacity by shrinking a standard ROM cell rather than designing non-standard ROMs requiring special analog circuits built to high tolerances.14

Surprisingly, the multi-level concept has recently returned, but this time in flash memory. Many flash memories store two or more bits per cell.13 Flash has even achieved a remarkable 4 bits per cell (requiring 16 different voltage levels) with "quad-level cell" consumer products announced recently. Thus, an obscure technology from the 1980s can show up again decades later.

I announce my latest blog posts on Twitter, so follow me at @kenshirriff for future 8087 articles. I also have an RSS feed. Thanks to Jeff Epler for suggesting that I investigate the 8087's ROM.

Notes and references

The 8087 has 1648 words of microcode (if I counted correctly), with 16 bits in each word, for a total of 26368 bits. The ROM size didn't need to be a power of two since Intel could build it to the exact size required. ↩
Sources provide inconsistent values for the number of transistors in the 8087: Intel claims 40,000 transistors while Wikipedia claims 45,000. The discrepancy could be due to different ways of counting transistors. In particular, since the number of transistors in a ROM, PLA or similar structure depends on the data stored in it, sources often count "potential" transistors rather than the number of physical transistors. Other discrepancies can be due to whether or not pull-up transistors are counted and if high-current drivers are counted as multiple transistors in parallel or one large transistor. ↩
The interaction between the 8086 processor and the 8087 floating point unit is somewhat tricky; I'll discuss some highlights. The simplified view is that the 8087 watches the 8086's instruction stream, and executes any instructions that are 8087 instructions. The complication is that the 8086 has an instruction prefetch buffer, so the instruction being fetched isn't the one being executed. Thus, the 8087 duplicates the 8086's prefetch buffer (or the 8088's smaller prefetch buffer), so it knows that the 8086 is doing. Another complication is the complex addressing modes used by the 8086, which use registers inside the 8086. The 8087 can't perform these addressing modes since it doesn't have access to the 8086 registers. Instead, when the 8086 sees an 8087 instruction, it does a memory fetch from the addressed location and ignores the result. Meanwhile, the 8087 grabs the address off the bus so it can use the address if it needs it. If there is no 8087 present, you might expect a trap, but that's not what happens. Instead, for a system without an 8087, the linker rewrites the 8087 instructions, replacing them with subroutine calls to the emulation library. ↩
The reason ROMs typically use multiplexers on the row outputs is that it is inefficient to make a ROM with many columns and just a few output bits, because the decoder circuitry will be bigger than the ROM's data. The solution is to reshape the ROM, to hold the same bits but with more rows and fewer columns. For instance, the ROM can have 8 times as many rows and 1/8 the columns, making the decoder 1/8 the size.

In addition, a long, skinny ROM (e.g. 1K×16) is inconvenient to lay out on a chip, since it won't fit as a simple block. However, a serpentine layout could be used. For example, Intel's early memories were shift registers; the 1405 held 512 bits in a single long shift register. To fit this onto a chip, the shift register wound back and forth about 20 times (details). ↩
Some IBM computers used an unusual storage technique to hold microcode: Mylar cards had holes punched in them (just like regular punch cards), and the computer sensed the holes capacitively (link). Some computers, such as the Xerox Alto, had some microcode in RAM. This allowed programs to modify the microcode, creating a new instruction set for their specific purposes. Many modern processors have writeable microcode so patches can fix bugs in the microcode. ↩
I didn't notice the four transistor sizes in the microcode ROM until a comment on Hacker News mentioned that the 8087 used two-bit-per-cell technology. I was skeptical, but after looking at the chip more closely I realized the comment was correct. ↩
Several other approaches were used in the 1980s to store multiple bits per cell. One of the most common was used by Mostek and other companies: transistors in the ROM were doped to have different threshold voltages. By using four different threshold voltages, two bits could be stored per cell. Compared to Intel's geometric approach, the threshold approach was denser (since all the transistors could be as small as possible), but required more mask layers and processing steps to produce the multiple implantation levels. This approach used the new (at the time) technology of ion implantation to carefully tune the doping levels of each transistor.

Ion implantation's biggest impact on integrated circuits was its use to create depletion transistors (transistors with a negative threshold voltage), which worked much better as pull-up resistors in logic gates. Ion implantation was also used in the Z-80 microprocessor to create some transistor "traps", circuits that looked like regular transistors under a microscope but received doping implants that made them non-functional. This served as copy protection since a manufacturer that tried to produce clones on the Z-80 by copying the chip with a microscope would end up with a chip that failed in multiple ways, some of them very subtle. ↩
The current through the transistor is proportional to the ratio between the width and length of the gate. (The length is the distance between the source and drain.) The ROM transistors (and all but the smallest reference transistor) keep the length constant and modify the width, so shrinking the width reduces the current flow. For MOSFET equations, see Wikipedia. ↩
The gate of the smallest reference transistor is made longer rather than narrower, due to the properties of MOS transistors. The problem is that the reference transistors need to have sizes between the sizes of the ROM transistors. In particular, Reference 0 needs a transistor smaller than the smallest ROM transistor. But the smallest ROM transistor is already as small as possible using the manufacturing techniques. To solve this, note that the polysilicon crossing the middle reference transistor is much thicker horizontally. Since a MOS transistor's properties are determined by the width to height ratio of its gate, expanding the polysilicon is as good as shrinking the silicon for making the transistor act smaller (i.e. lower current). ↩
The ROM logic decodes the transistor size to bits as follows: No transistor = 00, small transistor = 01, medium transistor = 11, large transistor = 10. This bit ordering saves a few gates in the decoding logic; since the mapping from transistor to bits is arbitrary, it doesn't matter that the sequence is not in order. (See "Two Bits Per Cell ROM", Stark for details.) ↩
Intel's iAPX 43203 interface processor (1981) used a multiple-level ROM very similar to the one in the 8087 chip. For details, see "The interface processor for the Intel VLSI 432 32 bit computer," J. Bayliss et al., IEEE J. Solid-State Circuits, vol. SC-16, pp. 522-530, Oct. 1981.
The 43203 interface processor provided I/O support for the iAPX 432 processor. Intel started the iAPX 432 project in 1975 to produce a "micromainframe" that would be Intel's revolutionary processor for the 1980s. When the iAPX 432 project encountered delays, Intel produced the 8086 processor as a stopgap, releasing it in 1978. While the Intel 8086 was a huge success, leading to the desktop PC and the current x86 architecture, the iAPX 432 project ended up a failure and ended in 1986. ↩
The schematic below (from "Multiple-Valued ROM Output Circuits") provides details of the circuitry to read the ROM. Conceptually the ROM uses a pull-up resistor to convert the transistor's current to a voltage. The circuit actually uses a three transistor circuit (T3, T4, T5) as the pull-up. T4 and T5 are essentially an inverter providing negative feedback via T3, making the circuit less sensitive to perturbations (such as manufacturing variations). The comparator consists of a simple differential amplifier (yellow) with T6 acting as the current source. The differential amplifier output is converted into a stable logic-level signal by the latch (green).

↩
Diagram of 8087 ROM output circuit.
Flash memories are categorized as SLC (single level cell—one bit per cell), MLC (multi level cell—two bits per cell), TLC (triple level cell—three bits per cell) and QLC (quad level cell—four bits per cell). In general, flash with more bits per cell is cheaper but less reliable, slower, and wears out faster due to the smaller signal margins. ↩
The journal Electronics published a short article "Four-State Cell Doubles ROM Bit Capacity" (p39, Oct 9, 1980), describing Intel's technique, but the article is vague to the point of being misleading. Intel published a detailed article "Two bits per cell ROM" in COMPCON (pp209-212, Feb 1981). An external group attempted to reverse engineer more detailed specifications of the Intel circuits in "Multiple-valued ROM output circuits" (Proc. 14th Int. Symp. Multivalue Logic, 1984). Two papers describing multiple-value memories are A Survey of Multivalued Memories (IEEE Transactions on Computers, Feb 1986, pp 99-106) and A review of multiple-valued memory technology (IEEE Symposium on Multiple-Valued Logic, 1998). ↩

Bad relay: Fixing the card reader for a vintage IBM 1401 mainframe

As soon as we finished repairing a printer failure at the Computer History Museum, Murphy's law struck and the card reader started malfunctioning. The printer and card reader were attached to an IBM 1401, a business computer that was announced in 1959 and went on to become the best-selling computer of the mid-1960s. In that era, data records were punched onto 80-column punch cards and then loaded into the computer by the card reader, which read cards at the remarkable speed of 13 cards per second. This blog post describes how we debugged the card reader problem, eventually tracking down and replacing a faulty electromechanical relay inside the card reader.

The IBM 1402 card reader at the Computer History Museum. Cards are loaded into the hopper on the right. The front door of the card reader is open, revealing the relays and other circuitry.

The card reader malfunction started happening every time the "Non-Process Run-Out" mechanism (NPRO) was used. During normal use, if the card reader stopped in the middle of processing, unread cards could remain inside the reader. To remove them, the operator would press the "Non-Process Run-Out" switch, which would run the remaining cards through the reader without processing them. Normally, the "Reader Stop" light (below) would illuminate after performing an NPRO. The problem was the "Reader Check" light also came on, indicating an error in the card reader. Since there was no actual error and the light could be cleared simply by pressing the "Check Reset" button, this problem wasn't serious, but we still wanted to fix it.

Control panel of the 1402 card reader showing the "Reader Check" error. The Non-Process Run-Out switch is used to run cards out of the card reader without processing them.

To track down the problem, the 1401 restoration team started by probing the card reader error circuitry inside the computer with an oscilloscope. (The card reader itself is essentially electromechanical; the logic circuitry is all inside the computer.) The photo below shows the 1401 computer with a swing-out gate opened to access the circuitry. Finding a circuit inside the IBM 1401 is made possible by the binders full of documentation showing the location and wiring of all the computers circuitry. This documentation was computer generated (originally by a vacuum tube IBM 704 or 705 mainframe), so the diagrams were called Automated Logic Diagrams (ALD).

The IBM 1401 with gate 01B4 opened. The yellow wire-wrapped wiring connects the circuit boards that are plugged into the gate. The computer's console is visible on the front of the computer.

The reader check error condition is stored in a latch circuit; the latch is set when an error signal comes in, and cleared when you press the reset button. To find this circuit we turned to the ALD page that documented the card reader's error checking circuitry. The diagram below is a small part of this ALD page, showing the read check latch of interest: "READ CHK LAT". Each basic circuit (such as a logic gate) is drawn on the ALD as a box, with lines showing how they are connected. Inside the box, cryptic text indicates what the circuit does and where it is inside the computer. For example, the latch (lower two boxes) is constructed from a circuit card of type CQZV (an inverter) and a CHWW card (a NAND gate). The text inside the box also specifies the location of each card (slots D21 and D24 in gate 01B4), allowing us to find the cards in the computer. Note that the RD REL CHK signal comes from ALD page 56.70.21.2; this will be important later.

The read check latch circuit, excerpted from the ALD 36.14.11.2.

The schematic below shows the latch redrawn with modern symbols. If the RESET line goes low, it will force the inverted output high. Otherwise, the output will cycle around through the gates, latching the value. If any OR input is high (indicating an error), it will force the latch low. Note the use of wired-OR—instead of using an OR gate, signals are simply wired together so if any signal is high it will pull the line high. Because transistors were expensive when the IBM 1401 was built, IBM used tricks like wired-OR to reduce the transistor count. Unfortunately, the wired-OR made it harder to determine which error input was triggering the latch because the signals were all tied together.

The read check latch circuit redrawn with modern symbols.

Once we located the circuit cards for the latch, we used an oscilloscope to verify that the latch itself was operating properly. Next, we needed to determine why it was receiving an error input. After disconnecting wires to get around the wired-OR, we found that the error was not coming from the Read/punch error or the Feed error signal. The RD REL CHK was the obvious suspect; this signal was part of the optional Read Punch Release feature1. However, the team insisted that Read Punch Release wasn't installed in our 1401. The source of this signal was ALD 56.70.21.2 and
our documentation didn't include ALD section 56, confirming that this feature wasn't present in our system.

Additional oscilloscope tracing showed a lot of noise on some of the signals from the card reader. This wasn't unexpected since the card reader is built from electromechanical parts: relays, cam switches, brushes, motors, solenoids and other components that generate noise and voltage spikes. I considered the possibility that a noise spike was triggering the latch, but the noise wasn't reaching that circuit.

The plug charts show the type of card in each position in the computer, and the function assigned to it. This is part of the plug chart for gate 01B4.

At this point, I was at a dead end, so I took another look at the RD REL CHK signal to see if maybe it did exist. The 1401's documentation includes "plug charts," diagrams that show what circuit card is plugged into each position in the computer. I looked at the plug chart for the card reader circuitry in gate 01B4 (swing-out gate, not logic gate). The plug chart (above) showed cards assigned to the mysterious ALD 56.70.21.2, such as the cards in slots A15-A17 and B15. (The plug chart also had numerous pencil updates, crossing out cards and adding new ones, which didn't give me a lot of confidence in its accuracy.) I looked inside the computer and found that these cards, the Read Punch Release cards generating RD REL CHK, were indeed installed in the computer. So somehow our computer did have this feature.

The gate in the 1401 holding the card reader circuitry. Note the cards in positions A15 and B15.

The problem was that even though these cards were present in the system, we didn't have the ALDs that included them, leaving us in the dark for debugging. I checked the second 1401 at the Computer History Museum; although it too had the cards for Read Punch Release, its documentation binders also mysteriously lacked the section 56 ALDs. Fortunately, back in 2006, the Australian Computer Museum Society sent us scans of the ALDs for their 1401 computer. I took a look and found that the Australian scans included the mysterious section 56. There was no guarantee that their 1401 had the same wiring as ours (since the design changed over time), but this was all I had to track down RD REL CHK.

Simplified excerpt of ALD 56.70.21.2 from an Australian 1401 computer.

According to the Australian ALD above, RD REL CHK was generated by the CGVV card in slot A17 (upper right box above). The oscilloscope trace below confirmed that this card was generating the RD REL CHK signal (yellow), but its input (RD BR IMP CB) (cyan) looked bad. Notice that the yellow line jumps up suddenly (as you'd expect from a logic signal), but the cyan line takes a long time to drop from the high level to the low level. Perhaps a weak transistor in a circuit was pulling the signal down slowly, or some other component had failed.

Oscilloscope trace showing the "circuit breaker" signal from the card reader (cyan) and the READ REL CHK error signal (yellow).

We looked into the RD BR IMP CB signal, short for "ReaD BRush IMPulse Circuit Breakers". In IBM terminology, a "circuit breaker" is a cam-operated switch, not a modern circuit breaker that trips when overloaded. The read brush circuit breakers generate timing pulses when the read brushes that detect holes in the punch card are aligned with a row of holes, telling the computer to read the hole pattern.

The NGXX integrator card contains resistor-capacitor filters. Unlike most cards, this one doesn't have any transistors. Photo courtesy of Randall Neff.

We looked up yet another ALD page to find the source of the strangely slow RD BR IMP CB signal. That signal originated in the card reader and then passed through an NGXX integrator card (above). Earlier I mentioned that the signals from the card reader were full of noise. This isn't a big problem inside the card reader since brief noise spikes won't affect relays. But once signals reach the computer, the noise must be eliminated. This is done in the 1401 by putting the signal through a resistor-capacitor low-pass filter, which IBM calls an "integrator". That card eliminates noise by making the signal change very slowly. In other words, although the signal on the oscilloscope looked strange, it was the expected behavior and not a problem. But why was there any signal there at all?

Part of IBM 1402 card reader schematic showing the cams (circles) that generate the CB read pulses and the relay that blocks the pulses during NPRO.

After some discussion, the team hypothesized that the pulses on RD BR IMP CB shouldn't be getting to the 1401 at all doing a Non-Process Run-Out, since cards aren't being read. The schematic4 for the card reader (above) shows the complex arrangement of cams and microswitches that generates the pulses. During an NPRO, relay #4 will be energized, opening the "READ STOP 4-4" relay contacts. This will stop the BRUSH IMP CB pulses from reaching the 1401.53 In other words, relay #4 should have blocked the pulses that we were seeing.

Frank King replacing a bad relay in the 1402 card reader. The relays are next to his right shoulder.

The card reader contains rows of relays; the reader's hardware is a generation older than the 1401 and it implements its basic control functions with relays rather than logic gates. Frank pulled out relay #4 and inspected it.6 The relay (below) has 6 sets of contacts, activated by an electromagnet coil (yellow) and held in position by a second coil. Springs help move the contacts to the correct positions. One of the springs appeared to be weak, preventing the relay from functioning properly.7 Frank put in a replacement relay and found that the card reader now performed Non-Process Run-Outs without any errors. We loaded a program from cards just to make sure the card reader still performed its main task, and that worked too. We had fixed the problem, just in time for lunch.

The faulty relay from the IBM 1402 card reader.

Conclusions

It is still a mystery why section 56 of the ALDs was missing from our documentation. As for the presence of the Read Punch Release feature on our 1401, that feature turns out to be standard on 1401 systems like the ones at the museum.8 I think the belief that our 1401 didn't include this feature resulted from confusion with the Punch Feed Read feature, which we don't have. (That feature allowed a card to be read and then punched with additional data as it passed through the card reader.)

The team that fixed this problem included Frank King, Alexey Toptygin, Ron Williams and Bill Flora. My previous blog post about fixing the 1402 card reader is here, tracking down an elusive problem with a misaligned cam.

Notes and references

Read Punch Release was a feature to allow the CPU to operate while reading a card. IBM's large 7000 series mainframes used "data channels," which were high-performance I/O connections using DMA and controlled by separate I/O processors. With a data channel, the CPU could process data while the channel performed I/O. But the 1401 was a much simpler machine, designed to replace electromechanical accounting machines that would read a card, process the card, and print out results. On the 1401, when the CPU executed an instruction to read a card, the CPU would wait while the card moved through the card reader and passed under the brushes to be read. The mechanical cycle to read a card took 75 ms (corresponding to 800 cards per minute), of which only 10 ms was available for the CPU to perform computation and the rest was wasted (from the CPU's perspective). The Read Punch Release feature was a workaround for this. The programmer could issue an SRF (Start Read Feed) instruction, which would cause a card to start moving through the card reader. The program had 21 ms to perform computation and execute the read instruction before the card reached the reading station. (If the program executed the read instruction too late, the computer wouldn't be able to read the card and would halt with an error.) This provided extra computation time in each card read cycle. IBM charged a monthly fee for additional features; Read Punch Release was relatively inexpensive at $25 per month (equivalent to about $200 today). ↩
The Read Punch Release feature also provided a similar instruction for punching a card, allowing an extra 37 ms of computation while punching a card. See page 16 of the IBM 1402 Card Read-Punch Manual for details on card timing and the read punch release operation. ↩
The card reader schematic shows that six separate cams were required to generate the RD BR IMP CB signal. The problem is that cards are read at high speed, so rows on the card are read just 3.75 ms apart. Cams and microswitches are too slow to generate pulses at this rate. To get around this, pulses for odd rows and even rows are generated separately. In addition, one set of switches closes for the start of a pulse and a second set opens for the end of a pulse. Needless to say, it is a pain to adjust all these cams so the pulses have the right timing and duration. If this timing is off, cards won't read correctly.

To improve reliability and reduce maintenance, IBM eventually replaced these cams with a "solar cell" (i.e. a photo-cell), slotted disk, and light. The light passing through the slotted disk triggered a pulse from the photo-cell. Our ALDs had some penciled-in modifications suggesting that our 1401 was originally configured to work with a solar cell card reader and then modified to work with the older circuit breaker card reader. ↩
The schematic for the 1402 card reader is here. The read brush impulse CB signal is generated on pdf page 8. This document also includes instructions on how to upgrade from the "circuit breaker" circuit to the "solar cell" circuit, a change that is indicated as taking 1.0 to 1.5 hours for the hardware installation, 1.5 to 2.0 hours for miscellaneous electrical changes, and 2.3 to 3.7 hours to wire up the new circuit. (See pdf pages 47-53.) ↩
The relays involved in an NPRO operation are documented in 1402 Card Read-Punch Customer Engineering Manual of Instruction, page 4-3 or pdf page 31. ↩
The relay is a "permissive make" relay, a type of relay that IBM designed to be twice as fast as regular relays. For a detailed discussion of IBM's relays, see Commutation and Control. The permissive make relay is discussed on page 59 (pdf page 18). ↩
Stan Paddock on the 1401 team built a relay tester that we could use to check the bad relay. Unfortunately, the 1401 workshop at the Computer History Museum is closed due to construction so we couldn't access the tester (or the collection of spare relays in the workshop). Fortunately, we had a spare relay that wasn't in the workshop for some reason. ↩
The IBM Sales Manual lists the various 1401 features and their prices (pdf page 50). It states the Read Punch Release feature is standard on the 1401 Model C, the "729 Tape/Card System". ↩