Ken Shirriff's blog

Inside the tiny chip that powers Montreal subway tickets

To use the Montreal subway (the Métro), you tap a paper ticket against the turnstile and it opens. The ticket works through a system called NFC, but what's happening internally? How does the ticket work without a battery? How does it communicate with the turnstile? And how can it be so cheap that you can throw the ticket away after one use? To answer these questions, I opened up a ticket and examined the tiny chip inside.

The image below shows the chip inside the ticket, highly magnified. The four golden squares in the corner are the connections to the antenna. The tan-colored lines are the metal wiring layer on top of the chip; the thickest lines wire the antenna to other parts of the chip. The darker region that takes up the majority of the chip is the chip's digital logic. To the left is the analog circuitry that handles the signal from the antenna.

The MIFARE Ultralight die under the microscope. (Click this image (or any other) for a larger view.

The chip uses NFC (Near-Field Communication). The idea behind NFC is that a reader (i.e. the turnstile) and an NFC tag (i.e. the ticket) communicate over a short distance through magnetic fields, allowing them to exchange data. The reader generates a magnetic field that both powers the tag and sends data to the tag. Both the reader and the tag have coil-like antennas so the reader's magnetic field can be picked up by the tag.1 When you tap your ticket on the turnstile, the NFC communication happens in 35 milliseconds, faster than an eyeblink. The data provided by the NFC tag shows that you have a valid ticket and then you can enter the subway.

The photo below shows the subway ticket, made of printed paper.2 At the right, the ticket appears to have golden smart-card contacts, like a credit card with an EMV chip. However, those contacts are completely fake, just printed onto the card with ink, and there is no chip there. Presumably, the makers thought that making the card look like a smart card would help people understand it. The card actually uses an entirely different technology.

A Montreal subway card. This card is for occasional use and is disposable. Regular travel uses a rigid plastic card containing a different chip.

Although the subway card is paper on the outside, its core is a thin plastic sheet, shown below. The sheet has a coiled antenna made from a layer of metal foil. If you look closely, you can see the tiny NFC chip in the lower right, a black speck connected to two sides of the antenna wire.3 The diagonal metal stripe in the upper left makes the antenna into a loop; topologically, a spiral antenna won't work on a 2-D sheet, so the diagonal bridge completes the circuit.

The antenna and chip inside the subway card.

I want to emphasize the absurdly small size of the chip: 570 µm × 485 µm. The photo below shows that it is about the size of a grain of salt. The chip is also extremely thin—75 µm or 120 µm—so you can't even feel the chip inside the ticket.

The chip next to grains of salt. I composited two images, one illuminated from above to show the die and one illuminated from below to show the salt.

Functions of the chip

There are many different types of NFC chips with varying levels of functionality. 4 This one is called the MIFARE Ultralight EV1,5 a low-cost chip designed for one-time ticketing applications. The basic function of the Ultralight chip is simple: providing a block of data to the reader. The chip holds its data in a small EEPROM; this chip has 48 bytes of user memory, while another variant has 108 bytes of user memory.

The Ultralight chip lacks the cryptography support found in more advanced chips. The Ultralight isn't much more secure than a printed ticket with a QR code or barcode, like you'd download for a show. It's up to the reader to validate the data and make sure the same ticket isn't being used multiple times.6

The Ultralight chip has a few features beyond a printed ticket, though. The chips are manufactured with a unique 7-byte identification code (UID). Moreover, the UID is signed, ensuring that fake UIDs cannot be generated.7 The chip also supports password-protected memory access and locking of memory pages to prevent modification. Since the password is transmitted without encryption, the security is weak, but better than nothing.8

Another interesting feature of the chip is the one-way counter. The chip has three 24-bit counters that can be incremented but not decremented. The counters can be used to allow the ticket to be used a particular number of times, for instance.9

Photographing the chip

To photograph the chip, I went through several steps to remove the chip from the ticket and then strip the chip down to the bare silicon. First, to extract the plastic sheet with the chip and the antenna from the paper ticket, I simply soaked the ticket in water. This turned the paper into mush, which could be scraped off to reveal the plastic core. Next, I cut out a small square of plastic that included the chip and put it in boiling sulfuric acid for about 30 seconds. This removed the plastic and adhesive, leaving the silicon die. (I try to avoid boiling acids, but processing a tiny chip like this only required a few drops of sulfuric acid, minimizing the risk.)

The die was covered with a passivation layer to protect its surface, a sandwich of silicon nitride and PSG (phosphosilicate glass) 1.1 µm thick according to the datasheet. The chip's underlying circuitry was visible, but slightly hazy due to this layer. I removed the passivation layer by boiling the chip in phosphoric acid for a few minutes. The image below shows the chip after this step. The top metal layer is much more visible, although some of the metal was dissolved by the acid. The thick metal lines connect the four bond pads to various parts of the analog circuitry, while many thin vertical metal lines provide interconnections of the logic circuitry.

The die after treatment with phosphoric acid to remove the passivation layer. Click for a much larger version.

Next, I treated the die with several cycles of treatment with Armour Etch to dissolve the oxide layer and hydrochloric acid to dissolve the metal. I think the chip had three layers of metal wiring on top of the silicon. Unfortunately, my process doesn't remove the metal layers cleanly, but causes them to come off in chaotic tangles. Since I wasn't interested in tracing the circuitry layer-by-layer, this wasn't a significant problem.

With the metal layers and polysilicon removed, I was left with the bare silicon. At this point, the underlying structure of the chip is visible. The doped silicon regions show the transistors, although they are extremely small at this scale. The white rectangles are capacitors. The chip has capacitors for many reasons: producing the right resonant frequency with the antenna, filtering the power, and boosting the voltage with charge pumps.

The die after stripping it down to the silicon.

My biggest concern while processing this chip was to avoid losing it. With a chip this small, bumping the chip or even breathing on it can send the chip flying perhaps never to be seen again. Even trying to pick up the chip with tweezers is risky, since it can easily pop out and disappear. It's no fun examining the floor, inch by inch, trying to figure out if a speck is the lost chip or a bit of dirt. I found that the best way to move the chip between processing and a microscope slide was to put the chip in a few drops of water and move it with a pipette. Even so, there were a couple of times that I lost track of the chip and had to check some specks under the microscope to determine which was the chip and which were dirt.

Overview of the chip

The block diagram below shows the high-level structure of the chip. At the left, the antenna is connected to the RF interface, the analog circuitry that converts the high-frequency signals into digital data. This circuitry also extracts power from the antenna's signal to power the chip.

Block diagram of the MIFARE Ultralight chip, from the datasheet.

The majority of the chip contains digital logic to process the 18 different commands that it can receive from the reader. Some commands, such as Wake-up or Halt control the chip's state. Other commands, such as Read or Write provide access to the EEPROM storage. The specialized Read_Cnt and Incr_Cnt commands access the chip's counters.

The chip has an "intelligent anticollision function" that allows multiple cards to be read without conflict if they are presented to the reader simultaneously. If a conflict is detected, the reader uses a standard NFC algorithm to select the cards one at a time, based on their identification numbers. The anticollision algorithm uses four of the chip's commands.

Finally, the chip has an EEPROM to store its data. Unlike RAM, the EEPROM holds data even when unpowered; it is designed to hold data for 10 years. To store data in the EEPROM, it must be written with a higher voltage than the rest of the chip uses. The EEPROM interface circuit produces the necessary signals.

The diagram shows the chip with its functional blocks labeled. The majority of the die is occupied with digital logic; I'll explain below how it is implemented with standard-cell logic. At the top is the EEPROM, a square of storage cells. To the right of the EEPROM is a charge pump, a circuit to boost the voltage through switched capacitors. The EEPROM interface circuitry is between the EEPROM and the digital logic.

The die, stripped down to the silicon, with presumed functional blocks labeled.

The remainder of the chip contains analog circuitry that is harder to interpret, so my labels are somewhat speculative. The four bond pads are where the antenna is connected to the chip. There are four pads to support two parallel antennas if desired. The first die photo shows the metal wiring between the bond pads and the structures that I've labeled as RF transistors and RF diodes. The "RF transistors" in the upper left are large, oval-shaped structures. These may be the transistors that send data back to the reader by modifying the load. Alternatively, they could be Zener diodes to regulate the voltage powering the chip, since Zener diodes often have an oval shape. The "RF diodes" at the bottom may rectify the signal from the antenna, producing the power for the chip. The rectified signal is also demodulated and processed by the analog logic to extract the digital data sent from the reader.

Sending data from the tag to the reader: load modulation

You might expect the tag to send data back to the receiver by transmitting a signal through the antenna. However, transmitting a signal takes power and the tag doesn't have much power available, just the power that it extracts from the reader's signal. Instead, the tag uses a clever technique called load modulation to send data to the reader. The idea is that if the tag changes the load across the antenna, it will absorb more or less energy from the reader. The reader can detect this change as a small variation in voltage across its transmitting antenna. Thus, the tag can dynamically change its load to send data back to the reader. Even though the signal produced by load modulation is extremely weak (80 dB less than the transmitted signal), the reader can detect it and extract the data.

In more detail, the reader transmits at a carrier frequency of 13.56 MHz.10 To send data back, the tag switches its load on and off at 848 kHz (1/16 of the carrier frequency), producing a subcarrier on top of the reader's signal. To transmit bits, this load modulation is switched on or off to transmit 106 kilobits per second (1/8 of the modulation frequency). The reader, in turn, extracts the subcarrier with a filter to receive the data bits from the tag.

An NFC tag can apply a load that is either a resistor or a capacitor; a resistor absorbs the signal directly, while a capacitor changes the antenna's resonant frequency and thus the amount of signal transferred to the tag. The die contains many capacitors, but I didn't see any significant resistors, so I suspect that this chip uses a capacitor for the load.

The chip's manufacturing process

The image below shows an extreme closeup of the die. The red box surrounds a region of doped silicon, forming five MOS transistors in series. Each dark vertical line corresponds to the gate of one transistor so the width of this line corresponds to the feature size. I estimate that the chip's feature size is 180 nm. In comparison, the wavelength of visible light is 400-700 nm. Since the features are smaller than the wavelength of light, it's not surprising that image appears blurry.

A closeup of the die, pushing the limits of my microscope.

The 180 nm process was popular in the late 1990s. These features are very large, however, compared to recent chips with features that are a few nanometers across. At the time the MIFARE Ultralight EV1 chip was released (October 2012), the newest semiconductor manufacturing process was 22 nm, so the 180 nm process they used was old even then.

However, it makes sense that the chip would be manufactured with an older process for several reasons. First, much of the chip's area is occupied by analog circuitry and the four bond pads, so shrinking the digital logic won't reduce the overall size much. Moreover, a significantly smaller chip would be impractical to attach to the antenna; I expect even the current chip is a pain to mount. Finally, this chip is designed for the extremely low-cost (i.e. disposable) market, so the chip is manufactured as inexpensively as possible. With a more modern process, more chips would fit on a wafer, dropping the price, but manufacturing each wafer would be more expensive, so there is a tradeoff.

Standard-cell logic

The chip's digital circuitry is implemented with standard-cell logic, a common way of implementing digital logic. The idea behind standard-cell logic is to use automated tools to create the chip layout from a description of the desired logic. The process starts with a library of standard cells. Each cell is a standardized implementation of a simple circuit such as a NAND gate or a flip-flop. The cells are designed so they have a fixed height and can be arranged in rows. The cells are then connected by metal wiring on top of the cells to produce the desired circuitry. Although the resulting circuitry isn't as dense and efficient as a fully customized and optimized layout, standard cell logic is much faster (and thus cheaper) to design than a hand-tuned layout. Thus, standard-cell logic has been heavily used for integrated circuit design since the 1980s.

The photo below shows four rows of gates implemented with standard cell logic, The chip (like most modern chips) uses CMOS logic, with each logic gate built from two types of transistors: NMOS and PMOS. To simplify manufacturing, the NMOS and PMOS transistors are arranged in separate rows. Thus, each row of logic consists of a row of PMOS transistors on top and a row of NMOS transistors below, or vice versa. Due to the physics of semiconductors, the PMOS transistors are larger, which allows the transistor types to be distinguished in the image.

A closeup of the standard cell logic.

Looking at some of the cells and extrapolating, I estimate about 8000 gates in the logic section with about 45,000 transistors. One question is if the chip is implemented as a hardcoded state machine, or if it contains a processor (microcontroller). The transistor count is barely large enough to implement a simple microcontroller such as an 8051, but that wouldn't leave many transistors left over for other necessary circuitry. If a microcontroller were present, it would need software stored somewhere. Given the simplicity of the protocol and the relatively small number of transistors, my guess is that the chip is implemented in hardware (state machines and counters) rather than through a microcontroller.

The diagram below shows how a standard cell implements a 2-input NAND. (This cell is from the Intel 386, not the NFC chip, but the structures are similar.) The cell contains four transistors. The yellow region is the P-type silicon that forms two PMOS transistors; the transistor gates are where the polysilicon (red) crosses the yellow region. (The middle yellow region is the drain for both transistors; there is no discrete boundary between the transistors.) Likewise, the two NMOS transistors are at the bottom, where the polysilicon (red) crosses the active silicon (green). The blue lines indicate the metal wiring for the cell. The black circles are contacts, connections between the metal and the silicon or polysilicon. Finally, the well taps are the opposite type of silicon, connected to the underlying silicon well or substrate to keep it at the proper voltage.

A standard cell for NAND in the Intel 386.

EEPROM

The chip stores its data in an EEPROM, similar to flash memory. The chip provides 640 or 1312 bits of EEPROM, based on the part number; I believe both versions use the same EEPROM implementation, but the cheaper version limits the amount that can be used. I think the EEPROM is the matrix shown below, with row and column drive circuitry to the right and below. (The diagonal lines are accidental scratches while I was processing the chip.)

A closeup of the presumed EEPROM circuitry on the die.

In the photo, the EEPROM appears to be a 64×64 grid, 4K bits of storage rather than the advertised 1312 bits. There are several possible explanations. First, I could be miscounting the capacity (it is easy to be off by a factor of 2, depending on the cell structure). Second, the chip stores data that isn't reflected in the EEPROM memory map; for instance, the one-way counters and the UID signature are not included in the EEPROM storage count. Another possibility is that the extra EEPROM space holds code for a microcontroller (if the chip has one).

An EEPROM requires a relatively high voltage (10-20V) to force electrons into the storage cell for a bit. This voltage is generated by a charge pump circuit that switches capacitors at high frequency to boost the voltage. To the right of the EEPROM is a circuit with several large capacitors, presumably the charge pump.

Conclusions

It's remarkable that these NFC chips can be manufactured so cheaply that they are disposable. To keep the price down, the chips are sold by the wafer and then mounted in the tickets.11 You can buy an eight-inch silicon wafer with the chips for $9000 from Digikey. This may seem expensive until you realize that a single wafer provides an astonishing 100,587 chips, yielding a per-chip price of nine cents. According to the datasheet, a wafer has 103,682 potential good dies per wafer (PGDW). Some dies will be faulty, of course, so the wafer comes with a file telling you which dies are the good ones, 97% of them. (During the manufacturing of a typical chip, the faulty ones are marked with a spot of ink. But that won't work in this case since each die is much smaller than an ink spot.) If you need more chips, you can buy a 12" wafer for $19,000, providing 215,712 chips. A ticket manufacturer mounts each chip on an antenna sheet and then prints the ticket, adding a few cents to the cost of the ticket. The result is an inexpensive ticket that can be used once and discarded.

I'll leave you with one last die photo. In my first attempt at processing the chip, I treated it with Armour Etch. Although this failed to remove the passivation layer, it thinned it slightly, enough to generate some wild colors due to thin-film interference. I call this the "tie die".

The die after treatment with Armour Etch.

Follow me on Twitter @kenshirriff or RSS for more. I'm also on Mastodon as oldbytes.space@kenshirriff. If you're interested in this type of chip, a few years ago, I looked at two RFID race timing chips, the Monza R4 and Monza R6.

Notes and references

Because the card and the reader are positioned close together, the two antennas use "inductive coupling", coupled by magnetic fields rather than radio waves. That is, the two antennas act like transformer windings, transmitting the signal from the reader to the card. ↩
The Montreal subway uses multiple types of cards. In this blog post, I examine the Occasional card (L'Occasionnelle). This is a non-rechargeable card that works for a single trip or up to three days, and then is discarded. For long-term usage, Montreal uses the Opus card, which provides more security and implements the Calypso standard. An Opus card is plastic rather than paper, giving it a longer life. The Calypso standard is much more secure, using cryptography such as AES, DES, and ECC (spec) and provides much larger EEPROM storage. Thus, the transit system uses the Occasional card for cheap, disposable tickets and the Opus card for a long-term ticket, where spending a dollar or two on the physical card isn't an issue.

I haven't examined an Opus card, so I don't know what type of chip it uses or even who manufactures the chip. Many companies produce Calypso cards, for instance, the STMicroelectronics CD21 Calypso chip is based on an Arm core. ↩
If you look closely at the lower right corner of the NFC card, it has three positions that can hold a chip, with the chip in position #3. Presumably, this allows three different NFC chips to be mounted in one card, so one card could have three functions. The NFC protocol is designed to avoid collisions if multiple chips respond, so the three chips won't interfere with each other. ↩
You can easily examine NFC cards like this using your phone, with an app such as NFC Tools or NXP's Taginfo. Tapping a card will display the type of the card and allow the memory to be read (subject to security restrictions). It's entertaining to tap various NFC cards and see what type of chip they use; I found that hotels typically use the MIFARE Classic chip, more advanced than the MIFARE Ultralight chip in the subway ticket.

The NFC Tools app shows that this card is a MIFARE Ultralight EV1.

↩
The part number, as provided by the chip, is MF0UL1101DUx. "MF0UL" indicates the MIFARE Ultralight EV1, a chip in the Ultralight family manufactured by NXP. An "H" if present indicates 50 pF input capacitance, rather than 17 pF in the chip I examined, allowing a different antenna. Next, "1" indicates a chip with 384 bits of user memory, while "2" would indicate 1024 bits. This is followed by "101D", and then a code indicating the specific package: "U" indicates a wafer, while "A" indicates a plastic leadless module carrier (LCC). Other characters specify the wafer diameter and thickness. ↩
It is instructive to think about the security of a printed ticket for a concert with a barcode. You could print out a hundred copies of the ticket, but it will only get you into the concert once. (This assumes that the venue has a centralized database so they can keep track of which tickets have been scanned.) Most of the security is implemented in the backend system, not the ticket itself. The ticket numbers need to be unforgeable, either by generating random numbers or using cryptography. (If the tickets just have QR codes with the numbers 1 to 100, for instance, it would be trivial to make fake tickets.) Moreover, there is nothing to ensure that the person scanning the ticket is legitimate; someone malicious could scan your ticket in line, print out a copy, and get into the concert instead of you. The MIFARE Ultralight chip is similar to a paper ticket in many ways with only slightly more security. ↩
The UID signing is done with an ECC (elliptic-curve cryptography) algorithm. Note that the chip doesn't need any cryptographic support for this; the chip just holds the signature that was programmed during manufacturing. As far as the chip is concerned, it is just providing some stored bytes. ↩
The MIFARE Ultralight has enough security to work as a limited-use ticket, but more advanced applications such as reloadable stored-value cards require a chip that supports encryption such as the DESFire. This allows the market to be partitioned, with the inexpensive Ultralight supporting the low-end market, while the more costly DESFire is required for more advanced applications.

There are many types of MIFARE cards and it's hard to keep them straight, but the diagram below from NXP may help. The different families are arranged left to right: Ultralight, Classic, Plus, DESFire, and SmartMX. The Y dimension indicates the official security certification level. The Z dimension (front to back) shows the evolution within a family over time. I've added a red arrow to indicate the "Ultralight EV1" chip, the focus of this blog post. (Personally, if you need a three-dimensional diagram to explain your product line, the product line may be excessively complicated.)

The various MIFARE NFC types. Diagram from aMIFARE Plus Product Family.

↩
In more detail, a 3-byte counter can be incremented by a specified value until it reaches the all-1's state (0xFFFFFF), at which point it stops. If you wanted to allow, say, 5 uses of a ticket, you could initialize the counter to all-1's minus 5. Then the counter could be incremented 5 times before reaching the limit.

One complication is that the counters have an "anti-tearing" feature for additional security. The problem is that if you tear the card away from the reader in the middle of an update, there is a possibility for counters to be partially updated, yielding a bad result. The anti-tearing feature ensures that a counter will be atomically updated, avoiding a partial update. ↩
There are multiple NFC standards with differences in speed, protocol, and range, including NFC-A, NFC-B, NFC-C, NFC-F, and NFC-V. The MIFARE Ultralight cards use NFC-A, which is defined by the standard "ISO/IEC 14443 Type A". Annoyingly, each part of the standard costs $70. The NFC Forum Analog Technical Specification provides a lot of detail, though. ↩
Instead of a wafer, you can buy the chips on tape but it costs more than twice as much. ↩

Inside a vintage aerospace navigation computer of uncertain purpose

I recently obtained an aerospace computer from the early 1970s, apparently part of a navigation system. Aerospace computers are an interesting but mostly neglected area of computer hardware, so I'm always delighted to examine one up close. In an era when most computers were large mainframes, aerospace computers packed dense electronics into a small package, using technologies such as surface-mounted components and multi-layer printed circuit boards, technologies that wouldn't reach the mainstream for another decade. This blog post examines the circuitry and components inside this computer, including an unusual electromechanical display. Although I was unable to determine who manufactured this system or even its exact function, this system illustrates how hundreds of integrated circuits and a core memory stack can be crammed into a compact package.

The navigation computer, showing the front panel with the display and keyboard, with the electronics unit behind it. Click this image (or any other) for a larger version.

The keyboard

The device has a simple numeric keyboard with a few unexpected features. The numeric keypad can also be used for direction entry, as four of the keys have N, S, E, and W on them. The keys are large, roughly the size of the Apollo spacecraft's DSKY buttons. My theory is that these buttons are designed for operation with gloves, perhaps in a fighter plane where the pilot wears a pressure suit. The buttons are hinged at the top, so they don't push straight in, but pivot when pressed.

Numeric keypads typically use one of two layouts: a telephone-style keypad has the digits 123 at the top, while a calculator-style keypad has the digits 789 at the top. Interestingly, this device uses a calculator layout, while most aviation devices have a telephone layout. The Apollo DSKY also used a calculator layout, which could be a hint at a NASA connection for this device.

Above the keyboard are four codes for self-test: N4576, E9384, S9021, and W4830. Entering these codes on the keyboard presumably triggered the appropriate test of the system when the switch is in test mode.

The display

The computer's display is simple, showing a latitude and longitude. Each value has one decimal position, providing 0.1° of accuracy. The latitude and longitude are prefixed with a compass direction: North/South for latitude and East/West for longitude.

The front panel of the navigation computer, with a display and keyboard.

The display is constructed from an unusual type of electromechanical indicator, with an indicator module for each digit. Each digit position has a rotating wheel with 11 positions (ten digits and a blank). When the indicator module for a position is energized, the wheel spins to the specified position, showing the selected digit. The two leftmost indicators are slightly different as they show a compass direction instead of a digit: N, S, E, or W. Moreover, the direction indicators can also show the compass direction with a diagonal slash through it, as seen above. Perhaps the slashed direction indicates a problem with the value.

The diagram below shows how a digit indicator operates. Each digit position has an electromagnet with a wire to energize it. The dial wheel has an attached permanent magnet (indicated by N and S). Energizing one of the electromagnets causes the dial to spin to that position, aligning the permanent magnet on the dial with the electromagnet. This mechanism forms a reliable indicator with just one moving part. The displayed digit is clearer than a seven-segment display since the digit uses a real font rather than being created from segments.

A diagram illustrating the magnetic indicator construction. From Patent 3201785. The patent describes a different indicator but the construction is similar.

Looking at the back of the keyboard/display unit shows the wiring of the display indicators. Each indicator has a common connection and ten wires to energize one of the electromagnets.1 The electromagnets are connected in a matrix, with all the "1" wires connected, the "2" wires connected, and so forth. To rotate an indicator to a particular digit, a common wire and an electromagnet wire are energized. For instance, powering the common wire of the second indicator and the "5" electromagnetic wire causes the second indicator to rotate to the "5" position. The wiring has a three-dimensional structure with ten bare wires running between the boards, one for each digit value. A yellow wire hangs off each bare wire, linking it to the connector on the left. Each indicator has ten diodes on a circuit board to block "sneak" paths that would energize unselected electromagnets.

The back of the keyboard/display unit. The keyboard buttons are at the back of this photo, while the display modules are at the front.

This matrix circuit reduces the amount of wiring required: although there are 100 electromagnets in total, just 20 wires are sufficient to control them. The driver circuitry, however, is a bit more complex as it must scan through the ten digit positions, activating the right pair of driver wires at the right time. Some of the logic circuitry described below must implement this scanning, as well as the driver circuitry to energize the indicators.

The display and keyboard have many similarities to the Delco Carousel Inertial Navigation System (INS) shown below. (The Delco Carousel was used in many military and civilian aircraft, from the C-141 cargo plane to the Boeing 747 passenger plane.) Both devices have two digital displays, one for latitude North/South and one for longitude East/West. Also note the numeric keypads with four keys assigned to the four compass directions. The controls of the Carousel INS system are considerably more complicated, though. The Carousel has a knob position "TK/GS" (track/ground speed), which may correspond to the "T/G" position on my device.

Control unit for the Delco Carousel inertial navigation system. From Smithsonian collection, gift of Delphi Electronics & Safety.

Note that the display on my unit has just four digits of accuracy, with one digit after the decimal point. A tenth of a degree would provide an accuracy of about ±7 miles, which is low for a navigation device. In comparison, the Delco Carousel has six digits of accuracy (± 100 feet perhaps). This suggests that the device does not provide INS navigation, but some other guidance with lower accuracy.

Packaging the electronics

The unit contains 14 circuit boards, crammed with TTL integrated circuits, along with a core memory stack. The photo below shows how circuit boards surround the core memory stack. The mechanical design of the unit is advanced, allowing the boards to be opened up like a book. This provides compact packaging while allowing access to the boards.

The electronics unit can be disassembled and folds open like a book.

The circuit boards are four-layer printed circuit boards, more advanced than the common two-layer boards of the time. The boards use a mixture of surface-mounted and through-hole components. The flat-pack ICs and the tiny round transistors are surface mounted, which was rare at the time. On the other hand, the resistors, capacitors, diodes, and larger transistors use standard through-hole components. At the time, most electronics used through-hole components, although aerospace systems often used surface-mounted components for higher density. It wasn't until the late 1980s that surface-mount technology became commonplace.

The boards are mounted in solid metal frames, providing both structural integrity and heat conduction for cooling. Most of the frames hold two boards, mounted back-to-back for higher density.

The logic boards

Four of the circuit boards are logic boards, packed with flat-pack integrated circuits. The board below holds 55 integrated circuits, showing the high density that is possible with flat packs.

A board filled with flat-pack logic ICs.

The logic ICs are Signetics 400-series chips, an early type of TTL (Transistor-Transistor Logic) chip. Just three types of these ICs are used: SE440J "Dual exclusive OR" (really AND-OR-INVERT but XOR if provided with particular inputs), SE455J "Dual 4-input buffer/driver" (4-input NAND or NOR gates depending on polarity), and SE480J "Quad 2-input NAND/NOR". These integrated circuits cost $15.45 each in 1966 (about $150 each in current dollars).2

The schematic below shows the circuit that implements AND-OR-INVERT (or exclusive or) in the SE440J. The multiple-emitter transistors on the inputs may appear unusual, but this is the standard way to implement TTL gates. It is important to note that this chip only contains 12 transistors, so the density is low. (Since the chip contains two of these gates, this circuit is duplicated.) In the mid-1960s, integrated circuits only contained a few transistors—the Apollo Guidance Computer's ICs had just 6 transistors—but by the time this unit was built in the early 1970s, some chips had thousands of transistors, tracking Moore's Law. Thus, this unit both illustrates how aviation computers could be built from simple integrated circuits and how the dramatic improvements in IC technology rapidly obsoleted these computers.

Schematic of the SE440J integrated circuit. From datasheet.

The Signetics 400-series seems to have been obscure and short-lived, probably killed off by the wild success of 7400-series TTL chips. I was able to find only a few announcements and datasheets for these chips. The only users of these chips that I could find were NASA projects from the late 1960s.3 Signetics 400-series chips were used in the Mariner Mars and Venus probes, in the Data Automation Subsystem (DAS) (link, link). The Voyager Mars probes also used them. The SE455J gates were also used to interface the Apollo Guidance Computer to a core-rope simulator. JPL used the SE455J in a core memory system. NASA used the SE455J, SE480J, and other Signetics chips in its design for the MICROMIN computer. None of these systems appear to be related to the navigation system, but they illustrate that NASA was using these specific Signetics chips at the time in multiple designs.

The chips are labeled "CDC", raising the possibility that these chips were built by Control Data Corporation (CDC) under license from Signetics. The Aerospace Division of CDC was active at the time, building various compact computer systems. For instance, the CDC 480 computer (1976) was a 16-bit computer based on the Am2900 bit-slice chip. Also known as the AN/AYK-14, this system was used on numerous aircraft including the F-18. An earlier CDC aerospace computer is the AN/AWG-9 Airborne Missile Control System (1965), a 24-bit computer in a compact 1.1 cubic foot package. Used on the F-14 fighter plane, this computer guided the Phoenix air-to-air missile. Based on CDC's activity in aerospace computers at the time, the mystery computer could be a CDC system, although this hypothesis is based solely on integrated circuits labeled "CDC".

The CDC AN/AYK-14 computer with circuit boards. This is an example of an aerospace computer built by CDC slightly later than the mystery computer. From a 1983 brochure.

The photo below shows another logic board. This one has numerous red and white wires attached, linking it to the rest of the system. Curiously, this board has a single transistor, with two associated resistors, in the middle of the board.

Another logic board, with a similar grid of flat-pack integrated circuits.

Analog boards

The computer contains not only logic boards but also boards full of analog circuitry to interface with the core memory, keyboard, and display. The board below contains 17 of the logic ICs seen earlier. However, it also uses many resistors, capacitors (red cylinders), transistors (white circles), inductors (white banded cylinders), and glass diodes. The board also has some analog integrated circuits. In particular, it has three TI SN52709 op-amps, the smaller 10-pin packages. The board also contains some integrated circuits that I couldn't identify: UT1000, UT1027, UD4001, and D245F. The SM 60 ICs in white packages have a logo that I don't recognize. The op-amps could function as sense amplifiers for the core memory, or this board could provide other analog interfacing.

A board with some analog integrated circuits.

The board has multiple gray four-pin packages labeled "926D". Based on the + and - markings, these packages are probably bridge rectifiers, maybe providing power for the circuits. Many of the other boards have these rectifiers. The analog boards also contain a few Halex flat-pack devices labeled "HALEX 101205 727". Hanlex manufactured thin-film resistors in flat packs, so these are probably resistor networks. NASA used Halex resistor networks in some devices (link).4

The analog board shown below sits next to the core memory stack. It uses a different set of flat-pack components: Signetics C8930G and PL 98321. Unfortunately, I could not identify these ICs. This board, unlike the previous boards, has a copper ground plane in the second layer of the circuit board; this layer is visible in the photo as the copper-colored background occupying most of the board.

Another analog board in the aviation computer.

Core memory

The unit is built around a core memory stack, as was common in the era before semiconductor memory took over. Magnetic core memory consists of a grid of tiny ferrite cores with wires threaded through them, forming a core plane. Typically, a core memory unit consists of multiple planes, one for each bit in the word, stacked to form a three-dimensional block of memory.

The photo below shows a closeup of the stack. It appears to have 20 planes, suggesting a 20-bit processor. Soldered wires connect the planes together to provide continuous wiring through the stack. The soldering on these wires looks somewhat haphazard, suggesting that this was not a production unit.

A closeup of the core memory stack. Brightly colored wires connect the module to the rest of the system. Small wires connect the layers together.

The photo below shows the other side of the core memory stack, with similar wiring between the planes. At the right are a few layers of a different type, connected with 26 wires. The tape measure shows that the core memory stack is compact, about 6 cm on a side (2¼").

Measurement of the core memory stack.

Some of the boards are drivers for the core memory stack. The board below has 48 small round transistors, colored either blue or red. Note the green, white, and yellow wires in the lower right, mostly hidden under the brown ground ribbon. These wires are connected to the core memory stack.

A circuit board with many small transistors.

The board below also has numerous wires to the core stack, underneath the brown ground ribbon, so it is presumably another driver board. This board has some round driver transistors with yellow dots. Curiously, in the upper left there are a few circuit board pads where transistors could be mounted but are missing. Perhaps with the additional components the board would support a system with more of something: a larger keyboard? more memory?

A board with driver transistors.

Looking at the back of the unit, you can see the display indicator wiring at the top and a circuit board at the bottom. This board contains 20 transistors in metal cans, specifically Motorola 2N3736 NPN transistors. The core memory stack has 20 planes, matching the 20 transistors on this board, so the board probably implements the core memory "inhibit drivers", controlling the bit written to each plane. The board also has numerous tiny surface-mount transistors in white, red, and black packages. Close examination shows a few thin green "bodge" wires on this board, indicating that rework was performed on the board to fix a circuit problem, another piece of evidence that this unit is a prototype.

A view of the computer from the back, showing the display wiring and a circuit board.

The core memory stack is enclosed by two sheet metal boxes, which I removed for the photos. The stack also has two flexible ground planes attached to it. The designers clearly wanted to ensure that the memory was well shielded, to a degree that I haven't seen in other systems.

Conclusions

Despite my research, this aerospace computer remains a mystery. I was unable to identify who manufactured it or even its exact function. One hypothesis is a NASA connection since NASA was extensively using these Signetics chips at the time. Moreover, this computer was obtained in the Houston area. Another hypothesis, based on the "CDC" label on the chips, is that this computer was built by Control Data's Aerospace Division. If you have any leads on this mysterious aviation computer, please contact me.

This system may have been a prototype. It has no part numbers, manufacturer name, or identifying plate.5 Moreover, the soldering on the core memory stack doesn't seem to be flight quality. Finally, the boards don't have conformal coating, which is typically used for spaceflight systems. However, the mechanical design looks advanced for a prototype, with dense boards that fold together like a book.

This unit clearly has a navigation role, but seems to be too inaccurate for an inertial navigation system (INS). It contains many integrated circuits, but not enough to form a full computer. I hypothesize that this unit contains the circuitry to drive the core memory and the display, and handle keyboard input. Looking at the underside of the unit (below), there are three connectors. I suspect these connectors were plugged into a larger box that held the computer itself.

A view of the underside of the electronics unit with the core memory wrapped in sheet metal.

The date codes on the integrated circuits range from 1966 to 1973, so the computer was probably manufactured in 1973. The seven-year range for date codes is a bit surprising, since integrated circuit technology changed a lot during these years. I suspect that the Signetics 400-series ICs had older date codes because this line didn't catch on so there was a lot of old stock rather than newly-manufactured parts. I also suspect that this system was designed around 1969, based on the multiple NASA systems using these chips then, suggesting that the design and manufacturing of this unit was a multi-year project.

Despite the lingering mysteries of this device, it provides an interesting example of aerospace computers at the beginning of the 1970s. Even though integrated circuits were primitive at the time, with just a few transistors per chip, aerospace computers used these chips and high-density packaging to build computers that were compact, reliable, and low power. These miniature computers controlled aircraft, missiles, and spacecraft, worlds away from the room-filling mainframes that attracted most of the attention.

Thanks to Usagi Electric for providing the aerospace computer. Eric Schlaepfer and Marc Verdiell helped with the analysis. Thanks to Don Straney for his research and comments. Various commenters on Reddit and Twitter provided suggestions. Follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon as oldbytes.space@kenshirriff.

Notes and references

The indicators have a blank position, so there are 11 electromagnets. However, only the ten electromagnets associated with digits are used in the device. The N/S/E/W indicators have a square box in one of the positions, which probably is not used. ↩
Signetics had multiple temperature ranges for the 400-series low-power ICs. The RE prefix indicated ultra high reliability aerospace components rated for a temperature range of -55°C to +125°C. The SE prefix on the chips in this unit indicated military airborne chips with the same temperature range. A NE or ST prefix indicated military prototype or industrial chips with a smaller temperature range (0°C to +70°C). A SP prefix indicated the commercial temperature rating, from +15°C to +55°C. A J suffix indicated a flat pack and an A suffix indicated a dual in-line pack (DIP). ↩
NASA computers are the only documented systems that I could find that used these Signetics chips. One possible conclusion is that NASA was the only organization to use these chips. However, it is likely that other companies used these chips but didn't document them as thoroughly as NASA. That is, detailed circuitry for military aerospace computers is unlikely to be on the Internet. ↩
Halex also made hybrid microcircuits, such as flip-flops, so these packages could be more complex than resistor networks. However, I think a resistor network is more likely. ↩
One of the circuit boards had the number "45333000" on it, along with a symbol like "+I-", as shown below.

Closeup of a circuit board showing a number, maybe identifying the board.

One board also had a mysterious symbol that resembles "mw". I couldn't match these symbols to any manufacturers, and it is unclear if they are logos, fiducials, or other symbols.

Closeup of a circuit board showing the "mw" mark.

↩

Talking to memory: Inside the Intel 8088 processor's bus interface state machine

In 1979, Intel introduced the 8088 microprocessor, a variant of the 16-bit 8086 processor. IBM's decision to use the 8088 processor in the IBM PC (1981) was a critical point in computer history, leading to the success of the x86 architecture. The designers of the IBM PC selected the 8088 for multiple reasons, but a key factor was that the 8088 processor's 8-bit bus was similar to the bus of the 8085 processor.1 The designers were familiar with the 8085 since they had selected it for the IBM System/23 Datamaster, a now-forgotten desktop computer, making the more-powerful 8088 processor an easy choice for the IBM PC.

The 8088 processor communicates over the bus with memory and I/O devices through a highly-structured sequence of steps called "T-states." A typical 8088 bus cycle consists of four T-states, with one T-state per clock cycle. Although a four-step bus cycle may sound straightforward, its implementation uses a complicated state machine making it one of the most difficult parts of the 8088 to explain. First, the 8088 has many special cases that complicate the bus cycle. Moreover, the bus cycle is really six steps, with two undocumented "extra" steps to make bus operations more efficient. Finally, the complexity of the bus cycle is largely arbitrary, a consequence of Intel's attempts to make the 8088's bus backward-compatible with the earlier 8080 and 8085 processors. However, investigating the bus cycle circuitry in detail provides insight into the timing of the processor's instructions. In addition, this circuitry illustrates the tradeoffs and implementation decisions that are necessary in a production processor. In this blog post, I look in detail at the circuitry that implements this state machine.

By examining the die of the 8088 microprocessor, I could reverse engineer the bus circuitry. The die photo below shows the 8088 microprocessor's silicon die under a microscope. Most visible in the photo is the metal layer on top of the chip, with the silicon and polysilicon mostly hidden underneath. Around the edges of the die, bond wires connect pads to the chip's 40 external pins. Architecturally, the chip is partitioned into a Bus Interface Unit (BIU) at the top and an Execution Unit (EU) below, with the two units running largely independently. The BIU handles bus communication (memory and I/O accesses), while the Execution Unit (EU) executes instructions. In the diagram, I've labeled the processor's key functional blocks. This article focuses on the bus state machine, highlighted in red, but other parts of the Bus Interface Unit will also play a role.

The 8088 die under a microscope, with main functional blocks labeled. This photo shows the chip's single metal layer; the polysilicon and silicon are underneath. Click on this image (or any other) for a larger version.

Although I'm focusing on the 8088 processor in this blog post, the 8086 is mostly the same. The 8086 and 8088 processors present the same 16-bit architecture to the programmer. The key difference is that the 8088 has an 8-bit data bus for communication with memory and I/O, rather than the 16-bit bus of the 8086. For the most part, the 8086 and 8088 are very similar internally, apart from trivial but numerous layout changes on the die. In this article, I'm focusing on the 8088 processor, but most of the description applies to the 8086 as well. Instead of constantly saying "8086/8088", I'll refer to the 8088 and try to point out places where the 8086 is different.

The bus cycle

In this section, I'll describe the basic four-step bus cycles that the 8088 performs.2 To start, the diagram below shows the states for a write cycle (slightly simplified3), when the 8088 writes to memory or an I/O device. The external bus activity is organized as four "T-states", each one clock cycle long and called T1, T2, T3, and T4, with specific actions during each state. During T1, the 8088 outputs the address on the pins. During the T2, T3, and T4 states, the 8088 outputs the data word on the same pins. The external memory or I/O device uses the T states to know when it is receiving address information or data over the bus lines.

A typical write bus cycle consists of four T states. Based on The 8086 Family Users Manual, B-16.

For a read, the bus cycle is slightly different from the write cycle, but uses the same four T-states. During T1, the address is provided on the pins, the same as for a write. After that, however, the processor's data pins are "tri-stated" so they float electrically, allowing the external memory to put data on the bus. The processor reads the data at the end of the T3 state.

A typical read bus cycle consists of four T states. Based on The 8086 Family Users Manual, B-16.

The purpose of the bus state machine is to move through these four T states for a read or a write. This process may seem straightforward, but (as is usually the case with the 8088) many complications make this process anything but easy. In the next sections, I'll discuss these complications. After that, I'll explain the state machine circuitry with a schematic.

Address calculation

One of the notable (if not hated) features of the 8088 processor is segmentation: the processor supports 1 megabyte of memory, but memory is partitioned into segments of 64 KB for compatibility with the earlier 8080 and 8085 processors. The 8088 calculates each 20-bit memory address by adding the value of a segment register to a 16-bit offset. This calculation is done by a dedicated address adder in the Bus Interface Unit, completely separate from the chip's ALU. (This address adder can be spotted in the upper left of the earlier die photo.)

Calculating the memory address complicates the bus cycle. As the timing diagrams above show, the processor issues the memory address during state T1 of the bus cycle. However, it takes time to perform the address calculation addition, so the address calculation must take place before T1. To accomplish this, there are two "invisible" bus states before T1; I call these states "TS" (T-start) and "T0". During these states, the Bus Interface Unit uses the address adder to compute the address, so the address will be available during the T1 state. These states are invisible to the external circuitry because they don't affect the signals from the chip.

Thus, a single memory operation takes six clock cycles: two preparatory cycles to compute the address before the four visible cycles. However, if multiple memory operations are performed, the operations are overlapped to achieve a degree of pipelining that improves performance. Specifically, the address calculation for the next memory operation takes place during the last two clock cycles of the current memory operation, saving two clock cycles. That is, for consecutive bus cycles, T3 and T4 of one bus cycle overlap with TS and T0 of the next cycle. In other words, during T3 and T4 of one bus cycle, the memory address gets computed for the next bus cycle. This pipelining significantly improves the performance of the 8088, compared to taking 6 clock cycles for each bus cycle.

With this timing, the address adder is free during cycles T1 and T2. To improve performance in another way, the 8088 uses the adder during this idle time to increment or decrement memory addresses. For instance, after popping a word from the stack, the stack pointer needs to be incremented by 2.5 Another case is block move operations (string operations), which need to increment or decrement the pointers each step. By using the address adder, the new pointer value is calculated "for free" as part of the memory cycle, without using the processors regular ALU.4

Address corrections

The address adder is used in one more context: correcting the Instruction Pointer value. Conceptually, the Instruction Pointer (or Program Counter) register points to the next instruction to execute. However, since the 8088 prefetches instructions, the Instruction Pointer indicates the next instruction to be fetched. Thus, the Instruction Pointer typically runs ahead of the "real" value. For the most part, this doesn't matter. This discrepancy becomes an issue, though, for a subroutine call, which needs to push the return address. It is also an issue for a relative branch, which jumps to an address relative to the current execution position.

To support instructions that need the next instruction address, the 8088 implements a micro-instruction CORR, which corrects the Instruction Pointer. This micro-instruction subtracts the length of the prefetch queue from the Instruction Pointer to determine the "real" Instruction Pointer. This subtraction is performed by the address adder, using correction constants that are stored in a small Constant ROM.

The tricky part is ensuring that using the address adder for correction doesn't conflict with other uses of the adder. The solution is to run a special shortened memory cycle—just the TS and T0 states—while the CORR micro-instruction is performed.6 These states block a regular memory cycle from starting, preventing a conflict over the address adder.

A closeup of the address adder circuitry in the 8086. From my article on the adder.

Prefetching

The 8088 prefetches instructions before they are needed, loading instructions from memory into a 4-byte prefetch queue. Prefetching usually improves performance, but can result in an instruction's memory access being delayed by a prefetch, hurting overall performance. To minimize this delay, a bus request from an instruction will preempt a prefetch, even if the prefetch has gone through TS and T0. At that point, the prefetch hasn't created any bus activity yet (which first happens in T1), so preempting the prefetch can be done cleanly. To preempt the prefetch, the bus cycle state machine jumps back to TS, skipping over T1 through T4, and starting the desired access.

A prefetch will also be preempted by the micro-instruction that stops prefetching (SUSP) or the micro-instruction that corrects addresses (CORR). In these cases, there is no point in completing the prefetch, so the state machine cycle will end with T0.

Wait states

One problem with memory accesses is that the memory may be slower than the system's clock speed, a characteristic of less-expensive memory chips. The solution in the 1970s was "wait states". If the memory couldn't respond fast enough, it would tell the processor to add idle clock cycles called wait states, until the memory could respond.7 To produce a wait state, the memory (or I/O device) lowers the processor's READY pin until it is ready to proceed. During this time, the Bus Interface Unit waits, although the Execution Unit continues operation if possible. Although Intel's documentation gives the wait cycle a separate name (Tw), internally the wait is implemented by repeating the T3 state as long as the READY pin is not active.

Halts

Another complication is that the 8088 has a HALT instruction that halts program execution until an interrupt comes in. One consequence is that HALT stops bus operations (specifically prefetching, since stopping execution will automatically stop instruction-driven bus operations). A complication is that the 8088 indicates the HALT state to external devices by performing a special T1 bus cycle without any following bus cycles. But wait: there's another complication. External devices can take control of the bus through the HOLD functionality, allowing external devices to perform operations such as DMA (Direct Memory Access). When the device ends the HOLD, the 8088 performs another special T1 bus cycle, indicating that the HALT is still in effect. Thus, the bus state machine must generate these special T1 states based on HALT and HOLD actions. (I discussed the HALT process in detail here.)

Putting it all together: the state diagram

The state diagram below summarizes the different types of bus cycles. Each circle indicates a specific T-state, and the arrows indicate the transitions between states. The green line shows the basic bus cycle or cycles, starting in TS and then going around the cycle. From T3, a new cycle can start with T0 or the cycle will end with T4. Thus, new cycles can start every four clocks, but a full cycle takes six states (counting the "invisible" TS and T0). The brown line shows that the bus cycle will stay in T3 as long as there is a wait state. The red line shows the two cycles for a CORR correction, while the purple line shows the special T1 state for a HALT instruction. The cyan line shows that a prefetch cycle can be preempted after T0; the cycle will either restart at TS or end.

A state diagram showing the basic bus cycle and various complications.

I'm showing states TS and T3 together since they overlap but aren't the same. Likewise, I'm showing T4 and T0 together. T4 is grayed out because it doesn't exist from the state machine's perspective; the circuitry doesn't take any particular action during T4.

The schematic below shows the implementation of the state machine. The four flip-flops represent the four states, with one flip-flop active at a time, generating states T0, T1, T2, and T3 (from top to bottom). Each output feeds into the logic for the next state, with T3 wrapping back to the top, so the circuit moves through the states in sequence. The flip-flops are clocked so the active state will move from one flip-flop to the next according to the system clock. State TS doesn't have its own flip-flop, but is represented by the input to the T0 flip-flop, so it happens one clock cycle earlier.8 State T4 doesn't have a flip-flop since it isn't "real" to the bus state machine. The logic gates handle the special cases: blocking the state transfer if necessary or starting a state.

Schematic of the state machine.

I'll explain the logic for each state in more detail. The circuitry for the TS state has two AND gates to generate new bus cycles starting from TS. The first one (a) causes TS to happen with T3 if there is a pending bus request (and no HOLD). The second AND gate (b) starts a bus cycle if the bus is not currently active and there is a bus request or a CORR micro-instruction. The flip-flop causes T0 to follow T3/TS, one clock cycle later.

The next gates (c) generate the T1 state following T0 if there is pending bus activity and the cycle isn't preempted to T3. The AND gate (d) starts the special T1 for the HALT instruction.9 The T2 state follows T1 unless T1 was generated by a HALT (e).

The T3 logic is more complicated. First, T3 will always follow T2 (f). Next, a wait state will cause T3 to remain in T3 (g). Finally, for a preempt, T3 will follow T0 (h) if there is a prefetch and a microcode bus operation (i.e. an instruction specified the bus operation).

Next, I'll explain BUS-ACTIVE, an important signal that indicates if the bus is active or not. The Bus Interface Unit generates the BUS-ACTIVE signal to help control the state machine. The BUS-ACTIVE signal is also widely used in the Bus Interface Unit, controlling many functions such as transfers to and from the address registers. BUS-ACTIVE is generated by the complex circuit below that determines if the bus will be active, specifically in states T0 through T3. Because of the flip-flop, the computation of BUS-ACTIVE happens in the previous clock cycle.

The circuit to determine if the bus will be active next cycle.

In more detail, the signal BUS-ACTIVE-PRE indicates if the bus cycle will continue or will end on the next clock cycle. Delaying this signal through the flip-flop generates BUS-ACTIVE, which indicates if the bus is currently active in states T0 through T3. The top AND gate (a) is responsible for starting a cycle or keeping a cycle going (a1). It will allow a new cycle if there is a bus request (without HOLD) (a3). It will also allow a new cycle if there is a CORR micro-instruction prior to the T1 state (even if there is a HOLD, since this "fake" cycle won't use the bus) (a2). Finally, it allows a new cycle for a HALT, using T1-pre (a2).10 Next are the special cases that end a bus cycle. The second AND gate (b) ends the bus cycle after T3 unless there is a wait state or another bus request. (But a HOLD will block the next bus request.) The remaining gates end the cycle after T0 to preempt a prefetch if a CORR or SUSP micro-instruction occurs (d), or end after T1 for a HALT (e).

The BUS-ACTIVE circuit above uses a complex gate, a 5-input NOR gate fed by 5 AND gates with two attached OR gates. Surprisingly, this is implemented in the processor as a single gate with 14 inputs. Due to how gates are implemented with NMOS transistors, it is straightforward to implement this as a single gate. The inverter and NOR gate on the left, however, needed to be implemented separately, as they involve inversion; an NMOS gate must have a single inversion.

The bus state machine circuitry on the die.

The diagram above shows the layout of the bus state machine circuitry on the die, zooming in on the top region of the die. The metal layer has been removed to expose the underlying silicon and polysilicon. The layout of each flip-flop is completely different, since the layout of each transistor is optimized to its surroundings. (This is in contrast to later processors such as the 386, which used standard-cell layout.) Even though the state machine consists of just a handful of flip-flops and gates, it takes a noticeable area on the die due to the large 3.2 µm feature size of the 8088. (Modern processors have features measured in nanometers, not micrometers.)

Conclusions

The bus state machine is an example of how the 8088's design consists of complications on top of complications. While the four-state bus cycle seems straightforward at first, it gets more complicated due to prefetching, wait states, the HALT instruction, and the bus hold feature, not to mention the interactions between these features. While there were good motivations behind these features, they made the processor considerably more complicated. Looking at the internals of the 8088 gives me a better understanding of why simple RISC processors became popular.

The bus state machine is a key part of the read and write circuitry, moving the bus operation through the necessary T-states. However, the state machine is not the only component in this process; a higher-level circuit decides when to perform a read, write, or prefetch, as well as breaking a 16-bit operation into two 8-bit operations.11 These circuits work together with the higher-level circuit telling the state machine when to go through the states.

In my next blog post, I'll describe the higher-level memory circuit so follow me on Twitter @kenshirriff or RSS for updates. I'm also on Mastodon as oldbytes.space@kenshirriff. If you're interested in the 8086, I wrote about the 8086 die, its die shrink process, and the 8086 registers earlier.

Notes and references

The 8085 and 8088 processors both use a 4-step bus cycle for instruction fetching. For other reads and writes, the 8085's bus cycle has three steps compared to four for the 8088. Thus, the 8085 and 8088 bus cycles are similar but not an exact match. ↩
The 8088 has separate instructions to read or write an I/O device. From the bus perspective, there's no difference between an I/O operation and a memory operation except that a pin on the chip indicates if the operation is for memory or I/O.

The 8088 supports I/O operations for historical reasons, going back through the 8086, 8080, 8008, and the Datapoint 2200 system. In contrast, many other contemporary processors such as the 6502 used memory-mapped I/O, using standard memory accesses for I/O devices.

The 8086 has a pin M/IO that is high for a memory access and low for an I/O access. External hardware uses this pin to determine how to handle the request. Confusingly, the pin's function is inverted on the 8088, providing IO/M. One motivation behind the 8088's 8-bit bus was to allow reuse of peripherals from the earlier 8-bit 8085 processor. Thus, the pin's function was inverted so it matched the 8085. (The pin is only available when the 8086/8088 is used in "minimum mode"; "maximum mode" remaps some of the pins, making the system more complicated but providing more control.) ↩
I've made the timing diagram somewhat idealized so actions line up with the clock. In the real datasheet, all the signals are skewed by various amounts so the timing is more complicated. See the datasheet for pages of timing constraints on exactly when signals can change. ↩
For more information on the implementation of the address adder, see my previous blog post. ↩
The POP operation is an example of how the address adder updates a memory pointer. In this case, the stack address is moved from the Stack Pointer to the IND register in order to perform the memory read. As part of the read operation, the IND register is incremented by 2. The address is then moved from the IND register to the Stack Pointer. Thus, the address adder not only performs the segment arithmetic, but also computes the new value for the SP register.

Note that the increment/decrement of the IND register happens after the memory operation. For stack operations, the SP must be decremented before a PUSH and incremented after a POP. The adder cannot perform a predecrement, so the PUSH instruction uses the ALU (Arithmetic/Logic Unit) to perform the decrement. ↩
During the CORR micro-instruction, the Bus Interface Unit performs special TS and T0 states. Note that these states don't have any external effect, so they are invisible outside the processor. ↩
The tradeoff with memory boards was that slower RAM chips were cheaper. The better RAM boards advertised "no wait states", but cheaper boards would add one or more wait states to every access, reducing performance. ↩
Only the second half of the TS state has an effect on the Bus Interface Unit, so TS is not a full state like the other states. Specifically, a delayed TS signal is taken from the first half of the T0 flip-flop, and this signal is used to control various actions in the Bus Interface Unit. (Alternatively, you could think of this as an early T0 state.) This is why there isn't a separate flip-flop for the TS state. I suspect this is due to timing issues; by the time the TS state is generated by the logic, there isn't enough time to do anything with the state in that half clock cycle, due to propagation delays. ↩
There is a bit more circuitry for the T1 state for a HALT. Specifically, there is a flip-flop that is set on this signal. On the next cycle, this flip-flop both blocks the generation of another T1 state and blocks the previous T1 state from progressing to T2. In other words, this flip-flop makes sure the special T1 lasts for one cycle. However, a HOLD state resets this flip-flop. That allows another special T1 to be generated when the HOLD ends. ↩
The trickiest part of this circuit is using T1-pre to start a (short) cycle for HALT. The way it works is that the T1-pre signal only makes a difference if there isn't a bus cycle already active. The only way to get an "unexpected" T1-pre signal is if the state machine generates it for the first cycle of a HALT. Thus, the HALT triggers T1-pre and thus the bus-active signal. You might wonder why the bus-active uses this roundabout technique rather than getting triggered directly by HALT. The motivation is that the special T1 state for HALT requires the AND of three signals to ensure that the state is generated once for the HALT rather than continuously, but happens again after a HOLD, and waits until the current bus cycle is done. Instead of duplicating that AND gate, the circuit uses T1-pre which incorporates that logic. (This took me a long time to figure out.) ↩
The 8088 has a 16-bit bus, compared to the 8088's 8-bit bus. Thus, a 16-bit bus operation on the 8088 will always require two 8-bit operations, while the 8086 can usually perform this operation in a single step. However, a 16-bit bus operation on the 8086 will still need to be broken into two 8-bit operations if the address is unaligned (i.e. odd). ↩