Booting the IBM 1401: How a 1959 punch-card computer loads a program

How do you boot a computer from punch cards when the computer has no operating system and no ROM? To make things worse, this computer requires special metadata called "word marks" that can't be represented on a card. In this blog post, I describe the interesting hardware and software techniques used in the vintage IBM 1401 computer to load software from a deck of punch cards. (Among other things, half of each card contains loader code that runs as each card is read.) I go through some IBM 1401 machine code in detail, which illustrates the strangeness of the 1401's architecture and instruction set compared to a modern machine.

The IBM 1401 was an early all-transistorized computer, so early that it didn't use silicon transistors but germanium transistors. It was announced in 1959, and went on to become the best-selling computer of the mid-1960s, with more than 10,000 systems in use. The 1401 leased for $2500 a month (about $20,000 in current dollars), a low price that opened up computing to many companies. Even a medium-sized business could use the 1401 for payroll, accounting, inventory, order processing, and invoicing.

An IBM 1401 mainframe computer at the Computer History Museum. IBM 729 tape drives are at the right.

An IBM 1401 mainframe computer at the Computer History Museum. IBM 729 tape drives are at the right.

To understand the 1401's architecture, it helps to understand how punch cards were used in that era. In 1928, IBM developed the 80-column punch card that became the standard for data processing for decades. A punch card held 80 characters, one per column, with the character represented by the holes punched in that column, as shown below. The 6-bit character set was limited to 64 different characters: upper case letters, numbers, and some special characters. Instead of binary, cards used a BCD-based encoding (which later was extended to create EBCDIC).1

Punch card code, from IBM 29 Card Punch Reference Manual.

Punch card code, from IBM 29 Card Punch Reference Manual.

Despite their limitations, punch cards were extensively used for data processing into the 1970s and beyond. A typical application used one card for each data record, so everything needed to fit into 80 columns2 which were divided up into fixed-length fields. Often, custom cards would be printed that showed the fields for an application, such as the card below designed for accounting.3 Each field has a fixed location. For instance, in the card below, the customer name is from columns 18 to 29 while the invoice amount is in columns 74 through 80.

Example card, from IBM 29 Card Punch Reference Manual.

The IBM 1401 has a peculiar architecture, optimized to support these punch-card applications. The idea is that fixed-length fields were be delimited in memory by word marks, a sort of metadata, and then instructions operated on these arbitrary-length fields. This let you move a 19-character name string with a single instruction. Or you could perform arithmetic on a 50-digit numeric field with a single instruction. Thus, word marks were convenient for fixed-field data, since you didn't need to loop over each character of the field.

To implement word marks, each memory location had 6 bits to hold a character as well as a separate bit to hold the word mark. (These were not bytes, as the IBM 1401 predated the popularity of byte-based computers.) It's important to note that the word marks were independent of the characters. Word marks were set or cleared using different instructions from the ones that acted on characters. Once word marks were configured, they remained unchanged as data records were read into memory.

Word marks were also critical for machine instructions since they indicated the length of the instruction. A machine instruction in the 1401 consisted of one to eight characters. The first character was the op code, potentially followed by addresses and/or a modifier. Each instruction needed to have a word mark set on the op code and a word mark on the next character after the instruction (i.e. the op code of the next instruction). Note that word marks create a problem. The machine instructions of a program are directly represented as characters on a punch card, but a punch card cannot hold the necessary word marks.

Thus, loading a program into the 1401 raised two problems. First was the standard computer bootstrap problem: if there's no program in the machine, what performs the load? But there was a second: word marks are a key component of 1401 machine code, but word marks cannot be represented on punch cards. In the next section, I'll explain in detail how the IBM 1401 solved these problems.

Loading a program

To load a program, a card deck, such as the short one below is placed into the card reader. Each card has the contents of the card printed at the top, with the holes punched in the columns below. The first two cards are bootstrap cards that initialize the computer's memory, clearing it out and setting necessary word marks. The bulk of the cards hold the machine code of the desired program on the left, and the machine code of the loader on the right. The last card runs the program.

A card deck for my Mandelbrot program.

A card deck for my Mandelbrot program.

At the far right of each card, columns 72-75 hold a sequence number (0001 through 0017). If you dropped a card deck, the cards could be put back into order by a card sorter, sorting on the sequence number.8

The load process was started by pressing the "Load" button on the card reader (the orange button near the center of the blue panel). This button causes several actions to take place.4 The first card was read, and the contents are placed in memory addresses 1 through 80. A word mark was set on address 1, and cleared from addresses 2 through 80. Finally, the instruction at address 1 was executed. Remember that these operations were implemented in hardware by boards with discrete transistors; there's no microcode or operating system to help out with these tasks.5

The IBM 1402 card reader/punch. The 1401 computer is in the background (left) and a tape drive is at the right.

The IBM 1402 card reader/punch. The 1401 computer is in the background (left) and a tape drive is at the right.

Bootstrap card 1

The first card contains the machine code:

,008015,022026,030040/019,001L020100   ,047054,061068,072072⌑08108110220001

The first instruction ,008015 is "Set Word Mark", a critical part of the bootstrap sequence. The comma is the op code and the address arguments are "008" and "015". (Since the 1401 is a decimal computer, not binary, the characters "015" are the same as the address 15.) This instruction sets word marks at the specified addresses, 8 and 15.

Remember that an instruction needs to have a word mark on the opcode and a second word mark on the character following the instruction. The "Load" button put a word mark at address 1, but what about the second word mark? It turns out that the hardware has an exception for the "Set Word Mark" instruction 6 allowing it to execute without the second word mark. (This exception is crucial, since otherwise the first instruction can't execute. Was this carefully planned or a hack to make things work? I don't know.)

The word marks that were set by the first instruction let the next two instructions run. They are also "Set Word Mark" instruction, putting word marks at addresses 22, 26, 30, and 40. Note that each "Set Word Mark" instruction sets two word marks but only "uses up" one, so the code is making progress, preparing word marks for future instructions.

Now we come to /019; with the slash opcode indicating the somewhat curious "Clear Storage" instruction. This instruction starts clearing storage at the specified address (19) and proceeds downwards until the address is a multiple of 100. Thus, in this case it will clear from address 19 down to address 0, erasing both characters and word marks. (These locations contained the instructions we just executed.) A location is erased by storing a blank; this may seem like a strange choice, but keep in mind that an empty punch card column is read as a blank. The next Set Word Mark instruction, ,001 puts a word mark back at location 1.

At this point, the contents of memory are as shown below. Word marks are indicated by underlined characters, which is how the IBM documentation indicated word marks.

                   40/019,001L020100   ,047054,061068,072072⌑08108110220001

The next instruction is L020100 "Load Characters to a Word Mark". This instruction copies the character at address 20 (i.e. "4") to address 100. The instruction then continues copying downwards (copying the blanks) until it hits a word mark (which is at address 1). To summarize, addresses 20 through 1 are copied to addresses 100 through 81. Locations 81 through 99 received blanks, while address 100 received a "4". This may seem pointless, but the "4" will turn out to be an important indicator shortly. This instruction also illustrates how word marks allow a long field to be copied with a single instruction.

The next three instructions set word marks at addresses 47, 54, 61, 68, and 72. (The boot code needs to go to a lot of effort to ensure that word marks are set up for future instructions.) The next instruction ⌑081081 has IBM's unusual "lozenge" character as the opcode. This instruction clears the word mark at address 81 (which had been copied from address 1). The final instruction on the card, 1022, reads the next card (opcode 1 is "Read") and then jumps to address 22. A lot has taken place to execute one card, but the next card has some remarkably tricky code.

Bootstrap card 2

After reading the second bootstrap card, memory locations 1 through 80 hold the data:

,008047/047046       /000H025B022100  4/061046,054061,068072,00104010400002

Execution of this card starts at address 22 with the Clear Storage instruction /000. Remember how the Clear Storage instruction proceeds downwards until the address is a multiple of 100? In this case, it will clear address 0 and then immediately stop on address 0 (a multiple of 100). However, a register called the B register will hold the next address (counting downwards), which will wrap from 0 to the top address in memory. For simplicity, I'll assume the code is running on a 1401 model with 1,000 characters of memory so the B register will hold the address 999.7

The next instruction H025 is a tricky bit of self-modifying code. It stores the contents of the B address register into locations 23-25, changing the "Clear Storage" instruction that we just executed to /999. Next, the B022100 4 instruction will branch to address 22 if address 100 holds a "4" (which is true because the first card put a "4" there.)

Back at address 22, the Clear Storage instruction was modified to be /999, so it will now clear addresses 999-900. It is followed by H025, which, as before will store the B register into the Clear Storage instruction. This time it will modify the Clear Storage to start at 899. Finally, the conditional branch loops back to address 22 as before.

The result is that this loop clears memory 100 characters at a time, using self-modifying code to update the position. This loop continues until addresses 100-199 are cleared. At this point, the branch instruction will fail because address 100 holds a blank and not a "4". At this point, the loop has cleared all of storage from 100 to the end of memory, erasing characters as well as any word marks.

The next instruction is Clear Storage /061046 which clears storage from address 46 down to 0 and then branches to 61. At address 61, ,001040 sets word marks at addresses 1 and 40. Finally, 1040 reads the next card and starts execution at address 40. As with the first card, columns 1 through 80 of the card are read into memory addresses 1 through 80.

The program cards

The next phase consists of reading the desired program into memory. A typical card is:

3332200999&2200&0000000100000          L029368,343346,351356,36136410400004

The left part of the card (columns 1-29) contains machine code for the program that we want to run. The right part (columns 41-71) contains the loader code that will execute card-by-card, loading that code into the right part of memory and setting word marks.

The first loader instruction L029368 copies the program code from the card reader buffer into the desired memory locations. Specifically, it will copy starting from address 29 down to the word mark at address 1. These characters will be copied into addresses 368 down to 340. The next instructions set the word marks in this code, at addresses 343, 346, 351, 356, 361, and 364. This answers the question of how the program in memory gets word marks even though punch cards can't explicitly store word marks. Finally, 1040 reads the next card and starts executing it at address 40.

The following cards have the same structure: the program on the left and the loader code on the right. Interestingly, the number of characters of program code is variable because the loader code can set at most 6 word marks per card. In the worst case, all the characters need word marks so only 6 characters can be provided by the card. In the best case, 40 characters can fit on the left side of the card.

The run card

The last card has the Clear Storage instruction /333080. This clears memory from address 80 downwards to 0, wiping out the card buffer and the loader code so the program will start with a clean slate. The Clear Storage instruction then jumps to address 333, starting the execution of the program. After all this work, the computer finally runs the program we wanted to run. While the loading process seems very long when written out, the card reader is fast for an electromechanical device, with over 13 cards per second zipping through it.

The program I used in the example is a Mandelbrot fractal generator that I wrote. The photo below shows the results of the program, which took 12 minutes to execute. I discuss the program in detail in this post.

The IBM 1401 mainframe computer (left) at the Computer History Museum printing the Mandelbrot fractal on the 1403 line printer (right).

The IBM 1401 mainframe computer (left) at the Computer History Museum printing the Mandelbrot fractal on the 1403 line printer (right).

The bootstrap code I described above is just one of the possible bootstrap sequences. Programmers could write their own bootstrap code, trying to make it as short as possible. I described a longer three-card sequence here. The IBM 1401 could also boot from a magnetic tape using a similar process; pressing the "Tape Load" button on the console loaded a record from tape, just like booting from a card.

Console of the IBM 1401 computer. The "Tape Load" button is in the lower right.

Console of the IBM 1401 computer. The "Tape Load" button is in the lower right.

The origins of "bootstrapping"

The term "bootstrap" has an interesting history. It starts with physical boots, which often had boot straps on the top, physical straps to help pull the boots on (as shown below). In the 1800s, the saying "No man can lift himself by his own boot straps" was used as a metaphor for the impossibility of improvement solely through one's own effort. (Pulling on the straps on your boots superficially seems like it should lift you off the ground, but is of course physically impossible.)

Example of a boot strap at the heel of a boot, from patent 41087, not the first boot strap patent.

Example of a boot strap at the heel of a boot, from patent 41087, not the first boot strap patent.

By the mid-1940s, "bootstrap" was used in electronics to describe a circuit that started itself up through positive feedback, metaphorically pulling itself up by its boot straps. (See usages from 1943, 1944, and 1946). By 1952, analog computers used circuits called "bootstrap integrators".

When a digital computer loaded its program through its own efforts, this took on the name "bootstrap", dating back to the 1950s. (Using a program to load a program seems as paradoxical as lifting yourself up by your bootstraps, but fortunately it works.) A 1954 glossary defined "bootstrap" as "The coded instructions at the beginning of an input tape, together with one or two instructions inserted by switches or buttons into the computer, used to put a routine into the computer." A 1955 computer survey published by the Department of Commerce had a similar definition.

Conclusion

Bootstrapping the IBM 1401 was complicated, and the process has become even more complex in later computers. In the 1960s, computers such as the IBM System/360 had bootstrap microcode stored in read-only storage. This code could load a chain of bootstrap programs, first a 64-byte bootstrap card, which would then load a 4-kilobyte bootstrap program, which could then load the disk operating system. Some early minicomputers and microcomputers lacked ROM and took a step backward, requiring the user to tediously toggle in boot code through switches on the front panel.

Modern computers go through a much more complex bootstrap process. The initial boot code for an x86 system is stored in ROM, and booting happens through the BIOS in older computers or UEFI in more modern systems. The system starts in a primitive state without caches or virtual memory, running a single core in "8086 real mode". The boot code sets up the system and loads a bootloader program, which may then load another bootloader, which loads the kernel, which starts up the computer's various processes. Details are in this presentation.

Studying the 1401's machine code shows many of its unusual characteristics compared to modern computers and the strangeness of its instruction set. Needing to deal with word marks is the most obvious difference, with special instructions to set and erase them. From a modern perspective, it's unusual to see a computer that doesn't use bytes, although that was common back then. The use of decimal arithmetic and decimal addressing also seems strange from the modern perspective. Another curiosity is self-modifying code. Although self-modifying code is discouraged nowadays, it was common on the 1401 (as with other computers of that era).

I announce my latest blog posts on Twitter, so follow me @kenshirriff. I also have an RSS feed.

Notes and references

  1. While punch cards almost always held character data, an optional feature called "column binary" allowed binary data to be punched onto cards, 12 bits in each column. IBM charged $101 a month (in 1960s dollars) for the column binary feature. 

  2. The need to fit all the data into 80 columns was one of the factors that led to the Y2K problem. If you used four columns on a card to hold the year instead of two, you'd need to give up two precious columns somewhere else. 

  3. The punch card below is an example of a card custom-printed for a customer application. This card was used for payroll at the Phoenix Steel Corporation.

    A punch card designed for a steel mill.

    A punch card designed for a steel mill.

     

  4. The operation of the load key is specified in the 1401 reference manual (p118): "This key is used to start loading instruction cards. Pressing the load key operates the read feed until a card has passed the read station. The I -address register is set to 001, and a word mark is set in address 001. All other word marks in addresses 002 through 080 are removed." The instruction in the first columns is executed, and then continued operation is controlled by the first instruction. 

  5. How does the first word mark get set when you load the first card? I looked at the documentation of the circuitry and found the relevant flip-flop (below). It is set by the load button, sets the first word mark (WM), and then is cleared.

    The flip-flop to set word marks. From the 1401 logic diagrams, figure 81.

    The flip-flop to set word marks. From the 1401 logic diagrams, figure 81.

    The photo below shows the card that implements this flip-flop. With the 1401, you can actually see the physical transistors that implement each function.

    A flip-flop card, type "CW".

    A flip-flop card, type "CW".

     

  6. Instructions need to be indicated with word marks with a few specific exceptions. As documented in the 1401 reference manual (p15) "The 4-character unconditional branch instruction, the 7-character set word mark, and clear storage and branch instructions are the only instructions that can be followed by a blank without a word mark. All other instructions must be followed by a word mark." 

  7. The 1401 computer that I used has 16,000 characters of memory (not 16,384 because it's a decimal machine!) so after the Clear Storage instruction, the B register will hold 15,999, pointing to the top of memory. You might wonder how the address 15,999 is represented in three decimal characters. The trick is that a special address code uses the top two bits of the characters to hold the kilobyte part of the address. The resulting address is 999 with the top two bits of the hundreds and units characters set. The result is the three-character alphanumeric address I9I represents the address 15,999. 

  8. If you had the misfortune to drop your cards, a card sorter could put them back in order using the sequence numbers. A card sorter rapidly sorted cards into slots based on the digit punched in one column. By running the cards through several times, you could sort on the complete sequence number. I discuss card sorters in great detail here. (A low-tech way to keep cards in order was to draw a diagonal line across the top of the cards; it helped when putting cards back in order manually.)

    An IBM Type 83 card sorter. Cards enter the machine on the right, whiz along the top of the machine, and fall into the appropriate hopper underneath.

    An IBM Type 83 card sorter. Cards enter the machine on the right, whiz along the top of the machine, and fall into the appropriate hopper underneath.

    The use of sequence numbers in columns 73-80 goes back to the Fortran language. Fortran was developed for the IBM 704 vacuum tube computer. The 704 was a 36-bit machine. The punch-card reading process used two 36-bit words, so only 72 columns could be read. (These could be any 72 columns of the card, selected by a wiring panel, but typically columns 1-72 were used.) The result was that columns 1-72 were used for code (a restriction still often used), while columns 73-80 were free for sequence numbers. 

Teardown of a quartz crystal oscillator and the tiny IC inside

The quartz oscillator is an important electronic circuit, providing highly-accurate timing signals at a low cost. A quartz crystal has the special property of piezoelectricity, changing its electrical properties as it vibrates. Since a crystal can be cut to vibrate at a very precise frequency, quartz oscillators are useful for many applications. Quartz oscillators were introduced in the 1920s and provided accurate frequencies for radio stations. Wristwatches were revolutionized in the 1970s by the use of highly-accurate quartz oscillators. Computers use quartz oscillators to generate their clock signals, from ENIAC in the 1940s to modern computers.1

A quartz crystal requires additional circuitry to make it oscillate, and this analog circuitry can be tricky to design. In the 1970s, crystal oscillator modules became popular, combining the quartz crystal, an integrated circuit, and discrete components into a compact, easy-to-use module. Curious about the contents of these modules, I opened one up and reverse-engineered the chip inside. In this blog post, I discuss how the module works and examine the tiny CMOS integrated circuit that runs the oscillator. There's more happening in the module than I expected, so I hope you find it interesting.

The oscillator module

I examined the oscillator module from an IBM PC card.2 The module is packaged in a rectangular 4-pin metal can that protects the circuitry from electrical noise. (It is the "Rasco Plus" rectangular can on the right, not the square IBM integrated circuit.) This module produced a 4.7174 MHz clock signal, as indicated by the text on the package.

The quartz oscillator module is in the lower right, labeled Rasco Plus. 4.7174 MHZ, © Motorola 1987. The square module is an IBM integrated circuit. Click this (or any other image) for a larger version.

The quartz oscillator module is in the lower right, labeled Rasco Plus. 4.7174 MHZ, © Motorola 1987. The square module is an IBM integrated circuit. Click this (or any other image) for a larger version.

I cut open the can to reveal the hybrid circuitry inside. I was expecting a gem-like quartz crystal inside, but found that oscillators use a very thin disk of quartz. (I damaged the crystal while opening the package, so the upper part is missing..) The quartz crystal is visible on the left, with metal electrodes attached to either side of the crystal. The electrodes are attached to small pegs, raising the crystal above the surface so it can oscillate freely.

Inside the oscillator package, showing the components mounted on the ceramic substrate.

Inside the oscillator package, showing the components mounted on the ceramic substrate.

On the right side of the module is a tiny CMOS integrated circuit die. It is mounted on the ceramic substrate and connected to the circuitry by tiny golden bond wires. A surface-mount capacitor (3 nF) and a film resistor (10Ω) on the substrate filter out noise from the power pin.

The IC's circuitry

The photo below shows the tiny integrated circuit die under a microscope, with the pads and main functional blocks labeled. The brownish-green regions are the silicon that forms the integrated circuit. A metal layer (yellowish white) wires up the components of the IC. Below the metal, reddish polysilicon implements transistors, but it is mostly obscured by the metal layer. Around the outside of the chip, bond wires are connected to pads, wiring the chip to the rest of the oscillator module. Two pads (select and disable) are left unconnected. The chip was manufactured by Motorola, with a 1986 date. I couldn't find any information on the part number SC380003.

The integrated circuit die with key blocks labeled. "FF" indicates flip-flops. "sel" indicates select pads. "cap" indicates pads connected to the internal capacitors.

The integrated circuit die with key blocks labeled. "FF" indicates flip-flops. "sel" indicates select pads. "cap" indicates pads connected to the internal capacitors.

The IC has two functions. First, its analog circuitry drives the quartz crystal to produce oscillations. Second, the IC's digital circuitry divides the frequency by 1, 2, 4, or 8, and produces a high-current clock output signal. (The division factor is selected by the two select pins on the IC.)

The oscillator is implemented with a circuit (below) called a Colpitts oscillator, which is more complex than the usual quartz oscillator circuit.43 The basic idea is that the crystal and the two capacitors oscillate at the desired frequency. The oscillations would rapidly die out, however, except for the feedback boost from the drive transistor.

Simplified schematic of the oscillator.

Simplified schematic of the oscillator.

In more detail, as the voltage across the crystal increases, the transistor turns on, feeding current into the capacitors and boosting the voltage across the capacitors (and thus the crystal). But as the voltage across the crystal decreases, the transistor turns off and the current sink (circle with arrow) pulls current out of the capacitors, reducing the voltage across the crystal. Thus, the feedback from the drive transistor strengthens the crystal's oscillations to keep them going.

The bias voltage and current circuits are an important part of this circuit. The bias voltage sets the drive transistor's gate midway between "on" and "off", so the voltage oscillations on the crystal will turn it on and off. The bias current is set midway between the drive transistor's on and off currents so the current flowing in and out of the capacitors balances out.5 (I'm saying "on" and "off" for simplicity; the signal will be a sine wave.)

A large part of the integrated circuit is occupied by five capacitors. One is the upper capacitor in the schematic, three are paralleled to form the lower capacitor in the schematic, and one stabilizes the voltage bias circuit. The die photo below shows one of the capacitors after dissolving the metal layer on top. The red and green region is polysilicon, which forms the upper plate of the capacitor, along with the metal layer. Underneath the polysilicon, the pinkish region is probably silicon nitride, forming the insulating dielectric layer. The doped silicon (not visible underneath) forms the bottom plate of the capacitor.

A capacitor on the die. The large faint square to the left of the capacitor is a pad for connecting a bond wire to the IC.
The complex structures on the left are clamp diodes on the pins. The cloverleaf structures on the right are transistors, which will
be discussed later.

A capacitor on the die. The large faint square to the left of the capacitor is a pad for connecting a bond wire to the IC. The complex structures on the left are clamp diodes on the pins. The cloverleaf structures on the right are transistors, which will be discussed later.

Curiously, the capacitors aren't connected together on the chip, but are connected to three pads that are wired together by bond wires. Perhaps this provides flexibility; the capacitance in the circuit can be modified by omitting the wire to a capacitor.

The digital circuitry

The right side of the chip contains digital circuitry to divide the crystal's output frequency by 1, 2, 4, or 8. This lets the same crystal provide four different frequencies. The divider is implemented by three flip-flops in series. Each one divides its input pulses by 2. A 4-to-1 multiplexer selects between the original clock pulses, or the output from one of the flip-flops. The choice is made through the wiring to the two select pads on the right side of the die, fixing the ratio at manufacturing time. Four NAND gates (along with inverters) are used to decode these pins and generate four control signals to the multiplexer and flip-flops.

How CMOS logic is implemented

The chip is built with CMOS logic (complementary MOS), which uses two types of transistors, NMOS and PMOS, working together. The diagram below shows how an NMOS transistor is constructed. The transistor can be considered a switch between the source and drain, controlled by the gate. The source and drain (green) consist of regions of silicon doped with impurities to change its semiconductor properties and called N+ silicon. The gate consists of a special type of silicon called polysilicon, separated from the underlying silicon by a very thin insulating oxide layer. The NMOS transistor turns on when the gate is pulled high.

Structure of an NMOS transistor. A PMOS transistor has the same structure, but with N-type and P-type silicon reversed.

Structure of an NMOS transistor. A PMOS transistor has the same structure, but with N-type and P-type silicon reversed.

A PMOS transistor has the opposite construction from NMOS: the source and drain consist of P+ silicon embedded in N silicon. The operation of a PMOS transistor is also opposite from the NMOS transistor: it turns on when the gate is pulled low. Typically PMOS transistors pull the drain (output) high, while NMOS transistors pull the drain low. In CMOS, the transistors act in a complementary fashion, pulling the output high or low as needed.

The diagram below shows how a NAND gate is implemented in CMOS. If an input is 0, the corresponding PMOS transistor (top) will turn on and pull the output high. But if both inputs are 1, the NMOS transistors (bottom) will turn on and pull the output low. Thus, the circuit implements the NAND function.

A CMOS NAND gate is implemented with two PMOS transistors (top) and two NMOS transistors (bottom).

A CMOS NAND gate is implemented with two PMOS transistors (top) and two NMOS transistors (bottom).

The diagram below shows how a NAND gate appears on the die. The transistors have complex, meandering shapes, unlike the rectangular layouts that appear in textbooks. The left side holds the PMOS transistors, while the right side holds the NMOS transistors. The polysilicon that forms the gates is the slightly reddish wiring on top of the silicon. Most of the underlying silicon is doped, making it conductive and slightly darker than the non-conductive undoped silicon along the left and right edges and in the center. For this photo, the metal layer was removed with acid to reveal the silicon and polysilicon underneath; the yellow line illustrates where some of the metal wiring was. The circles are connections between the metal layer and the underlying silicon or polysilicon.

A NAND gate as it appears on the die.

A NAND gate as it appears on the die.

The transistors in the die photo can be matched up with the NAND-gate schematic; look at the transistor gates formed by polysilicon and what they separate. There is a path from the +5 region to the output through the large elongated PMOS transistor on the left, and a second path through the small PMOS transistor near the center, indicating the transistors are in parallel. Each gate is controlled by one of the inputs. On the right, a path from ground to the output connection must go through both of the concentric NMOS transistors, indicating they are in series.

This integrated circuit also uses many circle-gate transistors, an unusual layout technique that allows multiple transistors in parallel at high density. The photo below shows 16 circle-gate transistors. The copper-colored cloverleaf patterns are the transistor gates, implemented with polysilicon. The inside of each "leaf" is the transistor drain, while the outside is the source. The metal layer (removed) wires all the sources, gates, and drains together respectively; the parallel transistors act as one larger transistor. Paralleled transistors are used in the output pin drivers to provide high current for the output. In the bias circuitry, different numbers of transistors are wired together (e.g. 6, 16, or 40) to provide the desired current ratios.

Sixteen circle-gate transistors with four gate connections.

Sixteen circle-gate transistors with four gate connections.

Transmission gate

Another key circuit in the chip is the transmission gate. This acts as a switch, either passing a signal through or blocking it. The schematic below shows how a transmission gate is constructed from two transistors, an NMOS transistor and a PMOS transistor. If the enable line is high, both transistors turn on, passing the input signal to the output. If the enable line is low, both transistors turn off, blocking the input signal. The schematic symbol for a transmission gate is shown on the right.

A transmission gate is constructed from two transistors. The transistors and their gates are indicated. The schematic symbol is on the right.

A transmission gate is constructed from two transistors. The transistors and their gates are indicated. The schematic symbol is on the right.

Multiplexer

A multiplexer is used to select one of the four clock signals. The diagram below shows how the multiplexer is implemented from transmission gates. The multiplexer takes four inputs: A, B, C, and D. One of the inputs is selected by activating the corresponding select line and its complement. That input is connected through the transmission gate to the output, while the other inputs are blocked. Although a multiplexer can be built with standard logic gates, the implementation with transmission gates is more efficient.

The 4-to-1 multiplexer is implemented with transmission gates.

The 4-to-1 multiplexer is implemented with transmission gates.

The schematic below shows the transistors that make up the multiplexer. Note that inputs B and C have pairs of transistors. I believe the motivation is that a pair of transistors presents half the resistance to the signal. Since inputs B and C are the higher-frequency signals, the pair of transistors allows them to pass through with less distortion and delay.

Schematic of the multiplexer, matching the physical layout on the chip.

Schematic of the multiplexer, matching the physical layout on the chip.

The image below shows how the multiplexer is physically implemented on the die. The polysilicon gate wiring is most prominent. The metal layer has been removed; the metal lines ran vertically connecting corresponding transistors segments. Note that the sources and drains of neighboring transistors are merged into single regions between the gates. The top rectangle holds the NMOS transistors while the lower rectangle holds the PMOS transistors; because PMOS transistors are less efficient, the lower rectangle needs to be larger.

Die photo of the multiplexer.

Die photo of the multiplexer.

Flip-flop

The chip contains three-flip-flops to divide the clock frequency. The oscillator uses toggle flip-flops, that flip between 0 and 1 each time they receive an input pulse. Since two input pulses result in one output pulse (0→1→0), the flip-flop divides the frequency by 2.

A flip-flop is constructed from transmission gates, inverters, and a NAND gate, as shown in the schematic below. When the input clock is high, the output passes through the inverter and the first transmission gate to point A. When the input clock switches low, the first transmission gate opens, so point A holds its previous value. Meanwhile, the second transmission gate closes, so the signal passes through the second inverter and transmission gate to point B. The NAND gate inverts it again, causing the output to flip from its previous value. A second cycle of the input clock repeats the process, causing the output to return to its initial value. The result is that two cycles of the input clock result in one cycle of the output, so the flip-flop divides the frequency by 2.

Implementation of a toggle flip-flop.

Implementation of a toggle flip-flop.

Each flip-flop has an enable input. If a flip-flop is not needed for the selected output, it is disabled. For instance, if the "divide by 2" mode is selected, only the first flip-flop is used, and the other two are disabled. I assume this is done to reduce power consumption. Note that this is independent from the module's disable pin, which blocks the module output entirely. This disable feature is optional; this particular module does not provide the disable feature and the disable pin is not wired to the IC.

The schematic above shows the inverters and transmission gates as separate structures. However, the flip-flop uses an interesting gate structure that combines the inverter and the transmission gate (left) into a single gate (right). The pair of transistors connected to data in function as an inverter. However, if the clock in is low, both power and ground are blocked so the gate will not affect the output and it will hold its previous voltage. This provides the transmission gate functionality.

Implementation of a combination inverter / transmission gate.

Implementation of a combination inverter / transmission gate.

The photo below shows how one of these gates appears on the die. This photo includes the metal layer on top; the reddish polysilicon gates are visible underneath. The two PMOS transistors are on the left, as concentric loops, while the NMOS transistors are on the right.

One of the combination inverter / transmission gates, as it appears on the die.

One of the combination inverter / transmission gates, as it appears on the die.

Conclusion

While the oscillator module looks simple from the outside, on the inside there's a lot more complexity than you might expect.6 It contains not just a quartz crystal but also discrete components and a tiny integrated circuit. The integrated circuit combines capacitors, analog circuitry to drive the oscillations, and digital circuitry to choose a frequency. By changing the wiring to the integrated circuit during manufacturing, four different frequencies can be selected.

I'll end with the die photo below showing the chip after removing the metal and oxide layers, showing the silicon and polysilicon underneath. The large pinkish capacitors are the most visible feature in this image, but the transistors can also be seen. (Click the image for a larger version.)

Die photo of the oscillator chip with metal removed to show the polysilicon and silicon underneath.

Die photo of the oscillator chip with metal removed to show the polysilicon and silicon underneath.

I announce my latest blog posts on Twitter, so follow me at kenshirriff. I also have an RSS feed.

Notes and references

  1. Modern PCs use quartz crystals, but with a more complex technique to get multi-gigahertz clock frequencies. A PC uses a crystal with a much lower frequency, and multiplies the frequency using a circuit called a phase-locked loop. Computers often used a 14.318 MHz crystal because that frequency was used in old television sets, so crystals with that frequency were common and cheap. 

  2. Why does the board use a 4.7174 MHz crystal, a somewhat unusual frequency? In the 1970s, the IBM 3270 was a very popular CRT terminal. These terminals were connected with coaxial cable and used the Interface Display System Standard protocol with a 2.3587 MHz bit rate. In the late 1980s, IBM produced interface cards to connect an IBM PC to a 3270 network. I obtained the crystal from one of these interface cards (type 56X4927), and the crystal frequency of 4.7174 MHz is exactly twice the 2.3587 MHz bit rate. 

  3. The terminology used for crystal oscillators is confusing with "Colpitts oscillator" and "Pierce oscillator" used in contradictory ways. I looked into the history of oscillators to try to sort out the naming, and I'll discuss it in this footnote.

    In 1918, Edwin Colpitts, the head researcher at Western Electric, invented an inductor/capacitor oscillator, now known as the Colpitts Oscillator. The idea is that the inductor and capacitors form a "resonant tank", which oscillates at a frequency set by the component values. (You can think of the electricity in the tank as sloshing back and forth between the inductor and the capacitors.) On their own, the oscillations would rapidly die out, so an amplifier is used to boost the oscillators. In the original Colpitts oscillator, the amplifier was a vacuum tube. Later circuits moved to transistors, but it can also be an op-amp or other type of amplifier. (Other circuits, such as the module I examined, ground an end and provide feedback to the middle. In that case, there is no inversion from the capacitors, so a non-inverting amplifier is used.)

    A simplified schematic of a Colpitts oscillator, showing the basic components.

    A simplified schematic of a Colpitts oscillator, showing the basic components.

    The key feature of the Colpitts oscillator is the two capacitors, which form a voltage divider. Since the capacitors are grounded in the middle, the two ends will have opposite voltages: when one end goes up, the other goes down. The amplifier takes the signal from one end, amplifies it, and feeds it into the other end. The amplifier inverts the signal and the capacitors provide a second inversion, so the feedback strengthens the original signal (i.e. it has a phase shift of 360°).

    In 1923, George Washington Pierce, a professor of physics at Harvard, replaced the inductor in the Colpitts oscillator with a crystal. The crystal made the oscillator much more accurate (higher Q factor), leading to its heavy use in radio transmission and other applications. Pierce patented his invention and made a lot of money off it from companies such as RCA and AT&T. The patents led to years of litigation, eventually reaching the Supreme Court. (For more information, see this thesis on crystal history.)

    For several decades, the common terminology was that a Pierce oscillator was a Colpitts oscillator that used a crystal. (See Air Force Manual, 1957 and Navy training, 1983 for instance.) The Pierce oscillator often omitted the characteristic voltage-divider capacitors, using the stray capacitance of the vacuum tube instead. But then terminology shifted, with "Colpitts oscillator" and "Pierce oscillator" indicating two different types of crystal oscillator: Colpitts with the capacitors and Pierce without the capacitors. (See, for example, the classic electronics text Horowitz and Hill.)

    Another change in terminology was to describe the Colpitts oscillator, Pierce oscillator, and Clapp oscillator as topologically identical crystal oscillators, just differing in what point in the circuit was considered AC ground (the collector, emitter, or base respectively). (See Frerking's Crystal Oscillator Design and Temperature Compensation (1978, p56) or Maxim's crystal oscillator tutorial.) Alternatively, these oscillators can all be called Colpitts, but common-collector, common-emitter, or common-base (details).

    The point of this history is that oscillator terminology is confusing, with different sources calling oscillators Colpitts or Pierce in contradictory ways. Getting back to the oscillator module I examined, it could be described as a common-drain Colpitts oscillator (analogous to common-collector). It would also be called a Colpitts oscillator using the terminology based on the ground position. Historically, it would be called a Pierce oscillator since it uses a crystal. It's also called a single-pin crystal oscillator since only one pin of the crystal is connected to the circuitry (and the other is grounded). 

  4. The typical quartz oscillator is built using a simple circuit called the Pierce-gate oscillator, where the crystal forms a feedback loop with an inverter. (The two capacitors grounded in the middle make this very similar to the classical Colpitts oscillator.)

    The Pierce oscillator circuit commonly used as a computer clock. Diagram by Omegatron, CC BY-SA 3.0.

    The Pierce oscillator circuit commonly used as a computer clock. Diagram by Omegatron, CC BY-SA 3.0.

    I'm not sure why the module I disassembled uses a more complex oscillator circuit that requires tricky biasing. 

  5. The voltage bias and current bias circuits are moderately complex analog circuits built with a bunch of transistors and a few resistors. I won't describe them in detail, but they use feedback loops to generate the desired fixed voltage and current. 

  6. If you want to learn more about quartz oscillators, there are interesting videos at EEVblog, electronupdate, and WizardTim. Colpitts oscillators are explained in videos at Hackaday

A one-bit processor explained: reverse-engineering the vintage MC14500B

The Motorola MC14500B1 is a 1-bit processor introduced in 1976. While a 1-bit processor might seem almost useless,2 it was marketed as an Industrial Control Unit for applications that made simple decisions based on Boolean logic, for example, air conditioning, motor control, or traffic lights.

The die photo below shows the processor under a microscope. This silicon appears greenish, while the white lines on top are the metal layer that wires the transistors together. The 16 black spots around the edges are the bond wires that connect the chip to its 16 external pins. The MC14500B has roughly 500 transistors, very few for a microprocessor. In comparison, the popular 8-bit Z-80 microprocessor, also released in 1976, had 8500 transistors. Even the first microprocessor, the 4-bit Intel 4004 (1971), contained 2250 transistors.

The die of the MC14500B with functional blocks labeled. The pins are labeled around the outside. Die photo from
siliconpr0n
(CC BY 4.0).

The die of the MC14500B with functional blocks labeled. The pins are labeled around the outside. Die photo from siliconpr0n (CC BY 4.0).

You might think that a 1-bit processor would only support two instructions, making it impractical. However, like many processors, the MC14500B uses different sizes for data and instructions. Although it used one bit for data, its instructions were 4 bits, giving it a small but usable instruction set of 16 instructions.3

The MC14500B has an unusual architecture, making it more of a building block than a complete microprocessor. In particular, the chip doesn't include any support for memory or addresses; it didn't even have a program counter. The program counter, instruction fetches, jumps, subroutine calls, and I/O needed to be implemented with external circuitry.4 This is a key reason that the chip was so simple. (The other reason, of course, was that it only supported one bit.)

Since the MC14500B was designed for industrial control applications, you'd expect it to be a microcontroller, but it's the opposite of a microcontroller in many ways. A typical microcontroller is a computer-on-a-chip including RAM and ROM, with strong I/O support, providing a single-chip solution. The MC14500B, however, requires multiple external chips to make it usable.

The MC14500B comes in a 16-pin DIP integrated circuit, much smaller than the 40-pin packages commonly used for microprocessors at the time. The "CP" suffix indicates a plastic package. Photo from siliconpr0n
(CC BY 4.0).

The MC14500B comes in a 16-pin DIP integrated circuit, much smaller than the 40-pin packages commonly used for microprocessors at the time. The "CP" suffix indicates a plastic package. Photo from siliconpr0n (CC BY 4.0).

The block diagram below shows the internal structure of the chip. The Data pin in the upper left provides the single-bit I/O line. It feeds into the Logical Unit (LU), which implements 1-bit Boolean logic functions such as AND and OR. The result is stored in the Result Register (RR), the chip's main storage register. The chip has an on-board oscillator OSC that uses an external resistor to control the clock speed. (The chip runs at up to 1 megahertz, faster than I expected.) The Instruction Register stores the 4-bit instruction; the circuitry to decode an instruction occupies the majority of the chip. The JMP, RTN, FLAG O, and FLAG F pins are activated by the corresponding instructions, but the functionality must be implemented externally. Note the lack of a program counter or address pins.

Block diagram of the MC14500B. From the datasheet.

Block diagram of the MC14500B. From the datasheet.

The motivation for making such a stripped-down processor was to provide a low-cost alternative for applications that didn't require a full microprocessor. In 1977, the MC14500B cost $7.58 in quantities of 100 ($32 in current dollars), which seems expensive. However, at the time, an 8080A CPU cost $20 and a Z80 cost $50 ($85 and $215 in current dollars) so there was a significant cost saving to the MC14500B.5 However, the steady fall of processor prices soon made the MC14500B less attractive.

How CMOS logic is implemented

The chip was one of the first processors built from CMOS circuitry,6 a low-power logic family now used in almost all processors. CMOS (complementary MOS) circuitry uses two types of transistors, NMOS and PMOS, working together. The diagram below shows how a PMOS transistor is constructed. The transistor can be considered a switch between the source and drain, controlled by the gate. The source and drain (green) consist of regions of silicon doped with impurities to change its semiconductor properties and called P+ silicon. The gate consists of an aluminum layer, separated from the silicon by a very thin insulating oxide layer.7 (These three layers—Metal, Oxide, Semiconductor—give the MOS transistor its name.) The PMOS transistor turns on when the gate is pulled low.

Structure of a PMOS transistor. An NMOS transistor has the same structure, but with N-type and P-type silicon reversed.

Structure of a PMOS transistor. An NMOS transistor has the same structure, but with N-type and P-type silicon reversed.

An NMOS transistor has the opposite construction from PMOS: the source and drain consist of N+ silicon embedded in P silicon. The operation of an NMOS transistor is also opposite from the PMOS transistor: it turns on when the gate is pulled high. Typically PMOS transistors pull the drain (output) high, while NMOS transistors pull the drain low. In CMOS, the transistors act in complementary fashion, pulling the output high or low as needed.

Because the NMOS transistor is built in P silicon, but the silicon die itself is N silicon, the NMOS transistors are surrounded by a tub or well of P silicon. The cross-section diagram below shows how the NMOS transistor on the right is embedded in the well of P-type silicon. The NMOS and PMOS transistors both require a bias voltage connection to the underlying silicon substrate to block signals from escaping from the transistors.8 These bias connections can be seen scattered across the chip.

Cross-section of CMOS transistors.

Cross-section of CMOS transistors.

The basic CMOS gate is an inverter, shown below. It is constructed from a PMOS transistor and an NMOS transistor acting in opposite (i.e. complementary) fashion. When the input is low, the PMOS transistor (top) turns on, pulling the output high. When the input is high, the NMOS transistor (bottom) turns on, pulling the output low.

A CMOS inverter is constructed from a PMOS transistor (top) and an NMOS transistor (bottom).

A CMOS inverter is constructed from a PMOS transistor (top) and an NMOS transistor (bottom).

The diagram below shows how an inverter, outlined in red, appears on the die. Note that a single inverter takes a visible part of the die. The next image zooms in on the inverter; the metal wiring is visible as the white lines, while the silicon is mostly obscured. The third image shows the silicon layer after removing the metal with acid. Note how the metal gate lines up with silicon underneath. The circular contacts or vias connect the metal layer to the silicon.

How an inverter appears on the die. The middle image shows the metal layer. The metal was removed for the last image to show the underlying silicon.

How an inverter appears on the die. The middle image shows the metal layer. The metal was removed for the last image to show the underlying silicon.

The inverter consists of a PMOS transistor on top and an NMOS transistor below, connected together as described in the schematic earlier. In the diagram below, the silicon regions have been colored to show how they form transistors. Note that the source and drain aren't necessarily discrete, but can merge with neighboring transistors.

The inverter consists of silicon regions doped to form PMOS and NMOS transistors.

The inverter consists of silicon regions doped to form PMOS and NMOS transistors.

Other logic gates are constructed using the same concepts as the inverter, but with additional transistors. In a NAND gate, the PMOS transistors on top are in parallel, so the output will be pulled high if either input is 0. The NMOS transistors on the bottom are in series, so the output will be pulled low if both inputs are 1. Thus, the circuit implements the NAND function. (Note how the PMOS and NMOS transistors act in complementary fashion.) The NOR gate is implemented similarly, swapping the series and parallel transistors. The chip also uses more complex gates, discussed in the footnote.9

A NAND gate and a NOR gate are constructed in CMOS by putting transistors in series and parallel.

A NAND gate and a NOR gate are constructed in CMOS by putting transistors in series and parallel.

Transmission gate

Another key circuit in the processor is the transmission gate. This acts as a switch, either passing a signal through or blocking it. The schematic below shows how a transmission gate is constructed from two transistors, an NMOS transistor and a PMOS transistor. If the enable line is high, both transistors turn on, passing the input signal to the output. If the enable line is low, both transistors turn off, blocking the input signal. The schematic symbol for a transmission gate is shown on the right.

A transmission gate is constructed from two transistors. The transistors and their gates are indicated. The schematic symbol is on the right.

A transmission gate is constructed from two transistors. The transistors and their gates are indicated. The schematic symbol is on the right.

The photo below shows how a transmission gate appears on the die. This photo shows the metal layer, so the underlying silicon is difficult to see. The two transistors are outlined. Note that an inverter has the same input to both gates, so one transistor turns on at a time. In the transmission gate, however, the gates have opposite inputs, so the transistors turn on or off together.

A transmission gate as it appears on the die.

A transmission gate as it appears on the die.

Flip-flop

By combining inverters and transmission gates, an important circuit called the flip-flop is constructed. A flip-flop stores one bit, controlled by a clock signal. The flip-flops have a key role in the chip as they keep the processor synchronized to the clock.

A flip-flop is based on a latch built from two inverters, below By connecting two inverters in a loop, the circuit can store either a 0 or a 1. If the input to an inverter is a 1, it outputs a 0; this causes the other inverter to output a 1, feeding back to the first inverter. Thus, the circuit is stable in either the 0 or 1 state.

Two cross-coupled inverters can store a 0 or a 1.

Two cross-coupled inverters can store a 0 or a 1.

The circuit above requires two changes to form a useful flip-flop. First, it requires a way of storing a value in the latch. This is solved in a brute-force way. One of the inverters uses a weak, low-current transistor.10 An input signal can override this signal, forcing the inverters into the desired state. This input is controlled by a transmission gate: when the gate is active, the input signal is stored in the inverter latch. When the transmission gate is inactive, the inverter latch loop holds the value.

A flip-flop is constructed from two latches separated by transmission gates.

A flip-flop is constructed from two latches separated by transmission gates.

The second change is that two inverter latches are used. The first is controlled by the clock, while the second is controlled by the inverted clock. While the clock is high, the input value passes into the first inverter latch. But when the clock goes low, the transmission gates switch state: the first transmission gate blocks any additional changes, while the second transmission gate passes the value from the first inverter latch to the second latch and thus the output. In effect, the flip-flop grabs the input value when the clock switches low, and holds this output until the next time the clock switches low.

The diagram below shows one of the flip-flops in detail (specifically the instruction register bit I3). It consists of four inverters and two transmission gates, as described earlier but arranged top-to-bottom. The left half consists of the well of P-type silicon for the NMOS transistors, while the right half holds the NMOS transistors. As a result, the inverters and transmission gates have one transistor on each side. This forces some of the gates to have their two transistors widely separated, as seen below.

Implementation of a flip-flop, as seen on the die. This photo shows the metal layer.

Implementation of a flip-flop, as seen on the die. This photo shows the metal layer.

The diagram below shows the locations of the chip's flip-flops. Each flip-flop takes up a substantial part of the chip, despite being the simple circuit described above. This should give you an idea of the small amount of circuitry in the chip. On the left, the 4-bit instruction register consists of four flip-flops. The IEN and OEN registers, as well as two flip-flops to control write operations are on the right. At the bottom, six flip-flops hold the values for the Flag O, RTN, Flag F, and JMP pins, as well as buffering the data mux and holding the instruction skip state. The Result Register (RR) in the lower right is a more complex latch; it has circuitry to hold its existing value as well as a reset circuit.

Locations of the flip-flops on the die.

Locations of the flip-flops on the die.

The logic unit

The logic unit (LU) performs 1-bit operations.11 It takes a bit from the data pin, a bit from the RR register, and stores the result in the RR. It implements seven functions: load a bit into the RR, load the complement into the RR, logical AND, logical AND complement, logical OR, logical OR complement, and exclusive NOR. The table below summarizes these operations. Each column shows the results of an operation, for the four possible input combinations.

The logic unit implements seven different operations, using the RR bit and data pin bit as inputs.

The logic unit implements seven different operations, using the RR bit and data pin bit as inputs.

The colored rectangles indicate how the logic unit is implemented internally. A green rectangle indicates the data value is copied to the output. An orange rectangle indicates the complement of the data value is copied to the output. A blue rectangle indicates that the RR register is OR'd into the result.

The diagram below shows the three complex logic gates that implement this table. (These are AND-OR-INVERT and OR-AND-INVERT gates, but I've removed the inverters to simplify the explanation.) The appropriate inputs (i.e. colors) are selected based on the instruction and the value of the RR register. The upper-left gate is active for the combinations of instruction and RR value that use the inverted data value (i.e. orange). The lower-left gate is active for the combinations that use data (i.e. green). The right gate selects inverted data, RR, and data as appropriate, and ORs them together to form the final result, stored in the RR register.

The implementation of the logic unit. Note that the colors are associated with conceptual paths, not separate gates.

The implementation of the logic unit. Note that the colors are associated with conceptual paths, not separate gates.

I've looked at a lot of Arithmetic-Logic Units (ALUs) before, and this implementation is rather unusual. The main factor is that it doesn't perform arithmetic operations, so it's not dealing with sums, differences, carry-in, and carry-out. It's also one bit wide, rather than 8 or 16 bits. Due to these factors, it's implemented with the gates shown above, rather than a more typical combination of adders. Another interesting thing about the implementation is that the logic unit's circuitry is mixed in with the instruction decoding circuitry, rather than physically separating the two, as in most processors. (See the die photo at the top of the article.)

Instruction decoding

Much of the chip is devoted to instruction decoding, converting a 4-bit opcode into an instruction signal. Although many microprocessors, such as the 6502, use a Programmable Logic Array (PLA) for instruction decoding, the MC14500B doesn't use anything structured like that. Instead, it just has a bunch of gates. First, it decodes pairs of instruction bits (bit 0 with bit 1, and bit 2 with bit 3) into their combinations. Then it combines these signals for the full decoding.

For instance, one signal is generated if instruction bits I3 and I2 are high, by NOR of I3' and I2'. Another signal is generated if I1 is high and I0 is low, by NOR of I1' and I0. Combining these two signals with a NAND gate generates a signal that is low for the SKZ (skip on zero) instruction which has the opcode 1110. This signal is fed into the instruction skip circuitry to implement the instruction. The other instructions are decoded by similar combinations of gates.

Control flow

The MC14500B uses several techniques to provide control flow in programs. Its conditional instruction is SKZ, which skips the next instruction if the RR register is zero. The chip implements the skip instruction by setting a flip-flop if the next instruction should be skipped. If so, the next instruction is overridden by the opcode 0000 through some gates on the instruction pins. This opcode corresponds to the NOP O instruction. This instruction normally energizes the O pin, but the skip circuit suppresses this too. The result is that the skip circuit suppresses the next instruction, and then execution continues.

The chip has opcodes for jump (JMP) and return from subroutine (RTN). These instructions don't do much other than energizing the JMP or RTN pins.12 These operations must be implemented by external circuitry if desired.

The chip provides an unusual technique for implementing larger conditional code blocks. Write operations are controlled by the OEN (Output ENable) flip-flop. To suppress a block of code, the OEN flip-flop can be cleared. The code will still be executed, but it won't have any effect since the output is disabled, so it acts like an IF-THEN block. Similarly, the IEN (Input Enable) instruction will disable the input. These instructions provide conditional execution even if the hardware isn't implemented for the jump instruction.

Finally, a recommended implementation is to wire the F pin to the program counter's reset line. Then, the NOP F instruction will cause the program counter to return to the start of the code. This permits a processing loop to be implemented very simply.

Conclusion

The MC14500B processor is simple enough that its circuitry can be reverse-engineered and understood. To summarize the chip's operations, it takes a 4-bit instruction, which is stored in the instruction register (four flip-flops) and then decoded (using a large number of gates). A logic instruction takes a value from the RR register and the data pin. The "Logic Unit" uses three complex gates in a clever arrangement to perform the selected Boolean operation, and the result is stored back in the RR register. The processor has other flip-flops to handle write operations and other instructions. Execution is controlled by the on-chip clock.

The MC14500B is an unusual processor, handling just one bit of data, while off-loading functionality such as the program counter. Although a one-bit processor might seem like a joke at first, it had genuine uses for implementing logic-based industrial controllers. It seems like an evolutionary dead-end, though. Larger 4-bit and 8-bit microcontrollers were very popular, while the MC14500B was a niche product.

Thanks to David of Usagi Electric for driving the MC14500B analysis project and thanks to John McMaster for decapping the chips and creating the MC14500B images (CC BY 4.0). I first heard about the MC14500B from jonsen back in 2013. I announce my latest blog posts on Twitter, so follow me @kenshirriff. I also have an RSS feed.

Notes and references

  1. You might wonder why the MC14500B ends with a "B". Motorola used the B suffix to indicate a buffered chip, while UB indicated an unbuffered chip. For instance, the MC14001UB NOR gate took its output directly from the NOR circuit, while the MC14001B added a double-inverter buffering stage to the output. The buffering stage provided better noise immunity, but the unbuffered chips were faster and better for semi-analog circuits such as oscillators. (See Motorola 1978 Databook p4-3 for discussion.) Motorola was often haphazard with using the B suffix on their part numbers. However, Motorola also used MC14500 to refer to a family of assorted CMOS chips, so the "B" was necessary to distinguish between the MC14500 family and the MC14500B processor chip. Thus, Motorola never used MC14500 without the suffix to refer to the processor chip, as far as I can tell.

    Also see A Strong Commitment to Complementary MOS (1972), an interesting article from when CMOS was just starting its growth, and it was unclear how successful it would be. In this article, Motorola described its line of CMOS products, which it called "McMOS". Motorola had the 14000 series of standard parts, and the 14500 series for newer, more complex designs. Thus, the 14500 series included parts such as the MC14501 triple gate, MC14514 latch/decoder, and the MC14518 dual up counters.

    For more information on the MC14500B processor, see the datasheet and the detailed MC14500B handbook

  2. Note that the MC14500B is not a bit-slice processor, but intended for 1-bit applications. The idea of a bit-slice processor is that the processor chip is designed for 2 or 4 bits, for example, but you combine multiple chips to build a 16-bit processor, for instance. Each processor handles a slice of the complete word. Many systems were built in the 1970s with bit-slice processors such as the Am2901. Bit-slice processors were popular when an individual chip couldn't hold the circuitry for a complete processor, so it needed to be partitioned across multiple chips. 

  3. The table below gives the full instruction set for the MC14500B. It uses a 4-bit instruction so it has 16 different instructions.

    The MC14500B instruction set. From the datasheet.

    The MC14500B instruction set. From the datasheet.

     

  4. The MC14500B omitted a lot of circuitry that you expect to find in a processor, requiring it to be implemented separately. While the chip has opcodes for Jump and Subroutine Return, it doesn't do anything other than activate an external pin for those instructions; supporting subroutines required an external chip. The chip has a single data in/out pin; that pin is typically connected to multiple I/O devices with an external multiplexer/demultiplexer. Finally, while the chip itself has a 4-bit instruction set, a system typically added more instruction bits to address I/O devices or memory. A full system could use 8-bit instructions with four bits going to the processor and four bits selecting the I/O port or storage location. Alternatively, the system could use a larger program counter, more instruction bits, or external RAM depending on the application.

    Motorola sold chips that could work with the MC14500B to build a complete system. The MC14512 Input Selector could multiplex eight inputs into the processor, while the MC14599B Output Latch provided eight outputs or eight bits of storage. The program counter could be implemented by an MC14516B Program Counter (a 4-bit up/down counter chip) or two. 

  5. The 8-bit HP Nanoprocessor from 1974 was another low-cost, minimal processor. However, it was more complex than the MC14500B with about 10 times as many transistors. The Nanoprocessor included a program counter, subroutine support, and multiple registers. Like the MC14500B, the Nanoprocessor omitted arithmetic operations. 

  6. Early CMOS microprocessors include the 12-bit Intersil 6100 (1974) and the 8-bit RCA 1802 COSMAC (1974). The 1802 is said to be the first CMOS microprocessor. Mainstream microprocessors didn't switch to CMOS until the mid-1980s. 

  7. The MC14500B used metal-gate transistors, with aluminum forming the transistor gate. These transistors were not as advanced as the silicon-gate transistors that were developed in the late 1960s. Silicon gate technology was much better in several ways. First, silicon-gate transistors were smaller, faster, and more reliable. Second, silicon-gate chips had a layer of polysilicon wiring in addition to the metal wiring; this made chip layouts about twice as dense. In comparison, many of the signals on the MC14500B have long, winding paths in the metal layer due to the difficulties of routing with a single metal layer. The Intel 4004 used silicon gates in 1971, so the MC14500B was far behind technologically to use metal gates in 1976. I assume this was done for cost reasons. 

  8. The bias voltage makes the boundary between a transistor and the substrate act as a reverse-biased diode, so current can't flow across the boundary. Specifically, for a PMOS transistor, the N-silicon substrate is connected to +5 volts. For an NMOS transistor, the P-silicon well is connected to ground. A P-N junction acts as a diode, with current flowing from P to N. But the bias voltages put P at ground and N at +5, blocking any current flow. The result is that the substrate can be considered an insulator, with current restricted to the N+ and P+ doped regions (to simplify a bit). 

  9. More complex gates can be created by combining transistors in series and parallel. The AND-OR-INVERT gate below is an example. Because of the way CMOS works, this gate is constructed as a single gate (rather than three); it's no more difficult to make it than a NAND gate. CMOS gates, however, require inversion on the output, so you can't make an AND or OR gate directly.

    An AND-OR-INVERT gate implemented with CMOS.

    An AND-OR-INVERT gate implemented with CMOS.

     

  10. The current that a transistor can provide is proportional to the ratio between the gate length (the distance between the source and drain) and the gate width. The diagram below shows two transistors, a strong one on top and a weak one below. The strong transistor has a wide gate so it can provide a relatively high current. (Imagine slicing the transistor into parallel transistors, each one providing current.) The weak transistor has a narrow, but long gate, so it provides a much smaller current. (Think of the current needing to travel a longer distance through the current-blocking gate.) Based on the ratio of sizes, the upper transistor will pass about 16 times as much current as the lower transistor.

    The top transistor is a strong transistor, while the bottom transistor is a weak transistor. This photo shows the silicon after dissolving the metal layer.

    The top transistor is a strong transistor, while the bottom transistor is a weak transistor. This photo shows the silicon after dissolving the metal layer.

     

  11. The processor has a logic unit (LU), not an arithmetic/logic unit (ALU), because it doesn't perform any arithmetic operations. The handbook (chapter 14) explains how you can use the chip's logic instructions to do arithmetic if you really need to; it takes 12 operations to do a 1-bit add, repeated N times for an N-bit addition. 

  12. The return from subroutine instruction (RTN) causes the next instruction to be skipped. The motivation is that external circuitry can push the current address on the stack when doing a subroutine call. When popped for a return, this address will still point to the subroutine call instruction. By skipping the instruction after a return, the MC14500B avoids an infinite loop. (The external circuitry could, of course, increment the return address but that would have required more hardware.)