Reverse-engineering the clock chip in the first MOS calculator

In 1969, Sharp introduced the first calculator built from high-density MOS chips, the QT-8D, followed by the handheld Sharp EL-8, the world's smallest calculator at the time.1 These calculators were high-end products, selling for $345 (about $1800 today). Integrated circuits at the time couldn't fit the entire calculator on one chip, so these calculators contained five ICs: an arithmetic chip, a decimal point chip, a keypad/display chip, a control chip, and a clock chip.

This blog post discusses the clock chip and how it generated the unusual four-phase clock signals required by the calculator. The die photo below, provided by calculator researcher Francois Gueissaz, shows the silicon die of the clock chip. The silicon substrate has a purple tint, while the doped, conductive silicon is green. The metal layer on top is white. Around the edges, seven thin bond wires connect the die to the external pins.2 This chip has about 200 transistors and implements just a dozen moderately complex logic gates. While the density of this chip is absurdly low by modern standards, it illustrates the progress of MOS integrated circuits in the late 1960s.

Die photo of the CG2341 clock generator. This photo (and many others) courtesy of Francois Gueissaz.

Although computers now all use MOS integrated circuits, the path to MOS was rocky, with MOS integrated circuits viewed as slow and unreliable in the 1960s.4 Handheld calculators were a good match for the characteristics of MOS, though: they needed to be compact and lightweight with low power consumption, but computational speed was not important. In 1969, the Japanese calculator company Sharp signed a $30 million deal with Rockwell for this MOS-based calculator chipset, the largest MOS order in history at the time. The five chips were implemented by the Autonetics division of Rockwell.3

The Sharp EL-8 calculator. Note the unusual 8-segment display for the digits. Photo by Mister rf (CC BY-SA 4.0).

Although the Sharp calculator (above) was handheld, you can see that it was rather thick and chunky, with unusual 8-segment vacuum fluorescent display tubes for its display. The photo below shows the circuit board inside the calculator. The board is dominated by the four large integrated circuits with circular golden lids. These integrated circuits were packaged as 42-pin ceramic ICs with staggered pins. Unlike modern printed circuit boards, the traces on this board are curved, showing its hand-drawn layout.

The circuit board for the Sharp EL-8 calculator. The clock IC is the small metal-can package in the middle. Photo from Mister rf (CC BY-SA 4.0).

The clock IC is packaged in the small 10-pin metal can, marked with a blurry Rockwell logo (the inset shows the logo). The part number is CG1121 (probably standing for Clock Generator); the chip is similar to the CG2341 that I examined. The date code 7047 indicates this IC was manufactured in the 47th week of 1970, i.e. late November.

The clock integrated circuit was packaged in a 10-pin metal can. The logo on the integrated circuits isn't clear, but it is the Rockwell logo as shown in the inset.

Cutting the top off the metal can integrated circuit reveals the tiny silicon die. Although the metal can has 10 pins, only seven pins are wired to the die. The metal tab at the top of the photo indicates pin 1 of the integrated circuit.

The metal can of the CG2341 with the lid removed, showing the silicon die inside.

Why do the calculator chips require a complex four-phase clock? In 1966, Autonetics invented a technique for building logic circuits called four-phase logic. Unlike standard static logic gates, these logic gates held values dynamically using the capacitance of the wiring. The four-phase clock stepped the gates through sequences of precharging and then computing the logic function. This sounds complicated, but four-phase logic provided ten times the density of standard logic gates, along with one-tenth the power consumption and ten times the speed. As a result, many early high-density MOS chips used four-phase logic.5

Constructing transistors, resistors, and capacitors

Transistors are the key component of the chip. The diagram below shows a metal-gate PMOS transistor, the (somewhat primitive) type of transistor used in this IC. At the bottom, two regions of silicon (green) are doped to make them conductive, forming the source and drain of the transistor. The gate is formed by a metal strip between the silicon regions, separated from the silicon by a thin layer of insulating oxide. (These layers—Metal, Oxide, Semiconductor—give the MOS transistor its name.) The transistor can be considered a switch between the source and drain, controlled by the gate. To simplify the behavior, a PMOS transistor turns on when the gate is pulled negative (-25 volts), while the transistor turns off when the gate is at 0 volts. (These early PMOS transistors required an inconveniently large negative voltage.)

Structure of a PMOS metal-gate transistor.

The photos below show transistors on the die as they appear under a microscope. The silicon and metal layers match the diagram above; the doped silicon is greenish while the metal layer on top is white. The gate is formed where the metal and silicon overlap, with a faint oval where the oxide is thinned. These transistors are three different sizes: the wider transistors allow higher current. The transistors are carefully sized in the circuits based on the required current.

Three transistors of various sizes, as seen on the die.

The next important component is the resistor; the photo below shows three resistors. These resistors may look like transistors, and that's because they are transistors. While the transistors above were widened to support more current, these transistors are made longer so the long path reduces the current flow through the transistors. This makes them act as resistors. The metal gate of these transistors is tied to -25 volts, so the transistors are always on, rather than operating as switches.

Resistors of various sizes.

The final important component of the integrated circuit is the capacitor. A capacitor is formed by using metal for one plate and doped silicon (green) for the other plate, separated by the insulating oxide layer. The photo below shows two small capacitors and one large capacitor, at the same scale. The large capacitor is used in the output circuitry; the metal stripes above and below it are transistors that drive it.

Two small capacitors and one very large capacitor.

Implementing an inverter and NAND gate

With these components, logic gates can be constructed. The schematic below shows how an inverter is implemented in the IC. The layout of the schematic matches the die image underneath, so hopefully the transistors and capacitor can be recognized. If the input is low, the input transistor turns on, pulling the output to ground (i.e. high). If the input is high, the input transistor turns off and the "bootstrap load", the tricky circuit on the right, pulls the output to -25V (i.e. low). Thus, the circuit inverts the input.

An inverter using a bootstrap load.

Conceptually, you can think of the bootstrap load as a pull-down resistor. The implementation is complex to compensate for the poor characteristics of transistors at the time. The capacitor acts as a charge pump, providing a necessary voltage boost when the circuit switches. (For more details on bootstrap loads, see my earlier article.)

The implementation of a NAND gate is similar to the inverter above, but with multiple input transistors in parallel. If any input is low, the corresponding input transistor turns on, pulling the output to ground (i.e. high), as required by a NAND gate.
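
To pin down the logic conventions, here's a small behavioral sketch in Python (my own illustration, not the chip's circuitry), using the convention above that 0 volts (ground) is logic high and -25 volts is logic low:

    # Behavioral sketch of the PMOS inverter and NAND gate described above.
    # Convention: logic high = 0 V (ground), logic low = -25 V.
    # A PMOS transistor turns ON when its gate is pulled to -25 V (logic low).

    HIGH = 0      # volts (ground)
    LOW = -25     # volts

    def pmos_on(gate_volts):
        """A PMOS transistor conducts when its gate is strongly negative."""
        return gate_volts == LOW

    def inverter(inp):
        """If the input transistor is on, it pulls the output to ground (high);
        otherwise the bootstrap load pulls the output to -25 V (low)."""
        return HIGH if pmos_on(inp) else LOW

    def nand(*inputs):
        """Input transistors are in parallel: any 'on' transistor pulls the
        output to ground (high); otherwise the load pulls it low."""
        return HIGH if any(pmos_on(v) for v in inputs) else LOW

    assert inverter(LOW) == HIGH and inverter(HIGH) == LOW
    assert nand(HIGH, HIGH) == LOW   # only all-high inputs give a low output
    assert nand(LOW, HIGH) == HIGH

This only captures the logical behavior; the analog trickery of the bootstrap load is ignored.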

The NAND delay gate

The die photo below shows the functional blocks of the clock chip. Eight NAND gates (red) form an oscillating 4-bit shift register. Four gates (yellow) generate the four-phase clock signals from the shift register outputs. Finally, four output driver circuits (orange) amplify these signals to produce high-current outputs.

The clock chip die with key components labeled.

The main building block of the clock chip is a NAND gate that has a delay when its output goes low. This delay creates the timing of the clock signal.6 The diagram below shows how the gate is constructed; the schematic corresponds to the layout of the circuit on the die. The delay makes this circuit somewhat complex and partially analog, but I'll try to explain it.

The NAND delay gate uses an R-C circuit to provide the delay. For simplicity, the bootstrap load is represented by a resistor.

The NAND circuit is in the upper right; two input transistors and a bootstrap load implement the NAND circuit described earlier. The output of the NAND gate goes through a resistor-capacitor circuit. This delays the output as the capacitor slowly charges through the resistor. The speed of the clock is controlled by the bias pin, which sets a threshold voltage. This voltage controls the point in the resistor-capacitor curve when the level switching transistor turns on.7 By lowering the voltage on the bias pin, the transistor switches sooner, increasing the clock speed. The typical clock speed is 60 kHz, a slow clock even compared to early microprocessors, but calculators didn't require much speed.

When the level switching transistor turns on, it pulls the buffer high,8 driving the inverter's output low. The inverter has a bootstrap load to provide sufficient output current. Finally, the output is fed back to the bias circuit, probably to sharpen the transition and provide hysteresis. To summarize, this complex circuit implements a delayed NAND gate. It is the key functional block of the chip, repeated ten times.
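
As a rough illustration of how the bias threshold sets the delay, the sketch below computes when an R-C charging curve crosses a threshold. The resistance, capacitance, and threshold values are made up for illustration; the actual values on the die are unknown.

    # Back-of-the-envelope sketch of the R-C delay idea described above.
    # The R, C, and threshold values are hypothetical, not measured from the chip.
    import math

    def rc_delay(r_ohms, c_farads, v_supply, v_threshold):
        """Time for a capacitor charging through R toward v_supply to cross
        v_threshold (both voltages given as magnitudes)."""
        return r_ohms * c_farads * math.log(v_supply / (v_supply - v_threshold))

    R, C = 100e3, 1e-12                  # hypothetical component values
    for vth in (20.0, 15.0, 10.0):       # magnitude of the switching threshold
        t = rc_delay(R, C, v_supply=25.0, v_threshold=vth)
        print(f"threshold {vth:4.1f} V -> delay {t*1e9:6.1f} ns")
    # A threshold closer to the starting voltage is crossed sooner, so the
    # delayed NAND switches earlier and the clock frequency increases.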

The clock shift register

The clock is built from a 4-stage shift register. The idea is that each stage of the shift register shifts its bit to the right, after a delay. The bit on the right is inverted and shifted into the left side of the shift register. Thus, the shift register implements a ring counter, first shifting in 1's at the left and then shifting in 0's: the bit pattern is 0000, 1000, 1100, 1110, 1111, 0111, 0011, 0001, and back to 0000. This complete cycle corresponds to one 60 kilohertz clock cycle for the calculator.
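
This sequence is easy to check with a short simulation. The sketch below models the shift register behaviorally, treating each stage as a single bit and ignoring the NAND-latch implementation and the delays.

    # Simulation of the 4-stage ring counter described above: each step shifts
    # the bits right and feeds the inverted last bit back into the first stage.
    def ring_counter_states(stages=4):
        state = [0] * stages
        seen = []
        while tuple(state) not in seen:
            seen.append(tuple(state))
            state = [1 - state[-1]] + state[:-1]   # invert last bit, shift right
        return seen

    for s in ring_counter_states():
        print("".join(str(bit) for bit in s))
    # Prints 0000, 1000, 1100, 1110, 1111, 0111, 0011, 0001 -- one full loop of
    # the shift register corresponds to one cycle of the calculator's clock.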

The schematic below shows how the shift register is built from eight cross-coupled NAND gates with delay, using the circuit described earlier. Each pair of cross-coupled NAND gates forms a latch that holds one bit, storing either a 0 or a 1. The latch outputs are labeled Q0 through Q3, while the inverted outputs are labeled Q0' through Q3'. The outputs from each latch are connected to the inputs of the next stage, so the bits are shifted to the right. Note that the wires from the last stage back to the first stage are crossed; this causes the bit to be inverted. If the delay is decreased (through the bias pin), the speed of the shift register increases, increasing the clock speed.

The 4-stage shift register.

The shift register must be initialized to the proper state, which is the job of the reset gate. When the shift register is powered up, the reset gate initializes the latches to hold zeros by pulling the lower inputs to the latches low.

Output circuit

The output circuitry generates the four clock phase outputs from the shift register values. Two phases come from the last shift register stage and its complement. The other two phases are more complex. An unwired "select" pin selects between two outputs for these pins; presumably this pin was wired in other versions of the clock chip to provide different clock signals for a different calculator. In the normal case, these clock outputs are formed by NANDing together two shift register outputs to produce a shorter pulse.

The output circuit produces four clock outputs from the shift register values.
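
The text above only says that two shift register outputs are NANDed together; which taps are used isn't identified, so the pairing below (Q1 with Q3) is just a guess to illustrate how the NAND produces a pulse half as wide as a main phase.

    # Illustrative only: the particular taps below are hypothetical.
    states = [(0,0,0,0),(1,0,0,0),(1,1,0,0),(1,1,1,0),
              (1,1,1,1),(0,1,1,1),(0,0,1,1),(0,0,0,1)]

    q3      = [s[3] for s in states]                         # a main phase (4 states wide)
    nand_13 = [0 if (s[1] and s[3]) else 1 for s in states]  # shorter pulse (2 states wide)

    print("Q3:          ", q3)
    print("NAND(Q1,Q3): ", nand_13)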

The photo below shows one of the output buffers. The output signal enters at the left, travels through the buffer circuitry, and exits the chip through the bond wire on the right. The right half consists of two large transistors to provide the high output currents: one transistor pulls the output up to ground, while the other transistor pulls the output down to -25V. The remainder of the circuitry amplifies the small internal signal so it can drive the output transistors. Note the large bootstrap capacitor near the center; it helps drive one of the output transistors. There are also much smaller bootstrap capacitors in the upper left. This output buffer circuit is repeated four times, once for each output pin.

One output buffer as it appears on the die.

The output buffer transistors must be large due to an unusual characteristic of four-phase logic. Normal clocked logic uses the clock signals for timing, while the logic gates are connected to power and ground. In four-phase logic, however, the clock signals provide the power for the logic gates; there are no separate power and ground connections. When the gates are precharged and discharged by the clock signals, this provides the power for the gates. Thus, four-phase logic requires relatively high-current clock signals, since they are powering the circuits.9

To see the chip in action, the oscilloscope trace below shows the four clock outputs as measured from the chip. The yellow and blue traces are the main phases; note that the active (low) parts do not overlap. The magenta and green outputs are active during the first part of the yellow and blue phases, respectively. These clocks are used to precharge the logic circuits. (The clock phases match those on Wikipedia's four-phase article, except the polarity is reversed because of the PMOS transistors.)

Oscilloscope trace showing the four output phases from the clock chip.

Conclusion

Rockwell fit a calculator onto five chips, making the handheld calculator possible. However, Texas Instruments, Mostek, and other companies soon fit all the circuitry onto a single chip, creating the calculator-on-a-chip. Selling calculators was highly profitable for a short time and 11 million calculators were sold in the US in 1974. Although calculators sold for hundreds of dollars in 1969, competition and the improvements in technology caused calculator prices to plummet to $15 by 1975. The profit margin collapsed during the "calculator wars"; Texas Instruments alone lost $16 million in 1975.4

Although the calculator market was risky, the massive sales of calculators provided an important boost to MOS chip technology in the early 1970s, and thus the computer industry. In particular, microprocessors started with the Intel 4004, a chip designed for a calculator. And microcontrollers were created out of Texas Instruments' line of calculator chips. While a chip such as the CG2341 clock generator is trivial by modern standards with about 200 transistors, it provides a historical window into how chips were constructed in the early days of MOS ICs.

Thanks to Francois Gueissaz for doing all the hard work of obtaining the calculator ICs, decapping them, and providing me with die photos and other information. I announce my latest blog posts on Twitter, so follow me at kenshirriff. I also have an RSS feed.

Notes and references

  1. See this interesting vintage commercial for the Sharp EL-8 calculator for more information. 

  2. Measuring the die photo, I believe this chip uses a 15 µm process, so the transistors and features are very large by modern standards. (This is why five chips were required to implement the calculator.) In comparison, many modern chips use a 14 nm process, so the width of a modern transistor is roughly 1000 times smaller, and the area is roughly a million times smaller. This shows the amazing progress in silicon technology described by Moore's Law. 

  3. It's hard to follow the spin-offs and acquisitions of the companies involved. Autonetics was founded as the research laboratory for North American Aviation in 1945. Among other things, Autonetics developed guidance computers for the Minuteman missile. Although North American Aviation is mostly forgotten now, it was a major aerospace company, building everything from the P-51 Mustang in World War II to the command and service module for the Apollo landing. It merged with Rockwell in 1967, becoming North American Rockwell. In 1970, about 800 employees from Autonetics were split off to form North American Rockwell MicroElectronics to develop and manufacture commercial integrated circuits. This later became Rockwell Semiconductor, then spun off into Conexant, which was later acquired by Synaptics. Rockwell was sold to Boeing in 1996.

    Sharp, on the other hand, started as Hayakawa Metal Works in 1924, eventually being renamed Sharp Corporation in 1970. (The name came from the Ever-Sharp mechanical pencil, one of Hayakawa's early inventions.) Foxconn bought the majority of Sharp in 2016; Foxconn, also known as Hon Hai Precision Industry, is a Taiwanese electronics manufacturer. Although best known for manufacturing the iPhone for Apple, Foxconn is estimated to manufacture 40% of the world's consumer electronics. 

  4. Much of the historical information in this post comes from the books To the Digital Age and History of Semiconductor Engineering. These books provide a detailed look at the rise of MOS integrated circuits. 

  5. One of the main proponents of four-phase logic was Lee Boysel, who founded a company Four-Phase Systems around it. The company built 24-bit computers, which were some of the earliest MOS-based computers. Boysel's EECS presentation describes the advantages of four-phase logic. 

  6. One important characteristic of the delayed NAND gate is that the delay is much larger when the output goes low than when the output goes high. This ensures that the output clock phases do not overlap while active (low). This is necessary for four-phase logic to ensure that logic gates don't conflict with each other. 

  7. The level switching transistor (like other PMOS transistors) will turn on when the gate voltage is lower than the source voltage by Vt (the transistor's threshold voltage). Thus, by controlling the bias voltage on the transistor's source, the transistor can be made to turn on sooner or later, controlling the frequency. 

  8. Note that the buffer circuit is constructed "backward" compared to a standard PMOS inverter. A PMOS inverter has the transistor connected to ground with a load resistor to -25V, while the buffer has the transistor connected to -25V and the load resistor to ground. I think it is constructed this way to shift the voltage levels from the level switching transistor. 

  9. Although the four-phase clocks power the logic gates, the chips also have regular power and ground connections. These power the output pins since the current demands are too large to be reasonably satisfied by the clocks. 

Reverse engineering RAM storage in early Texas Instruments calculator chips

Texas Instruments introduced the first commercial single-chip computer in 1974, combining the CPU, RAM, ROM, and I/O into one chip. This family of 4-bit processors was called the TMS1000.1 A 4-bit processor now seems very limited, but it was a good match for calculators, where each decimal digit fit into four bits. This microcontroller was also used in hand-held games2 and simple control applications such as microwave ovens.3 Since its software was in ROM, the TMS1000 needed to be custom-manufactured for each application, but it was inexpensive and sold for $2-$4 in quantity. It became very popular and was said to be the best-selling "computer on a chip".

TMS-1000 die with key functional blocks labeled. Die photo courtesy of Sean Riddle.

The die photo above shows the main functional blocks of the TMS1000. One thing that distinguishes the TMS1000 (and most microcontrollers) from regular processors is the "Harvard architecture", where code and data are stored and accessed separately. In the TMS1000, code and data even have different sizes: instructions were 8 bits and stored in a 1-kilobyte ROM, while data was 4 bits and stored in a 64×4 (256-bit) RAM.4 Since the space for RAM was limited, Texas Instruments developed new circuits for RAM. In this blog post, I look at how the TMS1000 and later TI chips implemented their on-chip RAM.
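
As a quick tally of these two separate stores (the array contents below are placeholders, not real TMS1000 code):

    # Toy illustration of the Harvard arrangement described above: code and
    # data live in separate memories with different widths.
    rom = [0x00] * 1024        # instruction ROM: 1024 x 8-bit instructions
    ram = [0x0] * 64           # data RAM: 64 x 4-bit words (one BCD digit each)

    def fetch(pc):             # instructions come only from the ROM
        return rom[pc] & 0xFF

    def load(addr):            # data comes only from the 64x4 RAM
        return ram[addr] & 0xF

    print(len(rom) * 8, "ROM bits;", len(ram) * 4, "RAM bits")   # -> 8192 ROM bits; 256 RAM bits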

TMS1000 RAM

Dynamic RAM revolutionized memory storage in the early 1970s; its low cost and high density rapidly made magnetic core memory obsolete. Dynamic RAM uses a tiny capacitor to store each bit, with a 0 or 1 represented by a low or high voltage stored in the capacitor. The problem with dynamic RAM is that the charge leaks away after a few milliseconds, so the values need to be constantly refreshed by reading the data, amplifying the voltages, and storing the values back in the capacitors.5 Texas Instruments developed a new dynamic RAM circuit for the TMS1000 to avoid the complexity of an external refresh circuit. Instead, each memory cell uses a clock signal to refresh itself internally.

The diagram below zooms in on the TMS1000 die photo, showing the 16×16 grid of RAM storage cells. The inset at the right shows a single storage cell. This photo shows the chip's metal layer; the transistors are underneath.

Zooming in on the RAM array, and then a single bit of storage.

The TMS1000 is constructed from a type of transistor called PMOS, shown below. At the bottom, two regions of silicon (red) are doped to make them conductive, forming the source and drain of the transistor. A metal strip in between forms the gate, separated from the silicon by a thin layer of insulating oxide. (These layers—Metal, Oxide, Semiconductor—give the MOS transistor its name.) The transistor can be considered a switch between the source and drain, controlled by the gate. The metal layer also provides the main wiring of the integrated circuit, although the silicon layer is also used for some wiring.

Structure of a PMOS metal-gate transistor.

The diagram below shows a closeup of one bit of storage in the TMS1000. The first die photo shows the yellowish metal layer. The metal layer both connects the circuitry and forms the gates of the transistors. The second photo shows the die after the metal has been dissolved with acid to reveal the silicon underneath. The conductive doped silicon appears pinkish, while the transistors are yellow squares. The black spot in the lower left is a via connecting the silicon to the metal above. Since the photo is hard to interpret, I created the diagram at the right, clarifying the components. The five white squares are the transistors, between pink silicon regions. There are also two capacitors (labeled) created by overlapping the metal and silicon.

One bit of RAM storage. The first photo shows the metal layer, the second shows the underlying silicon, and the third illustrates the silicon structures. Die photos from Sean Riddle here.

The schematic below corresponds to the above circuit, with the transistors in their approximate physical locations. To write a bit, the bit is placed on the data I/O line and the address line is activated.8 This turns on transistor Q4 and allows the bit to flow to point A, where it is maintained (temporarily) by the capacitor there. The bit can be read out the same way, by activating the address line. In a typical dynamic RAM chip, each cell consists of just this transistor and capacitor, but the TMS1000 uses the additional transistors to refresh the voltage on the capacitor.

Schematic of a dynamic RAM storage cell in the TMS1000.

The TMS1000 refresh circuit is driven by two clock signals, clock phase 1 (Φ1) and clock phase 5 (Φ5).7 Activating clock phase 5 turns on Q3 and allows the bit to flow to point C, the gate of transistor Q1. Large transistor Q1 is the key component of the refresh circuit, as it amplifies the signal C. Next, during clock phase 1, the amplified signal at B flows through Q2, restoring the original bit stored at A. This circuit is repeated 256 times for the 256 bits of RAM storage in the chip. These clock signals are activated at about 80 kilohertz, ensuring the bit is refreshed before it can drain away.
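
Here's a behavioral sketch of that refresh sequence, tracking only the logical values at points A, B, and C rather than the analog charge levels; the class and method names are mine, for illustration only.

    # Behavioral sketch of the self-refreshing TMS1000 cell described above.
    class Tms1000Cell:
        def __init__(self):
            self.a = 0   # storage node (capacitor at point A)
            self.b = 0   # output of amplifier transistor Q1
            self.c = 0   # gate of Q1 (capacitor at point C)

        def write(self, bit):      # address line on: Q4 passes the data bit to A
            self.a = bit

        def read(self):            # address line on: the bit at A is read out
            return self.a

        def phase5(self):          # clock phase 5: Q3 samples A onto C
            self.c = self.a        # (Q5 also pulls B toward 0 during this phase)
            self.b = 0

        def phase1(self):          # clock phase 1: the signal amplified by Q1
            self.b = self.c        # appears at B, and Q2 restores it back to A
            self.a = self.b

    cell = Tms1000Cell()
    cell.write(1)
    for _ in range(3):             # refresh runs continuously at ~80 kHz
        cell.phase5()
        cell.phase1()
    assert cell.read() == 1        # the stored bit survives, refreshed each cycle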

The move to CMOS

CMOS (Complementary MOS) is a type of circuitry that combines NMOS and PMOS transistors to reduce power consumption. In 1978, TI began building CMOS calculator chips, starting with the TP0310 and TP0320 chips.6 These chips were used in calculators such as the TI-30-II (below), TI-35, and TI-50. The switch to CMOS coincided with TI's switch from power-hungry LED or vacuum fluorescent displays (VFD) to low-power LCD (details). These improvements led to better battery life. TI also used CMOS to implement "Constant Memory™", preserving calculator data even when the calculator was off; CMOS's low power consumption meant that the memory could be continuously powered without draining the battery.

The TI-30-II calculator used the TP0320 processor. Photo courtesy of Datamath Calculator Museum.

CMOS has a long history, starting with its invention in 1963. RCA did a lot of early development of CMOS, introducing the 4000-series of integrated circuits in 1968 and the first CMOS processor, the RCA 1802, in 1974. RCA was unfortunately a decade too early for market success with CMOS; although CMOS's lower power consumption made it useful for niche aerospace markets, NMOS processors dominated the microprocessor industry. Eventually, however, mainstream microprocessors switched to CMOS with the Intel 80386 in 1985 and Motorola's 68030 in 1987, and CMOS is the dominant technology today.

TI's move from metal-gate PMOS to CMOS in 1978 is unusual. Other manufacturers (such as Intel) switched from metal-gate transistors to the much superior silicon-gate transistors around 1971, and then moved from PMOS to NMOS around 1974. It's unclear why Texas Instruments continued using inferior metal-gate PMOS circuitry for several years; perhaps calculators didn't need the improved performance so it wasn't cost-effective to switch. But then Texas Instruments skipped over the NMOS generation entirely, jumping to CMOS a decade before the mainstream microprocessor industry. This decision is easier to justify, since low-power CMOS was a clear advantage for battery-powered calculators. Curiously, TI continued to use inferior metal-gate transistors, even after moving to CMOS.

This history illustrates that technological progress isn't a straightforward path with new and improved technologies replacing older technologies. Instead, a new technology like CMOS may take years to catch on, becoming successful in particular markets but not making headway in others until economic factors and engineering tradeoffs change.

Getting back to the TP0320, the die photo below shows the TP0320 die, zooming in on the RAM array. This 32×24 array holds 768 bits, a significant upgrade from the TMS1000. The closeup at the right zooms in on a single bit. The bit cell has a different layout from the TMS1000 RAM. The design switched from dynamic RAM to static RAM, eliminating the capacitors and the need for refresh. In this section, I'll explain how this RAM cell is implemented.

Die of the TMS-0320, zooming in on the 32×24 RAM array and a single storage cell. Original die photo from Sean Riddle.

The diagram below shows how two inverters can be connected in a loop to store either a 0 or a 1. If the upper signal is 1, the inverter on the right outputs a 0 on the bottom, and the inverter on the left outputs a 1 at the top, reinforcing the original signal. Alternatively, the top signal can be a 0, as shown on the right. The key difference between this static circuit and the previous dynamic circuit is that the static circuit will hold a bit for an arbitrarily long time. The bit won't leak out of a capacitor as in a dynamic RAM, so refresh is not needed.

Two cross-coupled inverters can store either a 0 or a 1.

To make a usable storage cell, an addressing mechanism is added to the inverter circuit above. When the address select line is activated, the transistors connect the inverters to the data lines. For a read, the value of the cell is read from the data line. For a write, the desired bit and its complement are applied to the data lines, overpowering the value stored in the inverters and switching them to the new bit value. This type of storage cell is used to implement registers in many processors, including the Zilog Z80 and the Intel 8085.

To make a usable storage cell, transistors are added to select the cell.
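
Here's a behavioral sketch of the static cell, ignoring voltages and timing; the class and method names are mine, for illustration only.

    # Behavioral sketch of the static cell described above: two cross-coupled
    # inverters hold the bit, and the select transistors either read the stored
    # value or let the data lines overpower it during a write.
    class StaticCell:
        def __init__(self):
            self.q = 0                 # output of one inverter
            self.q_bar = 1             # output of the other inverter (complement)

        def write(self, bit):
            # Select line active: the data line and its complement overpower
            # the inverters, flipping the loop into the new state.
            self.q, self.q_bar = bit, 1 - bit

        def read(self):
            # Select line active: the stored value is read from the data line.
            return self.q

    cell = StaticCell()
    cell.write(1)
    assert cell.read() == 1            # the loop holds the bit indefinitely, no refresh
    assert cell.q_bar == 1 - cell.q    # each inverter reinforces the other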

The diagram below shows how a CMOS inverter is constructed from two transistors. The upper transistor is a PMOS transistor, while the lower transistor is an NMOS transistor. With a 0 input, the PMOS transistor turns on, connecting the output to the positive voltage (1). With a 1 input, the NMOS transistor turns on, connecting the output to ground (0). Thus, the output is the opposite of the input, as you'd expect from an inverter.

A CMOS inverter is built from an NMOS transistor and a PMOS transistor.

Putting this all together yields the schematic below. Transistors Q1 and Q3 implement one inverter, while transistors Q2 and Q4 implement the second inverter. Transistors Q5 and Q6 select the cell based on the address. The transistors are arranged on the schematic to match their physical locations.

Schematic of one bit of storage in the TP0320 chip.

The die photos below show how the storage cell is implemented in the TP0320 processor. The first photo shows three vertical metal traces that wire the cell together. In the second photo, the metal was removed with acid to reveal the silicon underneath. The upper section holds the PMOS transistors (Q1 and Q2) while the lower section holds the NMOS transistors (Q3 to Q6). The transistors appear as whitish rectangles, while the doped silicon appears as greenish or reddish lines. The black spots are vias connecting the silicon to the metal above. The diagram can be compared with the schematic above.

One RAM cell in the TP0320. The first photo shows the metal layer. The second photo shows the underlying silicon. The diagram shows the combined layers. Die photos courtesy of Sean Riddle.

The photo below zooms out a bit to show how the NMOS and PMOS transistors are arranged. Note the "P ring" that surrounds the NMOS transistors. This forms a tub of P-type silicon that holds the NMOS transistors. (This P ring is the horizontal green line below Q2 in the die photo above.) The chip contains many of these tubs, separating the PMOS and NMOS transistors.

The NMOS transistors are located in a P-type "tub" surrounded by a ring of P-type silicon.

TP0456

In 1981, Texas Instruments introduced a more powerful architecture, the TP0455, followed shortly by the TP0456. The TP0456 chip was used in calculators such as the TI-55-II scientific calculator, TI-35, and TI-60, as well as educational toys such as Little Professor and Spelling B.

The Texas Instruments Little Professor. Photo courtesy of Datamath Calculator Museum.

The die photo below shows the TP0456. The RAM array is in the lower-left corner of the die, while the ROM is in the lower-right. The TP0456's RAM array is 32 cells wide and 16 cells tall, providing 512 bits of storage, less than the 768 bits of the TP0320.

Die photo of a TP0456 as used in the TI-55-II calculator; the calculator uses two TP0456 chips. Die photo courtesy of Sean Riddle.

The TP0456 uses almost the same static cell structure as the earlier CMOS chips, but the layout was changed slightly. In particular, the select line runs between the two inverter lines, rather than on the side. I don't know why they made this change, as it doesn't appear to change the density. The static RAM circuit is the same as in the TP0320 described earlier, so I won't discuss it here.

Two RAM cells in the TP0456. The long vertical select lines run between the shorter inverter lines, unlike the layout of earlier cells.

Conclusion

While RAM storage may seem trivial, early microcontrollers required new ways to fit storage into the limited space on a die. Even just 256 bits took up a substantial fraction of the chip. Texas Instruments developed new dynamic RAM circuits for the TMS1000 microcontroller, followed by a completely different static circuit when they switched to CMOS microcontrollers.

Decades later, microcontrollers still have limited memory capacity. The Arduino Uno, for example, has 32 kilobytes of flash for program storage and 2 kilobytes of RAM. Modern high-end microcontrollers can have megabytes for program storage and hundreds of kilobytes of RAM, but this is still orders of magnitude less than a typical microcomputer. The constraints of fitting everything onto a single chip still limit capacity and still require novel solutions, just as in the TMS1000.

I announce my latest blog posts on Twitter, so follow me at kenshirriff. I also have an RSS feed. Thanks to Joerg Woerner at Datamath for suggesting this topic and thanks to Sean Riddle for die photos.

Notes and references

  1. Texas Instruments is considered the inventor of the microcontroller for developing the TMS0100 (different from the TMS1000) in 1971. While the TMS0100 has the characteristics of a microcontroller, it was marketed as a "calculator-on-a-chip". The TMS1000, however, was marketed as a "single-chip computer" for both calculator-type applications and small to medium control applications. 

  2. Some handheld games using the TMS1000 are listed here. 

  3. The architecture of the TMS1000 is rather unusual due to its roots as a calculator chip. It has just four input lines, designed to be connected to a grid of buttons. The outputs are also unusual: it has 8 "O" output lines, but these are not individually controllable. Instead, a 5-bit value is converted to the eight outputs by a customizable PLA decoder. The motivation behind this is to drive a 7-segment display. The microcontroller also has 11 "R" outputs, which are typically used to multiplex the LED display and to scan the keyboard. Another curious feature of the TMS1000 is that the instruction set was somewhat customizable.

    In comparison, Intel's microcontrollers such as the popular 8048 (1976) and 8051 (1980) were much more like standard 8-bit microprocessors. Unlike the TMS1000, the Intel microcontrollers had familiar features such as an 8-bit CPU, 8-bit I/O ports, interrupts, a stack, and a fixed instruction set with Boolean operations (AND, OR, XOR) and shifts. Looking at the TMS1000 instruction set, it seems slightly alien, while the 8048's instruction set is similar to microprocessors of the time. 

  4. Detailed information on the TMS1000 is in the TMS1000 manual. 

  5. Dynamic RAM is sometimes used for register storage in a processor, such as the Intel 8008, although static RAM is more common since it doesn't require refreshing. 

  6. The Datamath Calculator Museum has tons of information on Texas Instruments calculators. The list of ICs is particularly relevant. 

  7. The TMS1000 is implemented with complex logic circuitry, using a five-phase clock. The TMS1000 uses a mixture of depletion loads, gated loads, and precharge logic for power savings. I'm not sure why the TMS1000 uses a five-phase clock. Four-phase logic was a logic design methodology at the time, but the TMS1000 circuitry doesn't appear to use four-phase principles. Among other things, the TMS1000 phases are irregular and Φ4 pulses twice per cycle. 

  8. TI's Random access memory cell patent (1974) describes the memory cell used in the TMS1000. The layout in the patent is similar but not identical to the actual layout. Transistor Q5 appears in the circuit but not the patent. It pulls point B to 0 when clock phase 5 is active, making sure that a 0 bit at C is restored to a stronger 0 bit.

    Diagram of a dynamic RAM cell, based on the Random Access Memory Cell Patent.

    While most patents don't provide much useful information, Texas Instruments' calculator patents are unusually detailed and informative, providing schematics, source code, and clear explanations; they seem like they were written by engineers. (I feel that I should give TI credit for the quality of their patents.) 

Reverse-engineering the classic MK4116 16-kilobit DRAM chip

Back in the late 1970s, the most popular memory chip was Mostek's MK4116, holding a whopping (for the time) 16 kilobits. It provided storage for computers such as the Apple II, TRS-80, ZX Spectrum, Commodore PET, IBM PC, and Xerox Alto, as well as video games such as Defender and Missile Command. To see how the chip is implemented, I opened one up and reverse-engineered it. I expected the circuitry to be similar to other chips of the era, using standard NMOS gates, but it was much more complex than I expected, built from low-power dynamic logic. The MK4116 also used advanced manufacturing processes to fit 16,384 high-density memory cells on the chip.12

I created the die photo below from multiple microscope images. The white lines are the metal wiring on top of the chip, while the silicon underneath appears dark red. The two large rectangular regions are the 16,384 memory cells, arranged as a 128×128 matrix split in two. In between the two memory arrays are the amplifiers and selection circuits. The control and interface circuitry is at the left and right, connected to the external pins via tiny bond wires.

Die photo of the 4116 memory chip. Click for a larger image.

In dynamic RAM, each bit is stored in a capacitor with the bit's value, 0 or 1, represented by the voltage on the capacitor.3 The advantage of dynamic RAM is that each memory cell is very small, so a lot of data can be stored on one chip.4 The downside of dynamic RAM is that the charge on a capacitor leaks away after a few milliseconds. To avoid losing data, dynamic RAM must be constantly refreshed: bits are read from the capacitors, amplified, and then written back to the capacitors. For the MK4116, all the data must be refreshed every two milliseconds.

The diagram below illustrates four of the 16,384 memory cells. Each memory cell has a capacitor, along with a transistor that connects the capacitor to the associated bit line. To read or write data, a row select line is energized, turning on the transistors in that row. The row's capacitors are connected to the bit lines, allowing the bits in that row to be accessed.

Structure of the memory cells, based on the patent.

One of Mostek's key innovations was to multiplex the address pins.6 Earlier memory chips used a separate pin for each address bit; as memory sizes increased, so did the number of address pins. This forced Intel's 4096-bit memory chip, for instance, to use a large, more costly 22-pin package.5 Mostek cut the number of address pins in half by using each address pin twice, first for a "row" address, and then a "column" address. This approach became the industry standard, allowing memory chips to fit into inexpensive 16-pin packages.

Externally, the chip stores a single bit for 16,384 different addresses. (Typically, eight of these chips were used in parallel to store bytes.) Internally, however, the chip is implemented as a 128×128 matrix of storage cells. The row address selects a row of 128 cells7 and then the column address selects one of these 128 cells to read or write.8 Meanwhile, the entire row of 128 cells is refreshed by amplifying the signals and storing them back in the capacitors.
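
The sketch below illustrates the multiplexed addressing; the helper name is mine, and which half of the address serves as the row versus the column is a convention chosen for illustration, not taken from the datasheet.

    # Sketch of the multiplexed addressing described above: a 14-bit address is
    # presented on the seven address pins in two halves, selecting one cell in
    # the 128x128 array.
    def split_address(addr):
        """Split a 0..16383 address into a 7-bit row and a 7-bit column."""
        assert 0 <= addr < 128 * 128
        row = addr >> 7          # upper 7 bits, latched on RAS (assumed ordering)
        col = addr & 0x7F        # lower 7 bits, latched on CAS
        return row, col

    print(split_address(16383))   # -> (127, 127)
    print(split_address(130))     # -> (1, 2)
    # Accessing any cell in a row refreshes the whole 128-cell row, so stepping
    # through all 128 row addresses within 2 ms keeps every bit refreshed.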

The 4116 die with key blocks labeled. Most of the memory cell area has been cut out.

The die image above is labeled with the main functional blocks.9 The chip's 16 pins are labeled around the perimeter,10 including the seven address pins (A0-A6). The Row Address Strobe pin (RAS) is used to indicate the row address is ready, while the Column Address Strobe pin (CAS) indicates that the column address is ready. The two memory arrays are in the center; I've cut out most of the cells to keep the diagram compact. The column select circuitry and sense amplifiers are between the two memory arrays. At the right, the row decode circuitry selects a row based on the address pins, while the column address circuitry buffers the address for the column select circuitry. At the left, the clock circuits generate the chip's timing pulses, triggered by the RAS, CAS, and WRITE pins. Finally, the Data Out and Data In pins provide access to the selected data bit.

Memory cell structure

The key to the DRAM chip is the memory storage cell, designed to be as compact as possible. The highly magnified photo below shows some of the storage cells, densely packed together. It's a bit hard to visualize what's going on because the chip is constructed from multiple layers. The bottom layer is the grayish silicon die. On top of the silicon are two layers of polysilicon, a special type of deposited silicon used for transistor gates, capacitors, and wiring. The top layer of the chip is the metal wiring, which was removed for this photo. The photo shows three bit lines in the silicon, with bulb-shaped storage cells connected on either side. Vertical strips of polysilicon (poly 1) over the storage cells implement capacitors: the silicon forms the lower plate, while the polysilicon forms the upper plate. The second layer of polysilicon (poly 2) is arranged in diagonal regions to implement the selection transistors, where square notches in the poly 1 layer allow the poly 2 layer to approach the silicon.

A closeup of the memory chip under the microscope, showing individual storage cells.

The cross-section diagram below shows the three-dimensional, layered structure of a memory cell. At the bottom is the silicon (brown); the bit line (dark brown) is made from doped silicon. Above the silicon are the two polysilicon layers (red) and the metal layer (purple), separated by insulating silicon dioxide (gray). At the far left, the poly 1 layer and underlying silicon form a capacitor. In between the capacitor and the bit line, the poly 2 layer forms the gate of the transistor. At the left, the poly 2 layer is connected to the metal of the word line, which turns the transistor on, connecting the capacitor to the bit line.

Cross-section structure of a storage cell. Based on 16K—The new generation dynamic RAM.

The diagram below illustrates how bits are addressed in the storage matrix. The arrangement is somewhat confusing because columns of cells are offset and interlocked like zippers. A row select line is connected to the centers of diagonal poly 2 regions, so each region controls two transistors on neighboring bit lines. (For instance, in the upper left, the poly region connected to row select 0 forms transistors 0A and 0B.) The result is that each row select line activates 128 cells, one for each bit line in a staggered arrangement.

Arrangement of bits in the matrix. The transistors are labeled according to their corresponding row and bit line.

Low-power circuitry

A key feature of the MK4116 memory chip is that it uses almost no power when it is sitting idle. Although it consumes 462 milliwatts when active, it uses just 20 milliwatts in standby mode. Although low-power circuitry is straightforward to build with modern CMOS technology, the 4116 used earlier NMOS transistors. Most NMOS integrated circuits constructed logic gates with load transistors, a simple technique with the disadvantage of wasting power. Instead, the MK4116 memory chip uses dynamic logic, which is considerably more complex but saves power while idle.

A typical dynamic logic gate (below) operates in two phases. In the first phase, a clock signal turns on the upper transistor, precharging the output to +12 volts, the "1" state. The upper transistor then turns off, but the output remains high due to the capacitance of the wire. In the second phase, the lower transistors can pull the output low. In particular, if either input is 1, the corresponding transistor turns on and pulls the output low, so the circuit implements a NOR gate. This circuit doesn't consume any static power, just a small current to charge the wire capacitance when switching. (The inputs must be carefully timed so they don't overlap with the precharge clock.) The use of dynamic circuitry makes the 4116 much more complex than it would be otherwise since the gates are controlled by clock signals, which need to be generated.

A NOR gate using dynamic logic.
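
The precharge/evaluate sequence can be sketched behaviorally as below, ignoring the wire capacitance and clock timing that make the real circuit work; the class name is mine.

    # Behavioral sketch of the two-phase dynamic NOR gate described above:
    # a precharge phase pulls the output high, then an evaluate phase lets any
    # high input discharge it.
    class DynamicNor:
        def __init__(self):
            self.output = None            # undefined until precharged

        def precharge(self):              # clock on: output charged to "1" (+12 V)
            self.output = 1

        def evaluate(self, inputs):       # clock off: any high input pulls it low
            assert self.output == 1, "must precharge before evaluating"
            if any(inputs):
                self.output = 0
            return self.output

    gate = DynamicNor()
    gate.precharge()
    print(gate.evaluate([0, 0]))   # -> 1  (NOR of all-zero inputs)
    gate.precharge()
    print(gate.evaluate([1, 0]))   # -> 0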

The row select circuitry

The purpose of the row-select circuitry is to decode the 7 address bits and energize the corresponding row select line (out of 128) to read one row of memory. In the first step, 32 5-input NOR gates decode address bits A0 through A4. These NOR gates are implemented in the compact circuit shown below. Each NOR gate takes a different combination of non-inverted and inverted address bits and matches a particular 5-bit address. These NOR gates use dynamic logic, first pulled high and then discharged to ground, except for the selected address which remains high. Next, each NOR output is split into four, based on A5 and A6. The result is that one of 128 row select lines is activated, turning on the transistors for that row in the matrix.
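
The two-level decode can be sketched as follows. The structure (32 five-input NOR matches on A0-A4, then a 1-of-4 split on A5 and A6) follows the description above, but the mapping to physical line numbers on the die is an assumption for illustration.

    # Sketch of the two-level row decode described above.
    def row_select(address):
        """Return which of the 128 row select lines is active for a 7-bit row address."""
        a = [(address >> i) & 1 for i in range(7)]      # a[0] = A0 ... a[6] = A6
        # First level: each NOR gate sees either each bit of A0-A4 or its
        # complement, wired so only the gate matching this address stays high.
        nor_outputs = [1 if all(a[i] == ((m >> i) & 1) for i in range(5)) else 0
                       for m in range(32)]
        matched = nor_outputs.index(1)
        # Second level: the matching output is split four ways by A5 and A6.
        return matched * 4 + (a[5] | (a[6] << 1))

    assert len({row_select(addr) for addr in range(128)}) == 128   # one unique line per address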

The NOR gates are implemented in several compact blocks; one block of three NOR gates is shown below. Each NOR gate is a horizontal stripe of doped silicon, with ground above and below it. Each NOR gate has transistors (pink stripes) connected to ground alternating above and below it. A transistor will pull the NOR gate low if the connected address line is high. The precharge transistors at the left pull the NOR gates to +12 volts, while the output control transistors control the flow of the decoded outputs to the rest of the circuitry.

Three NOR gates in the row decoder. The vertical yellow strips indicate metal wiring, removed for this photo.

The small greenish blobs at the end of a transistor gate (pink stripe) are connections (vias) between a transistor gate and an address line. The address lines are represented as vertical yellow stripes (since the metal layer was removed). Note that each transistor gate has an address line at the right and the inverted address line at the left; thus, the NOR gates all have the same basic layout, but with the contacts changed to match a particular address. For instance, the upper NOR gate has transistors connected to A0, A2, A1, A3, and A4, so it will be active for address 00000; any other address will pull it low.

The sense amplifiers

The sense amplifiers are one of the most challenging parts of designing a memory chip. The job of the sense amplifier is to take the tiny voltage from a capacitor and amplify it into a binary 0 or 1.11 The challenge is that even though 12 volts is stored in a capacitor, the signal from the capacitor is very small, only 100 millivolts or so. (Because the bit line is much larger than the tiny memory cell capacitor, the capacitor causes a very small voltage swing.)12 It is critically important for the sense amplifier to operate accurately, even in the presence of noise or voltage fluctuations, because any error will corrupt the data. The sense amplifier circuit must also be compact and low power since there are 128 sense amplifiers.

The 128 sense amplifiers are in the middle of the die, between the upper and lower memory arrays.

The chip's 128 sense amplifiers, one for each column, are located between the two memory arrays as shown above. During a read, 128 values in a row are accessed in parallel and amplified by the sense amplifiers. These 128 values are then written back to refresh the values in the capacitor. For a write operation, one of the bits is updated with the new value before they are written back.

The sense amplifier as it appears on the die, and corresponding schematic.

Each sense amplifier (above) is a very simple circuit. It takes two inputs and compares them, pulling the lower one to 0.13 It is built from two cross-coupled transistors, each trying to pull the other one low. Whichever transistor has the higher voltage to start with will "win", forcing the other side low.14 The sense amplifier is sensitive to very small voltage differentials, allowing it to distinguish the small signals from a storage cell.

Locating the sense amplifiers between the two memory arrays isn't arbitrary but is the key to their operation: this is the "divided bit line" architecture introduced in 1972. The idea is that one input to the sense amp is the voltage from the desired memory cell, while the other input is a threshold voltage from a "dummy cell" in the opposite memory array. Dummy cells are constructed and precharged like real memory cells except the capacitor is half-sized, so they provide a voltage midway between a 0 bit and a 1 bit.3 If the voltage from the real memory cell is lower, the sense amp outputs a 0, and if higher, it outputs a 1.

Dummy cells provide the threshold voltage for deciding if a bit is 0 or 1. The dummy cells are located at the top and bottom of the memory arrays. They are on the same bit lines as real memory cells.

The dummy cells are located on the edges of the memory arrays, as shown above. They consist of capacitors and transistors (similar to real memory cells), but with a separate line to charge them. The advantage of the dummy cell approach is that manufacturing differences or fluctuations during operation will (hopefully) affect the real cells and dummy cells equally, so the voltage from the dummy cell will remain at the correct level to distinguish between a 0 and a 1. Address bit A0 controls which half of the array provides real data to the bit lines and which half connects dummy cells to the bit lines.
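
Here's a behavioral sketch of the dummy-cell comparison, using the rough 100-millivolt signal size mentioned earlier; a real sense amplifier is an analog, timing-sensitive circuit, so this only captures the decision it makes.

    # Behavioral sketch of the sense-amplifier decision described above: the
    # cell's bit-line voltage is compared against the half-sized dummy cell's
    # voltage, and whichever side starts higher wins.
    CELL_SWING_MV = 100.0        # rough signal size mentioned in the text

    def sense(stored_bit):
        bitline = CELL_SWING_MV if stored_bit else 0.0   # bump from the real cell
        dummy = CELL_SWING_MV / 2                        # half-sized dummy cell: midpoint
        # Cross-coupled transistors: the higher side pulls the other side low
        # and regenerates itself to a full logic level.
        return 1 if bitline > dummy else 0

    assert sense(1) == 1 and sense(0) == 0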

The column select circuitry

The purpose of the column select circuitry is to select one column out of the 128-bit row; this is the bit that is read or written. Each column select circuit is twice as wide as a memory cell, so the circuits only decode one of 64 columns. The result is that two bits are selected at a time, and circuitry elsewhere selects one of the two bits. Like the row select circuitry, the column select circuitry is implemented by numerous NOR gates, each matching one address. For the column select, address bits A0 through A5 select one of 64 lines, selecting two columns at a time. These two bit lines are connected to data lines transmitting the signals to the I/O circuitry. (Since the bit lines for the upper and lower halves of the matrix are separate, there are actually four bit lines selected by the column select circuit.) As with the row select circuitry, dynamic logic is used, controlled by various timing signals. Note that each NOR gate is physically split into two parts with the sense amp in the middle.15

Five of the column decoders, with one highlighted.

The schematic below shows how the column decoder works with the sense amplifier. The diagram shows two bit lines and the top half of the column decoder and sense circuitry; it is mirrored for the lower array. At the top, the sense precharge circuit pulls all the bit lines high. At the bottom, the sense amplifiers amplify and refresh the signals as explained above. The column decoder matches a particular 6-bit address, so one of the 64 decoders will activate the associated sense select circuit, connecting the chip's I/O circuitry to four bit lines (two from the upper memory array as shown here and two from the lower memory array).

Schematic of half the column decoder and sense amplifier.

At this point, four bit lines have been selected for use and their signals are passed to the input/output circuitry; the column select circuitry only decoded 1-of-64, while there are 128 columns, and each half of the array has separate bit lines. Column address bit A6 provides the final selection between the two columns. The selected bit is sent to the data-out pin for a read. For a write, the value on the data-in pin is sent back through the appropriate wire to overwrite the value in the sense amplifier. This circuitry is implemented using dynamic logic and latches, controlled by various timing signals. Much of the circuitry is duplicated, with one copy for the upper half of the memory array and one copy for the lower half. Row address bit A0 distinguishes which half of the matrix is active and which half is providing dummy data. (Note that row address bit A0 was already used to select a particular row, but the circuitry has "lost track" of which was the real row and which was the dummy row, so it must make the selection again.)

Clock generation

The chip requires many timing signals for the various steps in a memory operation. The memory chip doesn't use an external clock, unlike a CPU, but generates its own timing signals internally. The diagram below illustrates the clock generators, using buffers to create a delay between each successive clock output. The first set of timing signals is triggered by the row-access strobe (RAS), indicating that the computer has put the row address on the address pins. The next set of timing signals is triggered by the column-access strobe (CAS), indicating the column address is on the address pins. Other timing signals are triggered by the WRITE pin.

Conceptual diagram of the clock generation, from 16K—The new generation dynamic RAM.

The real clock circuitry is much more complex than the diagram indicates, consisting of dozens of transistors in multiple chains, feeding back in complex ways to shape the pulses. (Among other things, using dynamic logic requires each buffer to have both an input that pulls it high and an input that pulls it low, creating an almost circular problem.) These gates are mostly built from large transistors, as shown below, to provide enough current to drive the circuitry and to increase the gate delay sufficiently. The clock circuitry also uses many capacitors, probably bootstrap loads to pull signals up more sharply. I'm not going to describe the clocks in detail since they're a complicated mess.
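
As a mental model (and only that), you can think of each internal timing signal as firing a fixed delay after the edge that triggers it. The sketch below invents stage names and delay values purely for illustration; the real chip shapes these pulses with chains of dynamic buffers and feedback.

```python
# Toy model of the delay-chain idea: each internal timing signal fires a
# fixed delay after the edge that triggers it. The delays below are made
# up for illustration, not measured from the chip.

def clock_chain(trigger_time, stage_delays):
    """Return the firing time of each buffer stage after a trigger edge."""
    times, t = [], trigger_time
    for delay in stage_delays:
        t += delay
        times.append(t)
    return times

ras_edge = 0      # RAS falls at t = 0 ns
cas_edge = 60     # CAS falls later, once the column address is on the pins
print("RAS-derived clocks:", clock_chain(ras_edge, [10, 15, 20, 25]))
print("CAS-derived clocks:", clock_chain(cas_edge, [10, 15, 20]))
```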

A small part of the clock circuitry.

Input pins

The chip uses surprisingly complex circuits for the address pins and the data input pin. Mostek's earlier memory chip had problems with noise margins on its inputs, so the MK4116 uses a complex circuit with an analog threshold, capacitor drive, and multiple controls and latches.

The diagram below shows the threshold generation circuit, which generates a 1.5-volt reference. It uses many tiny transistors in series to generate the voltage level. Conceptually, it is similar to a resistor divider between power and ground to produce an output voltage. However, resistors are both power-hungry and difficult to build in integrated circuits, so transistors are used instead. Since this circuit is always active, the designers needed to minimize its current; this was achieved by using many transistors in series.

The voltage reference used for the address pins. (×18 indicates 18 transistors in series, for instance.)
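
Treating every always-on transistor as roughly the same effective resistance, the reference behaves like a resistor divider, and the long series chain keeps the standing current small. The back-of-the-envelope sketch below uses placeholder transistor counts chosen to land on 1.5 volts; they are not values taken from the die.

```python
# Back-of-the-envelope model of the reference as a resistor divider,
# treating every always-on transistor as the same effective resistance.
# The counts below are placeholders chosen to hit 1.5 V, not die values.

def divider_output(vdd, n_upper, n_lower):
    """Tap voltage with n_upper series devices above it and n_lower below."""
    return vdd * n_lower / (n_upper + n_lower)

print(divider_output(vdd=12.0, n_upper=14, n_lower=2))   # -> 1.5 V
```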


The voltage on the input pin and the threshold voltage are fed into a differential amplifier/comparator, conceptually similar to the sense amplifiers. Each side tries to pull the other side low, ending up with a 1 for the "winning" side and 0 for the "losing" side. Thus, the input is converted into a binary value. The result from the comparator is stored in a latch. Multiple timing signals gate the input signal, precharge the circuitry, and control the latch.

The data-in circuitry: the pin, latch circuit, and voltage reference. This circuitry is in the lower-left corner of the die.

The photo above shows the input circuit for the data-in pin. Next to the pin's bond wire are the threshold circuit and latch; the two capacitors are the large rectangles of metal. The voltage reference circuit is next; the data-in voltage reference is similar to the address voltage reference described above. (I left the metal layer on for this photo; the polysilicon and silicon underneath are obscured by the oxide layer.)

Conclusion

This memory chip was much more complex than I expected. I had studied a simple Intel memory chip earlier, so I assumed this DRAM would be larger but not much more complicated. Instead, the MK4116 has complex control circuitry with over 1000 transistors, in addition to the 16,384 transistors for the memory cells and about 1500 transistors for the column selects and sense amps. One cause of the complexity is that the design needed to optimize along multiple axes: density, speed, and power efficiency.16

The table below shows that each generation of DRAM chips required substantial technological changes and new developments. Memory designers don't just sit around waiting for Moore's Law to increase the memory capacity; they have to constantly develop new techniques because DRAM storage cells are fundamentally analog. Fortunately, DRAM designers have continued to solve memory scaling problems; 16-gigabit DRAMs recently went into production, an amazing factor of a million larger than the 16-kilobit MK4116 DRAM chip of 1976.

DRAM cell evolution from 4 kilobits to 16 megabits. From Impact of Processing Technology on DRAM Sense Amplifier Design (Gealow, 1990).

I announce my latest blog posts on Twitter, so follow me @kenshirriff or my RSS feed. Thanks to Mike Braden for suggesting the MK4332 chip to me.

Notes and references

  1. A brief history of memory innovations is here. For detailed information on DRAM circuits, see this 1990 thesis on sense amplifier design. For history, Storage array and sense/refresh circuit for single-transistor memory cells (1972) introduced the concepts of dummy cells and cross-coupled sense amplifiers. Intel's chip is discussed in A 16 384-Bit Dynamic RAM (1976) while Mostek's chip is discussed in A 16K × 1 bit dynamic RAM (1977) and 16K - The new generation dynamic RAM (1977). Inconveniently, I found most of these references after I had this blog post nearly completed. 

  2. An unusual characteristic of the chip is that it doesn't use "buried contacts". The issue is how to connect a polysilicon wire to a silicon circuit. In integrated circuits of the 1960s, polysilicon couldn't be connected to silicon directly, so a via connected the polysilicon wire to the metal layer, which had a short connection to a second via that connected down to the silicon. In 1968 at Fairchild, Federico Faggin invented the buried contact, a way to connect the polysilicon and silicon directly. This was much more convenient, so all the NMOS chips that I have examined use buried contacts.

    However, the 4116 doesn't use buried contacts. Instead, it uses the obsolete connections through the metal layer. It's a mystery why they did this. Perhaps the metal wiring density was low enough that the additional segments weren't a problem and they could eliminate one masking and processing step. (Another theory is maybe there were patent issues, but I'm not aware of any patent on the buried contact.) But this illustrates that technological progress isn't consistently linear. Even an advanced chip like the 4116 can use obsolete techniques in some areas. 

  3. In the MK4116, a 0 bit is represented by storing 12 volts on the capacitor, while a 1 bit is represented by 0 volts on the capacitor. This is backward from what you might expect, but probably saved an inverter somewhere in the circuitry. To avoid confusion, I ignore this in the text. 

  4. Early dynamic RAMs such as the Intel 1103 used three transistors per cell and used separate lines for reading and writing data. Improvements in memory technology shrunk the circuit to a single transistor and a single data line. Static RAM, in comparison, often requires 6 transistors per bit, but has the advantage of not needing to be refreshed. 

  5. For example, Intel's 2107 4096-bit DRAM required 22 pins, as did the 2101 256×4 static RAM chip. It's ironic that Intel used larger packages for these memory chips because a few years earlier, Intel had steadfastly refused to go beyond 16 pins, forcing the Intel 4004 microprocessor to use a 16-pin package. The 8008 microprocessor was barely allowed 18 pins, when 24 pins would have been more convenient. This made the 8008 slower and harder to use. 

  6. Although multiplexing the address pins might seem trivial, Mostek claims that they bet the company on this idea. The problem is how to implement multiplexing without making memory accesses wait while both parts of the address are loaded. (The time to read memory was a key factor in computer design, so every nanosecond counted.) In Mostek's solution, first the row address is put on the address pins, and the row-access strobe (RAS) is activated. While the chip is reading that row from memory, the computer puts the column address on the address pins and activates the column-access strobe (CAS). By the time the 128 bits of the storage row have been read, the column address is available and the desired bit is selected from the row of 128 bits. In other words, reading of the row is overlapped with loading of the column address, so multiplexing doesn't slow the system. However, careful timing is required to make this multiplexing work; much of the chip is devoted to clock circuitry to generate the necessary timing pulses. 
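
From the computer's side, the multiplexed access works out to a sequence roughly like the sketch below. The chip object and its pin-level helpers (set_address, assert_ras, and so on) are hypothetical stand-ins for hardware signals, included only to make the RAS/CAS sequencing concrete; real controllers do this in hardware with carefully tuned timing.

```python
# Sketch of a multiplexed-address read cycle from the controller's side.
# All helper methods are hypothetical; the split of the 14-bit address
# into row and column halves is up to the system designer.

def read_bit(chip, address):
    row = (address >> 7) & 0x7F      # 7 bits for the row
    col = address & 0x7F             # 7 bits for the column
    chip.set_address(row)            # row address on the 7 address pins
    chip.assert_ras()                # chip begins reading the whole 128-bit row
    chip.set_address(col)            # meanwhile, present the column address
    chip.assert_cas()                # column decode overlaps the row read
    bit = chip.read_data_out()
    chip.release_cas()
    chip.release_ras()               # precharge before the next cycle
    return bit
```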

  7. The RAM chip operates on memory a row at a time, and then selects one entry from the row. This isn't the obvious way to access memory. In comparison, magnetic core memory also holds memory cells (cores) in a matrix, but accesses a single cell using X and Y select lines. A key reason for a DRAM to operate a row at a time is so the entire row can be refreshed at once, dramatically reducing the performance overhead from refresh operations. 

  8. You might wonder if it's possible to read multiple bits from a row without repeating the entire row-read operation. The chip designers thought of that and provided several techniques to boost efficiency. The page-read and page-write functions let you rapidly access multiple bits in a 128-bit page (i.e. row). A read-modify-write sequence lets you read a row, modify bits in it, and write it back without repeating the row-read. A RAS-only refresh operation lets you read and refresh a row without providing a column address. The point of this is that the chip designers implemented clever features so customers could squeeze as much performance out of their memory system as possible. See the datasheet for details. 
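
Continuing the hypothetical controller sketch above, page mode amounts to holding RAS asserted while cycling CAS once per column, so only one row access is paid for the whole batch of bits.

```python
# Page-mode read, continuing the hypothetical controller sketch above:
# RAS stays asserted (the row is held in the sense amplifiers) and CAS
# cycles once per column, avoiding a full row access for every bit.

def read_page(chip, row, columns):
    chip.set_address(row)
    chip.assert_ras()                # read and latch the 128-bit row once
    bits = []
    for col in columns:
        chip.set_address(col)
        chip.assert_cas()            # pick one bit out of the latched row
        bits.append(chip.read_data_out())
        chip.release_cas()
    chip.release_ras()
    return bits
```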

  9. The block diagram below shows the main functional blocks of the 4116. Many parts of this block diagram didn't make sense to me until after I had reverse-engineered the chip, such as the clock generator, dummy cells, and "1 of 2 data bus select". Many datasheets present a somewhat abstracted view of how the chip operates, but the 4116 datasheet accurately matches the implementation.

    Block diagram of the 4116 memory chip, from the databook.


  10. One inconvenient feature of the memory chip is it requires three different voltages: +12 volts, +5 volts, and -5 volts. Almost all the circuitry runs on 12 volts. The 5-volt supply is used only to provide a standard TTL voltage level for the data out pin. The -5 volts is a substrate bias, connected to the underlying silicon die to improve the characteristics of the transistors. Later chips implemented a charge pump circuit to generate the bias voltage, eliminating the need for an external bias voltage. Later memory chips also eliminated the need for +12 volts. This simplified use of the chips, since only a single-voltage power supply was required. A less-obvious benefit is that this made two of the chip's 16 pins available for other uses. Specifically, these pins were used as additional address bits in the next two generations of memory chips, the 64-kilobit and 256-kilobit chips. As a side effect, the address pins are in a somewhat scrambled order, due to the location of the available pins. 

  11. It's not a coincidence that the input to the sense amp is very small, just enough to be reliably amplified. This is a consequence of economics: if the DRAM produced a large voltage difference, the designers would shrink the cells to save money. But if the voltage difference was too small for reliability, the designers would need to increase the cell size. The result is a design where the voltage difference is just barely large enough to be reliably amplified by advanced circuitry. (We noticed the same thing when using a vintage 1960s IBM core memory (video); we were just barely able to read the core values. The cause is the same: if the cores had produced nice clean pulses, it would have meant they were larger than they needed to be.) 

  12. When the capacitor is connected to the bit line, the resulting voltage will depend on the relative capacitances of the capacitor and the bit line. The bit line capacitance is said to be 800 fF, while the storage cell has 40 fF capacitance, for a 20:1 ratio. Thus, the resulting voltage will be very close to the +12V precharge voltage on the bit line, but perturbed by a few hundred millivolts. 
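
Putting those numbers together, the idealized charge-sharing arithmetic looks like this. With a fully discharged cell the bit line dips by a bit over half a volt in this calculation; the real perturbation is smaller since the stored charge decays between refreshes.

```python
# Charge sharing between the precharged bit line and the storage cell,
# using the capacitances quoted above:
#   V = (C_bitline * V_precharge + C_cell * V_cell) / (C_bitline + C_cell)

C_BITLINE = 800e-15        # farads
C_CELL = 40e-15
V_PRECHARGE = 12.0         # bit line precharged high

def shared_voltage(v_cell):
    return (C_BITLINE * V_PRECHARGE + C_CELL * v_cell) / (C_BITLINE + C_CELL)

print(shared_voltage(12.0))   # cell stored high: stays at 12.0 V
print(shared_voltage(0.0))    # cell stored low: about 11.43 V
```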

  13. The sense amplifier can only pull a signal low, not raise it, so you might wonder where the amplification happens. Both sides are precharged to +12 volts and the memory cell capacitance only pulls the sides down by 100 millivolts or so. The "winning" side will remain very close to 12 volts, while the other side is pulled to 0 by the sense amp. Thus a 1 bit is pulled higher by the precharge, while a 0 bit is pulled lower by the sense amp. 
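
A toy numerical model of this race captures the behavior described here (the side that starts lower loses and is dragged to ground, while the precharged winner stays put), though not the actual circuit dynamics.

```python
# Toy model of the regenerative race described above: only the side that
# starts lower gets pulled down, at a rate set by the voltage difference,
# so a ~100 mV head start snowballs into a full 0 V / 12 V split.
# The gain and step count are arbitrary.

def sense(v_a, v_b, steps=100, gain=0.2):
    for _ in range(steps):
        if v_a < v_b:
            v_a = max(0.0, v_a - gain * (v_b - v_a))
        elif v_b < v_a:
            v_b = max(0.0, v_b - gain * (v_a - v_b))
    return v_a, v_b

print(sense(11.9, 12.0))   # cell side dipped 100 mV -> (0.0, 12.0)
```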

  14. The diagram below shows the sense amplifier voltages during operation of a prototype DRAM sense amp. First, the two sides of the sense amp are precharged to the same voltage. Next, a DRAM storage node is selected on one side and a dummy node on the other. Note that the voltage difference between the two sides is very small, maybe 200 millivolts. Finally, the difference is amplified, forcing the higher side up and the lower side down. In this case, the storage node held a 1 so it started slightly higher. If it held a 0, it would start slightly lower and the two lines would diverge in opposite directions. The point is that the sense amp takes a very small voltage differential and amplifies it into a large binary signal.

    Voltage diagram for a prototype sense amp (not the 4116). Based on Storage Array and Sense/Refresh Circuit for Single-Transistor Memory Cells, 1972.

    One difference between this sense amp and the MK4116 is that this circuit is precharged to a midpoint voltage, while the MK4116's is precharged to +12 volts. In this sense amp, one signal must be pulled high, while in the MK4116 both signals start near +12V and one is forced low.  

  15. Robert Proebsting, co-founder of Mostek and developer of address multiplexing, has an oral history that provides some information on the 4116. He discusses why the column decoder selects one of 64 columns, with the selection between the pair of columns made separately: they wanted the noise from the address lines to be equal on both sides of the sense amp, so there are three address line pairs on each side.  

  16. Intel produced 16,384-bit DRAM chips (the 2116 and others) before Mostek, but Mostek's chips beat Intel's in the marketplace. Interestingly, the 2116's internal structure was completely different from the MK4116's. The 2116 contained four memory arrays internally and was structured as two independent 8-kilobit memories. This saved power since the unused half could be left unpowered during a memory access. Moreover, if a 2116 chip had a manufacturing flaw in one half, Intel repackaged it as an 8-kilobit 2108 chip with either the upper or lower half operational. The user had to set address bit A6 appropriately to get the working half.