Solving the NYTimes Pips puzzle with a constraint solver

The New York Times recently introduced a new daily puzzle called Pips. You place a set of dominoes on a grid, satisfying various conditions. For instance, in the puzzle below, the pips (dots) in the purple squares must sum to 8, there must be fewer than 5 pips in the red square, and the pips in the three green squares must be equal. (It doesn't take much thought to solve this "easy" puzzle, but the "medium" and "hard" puzzles are more challenging.)

The New York Times Pips puzzle from Oct 5, 2025 (easy). Hint: What value must go in the three green squares?

I wondered how to solve these puzzles with a computer. Recently, I saw an article on Hacker News—"Many hard LeetCode problems are easy constraint problems"—that described the benefits and flexibility of a system called a constraint solver. A constraint solver takes a set of constraints and finds solutions that satisfy them: exactly what Pips requires.

I figured that solving Pips with a constraint solver would be a good way to learn more about these solvers, but I had several questions. Did constraint solvers require incomprehensible mathematics? How hard was it to express a problem? Would the solver quickly solve the problem, or would it get caught in an exponential search?

It turns out that using a constraint solver was straightforward; it took me under two hours from knowing nothing about constraint solvers to solving the problem. The solver found solutions in milliseconds (for the most part). However, there were a few bumps along the way. In this blog post, I'll discuss my experience with the MiniZinc1 constraint modeling system and show how it can solve Pips.

Approaching the problem

Writing a program for a constraint solver is very different from writing a regular program. Instead of telling the computer how to solve the problem, you tell it what you want: the conditions that must be satisfied. The solver then "magically" finds solutions that satisfy those conditions.

To solve the problem, I created an array called pips that holds the number of domino pips at each position in the grid. Then, the three constraints for the above problem can be expressed as follows. You can see how the constraints directly express the conditions in the puzzle.

constraint pips[1,1] + pips[2,1] == 8;
constraint pips[2,3] < 5;
constraint all_equal([pips[3,1], pips[3,2], pips[3,3]]);

Next, I needed to specify where dominoes could be placed for the puzzle. To do this, I defined an array called grid that indicated the allowable positions: 1 indicates a valid position and 0 indicates an invalid position. (If you compare with the puzzle at the top of the article, you can see that the grid below matches its shape.)

grid = [|
1,1,0|
1,1,1|
1,1,1|];

I also defined the set of dominoes for the problem above, specifying the number of spots in each half:

spots = [|5,1| 1,4| 4,2| 1,3|];

So far, the constraints directly match the problem. However, I needed to write some more code to specify how these pieces interact. But before I describe that code, I'll show a solution. I wasn't sure what to expect: would the constraint solver give me a solution or would it spin forever? It turned out to find the unique solution in 109 milliseconds, printing out the solution arrays. The pips array shows the number of pips in each position, while the dominogrid array shows which domino (1 through 4) is in each position.

pips = 
[| 4, 2, 0
 | 4, 5, 3
 | 1, 1, 1
 |];
dominogrid = 
[| 3, 3, 0
 | 2, 1, 4
 | 2, 1, 4
 |];

The text-based solution above is a bit ugly. But it is easy to create graphical output. MiniZinc provides a JavaScript API, so you can easily display solutions on a web page. I wrote a few lines of JavaScript to draw the solution, as shown below. (I just display the numbers since I was too lazy to draw the dots.) Solving this puzzle is not too impressive—it's an "easy" puzzle after all—but I'll show below that the solver can also handle considerably more difficult puzzles.

Graphical display of the solution.

Details of the code

While the above code specifies a particular puzzle, a bit more code is required to define how dominoes and the grid interact. This code may appear strange because it is implemented as constraints, rather than the procedural operations in a normal program.

My main design decision was how to specify the locations of dominoes. I considered assigning a grid position and orientation to each domino, but it seemed inconvenient to deal with multiple orientations. Instead, I decided to position each half of the domino independently, with an x and y coordinate in the grid.2 I added a constraint that the two halves of each domino had to be in neighboring cells; that is, the halves' coordinates had to differ by exactly 1 in either X or Y (but not both):

constraint forall(i in DOMINO) (abs(x[i, 1] - x[i, 2]) + abs(y[i, 1] - y[i, 2]) == 1);

It took a bit of thought to fill in the pips array with the number of spots on each domino. In a normal programming language, one would loop over the dominoes and store the values into pips. However, here it is done with a constraint so the solver makes sure the values are assigned. Specifically, for each half-domino, the pips array entry at the domino's x/y coordinate must equal the corresponding spots on the domino:

constraint forall(i in DOMINO, j in HALF) (pips[y[i,j], x[i, j]] == spots[i, j]);

I decided to add another array to keep track of which domino is in which position. This array is useful to see the domino locations in the output, but it also keeps dominoes from overlapping. I used a constraint to put each domino's number (1, 2, 3, etc.) into the occupied position of dominogrid:

constraint forall(i in DOMINO, j in HALF) (dominogrid[y[i,j], x[i, j]] == i);

Next, how do we make sure that dominoes only go into positions allowed by grid? I used a constraint that a square in dominogrid must be empty or the corresponding grid must allow a domino.3 This uses the "or" condition, which is expressed as \/, an unusual stylistic choice. (Likewise, "and" is expressed as /\. These correspond to the logical symbols ∨ and ∧.)

constraint forall(i in 1..H, j in 1..W) (dominogrid[i, j] == 0 \/ grid[i, j] != 0);

Honestly, I was worried that I had too many arrays and the solver would end up in a rathole ensuring that the arrays were consistent. But I figured I'd try this brute-force approach and see if it worked. It turns out that it worked for the most part, so I didn't need to do anything more clever.

Finally, the program requires a few lines to define some constants and variables. The constants below define the number of dominoes and the size of the grid for a particular problem:

int: NDOMINO = 4; % Number of dominoes in the puzzle
int: W = 3; % Width of the grid in this puzzle
int: H = 3; % Height of the grid in this puzzle

Next, datatypes are defined to specify the allowable values. This is very important for the solver; it is a "finite domain" solver, so limiting the size of the domains reduces the size of the problem. For this problem, the values are integers in a particular range, called a set:

set of int: DOMINO = 1..NDOMINO; % Dominoes are numbered 1 to NDOMINO
set of int: HALF = 1..2; % The domino half is 1 or 2
set of int: xcoord = 1..W; % Coordinate into the grid
set of int: ycoord = 1..H;

At last, I define the sizes and types of the various arrays that I use. One very important keyword is var, which indicates variables that the solver must determine. Note that the first two arrays, grid and spots, do not have var since they are constant, initialized to specify the problem.

array[1..H,1..W] of 0..1: grid; % The grid defining where dominoes can go
array[DOMINO, HALF] of int: spots; % The number of spots on each half of each domino
array[DOMINO, HALF] of var xcoord: x; % X coordinate of each domino half
array[DOMINO, HALF] of var ycoord: y; % Y coordinate of each domino half
array[1..H,1..W] of var 0..6: pips; % The number of pips (0 to 6) at each location.
array[1..H,1..W] of var 0..NDOMINO: dominogrid; % The domino sequence number at each location

You can find all the code on GitHub. One weird thing is that because the code is not procedural, the lines can be in any order. You can use arrays or constants before you define them. You can even move include statements to the end of the file if you want!

Complications

Overall, the solver was much easier to use than I expected. However, there were a few complications.

By changing a setting, the solver can find multiple solutions instead of stopping after the first. However, when I tried this, the solver generated thousands of meaningless solutions. A closer look showed that the solver was putting arbitrary numbers into the "empty" cells, creating solutions that were valid but pointlessly different. I hadn't explicitly forbidden this, so the sneaky constraint solver went ahead and generated tons of solutions that I didn't want. Adding another constraint fixed the problem. The moral is that even if you think your constraints are clear, solvers are very good at finding unwanted solutions that technically satisfy the constraints.4

A second problem is that if you do something wrong, the solver simply says that the problem is unsatisfiable. Maybe there's a clever way of debugging, but I ended up removing constraints until the problem could be satisfied, and then seeing what was wrong with the last constraint I removed. (For instance, I got the array indices backward at one point, making the problem insoluble.)

The most concerning issue is the unpredictability of the solver: maybe it will take milliseconds, or maybe it will take hours. For instance, the Oct 5 hard Pips puzzle (below) caused the solver to take minutes for no apparent reason. However, the MiniZinc IDE supports different solver backends. I switched from the default Gecode solver to Chuffed, and it immediately found numerous solutions, 384 to be precise. (Pips puzzles sometimes have multiple solutions, which players find controversial.) I suspect that the multiple solutions messed up the Gecode solver somehow, perhaps because it couldn't narrow down a "good" branch in the search tree. For a benchmark of the different solvers, see the footnote.5

Two of the 384 solutions to the NYT Pips puzzle from Oct 5, 2025 (hard difficulty).

How does a constraint solver work?

If you were writing a program to solve Pips from scratch, you'd probably have a loop to try assigning dominoes to positions. The problem is that the search space grows explosively. If you have 16 dominoes, there are 16 choices for the first domino, 15 choices for the second, and so forth, so about 16! combinations in total (roughly 2×10^13), and that's ignoring orientations. You can think of this as a search tree: at the first step, you have 16 branches. For the next step, each branch has 15 sub-branches. Each sub-branch has 14 sub-sub-branches, and so forth.

An easy optimization is to check the constraints after each domino is added. For instance, as soon as the "less than 5" constraint is violated, you can backtrack and skip that entire section of the tree. In this way, only a subset of the tree needs to be searched; the number of branches will be large, but hopefully manageable.

A constraint solver works similarly, but in a more abstract way. The constraint solver assigns values to the variables, backtracking when a conflict is detected. Since the underlying problem is typically NP-complete, the solver uses heuristics to attempt to improve performance. For instance, variables can be assigned in different orders. The solver attempts to generate conflicts as soon as possible so large pieces of the search tree can be pruned sooner rather than later. (In the domino case, this corresponds to placing dominoes in places with the tightest constraints, rather than scattering them around the puzzle in "easy" spots.)

Another technique is constraint propagation. The idea is that you can derive new constraints and catch conflicts earlier. For instance, suppose you have a problem with the constraints "a equals c" and "b equals c". If you assign "a=1" and "b=2", you won't find a conflict until later, when you try to find a value for "c". But with constraint propagation, you can derive a new constraint "a equals b", and the problem will turn up immediately. (Solvers handle more complicated constraint propagation, such as inequalities.) The tradeoff is that generating new constraints takes time and makes the problem larger, so constraint propagation can make the solver slower. Thus, heuristics are used to decide when to apply constraint propagation.
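
To make the search-and-prune idea concrete, here is a minimal Python sketch that solves the easy puzzle from the top of the article by backtracking, abandoning a branch as soon as a partial placement violates a constraint. It is only an illustration: the cell coordinates and helper functions are my own, and real solvers such as Gecode use far more sophisticated propagation and heuristics.

# Cells are (row, col), 0-indexed; the missing top-right corner is (0, 2).
CELLS = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
DOMINOES = [(5, 1), (1, 4), (4, 2), (1, 3)]  # spots on each domino half

def neighbors(cell):
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0))]

def ok_so_far(pips):
    # The three puzzle constraints, checked on a partial assignment.
    purple = [pips.get((0, 0)), pips.get((1, 0))]   # must sum to 8
    if None not in purple and sum(purple) != 8:
        return False
    red = pips.get((1, 2))                          # must be less than 5
    if red is not None and red >= 5:
        return False
    green = {pips[c] for c in ((2, 0), (2, 1), (2, 2)) if c in pips}
    return len(green) <= 1                          # must all be equal

def solve(pips, remaining):
    if not ok_so_far(pips):
        return None            # prune: abandon this whole subtree
    if not remaining:
        return dict(pips)      # all dominoes placed: a solution
    a, b = remaining[0]
    for c1 in CELLS:
        if c1 in pips:
            continue
        for c2 in neighbors(c1):
            if c2 in CELLS and c2 not in pips:
                for s1, s2 in ((a, b), (b, a)):     # try both orientations
                    pips[c1], pips[c2] = s1, s2
                    solution = solve(pips, remaining[1:])
                    if solution:
                        return solution
                    del pips[c1], pips[c2]
    return None

print(solve({}, DOMINOES))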

Researchers are actively developing new algorithms, heuristics, and optimizations6 such as backtracking more aggressively (called "backjumping"), keeping track of failing variable assignments (called "nogoods"), and leveraging Boolean SAT (satisfiability) solvers. Solvers compete in annual challenges to test these techniques against each other. The nice thing about a constraint solver is that you don't need to know anything about these techniques; they are applied automatically.

Conclusions

I hope this has convinced you that constraint solvers are interesting, not too scary, and can solve real problems with little effort. Even as a beginner, I was able to get started with MiniZinc quickly. (I read half the tutorial and then jumped into programming.)

One reason to look at constraint solvers is that they are a completely different programming paradigm. Using a constraint solver is like programming on a higher level, not worrying about how the problem gets solved or what algorithm gets used. Moreover, analyzing a problem in terms of constraints is a different way of thinking about algorithms. Some of the time it's frustrating when you can't use familiar constructs such as loops and assignments, but it expands your horizons.

Finally, writing code to solve Pips is more fun than solving the problems by hand, at least in my opinion, so give it a try!

For more, follow me on Bluesky (@righto.com), Mastodon (@[email protected]), RSS, or subscribe here.

Solution to the Pips puzzle, September 21, 2025 (hard). This puzzle has regions that must all be equal (=) and regions that must all be different (≠). Conveniently, MiniZinc has all_equal and alldifferent constraint functions.

Notes and references

  1. I started by downloading the MiniZinc IDE and reading the MiniZinc tutorial. The MiniZinc IDE is straightforward, with an editor window at the top and an output window at the bottom. Clicking the "Run" button causes it to generate a solution.

    Screenshot of the MiniZinc IDE. Click for a larger view.

  2. It might be cleaner to combine the X and Y coordinates into a single Point type, using a MiniZinc record type.

  3. I later decided that it made more sense to enforce that dominogrid is empty if and only if grid is 0 at that point, although it doesn't affect the solution. This constraint uses the "if and only if" operator <->.

    constraint forall(i in 1..H, j in 1..W) (dominogrid[i, j] == 0 <-> grid[i, j] == 0);

  4. To prevent the solver from putting arbitrary numbers in the unused positions of pips, I added a constraint to force these values to be zero:

    constraint forall(i in 1..H, j in 1..W) (grid[i, j] == 0 -> pips[i, j] == 0);

    Generating multiple solutions had a second issue, which I expected: A symmetric domino can be placed in two redundant ways. For instance, a double-six domino can be flipped to produce a solution that is technically different but looks the same. I fixed this by adding constraints for each symmetric domino to allow only one of the two redundant positions. The constraint below forces a preferred orientation for symmetric dominoes.

    constraint forall(i in DOMINO) (spots[i,1] != spots[i,2] \/ x[i,1] > x[i,2] \/ (x[i,1] == x[i,2] /\ y[i,1] > y[i,2]));

    To enable multiple solutions in MiniZinc, the setting is under Show Configuration Editor > User Defined Behavior > Satisfaction Problems; from the command line, use the --all flag.

  5. MiniZinc has five solvers that can solve this sort of integer problem: Chuffed, OR-Tools CP-SAT, Gecode, HiGHS, and Coin-OR BC. I measured the performance of the five solvers against 20 different Pips puzzles. Most of the solvers found solutions in under a second most of the time, but there is a lot of variation.

    Timings for different solvers on 20 Pips puzzles.

    Overall, Chuffed had the best performance on the puzzles that I tested, taking well under a second. Google's OR-Tools won all the categories in the 2025 MiniZinc challenge, but it was considerably slower than Chuffed for my Pips programs. The default Gecode solver performed very well most of the time, but it did terribly on a few problems, taking over 15 minutes. HiGHS was slower in general, taking a few minutes on the hardest problems, but it didn't fail as badly as Gecode. (Curiously, Gecode and HiGHS sometimes found different problems to be difficult.) Finally, Coin-OR BC was uniformly bad; at best it took a few seconds, but one puzzle took almost two hours and others weren't solved before I gave up after two hours. (I left Coin-OR BC off the graph because it messed up the scale.)

    Don't treat these results too seriously because different solvers are optimized for different purposes. (In particular, Coin-OR BC is designed for linear problems.) But the results demonstrate the unpredictability of solvers: maybe you get a solution in a second and maybe you get a solution in hours. 

  6. If you want to read more about solvers, Constraint Satisfaction Problems is an overview presentation. The Gecode algorithms are described in a nice technical report: Constraint Programming Algorithms used in Gecode. Chuffed is more complicated: "Chuffed is a state of the art lazy clause solver designed from the ground up with lazy clause generation in mind. Lazy clause generation is a hybrid approach to constraint solving that combines features of finite domain propagation and Boolean satisfiability." The Chuffed paper Lazy clause generation reengineered and slides are more of a challenge.  

A Navajo weaving of an integrated circuit: the 555 timer

The noted Diné (Navajo) weaver Marilou Schultz recently completed an intricate weaving composed of thick white lines on a black background, punctuated with reddish-orange diamonds. Although this striking rug may appear abstract, it shows the internal circuitry of a tiny silicon chip known as the 555 timer. This chip has hundreds of applications in everything from a sound generator to a windshield wiper controller. At one point, the 555 was the world's best-selling integrated circuit with billions sold. But how did the chip get turned into a rug?

"Popular Chip" by Marilou Schultz.
Photo courtesy of First American Art Magazine.

The 555 chip is constructed from a tiny flake of silicon with a layer of metallic wiring on top. In the rug, this wiring is visible as the thick white lines, while the silicon forms the black background. One conspicuous feature of the rug is the reddish-orange diamonds around the perimeter. These correspond to the connections between the silicon chip and its eight pins. Tiny golden bond wires—thinner than a human hair—are attached to the square bond pads to provide these connections. The circuitry of the 555 chip contains 25 transistors, silicon devices that can switch on and off. The rug is dominated by three large transistors, the filled squares with a pattern inside, while the remaining transistors are represented by small dots.

The weaving was inspired by a photo of the 555 timer die taken by Antoine Bercovici (Siliconinsider); I suggested this photo to Schultz as a possible subject for a rug. The diagram below compares the weaving (left) with the die photo (right). As you can see, the weaving closely follows the actual chip, but there are a few artistic differences. For instance, two of the bond pads have been removed, the circuitry at the top has been simplified, and the part number at the bottom has been removed.

A comparison of the rug (left) and the original photograph (right). Dark-field image of the 555 timer is courtesy of Antoine Bercovici.

Antoine took the die photo with a dark field microscope, a special type of microscope that produces an image on a black background. This image emphasizes the metal layer on the top of the die. In comparison, a standard bright-field microscope produced the image below. When a chip is manufactured, regions of silicon are "doped" with impurities to create transistors and resistors. These regions are visible in the image below as subtle changes in the color of the silicon.

The RCA CA555 chip. Photo courtesy of Tiny Transistors.

In the weaving, the chip's design appears almost monumental, making it easy to forget that the actual chip is microscopic. For the photo below, I obtained a version of the chip packaged in a metal can, rather than the typical rectangle of black plastic. Cutting the top off the metal can reveals the tiny chip inside, with eight gold bond wires connecting the die to the pins of the package. If you zoom in on the photo, you may recognize the three large transistors that dominate the rug.

The 555 timer die inside a metal-can package, with a penny for comparison. Click this image (or any other) for a larger version.

The artist, Marilou Schultz, has been creating chip rugs since 1994, when Intel commissioned a rug based on the Pentium as a gift to AISES (American Indian Science & Engineering Society). Although Schultz learned weaving as a child, the Pentium rug was a challenge due to its complex pattern and lack of symmetry; a day's work might add just an inch to the rug. This dramatic weaving was created with wool from the long-horned Navajo-Churro sheep, colored with traditional plant dyes.

"Replica of a Chip", created by Marilou Schultz, 1994. Wool. Photo taken at the National Gallery of Art, 2024.

For the 555 timer weaving, Schultz experimented with different materials. Silver and gold metallic threads represent the aluminum and copper in the chip. The artist explains that "it took a lot more time to incorporate the metallic threads," but it was worth the effort because "it is spectacular to see the rug with the metallics in the dark with a little light hitting it." Aniline dyes provided the black and lavender colors. Although natural logwood dye produces a beautiful purple, it fades over time, so Schultz used an aniline dye instead. The lavender colors are dedicated to the weaver's mother, who passed away in February; purple was her favorite color.

Inside the chip

How does the 555 chip produce a particular time delay? You add external components—resistors and a capacitor—to select the time. The capacitor is filled (charged) at a speed controlled by the resistor. When the capacitor gets "full", the 555 chip switches operation and starts emptying (discharging) the capacitor. It's like filling a sink: if you have a large sink (capacitor) and a trickle of water (large resistor), the sink fills slowly. But if you have a small sink (capacitor) and a lot of water (small resistor), the sink fills quickly. By using different resistors and capacitors, the 555 timer can provide time intervals from microseconds to hours.
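
To make this concrete, the standard textbook formulas for the 555 in oscillator (astable) mode give the timing directly from the component values: each swing between 1/3 and 2/3 of the supply voltage takes ln(2)·RC. The Python sketch below applies these formulas; the component values are arbitrary examples, not taken from any particular circuit.

from math import log

def astable_555(r1_ohms, r2_ohms, c_farads):
    # The capacitor charges toward the supply through R1 + R2 and
    # discharges through R2, between 1/3 and 2/3 of the supply voltage.
    t_high = log(2) * (r1_ohms + r2_ohms) * c_farads  # filling time
    t_low = log(2) * r2_ohms * c_farads               # emptying time
    return t_high, t_low, 1.0 / (t_high + t_low)      # and the frequency

t_high, t_low, freq = astable_555(10e3, 100e3, 10e-6)  # 10 kΩ, 100 kΩ, 10 µF
print(f"high {t_high*1000:.0f} ms, low {t_low*1000:.0f} ms, {freq:.2f} Hz")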

I've constructed an interactive chip browser that shows how the regions of the rug correspond to specific electronic components in the physical chip. Click on any part of the rug to learn the function of the corresponding component in the chip.

For instance, two of the large square transistors turn the chip's output on or off, while the third large transistor discharges the capacitor when it is full. (To be precise, the capacitor goes between 1/3 full and 2/3 full to avoid issues near "empty" and "full".) The chip has circuits called comparators that detect when the capacitor's voltage reaches 1/3 or 2/3, switching between emptying and filling at those points. If you want more technical details about the 555 chip, see my previous articles: an early 555 chip, a 555 timer similar to the rug, and a more modern CMOS version of the 555.

Conclusions

The similarities between Navajo weavings and the patterns in integrated circuits have long been recognized. Marilou Schultz's weavings of integrated circuits make these visual metaphors into concrete works of art. This connection is not just metaphorical, however; in the 1960s, the semiconductor company Fairchild employed numerous Navajo workers to assemble chips in Shiprock, New Mexico. I wrote about this complicated history in The Pentium as a Navajo Weaving.

This work is being shown at SITE Santa Fe's Once Within a Time exhibition (running until January 2026). I haven't seen the exhibition in person, so let me know if you visit it. For more about Marilou Schultz's art, see The Diné Weaver Who Turns Microchips Into Art, or A Conversation with Marilou Schultz on YouTube.

Many thanks to Marilou Schultz for discussing her art with me. Thanks to First American Art Magazine for providing the photo of her 555 rug. Follow me on Mastodon (@[email protected]), Bluesky (@righto.com), or RSS for updates.

Why do people keep writing about the imaginary compound Cr2Gr2Te6?

I was reading the latest issue of the journal Science, and a paper mentioned the compound Cr2Gr2Te6. For a moment, I thought my knowledge of the periodic table was slipping, since I couldn't remember the element Gr. It turns out that Gr was supposed to be Ge, germanium, but that raises two issues. First, shouldn't the peer reviewers and proofreaders at a top journal catch this error? But more curiously, it appears that this formula is a mistake that has been copied around several times.

The Science paper [1] states, "Intrinsic ferromagnetism in these materials was discovered in Cr2Gr2Te6 and CrI3 down to the bilayer and monolayer thickness limit in 2017." I checked the referenced paper [2] and verified that the correct compound is Cr2Ge2Te6, with Ge for germanium.

But in the process, I found more publications that specifically mention the 2017 discovery of intrinsic ferromagnetism in both Cr2Gr2Te6 and CrI3. A 2021 paper in Nanoscale [3] says, "Since the discovery of intrinsic ferromagnetism in atomically thin Cr2Gr2Te6 and CrI3 in 2017, research on two-dimensional (2D) magnetic materials has become a highlighted topic." Then, a 2023 book chapter [4] opens with the abstract: "Since the discovery of intrinsic long-range magnetic order in two-dimensional (2D) layered magnets, e.g., Cr2Gr2Te6 and CrI3 in 2017, [...]"

This illustrates how easy it is for a random phrase to get copied around with nobody checking it. (Earlier, I found a bogus computer definition that has persisted for over 50 years.) To be sure, these could all be independent typos—it's an easy typo to make since Ge and Gr are neighbors on the keyboard and Cr2Gr2 scans better than Cr2Ge2. A few other papers [5, 6, 7] have the same typo, but in different contexts. My bigger concern is that once AI picks up the erroneous formula, it will propagate as misinformation forever. I hope that by calling out this error, I can bring an end to it. In any case, if anyone ends up here after a web search, I can at least confirm that there isn't a new element Gr and the real compound is Cr2Ge2Te6, chromium germanium telluride.

A shiny crystal of Cr2Ge2Te6, about 5mm across. Photo courtesy of 2D Semiconductors, a supplier of quantum materials.

References

[1] He, B. et al. (2025) ‘Strain-coupled, crystalline polymer-inorganic interfaces for efficient magnetoelectric sensing’, Science, 389(6760), pp. 623–631. (link)

[2] Gong, C. et al. (2017) ‘Discovery of intrinsic ferromagnetism in two-dimensional van der Waals crystals’, Nature, 546(7657), pp. 265–269. (link)

[3] Zhang, S. et al. (2021) ‘Two-dimensional magnetic materials: structures, properties and external controls’, Nanoscale, 13(3), pp. 1398–1424. (link)

[4] Yin, T. (2024) ‘Novel Light-Matter Interactions in 2D Magnets’, in D. Ranjan Sahu (ed.) Modern Permanent Magnets - Fundamentals and Applications. (link)

[5] Zhao, B. et al. (2023) ‘Strong perpendicular anisotropic ferromagnet Fe3GeTe2/graphene van der Waals heterostructure’, Journal of Physics D: Applied Physics, 56(9), 094001. (link)

[6] Ren, H. and Lan, M. (2023) ‘Progress and Prospects in Metallic FexGeTe2 (3≤x≤7) Ferromagnets’, Molecules, 28(21), p. 7244. (link)

[7] Hu, S. et al. (2019) ‘Anomalous Hall effect in Cr2Gr2Te6/Pt hybride structure’, Taiwan-Japan Joint Workshop on Condensed Matter Physics for Young Researchers, Saga, Japan. (link)

Here be dragons: Preventing static damage, latchup, and metastability in the 386

I've been reverse-engineering the Intel 386 processor (from 1985), and I've come across some interesting circuits for the chip's input/output (I/O) pins. Since these pins communicate with the outside world, they face special dangers: static electricity and latchup can destroy the chip, while metastability can cause serious malfunctions. These I/O circuits are completely different from the logic circuits in the 386, and I've come across a previously-undescribed flip-flop circuit, so I'm venturing into uncharted territory. In this article, I take a close look at how the I/O circuitry protects the 386 from the "dragons" that can destroy it.

The 386 die, zooming in on some of the bond pad circuits. The colors change due to the effects of different microscope lenses. Click this image (or any other) for a larger version.

The photo above shows the die of the 386 under a microscope. The dark, complex patterns arranged in rectangular regions arise from the two layers of metal that connect the circuits on the 386 chip. Not visible are the transistors, formed from silicon and polysilicon and hidden beneath the metal. Around the perimeter of this fingernail-sized silicon die, 141 square bond pads provide the connections between the chip and the outside world; tiny gold bond wires connect the bond pads to the package. Next to each I/O pad, specialized circuitry provides the electrical interface between the chip and the external components while protecting the chip. I've zoomed in on three groups of these bond pads along with the associated I/O circuits. The circuits at the top (for data pins) and the left (for address pins) are completely different from the control pin circuits at the bottom, showing how the circuitry varies with the pin's function.

Static electricity

The first dragon that threatens the 386 is static electricity, able to burn a hole in the chip. MOS transistors are constructed with a thin insulating oxide layer underneath the transistor's gate. In the 386, this fragile, glass-like oxide layer is just 250 nm thick, the thickness of a virus. Static electricity, even a small amount, can blow a hole through this oxide layer and destroy the chip. If you've ever walked across a carpet and felt a spark when you touch a doorknob, you've generated at least 3000 volts of chip-destroying static electricity. Intel recommends an anti-static mat and a grounding wrist strap when installing a processor to avoid the danger of static electricity, also known as Electrostatic Discharge or ESD.1

To reduce the risk of ESD damage, chips have protection diodes and other components in their I/O circuitry. The schematic below shows the circuit for a typical 386 input. The goal is to prevent static discharge from reaching the inverter, where it could destroy the inverter's transistors. The diodes next to the pad provide the first layer of protection; they redirect excess voltage to the +5 rail or ground. Next, the resistor reduces the current that can reach the inverter. The third diode provides a final layer of protection. (One unusual feature of this input—unrelated to ESD—is that the input has a pull-up, which is implemented with a transistor that acts like a 20kΩ resistor.2)

Schematic for the BS16# pad circuit. The BS16# signal indicates to the 386 if the external bus is 16 bits or 32 bits.

The image below shows how this circuit appears on the die. For this photo, I dissolved the metal layers with acids, stripping the die down to the silicon to make the transistors visible. The diodes and pull-up resistor are implemented with transistors.3 Large grids of transistors form the pad-side diodes, while the third diode is above. The current-limiting protection resistor is implemented with polysilicon, which provides higher resistance than metal wiring. The capacitor is implemented with a plate of polysilicon over silicon, separated by a thin oxide layer. As you can see, the protection circuitry occupies much more area than the inverters that process the signal.

The circuit for BS16# on the die. The green areas are where the oxide layer was incompletely removed.

Latchup

The transistors in the 386 are created by doping silicon with impurities to change its properties, creating regions of "N-type" and "P-type" silicon. The 386 chip, like most processors, is built from CMOS technology, so it uses two types of transistors: NMOS and PMOS. The 386 starts from a wafer of N-type silicon, and PMOS transistors are formed by doping tiny regions to form P-type silicon embedded in the underlying N-type silicon. NMOS transistors are the opposite, with N-type silicon embedded in P-type silicon. To hold the NMOS transistors, "wells" of P-type silicon are formed, as shown in the cross-section diagram below. Thus, the 386 chip contains complex patterns of P-type and N-type silicon that form its 285,000 transistors.

The structure of NMOS and PMOS transistors in the 386 forms parasitic NPN and PNP transistors. This diagram is the opposite of other latchup diagrams because the 386 uses N substrate, the opposite of modern chips with P substrate.

But something dangerous lurks below the surface, the fire-breathing dragon of latchup waiting to burn up the chip. The problem is that these regions of N-type and P-type silicon form unwanted, "parasitic" transistors underneath the desired transistors. In normal circumstances, these parasitic NPN and PNP transistors are inactive and can be ignored. But if a current flows beneath the surface, through the silicon substrate, it can turn on a parasitic transistor and awaken the dreaded latchup.4 The parasitic transistors form a feedback loop, so if one transistor starts to turn on, it turns on the other transistor, and so forth, until both transistors are fully on, a state called latchup.5 Moreover, the feedback loop will maintain latchup until the chip's power is removed.6 During latchup, the chip's power and ground are shorted through the parasitic transistors, causing high current flow that can destroy the chip by overheating it or even melting bond wires.

Latchup can be triggered in many ways, from power supply overvoltage to radiation, but a chip's I/O pins are the primary risk because signals from the outside world are unpredictable. For instance, suppose a floppy drive is connected to the 386 and the drive sends a signal with a voltage higher than the 386's 5-volt supply. (This could happen due to a voltage surge in the drive, reflection in a signal line, or even connecting a cable.) Current will flow through the 386's protection diodes, the diodes that were described in the previous section.7 If this current flows through the chip's silicon substrate, it can trigger latchup and destroy the processor.

Because of this danger, the 386's I/O pads are designed to prevent latchup. One solution is to block the unwanted currents through the substrate, essentially putting fences around the transistors to keep malicious currents from escaping into the substrate. In the 386, this fence consists of "guard rings" around the I/O transistors and diodes. These rings prevent latchup by blocking unwanted current flow and safely redirecting it to power or ground.

The circuitry for the W/R# output pad. (The W/R# signal tells the computer's memory and I/O if the 386 is performing a write operation or a read operation.) I removed the metal and polysilicon to show the underlying silicon.

The diagram above shows the double guard rings for a typical I/O pad.8 Separate guard rings protect the NMOS transistors and the PMOS transistors. The NMOS transistors have an inner guard ring of P-type silicon connected to ground (blue) and an outer guard ring of N-type silicon connected to +5 (red). The rings are reversed for the PMOS transistors. The guard rings take up significant space on the die, but this space isn't wasted since the rings protect the chip from latchup.

Metastability

The final dragon is metastability: it (probably) won't destroy the chip, but it can cause serious malfunctions.9 Metastability is a peculiar problem where a digital signal can take an unbounded amount of time to settle into a zero or a one. In other words, the circuit temporarily refuses to act digitally and shows its underlying analog nature.10 Metastability was controversial in the 1960s and the 1970s, with many electrical engineers not believing it existed or considering it irrelevant. Nowadays, metastability is well understood, with special circuits to prevent it, but metastability can never be completely eliminated.

In a processor, everything is synchronized to its clock. While a modern processor has a clock speed of several gigahertz, the 386's clock ran at 12 to 33 megahertz. Inside the processor, signals are carefully organized to change according to the clock—that's why your computer runs faster with a higher clock speed. The problem is that external signals may be independent of the CPU's clock. For instance, a disk drive could send an interrupt to the computer when data is ready, which depends on the timing of the spinning disk. If this interrupt arrives at just the wrong time, it can trigger metastability.

A metastable signal settling to a high or low signal after an indefinite time. This image was used to promote a class on metastability in 1974. From My Work on All Things Metastable by Thomas Chaney.

In more detail, processors use flip-flops to hold signals under the control of the clock. An "edge-triggered" flip-flop grabs its input at the moment the clock goes high (the "rising edge") and holds this value until the next clock cycle. Everything is fine if the value is stable when the clock changes: if the input signal switches from low to high before the clock edge, the flip-flop will hold this high value. And if the input signal switches from low to high after the clock edge, the flip-flop will hold the low value, since the input was low at the clock edge. But what happens if the input changes from low to high at the exact time that the clock switches? Usually, the flip-flop will pick either low or high. But very rarely, maybe a few times out of a billion, the flip-flop will hesitate in between, neither low nor high. The flip-flop may take a few nanoseconds before it "decides" on a low or high value, and the value will be intermediate until then.

The photo above illustrates a metastable signal, spending an unpredictable time between zero and one before settling on a value. The situation is similar to a ball balanced on top of a hill, a point of unstable equilibrium.11 The smallest perturbation will knock the ball down one of the two stable positions at the bottom of the hill, but you don't know which way it will go or how long it will take.

A metaphorical view of metastability as a ball on a hill, able to roll down either side.

Metastability is serious because if a digital signal has a value that is neither 0 nor 1 then downstream circuitry may get confused. For instance, if part of the processor thinks that it received an interrupt and other parts of the processor think that no interrupt happened, chaos will reign as the processor takes contradictory actions. Moreover, waiting a few nanoseconds isn't a cure because the duration of metastability can be arbitrarily long. Waiting helps, since the chance of metastability decreases exponentially with time, but there is no guarantee.12

The obvious solution is to never change an input exactly when the clock changes. The processor is designed so that internal signals are stable when the clock changes, avoiding metastability. Specifically, the designer of a flip-flop specifies the setup time—how long the signal must be stable before the clock edge—and the hold time—how long the signal must be stable after the clock edge. As long as the input satisfies these conditions, typically a few picoseconds long, the flip-flop will function without metastability.

Unfortunately, the setup and hold times can't be guaranteed when the processor receives an external signal that isn't synchronized to its clock, known as an asynchronous signal. For instance, a processor receives interrupt signals when an I/O device has data, but the timing is unpredictable because it depends on mechanical factors such as a keypress or a spinning floppy disk. Most of the time, everything will work fine, but what about the one-in-a-billion case where the timing of the signal is unlucky? (Since modern processors run at multi-gigahertz, one-in-a-billion events are not rare; they can happen multiple times per second.)

One solution is a circuit called a synchronizer that takes an asynchronous signal and synchronizes it to the clock. A synchronizer can be implemented with two flip-flops in series: even if the first flip-flop has a metastable output, chances are that it will resolve to 0 or 1 before the second flip-flop stores the value. Each flip-flop provides an exponential reduction in the chance of metastability, so using two flip-flops drastically reduces the risk. In other words, the circuit will still fail occasionally, but if the mean time between failures (MTBF) is long enough (say, decades instead of seconds), then the risk is acceptable.
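
To put rough numbers on this, a standard simplified model estimates MTBF = e^(t/τ) / (T_w · f_clock · f_data), where t is the settling time allowed, τ is the flip-flop's resolution time constant, and T_w is its metastability window. The Python sketch below plugs in made-up illustrative parameters (not measured 386 values) to show how the second flip-flop improves the MTBF exponentially.

from math import exp

def mtbf_seconds(t_resolve, tau, t_window, f_clk, f_data):
    # Simplified synchronizer model: a failure occurs when the flip-flop
    # has not resolved within the settling time available.
    return exp(t_resolve / tau) / (t_window * f_clk * f_data)

tau = 3e-9        # resolution time constant: 3 ns (assumed, not a 386 spec)
t_window = 1e-10  # metastability window: 100 ps (assumed)
f_clk = 16e6      # 16 MHz clock, in the 386's range
f_data = 1e5      # asynchronous input changing 100,000 times per second

# One flip-flop gets roughly half a clock cycle to settle; a second
# flip-flop in series adds a full extra cycle of settling time.
for cycles, label in ((0.5, "one flip-flop"), (1.5, "two flip-flops")):
    print(f"{label}: MTBF about "
          f"{mtbf_seconds(cycles / f_clk, tau, t_window, f_clk, f_data):.2g} s")

With these invented numbers, the single flip-flop fails every few minutes, while the two-flip-flop synchronizer's MTBF stretches to thousands of years.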

The schematic for the BUSY# pin, showing the flip-flops that synchronize the input signal.

The schematic above shows how the 386 uses two flip-flops to minimize metastability. The first flip-flop is a special flip-flop that is based on a sense amplifier. It is much more complicated than a regular flip-flop, but it responds faster, reducing the chance of metastability. It is built from two of the sense-amplifier latches below, which I haven't seen described anywhere. In a DRAM memory chip, a sense amplifier takes a weak signal from a memory cell and rapidly amplifies it into a solid 0 or 1. In this flip-flop, the sense amplifier takes a potentially ambiguous signal and rapidly amplifies it into a 0 or 1. By amplifying the signal quickly, the flip-flop reduces metastability. (See the footnote for details.14)

The sense amplifier latch circuit.

The die photo below shows how this circuitry looks on the die. Each flip-flop is built from two latches; note that the sense-amp latches are larger than the standard latches. As before, the pad has protection diodes inside guard rings. For some reason, however, these diodes have a different structure from the transistor-based diodes described earlier. The 386 has five inputs that use this circuitry to protect against metastability.13 These inputs are all located together at the bottom of the die—it probably makes the layout more compact when neighboring pad circuits are all the same size.

The circuitry for the BUSY# pin, showing the special sense-amplifier latches that reduce metastability.

In summary, the 386's I/O circuits are interesting because they are completely different from the chip's regular logic circuitry. In these circuits, the border between digital and analog breaks down; these circuits handle binary signals, but analog issues dominate the design. Moreover, hidden parasitic transistors play key roles; what you don't see can be more important than what you see. These circuits defend against three dangerous "dragons": static electricity, latchup, and metastability. Intel succeeded in warding off these dragons and the 386 was a success.

For more on the 386 and other chips, follow me on Mastodon (@[email protected]), Bluesky (@righto.com), or RSS. (I've given up on Twitter.) If you want to read more about 386 input circuits, I wrote about the clock pin here.

Notes and references

  1. Anti-static precautions are specified in Intel's processor installation instructions. Also see Intel's Electrostatic Discharge and Electrical Overstress Guide. I couldn't find ESD ratings for the 386, but a modern Intel chip is tested to withstand 500 volts or 2000 volts, depending on the test procedure. 

  2. The BS16# pin is slightly unusual because it has an internal pull-up resistor. If you look at the datasheet (9.2.3 and Table 9-3 footnotes), a few input pins (ERROR#, BUSY#, and BS16#) have internal pull-up resistors of 20 kΩ, while the PEREQ input pin has an internal pull-down resistor of 20 kΩ. 

  3. The protection diode is probably a grounded-gate NMOS (ggNMOS), an NMOS transistor with the gate, source, and body (but not the drain) tied to ground. This forms a parasitic NPN transistor under the MOSFET that dissipates the ESD. (I think that the PMOS protection is the same, except the gate is pulled high, not grounded.) For output pins, the output driver MOSFETs have parasitic transistors that make the output driver "self-protected". One consequence is that the input pads and the output pads look similar (both have large MOS transistors), unlike other chips where the presence of large transistors indicates an output. (Even so, 386 outputs and inputs can be distinguished because outputs have large inverters inside the guard rings to drive the MOSFETs, while inputs do not.) Also see Practical ESD Protection Design

  4. The 386 uses P-wells in an N-doped substrate. The substrate is heavily doped with antimony, with a lightly doped N epitaxial layer on top. This doping helped provide immunity to latchup. (See "High performance technology, circuits and packaging for the 80386", ICCD 1986.) For the most part, modern chips use the opposite: N-wells with a P-doped substrate. Why the substrate change?

    In the earlier days of CMOS, P-well was standard due to the available doping technology; see N-well and P-well performance comparison. During the 1980s, there was controversy over which was better, P-well or N-well: "It is commonly agreed that P-well technology has a proven reliability record, reduced alpha-particle sensitivity, closer matched p- and n- channel devices, and high gain NPN structures. N-well proponents acknowledge better compatibility and performance with NMOS processing and designs, good substrate quality, availability, and cost, lower junction capacitance, and reduced body effects." (See Design of a CMOS Standard Cell Library.)

    As wafer sizes increased in the 1990s, technology shifted to P-doped substrates because it is difficult to make large N-doped wafers due to the characteristics of the dopants (link). Some chips optimize transistor characteristics by using both types of wells, called a twin-well process. For instance, the Pentium used P-doped wafers and implanted both N and P wells. (See Intel's 0.25 micron, 2.0 volts logic process technology.) 

  5. You can also view the parasitic transistors as forming an SCR (Silicon Controlled Rectifier), a four-layer semiconductor device. SCRs were popular in the 1970s because they could handle higher currents and voltages than transistors. But as high-power transistors were developed, SCRs fell out of favor. In particular, once an SCR is turned on, it stays on until power is removed or reversed; this makes SCRs harder to use than transistors. (This is the same characteristic that makes latchup so dangerous.) 

  6. Satellites and nuclear missiles have a high risk of latchup due to radiation. Since radiation-induced latchup cannot always be prevented, one technique for dealing with latchup is to detect the excessive current from latchup and then power-cycle the chip. For instance, you can buy a radiation-hardened current limiter chip that will detect excessive current due to latchup and temporarily remove power; this chip sells for the remarkable price of $1780.

    For more on latchup, see the Texas Instruments Latch-Up white paper, as well as Latch-Up, ESD, and Other Phenomena

  7. The 80386 Hardware Reference Manual discusses how a computer designer can prevent latchup in the 386. The designer is assured that Intel's "CHMOS III" process prevents latchup under normal operating conditions. However, exceeding the voltage limits on I/O pins can cause current surges and latchup. Intel provides three guidelines: observe the maximum ratings for input voltages, never apply power to a 386 pin before the chip is powered up, and terminate I/O signals properly to avoid overshoot and undershoot. 

  8. The circuit for the W/R# pin is similar to many other output pins. The basic idea is that a large PMOS transistor pulls the output high, while a large NMOS transistor pulls the output low. If the enable input is low, both transistors are turned off and the output floats. (This allows other devices to take over the bus in the HOLD state.)

    Schematic for the W/R# pin driver.

    The inverters that control the drive transistors have an unusual layout. These inverters are inside the guard rings, meaning that the inverters are split apart, with the NMOS transistors in one ring and PMOS transistors in the other. The extra wiring adds capacitance to the output which probably makes the inverters slightly slower.

    These inverters have a special design: one inverter is faster to go high than to go low, while the other inverter is the opposite. The motivation is that if both drive transistors are on at the same time, a large current will flow through the transistors from power to ground, producing an unwanted current spike (and potentially latchup). To avoid this, the inverters are designed to turn one drive transistor off faster than turning the other one on. Specifically, the high-side inverter has an extra transistor to quickly pull its output high, while the low-side inverter has an extra transistor to pull the output low. Moreover, the inverter's extra transistor is connected directly to the drive transistors, while the inverter's main output connects through a longer polysilicon path with more resistance, providing an RC delay. I found this layout very puzzling until I realized that the designers were carefully controlling the turn-on and turn-off speeds of these inverters. 

  9. In Metastability and Synchronizers: A Tutorial, there's a story of a spacecraft power supply being destroyed by metastability. Supposedly, metastability caused the logic to turn on too many units, overloading and destroying the power supply. I suspect that this is a fictional cautionary tale, rather than an actual incident.

    For more on metastability, see this presentation and this writeup by Tom Chaney, one of the early investigators of metastability. 

  10. One of Vonada's Engineering Maxims is "Digital circuits are made from analog parts." Another maxim is "Synchronizing circuits may take forever to make a decision." These maxims and a dozen others are from Don Vonada in DEC's 1978 book Computer Engineering

  11. Curiously, the definition of metastability in electronics doesn't match the definition in physics and chemistry. In electronics, a metastable state is an unstable equilibrium. In physics and chemistry, however, a metastable state is a stable state, just not the most stable ground state, so a moderate perturbation will knock it from the metastable state to the ground state. (In the hill analogy, it's as if the ball is caught in a small basin partway down the hill.) 

  12. In case you're wondering what's going on with metastability at the circuit level, I'll give a brief explanation. A typical flip-flop is based on a latch circuit like the one below, which consists of two inverters and an electronic switch controlled by the clock. When the clock goes high, the inverters are configured into a loop, latching the prior input value. If the input was high, the output from the first inverter is low and the output from the second inverter is high. The loop feeds this output back into the first inverter, so the circuit is stable. Likewise, the circuit can be stable with a low input.

    A latch circuit.

    But what happens if the clock flips the switch as the input is changing, so the input to the first inverter is somewhere between zero and one? We need to consider that an inverter is really an analog device, not a binary device. You can describe it by a "voltage transfer curve" (purple line) that specifies the output voltage for a particular input voltage. For example, if you put in a low input, you get a high output, and vice versa. But there is an equilibrium point where the output voltage is the same as the input voltage. This is where metastability happens.

    The voltage transfer curve for a hypothetical inverter.

    Suppose the input voltage to the inverter is the equilibrium voltage. It's not going to be precisely the equilibrium voltage (because of noise if nothing else), so suppose, for example, that it is 1µV above equilibrium. Note that the transfer curve is very steep around equilibrium, say a slope of 100, so it will greatly amplify the signal away from equilibrium. Thus, if the input is 1µV above equilibrium, the output will be 100µV below equilibrium. Then the next inverter will amplify again, sending a signal 10mV above equilibrium back to the first inverter. The distance will be amplified again, now 1000mV below equilibrium. At this point, you're on the flat part of the curve, so the second inverter will output +5V and the first inverter will output 0V, and the circuit is now stable.

    The point of this is that the equilibrium voltage is an unstable equilibrium, so the circuit will eventually settle into the +5V or 0V states. But it may take an arbitrary number of loops through the inverters, depending on how close the starting point was to equilibrium. (The signal is continuous, so referring to "loops" is a simplification.) Also note that the distance from equilibrium is amplified exponentially with time. This is why the chance of metastability decreases exponentially with time. 
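
    To illustrate, here is a toy linearized Python model of this feedback loop (my own sketch, with an assumed inverter gain of 100 and 5-volt rails, not a circuit simulation). Each pass through an inverter multiplies the distance from equilibrium by the gain, so starting a millionfold closer to equilibrium costs only a few extra passes:

    GAIN, VDD, V_EQ = 100.0, 5.0, 2.5   # assumed gain, supply, equilibrium point

    def passes_to_resolve(offset_volts):
        distance, passes = offset_volts, 0
        while abs(distance) < VDD - V_EQ:    # not yet pinned at a rail
            distance *= -GAIN                # each inverter amplifies and inverts
            passes += 1
        return passes

    for v in (1e-6, 1e-9, 1e-12):            # initial distance from equilibrium
        print(f"{v:g} V from equilibrium: resolved after {passes_to_resolve(v)} passes")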

  13. Looking at the die shows that the pins with metastability protection are INTR, NMI, PEREQ, ERROR#, and BUSY#. The 80386 Hardware Reference Manual lists these same five pins as asynchronous—I like it when I spot something unusual on the die and then discover that it matches an obscure statement in the documentation. The interrupt pins INTR and NMI are asynchronous because they come from external sources that may not be using the 386's clock. But what about PEREQ, ERROR#, and BUSY#? These pins are part of the interface with an external math coprocessor (the 287 or 387 chip). In most cases, the coprocessor uses the 386's clock. However, the 387 supported a little-used asynchronous mode where the processor and the coprocessor could run at different speeds. 

  14. The 386's metastability flip-flop is constructed with an unusual circuit. It has two latch stages (which is normal), but instead of using two inverters in a loop, it uses a sense-amplifier circuit. A sense amplifier takes a differential input; when the clock enables it, it drives the higher input high and the lower input low. (Sense amplifiers are used in dynamic RAM chips to amplify the tiny signal from a RAM cell into a 0 or 1. At the same time, the amplifier refreshes the DRAM cell by restoring full voltages.) Note that the sense amplifier's inputs also act as its outputs: they are inputs during clock phase 1 and outputs during phase 2.

    The schematic shows one of the latch stages; the complete flip-flop has a second, identical stage with the clock phases switched. This latch is much more complex than the typical 386 latch: 14 transistors versus 6 or 8. The sense amplifier is similar to two inverters in a loop, except that the inverters share a limited supply current and a limited ground current. As one inverter starts to go high, it "steals" the supply current from the other, while the other inverter "steals" the ground current; thus, a small difference in inputs is amplified, just as in a differential amplifier. By combining the amplification of a differential amplifier with the amplification of the inverter loop, this circuit reaches its final state faster than a regular inverter loop.

    In more detail, during the first clock phase, the two inverters at the top generate the inverted and non-inverted signals. (In a metastable situation, these will be close to the midpoint, not binary.) During the second clock phase, the sense amplifier is activated. You can think of it as a differential amplifier with cross-coupling. If one input is slightly higher than the other, the amplifier pulls that input higher and the other input lower, amplifying the difference. (The point is to quickly make the difference large enough to resolve the metastability.)

    I couldn't find any latches like this in the literature. Comparative Analysis and Study of Metastability on High-Performance Flip-Flops describes eleven high-performance flip-flops. It includes two flip-flops that are based on sense amplifiers, but their circuits are very different from the 386 circuit. Perhaps the 386 circuit is an Intel design that was never publicized. In any case, let me know if this circuit has an official name. 

A CT scanner reveals surprises inside the 386 processor's ceramic package

Intel released the 386 processor in 1985, the first 32-bit chip in the x86 line. This chip was packaged in a ceramic square with 132 gold-plated pins protruding from the underside, fitting into a socket on the motherboard. While this package may seem boring, a lot more is going on inside it than you might expect. Lumafield performed a 3-D CT scan of the chip for me, revealing six layers of complex wiring hidden inside the ceramic package. Moreover, the chip has nearly invisible metal wires connected to the sides of the package, which appear as the spikes in the scan below. The scan also revealed that the 386 has two separate power and ground networks: one for I/O and one for the CPU's logic.

A CT scan of the 386 package. The ceramic package doesn't show up in this image, but it encloses the spiky wires.

The package, below, provides no hint of the complex wiring embedded inside the ceramic. The silicon die is normally not visible, but I removed the square metal lid that covers it.1 As a result, you can also see the two tiers of gold contacts that surround the silicon die.

The 386 package with the lid over the die removed.

Intel selected the 132-pin ceramic package to meet the requirements of a high pin count, good thermal characteristics, and low-noise power to the die.2 However, standard packages didn't provide sufficient power, so Intel designed a custom package with "single-row double shelf bonding to two signal layers and four power and ground planes." In other words, the die's bond wires are connected to the two shelves (or tiers) of pads surrounding the die. Internally, the package is like a 6-layer printed-circuit board made from ceramic.

Package cross-section. Redrawn from "High Performance Technology, Circuits and Packaging for the 80386".

The photo below shows the two tiers of pads with tiny gold bond wires attached; I measured the bond wires at 35 µm in diameter, thinner than a typical human hair. Some power and ground pads have up to five wires attached to carry more current. You can consider the package to be a hierarchical interface from the tiny circuits on the die to the much larger features of the computer's motherboard. Specifically, the die has a feature size of 1 µm, while the metal wiring on top of the die has 6 µm spacing. This wiring connects to the chip's bond pads, which have 0.01" spacing (0.25 mm). The bond wires connect to the package's pads, which have 0.02" spacing (0.5 mm); double the spacing because there are two tiers. The package connects these pads to the pin grid with 0.1" spacing (2.54 mm). Thus, the scale expands by about a factor of 2500 from the die's microscopic circuitry to the chip's pins.

Close-up of the bond wires.
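To double-check the factor-of-2500 claim, here's a quick Python calculation of the spacing hierarchy (the labels are mine; the values are the ones given above):

# Feature spacings from the die up to the package pins, in micrometers.
spacings = [
    ("die feature size",             1.0),
    ("on-die metal wiring",          6.0),
    ('die bond pads (0.01")',      254.0),
    ('package shelf pads (0.02")', 508.0),
    ('pin grid (0.1")',           2540.0),
]
for (name, um), (_, prev) in zip(spacings[1:], spacings):
    print(f"{name:28} {um:7.0f} µm  ({um / prev:.1f}x step)")
print(f"overall scale-up: {spacings[-1][1] / spacings[0][1]:.0f}x")

The overall ratio comes out to 2540, i.e. the factor of about 2500 mentioned above.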

The ceramic package is manufactured through a complicated process.4 The process starts with flexible ceramic "green sheets", consisting of ceramic powder mixed with a binding agent. After holes for vias are created in the sheet, tungsten paste is silk-screened onto the sheet to form the wiring. The sheets are stacked, laminated under pressure, and then sintered at high temperature (1500ºC to 1600ºC) to create the rigid ceramic. The pins are brazed onto the bottom of the package. Next, the pins and the inner contacts for the die are electroplated with gold.3 The die is mounted, gold bond wires are attached, and a metal cap is soldered over the die to encapsulate it. Finally, the packaged chip is tested, the package is labeled, and the chip is ready to be sold.

The diagram below shows a close-up of a signal layer inside the package. The pins are connected to the package's shelf pads through metal traces, spectacularly colored in the CT scan. (These traces are surprisingly wide and free-form; I expected narrower traces to reduce capacitance.) Bond wires connect the shelf pads to the bond pads on the silicon die. (The die image is added to the diagram; it is not part of the CT scan.) The large red circles are vias from the pins; some connect to this signal layer, while others pass through to other layers. The smaller red circles are connections to a power layer: since bond wires attach only on the signal layers, the four power and ground planes need connections to pads on the signal layers.

A close-up of a signal layer. The die image is pasted in.

The diagram below shows the corresponding portion of a power layer. A power layer looks completely different from a signal layer; it is a single conductive plane with holes. The grid of smaller holes allows the ceramic above and below this layer to bond, forming a solid piece of ceramic. The larger holes surround pin vias (red dots), allowing pin connections to pass through to a different layer. The red dots that contact the plane are where power pins connect to this layer. Because the only connections to the die are from the signal layers, the power layers connect to pads on the signal layers; these connections are the smaller dots near the bond wires, either power vias passing through or vias connected to this layer.

A close-up of a power layer, specifically I/O Vss. The wavy blue regions are artifacts from neighboring layers. The die image is pasted in.

With the JavaScript tool below, you can look at the package, layer by layer. Click on a radio button to select a layer. By observing the path of a pin through the layers, you can see where it ends up. For instance, the upper left pin passes through multiple layers until the upper signal layer connects it to the die. The pin to its right passes through all the layers until it reaches the logic Vcc plane on top. (Vcc is the 5-volt supply that powers the chip, called Vcc for historical reasons.)


If you select the logic Vcc plane above, you'll see a bright blotchy square in the center. This is not the die itself, I think, but the adhesive that attaches the die to the package, epoxy filled with silver to provide thermal and electrical conductivity. Since silver blocks X-rays, it is highly visible in the image.

Side contacts for electroplating

What surprised me most about the scans was seeing wires that stick out to the sides of the package. These wires are used during manufacturing when the pins are electroplated with gold.5 In order to electroplate the pins, each pin must be connected to a negative voltage so it can function as a cathode. This is accomplished by giving each pin a separate wire that goes to the edge of the package.

The diagram below compares the CT scan (top) to a visual side view of the package (bottom). The wires are almost invisible, but they can be seen as darker spots. The arrows show how three of these spots match up with the CT scan; you can match up the other spots.6

A close-up of the side of the package compared to the CT scan, showing the edge contacts. I lightly sanded the edge of the package to make the contacts more visible. Even so, they are almost invisible.

Two power networks

According to the datasheet, the 386 has 20 pins connected to +5V power (Vcc) and 21 pins connected to ground (Vss). Studying the die, I noticed that the I/O circuitry in the 386 has separate power and ground connections from the logic circuitry. The motivation is that the output pins require high-current driver circuits. When a pin switches from 0 to 1 or vice versa, this can cause a spike on the power and ground wiring. If this spike is too large, it can interfere with the processor's logic, causing malfunctions. The solution is to use separate power wiring inside the chip for the I/O circuitry and for the logic circuitry, connected to separate pins. On the motherboard, these pins are all connected to the same power and ground, but decoupling capacitors absorb the I/O spikes before they can flow into the chip's logic.

The diagram below shows how the two power and ground networks look on the die, with separate pads and wiring. The square bond pads are at the top, with dark bond wires attached. The white lines are the two layers of metal wiring, and the darker regions are circuitry. Each I/O pin has a driver circuit below it, consisting of relatively large transistors to pull the pin high or low. This circuitry is powered by the horizontal lines for I/O Vcc (light red) and I/O ground (Vss, light blue). Underneath each I/O driver is a small logic circuit, powered by thinner Vcc (dark red) and Vss (dark blue). Thicker Vss and Vcc wiring goes to the logic in the rest of the chip. Thus, if the I/O circuitry causes power fluctuations, the logic circuit remains undisturbed, protected by its separate power wiring.

A close-up of the top of the die, showing the power wiring and the circuitry for seven data pins.

The datasheet doesn't mention the separate I/O and logic power networks, but using the CT scans, I determined which pins power the I/O circuitry and which pins power the logic. In the diagram below, the light red and blue pins are power and ground for I/O, while the dark red and blue pins are power and ground for logic. The pins are scattered across the package, allowing power to be supplied to all four sides of the die.

The pinout from the Intel386DX Microprocessor Datasheet. This is the view from the pin side.

"No Connect" pins

As the diagram above shows, the 386 has eight pins labeled "NC" (No Connect)—when the chip is installed in a computer, the motherboard must leave these pins unconnected. You might think that the 132-pin package simply has eight extra, unneeded pins, but it's more complicated than that. The photo below shows five bond pads at the bottom of the 386 die. Three of these pads have bond wires attached, but two have no bond wires: these correspond to No Connect pins. Note the black marks in the middle of the pads; these are from test probes applied to the die during testing.7 The No Connect pads presumably have a function during this testing process, providing access to an important internal signal.

A close-up of the die showing three bond pads with bond wires and two bond pads without bond wires.

Seven of the eight No Connect pads are almost connected: the package has a bonding spot in the die cavity, with internal wiring to a No Connect pin; the only thing missing is the bond wire between the pad and the die cavity. Thus, by adding bond wires, Intel could easily create special chips with these pins connected, perhaps for debugging the test process itself.

The surprising thing is that one of the No Connect pads does have the bond wire in place, completing the connection to the external pin. (I marked this pin in green in the pinout diagram earlier.) From the circuitry on the die, this pin appears to be an output. If someone with a 386 chip hooks this pin to an oscilloscope, maybe they will see something interesting.

Labeling the pads on the die

Earlier processors such as the 8086 were packaged in a DIP (Dual-Inline Package) with two rows of pins, making it straightforward to figure out which pin (and thus which function) is connected to each pad on the die. However, since the 386 has a two-dimensional grid of pins, the mapping to the pads is unclear. You can guess that pins are connected to a nearby pad, but ambiguity remains. Without knowing the function of each pad, reverse-engineering the die is harder.

In fact, my primary motivation for scanning the 386 package was to determine the pin-to-pad mapping and thus the function of each pad.8 Once I had the CT data, I was able to trace out each hidden connection between the pad and the external pin. The image below shows some of the labels; click here for the full, completely labeled image. As far as I know, this information hasn't been available outside Intel until now.

A close-up of the 386 die showing the labels for some of the pins.

Conclusions

Intel's early processors were hampered by inferior packages, but by the time of the 386, Intel had realized the importance of packaging. In Intel's early days, management held the bizarre belief that chips should never have more than 16 pins, even though other companies used 40-pin packages. Thus, Intel's first microprocessor, the 4004 (1971), was crammed into a 16-pin package, limiting its performance. By 1972, larger memory chips forced Intel to move to 18-pin packages, extremely reluctantly.9 The eight-bit 8008 processor (1972) took advantage of this slightly larger package, but performance still suffered because signals were forced to share pins. Finally, Intel moved to the standard 40-pin package for the 8080 processor (1974), contributing to the chip's success. In the 1980s, pin-grid arrays became popular in the industry as chips required more and more pins. Intel used a ceramic pin grid array (PGA) with 68 pins for the 186 and 286 processors (1982), followed by the 132-pin package for the 386 (1985).

The main drawback of the ceramic package was its cost. According to the 386 oral history, the cost of the 386 die decreased over time to the point where the package cost as much as the die. To counteract this, Intel introduced a low-cost plastic package for the 386, the Plastic Quad Flat Package (PQFP), which cost just a dollar to manufacture (details).

In later Intel processors, the number of connections increased exponentially. A typical modern laptop processor uses a Ball Grid Array with 2049 solder balls; the chip is soldered directly onto the circuit board. Other Intel processors use a Land Grid Array (LGA): the chip has flat contacts called lands, while the socket has the pins. Some Xeon processors have 7529 contacts, a remarkable growth from the 16 pins of the Intel 4004.

From the outside, the 386's package looks like a plain chunk of ceramic. But the CT scan revealed surprising complexity inside, from numerous contacts for electroplating to six layers of wiring. Perhaps even more secrets lurk in the packages of modern processors.

Follow me on Bluesky (@righto.com), Mastodon (@[email protected]), or RSS. (I've given up on Twitter.) Thanks to Jon Bruner and Lumafield for scanning the chip. Lumafield's interactive CT scan of the 386 package is available here if you want to examine it yourself. Lumafield also scanned a 1960s cordwood flip-flop and the Soviet Globus spacecraft navigation instrument for us. Thanks to John McMaster for taking 2D X-rays.

Notes and references

  1. I removed the metal lid with a chisel, as hot air failed to desolder the lid. A few pins were bent in the process, but I straightened them out, more or less. 

  2. The 386 package is described in "High Performance Technology, Circuits and Packaging for the 80386", Proceedings, ICCD Conference, Oct. 1986. (Also see Design and Test of the 80386 by Pat Gelsinger, former Intel CEO.)

    The paper gives the following requirements for the 386 package:

    1. Large pin count to handle separate 32-bit data and address buses.
    2. Thermal characteristics resulting in junction temperatures under 110°C.
    3. Power supply to the chip and I/O able to supply 600mA/ns with noise levels less than 0.4V (chip) and less than 0.8V (I/O).

    The first and second criteria motivated the selection of a 132-pin ceramic pin grid array (PGA). The custom six-layer package was designed to achieve the third objective. The power network is claimed to have an inductance of 4.5 nH per power pad on the device, compared to 12-14 nH for a standard package, about a factor of 3 better.
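    As a rough sanity check, supply noise from a current ramp through an inductance is V = L·di/dt. The short Python calculation below applies this formula; note that the assumption that the 600mA/ns ramp is shared evenly across a network's 10 pins is mine, not the paper's.

    # Back-of-the-envelope supply-noise estimate: V = L * di/dt.
    L_PER_PAD = 4.5e-9  # henries per power pad, from the paper
    PINS = 10           # pins per power network, assumed to share the load evenly
    DI_DT = 0.6e9       # 600 mA/ns, expressed in amps per second
    v_noise = (L_PER_PAD / PINS) * DI_DT
    print(f"{v_noise:.2f} V")  # about 0.27 V, inside the 0.4 V logic budget

    Under the same assumptions, the 12-14 nH of a standard package would give roughly 0.7 to 0.85 V of noise, well over the 0.4 V budget, which explains why the custom package was necessary.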

    The paper states that logic Vcc, logic Vss, I/O Vcc, and I/O Vss each have 10 pins assigned. Curiously, the datasheet states that the 386 has 20 Vcc pins and 21 Vss pins, which doesn't add up. From my investigation, the "extra" pin is assigned to logic Vss, which has 11 pins. 

  3. I estimate that the 386 package contains roughly 0.16 grams of gold, currently worth about $16. It's hard to find out how much gold is in a processor since online numbers are all over the place. Many people recover the gold from chips, but the amount of gold one can recover depends on the process used. Moreover, people tend to keep accurate numbers to themselves so they can profit. But I made some estimates after searching around a bit. One person reports 9.69g of gold per kilogram of chips, and other sources seem roughly consistent. A ceramic 386 reportedly weighs 16g. This works out to 160 mg of gold per 386. 
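    For what it's worth, the estimate is easy to reproduce (a sketch; the gold price per gram is approximate):

    # Rough gold-content estimate for a ceramic 386 package.
    GOLD_G_PER_KG_CHIPS = 9.69  # reported grams of gold per kilogram of chips
    CHIP_WEIGHT_KG = 0.016      # a ceramic 386 reportedly weighs 16 g
    GOLD_PRICE_PER_G = 100      # dollars per gram, approximate
    gold_g = GOLD_G_PER_KG_CHIPS * CHIP_WEIGHT_KG
    print(f"{gold_g * 1000:.0f} mg of gold, worth about ${gold_g * GOLD_PRICE_PER_G:.0f}")

    This prints 155 mg, which rounds to the 160 mg figure above.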

  4. I don't have information on Intel's package manufacturing process specifically. This description is based on other descriptions of ceramic packages, so I don't guarantee that the details are correct for the 386. A Fujitsu patent, Package for enclosing semiconductor elements, describes in detail how ceramic packages for LSI chips are manufactured. IBM's process for ceramic multi-chip modules is described in Multi-Layer Ceramics Manufacturing, but it is probably less similar. 

  5. An IBM patent, Method for shorting pin grid array pins for plating, describes the prior art of electroplating pins with nickel and/or gold. In particular, it describes using leads to connect all input/output pins to a common bus at the edge of the package, leaving the long leads in the structure. This is exactly what I see in the 386 chip. The patent mentions that a drawback of this approach is that the leads can act as antennas and produce signal cross-talk. Fujitsu patent Package for enclosing semiconductor elements also describes wires that are exposed at side surfaces. This patent covers methods to avoid static electricity damage through these wires. (Picking up a 386 by the sides seems safe, but I guess there is a risk of static damage.)

    Note that each input/output pin requires a separate wire to the edge. However, the multiple pins for each power or ground plane are connected inside the package, so they do not require individual edge connections; one or two suffice. 

  6. To verify that the wires from pins to the edges of the chip exist and are exposed, I used a multimeter and found connectivity between pins and tiny spots on the sides of the chip. 

  7. To reduce costs, each die is tested while it is still part of the silicon wafer and each faulty die is marked with an ink spot. The wafer is "diced", cutting it apart into individual dies, and only the functional, unmarked dies are packaged, avoiding the cost of packaging a faulty die. Additional testing takes place after packaging, of course. 

  8. I tried several approaches to determine the mapping between pads and pins before using the CT scan. I tried to beep out the connections between the pins and the pads with a multimeter, but because the pads are so tiny, the process was difficult and error-prone, and it damaged the package.

    I also looked at the pinout of the 386 in a plastic package (datasheet). Since the plastic package has the pins in a single ring around the border, the mapping to the die is straightforward. Unfortunately, the 386 die was slightly redesigned at this time, so some pads were moved around and new pins were added, such as FLT#. It turns out that the pinout for the plastic chip almost matches the die I examined, but not quite. 

  9. In his oral history, Federico Faggin, a designer of the 4004, 8008, and Z80 processors, describes Intel's fixation on 16-pin packages. When a memory chip required 18 pins instead of 16, it was "like the sky had dropped from heaven. I never seen so [many] long faces at Intel, over this issue, because it was a religion in Intel; everything had to be 16 pins, in those days. It was a completely silly requirements [sic] to have 16 pins." At the time, other manufacturers were using 40- and 48-pin packages, so there was no technical limitation, just a minor cost saving from the smaller package.