Ken Shirriff's blog

Reverse-engineering the Z-80: the silicon for two interesting gates explained

I've been reverse-engineering the Z-80 processor, using images from the Visual 6502 team. One interesting thing about the Z-80's silicon is it uses complex gates with multiple inputs and multiple levels of logic. It also implements an XOR gate with an unusual pass-transistor circuit. I thought it would be interesting to examine these gates at the silicon level and show how they work.

The image above shows the overall organization of the Z-80 chip. I'm going to zoom way in on the ALU and look at the silicon that implements one of the complex gates there: a 5-input, three-level gate. I'll walk through this gate and show how it works at the silicon level. While the silicon look like a jumble of lines, its operation is actually straightforward if you step through it.

Let's begin with an (oversimplified) description of how the chip is constructed. The chip starts with the silicon wafer. Regions are diffused with an element such as boron, yielding conductive diffusion regions. A layer of polysilicon strips is put on top. Finally, a layer of metal "wires" above the polysilicon provides more connections. For our purposes, diffusion regions, polysilicon, and metal can all be consider conductors.

In the image below, the bright vertical bands are metal wires. The slightly darker horizontal bands are polysilicon; the borders are more visible than the regions themselves. In this part of the Z-80, the polysilicon connections run mostly horizontally, and the metal wires run vertically. The large irregular regions outlined in black are doped silicon diffusion regions. The circles are vias between different layers.

Transistors are formed where a polysilicon line crosses a diffusion region. You might expect transistors to be very visible in the image, but a polysilicon line looks the same whether its a conductor or a transistor. So transistors just appear as long skinny regions in the image. The diagram below shows the physical structure of a transistor: the source and drain are connected if the gate is positive.

Let's dive in and see how this circuit works. There's a lot going on, but the image below has been colored to make it clearer. Only three of the vertical metal lines are relevant. On the left, the yellow metal line ties together parts of the gate. In the middle is the blue ground line, which is critical to the operation of the gate. At the right, the red positive voltage line is used to pull the output high through a resistor. The large diffusion region has been tinted cyan. This region can be thought of as big conductive areas interrupted by transistors. There are 5 pinkish polysilicon input wires, labeled A, B, C, D, E. When they cross the diffusion region they still act as wires, but also form a transistor below in the diffusion region. For instance, input A is connected to two transistors.

With all the pieces labeled, we can figure out the operation of the circuit. If input A is high, the first transistor will conduct and connect the yellow strip to ground (dotted line 1). Likewise, if input B is high, the second transistor will conduct and ground the yellow strip (dotted line 2). C will ground the yellow strip via 3. So the yellow strip will be grounded for A or B or C. This forms a three-input OR gate.

If input D is high, transistor 4 will connect the yellow strip to the output. Likewise, if input E is high, transistor 5 will connect the yellow strip to the output. Thus, the output will be grounded if (A or B or C) and (D or E).

In the upper right, arrow 6/7/8 will ground the output if A and B and C are high and the three associated transistors (6, 7, 8) conduct. This computes A and B and C.

Putting this all together, the output will be grounded if [(A or B or C) and (D or E)] or [A and B and C]. If the output is not grounded, the resistor (actually a depletion transistor) will pull the output high. Thus, the final output is not [(A or B or C) and (D or E)] or [A and B and C].

The diagram below shows the gate logic implemented by this circuit. This rather complex gate is created from just nine transistors. Note that the final AND and NOR gates are "for free" - they are formed by wiring together previous outputs and don't require additional transistors. Another point of interest is that with NMOS, the output will be high unless something pulls it low, which explains why circuits are based on NAND and NOR gates rather than AND and OR gates.

If you want to see more low-level silicon analysis, see my article on the overflow circuit in the 6502 at the silicon level.

What does this gate do?

This gate is a key part of one bit of the Z-80's ALU. The gate generates the (inverted) sum, AND, OR, or XOR of B and C depending on the inputs. Specifically, B and C are the two operand inputs, and A is the carry in. D is a control input and E is an inverted intermediate carry from B plus C plus carry_in. By controlling D and overriding A and E, the operation is selected.

The Z-80's interesting XOR gate

The Z-80 uses an unusual circuit for its XOR gate. XOR is an inconvenient function to implement since it has a worst-case Karnaugh map, making it expensive to implement from simple gates. Instead, the Z-80 uses a combination of inverters and pass transistors, different from regular NMOS logic.

As before, the diagram below shows the power and ground metal lines, a connecting metal line in yellow, the polysilicon in pink, the polysilicon transistor gates in green, and diffusion in cyan. The two inputs are A and B.

Starting with input A: if it is high, transistor 1 will connect A' to ground. Otherwise the pullup resistor (way on the left), will pull A' high. (Note that A' is the whole diffusion region between transistor 1 and transistor 3 up to the resistor.) Thus transistor 1 forms a simple inverter with inverted output A'. Likewise, transistor 2 inverts input B to give inverted B' (in the whole diffusion region between transistors 2 and 4).

Now comes the tricky part. If A' is high, pass transistor 4 will connect B' to the yellow metal. If B' is high, pass transistor 3 will connect A' to the yellow metal. The third pullup resistor will pull the yellow metal high unless something ties it to ground . Working through the combinations, if A' and B' are both high, both A' and B' are connected to the yellow metal, which gets pulled high. If A' is high and B' is low, B' is connected to the yellow metal, pulling it low. Likewise, if A' is low and B' is high, A' pulls the yellow metal low. Finally if A' and B' are low, nothing gets connected to the yellow metal, so the resistor pulls it high.

To summarize, the yellow metal is pulled high if A' and B' are both high or both low. That is, it is the exclusive-nor of A' and B', which is also the exclusive-or of A and B.

Finally, the xnor value controls transistors 5a and 5b which form an inverter. If xnor is high, transistors 5a and 5b conduct and the xor output is connected to ground, and if xnor is low, the pullup resistors pull the xor output high. One unusual feature here is the parallel transistors 5a and 5b with separate pullup resistors. I haven't seen this in the 8085 or 6502; they use a single larger transistor instead of parallel transistors.

The schematic below summarizes the circuit. In case you're wondering, this XOR gate is used to compute the parity flag. All the bits are XORed together to generate the parity flag.

Comparison to other processors

From what I've seen so far, the Z-80 uses considerably more complex gates than the 8085 and the 6502. The 6502 uses mostly simple NAND/NOR gates and only a few two-level gates, not as complex as on the Z-80. The 8085 uses more complex gates, but still less than the Z-80. I don't know if the difference is due to technical limits on the number of gate levels, or the preferences of the designers.

The XOR circuit in the Z-80 is different from the 8085 and 6502. I'm not sure it saves any transistors, but it is unusual. I've seen other pass-transistor implementations of XOR, but none like the Z-80.

Credits: The Visual 6502 team especially Chris Smith, Ed Spittles, Pavel Zima, Phil Mainwaring, and Julien Oster.

9 Hacker News comments I'm tired of seeing

As a long-time reader of Hacker News, I keep seeing some comments they don't really contribute to the conversation. Since the discussions are one of the most interesting parts of the site I offer my suggestions for improving quality.

Correlation is not causation: the few readers who don't know this already won't benefit from mentioning it. If there's some specific reason you think a a study is wrong, describe it.
"If you're not paying for it, you're the product" - That was insightful the first time, but doesn't need to be posted about every free website.
Explaining a company's actions by "the legal duty to maximize shareholder value" - Since this can be used to explain any action by a company, it explains nothing. Not to mention the validity of the statement is controversial.
[citation needed] - This isn't Wikipedia, so skip the passive-aggressive comments. If you think something's wrong, explain why.
Premature optimization - labeling every optimization with this vaguely Freudian phrase doesn't make you the next Knuth. Calling every abstraction a leaky abstraction isn't useful either.
Dunning-Kruger effect - an overused explanation and criticism.
Betteridge's law of headlines - this comment doesn't need to appear every time a title ends in a question mark.
A link to a logical fallacy, such as ad hominem or more pretentiously tu quoque - this isn't a debate team and you don't score points for this.
"Cue the ...", "FTFY", "This.", "+1", "Sigh", "Meh", and other generic internet comments are just annoying.

My readers had a bunch of good suggestions. Here are a few:

The plural of anecdote is not data
Cargo cult
Comments starting with "No." "Wrong." or "False."
Just use bootstrap / heroku / nodejs / Haskell / Arduino.
"How [or Why] did this make the front page of HN?" followed by http://ycombinator.com/newsguidelines.html

In general if a comment could fit on a bumper sticker or is simply a link to a Wikipedia page or is almost a Hacker News meme, it's probably not useful.

What comments bother you the most?

Check out the long discussion at Hacker News. Thanks for visiting, HN readers!

Amusing note: when I saw the comments below, I almost started deleting them thinking "These are the stupidest comments I've seen in a long time". Then I realized I'd asked for them :-)

Edit: since this is getting a lot of attention, I'll add my "big theory" of Internet discussions.

There are three basic types of online participants: "watercooler", "scientific conference", and "debate team". In "watercooler", the participants are having an entertaining conversation and sharing anecdotes. In "scientific conference", the participants are trying to increase knowledge and solve problems. In "debate team", the participants are trying to prove their point is right.

HN was originally largely in the "scientific conference" mode, with very smart people discussing areas in which they were experts. Now HN has much more "watercooler" flavor, with smart people chatting about random things they often know little about. And certain subjects (e.g. economics, Apple, sexism, piracy) bring out the "debate team" commenters. Any of the three types can carry on happily by themself. However, much of the problem comes when the types of conversation mix. The "watercooler" conversations will annoy the "scientific conference" readers, since half of what they say is wrong. Conversely, the "scientific conference" commenters come across as pedantic when they interrupt a fun conversation with facts and corrections. A conversation between "debate team" and one of the other groups obviously goes nowhere.

Reverse-engineering the 8085's decimal adjust circuitry

In this post I reverse-engineer and describe the simple decimal adjust circuit in the 8085 microprocessor. Binary-coded decimal arithmetic was an important performance feature on early microprocessors. The idea behind BCD is to store two 4-bit decimal numbers in a byte. For instance, the number 42 is represented in BCD as 0100 0010 (0x42) instead of binary 00101010 (0x2a). This continues my reverse engineering series on the 8085's ALU, flag logic, undocumented flags, register file, and instruction set.

The motivation behind BCD is to make working with decimal numbers easier. Programs usually need to input and output numbers in decimal, so if the number is stored in binary it must be converted to decimal for output. Since early microprocessors didn't have division instructions, converting a number from binary to decimal is moderately complex and slow. On the other hand, if a number is stored in BCD, outputting decimal digits is trivial. (Nowadays, the DAA operation is hardly ever used).

One problem with BCD is the 8085's ALU operates on binary numbers, not BCD. To support BCD operations, the 8085 provides a DAA (decimal adjust accumulator) operation that adjusts the result of an addition to correct any overflowing BCD values. For instance, adding 5 + 6 = binary 0000 1011 (hex 0x0b). The value needs to be corrected by adding 6 to yield hex 0x11. Adding 9 + 9 = binary 0001 0010 (hex 0x12) which is a valid BCD number, but the wrong one. Again, adding 6 fixes the value. In general, if the result is ≥ 10 or has a carry, it needs to be decimal adjusted by adding 6. Likewise, the upper 4 BCD bits get corrected by adding 0x60 as necessary. The DAA operation performs this adjustment by adding the appropriate value. (Note that the correction value 6 is the difference between a binary carry at 16 and a decimal carry at 10.)

The DAA operation in the 8085 is implemented by several components: a signal if the lower bits of the accumulator are ≥ 10, a signal if the upper bits are ≥ 10 (including any half carry from the lower bits), and circuits to load the ACT register with the proper correction constant 0x00, 0x06, 0x60, or 0x66. The DAA operation then simply uses the ALU to add the proper correction constant.

The block diagram below shows the relevant parts of the 8085: the ALU, the ACT (accumulator temp) register, the connection to the data bus (dbus), and the various control lines.

The circuit below implements this logic. If the low-order 4 bits of the ALU are 10 or more, alu_lo_ge_10 is set. The logic to compute this is fairly simple: the 8's place must be set, and either the 4's or 2's. If DAA is active, the low-order bits must be adjusted by 6 if either the low-order bits are ≥ 10 or there was a half-carry (A flag).

Similarly, alu_hi_ge_10 is set if the high-order 4 bits are 10 or more. However, a base-10 overflow from the low order bits will add 1 to the high-order value so a value of 9 will also set alu_hi_ge_10 if there's an overflow from the low-order bits. A decimal adjust is performed by loading 6 into the high-order bits of the ACT register and adding it. A carry out also triggers this decimal adjust.

Schematic of the decimal adjust circuitry in the 8085 microprocessor.

The circuits to load the correction value into ACT are controlled by the load_act_x6 signal for the low digit and load_act_6x for the high digit. These circuits are shown in my earlier article Reverse-engineering the 8085's ALU and its hidden registers.

Comparison to the 6502

By reverse-engineering the 8085, we see how the simple decimal adjust circuit in the 8085 works. In comparison, the 6502 handles BCD in a much more efficient but complex way. The 6502 has a decimal mode flag that causes addition and subtraction to automatically do decimal correction, rather than using a separate instruction. This patented technique avoids the performance penalty of using a separate DAA instruction. To correct the result of a subtraction, the 6502 needs to subtract 6 (or equivalently add 10). The 6502 uses a fast adder circuit that does the necessary correction factor addition or subtraction without using the ALU. Finally, the 6502 determines if correction is needed before the original addition/subtraction completes, rather than examining the result of the addition/subtraction, providing an additional speedup.

This information is based on the 8085 reverse-engineering done by the visual 6502 team. This team dissolves chips in acid to remove the packaging and then takes many close-up photographs of the die inside. Pavel Zima converted these photographs into mask layer images, generated a transistor net from the layers, and wrote a transistor-level 8085 simulator.

Reverse-engineering and simulating Sinclair's amazing 1974 calculator with half the ROM of the HP-35

I've reverse-engineered the Sinclair Scientific calculator. The remarkable thing about this calculator is they took a simple 4-function calculator chip and reprogrammed its 320-instruction ROM to be a full scientific calculator. By looking at the chip, I've extracted the original code, reverse-engineered how it works, and written a JavaScript simulator that runs the original code and shows what the calculator is doing internally.

The simulator is at righto.com/sinclair. My earlier TI calculator simulator is at righto.com/ti. (The image above is courtesy of Hackaday.)

I wouldn't have given a nickel for their stock: Visiting Apple in 1976

A guest posting from William Fine:

I saw the "Jobs" movie yesterday and it revived some ancient memories of my dealings with Jobs and Holt in the "old days"! When I returned home, I researched Rod Holt on the Internet and ran across your Power Supply Blog, which I found most interesting. Perhaps you can add my ensuing comments to your blog as you see fit.

In 1973 I started a company in my garage in Cupertino to design and manufacture custom Magnetic Products. It was called Mini-Magnetics Co. Inc. After a few months I was forced out of the garage into a small office complex on Sunnyvale-Saratoga Road, and had about 5/10 employees.

I believe it was around 1975/76 or so, I had a visit from a insulation and wire salesman named Mike Felix. He informed me that I may soon be getting a call from a new /start-up company called Apple Computer located just a few blocks away in Cupertino. He gave them my name when he was asked to recommend a Magnetics manufacturing house.

I promptly forgot about it, as I was already quite busy and I never had to solicit business or even advertise. A week or two went by, and I received a call from a female at Apple who set up a appointment for the next day with a guy named Steve Jobs. She gave me the address and it turned out to be located in a office complex located just behind the "Good Earth" restaurant.

The next day I went over to the location and knocked on the door, and it was opened by Jobs, with Wozniack in the background and a young hippie looking girl at a desk in the corner talking on the phone while eating. That was Apple Computer. They had just moved out of their garage into this new location. It appeared to be a large room with "stuff" scattered hap-hazardly all over, on benches and on the floor. From Jobs' appearance, I was a bit afraid to even shake hands with him, especially after getting a whiff of his body odor!

He immediately took me over to a bench that had a few cardboard boxes on it and showed me some transformer cores, bobbins and spools of wire, and unfolded a hand written diagram of the various magnetic components that he wanted me to wind and assemble for Apple.I took a quick look, and while it was all quite sketchy, looked do-able. He said that he needed them within 10 days and I said ok, since he was furnishing the materials.

I told him that I would call him with a quote after I got back to my office and he said ok and as we were parting he mentioned that if I had any technical questions to get a hold of a guy named Rod Holt and wrote down a phone number where he could be reached.

As I recall, there were about 5-6 magnetic components from simple toroids to a complex switching main power transformer. I believe that the price came to about $10.00 per set,and they wanted 35 sets, so the entire matter would be about $350.00. I called it into Apple the next day and they gave me a Purchase Order number over the phone. When I asked if they would be mailing me a hard copy confirming the order, they had no idea of what I was talking about!

I figured, what the hell, worst case, I would be out $350 bucks if they didn't pay the bill. No big deal.

After I got into examining the sketches I discovered something quite interesting about the power transformer. In all previous designs that I had seen, there was a primary, a base feedback winding and several output windings. What Holt had contrived was a interesting method of assuring excellent coupling of the base winding by using a single strand of wire from a multi-filar bundle that was custom ordered from the wire factory. For example, I think that there was a bundle of 30 strands twisted together, which were all coated in red insulation and one strand of green insulation also twisted together in the bundle, which gave a precise turns ratio together with excellent coupling between the windings.

I am uncertain if that contributed much to improving the efficiency of the switcher, but it seemed clever at the time I discovered it. That transformer, is the one that is shown with the copper foil external shield pictured in your blog. I did speak with Holt once or twice but never met him in person.

The 35 sets of parts were delivered on time and much to my surprise, we were paid within 10 days. I attributed that to the arrival of Mike Markkula onto the scene who had provided some money and organization to Apple.

At the time, after seeing the Apple operation, I wouldn't have given a nickle for a share of their stock if it had been offered! Ha!

I had been involved with power supplies for many years prior to this Apple issue, and can say that switchers were known for a long time, but only became practical with the advent of low loss ferrite core materials and faster transistors as your blog implies.

So, thats the Apple Power Supply story ! Be happy to answer any questions that you may come up with. Regards, wpf

Simulating a TI calculator with crazy 11-bit opcodes

I've built a register-level simulator of a 1974 TI calculator chip that shows what actually happens inside a calculator when you perform operations and shows the calculator source code as it executes. The architecture of the calculator chip is pretty interesting, with 11-bit opcodes, a 9-bit address bus, and 44-bit BCD registers. The chip doesn't support multiplication or division, so these are performed with repeated addition or subtraction.

The simulator is at righto.com/ti.

Reverse-engineering the 8085's ALU and its hidden registers

This article describes how the ALU of the 8085 microprocessor works and how it interacts with the rest of the chip, based on reverse-engineering of the silicon. (This is part 2 of my ALU reverse-engineering; part 1 described the circuit for a single ALU bit.) Along with the accumulator, the ALU uses two undocumented registers - ACT and TMP - and this article describes how they work in detail, as well as how the ALU is controlled.

The arithmetic-logic unit is a key part of the microprocessor, performing operations and comparisons on data. In the 8085, the ALU is also a key part of the data path for moving data. The ALU and associated registers take up a fairly large part of the chip, the upper left of the photomicrograph image below. The control circuitry for the ALU is in the top center of the image. The data bus (dbus) is indicated in blue.

Photograph of the 8085 chip showing the location of the ALU, flags, and registers.

The real architecture of the 8085 ALU

The following architecture diagram shows how the ALU interacts with the rest of the 8085 at the block-diagram level. The data bus (dbus) conneccts the ALU and associated registers with the rest of the 8085 microprocessor. There are also numerous control lines, which are not shown.

The ALU uses two temporary registers that are not directly visible to the programmer. The Accumulator Temporary register (ACT) holds the accumulator value while an ALU operation is performed. This allows the accumulator to be updated with the new value without causing a race condition. The second temporary register (TMP) holds the other argument for the ALU operation. The TMP register typically holds a value from memory or another register.

Architecture of the 8085 ALU as determined from reverse-engineering.

The 8085 datasheet has an architecture diagram that is simplified and not quite correct. In particular, the ACT register is omitted and a data path from the data bus to the accumulator is shown, even though that path doesn't exist.

The accumulator and ACT registers

To the programmer, the accumulator is the key register for arithmetic operations. Reverse-engineering, however, shows the accumulator is not connected directly to the ALU, but works closely with the ACT (accumulator temporary) register.

The ACT register has several important functions. First, it holds the input to the ALU. This allows the results from the ALU to be written back to the accumulator without disturbing the input, which would cause instability. Second, the ACT can hold constant values (e.g. for incrementing or decrementing, or decimal adjustment) without affecting the accumulator. Finally, the ACT allows ALU operations that don't use the accumulator.

The accumulator and ACT (Accumulator Temporary) registers and their control lines in the 8085 microprocessor.

The diagram above shows how the accumulator and ACT registers are connected, and the control lines that affect them. One surprise is that the only way to put a value into the accumulator is through the ALU. This is controlled by the alu_to_a control line. You might expect that if you load a value into the accumulator, it would go directly from the data bus to the accumulator. Instead, the value is OR'd with 0 in the ALU and the result is stored in the accumulator.

The accumulator has two status outputs: a_hi_ge_10, if the four high-order bits are ≥ 10, and a_lo_ge_10, if the four low-order bits are ≥ 10. These outputs are used for decimal arithmetic, and will be explained in another article.

The accumulator value or the ALU result can be written to the databus through the sel_alu_a control (which selects between the ALU result and the accumulator), and the alu/a_to_dbus control line, which enables the superbuffer to write the value to the data bus. (Because the data bus is large and connects many parts of the chip, it requires high-current signals to overcome its capacitance. A "superbuffer" provides this high-current output.)

The ACT register can hold a variety of different values. In a typical arithmetic operation, the accumulator value is loaded into the ACT via the a_to_act control. The ACT can also load a value from the data bus via dbus_to_act. This is used for the ARHL/DAD/DSUB/LDHI/LDSI/RDEL instructions (all of which are undocumented except DAD). These instructions perform arithmetic operations without involving the accumulator, so they require a path into the ALU that bypasses the accumulator.

The control lines allow the ACT register to be loaded with a variety of constants. The 0/fe_to_act control line loads either 0 or 0xfe into the ACT; the value is selected by the sel_0_fe control line. The value 0 has a variety of uses. ORing a value with 0 allows the value to pass through the ALU unchanged. If the carry is set, ADDing to 0 performs an increment. The value 0xfe (signed -2) is used only for the DCR (decrement by 1) instruction. You might think the value 0xff (signed -1) would be more appropriate, but if the carry is set, ADDing 0xfe decrements by 1. I think the motivation is so both increments and decrements have the carry set, and thus can use the same logic to control the carry.

Since the 8085 has a 16-bit increment/decrement circuit, you might wonder why the ALU is also used for increment/decrement. The main reason is that using the ALU allows the condition flags to be set by INR and DCR. In contrast, the 16-bit increment and decrement instructions (INX and DCX) use the incrementer/decrementer, and as a consequence the flags are not updated.

To support BCD, the ACT can be loaded with decimal adjustment values 0x00, 0x06, 0x60, or 0x66. The top and bottom four bits of ACT are loaded with the value 6 with the 6x_to_act and x6_to_act control lines respectively.

It turns out that the decimal adjustment values are easily visible in the silicon. The following image shows the silicon that implements the ACT register. Each of the large pink structures is one bit. The eight bits are arranged with bit 7 on the left and bit 0 on the right. Note that half of the bits have pink loops at the top, in the pattern 0110 0110. These loops pull the associated bit high, and are used to set the high and/or low four bits to 6 (binary 0110).

The ACT register in the 8085. This image shows the silicon that implements the 8-bit register.

Building the 8-bit ALU from single-bit slices

In my previous article on the 8085 ALU I described how each bit of the ALU is implemented. Each bit slice of the ALU takes two inputs and performs a simple operation: or, add, xor and, shift right, complement, or subtract. The ALU has a shift right input and a carry input, and generates a carry output. In addition, each slice of the ALU contributes to the parity and zero calculations. The ALU has five control lines to select the operation.

One bit of the ALU in the 8085 microprocessor

The ALU has seven basic operations: or, add, xor, and, shift right, complement, and subtract. The following table shows the five control lines that select the operation, and the meaning of the carry line for the operation. Note that the meaning of carry in and carry out is different for each operations. For bit operations, the implementation of the ALU circuitry depends on a particular carry in value, even though carry is meaningless for these operations.

Operation	select_neg_in2	select_op1	select_op2	select_shift_right	select_ncarry_1	Carry in/out
or	0	0	0	0	1	1
add	0	1	0	0	0	/carry
xor	0	1	0	0	1	1
and	0	1	1	0	1	0
shift right	0	0	1	1	1	0
complement	1	0	0	0	1	1
subtract	1	1	0	0	0	borrow

The eight-bit ALU is formed by linking eight single-bit ALUs as shown below. The high-order bit is on the left, and the low-order bit on the right, matching the layout in silicon. The carry, parity, and zero values propagate through each ALU to form the final values on the left. The right shift input is simply the bit from the right, with the exception of the topmost bit which uses a special shift right input. The auxiliary carry is simply the carry out of bit three. The control lines to select the operation are fed into all eight ALU slices. By combining eight of these ALU slices, the whole 8-bit ALU is created. The values from the top bit are used to control the parity, zero, carry, and sign flags (as well as the undocumented K and V flags). Bit 3 generates the half carry flag.

The 8-bit ALU in the 8085 is formed by combining eight 1-bit slices.

The control lines

The ALU uses 29 control lines that are generated by a PLA that activates the right control lines based on the opcode and the position in the instruction cycle. For reference, the following table lists the 29 ALU control lines and the instructions that affect them.

Control line	Relevant instructions
`ad_latch_dbus, write_dbus_to_alu_tmp, /ad_dbus`	`IN/LDA/LHLD`
`/ad_dbus`	`ARHL/DAD/DSUB/LDHI/LDSI/RDEL`
`/alu/a_to_dbus`	all
`/dbus_to_act`	`ARHL/DAD/DSUB/LDHI/LDSI/RDEL`
`a_to_act`	`ACI/ADC/ADD/ADI/ANA/ANI/CMP/CPI/ORA/ORI/RAL/RAR/RLC/RRC/SBB/SBI/SUB/SUI/XRA/XRI`
`0/fe_to_act`	all
`sel_alu_a`	all
`alu_to_a`	`ACI/ADC/ADD/ADI/ANA/ANI/CMA/CMC/DAA/DCR/IN/INR/LDA/LDAX/MOV/MVI/ORA/ORI/POP/RAL/RAR/RIM/RLC/RRC/SBB/SBI/SIM/STC/SUB/SUI/XRA/XRI`
`/daa`	`DAA`
`sel_0_fe`	`DCR`
`store_v_flag`	`ACI/ADC/ADD/ADI/ANA/ANI/ARHL/CMP/CPI/DAA/DCR/INR/ORA/ORI/RAL/RAR/RLC/RRC/SBB/SBI/SUB/SUI/XRA/XRI`
`select_shift_right`	`ARHL/RAR/RRC`
`arith_to_flags`	`ACI/ADC/ADD/ADI/ANA/ANI/CMP/CPI/DAA/DCR/DSUB/INR/ORA/ORI/SBB/SBI/SUB/SUI/XRA/XRI`
`bus_to_flags`	`POP PSW`
`/zero_flag_combine`	`DAD/DSUB`
`/flags_to_bus`	`ACI/ADC/ADD/ADI/ANA/ANI/ARHL/CALL/CC/CM/CMA/CMC/CMP/CNC/CNZ/CP/CPE/CPI//CPO/CZ/DAA/DAD/DCR/DCX/DI/DSUB/EI/HLT/IN/INR/INX/JC/JK/JM/JMP/JNC/JNK/JNZ/JP/JPE/JPO/JZ/LDA/LDAX/LDHI/LDSI/LHLD/LHLX/LXI/MOV/MVI/NOP/ORA/ORI/OUT/PCHL/POP/PUSH/RAL/RAR/RC/RDEL/RET/RIM/RLC/RM/RNC/RNZ/RP/RPE/RPO/RRC/RST/RSTV/RZ/SBB/SBI/SHLD/SHLX/SIM/SPHL/STA/STAX/STC/SUB/SUI/XCHG/XRA/XRI/XTHL`
`shift_right_in_select`	`ARHL`
`xor_carry_in`	`ANA/ANI/ARHL/CMP/CPI/DCR/DSUB/INR/RAR/RRC/SBB/SBI/SUB/SUI`
`select_op2`	`ANA/ANI/ARHL/RAR/RRC`
`/use_latched_carry /rotate_carry`	`LDHI/LDSI/RLC/RRC`
`/carry_in_0`	0 except for `ACI/ADC/DAD/DSUB/LDHI/LDSI/RAL/RDEL/RLC/SBB/SBI`
`select_op1`	`ACI/ADC/ADD/ADI/ANA/ANI/CMP/CPI/DAA/DAD/DCR/DSUB/INR/LDHI/LDSI/RAL/RDEL/RLC/SBB/SBI/SUB/SUI/XRA/XRI`
`select_ncarry_1`	`ACI/ADC/ADD/ADI/CMP/CPI/DAA/DAD/DCR/DSUB/INR/LDHI/LDSI/RAL/RDEL/RLC/SBB/SBI/SUB/SUI`
In combination with first control line, `write_dbus_to_alu_tmp`	`ADC/ADD/ANA/CMA/CMC/CMP/DAA/DCR/INR/MOV/ORA/RAL/RAR/RIM/RLC/RRC/SBB/SIM/STC/SUB/XRA`
`select_neg_in2`	`CMA/CMP/CPI/DSUB/SBB/SBI/SUB/SUI`
`carry_to_k_flag`	`DCX/INX`
`store_carry_flag`	`ACI/ADC/ADD/ADI/ANA/ANI/ARHL/CMC/CMP/CPI/DAA/DAD/DSUB/ORA/ORI/RAL/RAR/RDEL/RLC/RRC/SBB/SBI/STC/SUB/SUI/XRA/XRI`
`xor_carry_result`	xor for `ANA/ANI/CMC/CMP/CPI/DSUB/SBB/SBI/STC/SUB/SUI`
`/latch_carry use_carry_flag`	`CMC/LDHI/LDSI`

Conclusions

By reverse-engineering the 8085, we can see how the ALU actually works at the gate and silicon level. The ALU uses many standard techniques, but there are also some surprises and tricks. There are two registers (ACT and TMP) that are invisible to the programmer. You'd expect a direct path from the data bus to the accumulator, but instead the data passes through the ALU. The increment/decrement logic uses the unexpected constant 0xfe, and there are two totally different ways of performing increment/decrement. Several undocumented instructions perform ALU operations without involving the accumulator at all.

This information builds on the 8085 reverse-engineering done by the visual 6502 team. This team dissolves chips in acid to remove the packaging and then takes many close-up photographs of the die inside. Pavel Zima converted these photographs into mask layer images, generated a transistor net from the layers, and wrote a transistor-level 8085 simulator.