Ken Shirriff's blog

Restoring YC's Xerox Alto day 9: tracing a crash through software and hardware

Last week, after months of restoration, we finally got the vintage Xerox Alto computer to boot (details) and run programs. However, some programs (such as the mouse-based Draw program) crashed so we knew there must still be a hardware problem somewhere in the system. In today's session we traced through the software, microcode and hardware to track down the cause of these crashes.

For background, the Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen. The full collection of Alto restoration posts is here.

When the Xerox Alto encounters a problem, it drops into the Swat debugger.

To assist with debugging, the Alto includes a debugger called Swat. If a program malfunctions, it drops into the Swat debugger, as seen above. The debugger lets you examine memory and set breakpoints. It is more advanced than I'd expect for 1973, including a feature to disassemble machine instructions from memory and view them with names from the symbol table.

In our case, the debugger showed that when we ran the MADTEST test program, the Alto had jumped to address 2, which triggered the debugger. The first 8 memory locations in the Xerox Alto contain TRAP instructions to catch erroneous jumps to a zero address (or near-zero address), which can happen if a subroutine return address is clobbered. By examining the stack frames, I determined which subroutine had been called when the system crashed. The problem occurred when the program was jumping to microcode that had been loaded into microcode RAM. Since this is an unusual operation, it would explain why most programs ran successfully and only a few crashed.

Microcode

Microcode is a low-level feature of most processors, but I should explain what it means for a program to jump to microcode, since this is a strange feature of the Alto. Computers execute machine code, the simple, low-level instructions that the CPU can understand; on modern computers this may be the x86 instruction set, while the Alto used the Data General Nova instruction set. Most processors, however, don't run machine instructions directly, but have a microcode layer that is invisible to the programmer. While the processor appears to be running machine instructions, it internally executes microcode, a simpler, low-level instruction set that is a better match for the hardware. Each machine instruction may turn into many micro-instructions.

The Xerox Alto uses microcode much more extensively than most computers, with microcode performing tasks such as device control that most computers do in hardware, resulting in a cheaper and more flexible system. (As Alan Kay wrote, "Hardware is just software crystallized early.") On the Alto, programmers have access to the microcode—a user program can load new microcode into special control RAM. This microcode can implement new machine instructions, optimize particular operations (analogous to GPU programming), or obtain low-level control over the system.

The Xerox Alto's CRAM board (Control RAM) stores 1024 microcode instructions. The 32 memory chips in the lower left provide the 1024x32 storage. Foreshadowing: the connector at the lower left connects the CRAM board to the microcode control board.

Our Alto has 1024 words of microcode in ROM (for the standard microcode) and 1024 words in RAM (for software-controlled microcode). The photo above shows the CRAM (control RAM) board that holds the user-modifiable microcode. This board illustrates the incredible improvements in memory density since 1973—this board required 32 memory chips to hold the 1024 32-bit words (4 Kbytes) of microcode.

The Alto's microcode uses a 1K (10-bit) address space. Since Altos can support up to 2K of microcode in ROM and 3K in RAM, bank switching is used to switch between different 1K memory banks. Bank switching is triggered by a special micro-instruction called SWMODE (switch mode).
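As a rough illustration of why the bank matters, the sketch below treats the microcode store as a flat array and combines a bank number with the 10-bit micro-address. The bank numbering (`ROM0`, `RAM0`) and the `physical_word` helper are hypothetical names for this example, not the Alto's actual wiring; only the 1K bank size, the 10-bit address and the 32-bit word width come from the text.

```python
# Toy model of bank-switched microcode addressing (illustrative only).
BANK_SIZE = 1024      # 1K words per bank, as described above
ADDRESS_BITS = 10     # a 10-bit micro-address selects a word within a bank
WORD_BITS = 32        # each microcode word is 32 bits wide

# Hypothetical bank numbers for a machine with 1K of ROM and 1K of RAM.
ROM0 = 0
RAM0 = 1


def physical_word(bank: int, micro_address: int) -> int:
    """Combine a bank number and a 10-bit micro-address into a flat index."""
    if not 0 <= micro_address < BANK_SIZE:
        raise ValueError("micro-address must fit in 10 bits")
    return bank * BANK_SIZE + micro_address


def bank_bytes() -> int:
    """Storage needed for one bank, in bytes."""
    return BANK_SIZE * WORD_BITS // 8


if __name__ == "__main__":
    address = 0o1234  # an arbitrary micro-address
    print("ROM word index:", physical_word(ROM0, address))
    print("RAM word index:", physical_word(RAM0, address))
    print("bytes per bank:", bank_bytes())
```

The same 10-bit micro-address lands on a completely different word depending on the selected bank, so a micro-address meant for one bank is meaningless in another.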

Getting back to our crash, the MADTEST test program loads special test microcode into the control RAM. Then it executes the JMPRAM machine instruction to switch execution from machine instructions to the microcode in RAM. The microcode that implements JMPRAM performs a SWMODE to switch execution to the RAM microcode bank and the microcode in RAM will execute. When the microcode is done, it is supposed to return execution to the machine code emulator, and execution of the user-level program (MADTEST) will continue. But somehow execution ended up at address 2, causing the program to crash.

To track down a problem with the Xerox Alto's bank switching circuit, we attached many probes to the CPU control board.

We used a logic analyzer to record every micro-instruction and memory access, so we could determine exactly what went wrong. After a few tries, we captured a trace showing what the Alto was executing until it crashed. Over the past week, I've been using the Living Computer Museum's ContrAlto simulator of the Xerox Alto to understand how the Alto's software and microcode work. With this background, I could interpret the logic analyzer output and map it to the MADTEST code and the microcode. Everything proceeded fine until the JMPRAM instruction was executed. Instead of switching to the microcode in RAM, it was still running microcode from the ROM. Since the micro-address was intended for the RAM code, the processor was running essentially random microcode. Through pure luck, this microcode routine completed and returned control to the regular machine code emulator rather than hanging the system. Unfortunately this code didn't load the return address register, resulting in a jump to address 2 and the Swat crash we saw.

To summarize, everything was working fine except instead of switching to the microcode RAM bank, execution stayed in the microcode ROM bank. This was pretty clearly a hardware problem, so we started looking at the bank switch circuit, which consists of multiple integrated circuits.

The bank switch hardware

The Alto was built at the dawn of the microprocessor age, so instead of using a microprocessor chip, it used three boards of TTL chips for the CPU. The control board interprets the microcode, including performing bank switching, so that's where we started our investigation.

Bank switching in the Alto happens when the SWMODE micro-instruction is executed. The destination bank is selected following complex rules that depend on the hardware configuration and the current bank. Rather than implement these rules with a complex hardware circuit, the Alto designers used the short cut of encoding all the logic into a 256x4 PROM chip. (This also has the advantage that a different hardware configuration can be supported simply by replacing the PROM.) The schematic below shows the PROM (left) generating the bank select signals (yellow), which pass through various chips to create the current bank select signals (right), which are fed back into the PROM for the next cycle.

This schematic shows the Xerox Alto's bank switching circuit, allowing microcode to run from ROM or RAM banks. (Click for larger image.)

We connected logic analyzer probes so we could trace each chip in the bank select circuit. The PROM correctly generated the RAM bank signals when the SWMODE micro-instruction executed, but in the next step its inputs had reverted to the ROM bank for some reason. This showed the PROM worked, so we continued probing through the circuit. Each chip had the proper output until we got to the multiplexer chip that feeds back to the PROM. (This chip, on the right, handles microcode task switching by selecting either the current task's bank, or the new task's bank, which is recorded in a RAM chip.) The input signal to the multiplexer pulsed high for the new bank, but the output stayed low, blocking the bank switch signal. The oscilloscope trace below shows the problem: the input signal (bottom trace) is not passed to the output (middle trace).

A multiplexer IC in the Xerox Alto was failing to pass the bank switch signal from its input (bottom trace) to its output (middle trace).

We found a bad chip on the disk interface board a few weeks ago, so had we located a second bad chip? We pulled out the suspicious chip (a 74S157 multiplexer) and tested it in a breadboard to prove that it was faulty. Surprisingly, it worked just fine. Perhaps the problem only showed up at high frequency? We swapped it with an identical chip on the board and the crash still happened. Clearly there was nothing wrong with the chip. But its output stayed low when it should go high. Why was this?

We thought this 74S157 multiplexer IC from the Xerox Alto was faulty. However, the chip worked fine when tested in a breadboard.

Our next theory was that something was grounding the chip's output signal, forcing the output to remain low. To test this, we disconnected the chip's output pin from the rest of the circuit by bending the pin so it didn't go into the socket. With the output not connected to the circuit, the output went high as expected. (See oscilloscope trace below.) This proved that the chip worked and something else was pulling the signal low. Since the chip's output was connected to the PROM chip, the obvious suspect was the PROM, which might have an input shorted low. We hoped the PROM chip wasn't at fault, since locating a 1970s-era D3601 PROM chip and reprogramming it would be inconvenient. We pulled the PROM chip out of the board and the short to ground remained, demonstrating the PROM chip was not the culprit.

With the multiplexer's output disconnected from the circuit, the input signal (bottom) appears on the output (top) as expected.
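The reasoning behind the bent-pin test can be written down as a tiny logic model. This is only a sketch: `mux_channel` is a simplified view of one channel of a 2-to-1 multiplexer like the 74S157, and `observed_at_pin` is a hypothetical helper that assumes a short to ground always wins over the chip's output driver.

```python
# Toy model of the "is the chip bad, or is the net shorted?" experiment.

def mux_channel(select: int, a: int, b: int, strobe: int = 0) -> int:
    """One channel of a 2-to-1 multiplexer (simplified 74S157-style behavior).

    A high strobe forces the output low; otherwise select picks input a or b.
    """
    if strobe:
        return 0
    return b if select else a


def observed_at_pin(chip_output: int, pin_in_socket: bool,
                    net_shorted_low: bool) -> int:
    """Level an oscilloscope would see on the chip's output pin.

    Assumption for this sketch: a grounded net overrides the driver, and a
    pin bent out of the socket is isolated from whatever is on the net.
    """
    if pin_in_socket and net_shorted_low:
        return 0
    return chip_output


if __name__ == "__main__":
    wanted = mux_channel(select=0, a=1, b=0)  # the chip tries to drive high
    print("pin in socket, net shorted:", observed_at_pin(wanted, True, True))
    print("pin bent out, net shorted: ", observed_at_pin(wanted, False, True))
```

A healthy chip on a shorted net and a dead chip on a healthy net look identical while the pin is in the socket; only isolating the pin tells them apart.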

We removed the control board from the Alto to examine it for short circuits. On the back of the circuit board, we noticed that two white wires were connected to the multiplexer chip that was causing us problems. (Wires are added to printed circuit boards to fix manufacturing problems, support new features, or support new hardware.) These wires went to the connector that was cabled to the CRAM (control RAM) board shown earlier. With the CRAM board disconnected, the short to ground went away. Thus, the cause of our crashes was these two wires that someone had added to the board! Could we simply cut these wires and have the system work correctly? We figured we should understand why the wires were there, rather than randomly ripping them out. Maybe our control board and CRAM board were incompatible? Maybe these wires were to support the Trident disk drive we aren't using? It was the end of the day by this point, so further investigation will wait until next time.

This is the Xerox Alto control board, one of three boards that make up the CPU. The board has been modified with several white wires which trigger our crashes.

Conclusion

After a bunch of software, microcode and hardware debugging we found that the crashes are due to some wires added to one of the circuit boards. These wires messed up microcode bank switching, causing programs that used custom microcode to crash. Fixing this should be straightforward, but we want to understand the motivation behind these wires. On the whole, the processor is working reliably other than this one issue. Once it is fixed, we can run MADTEST (the microcode test program) to stress-test the processor. If there are no more processor issues, we'll move on to getting the mouse working.

For updates on the restoration, follow me on Twitter at kenshirriff. Thanks to the Living Computer Museum for the extender board and the ContrAlto simulator.

How I added 6 characters to Unicode (and you can too)

Edit (June 2018): Unicode 11 is released, containing the half stars. Now the wait for font support...

Star characters (☆★) have long been part of the Unicode standard, which means they can appear as characters in web pages, text, and email. But half-stars were missing, so they required special images or custom fonts. I recently co-wrote a proposal to add half-star characters to Unicode, and it was just accepted. In an upcoming Unicode release, half-stars will be usable like any other text character. In this article, I discuss how I got the half-star characters and two others added to Unicode.

Usage of the four different half-stars to express 3.5 of 5.

Unicode is the computer standard that defines the characters that are used by almost every computer—this standard allows different computers to easily display text in almost every language, and with almost every symbol you might need. (Before Unicode, dealing with non-English text on computers was a mess.) But Unicode doesn't include everything. Last June, a comment on Hacker News complained that Unicode lacked the half-star character used in ratings and movie reviews:

Until Unicode has a half-star character, it won't even be able to encode the average newspaper.

I suggested that someone should propose the half-star to Unicode, but quickly realized that "someone" would be me. Since I had successfully proposed two symbols to Unicode earlier, I knew the process necessary to get the half-star added.

A few years ago, a detailed article described how a couple people got power symbols added to Unicode. Adding a new character to Unicode is easier than most people think. You don't need to pay money, be part of a major company or join a committee. All you need to do is write a proposal explaining why the character is needed. If the Unicode Committee agrees, they'll approve your character for addition to Unicode.

In 2015, I started programming the 1960s-era IBM 1401 mainframe at the Computer History Museum. But when I wrote about the IBM 1401 system, I ran into a problem. This computer uses a 6-bit character set (the precursor to EBCDIC) with some strange characters. All these characters appeared in Unicode, with the exception of one: the Group Mark. I was a bit shocked that Unicode, with its 128,172 characters, lacked a character I needed. Having read about the power symbol team's success in adding characters, I figured it would be interesting to see if I could get the group mark character added to Unicode. I wrote a proposal, submitted it to Unicode, and at the next meeting it was approved.

The group mark character, from an IBM 705 computer manual (1959). Since Unicode lacked this character, you couldn't write this text on a modern computer.

A few months later, I learned that the Bitcoin symbol was missing from Unicode. This was a surprising omission, since the Bitcoin symbol is widely used in the real world. The symbol had been rejected before, so I made a more thorough proposal in October 2015 with the enthusiastic support of /r/bitcoin and other Bitcoin groups. The Bitcoin symbol proposal was accepted by the Unicode Committee in November 2015.

The Bitcoin symbol on an IBM punched card. Mining Bitcoins on a punched card mainframe isn't practical, but was an interesting experiment.

So when I saw the comment about half-stars on Hacker News, I figured it would be straightforward to get it accepted to Unicode. I wrote a proposal after discussion on HN and on the Unicode mailing list. The Unicode committee considered the proposal in August 2016, but to my surprise they had also received another half-star proposal, so they decided to wait for a single proposal. It turned out that Andrew West had also written a proposal for half-stars, and we had both submitted proposals, unaware of the other. So Andrew and I joined forces and made a combined proposal, which was accepted by the committee Sept 30, 2016.

Why did we propose four different half-stars? We included both the outline half-star and solid half-star because both forms are commonly used. (I wasn't sure if the committee would consider these characters distinct enough to include both, but they did.) Right-to-left languages such as Hebrew do their star ratings right-to-left too (which was a bit of a surprise to me), so we included mirrored versions for RTL languages. Thus, the four different half-stars cover the range of uses.

Half-stars in Hebrew are written right-to-left. From Haaretz 2 November 2012, provided by Simon Montagu.
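Once fonts support the new characters, a rating such as 3.5 of 5 can be built as plain text. The sketch below is a minimal example for left-to-right text; the `star_rating` function is hypothetical, and the half-star code point (U+2BEA, STAR WITH LEFT HALF BLACK) is taken from my recollection of the Unicode 11 charts, so check it against the published charts before relying on it.

```python
# Build a star rating from ordinary Unicode characters (left-to-right text).
BLACK_STAR = "\u2605"                 # filled star
WHITE_STAR = "\u2606"                 # outline star
STAR_WITH_LEFT_HALF_BLACK = "\u2BEA"  # assumed code point, added in Unicode 11


def star_rating(score: float, out_of: int = 5) -> str:
    """Return a string of stars for a score, rounded to the nearest half."""
    halves = round(score * 2)
    if not 0 <= halves <= out_of * 2:
        raise ValueError("score is outside the rating scale")
    full, half = divmod(halves, 2)
    empty = out_of - full - half
    return (BLACK_STAR * full
            + STAR_WITH_LEFT_HALF_BLACK * half
            + WHITE_STAR * empty)


if __name__ == "__main__":
    print(star_rating(3.5))
```

A right-to-left rating would use the mirrored half-star instead. Whether the half-star actually renders depends on the font, which is the "long road" mentioned later in the article.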

If there's a character that you want to add to Unicode and it meets the requirements, you should submit a proposal, since it's a very interesting process and not too difficult. Make sure your character meets the criteria. In particular, you'll need to find a bunch of examples of the character used in text. The Unicode Committee isn't going to add a character just because you think it's cool, so you need examples to prove the character is in use. Creating a font to demonstrate your new character is probably the most challenging part; I used FontForge. The power symbol team has lots of helpful advice on making a successful proposal. I'm also happy to offer advice if you're writing a proposal.

I should mention that emojis have a totally different process, so don't argue that "since the poop emoji exists, my character should too". (The poop emoji 💩 was added for backwards compatibility with Japanese mobile phones.) For emoji, expected popularity of the symbol is a major factor in acceptance. Regular Unicode, on the other hand, isn't concerned with popularity—historical scripts such as Tangut won't get a millionth the usage of a new emoji—but with existing usage in text. (Reading between the lines, I think a lot of the Unicode committee wishes they weren't in the emoji business at all.)

Once a character is accepted, there's still a long road for it to appear in fonts and be usable. A new version of Unicode is released typically every June, so the half stars will probably appear in Unicode 11.0 mid-2018. The Bitcoin community in particular has had to wait patiently since the Bitcoin symbol just missed the cutoff for Unicode 9.0, so it will probably appear in Unicode 10.0, mid-2017. So if you're patient, eventually you'll be able to use the group mark, Bitcoin symbol and half stars in web pages and text just like any other symbol.

Restoring a vintage Xerox Alto day 8: it boots!

We've been restoring a Xerox Alto from the 1970s for several months, and we finally got it to boot and run some programs! There's still some hardware debugging ahead of us, since the Alto drops into the debugger for many programs, but we're quite happy to see the system running. In this post, I describe our latest debugging session and show some programs running on the Alto.

The Xerox Alto, successfully booted and listing the files on the disk. The diagonal stripes are an artifact of photographing the CRT and do not appear on the display.

For background, the Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen. For posts on previous restoration days see parts 1, 2, 3, 4, 5, 6, 6 update and 7.

The new boot disk

In an earlier session, we discovered that our boot disk had been used for drive testing decades earlier and was filled with random garbage, making it impossible to boot from the disk. Fortunately, the Living Computer Museum in Seattle sent us a new boot disk, loaded with diagnostic software. I received a vintage Digital RK05K-11 disk cartridge box:

Box for a vintage Digital RK05K-11 disk cartridge

Inside the box was the 14" disk. Despite its size, the disk cartridge only holds 2.5 megabytes, a tangible indication of the exponential improvements in disk density since the 1970s. We loaded the disk into the Alto's Diablo drive, waited a minute for the disk to spin up to speed and the heads to load, and Ed eagerly pressed the reset button. Would we be lucky and successfully boot the Alto? After all the anticipation, nothing happened.

An Alto diagnostic boot disk, sent to us by the Living Computer Museum in Seattle.

Why won't the system boot?

Since we had successfully loaded a disk sector (of random data) earlier, we knew that the system was working end-to-end, from the drive through the disk interface card and into the processor boards and memory. One possibility was that the alignment was different between our drive and the Living Computer Museum's drive, corrupting the data. Needing to hand-align our drive would be very difficult, so we hoped that wasn't the problem.

To see the words as they came off the disk, we added more logic analyzer probes to the Alto's backplane to trace the processor bus. At this point, the backplane is liberally decorated with probes, allowing us to monitor the buses and microcode execution in detail.

We added more probes to the Alto's backplane to monitor the processor bus. The probes are connected to a vintage Agilent logic analyzer.

Using the logic analyzer, we could step through the microcode to see each disk word getting loaded into memory, but the data didn't match the boot sector we expected. The Alto stores each sector on disk as a 2-word header (holding the disk address), an 8-word label (holding a next block pointer), and the 256-word data block. Although the data seemed wrong, more interesting was the octal value 000100 in the header coming from disk. (The Alto uses octal, causing us no end of confusion.) This header value corresponds to a disk address of cylinder 8, not the boot sector 0. Could we be reading the wrong sector?
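To see how octal 000100 turns into cylinder 8, here is a small decoder. The field layout is an assumption based on my recollection of the Alto's disk address word (counting from the most significant bit: 4 bits of sector, 9 bits of cylinder, then head, disk and restore bits), not something stated in this post, so verify it against the Alto hardware documentation. The `decode_disk_address` name is just for this example.

```python
# Decode a 16-bit Alto disk address word (field layout is an assumption).

def decode_disk_address(word: int) -> dict:
    """Split a disk address word into its fields.

    Assumed layout, from the most significant bit down:
    sector (4 bits), cylinder (9 bits), head (1), disk (1), restore (1).
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("disk address must be a 16-bit word")
    return {
        "sector": (word >> 12) & 0xF,
        "cylinder": (word >> 3) & 0x1FF,
        "head": (word >> 2) & 1,
        "disk": (word >> 1) & 1,
        "restore": word & 1,
    }


if __name__ == "__main__":
    header = decode_disk_address(0o000100)  # value seen on the logic analyzer
    print(header)
```

With this layout, octal 000100 (decimal 64) has a single bit set inside the cylinder field, giving cylinder 8 with sector 0 and head 0, which matches what the logic analyzer and the drive showed.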

By removing the cover from the Diablo drive, you can watch it seek. Unlike modern hard drives, the Alto's disk isn't sealed so you can see the disk surface and head when the disk is loaded in the drive.

Looking inside the Diablo disk drive, you can see the head moving over the disk's surface as disk seeks take place. The green dial on the right rotates to indicate the current track. These seeks are from an earlier test, not from boot.

Watching the drive as the Alto attempted to boot, we saw the disk arm seek, which it shouldn't have done to read from boot sector 0. The seek dial rotated to cylinder 8—as the logic analyzer suggested, the Alto was trying to boot from the wrong disk cylinder, which clearly wouldn't work.

Inside the Diablo disk drive, the turquoise sector indicator shows the drive has seeked to cylinder 8.

Since the drive seeked correctly last week, why was it trying to read from the wrong cylinder today? Were we suffering another chip failure on the disk interface card? Had something malfunctioned in the drive? We pored over the disk interface schematics and suspected a problem with the nine cylinder select lines between the Alto and the drive. In particular, a malfunction in the CYL(5) line could set the cylinder to 8, causing the seek we saw. (Bits on the Alto are inconveniently numbered backwards, so cylinder bit 5 corresponds to the value 8.)
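The backwards bit numbering is easy to trip over, so here is the conversion spelled out. It assumes the nine cylinder lines are numbered CYL(0) through CYL(8) with CYL(0) as the most significant bit; the `bit_value` helper is a hypothetical name for this sketch.

```python
# Convert an MSB-first bit number into the value that bit contributes.
CYLINDER_LINES = 9  # nine cylinder select lines, per the text


def bit_value(bit_number: int, width: int) -> int:
    """Value of a bit when bit 0 is the MOST significant of `width` bits."""
    if not 0 <= bit_number < width:
        raise ValueError("bit number out of range")
    return 1 << (width - 1 - bit_number)


if __name__ == "__main__":
    requested_cylinder = 0                      # the boot sector is on cylinder 0
    stuck = bit_value(5, CYLINDER_LINES)        # CYL(5) wrongly asserted
    print("cylinder seen by drive:", requested_cylinder | stuck)
```

A single stuck CYL(5) line would turn a request for cylinder 0 into cylinder 8, which is why that line was the prime suspect.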

We noticed a scratch in the 40-conductor ribbon cable between the Alto and the disk drive, exposing a wire. Could this be the cause of our problems? We carefully checked continuity and found no problems with the cable despite the scratch, so we hooked the cable back up along with an oscilloscope to monitor the offending signal, so we could debug the problem.

Running the Alto

We tried booting the Alto again, watching for the seek problem. This time the disk unexpectedly performed multiple seeks. And then the boot screen appeared on the Alto. We had a running system!

The Xerox Alto screen after booting, waiting for a command.

A few months ago, I had used the Salto simulator to see how the Alto worked. But now, facing a working system, I couldn't remember the commands. To see the files, was it LIST, or DIR? No. How about HELP? No good. After a minute or two, I remembered that a simple question mark was the command to list the disk, and I got a list of files. The system was working well enough to read a directory.

I tried running the WYSIWYG text editor Bravo and the mouse-based drawing program Draw, but they crashed, dropping the system into the debugger, Swat. Clearly some hardware problems remain and our debugging adventure is not over yet.

The Alto's debugger is called Swat, and runs if there is an error.

Some programs ran successfully. The CRT test program drew grids on the bitmapped screen. The CRT is a bit fuzzy in the upper left, but the quality is surprisingly good considering that this tube was almost too dim to see a few months ago. Apparently running the tube a while restored it by burning contaminants off the cathode (or some mysterious tube-era phenomenon like that).

The Xerox Alto running a CRT test program. Antique mechanical calculators are in the background.

The Ethernet diagnostic program ran and showed off the mouse-based GUI. I'm developing a BeagleBone-based Ethernet simulator for the Alto, so this program will be very helpful. We don't have a gridded optical mouse pad, so the mouse didn't work and we couldn't click anything.

The Alto's Ethernet Diagnostic Program uses a mouse-based GUI.

The keyboard test program graphically displays the keyboard and shows each key as it is pressed. We used this to verify the keys all work.

The Alto running the keyboard test program. Antique calculators are in the background.

A closeup of the Alto's keyboard test program. It highlights keys when they are pressed.

Conclusion

It was an exciting day, with the Alto finally booting successfully. A disk seek problem blocked us for a while, but then the problem mysteriously disappeared. We ran a bunch of test programs from the disk. About half of them ran successfully, and half crashed into the debugger. There may be a malfunction in the processor that we need to track down. Or perhaps we're getting memory errors; the parity errors we saw earlier could have returned. In any case, we have some more debugging ahead of us, but it's exciting to see the system finally running. Hopefully we will soon be playing Alto Trek and Maze War.

For updates on the restoration, follow me on Twitter at kenshirriff.

Thanks to Josh Dersch and the Living Computer Museum for their debugging help and sending out the boot disk.

Sonicare toothbrush teardown: microcontroller, H bridge, and inductive charging

My Sonicare electric toothbrush recently quit working, so I took it apart and examined the interesting circuitry inside. There's much more complexity than I expected inside a toothbrush, especially in the mechanism that drives the brush head at 31,000 strokes per minute. Internally, the brush appears to be designed for quality rather than ease of manufacturing. Unfortunately, moisture can get in, causing reliability problems.

The toothbrush is a Sonicare Flexcare Platinum with more features than you'd expect in a toothbrush: three brushing modes, three intensities and a couple timers, along with 10 LEDs to indicate its status. A pressure sensor in the toothbrush changes the vibration if you apply too much pressure while brushing. The toothbrush uses wireless inductive charging so it charges when set on the base. (This toothbrush may seem overly complicated, but it's nothing compared to the new model that includes Bluetooth.)[1]

Disassembling the Sonicare toothbrush. At the left is the induction coil used for charging.

The first step was to remove the toothbrush base, allowing the toothbrush mechanism to be removed from the case. The toothbrush head mounts on the right; it needed to be removed to disassemble the toothbrush. At the left is the charging coil used to wirelessly charge the toothbrush.

The photos below show the top and bottom of the toothbrush internals. I expected to find a simple, low-cost mechanism, so I was surprised at how much complexity there was inside. The vibration mechanism (right) is built from multiple metal and plastic parts screwed together, requiring more expensive assembly than I expected. The circuit board is literally gold-plated and has a lot of components, even if it doesn't quite reach Apple's level of complexity. Overall, the toothbrush's internal design is high quality (except, of course, for the fact that it quit working, as did an earlier one).

Inside the Sonicare toothbrush, top and bottom composite view. The charging coil is at the left. The battery (red) is in the lower left. The coil that vibrates the brush is in the center and the brushing mechanism is at the right.

The brush contains several key components, as can be seen above. In the center is the large red coil that causes the toothbrush to vibrate. On the right is the vibration mechanism, which has a powerful magnet that is moved by the coil. The brush head snaps on at the right. The battery (red, left) takes up about a third of the toothbrush. The long, thin circuit board (green) has the circuitry to operate the toothbrush. A white spacer sits on top of the circuit board, with holes for the LEDs and buttons.

The photo below shows the brush mechanism partially disassembled and separated from the electronics. The toothbrush still powers on in this state, as you can see from the illuminated LEDs. Note the flexible brown ribbon cable between the center of the brush mechanism and the electronics board. This connects the pressure sensor on the brush mechanism to the electronics board.

The brush mechanism (left) separated from the electronics (right). Note the illuminated LEDs. Also note the flexible brown ribbon connecting the pressure sensor to the electronics board.

The diagram below shows the main components on the circuit board. The buttons are the most visible feature. The gold circles at the left are used to program the microcontroller. The MOSFET transistors switch the coil on and off to produce vibrations. Ten LEDs are scattered across the board. At the right, the diode bridge is part of the charging circuit.

The circuit board for the Sonicare toothbrush is crammed with tiny parts. The gold circles on the left are used to program the microcontroller chip. The tiny gold circles scattered across the board are test points for testing the board during manufacturing.

The circuit board is covered with tiny gold circles. These are test points, allowing test connections to most parts of the board. For instance, each LED and each button has a test point that can be used to test the component. During testing, spring loaded pogo pins on the test circuit make contact with these test points on the toothbrush board. The number of test points (about 56) looks like overkill to me.

The diagram below shows the components on the back of the circuit board. The toothbrush is controlled by a mid-range 8-bit microcontroller, the PIC16F1516.[2] This chip contains the code for all the toothbrush functions: reading the buttons, lighting the LEDs, controlling the coil, and managing charging. There are too many LEDs (10) for the chip to control individually, so eight of the LEDs are controlled by a separate LED driver chip.[3]

The back of the Sonicare circuit board contains the PIC16F1516 microcontroller chip. The sensor is probably a Hall-effect magnetic field sensor.

The microcontroller is an off-the-shelf part, not a custom chip, so it needs to be programmed with the right software. This is done during manufacturing through the large gold circles and triangle near the end of the toothbrush.[4] The resonator provides the clock signal for the microcontroller's timing.[5]

The driver mechanism and the H bridge circuit

The toothbrush head is driven by an electromagnetic coil that moves a magnet. The coil has two halves, wired in opposite directions, so the sides will have opposite magnetic fields. The coil is pulsed one way to rotate the magnet one direction, and then pulsed the opposite way to rotate the magnet the other direction. The result is the high-speed brushing vibration.

The diagram below shows the driver mechanism disassembled. The coil constantly switches polarity so the north pole will switch from the top to bottom (the yellow and blue poles of the coil). The magnet has poles on the front and back edges (perpendicular to the coils), so it will attempt to rotate back and forth to line up with the coil, along the long axis of the toothbrush. The mechanism limits the rotation to a few degrees, resulting in a rotational vibration back and forth rather than spinning like a motor. This rotational vibration is transmitted to the toothbrush head by the torsion bar causing the head and bristles to vibrate. More details on the driver mechanism are here.

Sonicare toothbrush driver mechanism. As the polarity of the coil switches, the magnet rotates back and forth slightly. The torsion bar transmits the rotation to the shaft, which causes the toothbrush head to vibrate around its axis.

The figure below shows the voltage across the coil. Every 2 milliseconds, there is a 4 volt pulse across the coil, followed by a negative 4 volt pulse. The pulses generate the reversing magnetic field that drives the magnet and causes the toothbrush to vibrate. If you count the positive and negative pulses as separate brush strokes, you get the advertised 31,000 brush strokes per minute. (Although counting an up-down cycle as a single stroke rather than two would make more sense to me.)

Voltage across the actuator coil in a Sonicare toothbrush. An H bridge drives the coil with +/- 4 volt pulse every 2 milliseconds.

You might think that driving a coil in two directions would use two switches, but instead it uses four, in a common circuit called an H bridge, as shown below. If switches 1 and 4 are closed, current flows in the forward direction. If switches 2 and 3 are closed, current flows in the reverse direction. In the toothbrush, transistors are used for the switches, and are turned on and off by the microcontroller.[6] An H bridge is often used to control motors that need to go forwards and reverse, for example in a hoverboard.

An H bridge circuit is used to drive the vibration coil. This allows the coil to be off or energized in either direction. Four switches (MOSFET transistors) are used in the H bridge.

Pressure sensor

One of the features of this toothbrush is a pressure sensor. If you press too hard while brushing, the vibrations start pulsing and the LEDs flash. The sensor itself is a tiny mystery chip (below) mounted on the drive assembly, and connected to the electronics board with a thin flexible cable. The cable is labeled with Vdd (1), Data (2), Clock (3), and Ground (4), so the sensor is probably sending a stream of bits using an I2C protocol. My suspicion is the sensor is a Hall effect magnetic field sensor that detects a change in the magnetic field if pressure is preventing the magnet from vibrating. The chip doesn't seem to be in a position to measure actual pressure, which is why I suspect it's measuring the magnetic field instead.

The pressure sensor on the toothbrush is connected to the electronics via a flexible cable. The sensor is probably a Hall effect magnetic sensor using the I2C protocol.

Charging

To charge the toothbrush, it is set on a stand and charges inductively without physically being plugged in. A coil in the stand is magnetically coupled to a coil in the toothbrush, transmitting the power wirelessly. You can see the coil at the bottom of the toothbrush. When set on the stand, the coil picks up about 12 volts, which is used to charge the battery. The power is transmitted at high frequency (80kHz) for efficiency.

The coil is connected to a diode bridge that converts the power to DC. It then goes through a transistor circuit that regulates the charging, as directed by the microcontroller. The battery in the toothbrush is a Sanyo Li-ion rechargeable battery, which is said to be 3.7V but I measured 4.0V.[7]

Voltage across the charging coil in a Sonicare toothbrush oscillates at about 80kHz.

The toothbrush is designed to conserve battery by using very little power when not in use. The microcontroller has a low power standby mode when it is waiting for a button press. When the toothbrush is activated, a transistor energizes the LEDs and the LED driver chip, while another circuit powers up the pressure sensor. This prevents these components from draining the battery while the toothbrush is not in use.

Conclusion

Overall, I was surprised by how much electronics was inside the toothbrush, as well as the complexity of the drive mechanism. It was designed with quality in mind, not low-cost production. Unfortunately, the brush has reliability issues—this was the second one to fail on me. The problem appears to be water seeping in around the shaft, eventually damaging the internals.

Some other Sonicare teardowns are here, here and here. I would have expected different models to be based on similar electronics that just changed the LEDs, buttons and software. Surprisingly the different teardowns show a variety of microcontrollers, circuitry, and drive coils. Some models even move the magnets from the toothbrush unit to the brush head.

Unfortunately after disassembling my toothbrush I was unable to fix its problem. But at least I got an interesting teardown out of it!

To find out about my latest teardowns, follow kenshirriff on Twitter.

Notes and references

[1] It's ironic for a toothbrush to include Bluetooth technology because Bluetooth is named after Harald Bluetooth, a tenth century Danish king who was called Bluetooth because he had a bad, discolored tooth. The Bluetooth logo itself is formed by combining two runes from the king's name.

[2] The PIC microcontroller runs at 16 megahertz. It has 8K of flash memory for the program, as well as 512 bytes of RAM (the RAM on microcontrollers is usually very small) and 128 bytes of flash memory for data. It includes analog-to-digital conversion, which I think is used to monitor the charging voltage. The toothbrush's 8-bit microcontroller is less powerful than the 16-bit microcontroller inside a MacBook power supply.

[3] The LEDs are controlled by a 74HC595A serial to 8-bit output chip. The benefit of this chip is that the microcontroller would otherwise need 8 pins to control 8 LEDs, while it uses only 3 pins to communicate with the serial chip, freeing up 5 pins for other tasks.

[4] Programming of the chip is done using the ICSP protocol. This uses the programming contacts labeled Vdd, Vpp, Tx, and Ground, as well as the triangle contact, which provides the ICSP data. For some reason, the Tx and Rx circles are also connected to the chip's UART serial pins, allowing serial communication with the microcontroller. I'm not sure why one would want to communicate with the chip outside programming. Maybe there's serial communication with the microcontroller as part of testing. Or maybe the NSA can download information on your brushing habits :-)

[5] The resonator is a 3-pin unit with built-in load capacitors, similar to a quartz crystal oscillator. I suspect it's a CERALOCK®, or something similar.

[6] The H bridge uses a 6866S 20V dual N-channel MOSFET on the low side and a 6963SD 20V dual P-channel MOSFET on the high side.

[7] The charger circuit is puzzlingly simple. The voltage from the diode bridge goes through a microcontroller-controlled transistor (Q5) and then to the battery (through a tiny fuse), without the filtering, voltage regulator or battery voltage monitoring I'd expect. The microcontroller is connected to the AC side of the diode bridge, and presumably is monitoring the input voltage waveform.

Restoring a Xerox Alto day 7: experiments with disk and Ethernet emulators

In this Alto restoration session we controlled the Alto's disk drive with an FPGA disk emulator and attempted booting the Alto with a BeagleBone-based Ethernet emulator. The GIF below shows the drive performing seeks as commanded by the emulator. (With the cover off the Diablo drive, you can see the disk head floating above the spinning disk surface and moving back and forth for seeks.) However, both emulators encountered some bugs, which we will need to fix.

Looking inside the Diablo disk drive, you can see the head moving over the disk's surface as disk seeks take place. The green dial on the right rotates to indicate the current track.

The Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen. (For posts on previous restoration days see 1, 2, 3, 4, 5, 6 and 6 update.) Marc's YouTube video on Day 7 is below:

In our previous session, we discovered a faulty 7414 inverter chip on the disk interface card was preventing the disk from working: one of the six inverters on the chip had failed, preventing the disk sector task from running. Since we didn't have a 7414 lying around the house, we used a "dead bug" hack (below) to replace the bad inverter on the chip with an unused one, allowing us to access the disk. This session, we replaced the bad 7414 with a new one since we didn't want our hack to be permanent.

We re-wired a 7414 inverter chip. An unused inverter replaced the failed inverter.

Last week, I discovered that our boot disk had been overwritten with random data decades ago to test the drive (details). This made it impossible to boot off our disk, blocking our progress. Tim Curley from Xerox PARC offered me some disks from PARC's collection of dozens of old Alto disks (below). Some people were concerned, though, that the disks could get damaged in a boot attempt, losing their historical data. To avoid damage, we decided not to boot these disks until we're sure the Alto is working properly and we have them archived. Instead, Josh Dersch at the Living Computer Museum in Seattle is sending us a fresh boot disk with no historical significance. Unfortunately we didn't get the disk in time for today's session, but we'll try it out next session.

Some old Xerox Alto hard disks at PARC. I borrowed a couple of them and we'll try reading them later.

The disk emulator

Our test setup to exercise the Diablo disk drive (center) with the FPGA board (front). The oscilloscope shows the sector pulses (top, blue), clock (middle, green), and data (bottom, yellow). Four sectors are visible on the bottom trace. The Xerox Alto is behind the oscilloscope. On the right are the power supply and the laptop controlling the FPGA board.

Carl built a Diablo disk emulator / exerciser from an FPGA board. The idea is we can hook this up to the Diablo drive to read and archive disks. Then we can connect the emulator to the Alto and simulate multiple disk packs without physically handling disks. Building a disk emulator is complex because the drive itself implements very little functionality. It provides the raw bit stream as it is read off the disk, and the emulator needs to process this into bytes. In the photo above, the bottom oscilloscope trace shows several sectors as they are read from disk.

If you're not familiar with an FPGA (field-programmable gate array), it is a chip that can be programmed to generate custom hardware. The FPGA chip contains numerous logic blocks along with a switch matrix that allows them to be interconnected as desired. You describe the hardware configuration (gates, latches, and so forth) using a hardware description language such as Verilog and the chip is programmed to implement the desired circuitry.

The FPGA board for the emulator (below) is a Digilent Nexys 2 with a Xilinx Spartan-3E FPGA chip in the center of the board. This chip contains over ten thousand logic cells, allowing it to implement complex circuits. The FPGA board is connected to a prototyping board (right) with chips that shift the voltage levels to TTL as required by the Diablo drive. Carl's FPGA code generates the numerous signals required by the Diablo drive; in the photo below you can see the thick black cable going to the drive.

A Digilent FPGA board configured to control a Diablo disk drive.

We hooked up the FPGA board to the Diablo drive and tested it out. It communicated with the drive just fine and could read from different tracks. Unfortunately, the read data was zeros, which was surprising since the Alto successfully read from the disk last week. After some investigation, Carl found the problem was in the FPGA code that stored the data in RAM, not his code. (See his blog for details.) You'd think writing to RAM would be the easy part, but apparently not. The disk logic appears to work fine so hopefully next session we will be able to read and archive disks.

The Ethernet emulator

The Xerox Alto was the first system with Ethernet, introducing a lot of networking innovations. Unfortunately, it uses 3 Mb/second Ethernet over coaxial cable, which is incompatible with anything modern. I built an Ethernet emulator using a BeagleBone Black, allowing me to send Ethernet boot packets to the Alto. The photo below shows the BeagleBone, along with a chip (74AHCT125) to convert the BeagleBone's 3.3V signals to 5V TTL signals. (The Ethernet signals to and from the Alto are 5V TTL. These signals normally go to a transceiver, which converts these signals to signals over the network cable.) I'm using the BeagleBone's PRU microcontrollers to implement this code; I wrote a blog post with more about the PRUs.

A BeagleBone Black configured to emulate the 3Mb/s Ethernet on the Xerox Alto.

The emulator operates by converting a data block into the low-level signal required by Ethernet. A 0 bit is high-then-low and a 1 bit is low-then-high, with 170 nanosecond pulses. (Note that each data bit includes a transition (high-to-low or vice versa), which allows the receiver to detect bits and extract a clock signal.) My emulator almost worked; by using the logic analyzer, I saw the Ethernet microcode was running and the Alto was receiving data from my board. Unfortunately, there was about one bit error per word, making it unusable. The problem is probably interference due to the sketchy wiring I used; I'll try shielded wire next session.

Conclusion

This week we tried a Diablo disk emulator and an Ethernet emulator. They both partially worked, but still have some bugs. Next week we'll try booting the system with a new disk. I'm moderately optimistic that the system will come up successfully, but there could be more hardware problems waiting for us. For updates on the restoration, follow kenshirriff on Twitter.

Thanks to Josh Dersch and the Living Computer Museum for their debugging help and sending out a boot disk. Thanks to Tim Curley and Xerox PARC for supplying additional disks.

The discussion of this post on Hacker News is here.

How to run C programs on the BeagleBone's PRU microcontrollers

This article describes how to write C programs for the BeagleBone's microcontrollers. The BeagleBone Black is an inexpensive, credit-card sized computer that has two built-in microcontrollers called PRUs. By using the PRUs, you can implement real-time functionality that isn't possible in Linux. The PRU microcontrollers can be programmed in C using an IDE, which is much easier than low-level assembler programming. I recently wrote an article about the PRU microcontrollers, explaining how to program them in assembler and describing how they interact with the main ARM processor; so read it for more background. Warning: this post uses the 3.8.13-bone79 kernel; many things have changed since then.

A "blink" program in C

To motivate the discussion, I'll use a simple program that uses the PRU to flash an LED ten times. This example is based on the PRU GPIO example, but uses C instead of assembly code.

Blinking an LED using the BeagleBone's PRU microcontroller.

The C code, below, flashes the LED ten times. The LED is controlled by setting or clearing a bit in register R30, which controls the GPIO pins. The code demonstrates two ways of performing delays. The first delay uses a for loop, leaving the LED on for 400 ms. The second delay uses the special compiler function __delay_cycles(), which delays for the specified number of cycles. Since the PRUs run at 200 MHz, each cycle is 5 nanoseconds, yielding an off time of 300 ms. At the end, the code sends an interrupt to the host code via register R31 to let it know the PRU has finished.[1]

How to compile C programs with Code Composer Studio

Although you can compile C programs directly on the BeagleBone,[2] it's more convenient to use an IDE. Texas Instruments provides Code Composer Studio (CCS), an integrated development environment on Windows and Linux that you can use to compile C programs for the PRU.[3] To install CCS, use the following steps:

Download CCS here. (You'll need to create a TI account and then fill out an export approval form before downloading, which seems so 1990s but isn't too difficult.)
Follow the instructions here to make sure you have the necessary dependencies or CCS installation will mysteriously fail.
In the installer, select Sitara 32-bit ARM Processors: GCC ARM Compiler and TI ARM Compiler.
In the add-ons dialog, select PRU Compiler.
After installation, run CCS, select Help -> CCS App Center, and install the additional add-ons (i.e. the PRU compiler).

To create a C program in CCS, use the following steps. The image highlights the fields to update in the dialog.

Start CCS.
Click New Project.
Change target to AM3358.
Change tab to PRU.
Enter a project name, e.g. "test".
Open "Project templates and examples" and select "Basic PRU Project".
Click Finish.
Enter the code.

How to set up Code Composer Studio to compile a PRU program for the BeagleBone.

To set up the BeagleBone for the example:

Download the device tree file: /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dts.

Compile and install the device tree file to enable the PRU:

# dtc -O dtb -I dts -o /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dtbo -b 0 -@ PRU-GPIO-EXAMPLE-00A0.dts
# echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots
# cat /sys/devices/bone_capemgr.?/slots

Download the linker command file bin.cmd.
Download the host file that loads and runs the PRU code (loader.c) and compile it:
```
# gcc -o loader loader.c -lprussdrv
```

To compile and run the C program:

In CCS, select Project -> Build All (control-B) to compile the program.[4]
Copy the binary (test/Debug/test.out) to the BeagleBone (e.g. with scp).

On the BeagleBone, link and run the program:[5]

# hexpru bin.cmd test.out
# ./loader text.bin data.bin

If everything went correctly, the LED should flash. (See my previous article for debugging help.)

In this example, loader simply loads and runs the executable on the PRU.[6] In a more advanced application, it would communicate with the PRU. For example, it could get commands from a web page, send them to the PRU, get results, and display them on the web. The point is that you can use the Linux-side code to do complex network or computational tasks, in combination with the PRU doing low-level, real-time hardware operations. It's kind of like having an Arduino together with a "real computer", in a tiny package.

The BeagleBone Black is a tiny computer that fits inside an Altoids mint tin. It is powered by the TI Sitara™ AM3358 processor, the large square chip in the center.

Documentation

The PRUs are very complex and don't have nice APIs, so you'll probably end up reading a lot of documentation to use them. The most important document that describes the Sitara chip is the 5041-page Technical Reference Manual (TRM for short). This article references the TRM where appropriate, if you want more information. Information on the PRU is inconveniently split between the TRM and the AM335x PRU-ICSS Reference Guide. For specifics on the AM3358 chip used in the BeagleBone, see the 253-page datasheet. Texas Instruments has a PRU wiki with more information. More information on using CCS is here.

If you're looking to use the BeagleBone and/or PRU I highly recommend the detailed and informative book Exploring BeagleBone. Helpful web pages on the PRU include BeagleBone Black PRU: Hello World, Working with the PRU and BeagleBone PRU GPIO example. Some PRU example code is in the TI PRU training course.

The BeagleBone Black, with the AM3358 processor in the center. The 512MB DRAM chip is below, with the HDMI framer chip to the right of it. The 4GB flash chip is in the upper right.

Using a timer and interrupts

For a more complex example, I'll show how to use the PRU with a timer and interrupts.[7] The basic idea is the timer will trigger an interrupt at a set frequency. The PRU code in this example will toggle the GPIO pin when an interrupt occurs, generating a sequence of 5 pulses.[8]

It is important to understand that PRU interrupts are not "real" interrupts that interrupt execution, but are signaled through polling.[9] A PRU interrupt sets bit 30 or bit 31 in register R31.[10] The PRU code can busy-wait on this bit to determine if an interrupt has happened. This is fast and has very low latency compared to a context-switching interrupt, but it puts more demands on the program structure.

The first step is to add the plumbing for the timer's interrupt, so the PRU will receive the interrupt. The PRUs can handle 64 different interrupt types from various subcomponents of the system. The timer interrupt is assigned system event number 15 and has the cryptic name pr1_ecap_intr_req. (See TRM table 4-22.) Interrupts are configured in the host side code (loader.c) using the PRUSSDRV library API call prussdrv_pruintc_init. To support the timer interrupt, the default configuration must be extended. The diagram below shows the complex PRU interrupt configuration on the BeagleBone (details). The new interrupt path, highlighted in red, connects the timer interrupt (15) to CHANNEL0 and in turn to register R31, the register for polling.

Interrupt handling on the BeagleBone for the PRU microcontrollers. The timer interrupt (15) is shown in red. The default interrupt configuration is extended so the timer interrupt will trigger bit 30 of R31.

To add interrupt 15 to the configuration as shown above, the configuration struct in loader.c must be modified. The following structure is passed to prussdrv_pruintc_init to set up the interrupt handling. The changes are the two added entries for event 15: one in the list of enabled events and one mapping event 15 to CHANNEL0. Without this change, timer interrupts will be ignored and the example code will not work.

#define PRUSS_INTC_CUSTOM {   \
 { PRU0_PRU1_INTERRUPT, PRU1_PRU0_INTERRUPT, PRU0_ARM_INTERRUPT, PRU1_ARM_INTERRUPT, \
   ARM_PRU0_INTERRUPT, ARM_PRU1_INTERRUPT,  15, (char)-1  },  \
 { {PRU0_PRU1_INTERRUPT,CHANNEL1}, {PRU1_PRU0_INTERRUPT, CHANNEL0}, {PRU0_ARM_INTERRUPT,CHANNEL2}, {PRU1_ARM_INTERRUPT, CHANNEL3}, \
   {ARM_PRU0_INTERRUPT, CHANNEL0}, {ARM_PRU1_INTERRUPT, CHANNEL1}, {15, CHANNEL0}, {-1,-1}},  \
 {  {CHANNEL0,PRU0}, {CHANNEL1, PRU1}, {CHANNEL2, PRU_EVTOUT0}, {CHANNEL3, PRU_EVTOUT1}, {-1,-1} },  \
 (PRU0_HOSTEN_MASK | PRU1_HOSTEN_MASK | PRU_EVTOUT0_HOSTEN_MASK | PRU_EVTOUT1_HOSTEN_MASK) \
}

The second step to using the timer is to initialize the timer to create interrupts at the desired frequency, as shown in the following code. Using PRU features is fairly difficult since you are controlling them through low-level registers, not a convenient API, so you'll probably need to study TRM section 15.3 to fully understand this. The basic idea is the timer counts up by 1 every cycle (PWM mode is enabled in ECCTL2). When the counter reaches the value in the APRD (period) register, it resets and triggers a "compare equal" interrupt (as controlled by ECEINT). Thus, interrupts will be generated with the period specified by DELAY_NS.

inline void init_pwm() {
  *PRU_INTC_GER = 1; // Enable global interrupts
  *ECAP_APRD = DELAY_NS / 5 - 1; // Set the period in cycles of 5 ns
  *ECAP_ECCTL2 = (1<<9) /* APWM */ | (1<<4) /* counting */;
  *ECAP_TSCTR = 0; // Clear counter
  *ECAP_ECEINT = 0x80; // Enable compare equal interrupt
  *ECAP_ECCLR = 0xff; // Clear interrupt flags
}

The final step is to wait for the interrupt to happen with a busy-wait. The while loop polls register R31 until the timer interrupt fires and sets bit 30. Then the interrupt is cleared in the PRU interrupt subsystem and in the timer subsystem.

inline void wait_for_pwm_timer() {
  while (!(__R31 & (1 << 30))) {} // Wait for timer compare interrupt
  *PRU_INTC_SICR = 15; // Clear interrupt
  *ECAP_ECCLR = 0xff; // Clear interrupt flags
}

The oscilloscope trace below shows the result of the timer example program: five precision pulses with a width of 100 nanoseconds on and 100 nanoseconds off. The important advantage of using the PRU microcontroller rather than the regular ARM processor is the output is stable and free of jitter. You don't need to worry about nondeterminism such as context switches or cache misses. If your application won't be affected by milliseconds of random delay, the regular processor is much easier to program, but if you require precision timing, you should use the PRU.

Using the BeagleBone Black's PRU microcontroller to generate pulses with a width of 100 nanoseconds.

The full source code for the timer example is here.[11] To run the timer example, you'll also need to use the updated loader.c that enables interrupt 15 (or else nothing will happen).

Conclusion

The PRU microcontrollers give the BeagleBone real-time, deterministic processing, but with a substantial learning curve. Programming the PRUs in C using the IDE is much easier than programming in assembler. (And you can embed assembler code in C if necessary.)

Combining the BeagleBone's full Linux environment with the PRU microcontrollers yields a very powerful system since the microcontrollers provide low-level real-time control, while the main processor gives you network connectivity, web serving, and all the other power of a "real" computer. (My current project using the PRU is a 3 megabit/second Ethernet emulator/gateway to connect to a Xerox Alto.)

Notes and references

[1] Delivering the interrupt to the host code is more complex than you'd expect. I wrote a longer description here, explaining details such as how event 3 on the PRU turns into event 0 on the host.

[2] To compile a C program on the BeagleBone, use the clpru command. See this article for details on clpru.

[3] Code Composer Studio isn't available for Mac, but CCS works well if you run Linux on your Mac using Parallels. I also tried running Linux in VirtualBox, but ran into too many problems.

[4] If you want to see the assembly code generated by the C compiler, use the following steps:

Go to Project -> Properties
Select the configuration you're building (Debug or Release)
Check Advanced Options -> Assembler Options: Keep the generated assembly language file. This adds the --keep_asm flag to the compile.

The resulting assembly file will be in Debug/main.asm. Although the file is hundreds of lines long, the actual generated code is much shorter, starting a few dozen lines into the file. Comments indicate which source lines correspond to the assembly lines.

[5] The hexpru utility converts the ELF-format file generated by the compiler into a raw image file that can be loaded onto the PRU. The bin.cmd file holds the command-line options for hexpru. See the PRU Assembly Language Tools manual for details.

You can configure Code Composer Studio to run hexpru automatically as part of compilation. Follow the steps here to enable and configure the PRU Hex Utility.

[6] The loader.c code uses the PRU Linux Application Loader API (PRUSSDRV) to interact with the PRU. I'm told that the cool new framework is remoteproc, but I'll stick with PRUSSDRV for now. (There seems to be a great deal of churn in the BeagleBone world, with huge API changes in every kernel.)

[7] For a timer, I'll use the PRU's ECAP module, which can be configured for PWM and then used as a 32-bit timer. (Yes, this is confusing; see TRM section 15.3 for details.)

[8] This code is intended to demonstrate the timer, not show the best way to generate pulses. If you just want to generate pulses, use the PWM or even a simple delay loop.

[9] You might wonder why you'd use the PRU polling interrupts rather than just polling a device register directly. The reason is you can test the R31 register in one cycle, but reading a device register takes about 3 or 4 cycles (read latency details).

[10] The library uses the convention that PRU0 polls on bit 30 and PRU1 polls on bit 31, but this is arbitrary. You could use both bits to signal one PRU, for instance.

[11] One complexity in the timer source code is the need to define all the register addresses. To figure out a register address, find the address of the register block in the PRU Local Data Memory Map (TRM 4.3.1.2). Then add the offset of the register (TRM 4.5). Note that you can also access these registers from the Linux host side, but the addresses are different. (The PRU is mapped into the host's address space starting at 0x4a300000, TRM table 2.4.)

Restoring YC's Xerox Alto: how our boot disk was trashed with random data

In the previous Xerox Alto restoration session, we got the disk working, but the system didn't boot. After much investigation, I discovered the explanation for the boot failure: the disk has been overwritten with random data! This article describes my journey through the Alto microcode to determine what happened.

Inserting a disk into the Xerox Alto's disk drive. The Alto's video display is visible at the back.

For background, the Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore it, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen (from the IBM 1401 restoration team). For posts on previous restoration days see 1, 2, 3, 4, 5 and 6.

Debugging the boot failure

Last session, after fixing a broken 7414 TTL chip on the disk interface board, we could fetch a block from disk but the Alto failed to boot. We used a logic analyzer to trace the microcode instructions and the ALU bus contents. Josh Dersch from the Living Computer Museum studied the traces and found that the boot program was executing a few instructions (jump, add, load), and then seemed to go off the rails. But it turns out things were more messed up than that.

I made a microcode trace browser to help figure out what was going on. With this program, I can step through an execution trace one micro-instruction at a time and see the corresponding source code line. (Click the image below for the live trace browser.) First, I examined the KWD (disk word task), which executes for each word from disk, and copies that word to memory. I verified that the disk read was working as expected. The second task of interest is the NOVEM (Nova emulator task), which runs a program. In our case, it runs the boot program as soon as it is loaded from disk. By examining this task, we can figure out what is going wrong with the boot process.

Xerox Alto microcode trace viewer. With the viewer, you can step through the execution trace collected by the logic analyzer and see each source code line as it is executed. The buttons on the right indicate which microcode task is running at each step.

By studying the disk read microcode (KWD) closely, I was able to extract each word in the disk sector from the logic analyzer trace. This was very difficult for many reasons. For example, we logged the ALU bus which doesn't have the words from disk. I had to figure out the disk contents by reversing the checksum computation, which was on the ALU bus. Another problem was the Alto stores sectors on disk backwards. But eventually I extracted the contents of the boot sector, as read into the Alto:

16a5 2d4a 5a94 b528 14db 29b6 536c a6d8
333b 6676 ccec e753 b02d 1ed1 3da2 7b44
...

I hand-disassembled these words into Data General Nova assembler code and discovered a few things. First, the first few instructions matched Josh's interpretation, so the CPU and the emulator task seemed to be working correctly. Second, the instructions didn't make any sense as code, and some words weren't even instructions, which explained why the boot rapidly fell apart. Third, and most puzzling, the instructions were nothing like what the Alto boot code was supposed to be.

Backplane of the Xerox Alto wired with logic analyzer probes. These probes monitor the executing micro-instructions and the contents of the ALU bus.

The boot block seemed to contain random junk. The problem wasn't flaky hardware generating bad data, because the block checksum validated correctly. This wasn't the drive returning the wrong sector, because the sector header was correct. The sector didn't contain instructions, it wasn't ASCII, and it didn't look like a sensible file format. As I studied the sector contents more, I wondered if the data was literally random. I made a histogram of how many times each byte value occurred, and it was pretty much uniform. (In comparison, archived Alto disk sectors showed very non-uniform distributions.) But why would the boot block have been overwritten with (pseudo-) random data?
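The histogram check can be sketched in a few lines of Python. This is an illustration rather than the original analysis code: the function names are hypothetical, the chi-square statistic is one possible way to put a number on "pretty much uniform", and the sample words are just the first 16 words of the dump above.

```python
from collections import Counter

def byte_histogram(words):
    """Count how often each byte value occurs in a list of 16-bit words."""
    counts = Counter()
    for w in words:
        counts[(w >> 8) & 0xFF] += 1  # high byte
        counts[w & 0xFF] += 1         # low byte
    return counts

def chi_square_uniform(counts, n_bins=256):
    """Chi-square statistic against a perfectly uniform byte distribution.

    Small values mean the histogram is close to flat; large values mean
    some byte values are strongly over- or under-represented.
    """
    total = sum(counts.values())
    expected = total / n_bins
    return sum((counts.get(b, 0) - expected) ** 2 / expected
               for b in range(n_bins))

# First 16 words of the boot sector dump (a full sector would be needed
# for a meaningful statistic; 32 bytes is far too few).
sample = [0x16a5, 0x2d4a, 0x5a94, 0xb528, 0x14db, 0x29b6, 0x536c, 0xa6d8,
          0x333b, 0x6676, 0xccec, 0xe753, 0xb02d, 0x1ed1, 0x3da2, 0x7b44]
histogram = byte_histogram(sample)
```

Text or code sectors would show tall spikes at a few byte values, while random data spreads the counts roughly evenly across all 256 values.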

Josh mentioned DiEx (Diablo disk exerciser), a utility program to diagnose problems with the Alto's Diablo disk drive, and suggested that it could have wiped the disk. I found the DiEx source code in the Computer History Museum's Alto archive, and sure enough, it has a feature to write random data to the disk (and then verify it).

Screenshot of the Diablo Disk Exerciser (DiEx) running on a Xerox Alto simulator. Note the early mouse-based GUI; clicking on an entry changes the value. Image courtesy of Nathan Lineback.

I could believe someone had inconveniently wiped our disk with the DiEx utility, but I still had nagging doubts that maybe we were seeing a hardware issue. Could I prove that DiEx was responsible? All I had to do was show that the disk data wasn't arbitrary, but came from DiEx.

Generating random numbers on the Alto

I found the source code for RANDOM.ASM, the Alto's random number code, in the Computer History Museum's Alto archive. This algorithm generates 16-bit random numbers with the recurrence formula: "x[n] = (x[n-33] + x[n-13]) mod 2^16". (Note that these are very bad random numbers cryptographically, since once you have 33 numbers in the sequence you can generate them all.) I wanted to see if the data we read from disk was generated from this function, so I coded up the algorithm. This was somewhat difficult as the original was written in Nova assembler code. The results didn't match the disk data, no matter what I tried. Finally, I realized that I could just use a brute force solution and ignore the details of the algorithm. I picked random pairs of values in the data and checked if their sum appeared in the data. If the data came from any sort of recurrence, I would get a bunch of matches, but I didn't. I concluded that the disk data wasn't generated from this random number algorithm.
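For reference, here is a minimal Python sketch of the recurrence and of a brute-force sum test. It is written from the formula above, not translated from RANDOM.ASM, so the seeding is an assumption (the seed below is arbitrary) and the function names are hypothetical. The test here checks every pair of words rather than a random sample of pairs.

```python
from collections import deque
from itertools import combinations, islice

MASK = 0xFFFF  # keep results to 16 bits, i.e. mod 2**16

def additive_generator(seed_words):
    """Yield x[n] = (x[n-33] + x[n-13]) mod 2**16, given 33 seed words."""
    seed_words = list(seed_words)
    if len(seed_words) != 33:
        raise ValueError("need exactly 33 seed words")
    state = deque(seed_words, maxlen=33)  # holds x[n-33] .. x[n-1]
    while True:
        value = (state[-33] + state[-13]) & MASK
        state.append(value)  # the oldest word drops off the front
        yield value

def count_pair_sum_hits(words):
    """Count pairs of words whose 16-bit sum also appears in the data."""
    present = set(words)
    return sum(1 for a, b in combinations(words, 2)
               if ((a + b) & MASK) in present)

# Arbitrary illustrative seed, not the Alto's actual seed values.
seed = list(range(1, 34))
sequence = seed + list(islice(additive_generator(seed), 100))
hits = count_pair_sum_hits(sequence)
```

Some pairs will match purely by chance, so the hit count only means something when compared against the count for data known not to come from an additive recurrence.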

However, on closer examination I noticed that the RANDOM.ASM function signature didn't match the DiEx code, so it probably wasn't the right function. After more searching I found TriexML.asm, another Alto random number function. To generate a random 16-bit word, this algorithm simply shifts the previous value one bit to the left. If there is an overflow, the result is xor'd with the number 077213. (It would be hard to come up with a cryptographically worse random number generator—from one number you can generate the whole sequence—but the algorithm is very fast.)
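A minimal Python sketch of this shift-and-xor step is below. It is reconstructed from the description, not from TriexML.asm itself: it assumes "overflow" means the top bit of the 16-bit word is shifted out, and the names are hypothetical.

```python
MASK = 0xFFFF        # 16-bit words
FEEDBACK = 0o77213   # octal constant from the description (0x7E8B)

def next_word(x):
    """Shift left one bit; if a 1 fell off the top, xor in the constant."""
    shifted = (x << 1) & MASK
    if x & 0x8000:   # top bit set, so the shift overflows
        shifted ^= FEEDBACK
    return shifted

def generate(start, count):
    """Return the `count` words that follow `start` in the sequence."""
    words = []
    x = start
    for _ in range(count):
        x = next_word(x)
        words.append(x)
    return words

# Starting from the first word of the sector dump above:
following = generate(0x16a5, 4)
# following == [0x2d4a, 0x5a94, 0xb528, 0x14db]
```

The first three steps are plain shifts. The fourth overflows (0xb528 has its top bit set), so the constant is xor'd in, giving 0x14db, which is the fifth word in the dump.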

To check the disk contents against this algorithm, I skipped the careful implementation and went straight to brute force. To see if any shift-and-xor algorithm would explain our data, I shifted each word from the disk sector and xor'd it with the next one. In each case, I got either 0 or octal 077213, matching the algorithm. Starting the algorithm with 012345 (the seed value in the code) eventually generates the exact sector of data we read, proving this algorithm generated the random data we saw on the disk.
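The brute-force check is short enough to sketch in Python. This is an illustration with hypothetical names, run on the 16 words of the dump shown earlier rather than the full sector.

```python
FEEDBACK = 0o77213  # octal constant from the generator (0x7E8B)

# First 16 words of the boot sector, from the dump above.
SECTOR_WORDS = [
    0x16a5, 0x2d4a, 0x5a94, 0xb528, 0x14db, 0x29b6, 0x536c, 0xa6d8,
    0x333b, 0x6676, 0xccec, 0xe753, 0xb02d, 0x1ed1, 0x3da2, 0x7b44,
]

def shift_xor_residues(words):
    """Shift each word left one bit (16-bit) and xor it with its successor.

    For shift-and-xor data, every residue is either 0 (no overflow) or
    the generator's feedback constant (overflow).
    """
    return [((prev << 1) & 0xFFFF) ^ cur
            for prev, cur in zip(words, words[1:])]

residues = shift_xor_residues(SECTOR_WORDS)
consistent = set(residues) <= {0, FEEDBACK}
# consistent is True for these 16 words
```

The useful property of this test is that it doesn't need the constant in advance: if the residues had clustered on some other single nonzero value, that would have pointed to a shift-and-xor generator with a different constant.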

A few of the old Xerox Alto disks in Xerox PARC's collection. Hopefully they haven't been overwritten with junk.

Thus, someone had clobbered our disk (probably decades ago) while testing the drive with DiEx. Since we couldn't boot off this disk, we'd need a new boot disk. Xerox PARC has dozens of old Alto disks lying around and they offered some of them to us. But the Living Computer Museum offered to send us a working Alto disk, rather than risk damage to the potentially-interesting contents of an old PARC disk, so we'll use the LCM disk instead.

Conclusion

Last repair session, we fixed a failed 7414 inverter chip on the disk interface board. With that fixed, we could read the disk but boot still failed. After careful investigation of the microcode and traces, I discovered that our disk had been overwritten with random data, making it impossible to boot from it. In one way this is a good result, since it means our boot wasn't failing because of a hardware problem.

When we get a new Alto disk, we'll try booting again. I'm moderately optimistic that the system will come up successfully, but there could be more hardware problems waiting for us. For updates on the restoration, follow kenshirriff on Twitter.

Thanks to Josh Dersch and the Living Computer Museum for their debugging help. Thanks to Tim Curley and Xerox PARC for supplying additional disks.