Ken Shirriff's blog

Sonicare toothbrush teardown: microcontroller, H bridge, and inductive charging

My Sonicare electric toothbrush recently quit working, so I took it apart and examined the interesting circuitry inside. There's much more complexity than I expected inside a toothbrush, especially in the mechanism that drives the brush head at 31,000 strokes per minute. Internally, the brush appears to be designed for quality rather than ease of manufacturing. Unfortunately, moisture can get in, causing reliability problems.

The toothbrush is a Sonicare Flexcare Platinum with more features than you'd expect in a toothbrush: three brushing modes, three intensities and a couple timers, along with 10 LEDs to indicate its status. A pressure sensor in the toothbrush changes the vibration if you apply too much pressure while brushing. The toothbrush uses wireless inductive charging so it charges when set on the base. (This toothbrush may seem overly complicated, but it's nothing compared to the new model that includes Bluetooth.)[1]

Disassembling the Sonicare toothbrush. At the left is the induction coil used for charging.

The first step was to remove the toothbrush base, allowing the toothbrush mechanism to be removed from the case. The toothbrush head mounts on the right; it needed to be removed to disassemble the toothbrush. At the left is the charging coil used to wirelessly charge the toothbrush.

The photos below show the top and bottom of the toothbrush internals. I expected to find a simple, low-cost mechanism, so I was surprised at how much complexity there was inside. The vibration mechanism (right) is built from multiple metal and plastic parts screwed together, requiring more expensive assembly than I expected. The circuit board is literally gold-plated and has a lot of components, even if it doesn't quite reach Apple's level of complexity. Overall, the toothbrush's internal design is high quality (except, of course, for the fact that it quit working, as did an earlier one).

Inside the Sonicare toothbrush, top and bottom composite view. The charging coil is at the left. The battery (red) is in the lower left. The coil that vibrates the brush is in the center and the brushing mechanism is at the right.

The brush contains several key components, as can be seen above. In the center is the large red coil that causes the toothbrush to vibrate. On the right is the vibration mechanism, which has a powerful magnet that is moved by the coil. The brush head snaps on at the right. The battery (red, left) takes up about a third of the toothbrush. The long, thin circuit board (green) has the circuitry to operate the toothbrush. A white spacer sits on top of the circuit board, with holes for the LEDs and buttons.

The photo below shows the brush mechanism partially disassembled and separated from the electronics. The toothbrush still powers on in this state, as you can see from the illuminated LEDs. Note the flexible brown ribbon cable between the center of the brush mechanism and the electronics board. This connects the pressure sensor on the brush mechanism to the electronics board.

The brush mechanism (left) separated from the electronics (right). Note the illuminated LEDs. Also note the flexible brown ribbon connecting the pressure sensor to the electronics board.

The diagram below shows the main components on the circuit board. The buttons are the most visible feature. The gold circles at the left are used to program the microcontroller. The MOSFET transistors switch the coil on and off to produce vibrations. Ten LEDs are scattered across the board. At the right, the diode bridge is part of the charging circuit.

The circuit board for the Sonicare toothbrush is crammed with tiny parts. The gold circles on the left are used to program the microcontroller chip. The tiny gold circles scattered across the board are test points for testing the board during manufacturing.

The circuit board is covered with tiny gold circles. These are test points, allowing test connections to most parts of the board. For instance, each LED and each button has a test point that can be used to test the component. During testing, spring-loaded pogo pins on the test circuit make contact with these test points on the toothbrush board. The number of test points (about 56) looks like overkill to me.

The diagram below shows the components on the back of the circuit board. The toothbrush is controlled by a mid-range 8-bit microcontroller, the PIC16F1516.[2] This chip contains the code for all the toothbrush functions: reading the buttons, lighting the LEDs, controlling the coil, and managing charging. There are too many LEDs (10) for the chip to control individually, so eight of the LEDs are controlled by a separate LED driver chip.[3]

The back of the Sonicare circuit board contains the PIC16F1516 microcontroller chip. The sensor is probably a Hall-effect magnetic field sensor.

The microcontroller is an off-the-shelf part, not a custom chip, so it needs to be programmed with the right software. This is done during manufacturing through the large gold circles and triangle near the end of the toothbrush.[4] The resonator provides the clock signal for the microcontroller's timing.[5]

The driver mechanism and the H bridge circuit

The toothbrush head is driven by an electromagnetic coil that moves a magnet. The coil has two halves, wired in opposite directions, so the sides will have opposite magnetic fields. The coil is pulsed one way to rotate the magnet one direction, and then pulsed the opposite way to rotate the magnet the other direction. The result is the high-speed brushing vibration.

The diagram below shows the driver mechanism disassembled. The coil constantly switches polarity so the north pole will switch from the top to bottom (the yellow and blue poles of the coil). The magnet has poles on the front and back edges (perpendicular to the coils), so it will attempt to rotate back and forth to line up with the coil, along the long axis of the toothbrush. The mechanism limits the rotation to a few degrees, resulting in a rotational vibration back and forth rather than spinning like a motor. This rotational vibration is transmitted to the toothbrush head by the torsion bar causing the head and bristles to vibrate. More details on the driver mechanism are here.

Sonicare toothbrush driver mechanism. As the polarity of the coil switches, the magnet rotates back and forth slightly. The torsion bar transmits the rotation to the shaft, which causes the toothbrush head to vibrate around its axis.

The figure below shows the voltage across the coil. Every 2 milliseconds, there is a 4 volt pulse across the coil, followed by a negative 4 volt pulse. The pulses generate the reversing magnetic field that drives the magnet and causes the toothbrush to vibrate. If you count the positive and negative pulses as separate brush strokes, you get the advertised 31,000 brush strokes per minute. (Although counting an up-down cycle as a single stroke rather than two would make more sense to me.)

Voltage across the actuator coil in a Sonicare toothbrush. An H bridge drives the coil with +/- 4 volt pulses every 2 milliseconds.
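
As a rough check on the stroke arithmetic, the sketch below converts an advertised stroke rate into the time between pulses, assuming every positive or negative pulse counts as one stroke. The function names are made up for this example.

```c
#include <assert.h>

/* Time between successive pulses in microseconds, assuming each pulse
 * (positive or negative) counts as one brush stroke. */
double pulse_interval_us(double strokes_per_minute)
{
    assert(strokes_per_minute > 0.0);
    return 60.0 * 1e6 / strokes_per_minute;
}

/* Frequency in Hz of a full cycle (one positive plus one negative pulse). */
double cycle_frequency_hz(double strokes_per_minute)
{
    assert(strokes_per_minute > 0.0);
    return strokes_per_minute / 60.0 / 2.0;
}
```

For 31,000 strokes per minute this gives about 1.9 ms between pulses and a full-cycle rate of about 258 Hz, which matches the trace if the 2 ms figure is read as the spacing between successive pulses.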

You might think that driving a coil in two directions would use two switches, but instead it uses four, in a common circuit called an H bridge, as shown below. If switches 1 and 4 are closed, current flows in the forward direction. If switches 2 and 3 are closed, current flows in the reverse direction. In the toothbrush, transistors are used for the switches, and are turned on and off by the microcontroller.[6] An H bridge is often used to control motors that need to go forwards and reverse, for example in a hoverboard.

An H bridge circuit is used to drive the vibration coil. This allows the coil to be off or energized in either direction. Four switches (MOSFET transistors) are used in the H bridge.
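
The switch combinations can be sketched in C as follows. This is only an illustration of the switch numbering described above (1 and 4 for forward, 2 and 3 for reverse), not the toothbrush's firmware; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { COIL_OFF, COIL_FORWARD, COIL_REVERSE } coil_drive;

typedef struct {
    bool s1, s2, s3, s4;   /* true = switch closed (transistor on) */
} hbridge_state;

/* Choose which of the four switches to close for the requested drive. */
hbridge_state hbridge_set(coil_drive drive)
{
    hbridge_state s = { false, false, false, false };
    switch (drive) {
    case COIL_FORWARD: s.s1 = true; s.s4 = true; break;
    case COIL_REVERSE: s.s2 = true; s.s3 = true; break;
    case COIL_OFF:     break;   /* all switches open */
    }
    return s;
}

/* +1 for forward current, -1 for reverse, 0 for off or any other
 * combination of switches. */
int coil_polarity(hbridge_state s)
{
    if (s.s1 && s.s4 && !s.s2 && !s.s3) return 1;
    if (s.s2 && s.s3 && !s.s1 && !s.s4) return -1;
    return 0;
}
```

Alternating between COIL_FORWARD and COIL_REVERSE, with COIL_OFF in between, would produce the positive and negative pulses seen in the coil voltage trace.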

Pressure sensor

One of the features of this toothbrush is a pressure sensor. If you press too hard while brushing, the vibrations start pulsing and the LEDs flash. The sensor itself is a tiny mystery chip (below) mounted on the drive assembly, and connected to the electronics board with a thin flexible cable. The cable is labeled with Vdd (1), Data (2), Clock (3), and Ground (4), so the sensor is probably sending a stream of bits using an I2C protocol. My suspicion is the sensor is a Hall effect magnetic field sensor that detects a change in the magnetic field if pressure is preventing the magnet from vibrating. The chip doesn't seem to be in a position to measure actual pressure, which is why I suspect it's measuring the magnetic field instead.

The pressure sensor on the toothbrush is connected to the electronics via a flexible cable. The sensor is probably a Hall effect magnetic sensor using the I2C protocol.

Charging

To charge the toothbrush, it is set on a stand and charges inductively without physically being plugged in. A coil in the stand is magnetically coupled to a coil in the toothbrush, transmitting the power wirelessly. You can see the coil at the bottom of the toothbrush. When set on the stand, the coil picks up about 12 volts, which is used to charge the battery. The power is transmitted at high frequency (80kHz) for efficiency.

The coil is connected to a diode bridge that converts the power to DC. It then goes through a transistor circuit that regulates the charging, as directed by the microcontroller. The battery in the toothbrush is a Sanyo Li-ion rechargeable battery, which is said to be 3.7V but I measured 4.0V.[7]

Voltage across the charging coil in a Sonicare toothbrush oscillates at about 80kHz.

The toothbrush is designed to conserve battery by using very little power when not in use. The microcontroller has a low power standby mode when it is waiting for a button press. When the toothbrush is activated, a transistor energizes the LEDs and the LED driver chip, while another circuit powers up the pressure sensor. This prevents these components from draining the battery while the toothbrush is not in use.

Conclusion

Overall, I was surprised by how much electronics was inside the toothbrush, as well as the complexity of the drive mechanism. It was designed with quality in mind, not low-cost production. Unfortunately, the brush has reliability issues—this was the second one to fail on me. The problem appears to be water seeping in around the shaft, eventually damaging the internals.

Some other Sonicare teardowns are here, here and here. I would have expected different models to be based on similar electronics that just changed the LEDs, buttons and software. Surprisingly the different teardowns show a variety of microcontrollers, circuitry, and drive coils. Some models even move the magnets from the toothbrush unit to the brush head.

Unfortunately after disassembling my toothbrush I was unable to fix its problem. But at least I got an interesting teardown out of it!

To find out about my latest teardowns, follow kenshirriff on Twitter.

Notes and references

[1] It's ironic for a toothbrush to include Bluetooth technology because Bluetooth is named after Harald Bluetooth, a tenth century Danish king who was called Bluetooth because he had a bad, discolored tooth. The Bluetooth logo itself is formed by combining two runes from the king's name.

[2] The PIC microcontroller runs at 16 megahertz. It has 8K of flash memory for the program, as well as 512 bytes of RAM (the RAM on microcontrollers is usually very small) and 128 bytes of flash memory for data. It includes analog-to-digital conversion, which I think is used to monitor the charging voltage. The toothbrush's 8-bit microcontroller is less powerful than the 16-bit microcontroller inside a Macbook power supply.

[3] The LEDs are controlled by a 74HC595A serial to 8-bit output chip. The benefit of this chip is that the microcontroller would otherwise need 8 pins to control 8 LEDs, while it only uses 3 pins to communicate with the serial chip, freeing up 5 pins for other tasks.

[4] Programming of the chip is done using the ICSP protocol. This uses the programming contacts labeled Vdd, Vpp, Tx, and Ground, as well as the triangle contact, which provides the ICSP data. For some reason, the Tx and Rx circles are also connected to the chip's UART serial pins, allowing serial communication with the microcontroller. I'm not sure why one would want to communicate with the chip outside programming. Maybe there's serial communication with the microcontroller as part of testing. Or maybe the NSA can download information on your brushing habits :-)

[5] The resonator is a 3-pin unit with built-in load capacitors, similar to a quartz crystal oscillator. I suspect it's a CERALOCK®, or something similar.

[6] The H bridge uses a 6866S 20V dual N-channel MOSFET on the low side and a 6963SD 20V dual P-channel MOSFET on the high side.

[7] The charger circuit is puzzlingly simple. The voltage from the diode bridge goes through a microcontroller-controlled transistor (Q5) and then to the battery (through a tiny fuse), without the filtering, voltage regulator or battery voltage monitoring I'd expect. The microcontroller is connected to the AC side of the diode bridge, and presumably is monitoring the input voltage waveform.

Restoring a Xerox Alto day 7: experiments with disk and Ethernet emulators

In this Alto restoration session we controlled the Alto's disk drive with an FPGA disk emulator and attempted booting the Alto with a BeagleBone-based Ethernet emulator. The GIF below shows the drive performing seeks as commanded by the emulator. (With the cover off the Diablo drive, you can see the disk head floating above the spinning disk surface and moving back and forth for seeks.) However, both emulators encountered some bugs, which we will need to fix.

Looking inside the Diablo disk drive, you can see the head moving over the disk's surface as disk seeks take place. The green dial on the right rotates to indicate the current track.

The Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen. (For posts on previous restoration days see 1, 2, 3, 4, 5, 6 and 6 update.) Marc's YouTube video on Day 7 is below:

In our previous session, we discovered a faulty 7414 inverter chip on the disk interface card was preventing the disk from working: one of the six inverters on the chip had failed, preventing the disk sector task from running. Since we didn't have a 7414 lying around the house, we used a "dead bug" hack (below) to replace the bad inverter on the chip with an unused one, allowing us to access the disk. This session, we replaced the bad 7414 with a new one since we didn't want our hack to be permanent.

We re-wired a 7414 inverter chip. An unused inverter replaced the failed inverter.

Last week, I discovered that our boot disk had been overwritten with random data decades ago to test the drive (details). This made it impossible to boot off our disk, blocking our progress. Tim Curley from Xerox PARC offered me some disks from PARC's collection of dozens of old Alto disks (below). Some people were concerned, though, that the disks could get damaged in a boot attempt, losing their historical data. To avoid damage, we decided not to boot these disks until we're sure the Alto is working properly and we have them archived. Instead, Josh Dersch at the Living Computer Museum in Seattle is sending us a fresh boot disk with no historical significance. Unfortunately we didn't get the disk in time for today's session, but we'll try it out next session.

Some old Xerox Alto hard disks at PARC. I borrowed a couple of them and we'll try reading them later.

The disk emulator

Our test setup to exercise the Diablo disk drive (center) with the FPGA board (front). The oscilloscope shows the sector pulses (top, blue), clock (middle, green), and data (bottom, yellow). Four sectors are visible on the bottom trace. The Xerox Alto is behind the oscilloscope. On the right are the power supply and the laptop controlling the FPGA board.

Carl built a Diablo disk emulator / exerciser from an FPGA board. The idea is we can hook this up to the Diablo drive to read and archive disks. Then we can connect the emulator to the Alto and simulate multiple disk packs without physically handling disks. Building a disk emulator is complex because the drive itself implements very little functionality. It provides the raw bit stream as it is read off the disk, and the emulator needs to process this into bytes. In the photo above, the bottom oscilloscope trace shows several sectors as they are read from disk.

If you're not familiar with an FPGA (field-programmable gate array), it is a chip that can be programmed to generate custom hardware. The FPGA chip contains numerous logic blocks along with a switch matrix that allows them to be interconnected as desired. You describe the hardware configuration (gates, latches, and so forth) using a hardware description language such as Verilog and the chip is programmed to implement the desired circuitry.

The FPGA board for the emulator (below) is a Digilent Nexys 2 with a Xilinx Spartan-3E FPGA chip in the center of the board. This chip contains over ten thousand logic cells, allowing it to implement complex circuits. The FPGA board is connected to a prototyping board (right) with chips that shift the voltage levels to TTL as required by the Diablo drive. Carl's FPGA code generates the numerous signals required by the Diablo drive; in the photo below you can see the thick black cable going to the drive.

A Digilent FPGA board configured to control a Diablo disk drive.

We hooked up the FPGA board to the Diablo drive and tested it out. It communicated with the drive just fine and could read from different tracks. Unfortunately, the read data was zeros, which was surprising since the Alto successfully read from the disk last week. After some investigation, Carl found the problem was in the FPGA code that stored the data in RAM, not his code. (See his blog for details.) You'd think writing to RAM would be the easy part, but apparently not. The disk logic appears to work fine so hopefully next session we will be able to read and archive disks.

The Ethernet emulator

The Xerox Alto was the first system with Ethernet, introducing a lot of networking innovations. Unfortunately, it uses 3 Mb/second Ethernet over coaxial cable, which is incompatible with anything modern. I built an Ethernet emulator using a BeagleBone Black, allowing me to send Ethernet boot packets to the Alto. The photo below shows the BeagleBone, along with a chip (74AHCT125) to convert the BeagleBone's 3.3V signals to 5V TTL signals. (The Ethernet signals to and from the Alto are 5V TTL. These signals normally go to a transceiver, which converts these signals to signals over the network cable.) I'm using the BeagleBone's PRU microcontrollers to implement this code; I wrote a blog post with more about the PRUs.

A BeagleBone Black configured to emulate the 3Mb/s Ethernet on the Xerox Alto.

The emulator operates by converting a data block into the low-level signal required by Ethernet. A 0 bit is high-then-low and a 1 bit is low-then-high, with 170 nanosecond pulses. (Note that each data bit includes a transition (high-to-low or vice versa), which allows the receiver to detect bits and extract a clock signal.) My emulator almost worked; by using the logic analyzer, I saw the Ethernet microcode was running and the Alto was receiving data from my board. Unfortunately, there was about one bit error per word, making it unusable. The problem is probably interference due to the sketchy wiring I used; I'll try shielded wire next session.
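
The bit encoding can be sketched in C as below. This is not my PRU code, just a minimal illustration of the rule above: each data bit becomes two half-bit levels (170 ns each on the wire), with 0 sent as high-then-low and 1 as low-then-high. The function name and the most-significant-bit-first order are assumptions for the example, and it doesn't attempt packet framing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand data bytes into half-bit line levels (1 = high, 0 = low).
 * A 0 bit becomes high,low and a 1 bit becomes low,high.
 * Bits are taken most significant first (an assumption).
 * Returns the number of levels written, or 0 if `levels` is too small. */
size_t manchester_encode(const uint8_t *data, size_t nbytes,
                         uint8_t *levels, size_t capacity)
{
    assert(data != NULL && levels != NULL);
    if (nbytes > capacity / 16) {
        return 0;   /* need 16 half-bit levels per byte */
    }
    size_t n = 0;
    for (size_t i = 0; i < nbytes; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            int b = (data[i] >> bit) & 1;
            levels[n++] = b ? 0 : 1;   /* first half of the bit cell */
            levels[n++] = b ? 1 : 0;   /* second half: always a transition */
        }
    }
    return n;
}
```

Because the two halves of every bit cell differ, there is a transition in the middle of each bit, which is what lets a receiver recover the clock.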

Conclusion

This week we tried a Diablo disk emulator and an Ethernet emulator. They both partially worked, but still have some bugs. Next week we'll try booting the system with a new disk. I'm moderately optimistic that the system will come up successfully, but there could be more hardware problems waiting for us. For updates on the restoration, follow kenshirriff on Twitter.

Thanks to Josh Dersch and the Living Computer Museum for their debugging help and sending out a boot disk. Thanks to Tim Curley and Xerox PARC for supplying additional disks.

The discussion of this post on Hacker News is here.

How to run C programs on the BeagleBone's PRU microcontrollers

This article describes how to write C programs for the BeagleBone's microcontrollers. The BeagleBone Black is an inexpensive, credit-card sized computer that has two built-in microcontrollers called PRUs. By using the PRUs, you can implement real-time functionality that isn't possible in Linux. The PRU microcontrollers can be programmed in C using an IDE, which is much easier than low-level assembler programming. I recently wrote an article about the PRU microcontrollers, explaining how to program them in assembler and describing how they interact with the main ARM processor; so read it for more background. Warning: this post uses the 3.8.13-bone79 kernel; many things have changed since then.

A "blink" program in C

To motivate the discussion, I'll use a simple program that uses the PRU to flash an LED ten times. This example is based on the PRU GPIO example, but uses C instead of assembly code.

Blinking an LED using the BeagleBone's PRU microcontroller.

The C code, below, flashes the LED ten times. The LED is controlled by setting or clearing a bit in register R30, which controls the GPIO pins. The code demonstrates two ways of performing delays. The first delay uses a for loop, leaving the LED on for 400 ms. The second delay uses the special compiler function __delay_cycles(), which delays for the specified number of cycles. Since the PRUs run at 200 MHz, each cycle is 5 nanoseconds, yielding an off time of 300 ms. At the end, the code sends an interrupt to the host code via register R31 to let it know the PRU has finished.[1]
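
To make the cycle arithmetic concrete, here's a small helper that converts milliseconds into PRU cycles, assuming the 200 MHz clock mentioned above. It is ordinary C for working out the number by hand or on the host, not part of the blink program, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PRU_CLOCK_HZ 200000000u   /* 200 MHz, so one cycle is 5 ns */

/* Number of PRU cycles in the given number of milliseconds. */
uint32_t ms_to_pru_cycles(uint32_t ms)
{
    const uint32_t cycles_per_ms = PRU_CLOCK_HZ / 1000u;   /* 200,000 */
    assert(ms <= UINT32_MAX / cycles_per_ms);   /* result must fit in 32 bits */
    return ms * cycles_per_ms;
}
```

A 300 ms delay works out to 60,000,000 cycles, which is the kind of value you would pass to __delay_cycles().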

How to compile C programs with Code Composer Studio

Although you can compile C programs directly on the BeagleBone,[2] it's more convenient to use an IDE. Texas Instruments provides Code Composer Studio (CCS), an integrated development environment on Windows and Linux that you can use to compile C programs for the PRU.[3] To install CCS, use the following steps:

Download CCS here. (You'll need to create a TI account and then fill out an export approval form before downloading, which seems so 1990s but isn't too difficult.)
Follow the instructions here to make sure you have the necessary dependencies or CCS installation will mysteriously fail.
In the installer, select Sitara 32-bit ARM Processors: GCC ARM Compiler and TI ARM Compiler.
In the add-ons dialog, select PRU Compiler.
After installation, run CCS, select Help -> CCS App Center, and install the additional add-ons (i.e. the PRU compiler).

To create a C program in CCS, use the following steps. The image highlights the fields to update in the dialog.

Start CCS.
Click New Project.
Change target to AM3358.
Change tab to PRU.
Enter a project name, e.g. "test".
Open "Project templates and examples" and select "Basic PRU Project".
Click Finish.
Enter the code.

How to set up Code Composer Studio to compile a PRU program for the BeagleBone.

To set up the BeagleBone for the example:

Download the device tree file: /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dts.

Compile and install the device tree file to enable the PRU:

# dtc -O dtb -I dts -o /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dtbo -b 0 -@ PRU-GPIO-EXAMPLE-00A0.dts
# echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots
# cat /sys/devices/bone_capemgr.?/slots

Download the linker command file bin.cmd.
Download the host file that loads and runs the PRU code (loader.c) and compile it:
```
# gcc -o loader loader.c -lprussdrv
```

To compile and run the C program:

In CCS, select Project -> Build All (control-B) to compile the program.[4]
Copy the binary (test/Debug/test.out) to the BeagleBone (e.g. with scp).

On the BeagleBone, link and run the program:[5]

# hexpru bin.cmd test.out
# ./loader text.bin data.bin

If everything went correctly, the LED should flash. (See my previous article for debugging help.)

In this example, loader simply loads and runs the executable on the PRU.[6] In a more advanced application, it would communicate with the PRU. For example, it could get commands from a web page, send them to the PRU, get results, and display them on the web. The point is that you can use the Linux-side code to do complex network or computational tasks, in combination with the PRU doing low-level, real-time hardware operations. It's kind of like having an Arduino together with a "real computer", in a tiny package.

The BeagleBone Black is a tiny computer that fits inside an Altoids mint tin. It is powered by the TI Sitara™ AM3358 processor, the large square chip in the center.

Documentation

The PRUs are very complex and don't have nice APIs, so you'll probably end up reading a lot of documentation to use them. The most important document that describes the Sitara chip is the 5041-page Technical Reference Manual (TRM for short). This article references the TRM where appropriate, if you want more information. Information on the PRU is inconveniently split between the TRM and the AM335x PRU-ICSS Reference Guide. For specifics on the AM3358 chip used in the BeagleBone, see the 253-page datasheet. Texas Instruments has the PRU wiki with more information. More information on using CCS is here.

If you're looking to use the BeagleBone and/or PRU I highly recommend the detailed and informative book Exploring BeagleBone. Helpful web pages on the PRU include BeagleBone Black PRU: Hello World, Working with the PRU and BeagleBone PRU GPIO example. Some PRU example code is in the TI PRU training course.

The BeagleBone Black, with the AM3358 processor in the center. The 512MB DRAM chip is below, with the HDMI framer chip to the right of it. The 4GB flash chip is in the upper right.

Using a timer and interrupts

For a more complex example, I'll show how to use the PRU with a timer and interrupts.[7] The basic idea is the timer will trigger an interrupt at a set frequency. The PRU code in this example will toggle the GPIO pin when an interrupt occurs, generating a sequence of 5 pulses.[8]

It is important to understand that PRU interrupts are not "real" interrupts that interrupt execution, but are signaled through polling.[9] A PRU interrupt sets bit 30 or bit 31 in register R31.[10] The PRU code can busy-wait on this bit to determine if an interrupt has happened. This is fast and very low latency compared to a context-switching interrupt, but it puts more demands on the program structure.

The first step is to add the plumbing for the timer's interrupt, so the PRU will receive the interrupt. The PRUs can handle 64 different interrupt types from various subcomponents of the system. The timer interrupt is assigned system event number 15 and has the cryptic name pr1_ecap_intr_req. (See TRM table 4-22.) Interrupts are configured in the host-side code (loader.c) using the PRUSSDRV library API call prussdrv_pruintc_init, so supporting the timer interrupt means extending that configuration. The diagram below shows the complex PRU interrupt configuration on the BeagleBone (details). The new interrupt path, highlighted in red, connects the timer interrupt (15) to CHANNEL0 and in turn to register R31, the register for polling.

Interrupt handling on the BeagleBone for the PRU microcontrollers. The timer interrupt (15) is shown in red. The default interrupt configuration is extended so the timer interrupt will trigger bit 30 of R31.

To add interrupt 15 to the configuration as shown above, the configuration struct in loader.c must be modified. The following structure is passed to prussdrv_pruintc_init to set up the interrupt handling. The changes are the two entries for event 15: the 15 added to the first list and the {15, CHANNEL0} mapping added to the second. Without this change, timer interrupts will be ignored and the example code will not work.

#define PRUSS_INTC_CUSTOM {   \
 { PRU0_PRU1_INTERRUPT, PRU1_PRU0_INTERRUPT, PRU0_ARM_INTERRUPT, PRU1_ARM_INTERRUPT, \
   ARM_PRU0_INTERRUPT, ARM_PRU1_INTERRUPT,  15, (char)-1  },  \
 { {PRU0_PRU1_INTERRUPT,CHANNEL1}, {PRU1_PRU0_INTERRUPT, CHANNEL0}, {PRU0_ARM_INTERRUPT,CHANNEL2}, {PRU1_ARM_INTERRUPT, CHANNEL3}, \
   {ARM_PRU0_INTERRUPT, CHANNEL0}, {ARM_PRU1_INTERRUPT, CHANNEL1}, {15, CHANNEL0}, {-1,-1}},  \
 {  {CHANNEL0,PRU0}, {CHANNEL1, PRU1}, {CHANNEL2, PRU_EVTOUT0}, {CHANNEL3, PRU_EVTOUT1}, {-1,-1} },  \
 (PRU0_HOSTEN_MASK | PRU1_HOSTEN_MASK | PRU_EVTOUT0_HOSTEN_MASK | PRU_EVTOUT1_HOSTEN_MASK) \
}

The second step to using the timer is to initialize the timer to create interrupts at the desired frequency, as shown in the following code. Using PRU features is fairly difficult since you are controlling them through low-level registers, not a convenient API, so you'll probably need to study TRM section 15.3 to fully understand this. The basic idea is the timer counts up by 1 every cycle (PWM mode is enabled in ECCTL2). When the counter reaches the value in the APRD (period) register, it resets and triggers a "compare equal" interrupt (as controlled by ECEINT). Thus, interrupts will be generated with the period specified by DELAY_NS.

inline void init_pwm() {
  *PRU_INTC_GER = 1; // Enable global interrupts
  *ECAP_APRD = DELAY_NS / 5 - 1; // Set the period in cycles of 5 ns
  *ECAP_ECCTL2 = (1<<9) /* APWM */ | (1<<4) /* counting */;
  *ECAP_TSCTR = 0; // Clear counter
  *ECAP_ECEINT = 0x80; // Enable compare equal interrupt
  *ECAP_ECCLR = 0xff; // Clear interrupt flags
}
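
To make the period calculation concrete, here's a host-side sketch (not PRU code) of the value loaded into APRD, assuming the 5 ns cycle time used above. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PRU_CYCLE_NS 5u   /* one timer tick at 200 MHz */

/* Value to load into APRD for an interrupt every period_ns nanoseconds.
 * The counter counts from 0 up to APRD before resetting, so a period
 * of N cycles needs APRD = N - 1. */
uint32_t aprd_for_period_ns(uint32_t period_ns)
{
    assert(period_ns >= PRU_CYCLE_NS);
    assert(period_ns % PRU_CYCLE_NS == 0);   /* whole number of cycles */
    return period_ns / PRU_CYCLE_NS - 1;
}
```

For example, a 100 ns period is 20 cycles, so APRD would be set to 19.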

The final step is to wait for the interrupt to happen with a busy-wait. The while loop polls register R31 until the timer interrupt fires and sets bit 30. Then the interrupt is cleared in the PRU interrupt subsystem and in the timer subsystem.

inline void wait_for_pwm_timer() {
  while (!(__R31 & (1 << 30))) {} // Wait for timer compare interrupt
  *PRU_INTC_SICR = 15; // Clear interrupt
  *ECAP_ECCLR = 0xff; // Clear interrupt flags
}

The oscilloscope trace below shows the result of the timer example program: five precision pulses with a width of 100 nanoseconds on and 100 nanoseconds off. The important advantage of using the PRU microcontroller rather than the regular ARM processor is the output is stable and free of jitter. You don't need to worry about nondeterminism such as context switches or cache misses. If your application won't be affected by milliseconds of random delay, the regular processor is much easier to program, but if you require precision timing, you should use the PRU.

Using the BeagleBone Black's PRU microcontroller to generate pulses with a width of 100 nanoseconds.

The full source code for the timer example is here.[11] To run the timer example, you'll also need to use the updated loader.c that enables interrupt 15 (or else nothing will happen).

Conclusion

The PRU microcontrollers give the BeagleBone real-time, deterministic processing, but with a substantial learning curve. Programming the PRUs in C using the IDE is much easier than programming in assembler. (And you can embed assembler code in C if necessary.)

Combining the BeagleBone's full Linux environment with the PRU microcontrollers yields a very powerful system since the microcontrollers provide low-level real-time control, while the main processor gives you network connectivity, web serving, and all the other power of a "real" computer. (My current project using the PRU is a 3 megabit/second Ethernet emulator/gateway to connect to a Xerox Alto.)

Notes and references

[1] Delivering the interrupt to the host code is more complex than you'd expect. I wrote a longer description here, explaining details such as how event 3 on the PRU turns into event 0 on the host.

[2] To compile a C program on the BeagleBone, use the clpru command. See this article for details on clpru.

[3] Code Composer Studio isn't available for Mac, but CCS works well if you run Linux on your Mac using Parallels. I also tried running Linux in VirtualBox, but ran into too many problems.

[4] If you want to see the assembly code generated by the C compiler, use the following steps:

Go to Project -> Properties
Select the configuration you're building (Debug or Release)
Check Advanced Options -> Assembler Options: Keep the generated assembly language file. This adds the --keep_asm flag to the compile.

The resulting assembly file will be in Debug/main.asm. Although the file is hundreds of lines long, the actual generated code is much shorter, starting a few dozen lines into the file. Comments indicate which source lines correspond to the assembly lines.

[5] The hexpru utility converts the ELF-format file generated by the compiler into a raw image file that can be loaded onto the PRU. The bin.cmd file holds the command-line options for hexpru. See the PRU Assembly Language Tools manual for details.

You can configure Code Composer Studio to run hexpru automatically as part of compilation. Follow the steps here to enable and configure PRU Hex Utility.

[6] The loader.c code uses the PRU Linux Application Loader API (PRUSSDRV) to interact with the PRU. I'm told that the cool new framework is remoteproc, but I'll stick with PRUSSDRV for now. (There seems to be a great deal of churn in the BeagleBone world, with huge API changes in every kernel.)

[7] For a timer, I'll use the PRU's ECAP module, which can be configured for PWM and then used as a 32-bit timer. (Yes, this is confusing; see TRM section 15.3 for details.)

[8] This code is intended to demonstrate the timer, not show the best way to generate pulses. If you just want to generate pulses, use the PWM or even a simple delay loop.

[9] You might wonder why you'd use the PRU polling interrupts rather than just polling a device register directly. The reason is you can test the R31 register in one cycle, but reading a device register takes about 3 or 4 cycles (read latency details).

[10] The library uses the convention that PRU0 polls on bit 30 and PRU1 polls on bit 31, but this is arbitrary. You could use both bits to signal one PRU, for instance.

[11] One complexity in the timer source code is the need to define all the register addresses. To figure out a register address, find the address of the register block in the PRU Local Data Memory Map (TRM 4.3.1.2). Then add the offset of the register (TRM 4.5). Note that you can also access these registers from the Linux host side, but the addresses are different. (The PRU is mapped into the host's address space starting at 0x4a300000, TRM table 2.4.)
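As a rough sketch of the address arithmetic in note 11: a register address is the block base plus the register offset, and the host-side address adds the host mapping base. The eCAP base and ECCLR offset below are my assumptions for illustration and should be checked against the TRM sections cited above; only 0x4a300000 comes from the note. The sketch also assumes that a block sits at the same offset from the host base as in the PRU local map, which needs to be verified for each block.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; confirm each one against the TRM. */
#define PRU_LOCAL_ECAP_BASE 0x00030000u /* eCAP block in the PRU local map (assumption) */
#define ECAP_ECCLR_OFFSET   0x30u       /* ECCLR offset within the eCAP block (assumption) */
#define HOST_PRU_BASE       0x4a300000u /* host mapping base, from the note above */

/* Address as seen from the PRU: block base plus register offset. */
static uint32_t pru_local_addr(uint32_t block_base, uint32_t reg_offset) {
  assert(reg_offset < 0x10000u); /* sanity check: offsets are small */
  return block_base + reg_offset;
}

/* Address as seen from the Linux host, assuming the block keeps the same
 * offset relative to the host mapping base (an assumption). */
static uint32_t host_addr(uint32_t local_addr) {
  return HOST_PRU_BASE + local_addr;
}
```

With these assumed values, pru_local_addr(PRU_LOCAL_ECAP_BASE, ECAP_ECCLR_OFFSET) is 0x00030030 and the corresponding host address would be 0x4a330030.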

Restoring YC's Xerox Alto: how our boot disk was trashed with random data

In the previous Xerox Alto restoration session, we got the disk working, but the system didn't boot. After much investigation, I discovered the explanation for the boot failure: the disk has been overwritten with random data! This article describes my journey through the Alto microcode to determine what happened.

Inserting a disk into the Xerox Alto's disk drive. The Alto's video display is visible at the back.

For background, the Alto was a revolutionary computer designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore it, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen (from the IBM 1401 restoration team). For posts on previous restoration days see 1, 2, 3, 4, 5 and 6.

Debugging the boot failure

Last session, after fixing a broken 7414 TTL chip on the disk interface board, we could fetch a block from disk but the Alto failed to boot. We used a logic analyzer to trace the microcode instructions and the ALU bus contents. Josh Dersch from the Living Computer Museum studied the traces and found that the boot program was executing a few instructions (jump, add, load), and then seemed to go off the rails. But it turns out things were more messed up than that.

I made a microcode trace browser to help figure out what was going on. With this program, I can step through an execution trace one micro-instruction at a time and see the corresponding source code line. (Click the image below for the live trace browser.) First, I examined the KWD (disk word task), which executes for each word from disk, and copies that word to memory. I verified that the disk read was working as expected. The second task of interest is the NOVEM (Nova emulator task), which runs a program. In our case, it runs the boot program as soon as it is loaded from disk. By examining this task, we can figure out what is going wrong with the boot process.

Xerox Alto microcode trace viewer. With the viewer, you can step through the execution trace collected by the logic analyzer and see each source code line as it is executed. The buttons on the right indicate which microcode task is running at each step.

By studying the disk read microcode (KWD) closely, I was able to extract each word in the disk sector from the logic analyzer trace. This was very difficult for many reasons. For example, we logged the ALU bus which doesn't have the words from disk. I had to figure out the disk contents by reversing the checksum computation, which was on the ALU bus. Another problem was the Alto stores sectors on disk backwards. But eventually I extracted the contents of the boot sector, as read into the Alto:

16a5 2d4a 5a94 b528 14db 29b6 536c a6d8
333b 6676 ccec e753 b02d 1ed1 3da2 7b44
...

I hand-disassembled these words into Data General Nova assembler code and discovered a few things. First, the first few instructions matched Josh's interpretation, so the CPU and the emulator task seemed to be working correctly. Second, the instructions didn't make any sense as code, and some words weren't even instructions, which explained why the boot rapidly fell apart. Third, and most puzzling, the instructions were nothing like what the Alto boot code was supposed to be.

Backplane of the Xerox Alto wired with logic analyzer probes. These probes monitor the executing micro-instructions and the contents of the ALU bus.

The boot block seemed to contain random junk. The problem wasn't flaky hardware generating bad data, because the block checksum validated correctly. This wasn't the drive returning the wrong sector, because the sector header was correct. The sector didn't contain instructions, it wasn't ASCII, and it didn't look like a sensible file format. As I studied the sector contents more, I wondered if the data was literally random. I made a histogram of how many times each byte value occurred, and it was pretty much uniform. (In comparison, archived Alto disk sectors showed very non-uniform distributions.) But why would the boot block have been overwritten with (pseudo-) random data?
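The byte histogram is easy to reproduce. The following is a minimal sketch, not the code I actually used: it counts how often each byte value occurs in a buffer of 16-bit words and reports the spread between the most and least common values. The function names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how often each byte value (0..255) occurs in a buffer of 16-bit
 * words. Each word contributes its high byte and its low byte. */
static void byte_histogram(const uint16_t *words, size_t n, unsigned counts[256]) {
  assert(words != NULL || n == 0);
  for (size_t v = 0; v < 256; v++) counts[v] = 0;
  for (size_t i = 0; i < n; i++) {
    counts[words[i] >> 8]++;
    counts[words[i] & 0xff]++;
  }
}

/* Difference between the largest and smallest count. A small spread relative
 * to the average count suggests a roughly uniform distribution. */
static unsigned histogram_spread(const unsigned counts[256]) {
  unsigned lo = counts[0], hi = counts[0];
  for (size_t v = 1; v < 256; v++) {
    if (counts[v] < lo) lo = counts[v];
    if (counts[v] > hi) hi = counts[v];
  }
  return hi - lo;
}
```

A single sector is a small sample, so the spread is only a rough indicator; text or machine code tends to produce a few byte values that dominate the counts.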

Josh mentioned DiEx (Diablo disk exerciser), a utility program to diagnose problems with the Alto's Diablo disk drive, and suggested that it could have wiped the disk. I found the DiEx source code in the Computer History Museum's Alto archive, and sure enough, it has a feature to write random data to the disk (and then verify it).

Screenshot of the Diablo Disk Exerciser (DiEx) running on a Xerox Alto simulator. Note the early mouse-based GUI; clicking on an entry changes the value. Image courtesy of Nathan Lineback.

I could believe someone had inconveniently wiped our disk with the DiEx utility, but I still had nagging doubts that maybe we were seeing a hardware issue. Could I prove that DiEx was responsible? All I had to do was show that the disk data wasn't arbitrary, but came from DiEx.

Generating random numbers on the Alto

I found the source code for RANDOM.ASM, the Alto's random number code, in the Computer History Museum's Alto archive. This algorithm generates 16-bit random numbers with the recurrence formula: "x[n] = (x[n-33] + x[n-13]) mod 2^16". (Note that these are very bad random numbers cryptographically, since once you have 33 numbers in the sequence you can generate them all.) I wanted to see if the data we read from disk was generated from this function, so I coded up the algorithm. This was somewhat difficult as the original was written in Nova assembler code. The results didn't match the disk data, no matter what I tried. Finally, I realized that I could just use a brute force solution and ignore the details of the algorithm. I picked random pairs of values in the data and checked if their sum appeared in the data. If the data came from any sort of recurrence, I would get a bunch of matches, but I didn't. I concluded that the disk data wasn't generated from this random number algorithm.
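For illustration, here is a minimal sketch of the recurrence itself, written from the formula above rather than from RANDOM.ASM. It shows why 33 known values determine the rest of the sequence, and includes a direct check of whether a buffer obeys the recurrence. This direct check is a simpler test than the random-pair search described above, and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LAG_LONG  33
#define LAG_SHORT 13

/* Given the first 33 values in seq[0..32], fill in seq[33..n-1] using
 * x[n] = (x[n-33] + x[n-13]) mod 2^16. */
static void extend_sequence(uint16_t *seq, size_t n) {
  assert(seq != NULL);
  for (size_t i = LAG_LONG; i < n; i++) {
    /* The cast to uint16_t performs the reduction mod 2^16. */
    seq[i] = (uint16_t)(seq[i - LAG_LONG] + seq[i - LAG_SHORT]);
  }
}

/* Count the positions in data[] where the recurrence holds. */
static size_t count_recurrence_matches(const uint16_t *data, size_t n) {
  size_t matches = 0;
  for (size_t i = LAG_LONG; i < n; i++) {
    uint16_t expected = (uint16_t)(data[i - LAG_LONG] + data[i - LAG_SHORT]);
    if (data[i] == expected) matches++;
  }
  return matches;
}
```

If the data really came from this generator, every position from index 33 onward would match; for unrelated data, matches would be rare coincidences.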

However, on closer examination I noticed that the RANDOM.ASM function signature didn't match the DiEx code, so it probably wasn't the right function. After more searching I found TriexML.asm, another Alto random number function. To generate a random 16-bit word, this algorithm simply shifts the previous value one bit to the left. If there is an overflow, the result is xor'd with the number 077213. (It would be hard to come up with a cryptographically worse random number generator—from one number you can generate the whole sequence—but the algorithm is very fast.)
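The shift-and-xor step is short enough to sketch in a few lines. This is my reconstruction from the description above, not the TriexML.asm source; the function name next_random is invented. The check at the end uses two words from the boot sector dump shown earlier.

```c
#include <assert.h>
#include <stdint.h>

#define XOR_CONSTANT 077213u /* octal, as given in the text; equals 0x7E8B */

/* One step of the generator: shift left one bit, and if a 1 bit was shifted
 * out of the top (overflow), xor the result with the constant. */
static uint16_t next_random(uint16_t prev) {
  uint16_t next = (uint16_t)(prev << 1);
  if (prev & 0x8000u) next ^= XOR_CONSTANT;
  return next;
}

/* Sanity check against words quoted from the boot sector dump. */
static void check_sample(void) {
  assert(next_random(0x16a5) == 0x2d4a); /* no overflow: plain shift */
  assert(next_random(0xb528) == 0x14db); /* overflow: shift, then xor */
}
```

Because each value depends only on the previous one, a single word from the disk is enough to predict every word that follows.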

To check the disk contents against this algorithm, I skipped the careful implementation and went straight to brute force. To see if any shift-and-xor algorithm would explain our data, I shifted each word from the disk sector and xor'd it with the next one. In each case, I got either 0 or octal 077213, matching the algorithm. Starting the algorithm with 012345 (the seed value in the code) eventually generates the exact sector of data we read, proving this algorithm generated the random data we saw on the disk.
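The brute force check can be sketched as follows, using the 16 words from the boot sector dump shown earlier as sample data. The names matches_shift_xor and boot_sector_sample are made up for this illustration, and the sketch covers only the pairwise check, not the search from the 012345 seed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XOR_CONSTANT 077213u /* octal */

/* First 16 words of the boot sector, copied from the dump above. */
static const uint16_t boot_sector_sample[16] = {
  0x16a5, 0x2d4a, 0x5a94, 0xb528, 0x14db, 0x29b6, 0x536c, 0xa6d8,
  0x333b, 0x6676, 0xccec, 0xe753, 0xb02d, 0x1ed1, 0x3da2, 0x7b44
};

/* Returns 1 if every consecutive pair of words is consistent with the
 * shift-and-xor generator: (word << 1) xor next_word must be either 0
 * (no overflow) or the xor constant (overflow). Returns 0 otherwise. */
static int matches_shift_xor(const uint16_t *words, size_t n) {
  assert(words != NULL || n == 0);
  for (size_t i = 0; i + 1 < n; i++) {
    uint16_t diff = (uint16_t)((uint16_t)(words[i] << 1) ^ words[i + 1]);
    if (diff != 0 && diff != XOR_CONSTANT) return 0;
  }
  return 1;
}
```

Calling matches_shift_xor(boot_sector_sample, 16) returns 1, while changing any single bit of a word makes the check fail for the pairs that include it.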

A few of the old Xerox Alto disks in Xerox PARC's collection. Hopefully they haven't been overwritten with junk.

Thus, someone had clobbered our disk (probably decades ago) while testing the drive with DiEx. Since we couldn't boot off this disk, we'd need a new boot disk. Xerox PARC has dozens of old Alto disks lying around and they offered some of them to us. But the Living Computer Museum offered to send us a working Alto disk, rather than risk damage to the potentially-interesting contents of an old PARC disk, so we'll use the LCM disk instead.

Conclusion

Last repair session, we fixed a failed 7414 inverter chip on the disk interface board. With that fixed, we could read the disk but boot still failed. After careful investigation of the microcode and traces, I discovered that our disk had been overwritten with random data, making it impossible to boot from it. In one way this is a good result, since it means our boot wasn't failing because of a hardware problem.

When we get a new Alto disk, we'll try booting again. I'm moderately optimistic that the system will come up successfully, but there could be more hardware problems waiting for us. For updates on the restoration, follow kenshirriff on Twitter.

Thanks to Josh Dersch and the Living Computer Museum for their debugging help. Thanks to Tim Curley and Xerox PARC for supplying additional disks.

Restoring YCombinator's Xerox Alto day 6: Fixed a chip, data read from disk

In today's Xerox Alto restoration session we investigated why the disk drive isn't working and found a failed chip. With this chip repaired, we were able to read a block from disk, although the system still doesn't boot. (In previous episodes, we fixed the power supply, got the CRT display working, cleaned up the disk drive and hooked up a logic analyzer: days 1, 2, 3, 4 and 5.)

Our test setup for the Xerox Alto. The Alto computer itself is the metal cabinet in the center with the visible circuit boards. On the left is a vintage HP line printer, with the logic analyzer behind it. The video display for the Alto is visible on the right, behind the oscilloscope.

The Alto was a revolutionary computer, designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen (from the IBM 1401 restoration team). Marc's video of this restoration session is below.

The missing disk sector task

In the Alto, like most modern computers, each machine instruction is implemented in an even more primitive form of code called microcode. But unlike most computers, the Alto also implements some of its low-level software in microcode. Part of the Alto's design philosophy was to use software (i.e. microcode) instead of hardware where possible. For instance, a microcode sector task processes each disk sector and a word task stores each word of data as it arrives from the disk drive; most computers do this with DMA hardware.

Last week we hooked a logic analyzer to the Alto to trace the executing microcode and found the disk sector task was failing to run. Each track on the Alto's hard disk is divided into 12 sectors, with 12 slots in the hub to indicate the sectors. We verified that the disk drive was detecting these slots and sending the sector pulses every 3.33 milliseconds. The disk sector task is supposed to run for each sector and perform any disk command, but the logic analyzer showed that this task was not running.

The hard disk pack for the Xerox Alto has 12 sectors. Slots cut into the disk hub trigger a signal for each sector. Four of the sector slots are labeled in the photo.
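The sector timing implies the rotation speed of the disk. As a quick arithmetic sketch using only the two numbers above (12 sectors, 3.33 milliseconds per sector), with helper names invented for the example:

```c
#include <assert.h>

#define SECTORS_PER_TRACK 12u
#define SECTOR_PERIOD_US  3330u /* 3.33 ms per sector pulse, from the text */

/* Time for one full revolution, in microseconds. */
static unsigned revolution_period_us(void) {
  return SECTORS_PER_TRACK * SECTOR_PERIOD_US;
}

/* Revolutions per minute (rounded down) for a given revolution period. */
static unsigned rpm_from_period(unsigned rev_period_us) {
  assert(rev_period_us > 0);
  return 60000000u / rev_period_us; /* 60 seconds = 60,000,000 microseconds */
}
```

This gives a revolution time of 39,960 microseconds, or roughly 1500 RPM; the small difference from a round number is just the rounding of 3.33 milliseconds.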

Why was the sector task not running? The disk interface board provides a signal to indicate when the sector task should run (WAKEST), but we found it was not being activated even though the disk drive was providing sector pulses to the disk interface board. Looking at the disk interface board schematic, the sector pulse circuit is fairly simple: just a few flip flops. (You don't need to understand the schematic below. The key point is the sector pulse comes in on the left, goes through a few chips, and the wakeup signal comes out on the right.) I've heard that old TTL flip flops fail regularly, so I figured one of the flip flop chips had failed. We decided to hook up an oscilloscope and see where things were going wrong, but one problem stood in our way.

Schematic from the Xerox Alto's disk controller card. This circuit processes sector pulses from the disk drive and generates signals to wake up the microcode sector task.

The extender card

The Alto consists of 13 circuit cards plugged into a wire-wrapped backplane, making them inaccessible to probing. Fortunately, the Living Computer Museum gave us an extender card, a board that goes between an Alto board and the backplane, physically extending the Alto board out of the cabinet where it can be diagnosed. Last week, we used the extender card to probe signals on the CPU control board. But no matter how hard I tried, I couldn't get the extender board to plug into the disk interface board's slot. Marc noticed that the board was hitting something, and we realized that the disk interface board had a notch on the right, allowing the board to clear a bar that was in the way. The extender board, like most of the Alto boards, lacked this notch. A bit more investigation revealed that memory boards had a notch, but on the left.

Why did some boards have notches? Most of the boards are powered with 5 volts. The memory boards also require -5 volts and +12 volts for the 4116 DRAM chips. The I/O boards (Ethernet and disk) have +/- 15 volts as well as 5 volts. The Alto backplane was apparently designed so you couldn't plug a board into a slot with the wrong voltages (which would have been catastrophic). Boards with unusual power requirements had a notch that allowed them to fit into slots wired with unusual voltages. The consequence was that we couldn't use the extender board with the disk interface without cutting a notch in it, which we did (see photo below).

Milling a notch into the extender board.

We were worried that by cutting a notch in the extender board and using it in a slot where it wasn't intended we might destroy the computer in a spectacular show of sparks and smoke. The concern was that the extender board doesn't simply pass the 162 lines through, but wires all the ground lines to a ground plane and wires the +5 lines together. If the disk interface card had +15 volts where the extender board expected, say, +5 volts, the extender card would run +15 volts to all the chips and destroy them. We verified the wiring five times to make sure nothing would get shorted, plugged in the extender board, and turned the Alto on with some trepidation. Fortunately our calculations were correct and nothing blew up.

Debugging the disk interface

The photo below shows the disk interface card extended out of the Alto cabinet, with some oscilloscope probes attached to the flip flop chips. (The ribbon cable attached to the board connects to the disk drive, while the ribbon cable hanging above the board allows us to probe microcode signals with the logic analyzer.) Strangely, we didn't see any signals either going into the flip flops or coming out. We checked that the sector pulses were showing up in the logic analyzer, and on the connector from the disk drive, but the flip flops were getting nothing. Eventually we turned our attention to the inverter chip (see earlier schematic). We saw the sector signal going into the inverter, but not coming out. Could this simple chip be causing the problems?

Debugging the disk interface card in the Xerox Alto.

The 7414 TTL chip contains 6 inverters, which turn a 1 input into a 0 output and vice versa. We pulled the chip out of the disk interface board and tested it with a simple LED circuit (see photo below). Five of the six inverters worked fine, but one of the inverters had entirely failed. The chip is a bit unusual since it uses a Schmitt trigger—a circuit that cleans up noisy signals (such as the sector pulses that traveled over a long cable from the disk drive)—so we couldn't get a replacement at Fry's or Radio Shack. Were we stuck for the day?

Testing the 7414 inverter chip from the Xerox Alto's disk interface card. One inverter was burnt out, preventing the disk from working.

Fortunately we could work around the faulty chip. Carl studied the schematics and discovered that one of the good inverters on the chip was unused. We rewired the chip to replace the bad inverter with the unused good inverter by using an ugly but effective "dead bug" hack. We bent out the pins from the good inverter and attached wires. We cut off the pins from the bad inverter. Finally, we stuck the wires into the socket along with the IC, so the good inverter was wired in place of the bad inverter.

We re-wired a 7414 inverter chip. An unused inverter replaced the failed inverter.

We booted the Alto and found that our chip hack actually worked and the system worked much better than before: the sector pulses got through the inverter, were processed by the flip flops, and triggered the sector task as we hoped. The sector task read the disk command from memory and sent it to the disk drive. The disk drive read the desired sector and started sending bits back. For each word, the disk word task read the word from the disk interface and stored it in memory. In summary, we were now reading data from disk!

Reading data from disk was a big milestone, since most of the system needed to be working properly for this to happen. Unfortunately the Alto didn't boot up, and we'll need to figure out where things went wrong. Is the boot block not running correctly? Is the read data corrupted? Is the disk returning an error at some point? Is our disk not a boot disk? Strangely, there was no sign of the parity errors we kept seeing last week.

The timeline diagram below shows task switching in the Alto over an interval of 700 microseconds. You can see that the microcode is constantly switching between tasks. Today's accomplishment can be seen in the periodic execution of the disk word task (KWD) at the bottom of the image; this task runs about every 9 microseconds when each word comes from the disk drive. The disk sector task (KSEC) runs at the start of the next sector (at which time the word task stops). Other tasks are the memory refresh task (MRT) and cursor task (CURT) that run periodically. (You can see where the higher-priority MRT task interrupted the KSEC task.) The lowest priority task is the Nova emulator (NOVEM), which runs program code when nothing else is happening. The numbers at the bottom show the micro-instruction count since boot; at this point we are 14.8 milliseconds into the boot process. I generated the diagram below by processing the logic analyzer output to show each running task. An interactive version is here, allowing zoom and pan with the mouse.

Timeline showing task switching on the Xerox Alto. These are microcode tasks switched by hardware, not operating system level processes or threads.
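The core of turning a trace into a timeline is collapsing consecutive samples with the same task number into runs. Here is a minimal sketch of that step, assuming the trace has already been reduced to one task number (0 to 15) per micro-instruction. That input format, the TaskRun type and the function name are assumptions for illustration, not the actual tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bar on the timeline: a task that ran for `length` consecutive
 * micro-instructions starting at index `start`. */
typedef struct {
  uint8_t task;
  size_t start;
  size_t length;
} TaskRun;

/* Collapse a per-micro-instruction list of task numbers into runs.
 * Returns the number of runs written (at most max_runs). */
static size_t collapse_task_runs(const uint8_t *task_ids, size_t n,
                                 TaskRun *runs, size_t max_runs) {
  size_t count = 0;
  for (size_t i = 0; i < n; i++) {
    assert(task_ids[i] < 16); /* the Alto has 16 task slots */
    if (count > 0 && runs[count - 1].task == task_ids[i]) {
      runs[count - 1].length++; /* same task still running */
    } else {
      if (count == max_runs) break; /* output buffer full */
      runs[count].task = task_ids[i];
      runs[count].start = i;
      runs[count].length = 1;
      count++;
    }
  }
  return count;
}
```

Each run can then be drawn as a horizontal bar in the row for its task, with the start index on the horizontal axis.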

Conclusion

In today's repair session, we found a failed 7414 inverter chip that was preventing disk operation. By working around that issue, we could finally read from disk, but boot is still failing for unknown reasons. Nonetheless, today's session got us much closer to a working system. We'll need to dig through the logic analyzer output to figure out where the boot process is breaking down.

Lacking safety features, cheap MacBook chargers create big sparks

You might wonder if it's worth spending $79 for a genuine MacBook charger when you can get a charger on eBay for under $15. You shouldn't get a cheap charger because they are often dangerous and lack safety features. In addition, they produce poor-quality power that isn't good for your laptop and may charge more slowly. I've written before about the safety problems with cheap chargers, but they say a picture is worth a thousand words, so here is why you shouldn't buy a cheap knockoff charger:

A knockoff MacBook charger emits large sparks if short-circuited. Genuine Apple chargers have safety features to protect against this.

If the connector comes in contact with something metal (a paperclip in this instance), it shorts out, creating a big spark. (Don't try this at home.) The genuine Apple charger (below) has safety features that protect against a short circuit. Shorting the connector on a genuine charger has no effect.

A genuine Apple MacBook charger has safety features that protect it from short circuits.

It's really hard to tell a genuine charger from a knockoff from the outside, since the knockoffs look just like the real thing. If you carefully read the text on this charger, you'll notice that "Apple" is missing. However, many knockoff chargers duplicate the text from a real charger, so often you can't tell if it is genuine or not just by looking. Big sparks, however, are a clear sign.

A cheap MacBook charger from eBay. Unlike most cheap chargers, this one doesn't claim to be an Apple charger, but just a "Replacement AC Adapter".

Why does a fake charger produce sparks, while a genuine one doesn't? The fake charger constantly outputs 20 volts, so if any metal shorts the connector, it produces a big spark with all its 85 watts of power. On the other hand, the genuine charger doesn't power up until it has been securely connected to the laptop for a full second. Until it is properly connected (details), it outputs a tiny amount of power (0.6 volts at 100µA) that can't produce a spark. To manage this, the genuine charger includes a powerful microcontroller (more powerful than the microprocessor in the original Macintosh by some measures). Since this processor increases the cost of the charger, knockoff chargers omit it, even though this makes the charger more dangerous.

As the photos below show, the cheap charger (left) omits as much as possible. On the other hand, the genuine Apple charger (right) is crammed full of components. Many of these components filter the power to provide higher-quality power to your laptop. The Apple charger also includes power factor correction, making the charger more efficient.

The cheap MacBook charger (left) omits most of the components found in a genuine Apple charger (right). The genuine charger includes more filtering, power factor correction (left), and a powerful microcontroller (board in upper right).

I've written in detail before about how chargers work, but I'll give a quick explanation here. The AC power comes in the red wires at the top and is converted to high-voltage DC (170V or 340V, depending on if you're in the US or Europe). A transistor (black component on left) chops the power into high-frequency pulses. The pulses create a changing magnetic field in the flyback transformer (large blue box), generating a high-current, low-voltage output. The output is converted to DC by diodes (black component, upper right), and filtered by capacitors (cylinders), to produce the 20 volt output (wires at bottom). A control IC (see photo below) controls the system to regulate the voltage. This may seem like an excessively complicated way to generate 20 volts, but switching power supplies like this are very compact, lightweight and efficient compared to simpler power supplies.

Shorting a cheap charger with a paperclip creates impressive sparks.

Looking at the underside of the cheap charger board shows it has very few components, while the genuine Apple charger's board is covered with tiny components. The two chargers are worlds apart as far as complexity, and this complexity is what provides more efficiency, more safety, and better quality power in the Apple charger.

The cheap MacBook charger (left) uses very simple circuits compared to the genuine Apple charger (right), which is crammed full of components.

Conclusion

While buying a cheap charger saves a lot of money, these chargers omit many safety features and can be hazardous to you and your computer. Don't buy a cheap knockoff charger; if you don't want to pay for a genuine Apple charger, at least buy a charger from a name-brand manufacturer.

Maybe you think these safety issues don't matter because you don't poke your charger with a paperclip. But if you have any metal objects on your desk, a random contact could yield a surprisingly large spark.

I've written a bunch of articles before about chargers, so if this article seems familiar, you're probably thinking of an earlier article, such as: Counterfeit MacBook charger teardown, Magsafe charger teardown, iPhone charger teardown or iPad charger teardown.

Follow me on Twitter to find out about my new articles.

Notes

If you're interested in the components inside the cheap charger, I have some details. The PWM control IC is a SiFirst 1560, a basic control IC for a flyback converter. The IC datasheet has the approximate schematic for the charger. The switching transistor is a 2N601 2 amp, 600 volt N-channel MOSFET. The voltage reference is an AZ431, similar to the ubiquitous TL431. The optoisolator is an 817C. The output diode is a MBRF20100C 10 amp Schottky diode pair. The electrolytic capacitors are from HKLCON.

A cheap charger emits large sparks if you short the connector with a paperclip. Safety features in a genuine charger protect against shorts.

Restoring YCombinator's Xerox Alto day 5: Microcode tracing with a logic analyzer

In today's Xerox Alto restoration session we investigated why the system doesn't boot. We find a broken wire, hook up a logic analyzer, generate a cloud of smoke, and discover that memory problems are preventing the computer from booting. (In previous episodes, we fixed the power supply, got the CRT display working and cleaned up the disk drive: days 1, 2, 3 and 4.)

The broken wire

The Xerox Alto is built from 13 circuit boards, crammed with TTL chips. In 1973, minicomputers such as the Alto were built from a whole bunch of simple ICs instead of a primitive microprocessor chip. (People still do this as a retro project.) The Alto's CPU is split across 3 boards: an ALU board, a control board, and a control RAM board. The control board is the focus of today's adventures.

If a circuit board has a design defect or needs changes, it can be modified by attaching new wires to create the necessary connections. The photo below shows the control board with several white modification wires. While examining the control board, we noticed one of the wires had come loose. Could the boot failures be simply due to a broken wire?

Control board from the Xerox Alto, showing a broken wire. The white wires were for a modification, but one wire came loose.

We carefully resoldered the wire and powered up the system. The disk drive slowly came up to speed and the heads lowered onto the disk surface. We pressed the reset button (under the keyboard) to boot. As before, nothing happened and the display remained blank. Fixing the wire had no effect.

After investigation, it appears the rework wires were to support the Trident/Tricon hard disk. In the photo above, note the small edge connector in the upper right, with the white wires connected. The Trident disk controller used this connector, but our (Diablo) disk controller does not. In other words, the broken wire might have caused problems with a different disk drive, but it was irrelevant to us.

Microcode on the Xerox Alto

Some background on the Xerox Alto's architecture will help motivate our day's investigation. The Alto, like most modern computers, is implemented using microcode. Computers are programmed in machine instructions, where each instruction may involve several steps. For instance, a "load" instruction may first compute a memory address by adding an offset to an index register. Then the address is sent to memory. Finally the contents of memory are stored into a register. Instead of hardcoding these steps (as done in the 6502 or Z-80 for instance), modern computers run a sequence of "micro-instructions", where each micro-instruction performs one step of the larger machine instructions. This technique, called microcode, is used by the Xerox Alto.
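To make the idea concrete, here is a toy sketch of a "load" instruction broken into the three micro-steps described above. This is a made-up miniature machine, not the Alto's microcode or instruction set; the ToyCpu structure, the micro-op names and the register layout are all invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 256
#define NUM_REGS  4

/* A made-up miniature CPU, for illustration only. */
typedef struct {
  uint16_t regs[NUM_REGS];
  uint16_t mar;            /* memory address register */
  uint16_t mdr;            /* memory data register */
  uint16_t mem[MEM_WORDS];
} ToyCpu;

typedef enum { U_COMPUTE_ADDR, U_READ_MEM, U_STORE_REG } MicroOp;

/* Execute one micro-instruction of "load dest, offset(index)". */
static void micro_step(ToyCpu *cpu, MicroOp op, unsigned index,
                       uint16_t offset, unsigned dest) {
  assert(index < NUM_REGS && dest < NUM_REGS);
  switch (op) {
    case U_COMPUTE_ADDR: /* step 1: add the offset to the index register */
      cpu->mar = (uint16_t)(cpu->regs[index] + offset);
      break;
    case U_READ_MEM:     /* step 2: send the address to memory */
      cpu->mdr = cpu->mem[cpu->mar % MEM_WORDS];
      break;
    case U_STORE_REG:    /* step 3: store the memory contents in a register */
      cpu->regs[dest] = cpu->mdr;
      break;
  }
}

/* The machine instruction is just a fixed sequence of micro-instructions. */
static void toy_load(ToyCpu *cpu, unsigned index, uint16_t offset, unsigned dest) {
  static const MicroOp steps[] = { U_COMPUTE_ADDR, U_READ_MEM, U_STORE_REG };
  for (unsigned i = 0; i < 3; i++) {
    micro_step(cpu, steps[i], index, offset, dest);
  }
}
```

In a real microcoded machine the sequence of steps lives in a control store (PROM or RAM) rather than in a C array, which is what makes the instruction set changeable without rewiring the hardware.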

The Alto uses microcode much more heavily than most computers. The Alto not only uses microcode to implement the instruction set, but implements part of the software in microcode directly. Part of the Alto's design philosophy was to use software (i.e. microcode) instead of hardware where possible. For instance, most video displays pull pixels out of memory and display them on the screen. In the Alto, the processor itself fetches pixels out of memory and passes them to the video hardware. Similarly, most disk interfaces transfer data between memory and the disk drive. But in the Alto, the processor moves each data word to/from memory itself. The code to perform these tasks is written in microcode.

To perform all these low-level activities, the Alto hardware manages 16 different tasks, listed below. High-priority tasks (such as handling high-speed data from the disk) can take over from low-priority tasks, such as handling the display cursor. The lowest-level task is the "emulator", the task that executes program instructions. (In a normal computer, the emulator task is the only thing microcode is doing.) Remember, these tasks are not threads or processes handled by the operating system. These are microcode tasks, below the operating system and scheduled directly by the hardware.

Task	Name	Description
0	Emulator	Lowest priority.
1	-	unused
2	-	unused
3	-	unused
4	KSEC	Disk sector task
5	-	unused
6	-	unused
7	ETHER	Ethernet task
8	MRT	Memory refresh task. Wakeup every 38.08 microseconds.
9	DWT	Display word task
10	CURT	Cursor task
11	DHT	Display horizontal task
12	DVT	Display vertical task. Wakeup every 16.666 milliseconds.
13	PART	Parity task. Wakeup generated by parity error.
14	KWD	Disk word task
15	-	unused
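The priority scheme in the table can be sketched in a few lines of Python. This is a simplified model that assumes priority increases with task number (the table marks task 0 as the lowest priority) and that the emulator is always ready to run. In the real machine, switches only happen at points the microcode allows, so the actual order in a trace can differ.

```python
# Task numbers and names come from the table above; the selection rule
# (highest pending task number wins) is a simplifying assumption.
TASK_NAMES = {0: "Emulator", 4: "KSEC", 7: "ETHER", 8: "MRT", 9: "DWT",
              10: "CURT", 11: "DHT", 12: "DVT", 13: "PART", 14: "KWD"}

def next_task(wakeups):
    """Pick the highest-numbered task with a pending wakeup.

    The emulator (task 0) is treated as always runnable, so it is the fallback.
    """
    return max(set(wakeups) | {0})

def run_order(wakeups):
    """Order in which pending tasks would run if each ran to completion
    and no new wakeups arrived in the meantime."""
    return [TASK_NAMES[t] for t in sorted(set(wakeups) | {0}, reverse=True)]

print(TASK_NAMES[next_task({4, 10})])  # CURT
print(run_order({4, 10, 12}))          # ['DVT', 'CURT', 'KSEC', 'Emulator']
```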

Last episode, we found that the processor was running the various tasks, but never tried to access the disk. System boot is started by the emulator task, which stores a disk command in memory. The disk sector task (KSEC) periodically checks if there are any disk commands to perform. Thus, it seemed like something was going wrong in either the emulator task (setting up the disk request), or the disk sector task (performing the disk request). To figure out exactly what was happening, we needed to hook up a logic analyzer.

The logic analyzer

A logic analyzer is a piece of test equipment a bit like an oscilloscope, except instead of measuring voltages, it just measures 0's or 1's. A logic analyzer also has dozens of inputs, allowing many signals to be analyzed at once. By using a logic analyzer, we can log every micro-instruction the processor runs, track each task, and even record every memory access.

Most of the signals of interest are available on the Alto's backplane, which connects all the circuit cards. Since the backplane is wire-wrapped, it consists of pins that conveniently fit the logic analyzer probes. For each signal, you need to find the right card, and then count the pins until you find the right pin to attach the probe. This setup is very tedious, but Marc patiently connected all the probes, while Carl entered the configuration into the logic analyzer.

The backplane of the Xerox Alto, with probes from the logic analyzer attached to trace microcode execution. Note the thick power wires on the left.

Unfortunately, a few important signals (the addresses of the micro-instructions) were not available on the backplane, and we needed to attach probes to one of the PROM chips that hold the microcode. Fortunately, the Living Computer Museum in Seattle gave us an extender card; by plugging the extender card into the backplane and the circuit board into the extender card, the board was accessible and we could connect the probes.

Probes from the logic analyzer hooked up to the Xerox Alto. By plugging the control board into an extension board, probes can be attached to it.

Hours later, with all the probes connected and the configuration programmed into the logic analyzer, we were ready to power up the system and collect data.

Running the logic analyzer

"Smoke! Stop! Shut it off!"

As soon as we flipped the power switch, smoke poured out of the backplane. Had we destroyed this rare computing artifact? What had gone wrong? When something starts smoking, it's usually pretty obvious where the problem is. In our case, one of the ground wires from the logic analyzer pod had melted, turning its insulation into smoke. A bit of discussion followed: "Pin 3 is ground, right?" "No, pin 9 is ground, pin 3 is 5 volts." "Oops." It turns out that when you short +5 and ground, a probe wire is no match for a 60 amp power supply. Fortunately, this wire was the only casualty of the mishap.

This logic probe wire melted when we accidentally connected +5 volts and ground with it.

With this problem fixed, we were able to get a useful trace from the logic analyzer. The trace showed that the Alto started off with the emulator/boot task. After just four instructions, execution switched to the disk word task, which was rapidly interrupted by the parity error task. When that task finished, execution went back to the disk word task, which was interrupted a few instructions later by the display vertical task. The disk word task was able to run a few more instructions before the display horizontal task ran, followed by the cursor task.

The vintage Agilent 1670G logic analyzer that we connected to the Xerox Alto. The screen shows the start of the Alto's boot sequence.

It's rather amazing how much task switching is going on in the Alto, with low-priority tasks only getting a few instructions executed before being interrupted by a higher-priority task. Looking at the trace made me realize how much overhead these tasks have. In our case, the emulator task is running the boot code, so progress towards boot requires looking at hundreds of instructions in the logic analyzer.

The key thing we noticed in the traces is that the parity error task ran right near the start, indicating an error in memory. This would explain why the system doesn't boot up. We ran a few more boot cycles through the logic analyzer. The specific order of tasks varied each time, as you'd expect since they are triggered asynchronously from hardware events. But we kept seeing the parity errors.

The Alto's memory system

The Alto was built in the early days of semiconductor memory, when RAM chips were expensive and unreliable. The original Alto used Intel's 1103 memory chips, the first commercially available DRAM chips, each holding just 1 kilobit. To provide 128 kilobytes of memory, the Alto I used 16 boards crammed full of chips. (If you're reading this on a computer with 4 gigabytes of memory, think about how much memory capacity has improved since the 1970s.)

We have the later Alto II XM (extended memory) system, which used more advanced 16 kilobit chips to fit 512 kilobytes of storage onto 4 boards. Each memory board stored a 10-bit chunk. Why 10 bits? Because memory chips were unreliable, the Alto used error correction. To store a 32-bit word pair, 6 bits of Hamming error correction were added, along with a parity bit, and one unused bit. The extra bits allow single-bit errors to be corrected and double-bit errors to be detected. The four memory boards in parallel stored 40 bits at a time: the 32-bit word pair and the extra bits for error correction.
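To show how 6 Hamming bits plus a parity bit can correct single-bit errors and detect double-bit errors in a 32-bit value, here is a minimal Python sketch of a textbook extended Hamming code. The bit layout (check bits at power-of-two positions, overall parity at index 0) is the standard classroom arrangement and an assumption on my part; the Alto's actual wiring on the DIM board is almost certainly different.

```python
DATA_BITS = 32
CHECK_BITS = 6
CODE_LEN = DATA_BITS + CHECK_BITS  # Hamming positions 1..38
# Data goes in every position that is not a power of two.
DATA_POSITIONS = [p for p in range(1, CODE_LEN + 1) if p & (p - 1) != 0]

def encode(data):
    """Return 39 bits: index 0 is overall parity, 1..38 is the Hamming codeword."""
    bits = [0] * (CODE_LEN + 1)
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (data >> i) & 1
    # XOR of the positions of all set data bits; the check bits cancel it out.
    syndrome = 0
    for pos in DATA_POSITIONS:
        if bits[pos]:
            syndrome ^= pos
    for k in range(CHECK_BITS):
        bits[1 << k] = (syndrome >> k) & 1
    bits[0] = sum(bits[1:]) & 1  # make the total number of 1s even
    return bits

def decode(bits):
    """Return (data, status) where status is 'ok', 'corrected',
    'double error', or 'uncorrectable'."""
    bits = list(bits)
    syndrome = 0
    for pos in range(1, CODE_LEN + 1):
        if bits[pos]:
            syndrome ^= pos
    parity_ok = (sum(bits) & 1) == 0
    if syndrome == 0 and parity_ok:
        status = "ok"
    elif not parity_ok:
        # Odd number of flipped bits: treat it as a single-bit error.
        if syndrome > CODE_LEN:
            status = "uncorrectable"
        else:
            if syndrome != 0:  # syndrome 0 means the parity bit itself flipped
                bits[syndrome] ^= 1
            status = "corrected"
    else:
        # Parity looks fine but the syndrome is nonzero: two bits flipped.
        status = "double error"
    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= bits[pos] << i
    return data, status

word = 0xDEADBEEF
codeword = encode(word)
flipped = list(codeword)
flipped[20] ^= 1  # simulate one bad memory bit
print(decode(codeword))  # (3735928559, 'ok')
print(decode(flipped))   # (3735928559, 'corrected')
```

The 39 bits used here, plus the one unused bit, account for the 40 bits spread across the four 10-bit boards.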

A 128KB memory card from the Xerox Alto. The board has eighty 4116 DRAM chips, each with 16 kilobits of storage.
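The numbers in this section can be cross-checked with a little arithmetic. The sketch below assumes the eighty chips on a board are arranged as ten bit columns of eight chips each, which is my inference from the 10-bit chunk per board rather than something I have verified on the board itself.

```python
CHIPS_PER_BOARD = 80
BITS_PER_CHIP = 16 * 1024   # 4116 DRAM: 16 kilobits
BOARDS = 4
BITS_PER_BOARD_CHUNK = 10   # each board stores a 10-bit chunk
DATA_BITS_PER_GROUP = 32    # one 32-bit word pair per 40-bit group

# Assumed arrangement: chips split evenly across the 10 bit columns.
chips_per_column = CHIPS_PER_BOARD // BITS_PER_BOARD_CHUNK
groups = chips_per_column * BITS_PER_CHIP           # number of 40-bit groups
total_data_bytes = groups * DATA_BITS_PER_GROUP // 8
data_bytes_per_board = total_data_bytes // BOARDS
stored_bits_per_group = BOARDS * BITS_PER_BOARD_CHUNK
overhead_fraction = 1 - DATA_BITS_PER_GROUP / stored_bits_per_group

print(total_data_bytes // 1024)      # 512 (kilobytes in total)
print(data_bytes_per_board // 1024)  # 128 (kilobytes per board)
print(overhead_fraction)             # fraction of stored bits that are not data
```

Under these assumptions, the totals agree with the 512 kilobytes and the 128KB card described above, with one fifth of the stored bits going to error correction, parity, and the unused bit.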

In addition to the 4 memory boards, the Alto has three circuit boards to control memory. The "MEAT" (Memory Extension And Terminator) is a trivial board to support four memory banks (the extended memory in the Alto XM). The "AIM" board (Address Interface Module) is a complex board that maps addresses to memory control signals, as well as handling memory-mapped peripherals such as the keyboard, mouse, and printer. Finally, the "DIM" board (Data Interface Module) generates the Hamming error correcting code signals, and performs error detection and correction.

More probing showed that the DIM board was always signaling a parity error. At this point, we're not sure if some of the memory chips are bad or if the complex circuitry on the DIM board is malfunctioning and reporting errors. As you can tell from the above description, the memory system on the Alto is complex. It may be a challenge to debug the memory and find out why we're getting errors.

A look at the microcode

In this section, I'll give a brief view of what the microcode looks like and how it appears in the logic analyzer. Microcode is generally hard to understand because it is at a very low level in the system, below the instruction set and running on the bare hardware. The Alto's microcode seems especially confusing.

Each Alto micro-instruction specifies an ALU operation and two "functions". A function can be something like "read a register" or "send an address to memory". But a function can also change meaning depending on what task is running. For instance, when the Ethernet task is running, a function might mean "do a four-way branch depending on the Ethernet state". But during the display task, the same function could mean "display these pixels on the screen". As a result, you can't figure out what an instruction does unless you know which task it is a part of.
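The task-dependent decoding can be pictured as a two-level lookup, as in this small Python sketch. The function code numbers and the table contents are invented for illustration; only the idea (the same code meaning different things for different tasks) comes from the description above.

```python
# Hypothetical function codes; the real Alto encodings are not shown here.
COMMON_FUNCTIONS = {
    1: "send an address to memory",
}
TASK_SPECIFIC_FUNCTIONS = {
    "ETHER": {8: "four-way branch depending on the Ethernet state"},
    "DWT": {8: "display these pixels on the screen"},
}

def decode_function(task, code):
    """Return the meaning of a function code for the given task."""
    if code in COMMON_FUNCTIONS:
        return COMMON_FUNCTIONS[code]
    return TASK_SPECIFIC_FUNCTIONS.get(task, {}).get(code, "undefined for this task")

print(decode_function("ETHER", 8))
print(decode_function("DWT", 8))
print(decode_function("DWT", 1))
```

The same code 8 decodes differently for the two tasks, which is why a trace line is ambiguous until you also look at the task column.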

The image below shows a small part of the logic analyzer output (as printed on Marc's vintage HP line printer). Each line corresponds to one executed micro-instruction. The "address" column shows the address of the micro-instruction in the 1K PROM storage. The task field shows which task is running. You can see the task switch midway through execution; 0 is the emulator and 13 is the parity task. Finally, the 32-bit micro-instruction is broken into fields such as RSEL (register select), ALUF (ALU function) and F1 (function 1).

The start of the logic analyzer trace from booting the Xerox Alto. The trace shows us each micro-instruction that was executed.

Note that the addresses jump around a lot; this is because the microcode isn't stored linearly in the PROM. Every micro-instruction has a "next instruction address" field in the instruction, so you can think of it as a GOTO inside every instruction. To make it worse, this field can be modified by the interface hardware, turning a GOTO into a computed GOTO. To make this work, the assembler shuffles instructions around in memory, so it's hard to figure out what code goes with a particular address. The point of this is that the logic analyzer output shows us every micro-instruction as it executes, but the output is somewhat difficult and tedious to interpret.
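Here is a minimal Python sketch of the "GOTO in every instruction" idea, including the hardware ORing bits into the next address. The tiny PROM contents, the addresses, and the operation names are made up, and the timing is simplified compared to the real machine.

```python
# Hypothetical microprogram scattered through a tiny "PROM".
# Each entry has an operation name and a next-instruction address.
PROM = {
    0o20: {"op": "A", "next": 0o05},
    0o05: {"op": "B", "next": 0o10},   # hardware may OR bits into this address
    0o10: {"op": "C0", "next": None},  # dispatch target when the OR bits are 0
    0o11: {"op": "C1", "next": None},  # dispatch target when the OR bits are 1
}

def trace(start, or_bits_at=None):
    """Follow the next-address fields from start.

    or_bits_at maps an instruction address to bits that the hardware ORs
    into that instruction's next address, turning a GOTO into a computed GOTO.
    """
    or_bits_at = or_bits_at or {}
    addr = start
    executed = []
    while addr is not None:
        instr = PROM[addr]
        executed.append((addr, instr["op"]))
        nxt = instr["next"]
        if nxt is not None:
            nxt |= or_bits_at.get(addr, 0)
        addr = nxt
    return executed

print(trace(0o20))             # [(16, 'A'), (5, 'B'), (8, 'C0')]
print(trace(0o20, {0o05: 1}))  # [(16, 'A'), (5, 'B'), (9, 'C1')]
```

The dispatch only works because C0 sits at an address whose low bit is zero and C1 sits right next to it. Satisfying constraints like that is the reason instructions end up shuffled around in the PROM.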

Fortunately we have the source code for the microcode, but understanding it is a challenge. The image below shows a small section of the boot code. I won't attempt to explain the microcode in detail, but want to give you a feel for what it is like. Labels (along the left) and jumps to labels are highlighted in blue. Things such as IR, L, and T are registers, and they get assigned values as indicated by the arrows. MAR is the memory address register (giving an address to memory) and MD is memory data, reading or writing the memory value.

A short section of the Xerox Alto's microcode. Labels and jumps are colored blue. Comments are gray.

Figuring out the control flow of the microcode requires detailed understanding of what is happening in the hardware. For example, in the last line above, ":Q0" indicates a jump to label "Q0". However the previous line says "BUS", which means the contents of the data bus are ORed into the address, turning the jump into a conditional jump to Q0, Q1, Q2, etc. depending on the bus value. And "TASK" indicates that a task switch can happen after the next instruction. So matching up the instructions in the logic analyzer output with instructions in the source code is non-trivial.

I should mention that the authors of the Alto's microcode were really amazing programmers. An important feature for graphics displays is BITBLT, bit block transfer. The idea is to take an arbitrary rectangle of pixels in memory (such as a character, image, or window) and copy it onto the screen. The tricky part is that the regions may not be byte-aligned, so you may need to extract part of a byte, shift it over, and combine it with part of the destination byte. In addition, BITBLT supports multiple writing modes (copy, XOR, merge) and other features. So BITBLT is a difficult function to implement, even in a high-level language. The incredible part is that the Xerox Alto has BITBLT implemented in hundreds of lines of complex microcode! Using microcode for BITBLT made the operation considerably faster than implementing it in assembly code. (It also meant that BITBLT was used as a single machine language instruction.)
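To give a feel for the extract, shift, and combine steps, here is a minimal Python sketch that transfers one row of bits between unaligned positions. It is nothing like the Alto's microcode: the function name, the byte-oriented rows, and the treatment of "merge" as a bitwise OR are my own assumptions, and a full BITBLT would repeat this for every row of the rectangle.

```python
def blit_row(dst, dst_bit, src, src_bit, nbits, mode="copy"):
    """Transfer nbits from src (starting at bit src_bit) into dst at dst_bit.

    Rows are bytes objects; bit 0 is the most significant bit of byte 0.
    Returns a new bytes object for the destination row.
    """
    src_int = int.from_bytes(src, "big")
    dst_int = int.from_bytes(dst, "big")
    mask = (1 << nbits) - 1
    # Extract the source run and right-align it.
    run = (src_int >> (len(src) * 8 - src_bit - nbits)) & mask
    # Shift it into place for the destination.
    shift = len(dst) * 8 - dst_bit - nbits
    if mode == "copy":
        dst_int = (dst_int & ~(mask << shift)) | (run << shift)
    elif mode == "merge":  # assumed here to mean OR with the destination
        dst_int |= run << shift
    elif mode == "xor":
        dst_int ^= run << shift
    else:
        raise ValueError("unknown mode: " + mode)
    return dst_int.to_bytes(len(dst), "big")

# Move 4 bits from the middle of one byte so they straddle two destination bytes.
print(blit_row(bytes(2), 6, bytes([0b00111100]), 2, 4).hex())           # 03c0
print(blit_row(b"\xff\xff", 6, bytes([0b00111100]), 2, 4, "xor").hex())  # fc3f
```

Even this one-row version needs masking on both ends of the run, which hints at why doing the whole job in microcode was such an achievement.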

Conclusion

Hooking up the logic analyzer was time-consuming, but succeeded in showing us exactly what was happening inside the Alto processor. Although interpreting the logic analyzer output and mapping it to the microcode source is difficult, we were able to follow the execution and determined that the parity task was running. It appears that memory parity errors are preventing the system from booting. The next step will be to understand the memory system in detail to determine where these errors are coming from and how to fix them.