Ken Shirriff's blog

PRU tips: Understanding the BeagleBone's built-in microcontrollers

The BeagleBone Black is an inexpensive, credit-card sized computer that has two built-in microcontrollers called PRUs. While the PRUs provide the real-time processing capability lacking in Linux, using these processors has a learning curve. In this article, I show how to run a simple program on the PRU, and then dive into the libraries and device drivers to show what is happening behind the scenes. Warning: this post uses the 3.8.13-bone79 kernel; many things have changed since then.

The BeagleBone uses the Sitara AM3358, an ARM processor chip running at 1 GHz—this is the thumbnail-sized chip in the center of the board below. If you want to perform real-time operations, the BeagleBone's ARM processor won't work well since Linux isn't a real-time operating system. However, the Sitara chip contains two 32-bit microcontrollers, called Programmable Realtime Units or PRUs. (It's almost fractal, having processors inside the processor.) By using a PRU, you can achieve fast, deterministic, real-time control of I/O pins and devices.

The BeagleBone computer is tiny. For some reason it was designed to fit inside an Altoids mint tin. The square thumbnail-sized chip in the center is the AM3358 Sitara processor chip. This chip contains an ARM processor as well as two 32-bit microcontrollers called PRUs.

The nice thing about using the PRU microcontrollers on the BeagleBone is that you get both real-time Arduino-style control, and "real computer" features such as a web server, WiFi, Ethernet, and multiple languages. The main processor on the BeagleBone can communicate with the PRUs, giving you the best of both worlds. The downside of the PRUs is there's a significant learning curve to use them since they have their own instruction set and run outside the familiar Linux world. Hopefully this article will help with the learning curve.

I wrote an article a couple weeks ago on The BeagleBone's I/O pins. That article discussed the pins controlled by the ARM processor, while this article focuses on the PRU microcontroller. If you think you've seen the present article before, they cover two different things, but I won't blame you for getting deja vu!

Running a "blink" program on the PRU

To motivate the discussion, I'll use a simple program that uses the PRU to flash an LED. This example is based on PRU GPIO example

Blinking an LED using the BeagleBone's PRU microcontroller.

The easiest way to compile and assemble PRU code is to do it on the BeagleBone, since the necessary tools are installed by default (at least if you get the Adafruit BeagleBone). Perform the following steps on the BeagleBone.

Connect an LED to BeagleBone header P8_11 through a 1K resistor.
Download the assembly code file blink.p:
Download the host file to load and run the PRU code, loader.c.
Download the device tree file, /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dts.

Compile and install the device tree file to enable the PRU:

# dtc -O dtb -I dts -o /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dtbo -b 0 -@ PRU-GPIO-EXAMPLE-00A0.dts
# echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots
# cat /sys/devices/bone_capemgr.?/slots

Assemble blink.p and compile the loader:

# pasm -b blink.p
# gcc -o loader loader.c -lprussdrv

Run the loader to execute the PRU binary:
```
# ./loader blink.bin
```

If all goes well, the LED should blink 10 times.[1]

Documentation

The most important document that describes the Sitara chip is the 5041-page Technical Reference Manual (TRM for short). This article references the TRM where appropriate, if you want more information. Information on the PRU is inconveniently split between the TRM and the AM335x PRU-ICSS Reference Guide. For specifics on the AM3358 chip used in the BeagleBone, see the 253 page datasheet. Texas Instruments' has the PRU wiki with more information.

If you're looking to use the BeagleBone and/or PRU I highly recommend the detailed and informative book Exploring BeagleBone. Helpful web pages on the PRU include BeagleBone Black PRU: Hello World, Working with the PRU and BeagleBone PRU GPIO example.

The assembly code

If you're familiar with assembly code, the PRU's instruction set should be fairly straightforward. The instruction set is documented on the wiki and in the PRU reference manual (section 5).[2]

The demonstration blink code uses register R1 to count the number of blinks (10) and register R0 to provide the delay between flashes. The delay timing illustrates an important feature of the PRU: it is entirely deterministic. Most instructions takes 5 nanoseconds (at 200 MHz), although reads are slower. There is no pipelining, branch delays, memory paging, interrupts, scheduling, or anything else that interferes with instruction execution. This makes PRU execution predictable and suitable for real-time processing. (In contrast, the main ARM processor has all the scheduling and timing issues of Linux, making it unsuitable for real-time processing.)

Since the delay loop is two instructions long and executes 0xa00000 times (an arbitrary value that gave a visible delay), we can compute that each delay is 104.858 milliseconds, regardless of what else the processor is doing. (This is shown in the oscilloscope trace below.) A similar loop on the ARM processor would have variable timing, depending on system load.

LED output from the BeagleBone PRU demo, showing the 104ms oscillations.

The I/O pin

How do we know that pin P8_11 is controlled by bit 15 of R30? We're using one of the PRU output pins called "pr1_pru0_pru_r30_15", which is a PRU 0 output pin controlled by R30 bit 15. (Note that the PRU GPIO pins are separate from the "regular" GPIO pins.)[3] The BeagleBone header chart shows that this PRU I/O pin is connected to BeagleBone header pin P8_11.[4] To tell the BeagleBone we want to use this pin with the PRU, the device tree configuration is updated as discussed below.

Communication between the main processor and the PRU

The PRU microcontrollers are part of the processor chip, yet operate independently. In the blink demo, the main processor runs a loader program that downloads the PRU code to the microcontroller, signals the PRU to start executing, and then waits for the PRU to finish. But what happens behind the scenes to make this work? The PRU Linux Application Library (PRUSSDRV) provides the API between code running under Linux on the BeagleBone and the PRU.[5]

The prussdrv_exec_program() API call loads a binary program (in our case, the blink program) into the PRU so it can be executed. To understand how this works requires a bit of background on how memory is set up.

Each PRU microcontroller has 8K (yes, just 8K) of instruction memory, starting at address 0. Each PRU also has 8K of data memory, starting at address 0 (see TRM 4.3).[6] Since the main processor can't access all these memories at address 0, they are mapped into different parts of the main processor's memory as defined by the "global memory map".[7] For instance, PRU0's instruction memory is accessed by the host starting at physical address 0x4a334000. Thus, to load the binary code into the PRU, the API code copies the PRU binary file to that address.[8]

Then, to start execution, the API code starts the PRU running by setting the enable bit in the PRU's control register, which the host can access at a specific physical address (see TRM 4.5.1.1).

To summarize, the loader program running on the main processor loads the executable file into the PRU by writing it to special memory addresses. It then starts up the PRU by writing to another special memory address that is the PRU's control register.

The Userspace I/O (UIO) driver

At this point, you may wonder how the loader program can access to physical memory and low-level chip registers - usually is not possible from a user-level program. This is accomplished through UIO, the Linux Userspace I/O driver (details). The idea behind a UIO driver is that many devices (the PRU in our case) are controlled through a set of registers. Instead of writing a complex kernel device driver to control these registers, simply expose the registers to user code, and then user-level code can access the registers to control the device. The advantage of UIO is that user-level code is easier to write and debug than kernel code. The disadvantage is there's no control over device usage—buggy or malicious user-level code can easily mess up the device operation.[9]

For the PRU, the uio_pruss driver provides UIO access. This driver exposes the physical memory associated with the PRU as a device /dev/uio0. The user-level PRUSSDRV API library code does a mmap on this device to map the address space into its own memory. This gives the loader code access to the PRU memory through addresses in its own address space. The uio_pruss driver exposes information on the mapping to the user-level code through the directory /sys/class/uio/uio0/maps.

Another photo of the BeagleBone Black single-board computer inside an Altoids mint tin. At this point in the article, you may need a break from the text.

Device Tree

How does the uio_pruss driver know the address of the PRU? This is configured through the device tree, a complicated set of configuration files used to configure the BeagleBone hardware. I won't go into the whole explanation of device trees, but just the parts relevant to our story.[10] The pruss entry in the device tree is defined in am33xx.dtsi. Important excerpts are:

pruss: pruss@4a300000 {
 compatible = "ti,pruss-v2";
 reg = <0x4a300000 0x080000>;
 status = "disabled";
 interrupts = <20 21 22 23 24 25 26 27>;
};

The address 4a300000 in the device tree corresponds to the PRU's instruction/data/control space in the processor's address space. (See TRM Table 2-4 and section 4.3.2.) This address is provided to the uio_pruss driver so it knows what memory to access.

The "ti,pruss-v2" compatible line causes the matching uio_pruss driver to be loaded. Note that the status is "disabled", so the PRU is disabled by default until enabled in a device tree overlay.

The "interrupts" line specifies the eight ARM interrupts that are handled by the driver; these will be discussed in the Interrupts section below.

Device tree overlay

To enable the PRU and the GPIO output pin, we needed to load the PRU-GPIO-EXAMPLE-00A0.dts device tree overlay file.[11] This file is discussed in detail at credentiality, so I'll just give some highlights.

As described earlier, each header pin has eight different functions, so we need to tell the pin multiplexer driver which of the eight modes to use for the pin P8_11. Looking at the P8 header diagram, pin P8_11 has internal address 0x34 and has the PRU output function in mode 6. This is expressed in the overlay file:

exclusive-use = "P8.11", "pru0";
...
    target = <&am33xx_pinmux>;
...
       pinctrl-single,pins = <
         0x34 0x06

The above entry extends the am33xx_pinmux device tree description and is passed to the pinctrl-single driver, which configures the pins as desired by updating the appropriate chip control registers.

The device tree file also overrides the previous pruss entry, changing the status to "okay", which enables the uio_pruss device driver:

    target = <&pruss>;
    __overlay__ {
      status = "okay";

The cape manager

This device tree overlay was loaded with the command "echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots". What does this actually do?

The idea of the cape manager is if you plug a board (called a cape) onto the BeagleBone, the cape manager reads an EEPROM on the board, figures out what resources the board needs, and automatically loads the appropriate device tree fragments (from /lib/firmware) to configure the board (source, details, documentation). The cape manager is configured in the device tree file am335x-bone-common.dtsi.

Examples of BeagleBone capes. Image from beaglebone.org, CC BY-SA 3.0.

In our case, we aren't installing a new board, but just want to manually install a new device tree file. The cape manager exposes a bunch of files in /sys/devices/bone_capemgr.*. Sending a name to the file "slots" causes the cape manager to load the corresponding device tree file. This enables the PRU and the desired output pin.

Interrupts

Interrupts can be used to communicate between the PRU code and the host code. At the end of the PRU blink code, it sends an interrupt to the host by writing 35 to register R31. This wakes up the loader program, which is waiting with the API call "prussdrv_pru_wait_event(PRU_EVTOUT_0)". This process seems like it should be straightforward, but it's surprisingly complex. In this section, I'll explain how PRU interrupts work.

I'll start with the host (loader program) side. The PRU can send eight different events to the host program. The prussdrv_pru_wait_event(N) call waits for event N (0 through 7) by reading from /dev/uioN. This read will block until an interrupt happens, and then return. (This seems like a strange way to receive interrupts, but using the file system fits the Unix philosophy.)[12]

If you look in the file system, there are 8 uio files: /dev/uio0 through /dev/uio7. You might expect /dev/uio1 provides access to PRU1, but that's not the case. Instead, there are 8 event channels between the PRUs and the main processor, allowing 8 different types of interrupts. Each one of these has a separate /dev/uioN file; reading from the file waits for the corresponding event, labeled PRU_EVTOUT0 through PRU_EVTOUT7.

Jumping now to the PRU side, the interrupt is triggered by a write to R31. Register R31 is a special register - writing to it sends an event to the interrupt system.[13] The blink code generates PRU internal event 3 (with the crazy name pr1_pru_mst_intr[3]_intr_req), which is mapped to system event 19.

The following diagram shows how an event (right) gets mapped to a channel, then a host, and finally generates an interrupt (left). In our case, system event 19 from the PRU is given the label PRU0_ARM_INTERRUPT. This interrupt is mapped to channel 2, then host 2 which goes to ARM interrupt PRU_EVTOUT0, which is ARM interrupt 20 (see TRM 6.3). The device tree configured the pruss_uio driver to handle events 20 through 27, so the driver receives the interrupt and unblocks /dev/uio0, informing the host process of the PRU event.

Interrupt handling on the BeagleBone between the PRU microcontrollers and the ARM processor. System events (right) are mapped to channels, then hosts, finally generating interrupts (left).

How does the above complex mapping get set up? You might guess the device tree, but it's done by the PRUSSDRV user-level library code. The interrupt configuration is defined in PRUSS_INTC_INITDATA, a struct that defines the various mappings shown above. This struct is passed to prussdrv_pruintc_init(), which writes to the appropriate PRU interrupt registers (mapped into memory).[14]

Next steps

A real application would use a host program much more powerful than the example loader. For instance, the host program could run a web server and actively communicate with the PRU. Or the host program could do computation on data received from the PRU. Such a system uses the ARM processor to get all the advantages of Linux, and the PRU microcontrollers to perform real-time activities impossible in Linux.

It's easier to program the PRU in C using an IDE. That's a topic for another post, but for now you can look here, here, here or here for information on using the C compiler.

Conclusion

The PRU microcontrollers give the BeagleBone real-time, deterministic processing. Combining the BeagleBone's full Linux system with the PRU microcontrollers yields a very powerful system. The Linux side can provide powerful processing, network connectivity, web servers, and so forth, while the PRUs can interface with the external world. If you're using an Arduino but want more (Ethernet, web connectivity, USB, WiFi), the BeagleBone is an alternative to consider.

There is a significant learning curve to using the PRU microcontrollers, however. Much of what happens may seem like magic, hidden behind APIs and device tree files. Hopefully this article has filled in the gaps, explaining what happens behind the scene.

Notes and references

[1] If you get the error "prussdrv_open() failed" the problem is the PRU didn't get enabled. Make sure you loaded the device tree file into the cape manager.

[2] One of the instructions in the PRUs instruction set is LBBO (load byte burst), which reads a block of memory into the register file. That sounded to me like an obscure instruction I'd never need, but it turns out that this instruction is used for any memory reads. So don't make my mistake of ignoring this instruction!

[3] To understand the pin name "pr1_pru0_pru_r30_15": pru0 indicates the pin is controlled by PRU 0 (rather than PRU 1), r30 indicates the pin is an output pin, controlled by the output register R30 (as opposed to the input register R31), and 15 indicates I/O pin 15, which is controlled by bit 15 of the register. A mnemonic to remember that R30 is the output register and R31 is the input register: 0 looks like O for output, 1 looks like I for input.

[4] Most PRU pins conflict with HDMI or another subsystem, so you need to disable HDMI to use them. To avoid this problem, I picked one of the few non-conflicting PRU pins. Disabling HDMI is described here and here. See this chart for a list of available PRU pins and potential conflicts.

[5] The PRU library is documented in PRU Linux Application Loader API Guide. The library is often installed on the BeagleBone by default but can also be downloaded in the am335x_pru_package along with PRU documentation and some coding examples. Library source code is here.

[6] The PRU, like many microcontrollers, uses a Harvard architecture, with independent memory for code and data.

[7] In the main processor's address space, PRU0's data RAM starts at address 0x4a300000, PRU1's data RAM starts at 0x4a302000, PRU0's instruction RAM starts at 0x4a334000, and PRU1's instruction RAM starts at 0x4a338000. (See TRM Table 2-4 and section 4.3.2.) The PRU control registers also have addresses in the main processor's physical address space; PRU0's control registers start at address 0x4a322000 and PRU1's at 0x4a324000.

[8] The API uses prussdrv_pru_write_memory() to copy the contents into the memory-mapped region for PRU's instruction memory. The prussdrv_load_datafile API call is used to load a file into the PRU's data memory.

[9] Note that using Userspace I/O to control the PRU is the exact opposite philosophy of how the regular GPIO pins on the BeagleBone are controlled. Most of the BeagleBone's I/O pins are controlled through complex device drivers that provide access through simple file system interfaces. This is easy to use: just write 1 to /sys/class/gpio/gpio60/value to turn on GPIO 60. On the other hand, the BeagleBone gives you raw access to the PRU control registers - you can do whatever you want, but you'll need to figure out how to do it yourself. It's a bit strange that two parts of the BeagleBone have such wildly different philosophies.

[10] Device trees are discussed in more detail in my previous BeagleBone article. Also see A Tutorial on the Device Tree, Device Tree for Dummies, or Introduction to the BeagleBone Black Device Tree.

[11] Why is the example device tree overlay (and most device tree files) labeled with version 00A0? I found out (thanks to Robert Nelson) that A0 is a common initial version for contract hardware. So 00A0 is the first version of something, followed by 00A1, etc.

[12] The blink example only sends events from the PRU to the host, but what about the other direction? The host can send an event (e.g. ARM_PRU0_INTERRUPT, 21) to a PRU using prussdrv_pru_send_event(). This writes the PRU's SRSR0 (System Event Status Raw/Set Register, TRM 4.5.3.13) to generate the event. The interrupt diagram shows this event flows to R31 bit 30. The PRUs don't receive interrupts as such, but must poll the top two bits of R31 to see if an interrupt signal has arrived.

[13] The PRU has 64 system events for all the different things that can trigger interrupts, such as timers or I/O (see TRM 4.4.2.2). When generating an event with R31, bit 5 triggers an event, while the lower 4 bits select the event number (see TRM 4.4.1.2). PRU system event numbers are offset by 16 from the internal event numbers, so internal event 3 is system event 19. See PRU interrupts for the full list of events.

[14] Multiple PRU registers are set up to initialize interrupt handling. The CMR (channel map registers, TRM 4.5.3.20) map from each system event to a channel. The HMR (host interrupt map registers, TRM 4.5.3.36) map from a channel to a host interrupt. The PRU interrupt registers (INTC) start at address 0x4a320000 (TRM Table 4-8).

The BeagleBone's I/O pins: inside the software stack that makes them work

The BeagleBone is a inexpensive, credit-card sized computer with many I/O pins. These pins can be easily controlled from software, but it can be very mysterious what is really happening. To control a general purpose input/output (GPIO) pin, you simply write a character to a special file and the pin turns on or off. But how do these files control the hardware pins? In this article, I dig into the device drivers and hardware and explain exactly what happens behind the scenes. Warning: this post uses the 3.8.13-bone79 kernel; many things have changed since then. (Various web pages describe the GPIO pins, but if you just want a practical guide of how to use the GPIO pins, I recommend the detailed and informative book Exploring BeagleBone.)

This article focuses on the BeagleBone Black, the popular new member of the BeagleBoard family. If you're familiar with the Arduino, the BeagleBone is much more complex; while the Arduino is a microcontroller, the BeagleBone is a full computer running Linux. If you need more than an Arduino can easily provide (more processing, Ethernet, WiFi), the BeagleBone may be a good choice.

Beaglebone Black single-board computer. Photo by Gareth Halfacree, CC BY-SA 2.0

The BeagleBone uses the Sitara AM3358 processor chip running at 1 GHz - this is the thumbnail-sized chip in the center of the board above. This chip is surprisingly complicated; it appears they threw every feature someone might want into the chip. The diagram below shows what is inside the chip: it includes a 32-bit ARM processor, 64KB of memory, a real-time clock, USB, Ethernet, an LCD controller, a 3D graphics engine, digital audio, SD card support, various networks, I/O pins, an analog-digital converter, security hardware, a touch screen controller, and much more.[1] To support real-time applications, the Sitara chip also includes two separate 32-bit microcontrollers (on the chip itself - processors within processors!).

Functional diagram of the complex processor powering the BeagleBone Black. The TI AM3358 Sitara processor contains many functional units. Diagram from Texas Instruments.

The main document that describes the Sitara chip is the Technical Reference Manual, which I will call the TRM for short. This is a 5041 page document describing all the feature of the Sitara architecture and how to control them. But for specifics on the AM3358 chip used in the BeagleBone, you also need to check the 253 page datasheet. I've gone through these documents, so you won't need to, but I reference the relevant sections in case you want more details.

Using a GPIO pin

The chip's pins can be accessed through two BeagleBone connectors, called P8 and P9. To motivate the discussion, I'll use an LED connected to GPIO 49, which is on pin 23 of header P9 (i.e. P9_23). (How do you know this pin is GPIO 49? I explain that below.) Note that the Sitara chip's outputs can provide much less current than the Arduino, so be careful not to overload the chip; I used a 1K resistor.

To output a signal to this GPIO pin, simply write some strings to some files:

# echo 49 > /sys/class/gpio/export
# echo out > /sys/class/gpio/gpio49/direction
# echo 1 > /sys/class/gpio/gpio49/value

The first line causes a directory for gpio49 to be created. The next line sets the pin mode as output. The third line turns the pin on; writing 0 will turn the pin off.

This may seem like a strange way to control the GPIO pins, using special file system entries. It's also strange that strings ("49", "out", and "1") are used to control the pin. However, the file-based approach fits with the Unix model, is straightforward to use from any language, avoids complex APIs, and hides the complexity of the chip.

The file-system approach is very slow, capable of toggling a GPIO pin at about 160 kHz, which is rather awful performance for a 1 GHz processor.[2] In addition, these oscillations may be interrupted for several milliseconds if the CPU is being used for other tasks, as you can see from the image below. This is because file system operations have a huge amount of overhead and context switches before anything gets done. In addition, Linux is not a real-time operating system. While the processor is doing something else, the CPU won't be toggling the pins.

An I/O pin can be toggled to form an oscillator. Unfortunately, oscillations can stop for several milliseconds if the processor is doing something else.

The moral here is that controlling a pin from Linux is fine if you can handle delays of a few milliseconds. If you want to produce an oscillation, don't manually toggle the pin, use a PWM (pulse width modulator) output instead. If you need more complex real-time outputs, use one of the microcontrollers (called a PRU) inside the processor chip.

What's happening internally with GPIOs?

Next, I'll jump to the low level, explaining what happens inside the chip to control the GPIO pins. A GPIO pin is turned on or off when a chip register at a particular address is updated. To access memory directly, you can use devmem2, a simple program that allows a memory register to be read or written. Using the devmem2 program, we can turn the GPIO pin first on and then off by writing a value to the appropriate register (assuming the pin has been initialized). The first number is the address, and the second number is value written to the address.

devmem2 0x4804C194 w 0x00020000
devmem2 0x4804C190 w 0x00020000

What are all these mystery numbers? GPIO 49 is also known as GPIO1_17, the 17th pin in the GPIO1 bank (as will be explained later). The value written to the register, 0x00020000, is a 1 bit shifted left 17 positions (1<<17) to control pin 17. The address 4804C194 is the address of the register used to turn on a GPIO pin. Specifically it is SETDATAOUT register for the 32 GPIO1 pins. By writing a bit to this register, the corresponding pin is turned on. Similarly, the address 4804C190 is the address of the CLEARDATAOUT register, which turns a pin off.

How do you determine these register addresses? It's all defined in the TRM document if you know where to look. The address 4804C000 is the starting address of GPIO1's registers (see Table 2-3 in the TRM). The offset 194h is the offset of the GPIO_SETDATAOUT register, and 190h is the GPIO_CLEARDATAOUT register (see TRM 25.4.1). Combining these, we get the address 4804C194 for GPIO1's SETDATAOUT and 4804C190 for CLEARDATAOUT. Writing a 1 bit to the SETDATAOUT register will set that GPIO, while leaving others unchanged. Likewise, writing a 1 bit to the CLEARDATOUT register will clear that GPIO. (Note that this design allows the selected register to be modified without risk of modifying other GPIO values.)

Putting this all together, writing the correct bit pattern to the address 0x4804C194 or 0x4804C190 turns the GPIO pin on or off. Even if you use the filesystem API to control the pin, these registers are what end up controlling the pin.

How does devmem2 work? It uses Linux's /dev/mem, which is a device file that is an image of physical memory. devmem2 maps the relevant 4K page of /dev/mem into the process's address space and then reads or writes the address corresponding to the desired physical address.

By using mmap and toggling the register directly from a C++ program, the GPIO can be toggled at about 2.8 MHz, much faster than the device driver approach.[3] If you think this approach will solve your problems, think again. As before, there are jitter and multi-millisecond dropouts if there is any other load on the system.

The names of pins

Confusingly, each pin has multiple names and numbers. For instance, pin 23 on header 9 is GPIO1_17, which is GPIO 49, which is pin entry 17 (unrelated to GPIO1_17) which is pin V14 on the chip itself. Meanwhile, the pin has 7 other functions including GMPC_A1, which is the name used for the pin in the documentation. This section explains the different names and where they come form. (If you just want to know the pin names, see the diagrams P8 header and P9 header from Exploring Beaglebone.)

The first complication is that each pin has eight different functions since the processor has many more internal functions than physical pins.[4] (The Sitara chip has about 639 I/O signals, but only 124 output pins available for these signals.) The solution is that each physical pin has eight different internal functions mapped to it. For each pin, you select a mode 0 through 7, which selects which of the eight possible functions you want.

Figuring out "from scratch" which functions are on each BeagleBone header pin is tricky and involves multiple documents. To start, you need to determine what functions are on each physical pin (actually ball) of the chip. Table 4-1 in the chip datasheet shows which signals are associated with each physical pin (see the "ZCZ ball number"). (Or see section 4.3 for the reverse mapping, from the signal to the pin.) Then look at the BeagleBone schematic to see the connection from each pin on the chip (page 3) to an external connector (page 11).

For instance, consider the GPIO pin used earlier. With the ZCZ chip package, chip ball V14 is called GPMC_A1 (in mode 0 this pin is used for the General Purpose Memory Controller). The Beaglebone documentation names the pin with the the more useful mode 7 name "GPIO1_17", indicating GPIO 17 in GPIO bank 1. (Each GPIO bank (0-3) has 32 pins. Bank 0 is numbered 0-31, bank 1 is 32-63, and so forth. So pin 17 in bank 1 is GPIO number 32+17=49.) The schematic shows this chip ball is connected to header P9 pin 23, earning it the name P9_23. This name is relevant when connecting wires to the board. It is also the name used in BeagleScript (discussed later).

Another pin identifier is the sequential pin number based on the the pin control registers. (This number is indicated under the column $PINS in the header diagrams.) Table 9-10 in the TRM lists all the pins in order along with their control address offsets. Inconveniently, the pin names are the mode 0 names (e.g. conf_gpmc_a1), not the more relevant names (e.g. gpio1_17). From this table, we can determine that the offset of con_gpmc_a1 is 844h. Counting the entries (or computing (844h-800h)/4) shows that this is pin 17 in the register list. (It's entirely coincidental that this number 17 matches GPIO1_17). This number is necessary when using the pin multiplexer (described later).

How writing to a file toggles a GPIO

When you write a string to /sys/class/gpio/gpio49/value, how does that end up modifying the GPIO? Now that some background has been presented, this can be explained. At the hardware level, toggling a GPIO pin simply involves setting a bit in a control register, as explained earlier. But it takes thousands of instructions in many layers to get from writing to the file to updating the register and updating the GPIO pin.

The write goes through the standard Linux file system code (user library, system call code, virtual file system layer) and ends up in the sysfs virtual file system. Sysfs is an in-memory file system that exposes kernel objects through virtual files. Sysfs will dispatch the write to the gpio device driver, which processes the request and updates the appropriate GPIO control register.

In more detail, the /sys/class/gpio filesystem entries are provided by the gpio device driver (documentation for sysfs and gpio). The main gpio driver code is gpiolib.c. There are separate drivers for many different types of GPIO chips; the Sitara chip (and the related OMAP family) uses gpio-omap.c. The Linux code is in C, but you can think of gpiolib.c as the superclass, with subclasses for each different chip. C function pointers are used to access the different "methods".

The gpiolib.c code informs sysfs of the various files to control GPIO pin attributes (/active_low, /direction/, /edge, /value), causing them to appear in the file system for each GPIO pin. Writes to the /value file are linked to the function gpio_value_store, which parses the string value, does error checking and calls gpio_set_value_cansleep, which calls chip->set(). This function pointer is where handling switches from the generic gpiolib.c code to the device-specific gpio-omap.c code. It calls gpio_set, which calls _set_gpio_dataout_reg, which determines the register and bit to set. This calls raw_writel, which is inline ARM assembler code to do a STR (store) instruction. This is the point at which the GPIO control register actually gets updated, changing the GPIO pin value.[5] The key point is that a lot of code runs to get from the file system operation to the actual register modification.

How does the code know the addresses of the registers to update? The file gpio-omah.h contains constants for the GPIO register offsets. Note that these are the same values we used earlier when manually updating the registers.

#define OMAP4_GPIO_CLEARDATAOUT  0x0190
#define OMAP4_GPIO_SETDATAOUT  0x0194

But how does the code know that the registers start at 0x4804C000? And how does the system know this is the right device driver to use? These things are specified, not in the code, but in a complex set of configuration files known as Device Trees, explained next.

Device Trees

How does the Linux kernel know what features are on the BeagleBone? How does it know what each pin does? How does it know what device drivers to use and where the registers are located? The BeagleBone uses a Linux feature called the Device Tree, where the hardware configuration is defined in a device tree file.

Linux used to define the hardware configuration in the kernel. But each new ARM chip variant required inconvenient kernel changes, which led to Linus Torvald's epic rant on the situation. The solution was to move this configuration out of kernel code and into files known as the Device Tree, which specifies the hardware associated with the device. This switch, in the 3.8 kernel, is described here.

As if device trees weren't complex enough, the next problem was that BeagleBone users wanted to change pin configuration dynamically. The solution to this was device tree overlays, which allow device tree files to modify other device tree files. With a device tree overlay, the base hardware configuration can be modified by configuration in the overlay file. Then the Capemgr was implemented to dynamically load device tree fragments, to manage the installation of BeagleBoard daughter cards (known as capes).

I won't go into the whole explanation of device trees, but just the parts relevant to our story. For more information see A Tutorial on the Device Tree, Device Tree for Dummies, or Introduction to the BeagleBone Black Device Tree.

The BeagleBone's device tree is defined starting with am335x-boneblack.dts, which includes am33xx.dtsi and am335x-bone-common.dtsi.

The relevant lines are in am33xx.dtsi:

gpio1: gpio@44e07000 {
  compatible = "ti,omap4-gpio";
  ti,hwmods = "gpio1";
  gpio-controller;
  interrupt-controller;
  reg = <0x44e07000 0x1000>;
  interrupts = <96>;
};

The "compatible" line is very important as it causes the correct device driver to be loaded. While processing a device tree file, the kernel checks all the device drivers to find one with a matching "compatible" line. In this case, the winning device driver is gpio-omap.c, which we looked at earlier.

The other lines of interest specify the register base address 44e07000. This is how the kernel knows where to find the necessary registers for the chip. Thus, the Device Tree is the "glue" that lets the device drivers in the kernel know the specific details of the registers on the processor chip.

BoneScript

One easy way to control the BeagleBone pins is by using JavaScript along with BoneScript, a Node.js library. The BoneScript API is based on the Arduino model. For instance, the following code will turn on the GPIO on pin 23 of header P9.

var b = require('bonescript');
b.pinMode('P9_23', b.OUTPUT);
b.digitalWrite('P9_23', b.HIGH);

Using BoneScript is very slow: you can toggle a pin at about 370 Hz, with a lot of jitter and the occasional multi-millisecond gap. But for programs that don't require high speed, BoneScript provides a convenient programming model, especially for applications with web interaction.

For the most part, the BoneScript library works by reading and writing the file system pseudo-devices described earlier. You might expect BoneScript to have some simple code to convert the method calls to the appropriate file system operations, but BoneScript is surprisingly complex. The first problem is BoneScript supports different kernel versions with different cape managers and pin multiplexers, so it implements everything four different ways (source).

A bit surprise is that BoneScript generates and installs new device tree files as a program is running. In particular, for the common 3.8.13 kernel, BoneScript creates a new device tree overlay (e.g. /lib/firmware/bspm_P9_23_2f-00A0.dts) on the fly from a template, runs it through the dtc compiler and installs the device tree overlay through the cape manager by writing to /sys/devices/bone_capemgr.N/slots (source). That's right, when you do a pinMode() operation in the code, BoneScript runs a compiler!

Conclusion

The BeagleBone's GPIO pins can be easily controlled through the file system, but a lot goes on behind the scenes, making it very mysterious what is actually happening. Examining the documentation and the device drivers reveals how these file system writes affect the pins by writing to various control registers.[6] Hopefully after reading this article, the internal operation of the Beaglebone will be less mysterious.

Notes and references

[1] Many of the modules of the Sitara chip have cryptic names. A brief explanation of some of them, along with where they are described in the TRM:

PRU-ICSS (Programmable Real-Time Unit / Industrial Communication Subsystem, chapter 4): this is the two real-time microcontrollers included in the chip. They contain their own modules.
Industrial Ethernet is an extension of Ethernet protocols for industrial environments that require real-time, predictable communication. The Industrial Ethernet module provides timers and I/O lines that can be used to implement these protocols.
SGX (chapter 5) is a 2D/3D graphics accelerator.
GPMC is the general-purpose memory controller. OCMC is the on-chip memory controller, providing access to 64K of on-chip RAM. EMIF is the external memory interface. It is used on the BeagleBone to access the DDR memory. ELM (error location module) is used with flash memory to detect errors. Memory is discussed in chapter 7 of the TRM.
L1, L2, L3, L4: The processor has L1 instruction and data caches, and a larger L2 cache. The L3 interconnect provides high-speed communication between many of the modules of the processor using a "NoC" (Network on Chip) infrastructure. The slower L4 interconnect is used for communication with low-bandwidth modules on the chip. (See chapter 10 of the TRM.)
EDMA (enhanced direct memory access, chapter 11) provides transfers between external memory and internal modules.
TS_ADC (touchscreen, analog/digital converter, chapter 12) is a general-purpose analog to digital converter subsystem that includes support for resistive touchscreens. This module is used on the BeagleBone for the analog inputs.
LCD controller (chapter 13) provides support for LCD screens (up to 2K by 2K pixels). This module provides many signals to the TDA19988 HDMI chip that generates the BeagleBone's HDMI output.
EMAC (Ethernet Media Access Controller, chapter 14) is the Ethernet subsystem. RMII (reduced media independent interface) is the Ethernet interface used by the BeagleBone. MDIO (management data I/O) provides control signals for the Ethernet interface. GMII and RGMII are similar for gigabit Ethernet (unused on the BeagleBone).
The PWMSS (Pulse-width modulation subsystem, chapter 15) contains three modules. The eHRPWM (enhanced high resolution pulse width modulator) generates PWM signals, digital pulse trains with selectable duty cycle. These are useful for LED brightness or motor speed, for instance. These are sometimes called analog outputs, although technically they are not. This subsystem also includes the eCAP (enhanced capture) module which measures the time of incoming pulses. eQEP (Enhanced quadrature encoder pulse) module is a surprisingly complex module to process optical shaft encoder inputs (e.g. an optical encoder disk in a mouse) to determine its rotation.
MMC (multimedia card, chapter 18) provides the SD card interface on the BeagleBone.
UART (Universal Asynhronous Receiver/Transmitter, chapter 19) handles serial communication.
I2C (chapter 21) is a serial bus used for communication with devices that support this protocol.
The McASP (Multichannel audio serial port, chapter 22) provides digital audio input and output.
CAN (controller area network, chapter 23) is a bus used for communication on vehicles.
McSPI (Multichannel serial port interface, chapter 24) is used for serial communication with devices supporting the SPI protocol.

[2] The following C++ code uses the file system to switch GPIO 49 on and off. Remove the usleeps for maximum speed. Note: this code omits pin initialization; you must manually do "echo 49 > /sys/class/gpio/export" and "echo out > /sys/class/gpio/gpio49/direction".

#include <unistd.h>
#include <fcntl.h>

int main() {
  int fd =  open("/sys/class/gpio/gpio49/value", O_WRONLY);
  while (1) {
    write(fd, "1", 1);
    usleep(100000);
    write(fd, "0", 1);
    usleep(100000);
  }
}

[3] Here's a program based on devmem2 to toggle the GPIO pin by accessing the register directly. The usleeps are delays to make the flashing visible; remove them for maximum speed. For simplicity, this program does not set the pin directions; you must do that manually.

#include <fcntl.h>
#include <sys/mman.h>

int main() {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    void *map_base = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
            fd, 0x4804C000);
    while (1) {
        *(unsigned long *)(map_base + 0x194) = 0x00020000;
        usleep(100000);
        *(unsigned long *)(map_base + 0x190) = 0x00020000;
        usleep(100000);
    }
}

For more on this approach, see BeagleBone Black GPIO through /dev/mem.

[4] Selecting one of the eight functions for each pin is done through the pin multiplexer (pinmux) which uses the pinctrl-single device driver. Pin usage is defined in the device tree. The details of this are complex and change from kernel to kernel. See GPIOs on the Beaglebone Black using the Device Tree Overlays or BeagleBone and the 3.8 Kernel for details.

[5] The function names tend to change in different kernel versions. I'm describing the Linux 3.8 kernel code.

[6] I've discussed just the GPIO pins, but other pins (LED, PWM, timer, etc) have similar file system entries, device drivers, and device tree entries.

Restoring Y Combinator's Xerox Alto, day 4: What's running on the system

This post describes our continuing efforts to restore a Xerox Alto. We checked that the low-level microcode tasks are running correctly and the processor is functioning. (The Alto uses an unusual architecture that runs multiple tasks in microcode.) Unfortunately the system still doesn't boot from disk, so the next step will be to get out the logic analyzer and see exactly what's happening. Here's Marc's video of the days's session:

The Alto was a revolutionary computer, designed at Xerox PARC to investigate personal computing, introducing the GUI, Ethernet and laser printers to the world. Y Combinator received an Alto from computer visionary Alan Kay. I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen (from the IBM 1401 restoration team). For background, see my previous restoration articles: day 1, day 2, day 3.

Checking the clocks

We started by checked that all the clock signals were working properly by connecting an oscilloscope to the wirewrap pins on the computer's backplane. This took a lot of careful counting to make sure we connected to the right pins! The system clock signals are generated by an oscillator on the video display card, which isn't where I'd expect to find them. Since the clock signals control the timing of the entire system, nothing will happen if the clock is bad. Thus, checking the clock was an important first step.

At first, the clock signals all looked awful, but after finding a decent ground for the oscilloscope probes, the clock signals looked much better. We verified that the multiple clock outputs were all running nicely. We also tested the reset line to make sure it was being triggered properly - the Alto is reset by pushing a button at the back of the keyboard.

Connecting oscilloscope probes to the Xerox Alto backplane.

Microcode tasks

Next we looked at the running tasks. The Alto has 16 separate tasks running in microcode, doing everything from pushing pixels to the display to refreshing memory to moving disk words. Keep in mind that these are microcode tasks, not operating-system level tasks. The Alto was designed to reduce hardware by performing as many tasks in software as possible to reduce price and increase flexibility. The downside is the CPU can spend the majority of its time doing these tasks rather than "useful" work.

Alto task scheduling is fairly complex. Each task has a priority. When a task is ready to run, its request line is activated (by the associated hardware). The current task can offer to yield by executing the TASK function at convenient points. If there is a higher-priority task ready to run, it preempts the running task. If there's nothing better to run, task 0 runs - this task is what actually runs user code, emulating the Data General Nova instruction set.

The point of this explanation is that microcode instructions need to be running properly for task switching to happen. If the TASK function doesn't get called, the current task will run forever. And if all the task scheduling hardware isn't working right, task switching also won't happen.

Below is a picture of the microcode control board from the Alto. When you're using 1973-era chips, it takes a lot of chips to do anything. This board manages which task is running and the memory address of each task. It uses two special priority encoder chips to determine which waiting task has the highest priority. The board holds the microcode, 1024 micro-instructions of 32 bits each, using eight 1K x 4 bit PROM chips. (PROM, programmable read-only memory, is sort of like non-erasable flash memory.) The board has 8 open sockets allowing an upgrade of 1K of additional microcode to be installed. Note the tiny memory capacity of the time, just 512 bytes of storage per chip.

The microcode control board from a Xerox Alto

Since tasks can be interrupted, the board needs to store the current address of each task. It uses two i3101A RAM chips for this storage. The 3101 is historically interesting: it was the first solid state memory chip, introduced by Intel in 1969. This chip holds 64 bits as 16 words of 4 bits each. Just imagine a time when a memory chip held not gigabits but just 64 total bits.

Looking at the running tasks

The control board has a 4-bit task number available on the backplane, indicating which task is running. We hooked up the oscilloscope so we could see the running tasks. The good news is we saw the appropriate tasks running at the right intervals, with preemption working properly. The following traces show the four task number bits. Most of the time the low-priority task 0 runs (all active-low signals high). Task 12 is running in the middle. Task 8 (memory refresh) runs three times, 38.08 microseconds apart as expected. From the traces, everything seems to be functioning correctly with the task execution.

Trace of the 4-bit microcode task select lines on the Xerox Alto. Top (red) is 8, then 4, 2 and bottom (yellow) is 1 bit. Signals are active-low. Each time interval is 10 microseconds, so this shows a 100 microsecond time interval.

Seeing the running tasks is a big thing, since it shows a whole lot of the system is working properly. As explained earlier, since tasks are running and switching, the microcode processor must be fetching and executing micro-instructions correctly.

Display working better now

You may remember from the previous article that the Alto display was very, very dim and we suspected the CRT was failing. The good news is the display has steadily increased in brightness from its original very dim state, so we probably won't need to replace the CRT. We also managed to see some garbage on the screen along with a cursor, showing that RAM is storing something and the display interface is working.

The display of the Xerox Alto displaying random junk.

Boot still doesn't work

Lots of things are working at this point. The minor :-) remaining problem is the system doesn't boot. Last time, we got the disk drive working: we can put a 14-inch disk cartridge (below) in the drive, the drive spins up, and the heads load. But looking at the backplane signals, we found nothing is getting read from the disk (which explains the boot failure). The oscilloscope showed that the Alto isn't sending any commands to the disk - the Alto isn't even trying to read the disk. We checked for various hardware issues and couldn't find any problems. My suspicion is the boot code in microcode isn't running properly.

Inserting a hard disk into the Diablo drive.

A bit of explanation on the boot process: On reset, microcode task 0 handles the boot. If backspace is pressed on the keyboard, the Alto does a Ethernet boot. Otherwise it does a disk boot by setting up a disk command block in RAM. The microcode disk sector task gets triggered on each sector pulse (which we saw coming from the disk). It checks if there is a command block in RAM, and if so sends the command to the disk. When the read data comes from the disk, the disk word task copies the data into memory. At the end, the block read from disk will be executed, performing the disk boot. So three microcode tasks need to cooperate to boot from disk.

Since we're seeing no command sent to the disk, something must be going wrong between task 0 setting up the command block in RAM and the sector task executing the command block. There's a lot that needs to go right here. If anything is wrong in the ALU or RAM has problems, the command block will get corrupted. And then no disk operation will happen.

Conclusion

The next step is to use a logic analyzer to see exactly what is running, instruction by instruction. By looking at the microcode address lines, we will be able to see what code is executing and where things go wrong. Then we can probe the memory bus to see if RAM is the problem, and look at the ALU to see if it is causing the problem. This is where debugging will get more complex.

I've studied the microcode and it is very bizarre. (You can see the source here.) Instructions are in random order in the PROM, what an instruction does depends on what task is running, branches happen when a device board flips address bits on the bus, and some bits in the PROM are inverted for no good reason (probably to save an inverter chip somewhere). So looking at the microcode can be mind-bending. But hopefully with the logic analyzer we can narrow the problem down. We can also use the Living Computer Museum's simulator to cross-check against what microcode should be running.

For updates on the restoration, follow kenshirriff on Twitter.

Restoring Y Combinator's Xerox Alto, day 3: Inside the disk drive

I'm helping restore a Xerox Alto — a legendary minicomputer from 1973 that helped set the direction for personal computing. This post describes how we cleaned and restored the disk drive and then powered up the system. Spoiler: the drive runs but the system doesn't boot yet.

While creating the Alto, Xerox PARC invented much of the modern personal computer: everything from Ethernet and the laser printer to WYSIWYG editors with high-quality fonts. Getting this revolutionary system running again is a big effort but fortunately I'm working with a strong team: Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen, along with special guest Tim Curley from PARC.

If this article gives you deja vu, you probably saw Marc's restoration video (above) on Hacker News last week or read the earlier restoration updates: introduction, day 1, day 2.

Hard disk technology of the 1970s

For mass storage, the Alto uses a Diablo disk drive, which stores 2.5 megabytes on a removable 14 inch disk cartridge. With 1970s technology, you don't get much storage even on an inconveniently large disk, so Alto users were constantly short of disk space. The photo below shows the Xerox Alto, with the computer chassis (bottom) partially removed. Above the chassis and below the keyboard is the Diablo disk drive, which is the focus of this article.

The Xerox Alto II XM 'personal computer'. The card cage below the disk drive has been partially removed. Four cooling fans are visible at the front of it.

To insert the disk cartridge into the drive, the front of the drive swings down and the cartridge slides into place. The cartridge is an IBM 2315 disk pack, which was used by many manufacturers of the era such as DEC and HP, and contains a single platter inside the hard white protective case. The disk drive has been partially pulled out of the cabinet and the top removed, revealing the internals of the drive. During normal use, the disk drive is inside the cabinet, of course.

Inserting a hard disk into the Diablo drive.

Unlike modern hard disks, the Alto's disk is not sealed; the disk pack opens during use to provide access to the heads. To protect against contamination and provide cooling, filtered air is blown through the disk pack during use. Air enters the disk through a metal panel on the bottom of the disk (as seen below) and exits through the head opening, blowing any dust away from the disk surface.

Hard disk for the Xerox Alto, showing the air intake vent.

Although the heads are widely separated during disk pack insertion, they move very close to the disk surface during operation, floating on a cushion of air about one thousandth of a millimeter above the surface. The diagram below from the manual illustrates the danger of particles on the disk's surface. Any contamination can cause the head to crash into the disk surface, gouging out the oxide layer and destroying the disk and the head.

The Diablo disk and why contaminants are bad, from the Alto disk manual.

The magnified photo below shows the read/write head. The two air bleed holes ensure that the head is flying at the correct height above the disk surface. The long part of the cross contains the read/write coil, while the short part of the cross contains the erase coils (which erase a band between tracks).

Read/write head for the Diablo drive.

The following diagram shows how data is stored on the disk in 203 tracks (actually 203 cylinders, since there are tracks on the top and bottom surfaces). The drive moves the tiny read/write heads to the desired track. Each track is divided into 12 sectors, with 256 words of data in each sector.

Diagram of how the Diablo disk drive's read/write head stores data in tracks on the disk surface. From the Maintenance Manual.

In the photo below, we have removed the top of the disk pack revealing the hard disk inside. Note the vertical metal ring along the inside of the disk; it has twelve narrow slots that physically indicate the twelve sectors of the disk. A double slot is the index mark, indicating the first sector. To make sure the disk surface was clean, we wiped the disk surfaces clean with isopropyl alcohol. This seemed a bit crazy to me, but apparently it's a normal thing to do with disks of that era.

Inside the disk pack used by the Xerox Alto.

The photo below shows the motor spindle that rotates the hard disk at 1500 RPM. In front of the spindle, you can see the sensor that detects the slots that indicate sectors. To the left is the air duct that provides filtered airflow into the disk pack. (The air intake on the bottom of the disk pack was shown in an earlier photo.) Around the edge of the air duct is foam to provide a seal between the duct and the disk cartridge, ensuring airflow through the cartridge.

The motor spindle (center) rotates the hard disk. In front of the spindle is the sensor to detect sectors. To the left is the ventilation air duct for the disk.

After 40 years, the foam had deteriorated into mush and needed to be replaced. The foam no longer provided an airtight seal. Even worse, particles could break off the foam. If a piece of foam got blown onto the disk surface, it would probably trigger a catastrophic disk crash. To replace the foam, we used weatherstripping, which isn't standard but seemed to get the job done.

As well as replacing the foam, we vacuumed any dust out of the drive and carefully cleaned the heads and other drive components.

How the Diablo drive works

The drive itself has fairly limited logic, with most of the functionality inside the Alto. The drive can seek to a particular track, indicate the current sector, and read or write a stream of raw bits. Since there's no buffering in the disk drive, the Alto must supply every bit at the precise time based on the disk's rotation. In the Alto, microcode performs many interfacing tasks that are usually done in hardware. Instead of using DMA, the Alto's microcode moves data words one at a time to the disk interface card in the Alto, which does the serial/parallel conversion.

The Diablo drive opened for servicing.

Modern disk drives use a dense disk controller integrated circuit. The Diablo drive, in contrast, implements its limited functionality with transistors and individual chips (mostly gates and flip flops), so it requires boards of components. The photo above shows the 6 main circuit boards of the Alto, plugged into the "mother board": three on the left side and three on the right side. For ease of maintenance, the electronics assembly pops up as seen above, allowing access to the boards. The leftmost board is the analog circuitry, generating the write signals for the heads and amplifying the signals read back from the disk. You can see a wire running from the board to the read/write heads. The next board detects sector and index marks and controls the motor speed. The third board has a counter to keep track of the current sector number.

The three boards on the right perform seeks, moving the disk head to the desired track. The first board computes the difference between the previous track number and the requested track number. The next board counts tracks as the head moves to determine the distance remaining. The rightmost board controls the servo that moves the head to the right track. The seek servo has a four-speed drive, so the head moves rapidly at first and slows down as it approaches the right track, more sophistication than I expected. The Diablo drive manual has detailed schematics.

The photo below shows some of the colorful resistors and diodes on the analog read/write board, along with some transistors. Modern circuit boards would be much denser, with tightly packed surface mounted components.

Circuitry inside the Diablo 31 drive.

The head positioning mechanism is shown below. The turquoise circles rotate as the drive moves to a new track and the yellow pointer indicates the track number on the dial. The heads themselves are on the arm below (lower center). In front of the heads (bottom of the picture) is the metal bar that opens the disk pack when it is inserted.

Inside the Diablo disk drive. The heads are visible in the center. In front of them is the metal bar that opens the disk pack.

As the disk pack enters the drive, it opens up to provide access to the disk surface. The photo below shows the same mechanism as the previous photo, but from the side and with a disk inserted. You can see the exposed surface of the disk, brownish from the magnetizable iron oxide layer over the aluminum platter. As described earlier, the airflow exits the cartridge here, preventing dust from entering through this opening. The read/write head is visible above the disk's surface, with another head below the disk.

Closeup of the hard disk inside the Diablo drive. The read/write head (metal/yellow) is visible above the disk surface (brown).

The drive largely uses primitive DTL chips—diode transistor logic, an early form of digital logic, as well as some slightly more modern TTL chips. The photo below shows some of the chips on the sector counting board. The chips labeled MC858P provide four NAND gates, so there's not much logic per chip. (7651 is the date code, indicating the chip was manufactured in week 51 of 1976.)

Chips on a control board for the Diablo drive.

Conclusion

After putting the disk drive back together, we carefully powered up the system. The disk drive spun up to high speed, the heads dropped to the surface, and the disk slowed to 1500 RPM as expected. (One surprising complexity of the drive is it runs at a faster speed for a while so the airflow will blow contaminants out of the disk pack before loading the heads; it has counters and logic to implement this.) We verified that the disk surface remained undamaged, so the drive works properly, at least mechanically.

This was the first time we had powered up the Alto circuitry. Happily, nothing emitted smoke. But not surprisingly, the Alto failed to boot from the disk. Unless the Alto can read boot code from the disk (or Ethernet), nothing happens, not even a prompt on the screen. The photo below shows the disk with the ready light illuminated, and the empty screen.

The Xerox Alto's drive powered up, along with monitor (showing a white screen).

We have a long debugging task ahead of us, to trace through the Alto's logic circuits and find out what's going wrong. We're also building a disk emulator using a FPGA, so we will be able to run the Alto with an emulated disk, rather than depending on the Diablo drive to keep running. The restoration is likely to keep us busy for a while, so expect more updates. One item we are missing is the Alignment Cartridge (or C.E. Pack), a disk cartridge with specially-recorded tracks used to align the drive; let us know if you happen to have one lying around!

For updates on the restoration, follow kenshirriff on Twitter. Thanks to Al Kossow and Keith Hayes for assistance with restoration.

Restoring Y Combinator's Xerox Alto, day 2: Repairing the display

This post describes how we repaired the monitor from a Xerox Alto. The Alto was a revolutionary computer, designed in 1973 at Xerox PARC to investigate personal computing. Y Combinator received an Alto from computer visionary Alan Kay, and I'm helping restore this system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen. You may have seen Marc's restoration video (below) on Hacker News; my article describes the monitor repairs in more detail. (See my first article on the restoration for more background.)

The Alto display

One of the innovative features of the Alto was its 606 by 808 pixel bitmapped display, which could display high-quality proportionally spaced fonts as well as graphics. This display led to breakthroughs in computer interaction, such as the first WYSIWYG text editor, which could print formatted text on a laser printer (which Xerox had recently invented). This editor, Bravo, was written for the Alto by Charles Simonyi, who later wrote Word at Microsoft. (I discussed the Bravo editor in more detail last week.)

As you can see from the photos, the Alto's display has a somewhat unusual portrait orientation, allowing it to simulate an 8½x11 sheet of paper. Custom monitor hardware was required to support the portrait orientation, which uses 875 scan lines instead of the standard 525 lines. The Alto's monitor was based on a standard Ball Brothers computer monitor with some component values changed for the higher scan rate. This was easier than turning a standard monitor sideways and rotating everything in software.

The Xerox Alto's monitor with the case removed.

How a CRT monitor works

Since many readers may not be familiar with how CRTs work, I'll give a brief overview. The cathode ray tube (CRT) ruled television and computer displays from the 1930s until a decade or two ago, when flat panel displays finally won out. In a CRT, an electron beam is shot at a phosphor-coated screen, causing a spot on the screen to light up. The beam scans across the screen, left to right and top to bottom in a raster pattern. The beam is turned on and off, generating a pattern of dots on the screen to form the image.

The Xerox Alto's display uses a 875-line raster scan. (For simplicity, I'm ignoring interlacing of the raster scan.)

The cathode ray tube is a large vacuum tube containing multiple components, as shown below. A small electrical heater, similar to a light bulb filament, heats the cathode to about 1000°C. The cathode is an electrode that emits electrons when hot due to physics magic. The anode is positively charged with a high voltage (17,000V). Since electrons are negatively charged, they are attracted to the anode, causing a beam of electrodes to fly down the tube and slam into the screen. The screen is coated with a phosphor, causing it to glow where hit by the electron beam. A control grid surrounds the cathode; putting a negative voltage on the grid repels the electrons, reducing the beam strength and thus the brightness. Two electromagnets are arranged on the neck of the tube to deflect the beam horizontally and vertically; these are the deflection coils.

Diagram of a Cathode Ray Tube (CRT). Based on drawings by Interiot and Theresa Knott (CC BY-SA 3.0).

The components of the tube work together to control the image. The cathode voltage turns the beam on or off, allowing a spot to be displayed or not. The control grid controls the brightness of the display. The horizontal deflection coil scans the beam from left to right across the display, and then rapidly returns it to the left for the next line. The vertical deflection coil more slowly scans the beam from top to bottom, and then rapidly returns the beam to the top for the next image.

Monitors were built with the same CRT technology as televisions, but a television includes a tuner circuit to select the desired channel from the antenna. In addition, televisions have circuitry to extract the horizontal sync, vertical sync and video signals from the combined broadcast signal. These three signals are supplied to the Alto monitor separately, simplifying the circuitry. Color television is more complicated than the Alto's black and white display.

Getting the monitor operational

We started by removing the heavy metal case from the monitor, as seen below. The screen is at the bottom; the neck of the tube is hidden behind the components. The printed circuit board with most of the components is visible at the top. Unlike more modern displays that use integrated circuits, this display's circuitry is built from transistors. On the right is the power supply for the monitor, with a large transformer, capacitor, and fuse. On the left is the vertical drive transformer.

Inside the Alto's monitor. The screen is at the bottom. The power supply transformer is on the right and the vertical deflection transformer is on the left. The circuit board is at top.

We started by checking out the 55 volt power supply that runs the monitor. This power supply is a linear power supply driven from the input AC. The input transformer produces about 68 volts, which is then dropped to 55 volts by a power transistor, controlled by a regulator circuit. (A few years later, more efficient switching power supplies were common in monitors.) It took some time to find the right place to measure the voltage, but eventually we determined that the power supply was working properly, as shown below. At the bottom of the photo below, you can also see the round, reddish connector that provides high voltage to the CRT tube; this is a separate circuit from the 55V power supply. (Grammar note: I consider CRT tube to be clearer, although technically redundant.)

Testing the 55V power supply in the Xerox Alto's monitor.

At the top of the photo above, you can see the power transistor for the vertical deflection circuit. This circuit generates the vertical sweep signal, scanning from top to bottom 60 times a second. This signal is a sawtooth wave fed into the vertical deflection coil, so the beam is slowly deflected from top to bottom and then rapidly returns to the top. The vertical deflection signal is synchronized to the video input by a vertical sync input from the Alto.

The monitor circuit board (below) contains circuitry for vertical deflection, horizontal deflection, 55V power supply regulation, and video amplification. The board isn't designed for easy repair; to access the components, many connectors (lower left) must be disconnected.

The circuit board for the Xerox Alto's monitor.

The horizontal deflection circuit generates the horizontal sweep signal, scanning from left to right about 26,000 times a second. Driving the deflection coils requires a current of about 2 amps, so this circuit must switch a lot of current very rapidly. I've heard that the horizontal circuitry on the Alto has a tendency to overheat. We noticed some darkened spots on the board, but it still works.

The horizontal deflection circuit also supplies 17,000 volts to the CRT tube. This voltage is generated by the flyback transformer. Each horizontal scanline sends a current pulse into the flyback transformer, which is a step-up transformer that produces 17,000 volts on its output. (Interestingly, phone chargers also use flyback transformers, but one that produces 5 volts rather than 17,000 volts.)

The photo below shows the flyback transformer (upper right), the UFO-like structure with a thick high-voltage wire leading to the CRT. The large white cylinder is the bleeder resistor that drains the high voltage away when the monitor is not in use. This unusually large resistor is 500 megaohms and 6 watts, and is connected to the tube by a thick, red high voltage wire. The deflection coils are visible at the left, coils of red wire with white plastic on either side. The deflection coils are mounted on the outside of the tube. At the bottom is the power transistor for the horizontal circuit.

The CRT tube in the Alto's monitor, showing the deflection coils around the tube. The flyback transformer is in the upper right. The bleeder resistor is on the right.

Needless to say, extreme caution is needed when working around the monitor, due to the high voltage that is present. The high voltage circuitry can be tested by holding an oscilloscope probe a couple inches away from the flyback transformer; even at that distance, a strong signal shows up on the oscilloscope. The CRT tube also poses a danger due to the vacuum inside; if the tube breaks, it can violently implode, sending shards of glass flying. Finally, X-rays are generated by using high voltage to accelerate electrons from a cathode to hit a target, just like a CRT operates, so there is a risk of X-ray production. To guard against this, the glass screen of a CRT contains pounds of lead to shield against radiation (which makes CRT disposal an environmental problem). In addition, the monitor's circuitry guards against overvoltages that would produce excessive X-rays. The photo below shows some of the safety warnings from the monitor.

Safety warnings on the Alto monitor. CRTs pose danger from implosion, X-rays, and high voltage.

Since we don't have the Alto computer running yet, we can't use it to generate a video signal. Fortunately we obtained a display test board from Keith Hayes at the Living Computer Museum in Seattle. This board, based on a PIC 24F microcontroller, generates video, horizontal, and vertical signals to produce a simple test bar pattern on the display. We used this board to drive the monitor during testing.

Test board from the Living Computer Museum to drive the Alto's monitor.

We got a bit confused about which signal from the test board was the horizontal sync and which signal was the video. When we switched the signals around, we got a buzzing noise out of the monitor (see the video). Since the horizontal sync signal drives the high voltage power supply, we were in effect turning the power supply on and off 60 times a second with an audible effect. We eventually determined that Keith's test board was wired correctly and undid our modifications.

Even with the test board hooked up properly, the display didn't seem to be operational. But by turning off the room lights, we could see very faint bars on the display. We discovered that the monitor's brightness adjustment was flaky; just touching caused the display to flicker. Removing the variable resistor (below) and cleaning it with alcohol improved the situation somewhat. We tested an electrolytic capacitor in the brightness circuit and found it was bad, but replacing it didn't make much difference.

The brightness control on the Alto monitor. This control was flaky and needed cleaning.

At the end of our session, we had dim bars on the display, showing that the display works but not at the desired brightness. We suspect that the CRT tube is weak due to its age, so we're searching for a replacement tube. Another alternative is rejuvenation – putting high-current pulses through the tube to get the "gunk" off the cathode, extending the tube's lifetime for a while. (If anyone has a working CRT rejuvenator in the Bay Area, let us know.)

The monitor successfully displays the test bars, but very faintly.

Our work on the monitor was greatly aided by the detailed service manual. If you want more information on how the monitor works, the manual has a detailed function description and schematics. An excerpt of the schematic is shown below.

Detail of the schematic diagram for the Alto's monitor, showing the CRT. The schematic is online in the manual.

The disk controller

Our previous episode of Alto restoration ended with the surprising discovery that the Alto had the wrong disk controller card which wouldn't work with our Alto's Diablo disk drive. Fortunately, we were rescued by Al Kossow, who happened to have an extra Alto disk interface card lying around (that's Silicon Valley for you) and gave it to us. Below is a photo of the Alto disk interface card we got from Al. At the left is the edge connector that plugs into the Alto's backplane. The disk cable attaches to the connector on the right.

The Alto II Disk Control card, the interface to the Diablo drive.

Conclusion

Our efforts to get the monitor working were moderately successful. Although the monitor is dim, it functions well enough to proceed with restoring the Alto. We'll see if we can improve the brightness or obtain a new CRT tube. We will probably work on the disk drive next, as the drive is necessary for booting up the Alto. Since the drive is a complex mechanical device with precise tolerances, I expect a challenge.

"Hello world" in the BCPL language on the Xerox Alto simulator

The first programming language for the Xerox Alto was BCPL, the language that led to C. This article shows how to write a BCPL "Hello World" program using Bravo, the first WYSIWYG text editor, and run it on the Alto simulator.

The Xerox Alto is the legendary minicomputer from 1973 that helped set the direction for personal computing. Since I'm helping restore a Xerox Alto (details), I wanted to learn more about BCPL programming. (The influential Mesa and and Smalltalk languages were developed on the Alto, but those are a topic for another post.)

The Xerox Alto II XM computer. Note the video screen is arranged in portrait mode. Next to the keyboard is a mouse. The Diablo disk drive is below the keyboard. The base contains the circuit boards and power supplies.

Using the simulator

Since the Alto I'm restoring isn't running yet, I ran my BCPL program on Salto, an Alto simulator for Linux written by Juergen Buchmueller. To build it, download the simulator source from github.com/brainsqueezer/salto_simulator, install the dependencies listed in the README, and run make. Then run the simulator with the appropriate Alto disk image:

bin/salto disks/tdisk4.dsk.Z

Here's what the simulator looks like when it's running:

The Salto simulator for the Xerox Alto.

(To keep this focused, I'm not going to describe everything you can run on the simulator, but I'll point out that pressing ? at the command line will show the directory contents. Anything ending in .run is a program you can run, e.g. "pinball".)

Type bravo to start the Bravo text editor. Press i (for insert). Enter the BCPL program:

// Hello world demo
get "streams.d"
external
[
Ws
]

let Main() be
[
Ws("Hello World!*N")
]

Here's a screenshot of the Bravo editor with the program entered:

A Xerox Alto 'Hello World' program for written in BCPL, in the Bravo editor.

Press ESC to exit insert mode.
Press p (put) to save the file.
Type hello.bcpl (the file name) and press ESC (not enter!).
Press q then ENTER to quit the editor.

Run the BCPL compiler, the linker, and the executable by entering the following commands at the prompt:

bcpl hello.bcpl
bldr/d/l/v hello
hello

If all goes well, the program will print "Hello World!" Congratulations, you've run a BCPL program.

Output of the Hello World program in BCPL on the Xerox Alto simulator.

The following figure explains the Hello World program. If you know C, the program should be comprehensible.

'Hello World' program in BCPL with explanation.

The BCPL language

The BCPL language is interesting because it was the grandparent of C. BCPL (Basic Combined Programming Language) was developed in 1966. The B language was developed in 1969 as a stripped down version of BCPL by Ken Thompson and Dennis Ritchie. With the introduction of the PDP-11, system software needed multiple datatypes, resulting in the development of the C language around 1972.

Overall, BCPL is like a primitive version of C with weirdly different syntax. The only type that BCPL supports is the 16-bit word, so it doesn't use type declarations. BCPL does support supports C-like structs and unions, including structs that can access bit fields from a word. (This is very useful for the low-level systems programming tasks that BCPL was designed for.) BCPL also has blocks and scoping rules like C, pointers along with lvalues and rvalues, and C-like looping constructs.

A BCPL program looks strange to a C programmer because many of the special characters are different and BCPL often uses words instead of special characters. Here are some differences:

Blocks are defined with [...] rather than {...}.
For array indexing, BCPL uses a!b instead of a[b].
BCPL uses resultis 42 instead of return 42.
Semicolons are optional, kind of like JavaScript.
For pointers, BCPL uses lv and rv (lvalue and rvalue) instead of & and *. rvalues.
The BCPL operator => (known as "heffalump"; I'm not making this up) is used for indirect structure references instead of C's ->.
selecton X into, instead of C's switch, but cases are very similar with fall-throughs and default.
lshift and rshift instead of << and >>.
eq, ne, ls, le, gr, ge in place of ==, !=, <, <=, >, >=.
test / ifso / ifnot instead of if / else.

A BCPL reference manual is here if you want all the details of BCPL.

More about the Bravo editor

The Bravo editor was the first editor with WYSIWYG (what you see is what you get) editing. You could format text on the screen and print the text on a laser printer. Bravo was written by Butler Lampson and Charles Simonyi in 1974. Simonyi later moved to Microsoft, where he wrote Word, based on the ideas in Bravo.

Steve Jobs saw the Alto when he famously toured Xerox Parc in 1979, and it inspired the GUI for the Lisa and Mac. However, Steve Jobs said in a commencement address, "[The Mac] was the first computer with beautiful typography. If I had never dropped in on that [calligraphy] course in college, the Mac would have never had multiple typefaces or proportionally spaced fonts. And since Windows just copied the Mac, it's likely that no personal computer would have them." This is absurd since the Alto had a variety of high-quality proportionally spaced fonts in 1974, before the Apple I was created, let alone the Macintosh.

The image below shows the Hello World program with multiple fonts and centering applied. Since the compiler ignores any formatting, the program runs as before. (Obviously styling is more useful for documents than code.)

The Bravo editor provides WYSIWYG formatting of text.

The manual for Bravo is here but I'll give a quick guide to Bravo if you want to try more editing. Bravo is a mouse-based editor, so use the mouse to select the text for editing. left click and right click under text to select it with an underline. The editor displays the current command at the top of the editing window. If you mistype a command, pressing DEL (delete) will usually get you out of it. Pressing u provides an undo.

To delete selected text, press d. To insert more text, press i, enter the text, then ESC to exit insert mode. To edit an existing file, start Bravo from the command line, press g (for get), then enter the filename and press ESC. To apply formatting, select characters, press l (look), and then enter a formatting code (0-9 to change font, b for bold, i for italics).

Troubleshooting

If your program has an error, compilation will fail with an error message. The messages don't make much sense, so try to avoid typos.

The simulator has a few bugs and tends to start failing after a few minutes with errors in the simulated disk. This will drop you into the Alto debugger, called Swat. At that point, you should restart the simulator. Unfortunately any files you created in the simulator will be lost when you exit the simulator.

If something goes wrong, you'll end up in Swat, the Xerox Alto's debugging system.

Conclusion

The BCPL language (which predates the Alto) had a huge influence on programming languages since it led to C (and thus C++, Java, C#, and so forth). The Bravo editor was the first WYSIWYG text editor and strongly influenced personal computer word processing. Using the Alto simulator, you can try both BCPL and Bravo for yourself by compiling a "Hello World" program, and experience a slice of 1970s computing history.

Inside the tiny RFID chip that runs San Francisco's "Bay to Breakers" race

How does a tiny chip time the runners in the Bay to Breakers race? In this article, I take die photos of the RFID chip used to track athletes during the race.

Bay to Breakers, 2016. Photo courtesy of David Yu, CC BY-NC-ND 2.0.

Bay to Breakers is the iconic San Francisco race, with tens of thousands of runners (many in costume and some in nothing) running 12km across the city. To determine their race time, each runner wears an identification bib. As you can see below, the back of the bib has a small foam rectangle with a metal foil antenna and a tiny chip underneath. The runners are tracked using a technology called RFID (Radio Frequency Identification).

The bib worn by runners in the Bay to Breakers race. At the top, behind the foam is an antenna and RFID chip used to detect the runner at the start and end of the race.

At the beginning and end of the race, the runners cross special mats that contain antennas and broadcast ultra high frequency radio signals. The runner's RFID chip detects this signal and sends back the athlete's ID number, which is programmed into the chip. By tracking these ID numbers, the system determines the time each runner took to run the race. The cool thing about these RFID chips is they are powered by the received radio signal; they don't need a battery.

Mylaps, whose name appears on the foam rectangle, is a company that supplies sports timing systems: the bibs with embedded RFID chips, the detection mats, and portable detection hardware. The detection system is designed to handle large numbers of runners, scanning more than 50 tags per second.

Removing the foam reveals an unusually-shaped metal antenna, the tiny RFID chip (the black dot above the word "DO", and the barely-visible word "Smartrac". Studying the Smartrac website reveals that this chip is the Impinj Monza 4 RFID chip, which operates in the 860-960 MHz frequency range and is recommended for sports timing.

The RFID circuit used to detect runners in the Bay to Breakers. The metal forms an antenna. The tiny black square in the center is the RFID chip.

Getting the chip off the bib was a bit tricky. I softened the bib material in Goof Off, dissolved the aluminum antenna metal with HCl and removed the adhesive with a mysterious BGA adhesive solvent I ordered from Shenzhen.

The chip itself is remarkably tiny, about the size of a grain of salt. The picture below shows the chip on a penny, as seen through a microscope: for scale, a grain of salt is by the R and the chip is on the U (in TRUST). This is regular salt, by the way, not coarse sea salt or kosher salt. I spent a lot of time trying to find the chip when it fell on my desk, since it is practically a speck.

The RFID chip used to identify runners is very small, about the size of one of the letters on a penny. A grain of salt (next to R) and the RFID chip (next to U).

In the picture above, you can see the four round contact points where the chip was connected to the antenna. There's still a blob of epoxy or something around the die, making it hard to see the details. The chip decapsulation gurus use use boiling nitric and sulfuric acids to remove epoxy, but I'm not that hardcore so I heated the chip over a stove flame. This burned off the epoxy much better than I expected, making the die clearly visible as you can see in the next photo.

I took 34 die photos using my metallurgical microscope and stitched them together to get a hi-res photo. (I described the stitching process in detail here). The result is the die photo below (click it for the large image). Surprisingly, there is no identifying name or number on the chip. However, comparing my die photo with the picture in the datasheet confirms that the chip is the Monza 4 RFID chip.

I can identify some of the chip's layout, but the chip is too dense and has too many layers for me to reverse engineer the exact details. Thus, the description that follows is slightly speculative.

Die photo of the Impinj Monza 4 RFID chip.

The four pins in the corners are where the antenna is connected. (The chip has four pins because two antennas can be used for improved detection.)

The left part of the chip is the analog logic, extracting power from the antenna, reading the transmitted signal, and modulating the return signal. The rectangles on the left are probably transistors and capacitors forming a charge pump to extract power from the radio signal (see patent 7,561,866).

The right third of the chip is so-called "random logic" that carries out the commands sent to the chip. According to the datasheet, the chip uses a digital logic finite state machine, so the chip probably doesn't have a full processor.

The 32 orderly stripes in the middle are the non-volatile memory (NVM). Above the stripes, the address decode circuitry is barely visible. The chip has 928 bits of storage (counting up the memory banks on the datasheet) so I suspect the memory is set up as a 32x29 array. Some NVM details are in patent 7307534.

Along the lower and right edges of the chip, red lines are visible; these connect chips together on the wafer for testing during manufacturing (patent 7307528).

The Impinj Monza 4 RFID chip on top of a 8751 microcontroller chip shows that the RFID chip is very small and dense.

To show how small the chip is, and how technology has changed, I put the RFID chip on top of an 8751 microcontroller die. The 8751 microcontroller is a chip in Intel's popular 8051 family dating from 1983. Note that the circuitry on the RFID chip is denser and the chip is much, much smaller. (The photo looks a bit photoshopped, but it genuinely is the RFID chip sitting on the surface of the 8751 die. I don't know why the RFID chip is pink.)

So, if you ran in the Bay to Breakers, that's the chip that tracked your time during the race. (There aren't a lot of other RFID die photos on the web, but a few are at Bunnie Studios, Zeptobars and ExtremeTech if you want to see more.)