Ken Shirriff's blog

Restoring YCombinator's Xerox Alto day 5: Microcode tracing with a logic analyzer

In today's Xerox Alto restoration session we investigated why the system doesn't boot. We found a broken wire, hooked up a logic analyzer, generated a cloud of smoke, and discovered that memory problems are preventing the computer from booting. (In previous episodes, we fixed the power supply, got the CRT display working and cleaned up the disk drive: days 1, 2, 3, and 4.)

The Alto was a revolutionary computer, designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen (from the IBM 1401 restoration team).

The broken wire

The Xerox Alto is built from 13 circuit boards, crammed with TTL chips. In 1973, minicomputers such as the Alto were built from a whole bunch of simple ICs instead of a primitive microprocessor chip. (People still do this as a retro project.) The Alto's CPU is split across 3 boards: an ALU board, a control board, and a control RAM board. The control board is the focus of today's adventures.

If a circuit board has a design defect or needs changes, it can be modified by attaching new wires to create the necessary connections. The photo below shows the control board with several white modification wires. While examining the control board, we noticed one of the wires had come loose. Could the boot failures be simply due to a broken wire?

Control board from the Xerox Alto, showing a broken wire. The white wires were for a modification, but one wire came loose.

We carefully resoldered the wire and powered up the system. The disk drive slowly came up to speed and the heads lowered onto the disk surface. We pressed the reset button (under the keyboard) to boot. As before, nothing happened and the display remained blank. Fixing the wire had no effect.

After investigation, it appears the rework wires were to support the Trident/Tricon hard disk. In the photo above, note the small edge connector in the upper right, with the white wires connected. The Trident disk controller used this connector, but our (Diablo) disk controller does not. In other words, the broken wire might have caused problems with a different disk drive, but it was irrelevant to us.

Microcode on the Xerox Alto

Some background on the Xerox Alto's architecture will help motivate our day's investigation. The Alto, like most modern computers, is implemented using microcode. Computers are programmed in machine instructions, where each instruction may involve several steps. For instance, a "load" instruction may first compute a memory address by adding an offset to an index register. Then the address is sent to memory. Finally the contents of memory are stored into a register. Instead of hardcoding these steps (as done in the 6502 or Z-80 for instance), modern computers run a sequence of "micro-instructions", where each micro-instruction performs one step of the larger machine instructions. This technique, called microcode, is used by the Xerox Alto.

The Alto uses microcode much more heavily than most computers. The Alto not only uses microcode to implement the instruction set, but implements part of the software in microcode directly. Part of the Alto's design philosophy was to use software (i.e. microcode) instead of hardware where possible. For instance, most video displays pull pixels out of memory and display them on the screen. In the Alto, the processor itself fetches pixels out of memory and passes them to the video hardware. Similarly, most disk interfaces transfer data between memory and the disk drive. But in the Alto, the processor moves each data word to/from memory itself. The code to perform these tasks is written in microcode.

To perform all these low-level activities, the Alto hardware manages 16 different tasks, listed below. High-priority tasks (such as handling high-speed data from the disk) can take over from low-priority tasks, such as handling the display cursor. The lowest-level task is the "emulator", the task that executes program instructions. (In a normal computer, the emulator task is the only thing microcode is doing.) Remember, these tasks are not threads or processes handled by the operating system. These are microcode tasks, below the operating system and scheduled directly by the hardware.

Task	Name	Description
0	Emulator	Lowest priority.
1	-	unused
2	-	unused
3	-	unused
4	KSEC	Disk sector task
5	-	unused
6	-	unused
7	ETHER	Ethernet task
8	MRT	Memory refresh task. Wakeup every 38.08 microseconds.
9	DWT	Display word task
10	CURT	Cursor task
11	DHT	Display horizontal task
12	DVT	Display vertical task. Wakeup every 16.666 milliseconds.
13	PART	Parity task. Wakeup generated by parity error.
14	KWD	Disk word task
15	-	unused
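The priority rule can be sketched in a few lines of Python. This is only an illustration, not how the hardware is wired: the function name and the set-of-wakeups representation are hypothetical, and I'm assuming that a higher task number means a higher priority (consistent with the emulator, task 0, being lowest).

```python
# Task numbers and names from the table above; unused slots are omitted.
TASKS = {
    0: "Emulator", 4: "KSEC", 7: "ETHER", 8: "MRT", 9: "DWT",
    10: "CURT", 11: "DHT", 12: "DVT", 13: "PART", 14: "KWD",
}

def next_task(wakeups):
    """Pick the task to run, given the set of task numbers requesting service.

    Assumption: the highest-numbered requesting task wins. The emulator
    (task 0) is always runnable, so it is the fallback.
    """
    pending = set(wakeups) | {0}
    return max(pending)

print(TASKS[next_task({10, 14})])  # KWD: the disk word task beats the cursor task
print(TASKS[next_task(set())])     # Emulator: nothing else wants to run
```

The real machine only switches at points the microcode permits, so this models the choice of task, not the exact timing of the switch.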

Last episode, we found that the processor was running the various tasks, but never tried to access the disk. System boot is started by the emulator task, which stores a disk command in memory. The disk sector task (KSEC) periodically checks if there are any disk commands to perform. Thus, it seemed like something was going wrong in either the emulator task (setting up the disk request), or the disk sector task (performing the disk request). To figure out exactly what was happening, we needed to hook up a logic analyzer.

The logic analyzer

A logic analyzer is a piece of test equipment a bit like an oscilloscope, except instead of measuring voltages, it just measures 0's or 1's. A logic analyzer also has dozens of inputs, allowing many signals to be analyzed at once. By using a logic analyzer, we can log every micro-instruction the processor runs, track each task, and even record every memory access.

Most of the signals of interest are available on the Alto's backplane, which connects all the circuit cards. Since the backplane is wire-wrapped, it consists of pins that conveniently fit the logic analyzer probes. For each signal, you need to find the right card, and then count the pins until you find the right pin to attach the probe. This setup is very tedious, but Marc patiently connected all the probes, while Carl entered the configuration into the logic analyzer.

The backplane of the Xerox Alto, with probes from the logic analyzer attached to trace microcode execution. Note the thick power wires on the left.

Unfortunately, a few important signals (the addresses of the micro-instructions) were not available on the backplane, and we needed to attach probes to one of the PROM chips that hold the microcode. Fortunately, the Living Computer Museum in Seattle gave us an extender card; by plugging the extender card into the backplane and the circuit board into the extender card, the board was accessible and we could connect the probes.

Probes from the logic analyzer hooked up to the Xerox Alto. By plugging the control board into an extension board, probes can be attached to it.

Hours later, with all the probes connected and the configuration programmed into the logic analyzer, we were ready to power up the system and collect data.

Running the logic analyzer

"Smoke! Stop! Shut it off!"

As soon as we flipped the power switch, smoke poured out of the backplane. Had we destroyed this rare computing artifact? What had gone wrong? When something starts smoking, it's usually pretty obvious where the problem is. In our case, one of the ground wires from the logic analyzer pod had melted, turning its insulation into smoke. A bit of discussion followed: "Pin 3 is ground, right?" "No, pin 9 is ground, pin 3 is 5 volts." "Oops." It turns out that when you short +5 and ground, a probe wire is no match for a 60 amp power supply. Fortunately, this wire was the only casualty of the mishap.

This logic probe wire melted when we accidentally connected +5 volts and ground with it.

With this problem fixed, we were able to get a useful trace from the logic analyzer. The trace showed that the Alto started off with the emulator/boot task. After just four instructions, execution switched to the disk word task, which was rapidly interrupted by the parity error task. When that task finished, execution went back to the disk word task, which was interrupted a few instructions later by the display vertical task. The disk word task was able to run a few more instructions before the display horizontal task ran, followed by the cursor task.

The vintage Agilent 1670G logic analyzer that we connected to the Xerox Alto. The screen shows the start of the Alto's boot sequence.

It's rather amazing how much task switching is going on in the Alto, with low-priority tasks only getting a few instructions executed before being interrupted by a higher-priority task. Looking at the trace made me realize how much overhead these tasks have. In our case, the emulator task is running the boot code, so progress towards boot requires looking at hundreds of instructions in the logic analyzer.

The key thing we noticed in the traces is the parity error task ran right near the start, indicating an error in memory. This would explain why the system doesn't boot up. We ran a few more boot cycles through the logic analyzer. The specific order of tasks varied each time, as you'd expect since they are triggered asynchronously from hardware events. But we kept seeing the parity errors.

The Alto's memory system

The Alto was built in the early days of semiconductor memory, when RAM chips were expensive and unreliable. The original Alto module used Intel's 1103 memory chips, which were the first commercially available DRAM chips, holding just 1 kilobit each. To provide 128 kilobytes of memory, the Alto I used 16 boards crammed full of chips. (If you're reading this on a computer with 4 gigabytes of memory, think about how much memory capacity has improved since the 1970s.)

We have the later Alto II XM (extended memory) system, which used more advanced 16 kilobit chips to fit 512 kilobytes of storage onto 4 boards. Each memory board stored a 10 bit chunk—why 10 bits? Because memory chips were unreliable, the Alto used error correction. To store a 32-bit word pair, 6 bits of Hamming error correction were added, along with a parity bit, and one unused bit. The extra bits allow single-bit errors to be corrected and double-bit errors to be detected. The four memory boards in parallel stored 40 bits at a time—the 32 bit word pair and the extra bits for error correction.

A 128KB memory card from the Xerox Alto. The board has eighty 4116 DRAM chips, each with 16 kilobits of storage.
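The numbers above fit together, as a quick Python check shows. All the constants come from the text and the photo caption; the only thing I'm adding is the standard Hamming bound (2^r >= data bits + r + 1) for single-error correction, which I'm assuming is how the 6 check bits were arrived at.

```python
DATA_BITS = 32               # one word pair
CHECK_BITS = 6               # Hamming check bits
PARITY_BITS = 1
UNUSED_BITS = 1
BOARDS = 4
CHIPS_PER_BOARD = 80
BITS_PER_CHIP = 16 * 1024    # 4116 DRAM

def min_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r

stored_bits = DATA_BITS + CHECK_BITS + PARITY_BITS + UNUSED_BITS
bits_per_board = stored_bits // BOARDS

# Raw chip capacity, scaled by the fraction of each 40-bit group that is data.
raw_bits = BOARDS * CHIPS_PER_BOARD * BITS_PER_CHIP
data_bytes = raw_bits * DATA_BITS // stored_bits // 8

print(stored_bits, bits_per_board)      # 40 10
print(min_check_bits(DATA_BITS))        # 6
print(data_bytes // 1024, "KB total")   # 512 KB total
print(data_bytes // BOARDS // 1024, "KB per board")  # 128 KB per board
```

So each "128KB" board actually holds 160 kilobytes of raw storage; the rest goes to the check, parity and unused bits.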

In addition to the 4 memory boards, the Alto has three circuit boards to control memory. The "MEAT" (Memory Extension And Terminator) is a trivial board to support four memory banks (the extended memory in the Alto XM). The "AIM" board (Address Interface Module) is a complex board that maps addresses to memory control signals, as well as handling memory-mapped peripherals such as the keyboard, mouse, and printer. Finally, the "DIM" board (Data Interface Module) generates the Hamming error correcting code signals, and performs error detection and correction.

More probing showed that the DIM board was always signaling a parity error. At this point, we're not sure if some of the memory chips are bad or if the complex circuitry on the DIM board is malfunctioning and reporting errors. As you can tell from the above description, the memory system on the Alto is complex. It may be a challenge to debug the memory and find out why we're getting errors.

A look at the microcode

In this section, I'll give a brief view of what the microcode looks like and how it appears in the logic analyzer. Microcode is generally hard to understand because it is at a very low level in the system, below the instruction set and running on the bare hardware. The Alto's microcode seems especially confusing.

Each Alto micro-instruction specifies an ALU operation and two "functions". A function can be something like "read a register" or "send an address to memory". But a function can also change meaning depending on what task is running. For instance, when the Ethernet task is running, a function might mean "do a four-way branch depending on the Ethernet state". But during the display task, the same function could mean "display these pixels on the screen". As a result, you can't figure out what an instruction does unless you know which task it is a part of.

The image below shows a small part of the logic analyzer output (as printed on Marc's vintage HP line printer). Each line corresponds to one executed micro-instruction. The "address" column shows the address of the micro-instruction in the 1K PROM storage. The task field shows which task is running. You can see the task switch midway through execution; 0 is the emulator and 13 is the parity task. Finally, the 32-bit micro-instruction is broken into fields such as RSEL (register select), ALUF (ALU function) and F1 (function 1).

The start of the logic analyzer trace from booting the Xerox Alto. The trace shows us each micro-instruction that was executed.
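To make the field breakdown concrete, here is a small Python sketch that splits a 32-bit micro-instruction into fields. The text above only names RSEL, ALUF and F1; the full list of fields and their widths below is my recollection of the Alto hardware manual, so treat the layout as an assumption. The example word is made up, not taken from the trace.

```python
# Assumed field layout, most significant bits first: (name, width in bits).
FIELDS = [
    ("RSEL", 5), ("ALUF", 4), ("BS", 3), ("F1", 4),
    ("F2", 4), ("LOADT", 1), ("LOADL", 1), ("NEXT", 10),
]

def decode(word):
    """Split a 32-bit micro-instruction into a dict of named fields."""
    if not 0 <= word < 2 ** 32:
        raise ValueError("micro-instruction must fit in 32 bits")
    fields = {}
    shift = 32
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields

# Hypothetical word: RSEL=3, F1=2, LOADT=1, NEXT=0x155, everything else 0.
example = (3 << 27) | (2 << 16) | (1 << 11) | 0x155
print(decode(example))
```

A decoder like this is what turns the raw logic analyzer bits into the columns of the printout, although knowing that F1 is 2 still doesn't tell you what the function does until you know the task.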

Note that the addresses jump around a lot; this is because the microcode isn't stored linearly in the PROM. Every micro-instruction has a "next instruction address" field in the instruction, so you can think of it as a GOTO inside every instruction. To make it worse, this field can be modified by the interface hardware, turning a GOTO into a computed GOTO. To make this work, the assembler shuffles instructions around in memory, so it's hard to figure out what code goes with a particular address. The point of this is that the logic analyzer output shows us every micro-instruction as it executes, but the output is somewhat difficult and tedious to interpret.

Fortunately we have the source code for the microcode, but understanding it is a challenge. The image below shows a small section of the boot code. I won't attempt to explain the microcode in detail, but want to give you a feel for what it is like. Labels (along the left) and jumps to labels are highlighted in blue. Things such as IR, L, and T are registers, and they get assigned values as indicated by the arrows. MAR is the memory address register (giving an address to memory) and MD is memory data, reading or writing the memory value.

A short section of the Xerox Alto's microcode. Labels and jumps are colored blue. Comments are gray.

Figuring out the control flow of the microcode requires detailed understanding of what is happening in the hardware. For example, in the last line above, ":Q0" indicates a jump to label "Q0". However the previous line says "BUS", which means the contents of the data bus are ORed into the address, turning the jump into a conditional jump to Q0, Q1, Q2, etc. depending on the bus value. And "TASK" indicates that a task switch can happen after the next instruction. So matching up the instructions in the logic analyzer output with instructions in the source code is non-trivial.
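The "GOTO with bits ORed in" idea is easier to see in code. This Python sketch is only a model of the address calculation: the address chosen for Q0 is hypothetical, and which bus bits actually take part depends on the function, which I'm not modeling.

```python
ADDRESS_MASK = 0x3FF   # 10-bit address into the 1K microcode store

def next_address(next_field, ored_bits=0):
    """Address of the following micro-instruction: NEXT with extra bits ORed in."""
    return (next_field | ored_bits) & ADDRESS_MASK

Q0 = 0o540   # hypothetical address of label Q0; its low bits must be zero

print(oct(next_address(Q0)))      # 0o540: plain jump to Q0
print(oct(next_address(Q0, 2)))   # 0o542: bus value 2 lands on "Q2"
```

Because the branch works by ORing, Q0, Q1, Q2 and so on must sit at addresses that differ only in the ORed bits. That placement constraint is why the assembler scatters instructions around the PROM.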

I should mention that the authors of the Alto's microcode were really amazing programmers. An important feature for graphics displays is BITBLT, bit block transfer. The idea is to take an arbitrary rectangle of pixels in memory (such as a character, image, or window) and copy it onto the screen. The tricky part is that the regions may not be byte-aligned, so you may need to extract part of a byte, shift it over, and combine it with part of the destination byte. In addition, BITBLT supports multiple writing modes (copy, XOR, merge) and other features. So BITBLT is a difficult function to implement, even in a high-level language. The incredible part is that the Xerox Alto has BITBLT implemented in hundreds of lines of complex microcode! Using microcode for BITBLT made the operation considerably faster than implementing it in assembly code. (It also meant that BITBLT was used as a single machine language instruction.)
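To give a feel for the shift-and-combine step, here is a Python sketch for a single row of pixels. It is nothing like the Alto's microcode: it treats the whole row as one big integer instead of walking through memory a word at a time, the function name is made up, and I'm assuming "merge" means a bitwise OR.

```python
def blt_row(dest, dest_width, src, src_width, x, mode="copy"):
    """Combine src_width pixels of src into a dest row, x pixels from the left.

    Rows are integers with the leftmost pixel in the most significant bit.
    """
    shift = dest_width - x - src_width   # distance from the right edge
    if x < 0 or shift < 0:
        raise ValueError("source does not fit inside the destination row")
    mask = ((1 << src_width) - 1) << shift
    shifted = (src << shift) & mask      # source moved into position
    if mode == "copy":
        return (dest & ~mask) | shifted  # replace the covered pixels
    if mode == "xor":
        return dest ^ shifted
    if mode == "merge":
        return dest | shifted            # assumption: merge is OR
    raise ValueError("unknown mode: " + mode)

row = 0b1111000011110000
print(f"{blt_row(row, 16, 0b010, 3, 6, 'copy'):016b}")   # 1111000101110000
print(f"{blt_row(row, 16, 0b010, 3, 6, 'merge'):016b}")  # 1111000111110000
```

The real thing has to do this across word boundaries, for every row of the rectangle, with overlapping source and destination handled correctly, which is where the hundreds of lines of microcode go.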

Conclusion

Hooking up the logic analyzer was time consuming, but succeeded in showing us exactly what was happening inside the Alto processor. Although interpreting the logic analyzer output and mapping it to the microcode source is difficult, we were able to follow the execution and determine that the parity task was running. It appears that memory parity errors are preventing the system from booting. The next step will be to understand the memory system in detail to determine where these errors are coming from and how to fix them.

PRU tips: Understanding the BeagleBone's built-in microcontrollers

The BeagleBone Black is an inexpensive, credit-card sized computer that has two built-in microcontrollers called PRUs. While the PRUs provide the real-time processing capability lacking in Linux, using these processors has a learning curve. In this article, I show how to run a simple program on the PRU, and then dive into the libraries and device drivers to show what is happening behind the scenes. Warning: this post uses the 3.8.13-bone79 kernel; many things have changed since then.

The BeagleBone uses the Sitara AM3358, an ARM processor chip running at 1 GHz—this is the thumbnail-sized chip in the center of the board below. If you want to perform real-time operations, the BeagleBone's ARM processor won't work well since Linux isn't a real-time operating system. However, the Sitara chip contains two 32-bit microcontrollers, called Programmable Realtime Units or PRUs. (It's almost fractal, having processors inside the processor.) By using a PRU, you can achieve fast, deterministic, real-time control of I/O pins and devices.

The BeagleBone computer is tiny. For some reason it was designed to fit inside an Altoids mint tin. The square thumbnail-sized chip in the center is the AM3358 Sitara processor chip. This chip contains an ARM processor as well as two 32-bit microcontrollers called PRUs.

The nice thing about using the PRU microcontrollers on the BeagleBone is that you get both real-time Arduino-style control, and "real computer" features such as a web server, WiFi, Ethernet, and multiple languages. The main processor on the BeagleBone can communicate with the PRUs, giving you the best of both worlds. The downside of the PRUs is there's a significant learning curve to use them since they have their own instruction set and run outside the familiar Linux world. Hopefully this article will help with the learning curve.

I wrote an article a couple weeks ago on The BeagleBone's I/O pins. That article discussed the pins controlled by the ARM processor, while this article focuses on the PRU microcontroller. If you think you've seen the present article before, the two articles cover different things, but I won't blame you for getting deja vu!

Running a "blink" program on the PRU

To motivate the discussion, I'll use a simple program that uses the PRU to flash an LED. This example is based on PRU GPIO example.

Blinking an LED using the BeagleBone's PRU microcontroller.

The easiest way to compile and assemble PRU code is to do it on the BeagleBone, since the necessary tools are installed by default (at least if you get the Adafruit BeagleBone). Perform the following steps on the BeagleBone.

Connect an LED to BeagleBone header P8_11 through a 1K resistor.
Download the assembly code file blink.p.
Download the host file to load and run the PRU code, loader.c.
Download the device tree file, /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dts.

Compile and install the device tree file to enable the PRU:

# dtc -O dtb -I dts -o /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dtbo -b 0 -@ PRU-GPIO-EXAMPLE-00A0.dts
# echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots
# cat /sys/devices/bone_capemgr.?/slots

Assemble blink.p and compile the loader:

# pasm -b blink.p
# gcc -o loader loader.c -lprussdrv

Run the loader to execute the PRU binary:
```
# ./loader blink.bin
```

If all goes well, the LED should blink 10 times.[1]

Documentation

The most important document that describes the Sitara chip is the 5041-page Technical Reference Manual (TRM for short). This article references the TRM where appropriate, if you want more information. Information on the PRU is inconveniently split between the TRM and the AM335x PRU-ICSS Reference Guide. For specifics on the AM3358 chip used in the BeagleBone, see the 253-page datasheet. Texas Instruments has the PRU wiki with more information.

If you're looking to use the BeagleBone and/or PRU I highly recommend the detailed and informative book Exploring BeagleBone. Helpful web pages on the PRU include BeagleBone Black PRU: Hello World, Working with the PRU and BeagleBone PRU GPIO example.

The assembly code

If you're familiar with assembly code, the PRU's instruction set should be fairly straightforward. The instruction set is documented on the wiki and in the PRU reference manual (section 5).[2]

The demonstration blink code uses register R1 to count the number of blinks (10) and register R0 to provide the delay between flashes. The delay timing illustrates an important feature of the PRU: it is entirely deterministic. Most instructions take 5 nanoseconds (at 200 MHz), although reads are slower. There is no pipelining, branch delays, memory paging, interrupts, scheduling, or anything else that interferes with instruction execution. This makes PRU execution predictable and suitable for real-time processing. (In contrast, the main ARM processor has all the scheduling and timing issues of Linux, making it unsuitable for real-time processing.)

Since the delay loop is two instructions long and executes 0xa00000 times (an arbitrary value that gave a visible delay), we can compute that each delay is 104.858 milliseconds, regardless of what else the processor is doing. (This is shown in the oscilloscope trace below.) A similar loop on the ARM processor would have variable timing, depending on system load.

LED output from the BeagleBone PRU demo, showing the 104ms oscillations.
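The delay arithmetic is simple enough to check in a few lines of Python. The clock rate, loop length and loop count are the ones given above; the variable names are just for illustration.

```python
CLOCK_HZ = 200_000_000        # PRU clock
INSTRUCTIONS_PER_LOOP = 2     # the delay loop is two instructions long
LOOP_COUNT = 0xA00000         # iterations per delay

cycle_ns = 1_000_000_000 // CLOCK_HZ                        # 5 ns per instruction
delay_ns = LOOP_COUNT * INSTRUCTIONS_PER_LOOP * cycle_ns
delay_ms = delay_ns / 1_000_000

print(f"{delay_ms:.3f} ms")   # 104.858 ms
```

Changing the delay is just a matter of picking a different loop count: each count is worth exactly 10 nanoseconds.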

The I/O pin

How do we know that pin P8_11 is controlled by bit 15 of R30? We're using one of the PRU output pins called "pr1_pru0_pru_r30_15", which is a PRU 0 output pin controlled by R30 bit 15. (Note that the PRU GPIO pins are separate from the "regular" GPIO pins.)[3] The BeagleBone header chart shows that this PRU I/O pin is connected to BeagleBone header pin P8_11.[4] To tell the BeagleBone we want to use this pin with the PRU, the device tree configuration is updated as discussed below.

Communication between the main processor and the PRU

The PRU microcontrollers are part of the processor chip, yet operate independently. In the blink demo, the main processor runs a loader program that downloads the PRU code to the microcontroller, signals the PRU to start executing, and then waits for the PRU to finish. But what happens behind the scenes to make this work? The PRU Linux Application Library (PRUSSDRV) provides the API between code running under Linux on the BeagleBone and the PRU.[5]

The prussdrv_exec_program() API call loads a binary program (in our case, the blink program) into the PRU so it can be executed. To understand how this works requires a bit of background on how memory is set up.

Each PRU microcontroller has 8K (yes, just 8K) of instruction memory, starting at address 0. Each PRU also has 8K of data memory, starting at address 0 (see TRM 4.3).[6] Since the main processor can't access all these memories at address 0, they are mapped into different parts of the main processor's memory as defined by the "global memory map".[7] For instance, PRU0's instruction memory is accessed by the host starting at physical address 0x4a334000. Thus, to load the binary code into the PRU, the API code copies the PRU binary file to that address.[8]
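The address translation is just an offset, as this Python sketch shows. The host address of PRU0's instruction memory and the 8K size are from the text, and the base address and size of the PRU region are the ones that appear in the device tree entry later in this article. The function name is made up for illustration.

```python
PRUSS_BASE = 0x4A300000       # start of the PRU region in the host's address space
PRUSS_SIZE = 0x080000         # size of that region
PRU0_IRAM_HOST = 0x4A334000   # where the host sees PRU0's instruction memory
IRAM_SIZE = 8 * 1024          # 8K of instruction memory

PRU0_IRAM_OFFSET = PRU0_IRAM_HOST - PRUSS_BASE

def pru0_iram_to_host(pru_address):
    """Translate a PRU0 instruction-memory byte address to a host physical address."""
    if not 0 <= pru_address < IRAM_SIZE:
        raise ValueError("address outside PRU0 instruction memory")
    return PRU0_IRAM_HOST + pru_address

print(hex(PRU0_IRAM_OFFSET))                        # 0x34000
print(hex(pru0_iram_to_host(0)))                    # 0x4a334000
print(PRU0_IRAM_OFFSET + IRAM_SIZE <= PRUSS_SIZE)   # True
```

In other words, what the PRU calls address 0 is byte 0x34000 of the region the host maps, which is why the loader can "load a program" with nothing more than a memory copy.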

Then, to start execution, the API code starts the PRU running by setting the enable bit in the PRU's control register, which the host can access at a specific physical address (see TRM 4.5.1.1).

To summarize, the loader program running on the main processor loads the executable file into the PRU by writing it to special memory addresses. It then starts up the PRU by writing to another special memory address that is the PRU's control register.

The Userspace I/O (UIO) driver

At this point, you may wonder how the loader program can access physical memory and low-level chip registers, which is usually not possible from a user-level program. This is accomplished through UIO, the Linux Userspace I/O driver (details). The idea behind a UIO driver is that many devices (the PRU in our case) are controlled through a set of registers. Instead of writing a complex kernel device driver to control these registers, simply expose the registers to user code, and then user-level code can access the registers to control the device. The advantage of UIO is that user-level code is easier to write and debug than kernel code. The disadvantage is there's no control over device usage—buggy or malicious user-level code can easily mess up the device operation.[9]

For the PRU, the uio_pruss driver provides UIO access. This driver exposes the physical memory associated with the PRU as a device /dev/uio0. The user-level PRUSSDRV API library code does a mmap on this device to map the address space into its own memory. This gives the loader code access to the PRU memory through addresses in its own address space. The uio_pruss driver exposes information on the mapping to the user-level code through the directory /sys/class/uio/uio0/maps.
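If you want to see what the driver exposes, the maps directory can be read like any other set of files. The Python sketch below assumes the usual UIO sysfs layout, where each mapN subdirectory holds name, addr and size files containing text; the function names are my own, and what it prints depends on your kernel.

```python
import os

def _read_attr(map_dir, attr):
    """Read one sysfs attribute file and strip the trailing newline."""
    with open(os.path.join(map_dir, attr)) as f:
        return f.read().strip()

def read_uio_maps(uio_dir="/sys/class/uio/uio0"):
    """Return a list of dicts describing each memory region of a UIO device."""
    maps_dir = os.path.join(uio_dir, "maps")
    regions = []
    for entry in sorted(os.listdir(maps_dir)):
        map_dir = os.path.join(maps_dir, entry)
        if not (entry.startswith("map") and os.path.isdir(map_dir)):
            continue
        regions.append({
            "name": _read_attr(map_dir, "name"),
            "addr": int(_read_attr(map_dir, "addr"), 16),   # hex text, e.g. 0x4a300000
            "size": int(_read_attr(map_dir, "size"), 16),
        })
    return regions

# Only try the real device if it exists (i.e. on a BeagleBone with the PRU enabled).
if os.path.isdir("/sys/class/uio/uio0/maps"):
    for region in read_uio_maps():
        print(f"{region['name']}: {region['addr']:#x}, {region['size']:#x} bytes")
```

The library does the equivalent in C to find out what to mmap and how big the mapping should be.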

Another photo of the BeagleBone Black single-board computer inside an Altoids mint tin. At this point in the article, you may need a break from the text.

Device Tree

How does the uio_pruss driver know the address of the PRU? This is configured through the device tree, a complicated set of configuration files used to configure the BeagleBone hardware. I won't go into the whole explanation of device trees, but just the parts relevant to our story.[10] The pruss entry in the device tree is defined in am33xx.dtsi. Important excerpts are:

pruss: pruss@4a300000 {
 compatible = "ti,pruss-v2";
 reg = <0x4a300000 0x080000>;
 status = "disabled";
 interrupts = <20 21 22 23 24 25 26 27>;
};

The address 4a300000 in the device tree corresponds to the PRU's instruction/data/control space in the processor's address space. (See TRM Table 2-4 and section 4.3.2.) This address is provided to the uio_pruss driver so it knows what memory to access.

The "ti,pruss-v2" compatible line causes the matching uio_pruss driver to be loaded. Note that the status is "disabled", so the PRU is disabled by default until enabled in a device tree overlay.

The "interrupts" line specifies the eight ARM interrupts that are handled by the driver; these will be discussed in the Interrupts section below.

Device tree overlay

To enable the PRU and the GPIO output pin, we needed to load the PRU-GPIO-EXAMPLE-00A0.dts device tree overlay file.[11] This file is discussed in detail at credentiality, so I'll just give some highlights.

As described earlier, each header pin has eight different functions, so we need to tell the pin multiplexer driver which of the eight modes to use for the pin P8_11. Looking at the P8 header diagram, pin P8_11 has internal address 0x34 and has the PRU output function in mode 6. This is expressed in the overlay file:

exclusive-use = "P8.11", "pru0";
...
    target = <&am33xx_pinmux>;
...
       pinctrl-single,pins = <
         0x34 0x06

The above entry extends the am33xx_pinmux device tree description and is passed to the pinctrl-single driver, which configures the pins as desired by updating the appropriate chip control registers.
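The second number in the pins entry packs several settings into one value. This Python sketch decodes it; the mode in the low three bits is what the text above relies on, while the meaning of the higher bits is my reading of the pad control register description in the TRM, so treat those as an assumption and check the manual before depending on them.

```python
def decode_pin_config(value):
    """Decode a pin mux value such as the 0x06 in the overlay."""
    return {
        "mode": value & 0x07,                  # which of the eight functions
        "pull_enabled": not (value & 0x08),    # assumed: bit 3 set disables the pull resistor
        "pull_up": bool(value & 0x10),         # assumed: bit 4 picks pull-up over pull-down
        "input_enabled": bool(value & 0x20),   # assumed: bit 5 enables the input receiver
        "slow_slew": bool(value & 0x40),       # assumed: bit 6 selects slow slew rate
    }

print(decode_pin_config(0x06))
```

For 0x06 this gives mode 6 with the input receiver off, which is what you'd expect for a PRU output pin.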

The device tree file also overrides the previous pruss entry, changing the status to "okay", which enables the uio_pruss device driver:

    target = <&pruss>;
    __overlay__ {
      status = "okay";

The cape manager

This device tree overlay was loaded with the command "echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots". What does this actually do?

The idea of the cape manager is if you plug a board (called a cape) onto the BeagleBone, the cape manager reads an EEPROM on the board, figures out what resources the board needs, and automatically loads the appropriate device tree fragments (from /lib/firmware) to configure the board (source, details, documentation). The cape manager is configured in the device tree file am335x-bone-common.dtsi.

Examples of BeagleBone capes. Image from beaglebone.org, CC BY-SA 3.0.

In our case, we aren't installing a new board, but just want to manually install a new device tree file. The cape manager exposes a bunch of files in /sys/devices/bone_capemgr.*. Sending a name to the file "slots" causes the cape manager to load the corresponding device tree file. This enables the PRU and the desired output pin.

Interrupts

Interrupts can be used to communicate between the PRU code and the host code. At the end of the PRU blink code, it sends an interrupt to the host by writing 35 to register R31. This wakes up the loader program, which is waiting with the API call "prussdrv_pru_wait_event(PRU_EVTOUT_0)". This process seems like it should be straightforward, but it's surprisingly complex. In this section, I'll explain how PRU interrupts work.

I'll start with the host (loader program) side. The PRU can send eight different events to the host program. The prussdrv_pru_wait_event(N) call waits for event N (0 through 7) by reading from /dev/uioN. This read will block until an interrupt happens, and then return. (This seems like a strange way to receive interrupts, but using the file system fits the Unix philosophy.)[12]

If you look in the file system, there are 8 uio files: /dev/uio0 through /dev/uio7. You might expect /dev/uio1 provides access to PRU1, but that's not the case. Instead, there are 8 event channels between the PRUs and the main processor, allowing 8 different types of interrupts. Each one of these has a separate /dev/uioN file; reading from the file waits for the corresponding event, labeled PRU_EVTOUT0 through PRU_EVTOUT7.
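The "read a file to wait for an interrupt" idea can be sketched in Python. This is a minimal model of the blocking read only, assuming the standard UIO convention that each read returns a 32-bit interrupt count; the real prussdrv library is written in C and does more bookkeeping than this, and the function name here is made up.

```python
import os
import struct

def wait_for_event(path="/dev/uio0"):
    """Block until the UIO device reports an interrupt; return the interrupt count."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.read(fd, 4)          # blocks until the interrupt arrives
    finally:
        os.close(fd)
    if len(data) != 4:
        raise OSError("short read from " + path)
    (count,) = struct.unpack("=i", data)   # native-endian signed 32-bit value
    return count
```

On the BeagleBone, calling wait_for_event("/dev/uio0") would wait for PRU_EVTOUT0, and "/dev/uio1" for PRU_EVTOUT1, and so on.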

Jumping now to the PRU side, the interrupt is triggered by a write to R31. Register R31 is a special register - writing to it sends an event to the interrupt system.[13] The blink code generates PRU internal event 3 (with the crazy name pr1_pru_mst_intr[3]_intr_req), which is mapped to system event 19.
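Where does the value 35 come from? Here's a Python sketch of the decoding. I'm assuming, from my reading of the PRU documentation, that bit 5 of the value written to R31 is a strobe that triggers the event, that the low four bits select the internal event, and that internal events map onto system events starting at 16.

```python
STROBE_BIT = 0x20         # assumed: bit 5 triggers the event
INDEX_MASK = 0x0F         # assumed: low four bits select the internal event
FIRST_SYSTEM_EVENT = 16   # assumed: internal event 0 is system event 16

def r31_event(value):
    """Return the system event raised by writing value to R31, or None."""
    if not value & STROBE_BIT:
        return None
    return FIRST_SYSTEM_EVENT + (value & INDEX_MASK)

print(r31_event(35))   # 19
```

So 35 is 32 (the strobe) plus 3 (the internal event number), and internal event 3 becomes system event 19.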

The following diagram shows how an event (right) gets mapped to a channel, then a host, and finally generates an interrupt (left). In our case, system event 19 from the PRU is given the label PRU0_ARM_INTERRUPT. This interrupt is mapped to channel 2, then host 2 which goes to ARM interrupt PRU_EVTOUT0, which is ARM interrupt 20 (see TRM 6.3). The device tree configured the uio_pruss driver to handle interrupts 20 through 27, so the driver receives the interrupt and unblocks /dev/uio0, informing the host process of the PRU event.

Interrupt handling on the BeagleBone between the PRU microcontrollers and the ARM processor. System events (right) are mapped to channels, then hosts, finally generating interrupts (left).

How does the above complex mapping get set up? You might guess the device tree, but it's done by the PRUSSDRV user-level library code. The interrupt configuration is defined in PRUSS_INTC_INITDATA, a struct that defines the various mappings shown above. This struct is passed to prussdrv_pruintc_init(), which writes to the appropriate PRU interrupt registers (mapped into memory).[14]
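The chain of mappings for our one event can be modeled in a few lines of Python. This is not the PRUSS_INTC_INITDATA struct itself, just a model of the route described above. The event, channel, host and interrupt numbers for event 19 are from the text; the rule that host N corresponds to PRU_EVTOUT(N-2) is my generalization from that one example, so treat it as an assumption.

```python
SYSEVT_TO_CHANNEL = {19: 2}   # PRU0_ARM_INTERRUPT goes to channel 2
CHANNEL_TO_HOST = {2: 2}      # channel 2 goes to host 2
FIRST_EVTOUT_HOST = 2         # host 2 is PRU_EVTOUT0
FIRST_ARM_INTERRUPT = 20      # PRU_EVTOUT0 is ARM interrupt 20

def route(system_event):
    """Follow a system event through channel and host to the ARM interrupt."""
    channel = SYSEVT_TO_CHANNEL[system_event]
    host = CHANNEL_TO_HOST[channel]
    evtout = host - FIRST_EVTOUT_HOST          # assumed numbering rule
    return {
        "channel": channel,
        "host": host,
        "evtout": evtout,
        "arm_interrupt": FIRST_ARM_INTERRUPT + evtout,
        "device": f"/dev/uio{evtout}",
    }

print(route(19))
```

The result says that system event 19 ends up as ARM interrupt 20 and wakes up whoever is reading /dev/uio0, matching the loader's wait on PRU_EVTOUT_0.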

Next steps

A real application would use a host program much more powerful than the example loader. For instance, the host program could run a web server and actively communicate with the PRU. Or the host program could do computation on data received from the PRU. Such a system uses the ARM processor to get all the advantages of Linux, and the PRU microcontrollers to perform real-time activities impossible in Linux.

It's easier to program the PRU in C using an IDE. That's a topic for another post, but for now you can look here, here, here or here for information on using the C compiler.

Conclusion

The PRU microcontrollers give the BeagleBone real-time, deterministic processing. Combining the BeagleBone's full Linux system with the PRU microcontrollers yields a very powerful system. The Linux side can provide powerful processing, network connectivity, web servers, and so forth, while the PRUs can interface with the external world. If you're using an Arduino but want more (Ethernet, web connectivity, USB, WiFi), the BeagleBone is an alternative to consider.

There is a significant learning curve to using the PRU microcontrollers, however. Much of what happens may seem like magic, hidden behind APIs and device tree files. Hopefully this article has filled in the gaps, explaining what happens behind the scenes.

Notes and references

[1] If you get the error "prussdrv_open() failed" the problem is the PRU didn't get enabled. Make sure you loaded the device tree file into the cape manager.

[2] One of the instructions in the PRU's instruction set is LBBO (load byte burst), which reads a block of memory into the register file. That sounded to me like an obscure instruction I'd never need, but it turns out that this instruction is used for any memory reads. So don't make my mistake of ignoring this instruction!

[3] To understand the pin name "pr1_pru0_pru_r30_15": pru0 indicates the pin is controlled by PRU 0 (rather than PRU 1), r30 indicates the pin is an output pin, controlled by the output register R30 (as opposed to the input register R31), and 15 indicates I/O pin 15, which is controlled by bit 15 of the register. A mnemonic to remember that R30 is the output register and R31 is the input register: 0 looks like O for output, 1 looks like I for input.

[4] Most PRU pins conflict with HDMI or another subsystem, so you need to disable HDMI to use them. To avoid this problem, I picked one of the few non-conflicting PRU pins. Disabling HDMI is described here and here. See this chart for a list of available PRU pins and potential conflicts.

[5] The PRU library is documented in PRU Linux Application Loader API Guide. The library is often installed on the BeagleBone by default but can also be downloaded in the am335x_pru_package along with PRU documentation and some coding examples. Library source code is here.

[6] The PRU, like many microcontrollers, uses a Harvard architecture, with independent memory for code and data.

[7] In the main processor's address space, PRU0's data RAM starts at address 0x4a300000, PRU1's data RAM starts at 0x4a302000, PRU0's instruction RAM starts at 0x4a334000, and PRU1's instruction RAM starts at 0x4a338000. (See TRM Table 2-4 and section 4.3.2.) The PRU control registers also have addresses in the main processor's physical address space; PRU0's control registers start at address 0x4a322000 and PRU1's at 0x4a324000.

[8] The API uses prussdrv_pru_write_memory() to copy the contents into the memory-mapped region for PRU's instruction memory. The prussdrv_load_datafile API call is used to load a file into the PRU's data memory.

[9] Note that using Userspace I/O to control the PRU is the exact opposite philosophy of how the regular GPIO pins on the BeagleBone are controlled. Most of the BeagleBone's I/O pins are controlled through complex device drivers that provide access through simple file system interfaces. This is easy to use: just write 1 to /sys/class/gpio/gpio60/value to turn on GPIO 60. On the other hand, the BeagleBone gives you raw access to the PRU control registers - you can do whatever you want, but you'll need to figure out how to do it yourself. It's a bit strange that two parts of the BeagleBone have such wildly different philosophies.

[10] Device trees are discussed in more detail in my previous BeagleBone article. Also see A Tutorial on the Device Tree, Device Tree for Dummies, or Introduction to the BeagleBone Black Device Tree.

[11] Why is the example device tree overlay (and most device tree files) labeled with version 00A0? I found out (thanks to Robert Nelson) that A0 is a common initial version for contract hardware. So 00A0 is the first version of something, followed by 00A1, etc.

[12] The blink example only sends events from the PRU to the host, but what about the other direction? The host can send an event (e.g. ARM_PRU0_INTERRUPT, 21) to a PRU using prussdrv_pru_send_event(). This writes the PRU's SRSR0 (System Event Status Raw/Set Register, TRM 4.5.3.13) to generate the event. The interrupt diagram shows this event flows to R31 bit 30. The PRUs don't receive interrupts as such, but must poll the top two bits of R31 to see if an interrupt signal has arrived.

[13] The PRU has 64 system events for all the different things that can trigger interrupts, such as timers or I/O (see TRM 4.4.2.2). When generating an event with R31, bit 5 triggers an event, while the lower 4 bits select the event number (see TRM 4.4.1.2). PRU system event numbers are offset by 16 from the internal event numbers, so internal event 3 is system event 19. See PRU interrupts for the full list of events.

[14] Multiple PRU registers are set up to initialize interrupt handling. The CMR (channel map registers, TRM 4.5.3.20) map from each system event to a channel. The HMR (host interrupt map registers, TRM 4.5.3.36) map from a channel to a host interrupt. The PRU interrupt registers (INTC) start at address 0x4a320000 (TRM Table 4-8).

The BeagleBone's I/O pins: inside the software stack that makes them work

The BeagleBone is an inexpensive, credit-card sized computer with many I/O pins. These pins can be easily controlled from software, but it can be very mysterious what is really happening. To control a general purpose input/output (GPIO) pin, you simply write a character to a special file and the pin turns on or off. But how do these files control the hardware pins? In this article, I dig into the device drivers and hardware and explain exactly what happens behind the scenes. Warning: this post uses the 3.8.13-bone79 kernel; many things have changed since then. (Various web pages describe the GPIO pins, but if you just want a practical guide of how to use the GPIO pins, I recommend the detailed and informative book Exploring BeagleBone.)

This article focuses on the BeagleBone Black, the popular new member of the BeagleBoard family. If you're familiar with the Arduino, the BeagleBone is much more complex; while the Arduino is a microcontroller, the BeagleBone is a full computer running Linux. If you need more than an Arduino can easily provide (more processing, Ethernet, WiFi), the BeagleBone may be a good choice.

Beaglebone Black single-board computer. Photo by Gareth Halfacree, CC BY-SA 2.0

The BeagleBone uses the Sitara AM3358 processor chip running at 1 GHz - this is the thumbnail-sized chip in the center of the board above. This chip is surprisingly complicated; it appears they threw every feature someone might want into the chip. The diagram below shows what is inside the chip: it includes a 32-bit ARM processor, 64KB of memory, a real-time clock, USB, Ethernet, an LCD controller, a 3D graphics engine, digital audio, SD card support, various networks, I/O pins, an analog-digital converter, security hardware, a touch screen controller, and much more.[1] To support real-time applications, the Sitara chip also includes two separate 32-bit microcontrollers (on the chip itself - processors within processors!).

Functional diagram of the complex processor powering the BeagleBone Black. The TI AM3358 Sitara processor contains many functional units. Diagram from Texas Instruments.

The main document that describes the Sitara chip is the Technical Reference Manual, which I will call the TRM for short. This is a 5041 page document describing all the features of the Sitara architecture and how to control them. But for specifics on the AM3358 chip used in the BeagleBone, you also need to check the 253 page datasheet. I've gone through these documents, so you won't need to, but I reference the relevant sections in case you want more details.

Using a GPIO pin

The chip's pins can be accessed through two BeagleBone connectors, called P8 and P9. To motivate the discussion, I'll use an LED connected to GPIO 49, which is on pin 23 of header P9 (i.e. P9_23). (How do you know this pin is GPIO 49? I explain that below.) Note that the Sitara chip's outputs can provide much less current than the Arduino, so be careful not to overload the chip; I used a 1K resistor.

To output a signal to this GPIO pin, simply write some strings to some files:

# echo 49 > /sys/class/gpio/export
# echo out > /sys/class/gpio/gpio49/direction
# echo 1 > /sys/class/gpio/gpio49/value

The first line causes a directory for gpio49 to be created. The next line sets the pin mode as output. The third line turns the pin on; writing 0 will turn the pin off.

This may seem like a strange way to control the GPIO pins, using special file system entries. It's also strange that strings ("49", "out", and "1") are used to control the pin. However, the file-based approach fits with the Unix model, is straightforward to use from any language, avoids complex APIs, and hides the complexity of the chip.

The file-system approach is very slow, capable of toggling a GPIO pin at about 160 kHz, which is rather awful performance for a 1 GHz processor.[2] In addition, these oscillations may be interrupted for several milliseconds if the CPU is being used for other tasks, as you can see from the image below. This is because file system operations have a huge amount of overhead and context switches before anything gets done. In addition, Linux is not a real-time operating system. While the processor is doing something else, the CPU won't be toggling the pins.

An I/O pin can be toggled to form an oscillator. Unfortunately, oscillations can stop for several milliseconds if the processor is doing something else.

The moral here is that controlling a pin from Linux is fine if you can handle delays of a few milliseconds. If you want to produce an oscillation, don't manually toggle the pin; use a PWM (pulse width modulator) output instead. If you need more complex real-time outputs, use one of the microcontrollers (called a PRU) inside the processor chip.

What's happening internally with GPIOs?

Next, I'll jump to the low level, explaining what happens inside the chip to control the GPIO pins. A GPIO pin is turned on or off when a chip register at a particular address is updated. To access memory directly, you can use devmem2, a simple program that allows a memory register to be read or written. Using the devmem2 program, we can turn the GPIO pin first on and then off by writing a value to the appropriate register (assuming the pin has been initialized). The first number is the address, and the second number is the value written to the address.

devmem2 0x4804C194 w 0x00020000
devmem2 0x4804C190 w 0x00020000

What are all these mystery numbers? GPIO 49 is also known as GPIO1_17, the 17th pin in the GPIO1 bank (as will be explained later). The value written to the register, 0x00020000, is a 1 bit shifted left 17 positions (1<<17) to control pin 17. The address 4804C194 is the address of the register used to turn on a GPIO pin. Specifically, it is the SETDATAOUT register for the 32 GPIO1 pins. By writing a bit to this register, the corresponding pin is turned on. Similarly, the address 4804C190 is the address of the CLEARDATAOUT register, which turns a pin off.

How do you determine these register addresses? It's all defined in the TRM document if you know where to look. The address 4804C000 is the starting address of GPIO1's registers (see Table 2-3 in the TRM). The offset 194h is the offset of the GPIO_SETDATAOUT register, and 190h is the GPIO_CLEARDATAOUT register (see TRM 25.4.1). Combining these, we get the address 4804C194 for GPIO1's SETDATAOUT and 4804C190 for CLEARDATAOUT. Writing a 1 bit to the SETDATAOUT register will set that GPIO, while leaving others unchanged. Likewise, writing a 1 bit to the CLEARDATAOUT register will clear that GPIO. (Note that this design allows the selected pin to be modified without risk of modifying other GPIO values.)
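
The arithmetic behind those devmem2 commands can be spelled out in a few lines. This is a sketch using only the numbers quoted above; the constant and function names are chosen for this example.

```cpp
#include <cstdint>

// Values taken from the text: GPIO1's register base and the two offsets.
constexpr std::uint32_t GPIO1_BASE        = 0x4804C000;
constexpr std::uint32_t GPIO_CLEARDATAOUT = 0x190;
constexpr std::uint32_t GPIO_SETDATAOUT   = 0x194;

// Address of a register within the GPIO1 bank.
constexpr std::uint32_t gpio1_reg(std::uint32_t offset) {
    return GPIO1_BASE + offset;
}

// Mask with a single 1 bit selecting one of the bank's 32 pins.
constexpr std::uint32_t pin_mask(unsigned pin) {
    return std::uint32_t{1} << pin;
}

static_assert(gpio1_reg(GPIO_SETDATAOUT) == 0x4804C194, "SETDATAOUT");
static_assert(gpio1_reg(GPIO_CLEARDATAOUT) == 0x4804C190, "CLEARDATAOUT");
static_assert(pin_mask(17) == 0x00020000, "GPIO1_17");
```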

Putting this all together, writing the correct bit pattern to the address 0x4804C194 or 0x4804C190 turns the GPIO pin on or off. Even if you use the filesystem API to control the pin, these registers are what end up controlling the pin.

How does devmem2 work? It uses Linux's /dev/mem, which is a device file that is an image of physical memory. devmem2 maps the relevant 4K page of /dev/mem into the process's address space and then reads or writes the address corresponding to the desired physical address.
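
The page mapping boils down to splitting the physical address into a page-aligned base, which is what gets mapped, and an offset within that page, which is added to the mapped pointer. A minimal sketch of that split, with names invented for this example:

```cpp
#include <cstdint>

constexpr std::uint32_t PAGE_SIZE_BYTES = 4096;  // the 4K page mentioned above

// Start of the page containing a physical address (the offset handed to mmap).
constexpr std::uint32_t page_base(std::uint32_t phys_addr) {
    return phys_addr & ~(PAGE_SIZE_BYTES - 1);
}

// Position of the address inside that page (added to the mapped pointer).
constexpr std::uint32_t page_offset(std::uint32_t phys_addr) {
    return phys_addr & (PAGE_SIZE_BYTES - 1);
}

static_assert(page_base(0x4804C194) == 0x4804C000, "page holding GPIO1's registers");
static_assert(page_offset(0x4804C194) == 0x194, "SETDATAOUT within the page");
```

Because GPIO1's registers start on a page boundary, the in-page offset is the same as the register offset from the TRM.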

By using mmap and toggling the register directly from a C++ program, the GPIO can be toggled at about 2.8 MHz, much faster than the device driver approach.[3] If you think this approach will solve your problems, think again. As before, there are jitter and multi-millisecond dropouts if there is any other load on the system.

The names of pins

Confusingly, each pin has multiple names and numbers. For instance, pin 23 on header 9 is GPIO1_17, which is GPIO 49, which is pin entry 17 (unrelated to GPIO1_17) which is pin V14 on the chip itself. Meanwhile, the pin has 7 other functions including GPMC_A1, which is the name used for the pin in the documentation. This section explains the different names and where they come from. (If you just want to know the pin names, see the diagrams P8 header and P9 header from Exploring Beaglebone.)

The first complication is that each pin has eight different functions since the processor has many more internal functions than physical pins.[4] (The Sitara chip has about 639 I/O signals, but only 124 output pins available for these signals.) The solution is that each physical pin has eight different internal functions mapped to it. For each pin, you select a mode 0 through 7, which selects which of the eight possible functions you want.

Figuring out "from scratch" which functions are on each BeagleBone header pin is tricky and involves multiple documents. To start, you need to determine what functions are on each physical pin (actually ball) of the chip. Table 4-1 in the chip datasheet shows which signals are associated with each physical pin (see the "ZCZ ball number"). (Or see section 4.3 for the reverse mapping, from the signal to the pin.) Then look at the BeagleBone schematic to see the connection from each pin on the chip (page 3) to an external connector (page 11).

For instance, consider the GPIO pin used earlier. With the ZCZ chip package, chip ball V14 is called GPMC_A1 (in mode 0 this pin is used for the General Purpose Memory Controller). The Beaglebone documentation names the pin with the more useful mode 7 name "GPIO1_17", indicating GPIO 17 in GPIO bank 1. (Each GPIO bank (0-3) has 32 pins. Bank 0 is numbered 0-31, bank 1 is 32-63, and so forth. So pin 17 in bank 1 is GPIO number 32+17=49.) The schematic shows this chip ball is connected to header P9 pin 23, earning it the name P9_23. This name is relevant when connecting wires to the board. It is also the name used in BoneScript (discussed later).
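
The bank numbering is simple enough to check mechanically. The helpers below are a sketch with made-up names, converting between the GPIOb_p form and the flat GPIO number used under /sys/class/gpio.

```cpp
// Each GPIO bank holds 32 pins, so the flat GPIO number is bank * 32 + pin.
constexpr unsigned gpio_number(unsigned bank, unsigned pin) {
    return bank * 32 + pin;
}

// The reverse direction: recover the bank and the pin within the bank.
constexpr unsigned gpio_bank(unsigned gpio) { return gpio / 32; }
constexpr unsigned gpio_pin(unsigned gpio)  { return gpio % 32; }

static_assert(gpio_number(1, 17) == 49, "GPIO1_17 is GPIO 49");
static_assert(gpio_bank(49) == 1 && gpio_pin(49) == 17, "and back again");
```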

Another pin identifier is the sequential pin number based on the pin control registers. (This number is indicated under the column $PINS in the header diagrams.) Table 9-10 in the TRM lists all the pins in order along with their control address offsets. Inconveniently, the pin names are the mode 0 names (e.g. conf_gpmc_a1), not the more relevant names (e.g. gpio1_17). From this table, we can determine that the offset of conf_gpmc_a1 is 844h. Counting the entries (or computing (844h-800h)/4) shows that this is pin 17 in the register list. (It's entirely coincidental that this number 17 matches GPIO1_17.) This number is necessary when using the pin multiplexer (described later).
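
The (844h-800h)/4 computation generalizes to any entry in the table. A small sketch, with function names invented for this example:

```cpp
// Pin control registers are 4 bytes each; per the text, the first one
// sits at offset 800h.
constexpr unsigned PINCONF_FIRST_OFFSET = 0x800;

// Sequential pin number for a given control register offset.
constexpr unsigned pin_index(unsigned offset) {
    return (offset - PINCONF_FIRST_OFFSET) / 4;
}

// The reverse: control register offset for a given sequential pin number.
constexpr unsigned pin_conf_offset(unsigned index) {
    return PINCONF_FIRST_OFFSET + index * 4;
}

static_assert(pin_index(0x844) == 17, "conf_gpmc_a1 is entry 17");
static_assert(pin_conf_offset(17) == 0x844, "and back again");
```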

How writing to a file toggles a GPIO

When you write a string to /sys/class/gpio/gpio49/value, how does that end up modifying the GPIO? Now that some background has been presented, this can be explained. At the hardware level, toggling a GPIO pin simply involves setting a bit in a control register, as explained earlier. But it takes thousands of instructions in many layers to get from writing to the file to updating the register and updating the GPIO pin.

The write goes through the standard Linux file system code (user library, system call code, virtual file system layer) and ends up in the sysfs virtual file system. Sysfs is an in-memory file system that exposes kernel objects through virtual files. Sysfs will dispatch the write to the gpio device driver, which processes the request and updates the appropriate GPIO control register.

In more detail, the /sys/class/gpio filesystem entries are provided by the gpio device driver (documentation for sysfs and gpio). The main gpio driver code is gpiolib.c. There are separate drivers for many different types of GPIO chips; the Sitara chip (and the related OMAP family) uses gpio-omap.c. The Linux code is in C, but you can think of gpiolib.c as the superclass, with subclasses for each different chip. C function pointers are used to access the different "methods".

The gpiolib.c code informs sysfs of the various files to control GPIO pin attributes (/active_low, /direction, /edge, /value), causing them to appear in the file system for each GPIO pin. Writes to the /value file are linked to the function gpio_value_store, which parses the string value, does error checking and calls gpio_set_value_cansleep, which calls chip->set(). This function pointer is where handling switches from the generic gpiolib.c code to the device-specific gpio-omap.c code. It calls gpio_set, which calls _set_gpio_dataout_reg, which determines the register and bit to set. This calls __raw_writel, which is inline ARM assembler code to do a STR (store) instruction. This is the point at which the GPIO control register actually gets updated, changing the GPIO pin value.[5] The key point is that a lot of code runs to get from the file system operation to the actual register modification.
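
The "superclass with function pointers" pattern can be illustrated with a toy version of that call chain. This is not the kernel's code: the struct and function names are invented, the string parsing is reduced to a single character, and instead of storing to hardware the chip-specific function just reports which register offset and bit mask it would write.

```cpp
#include <cstdint>

// What the chip-specific code decides to write.
struct GpioWrite {
    std::uint32_t reg_offset;  // SETDATAOUT (0x194) or CLEARDATAOUT (0x190)
    std::uint32_t mask;        // single bit selecting the pin
};

// Generic "superclass": only knows that a chip has a set() method.
struct GpioChipSketch {
    GpioWrite (*set)(unsigned pin, bool value);
};

// Chip-specific "subclass" method, standing in for gpio-omap.c.
constexpr GpioWrite omap_set(unsigned pin, bool value) {
    return GpioWrite{value ? 0x194u : 0x190u, std::uint32_t{1} << pin};
}
constexpr GpioChipSketch omap_chip{omap_set};

// Generic layer: interpret the character written to the file, then
// dispatch through the function pointer.
constexpr GpioWrite value_store(const GpioChipSketch& chip, unsigned pin, char c) {
    return chip.set(pin, c != '0');
}

static_assert(value_store(omap_chip, 17, '1').reg_offset == 0x194, "on: SETDATAOUT");
static_assert(value_store(omap_chip, 17, '1').mask == 0x00020000, "bit 17");
```

The generic function never mentions the OMAP registers; swapping in a different chip means supplying a different set() function.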

How does the code know the addresses of the registers to update? The file gpio-omap.h contains constants for the GPIO register offsets. Note that these are the same values we used earlier when manually updating the registers.

#define OMAP4_GPIO_CLEARDATAOUT  0x0190
#define OMAP4_GPIO_SETDATAOUT  0x0194

But how does the code know that the registers start at 0x4804C000? And how does the system know this is the right device driver to use? These things are specified, not in the code, but in a complex set of configuration files known as Device Trees, explained next.

Device Trees

How does the Linux kernel know what features are on the BeagleBone? How does it know what each pin does? How does it know what device drivers to use and where the registers are located? The BeagleBone uses a Linux feature called the Device Tree, where the hardware configuration is defined in a device tree file.

Linux used to define the hardware configuration in the kernel. But each new ARM chip variant required inconvenient kernel changes, which led to Linus Torvalds's epic rant on the situation. The solution was to move this configuration out of kernel code and into files known as the Device Tree, which specifies the hardware associated with the device. This switch, in the 3.8 kernel, is described here.

As if device trees weren't complex enough, the next problem was that BeagleBone users wanted to change pin configuration dynamically. The solution to this was device tree overlays, which allow device tree files to modify other device tree files. With a device tree overlay, the base hardware configuration can be modified by configuration in the overlay file. Then the Capemgr was implemented to dynamically load device tree fragments, to manage the installation of BeagleBoard daughter cards (known as capes).

I won't go into the whole explanation of device trees, but just the parts relevant to our story. For more information see A Tutorial on the Device Tree, Device Tree for Dummies, or Introduction to the BeagleBone Black Device Tree.

The BeagleBone's device tree is defined starting with am335x-boneblack.dts, which includes am33xx.dtsi and am335x-bone-common.dtsi.

The relevant lines are in am33xx.dtsi:

gpio1: gpio@44e07000 {
  compatible = "ti,omap4-gpio";
  ti,hwmods = "gpio1";
  gpio-controller;
  interrupt-controller;
  reg = <0x44e07000 0x1000>;
  interrupts = <96>;
};

The "compatible" line is very important as it causes the correct device driver to be loaded. While processing a device tree file, the kernel checks all the device drivers to find one with a matching "compatible" line. In this case, the winning device driver is gpio-omap.c, which we looked at earlier.

The other lines of interest specify the register base address 44e07000. This is how the kernel knows where to find the necessary registers for the chip. Thus, the Device Tree is the "glue" that lets the device drivers in the kernel know the specific details of the registers on the processor chip.
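
The matching on "compatible" can be pictured as a lookup in a table of drivers. The sketch below is heavily simplified and hypothetical: the real kernel compares against match tables declared inside each driver, and the first table entry here is made up purely for contrast.

```cpp
// Hypothetical, simplified driver table for illustration.
struct DriverEntry {
    const char* compatible;   // string the driver claims to handle
    const char* source_file;  // where that driver lives
};

constexpr DriverEntry drivers[] = {
    {"example,other-gpio", "gpio-other.c"},  // made-up entry
    {"ti,omap4-gpio",      "gpio-omap.c"},
};
constexpr int driver_count = sizeof(drivers) / sizeof(drivers[0]);

// Compile-time string equality.
constexpr bool same(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || same(a + 1, b + 1));
}

// Index of the first driver whose compatible string matches, or -1.
constexpr int find_driver(const char* compatible, int i = 0) {
    return i == driver_count ? -1
         : same(drivers[i].compatible, compatible) ? i
         : find_driver(compatible, i + 1);
}

static_assert(find_driver("ti,omap4-gpio") == 1, "gpio-omap.c is selected");
static_assert(find_driver("no,such-device") == -1, "no driver matches");
```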

BoneScript

One easy way to control the BeagleBone pins is by using JavaScript along with BoneScript, a Node.js library. The BoneScript API is based on the Arduino model. For instance, the following code will turn on the GPIO on pin 23 of header P9.

var b = require('bonescript');
b.pinMode('P9_23', b.OUTPUT);
b.digitalWrite('P9_23', b.HIGH);

Using BoneScript is very slow: you can toggle a pin at about 370 Hz, with a lot of jitter and the occasional multi-millisecond gap. But for programs that don't require high speed, BoneScript provides a convenient programming model, especially for applications with web interaction.

For the most part, the BoneScript library works by reading and writing the file system pseudo-devices described earlier. You might expect BoneScript to have some simple code to convert the method calls to the appropriate file system operations, but BoneScript is surprisingly complex. The first problem is BoneScript supports different kernel versions with different cape managers and pin multiplexers, so it implements everything four different ways (source).

A big surprise is that BoneScript generates and installs new device tree files as a program is running. In particular, for the common 3.8.13 kernel, BoneScript creates a new device tree overlay (e.g. /lib/firmware/bspm_P9_23_2f-00A0.dts) on the fly from a template, runs it through the dtc compiler and installs the device tree overlay through the cape manager by writing to /sys/devices/bone_capemgr.N/slots (source). That's right, when you do a pinMode() operation in the code, BoneScript runs a compiler!

Conclusion

The BeagleBone's GPIO pins can be easily controlled through the file system, but a lot goes on behind the scenes, making it very mysterious what is actually happening. Examining the documentation and the device drivers reveals how these file system writes affect the pins by writing to various control registers.[6] Hopefully after reading this article, the internal operation of the Beaglebone will be less mysterious.

Notes and references

[1] Many of the modules of the Sitara chip have cryptic names. A brief explanation of some of them, along with where they are described in the TRM:

PRU-ICSS (Programmable Real-Time Unit / Industrial Communication Subsystem, chapter 4): this is the two real-time microcontrollers included in the chip. They contain their own modules.
Industrial Ethernet is an extension of Ethernet protocols for industrial environments that require real-time, predictable communication. The Industrial Ethernet module provides timers and I/O lines that can be used to implement these protocols.
SGX (chapter 5) is a 2D/3D graphics accelerator.
GPMC is the general-purpose memory controller. OCMC is the on-chip memory controller, providing access to 64K of on-chip RAM. EMIF is the external memory interface. It is used on the BeagleBone to access the DDR memory. ELM (error location module) is used with flash memory to detect errors. Memory is discussed in chapter 7 of the TRM.
L1, L2, L3, L4: The processor has L1 instruction and data caches, and a larger L2 cache. The L3 interconnect provides high-speed communication between many of the modules of the processor using a "NoC" (Network on Chip) infrastructure. The slower L4 interconnect is used for communication with low-bandwidth modules on the chip. (See chapter 10 of the TRM.)
EDMA (enhanced direct memory access, chapter 11) provides transfers between external memory and internal modules.
TS_ADC (touchscreen, analog/digital converter, chapter 12) is a general-purpose analog to digital converter subsystem that includes support for resistive touchscreens. This module is used on the BeagleBone for the analog inputs.
LCD controller (chapter 13) provides support for LCD screens (up to 2K by 2K pixels). This module provides many signals to the TDA19988 HDMI chip that generates the BeagleBone's HDMI output.
EMAC (Ethernet Media Access Controller, chapter 14) is the Ethernet subsystem. RMII (reduced media independent interface) is the Ethernet interface used by the BeagleBone. MDIO (management data I/O) provides control signals for the Ethernet interface. GMII and RGMII are similar for gigabit Ethernet (unused on the BeagleBone).
The PWMSS (Pulse-width modulation subsystem, chapter 15) contains three modules. The eHRPWM (enhanced high resolution pulse width modulator) generates PWM signals, digital pulse trains with selectable duty cycle. These are useful for LED brightness or motor speed, for instance. These are sometimes called analog outputs, although technically they are not. This subsystem also includes the eCAP (enhanced capture) module, which measures the time of incoming pulses. The eQEP (enhanced quadrature encoder pulse) module is a surprisingly complex module to process optical shaft encoder inputs (e.g. an optical encoder disk in a mouse) to determine its rotation.
MMC (multimedia card, chapter 18) provides the SD card interface on the BeagleBone.
UART (Universal Asynchronous Receiver/Transmitter, chapter 19) handles serial communication.
I2C (chapter 21) is a serial bus used for communication with devices that support this protocol.
The McASP (Multichannel audio serial port, chapter 22) provides digital audio input and output.
CAN (controller area network, chapter 23) is a bus used for communication on vehicles.
McSPI (Multichannel serial port interface, chapter 24) is used for serial communication with devices supporting the SPI protocol.

[2] The following C++ code uses the file system to switch GPIO 49 on and off. Remove the usleeps for maximum speed. Note: this code omits pin initialization; you must manually do "echo 49 > /sys/class/gpio/export" and "echo out > /sys/class/gpio/gpio49/direction".

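#include <fcntl.h>
#include <unistd.h>

int main() {
  // The pin must already be exported and set as an output (see above).
  int fd = open("/sys/class/gpio/gpio49/value", O_WRONLY);
  if (fd < 0) return 1;  // open failed, e.g. the pin was not exported
  while (1) {
    write(fd, "1", 1);  // turn the pin on
    usleep(100000);     // 0.1 second delay to make the flashing visible
    write(fd, "0", 1);  // turn the pin off
    usleep(100000);
  }
}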

[3] Here's a program based on devmem2 to toggle the GPIO pin by accessing the register directly. The usleeps are delays to make the flashing visible; remove them for maximum speed. For simplicity, this program does not set the pin directions; you must do that manually.

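#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) return 1;  // opening /dev/mem normally requires root
    // Map the 4K page holding the GPIO1 registers (base 0x4804C000).
    void *map_base = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
            fd, 0x4804C000);
    if (map_base == MAP_FAILED) return 1;
    // volatile ensures each store actually reaches the hardware register.
    volatile uint32_t *regs = (volatile uint32_t *)map_base;
    while (1) {
        regs[0x194 / 4] = 0x00020000;  // SETDATAOUT: GPIO1_17 on
        usleep(100000);
        regs[0x190 / 4] = 0x00020000;  // CLEARDATAOUT: GPIO1_17 off
        usleep(100000);
    }
}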

For more on this approach, see BeagleBone Black GPIO through /dev/mem.

[4] Selecting one of the eight functions for each pin is done through the pin multiplexer (pinmux) which uses the pinctrl-single device driver. Pin usage is defined in the device tree. The details of this are complex and change from kernel to kernel. See GPIOs on the Beaglebone Black using the Device Tree Overlays or BeagleBone and the 3.8 Kernel for details.

[5] The function names tend to change in different kernel versions. I'm describing the Linux 3.8 kernel code.

[6] I've discussed just the GPIO pins, but other pins (LED, PWM, timer, etc) have similar file system entries, device drivers, and device tree entries.

Restoring Y Combinator's Xerox Alto, day 4: What's running on the system

This post describes our continuing efforts to restore a Xerox Alto. We checked that the low-level microcode tasks are running correctly and the processor is functioning. (The Alto uses an unusual architecture that runs multiple tasks in microcode.) Unfortunately the system still doesn't boot from disk, so the next step will be to get out the logic analyzer and see exactly what's happening. Here's Marc's video of the day's session:

The Alto was a revolutionary computer, designed at Xerox PARC to investigate personal computing, introducing the GUI, Ethernet and laser printers to the world. Y Combinator received an Alto from computer visionary Alan Kay. I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen (from the IBM 1401 restoration team). For background, see my previous restoration articles: day 1, day 2, day 3.

Checking the clocks

We started by checking that all the clock signals were working properly by connecting an oscilloscope to the wirewrap pins on the computer's backplane. This took a lot of careful counting to make sure we connected to the right pins! The system clock signals are generated by an oscillator on the video display card, which isn't where I'd expect to find them. Since the clock signals control the timing of the entire system, nothing will happen if the clock is bad. Thus, checking the clock was an important first step.

At first, the clock signals all looked awful, but after finding a decent ground for the oscilloscope probes, the clock signals looked much better. We verified that the multiple clock outputs were all running nicely. We also tested the reset line to make sure it was being triggered properly - the Alto is reset by pushing a button at the back of the keyboard.

Connecting oscilloscope probes to the Xerox Alto backplane.

Microcode tasks

Next we looked at the running tasks. The Alto has 16 separate tasks running in microcode, doing everything from pushing pixels to the display to refreshing memory to moving disk words. Keep in mind that these are microcode tasks, not operating-system level tasks. The Alto was designed to reduce hardware by performing as many tasks in software as possible to reduce price and increase flexibility. The downside is the CPU can spend the majority of its time doing these tasks rather than "useful" work.

Alto task scheduling is fairly complex. Each task has a priority. When a task is ready to run, its request line is activated (by the associated hardware). The current task can offer to yield by executing the TASK function at convenient points. If there is a higher-priority task ready to run, it preempts the running task. If there's nothing better to run, task 0 runs - this task is what actually runs user code, emulating the Data General Nova instruction set.
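
The decision made when the running task executes TASK can be sketched as picking the highest-priority task whose request line is active. The code below is a toy model, not the Alto's logic: it assumes that a larger task number means a higher priority (with task 0 lowest), and the function name is invented for this example.

```cpp
#include <cstdint>

// Bit n of `requests` is 1 when task n's request line is active.
// Assumption: higher task numbers have higher priority; task 0 is the
// fallback that runs when nothing else is requesting.
constexpr unsigned highest_requesting_task(std::uint16_t requests, unsigned n = 15) {
    return n == 0 ? 0
         : ((requests >> n) & 1u) ? n
         : highest_requesting_task(requests, n - 1);
}

static_assert(highest_requesting_task(0) == 0, "nothing waiting: task 0 runs");
static_assert(highest_requesting_task((1u << 8) | (1u << 12)) == 12,
              "task 12 wins over task 8");
```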

The point of this explanation is that microcode instructions need to be running properly for task switching to happen. If the TASK function doesn't get called, the current task will run forever. And if all the task scheduling hardware isn't working right, task switching also won't happen.

Below is a picture of the microcode control board from the Alto. When you're using 1973-era chips, it takes a lot of chips to do anything. This board manages which task is running and the memory address of each task. It uses two special priority encoder chips to determine which waiting task has the highest priority. The board holds the microcode, 1024 micro-instructions of 32 bits each, using eight 1K x 4 bit PROM chips. (PROM, programmable read-only memory, is sort of like non-erasable flash memory.) The board has 8 open sockets allowing an upgrade of 1K of additional microcode to be installed. Note the tiny memory capacity of the time, just 512 bytes of storage per chip.

The microcode control board from a Xerox Alto
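The PROM numbers are easy to check with a bit of arithmetic. The sketch below assumes the eight chips sit side by side, each supplying 4 bits of every micro-instruction, which is what the 1024 x 32 organization implies. The variable names are just for illustration.

```python
CHIPS = 8
WORDS_PER_CHIP = 1024      # "1K"
BITS_PER_CHIP_WORD = 4

# Each chip contributes 4 bits to every micro-instruction.
instruction_width = CHIPS * BITS_PER_CHIP_WORD
bytes_per_chip = WORDS_PER_CHIP * BITS_PER_CHIP_WORD // 8
total_bytes = CHIPS * bytes_per_chip

print(instruction_width)  # 32
print(bytes_per_chip)     # 512
print(total_bytes)        # 4096
```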

Since tasks can be interrupted, the board needs to store the current address of each task. It uses two i3101A RAM chips for this storage. The 3101 is historically interesting: it was the first solid state memory chip, introduced by Intel in 1969. This chip holds 64 bits as 16 words of 4 bits each. Just imagine a time when a memory chip held not gigabits but just 64 total bits.

Looking at the running tasks

The control board has a 4-bit task number available on the backplane, indicating which task is running. We hooked up the oscilloscope so we could see the running tasks. The good news is we saw the appropriate tasks running at the right intervals, with preemption working properly. The following traces show the four task number bits. Most of the time the low-priority task 0 runs (all active-low signals high). Task 12 is running in the middle. Task 8 (memory refresh) runs three times, 38.08 microseconds apart as expected. From the traces, everything seems to be functioning correctly with the task execution.

Trace of the 4-bit microcode task select lines on the Xerox Alto. Top (red) is 8, then 4, 2 and bottom (yellow) is 1 bit. Signals are active-low. Each time interval is 10 microseconds, so this shows a 100 microsecond time interval.
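Reading task numbers off active-low traces takes a little mental inversion. As a rough illustration, this Python sketch converts the four logic levels (1 for a high trace, 0 for a low trace) into a task number. The function name is my own, not anything from the Alto documentation.

```python
def decode_task(bit8, bit4, bit2, bit1):
    """Convert four active-low logic levels (1 = high, 0 = low) to a task number."""
    task = 0
    for weight, level in ((8, bit8), (4, bit4), (2, bit2), (1, bit1)):
        if level == 0:  # active-low: a low level means the bit is set
            task += weight
    return task


print(decode_task(1, 1, 1, 1))  # 0: all lines high
print(decode_task(0, 1, 1, 1))  # 8: only the "8" line is low
print(decode_task(0, 0, 1, 1))  # 12: the "8" and "4" lines are low
```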

Seeing the running tasks is a big thing, since it shows a whole lot of the system is working properly. As explained earlier, since tasks are running and switching, the microcode processor must be fetching and executing micro-instructions correctly.

Display working better now

You may remember from the previous article that the Alto display was very, very dim and we suspected the CRT was failing. The good news is the display has steadily increased in brightness from its original very dim state, so we probably won't need to replace the CRT. We also managed to see some garbage on the screen along with a cursor, showing that RAM is storing something and the display interface is working.

The display of the Xerox Alto displaying random junk.

Boot still doesn't work

Lots of things are working at this point. The minor :-) remaining problem is the system doesn't boot. Last time, we got the disk drive working: we can put a 14-inch disk cartridge (below) in the drive, the drive spins up, and the heads load. But looking at the backplane signals, we found nothing is getting read from the disk (which explains the boot failure). The oscilloscope showed that the Alto isn't sending any commands to the disk - the Alto isn't even trying to read the disk. We checked for various hardware issues and couldn't find any problems. My suspicion is the boot code in microcode isn't running properly.

Inserting a hard disk into the Diablo drive.

A bit of explanation on the boot process: On reset, microcode task 0 handles the boot. If backspace is pressed on the keyboard, the Alto does an Ethernet boot. Otherwise it does a disk boot by setting up a disk command block in RAM. The microcode disk sector task gets triggered on each sector pulse (which we saw coming from the disk). It checks if there is a command block in RAM, and if so sends the command to the disk. When the read data comes from the disk, the disk word task copies the data into memory. At the end, the block read from disk will be executed, performing the disk boot. So three microcode tasks need to cooperate to boot from disk.

Since we're seeing no command sent to the disk, something must be going wrong between task 0 setting up the command block in RAM and the sector task executing the command block. There's a lot that needs to go right here. If anything is wrong in the ALU or RAM has problems, the command block will get corrupted. And then no disk operation will happen.
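The handoff between the three tasks can be sketched as a toy Python model. Everything here is hypothetical: the RAM is a dictionary, the addresses and the "READ" placeholder are invented, and the real command block layout isn't modeled. The point is only to show why a command block that never makes it into RAM intact means the disk never sees a command.

```python
COMMAND_POINTER = 0  # hypothetical RAM location holding the command block address


def task0_boot(ram, block_address):
    """Task 0: set up a disk command block for the boot read."""
    ram[block_address] = "READ"  # placeholder for the real command words
    ram[COMMAND_POINTER] = block_address


def sector_task(ram, sent_commands):
    """Runs on every sector pulse: send a command only if a block is present."""
    block_address = ram.get(COMMAND_POINTER, 0)
    if block_address == 0:
        return  # no command block, so the disk stays idle
    sent_commands.append(ram.get(block_address))


def word_task(ram, buffer_address, disk_words):
    """Copy each word arriving from the disk into memory."""
    for offset, word in enumerate(disk_words):
        ram[buffer_address + offset] = word


ram = {}
sent = []
sector_task(ram, sent)                  # before task 0 runs: nothing is sent
task0_boot(ram, block_address=0o1000)
sector_task(ram, sent)                  # now the command goes out
word_task(ram, 0o2000, [0o123, 0o456])  # data from the disk lands in memory
print(sent)  # ['READ']
```

If the pointer reads back as zero (bad RAM, or an ALU fault while computing it), `sector_task` returns without sending anything, which matches the silence we saw on the backplane.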

Conclusion

The next step is to use a logic analyzer to see exactly what is running, instruction by instruction. By looking at the microcode address lines, we will be able to see what code is executing and where things go wrong. Then we can probe the memory bus to see if RAM is the problem, and look at the ALU to see if it is causing the problem. This is where debugging will get more complex.

I've studied the microcode and it is very bizarre. (You can see the source here.) Instructions are in random order in the PROM, what an instruction does depends on what task is running, branches happen when a device board flips address bits on the bus, and some bits in the PROM are inverted for no good reason (probably to save an inverter chip somewhere). So looking at the microcode can be mind-bending. But hopefully with the logic analyzer we can narrow the problem down. We can also use the Living Computer Museum's simulator to cross-check against what microcode should be running.

For updates on the restoration, follow kenshirriff on Twitter.

Restoring Y Combinator's Xerox Alto, day 3: Inside the disk drive

I'm helping restore a Xerox Alto — a legendary minicomputer from 1973 that helped set the direction for personal computing. This post describes how we cleaned and restored the disk drive and then powered up the system. Spoiler: the drive runs but the system doesn't boot yet.

While creating the Alto, Xerox PARC invented much of the modern personal computer: everything from Ethernet and the laser printer to WYSIWYG editors with high-quality fonts. Getting this revolutionary system running again is a big effort but fortunately I'm working with a strong team: Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen, along with special guest Tim Curley from PARC.

If this article gives you deja vu, you probably saw Marc's restoration video (above) on Hacker News last week or read the earlier restoration updates: introduction, day 1, day 2.

Hard disk technology of the 1970s

For mass storage, the Alto uses a Diablo disk drive, which stores 2.5 megabytes on a removable 14 inch disk cartridge. With 1970s technology, you don't get much storage even on an inconveniently large disk, so Alto users were constantly short of disk space. The photo below shows the Xerox Alto, with the computer chassis (bottom) partially removed. Above the chassis and below the keyboard is the Diablo disk drive, which is the focus of this article.

The Xerox Alto II XM 'personal computer'. The card cage below the disk drive has been partially removed. Four cooling fans are visible at the front of it.

To insert the disk cartridge into the drive, the front of the drive swings down and the cartridge slides into place. The cartridge is an IBM 2315 disk pack, which was used by many manufacturers of the era such as DEC and HP, and contains a single platter inside the hard white protective case. The disk drive has been partially pulled out of the cabinet and the top removed, revealing the internals of the drive. During normal use, the disk drive is inside the cabinet, of course.

Inserting a hard disk into the Diablo drive.

Unlike modern hard disks, the Alto's disk is not sealed; the disk pack opens during use to provide access to the heads. To protect against contamination and provide cooling, filtered air is blown through the disk pack during use. Air enters the disk through a metal panel on the bottom of the disk (as seen below) and exits through the head opening, blowing any dust away from the disk surface.

Hard disk for the Xerox Alto, showing the air intake vent.

Although the heads are widely separated during disk pack insertion, they move very close to the disk surface during operation, floating on a cushion of air about one thousandth of a millimeter above the surface. The diagram below from the manual illustrates the danger of particles on the disk's surface. Any contamination can cause the head to crash into the disk surface, gouging out the oxide layer and destroying the disk and the head.

The Diablo disk and why contaminants are bad, from the Alto disk manual.

The magnified photo below shows the read/write head. The two air bleed holes ensure that the head is flying at the correct height above the disk surface. The long part of the cross contains the read/write coil, while the short part of the cross contains the erase coils (which erase a band between tracks).

Read/write head for the Diablo drive.

The following diagram shows how data is stored on the disk in 203 tracks (actually 203 cylinders, since there are tracks on the top and bottom surfaces). The drive moves the tiny read/write heads to the desired track. Each track is divided into 12 sectors, with 256 words of data in each sector.

Diagram of how the Diablo disk drive's read/write head stores data in tracks on the disk surface. From the Maintenance Manual.
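These numbers account for the drive's capacity. As a quick check in Python, assuming 16-bit words (2 bytes each) and counting only the data words in each sector:

```python
CYLINDERS = 203
SURFACES = 2            # tracks on the top and bottom of the platter
SECTORS_PER_TRACK = 12
WORDS_PER_SECTOR = 256
BYTES_PER_WORD = 2      # assumes 16-bit words

total_words = CYLINDERS * SURFACES * SECTORS_PER_TRACK * WORDS_PER_SECTOR
total_bytes = total_words * BYTES_PER_WORD

print(total_words)  # 1247232
print(total_bytes)  # 2494464
```

That comes to just under 2.5 million bytes, in line with the 2.5 megabytes quoted for the drive.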

In the photo below, we have removed the top of the disk pack revealing the hard disk inside. Note the vertical metal ring along the inside of the disk; it has twelve narrow slots that physically indicate the twelve sectors of the disk. A double slot is the index mark, indicating the first sector. To make sure the disk surfaces were clean, we wiped them with isopropyl alcohol. This seemed a bit crazy to me, but apparently it's a normal thing to do with disks of that era.

Inside the disk pack used by the Xerox Alto.

The photo below shows the motor spindle that rotates the hard disk at 1500 RPM. In front of the spindle, you can see the sensor that detects the slots that indicate sectors. To the left is the air duct that provides filtered airflow into the disk pack. (The air intake on the bottom of the disk pack was shown in an earlier photo.) Around the edge of the air duct is foam to provide a seal between the duct and the disk cartridge, ensuring airflow through the cartridge.

The motor spindle (center) rotates the hard disk. In front of the spindle is the sensor to detect sectors. To the left is the ventilation air duct for the disk.
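The rotation speed and the twelve sector slots determine how often the sensor sees a sector go by. A quick Python calculation, using only the 1500 RPM and 12-sector figures from above:

```python
RPM = 1500
SECTORS = 12

ms_per_revolution = 60_000 / RPM              # milliseconds in a minute / RPM
ms_per_sector = ms_per_revolution / SECTORS

print(ms_per_revolution)        # 40.0
print(round(ms_per_sector, 2))  # 3.33
```

So a sector pulse arrives roughly every 3.3 milliseconds, and the whole disk comes around 25 times per second.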

After 40 years, the foam had deteriorated into mush and needed to be replaced. The foam no longer provided an airtight seal. Even worse, particles could break off the foam. If a piece of foam got blown onto the disk surface, it would probably trigger a catastrophic disk crash. To replace the foam, we used weatherstripping, which isn't standard but seemed to get the job done.

As well as replacing the foam, we vacuumed any dust out of the drive and carefully cleaned the heads and other drive components.

How the Diablo drive works

The drive itself has fairly limited logic, with most of the functionality inside the Alto. The drive can seek to a particular track, indicate the current sector, and read or write a stream of raw bits. Since there's no buffering in the disk drive, the Alto must supply every bit at the precise time based on the disk's rotation. In the Alto, microcode performs many interfacing tasks that are usually done in hardware. Instead of using DMA, the Alto's microcode moves data words one at a time to the disk interface card in the Alto, which does the serial/parallel conversion.
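To illustrate what serial/parallel conversion means here, the Python sketch below turns a word into a stream of bits and back again. This is only a model of the idea: the function names are mine, and the 16-bit width and most-significant-bit-first order are assumptions, not something I've verified against the disk interface schematics.

```python
def word_to_bits(word, width=16):
    """Parallel to serial: list the bits of a word, most significant bit first."""
    return [(word >> shift) & 1 for shift in range(width - 1, -1, -1)]


def bits_to_word(bits):
    """Serial to parallel: shift bits in one at a time to rebuild the word."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


print(word_to_bits(0b1010, width=4))  # [1, 0, 1, 0]
print(bits_to_word([1, 0, 1, 0]))     # 10
```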

The Diablo drive opened for servicing.

Modern disk drives use a dense disk controller integrated circuit. The Diablo drive, in contrast, implements its limited functionality with transistors and individual chips (mostly gates and flip flops), so it requires boards of components. The photo above shows the 6 main circuit boards of the drive, plugged into the "mother board": three on the left side and three on the right side. For ease of maintenance, the electronics assembly pops up as seen above, allowing access to the boards. The leftmost board is the analog circuitry, generating the write signals for the heads and amplifying the signals read back from the disk. You can see a wire running from the board to the read/write heads. The next board detects sector and index marks and controls the motor speed. The third board has a counter to keep track of the current sector number.

The three boards on the right perform seeks, moving the disk head to the desired track. The first board computes the difference between the previous track number and the requested track number. The next board counts tracks as the head moves to determine the distance remaining. The rightmost board controls the servo that moves the head to the right track. The seek servo has a four-speed drive, so the head moves rapidly at first and slows down as it approaches the right track, more sophistication than I expected. The Diablo drive manual has detailed schematics.
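Here's a rough Python sketch of how the three seek boards divide up the work: compute the difference, count down the tracks remaining, and pick a speed based on the distance left. The thresholds and speed labels are invented for illustration; the real values are in the drive schematics, and the real servo is analog rather than a loop.

```python
# Hypothetical thresholds: (minimum tracks remaining, speed label), fastest first.
SPEED_STEPS = [(32, "fastest"), (8, "fast"), (2, "slow"), (0, "slowest")]


def speed_for(remaining):
    """Pick one of four speeds from the number of tracks still to cross."""
    for threshold, speed in SPEED_STEPS:
        if remaining >= threshold:
            return speed


def seek(current_track, requested_track):
    """Return the final track and the speed used while crossing each track."""
    remaining = abs(requested_track - current_track)  # first board: difference
    step = 1 if requested_track > current_track else -1
    speeds = []
    while remaining > 0:
        speeds.append(speed_for(remaining))           # third board: servo speed
        current_track += step
        remaining -= 1                                # second board: count tracks
    return current_track, speeds


final_track, speeds = seek(0, 40)
print(final_track)  # 40
print(speeds[0])    # fastest
print(speeds[-1])   # slowest
```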

The photo below shows some of the colorful resistors and diodes on the analog read/write board, along with some transistors. Modern circuit boards would be much denser, with tightly packed surface mounted components.

Circuitry inside the Diablo 31 drive.

The head positioning mechanism is shown below. The turquoise circles rotate as the drive moves to a new track and the yellow pointer indicates the track number on the dial. The heads themselves are on the arm below (lower center). In front of the heads (bottom of the picture) is the metal bar that opens the disk pack when it is inserted.

Inside the Diablo disk drive. The heads are visible in the center. In front of them is the metal bar that opens the disk pack.

As the disk pack enters the drive, it opens up to provide access to the disk surface. The photo below shows the same mechanism as the previous photo, but from the side and with a disk inserted. You can see the exposed surface of the disk, brownish from the magnetizable iron oxide layer over the aluminum platter. As described earlier, the airflow exits the cartridge here, preventing dust from entering through this opening. The read/write head is visible above the disk's surface, with another head below the disk.

Closeup of the hard disk inside the Diablo drive. The read/write head (metal/yellow) is visible above the disk surface (brown).

The drive largely uses primitive DTL chips (diode-transistor logic, an early form of digital logic), as well as some slightly more modern TTL chips. The photo below shows some of the chips on the sector counting board. The chips labeled MC858P provide four NAND gates, so there's not much logic per chip. (7651 is the date code, indicating the chip was manufactured in week 51 of 1976.)

Chips on a control board for the Diablo drive.
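If you're reading date codes off a lot of chips, the decoding is simple enough to write down. This small Python helper assumes a four-digit year-then-week code from the 1900s, like the 7651 above; the function name is just for illustration, and other manufacturers' formats may differ.

```python
def decode_date_code(code):
    """Split a four-digit YYWW date code into (year, week), assuming the 1900s."""
    year = 1900 + int(code[:2])
    week = int(code[2:])
    return year, week


print(decode_date_code("7651"))  # (1976, 51)
```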

Conclusion

After putting the disk drive back together, we carefully powered up the system. The disk drive spun up to high speed, the heads dropped to the surface, and the disk slowed to 1500 RPM as expected. (One surprising complexity of the drive is it runs at a faster speed for a while so the airflow will blow contaminants out of the disk pack before loading the heads; it has counters and logic to implement this.) We verified that the disk surface remained undamaged, so the drive works properly, at least mechanically.

This was the first time we had powered up the Alto circuitry. Happily, nothing emitted smoke. But not surprisingly, the Alto failed to boot from the disk. Unless the Alto can read boot code from the disk (or Ethernet), nothing happens, not even a prompt on the screen. The photo below shows the disk with the ready light illuminated, and the empty screen.

The Xerox Alto's drive powered up, along with monitor (showing a white screen).

We have a long debugging task ahead of us, to trace through the Alto's logic circuits and find out what's going wrong. We're also building a disk emulator using an FPGA, so we will be able to run the Alto with an emulated disk, rather than depending on the Diablo drive to keep running. The restoration is likely to keep us busy for a while, so expect more updates. One item we are missing is the Alignment Cartridge (or C.E. Pack), a disk cartridge with specially-recorded tracks used to align the drive; let us know if you happen to have one lying around!

For updates on the restoration, follow kenshirriff on Twitter. Thanks to Al Kossow and Keith Hayes for assistance with restoration.