Ken Shirriff's blog

Restoring YCombinator's Xerox Alto day 6: Fixed a chip, data read from disk

In today's Xerox Alto restoration session we investigated why the disk drive isn't working and found a failed chip. With this chip repaired, we were able to read a block from disk, although the system still doesn't boot. (In previous episodes, we fixed the power supply, got the CRT display working, cleaned up the disk drive and hooked up a logic analyzer: days 1, 2, 3, 4 and 5.)

Our test setup for the Xerox Alto. The Alto computer itself is the metal cabinet in the center with the visible circuit boards. On the left is a vintage HP line printer, with the logic analyzer behind it. The video display for the Alto is visible on the right, behind the oscilloscope.

The Alto was a revolutionary computer, designed at Xerox PARC in 1973 to investigate personal computing. It introduced the GUI, Ethernet and laser printers to the world, among other things. Y Combinator received an Alto from computer visionary Alan Kay and I'm helping restore the system, along with Marc Verdiell, Luca Severini, Ron Crane, Carl Claunch and Ed Thelen (from the IBM 1401 restoration team). Marc's video of this restoration session is below.

The missing disk sector task

In the Alto, like most modern computers, each machine instruction is implemented in an even more primitive form of code called microcode. But unlike most computers, the Alto also implements some of its low-level software in microcode. Part of the Alto's design philosophy was to use software (i.e. microcode) instead of hardware where possible. For instance, a microcode sector task processes each disk sector and a word task stores each word of data as it arrives from the disk drive; most computers do this with DMA hardware.

Last week we hooked a logic analyzer to the Alto to trace the executing microcode and found the disk sector task was failing to run. Each track on the Alto's hard disk is divided into 12 sectors, with 12 slots in the hub to indicate the sectors. We verified that the disk drive was detecting these slots and sending the sector pulses every 3.33 milliseconds. The disk sector task is supposed to run for each sector and perform any disk command, but the logic analyzer showed that this task was not running.

The hard disk pack for the Xerox Alto has 12 sectors. Slots cut into the disk hub trigger a signal for each sector. Four of the sector slots are labeled in the photo.

Why was the sector task not running? The disk interface board provides a signal to indicate when the sector task should run (WAKEST), but we found it was not being activated even though the disk drive was providing sector pulses to the disk interface board. Looking at the disk interface board schematic, the sector pulse circuit is fairly simple: just a few flip flops. (You don't need to understand the schematic below. The key point is the sector pulse comes in on the left, goes through a few chips, and the wakeup signal comes out on the right.) I've heard that old TTL flip flops fail regularly, so I figured one of the flip flop chips had failed. We decided to hook up an oscilloscope and see where things were going wrong, but one problem stood in our way.

Schematic from the Xerox Alto's disk controller card. This circuit processes sector pulses from the disk drive and generates signals to wake up the microcode sector task.

The extender card

The Alto consists of 13 circuit cards plugged into a wire-wrapped backplane, making them inaccessible to probing. Fortunately, the Living Computer Museum gave us an extender card, a board that goes between an Alto board and the backplane, physically extending the Alto board out of the cabinet where it can be diagnosed. Last week, we used the extender card to probe signals on the CPU control board. But no matter how hard I tried, I couldn't get the extender board to plug into the disk interface board's slot. Marc noticed out that the board was hitting something, and we realized that the disk interface board had a notch on the right, allowing the board to clear a bar that was in the way. The extender board, like most of the Alto boards, lacked this notch. A bit more investigation revealed that memory boards had a notch, but on the left.

Why did some boards have notches? Most of the boards are powered with 5 volts. The memory boards also require -5 volts and +12 volts for the 4116 DRAM chips. The I/O boards (Ethernet and disk) have +/- 15 volts as well as 5 volts. The Alto backplane was apparently designed so you couldn't plug a board into a slot with the wrong voltages (which would have been catastrophic). Boards with unusual power requirements had a notch that allowed them to fit into slots wired with unusual voltages. The consequence was that we couldn't use the extender board with the disk interface without cutting a notch in it, which we did (see photo below).

Milling a notch into the extender board.

We were worried that by cutting a notch in the extender board and using it in a slot where it wasn't intended we might destroy the computer in a spectacular show of sparks and smoke. The concern was that the extender board doesn't simply pass the 162 lines through, but wires all the ground lines to a ground plane and wires the +5 lines together. If the disk interface card had +15 volts where the extender board expected, say, +5 volts, the extender card would run +15 volts to all the chips and destroy them. We verified the wiring five times to make sure nothing would get shorted, plugged in the extender board, and turned the Alto on with some trepidation. Fortunately our calculations were correct and nothing blew up.

Debugging the disk interface

The photo below shows the disk interface card extended out of the Alto cabinet, with some oscilloscope probes attached to the flip flop chips. (The ribbon cable attached to the board connects to the disk drive, while the ribbon cable hanging above the board allows us to probe microcode signals with the logic analyzer.) Strangely, we didn't see any signals either going into the flip flops or coming out. We checked that the sector pulses were showing up in the logic analyzer, and on the connector from the disk drive, but the flip flops were getting nothing. Eventually we turned our attention to the inverter chip (see earlier schematic). We saw the sector signal going into the inverter, but not coming out. Could this simple chip be causing the problems?

Debugging the disk interface card in the Xerox Alto.

The 7414 TTL chip contains 6 inverters, which turn a 1 input into a 0 output and vice versa. We pulled the chip out of the disk interface board and tested it with a simple LED circuit (see photo below). Five of the six inverters worked fine, but one of the inverters had entirely failed. The chip is a bit unusual since it uses a Schmitt trigger—a circuit that cleans up noisy signals (such as the sector pulses that traveled over a long cable from the disk drive)—so we couldn't get a replacement at Fry's or Radio Shack. Were we stuck for the day?

Testing the 7414 inverter chip from the Xerox Alto's disk interface card. One inverter was burnt out, preventing the disk from working.

Fortunately we could work around the faulty chip. Carl studied the schematics and discovered that one of the good inverters on the chip was unused. We rewired the chip to replace the bad inverter with the unused good inverter by using an ugly but effective "dead bug" hack. We bent out the pins from the good inverter and attached wires. We cut off the pins from the bad inverter. Finally, we stuck the wires into the socket along with the IC, so the good inverter was wired in place of the bad inverter.

We re-wired a 7414 inverter chip. An unused inverter replaced the failed inverter.

We booted the Alto and found that our chip hack actually worked and the system worked much better than before: the sector pulses got through the inverter, were processed by the flip flops, and triggered the sector task as we hoped. The sector task read the disk command from memory and sent it to the disk drive. The disk drive read the desired sector and started sending bits back. For each word, the disk word task read the word from the disk interface and stored it in memory. In summary, we were now reading data from disk!

Reading data from disk was a big milestone, since most of the system needed to be working properly for this to happen. Unfortunately the Alto didn't boot up, and we'll need to figure out where things went wrong. Is the boot block not running correctly? Is the read data corrupted? Is the disk returning an error at some point? Is our disk not a boot disk? Strangely, there was no sign of the parity errors we kept seeing last week.

The timeline diagram below shows task switching in the Alto over an interval of 700 microseconds.. You can see that the microcode is constantly switching between tasks. Today's accomplishment can be seen in the periodic execution of disk word task (KWT) at the bottom of the image; this task runs about every 9 microseconds when each word comes from the disk drive. The disk sector task (KSEC) runs at the start of the next sector (at which time the word task stops). Other tasks are the memory refresh task (MRT) and cursor task (CURT) that run periodically. (You can see where the higher-priority MRT task interrupted the KSEC task.) The lowest priority task is the Nova emulator (NOVEM), which runs program code when nothing else is happening. The numbers at the bottom show the micro-instruction count since boot; at this point we are 14.8 milliseconds into the boot process. I generated the diagram below by processing the logic analyzer output to show each running task. An interactive version is here, allowing zoom and pan with the mouse.

Timeline showing task switching on the Xerox Alto. These are microcode tasks switched by hardware, not operating system level processes or threads.

Conclusion

In today's repair session, we found a failed 7414 inverter chip that was preventing disk operation. By working around that issue, we could finally read from disk, but boot is still failing for unknown reasons. Nonetheless, today's session got us much closer to a working system. We'll need to dig through the logic analyzer output to figure out where the boot process is breaking down.

Lacking safety features, cheap MacBook chargers create big sparks

You might wonder if it's worth spending $79 for a genuine MacBook charger when you can get a charger on eBay for under $15. You shouldn't get a cheap charger because they are often dangerous and lack safety features. In addition, they produce poor-quality power that isn't good for your laptop and may charge more slowly. I've written before about the safety problems with cheap chargers, but they say a picture is worth a thousand words, so here is why you shouldn't buy a cheap knockoff charger:

A knockoff MacBook charger emits large sparks if short-circuited. Genuine Apple chargers have safety features to protect against this.

If the connector comes in contact with something metal (a paperclip in this instance), it shorts out, creating a big spark. (Don't try this at home.) The genuine Apple charger (below) has safety features that protect against a short circuit. Shorting the connector on a genuine charger has no effect.

A genuine Apple MacBook charger has safety features that protect it from short circuits.

It's really hard to tell a genuine charger from a knockoff from the outside, since the knockoffs look just like the real thing. If you carefully read the text on this charger, you'll notice that "Apple" is missing. However, many knockoff chargers duplicate the text from a real charger, so often you can't tell if it is genuine or not just by looking. Big sparks, however, are a clear sign.

A cheap MacBook charger from eBay. Unlike most cheap chargers, this one doesn't claim to be an Apple charger, but just a "Replacement AC Adapter".

Why does a fake charger produce sparks, while a genuine one doesn't? The fake charger constantly outputs 20 volts, so if any metal shorts the connector, it produces a big spark with all its 85 watts of power. On the other hand, the genuine charger doesn't power up until it has been securely connected to the laptop for a full second. Until it is properly connected (details), it outputs a tiny amount of power (0.6 volts at 100µA) that can't produce a spark. To manage this, the genuine charger includes a powerful microcontroller (more powerful than the microprocessor in the original Macintosh by some measures). Since this processor increases the cost of the charger, knockoff chargers omit it, even though this makes the charger more dangerous.

As the photos below show, the cheap charger (left) omits as much as possible. On the other hand, the genuine Apple charger (right) is crammed full of components. Many of these components filter the power to provide higher-quality power to your laptop. The Apple charger also includes power factor correction, making the charger more efficient.

The cheap MacBook charger (left) omits most of the components found in a genuine Apple charger (right). The genuine charger includes more filtering, power factor correction (left), and a powerful microcontroller (board in upper right).

I've written in detail before about how chargers work, but I'll give a quick explanation here. The AC power comes in the red wires at the top and is converted to high-voltage DC (170V or 340V, depending on if you're in the US or Europe). A transistor (black component on left) chops the power into high-frequency pulses. The pulses create a changing magnetic field in the flyback transformer (large blue box), generating a high-current, low-voltage output. The output is converted to DC by diodes (black component, upper right), and filtered by capacitors (cylinders), to produce the 20 volt output (wires at bottom). A control IC (see photo below) controls the system to regulate the voltage. This may seem like an excessively complicated way to generate 20 volts, but switching power supplies like this are very compact, lightweight and efficient compared to simpler power supplies.

Shorting a cheap charger with a paperclip creates impressive sparks.

Looking at the underside of the cheap charger board shows it has very few components, while the genuine Apple charger's board is covered with tiny components. The two chargers are worlds apart as far as complexity, and this complexity is what provides more efficiency, more safety, and better quality power in the Apple charger.

The cheap MacBook charger (left) uses very simple circuits compared to the genuine Apple charger (right), which is crammed full of components.

Conclusion

While buying a cheap charger saves a lot of money, these chargers omit many safety features and can be hazardous to you and your computer. Don't buy a cheap knockoff charger; if you don't want to pay for a genuine Apple charger, at least buy a charger from a name-brand manufacturer.

Maybe you think these safety issues don't matter because you don't poke your charger with a paperclip. But if you have any metal objects on your desk, a random contact could yield a surprisingly large spark.

I've written a bunch of articles before about chargers, so if this article seems familiar, you're probably thinking of an earlier article, such as: Counterfeit MacBook charger teardown, Magsafe charger teardown, iPhone charger teardown or iPad charger teardown.

Follow me on Twitter to find out about my new articles.

Notes

If you're interested in the components inside the cheap charger, I have some details. The PWM control IC is a SiFirst 1560, a basic control IC for a flyback converter. The IC datasheet has the approximate schematic for the charger. The switching transistor is a 2N601 2 amp, 600 volt N-channel MOSFET. The voltage reference is an AZ431, similar to the ubiquitous TL431. The optoisolator is an 817C. The output diode is a MBRF20100C 10 amp Schottky diode pair. The electrolytic capacitors are from HKLCON.

A cheap charger emits large sparks if you short the connector with a paperclip. Safety features in a genuine charger protect against shorts.

Restoring YCombinator's Xerox Alto day 5: Microcode tracing with a logic analyzer

In today's Xerox Alto restoration session we investigated why the system doesn't boot. We find a broken wire, hook up a logic analyzer, generate a cloud of smoke, and discover that memory problems are preventing the computer from booting. (In previous episodes, we fixed the power supply, got the CRT display working and cleaned up the disk drive: days 1, 2, 3. and 4.)

The broken wire

The Xerox Alto is built from 13 circuit boards, crammed with TTL chips. In 1973, minicomputers such as the Alto were built from a whole bunch of simple ICs instead of a primitive microprocessor chip. (People still do this as a retro project.) The Alto's CPU is split across 3 boards: an ALU board, a control board, and a control RAM board. The control board is the focus of today's adventures.

If a circuit board has a design defect or needs changes, it can be modified by attaching new wires to create the necessary connections. The photo below shows the control board with several white modification wires. While examining the control board, we noticed one of the wires had come loose. Could the boot failures be simply due to a broken wire?

Control board from the Xerox Alto, showing a broken wire. The white wires were for a modification, but one wire came loose.

We carefully resoldered the wire and powered up the system. The disk drive slowly came up to speed and the heads lowered onto the disk surface. We pressed the reset button (under the keyboard) to boot. As before, nothing happened and the display remained blank. Fixing the wire had no effect.

After investigation, it appears the rework wires were to support the Trident/Tricon hard disk. In the photo above, note the small edge connector in the upper right, with the white wires connected. The Trident disk controller used this connector, but our (Diablo) disk controller does not. In other words, the broken wire might have caused problems with a different disk drive, but it was irrelevant to us.

Microcode on the Xerox Alto

Some background on the Xerox Alto's architecture will help motivate our day's investigation. The Alto, like most modern computers, is implemented using microcode. Computers are programmed in machine instructions, where each instruction may involve several steps. For instance, a "load" instruction may first compute a memory address by adding an offset to an index register. Then the address is sent to memory. Finally the contents of memory are stored into a register. Instead of hardcoding these steps (as done in the 6502 or Z-80 for instance), modern computers run a sequence of "micro-instructions", where each micro-instruction performs one step of the larger machine instructions. This technique, called microcode, is used by the Xerox Alto.

The Alto uses microcode much more heavily than most computers. The Alto not only uses microcode to implement the instruction set, but implements part of the software in microcode directly. Part of the Alto's design philosophy was to use software (i.e. microcode) instead of hardware where possible. For instance, most video displays pull pixels out of memory and display them on the screen. In the Alto, the processor itself fetches pixels out of memory and passes them to the video hardware. Similarly, most disk interfaces transfer data between memory and the disk drive. But in the Alto, the processor moves each data word to/from memory itself. The code to perform these tasks is written in microcode.

To perform all these low-level activities, the Alto hardware manages 16 different tasks, listed below. High-priority tasks (such as handling high-speed data from the disk) can take over from low-priority tasks, such as handling the display cursor. The lowest-level task is the "emulator", the task that executes program instructions. (In a normal computer, the emulator task is the only thing microcode is doing.) Remember, these tasks are not threads or processes handled by the operating system. These are microcode tasks, below the operating system and scheduled directly by the hardware.

Task	Name	Description
0	Emulator	Lowest priority.
1	-	unused
2	-	unused
3	-	unused
4	KSEC	Disk sector task
5	-	unused
6	-	unused
7	ETHER	Ethernet task
8	MRT	Memory refresh task. Wakeup every 38.08 microseconds.
9	DWT	Display word task
10	CURT	Cursor task
11	DHT	Display horizontal task
12	DVT	Display vertical task. Wakeup every 16.666 milliseconds.
13	PART	Parity task. Wakeup generated by parity error.
14	KWD	Disk word task
15	-	unused

Last episode, we found that processor was running the various tasks, but never tried to access the disk. System boot is started by the emulator task, which stores a disk command in memory. The disk sector task (KSEC) periodically checks if there are any disk commands to perform. Thus, it seemed like something was going wrong in either the emulator task (setting up the disk request), or the disk sector task (performing the disk request). To figure out exactly what was happening, we needed to hook up a logic analyzer.

The logic analyzer

A logic analyzer is a piece of test equipment a bit like an oscilloscope, except instead of measuring voltages, it just measures 0's or 1's. A logic analyzer also has dozens of inputs, allowing many signals to be analyzed at once. By using a logic analyzer, we can log every micro-instruction the processor runs, track each task, and even record every memory access.

Most of the signals of interest are available on the Alto's backplane, which connects all the circuit cards. Since the backplane is wire-wrapped, it consists of pins that conveniently fit the logic analyzer probes. For each signal, you need to find the right card, and then count the pins until you find the right pin to attach the probe. This setup is very tedious, but Marc patiently connected all the probes, while Carl entered the configuration into the logic analyzer.

The backplane of the Xerox Alto, with probes from the logic analyzer attached to trace microcode execution. Note the thick power wires on the left.

Unfortunately, a few important signals (the addresses of the micro-instructions) were not available on the backplane, and we needed to attach probes to one of the PROM chips that hold the microcode. Fortunately, the Living Computer Museum in Seattle gave us an extender card; by plugging the extender card into the backplane and the circuit board into the extender card, the board was accessible and we could connect the probes.

Probes from the logic analyzer hooked up to the Xerox Alto. By plugging the control board into an extension board, probes can be attached to it.

Hours later, with all the probes connected and the configuration programmed into the logic analyzer, we were ready to power up the system and collect data.

Running the logic analyzer

"Smoke! Stop! Shut it off!"

As soon as we flipped the power switch, smoke poured out of the backplane. Had we destroyed this rare computing artifact? What had gone wrong? When something starts smoking, it's usually pretty obvious where the problem is. In our case, one of the ground wires from the logic analyzer pod had melted, turning its insulation into smoke. A bit of discussion followed: "Pin 3 is ground, right?" "No, pin 9 is ground, pin 3 is 5 volts." "Oops." It turns out that when you short +5 and ground, a probe wire is no match for a 60 amp power supply. Fortunately, this wire was the only casualty of the mishap.

This logic probe wire melted when we accidentally connected +5 volts and ground with it.

With this problem fixed, we were able to get a useful trace from the logic analyzer. The trace showed that the Alto started off with the emulator/boot task. After just four instructions, execution switched to the disk word task, which was rapidly interrupted by the parity error task. When that task finished, execution went back to the disk word task, which was interrupted a few instructions later by the display vertical task. The disk word task was able to run a few more instructions before the display horizontal task ran, followed by the cursor task.

The vintage Agilent 1670G logic analyzer that we connected to the Xerox Alto. The screen shows the start of the Alto's boot sequence.

It's rather amazing how much task switching is going on in the Alto, with low-priority tasks only getting a few instructions executed before being interrupted by a higher-priority task. Looking at the trace made me realize how much overhead these tasks have. In our case, the emulator task is running the boot code, so progress towards boot requires looking at hundreds of instructions in the logic analyzer.

The key thing we noticed in the traces is the parity error task ran right near the start, indicating an error in memory. This would explain why the system doesn't boot up. We ran a few more boot cycles through the logic analyzer. The specific order of tasks varied each time, as you'd expect since they are triggered asynchronously from hardware events. But we kept seeing the parity errors.

The Alto's memory system

The Alto was built in the early days of semiconductor memory, when RAM chips were expensive and unreliable. The original Alto module used Intel's 1103 memory chips, which were the first commercially available DRAM chip, holding just 1 kilobit. To provide 128 kilobytes of memory, the Alto I used 16 boards crammed full of chips. (If you're reading this on a computer with 4 gigabytes of memory, think about how much memory capacity has improved since the 1970s.)

We have the later Alto II XM (extended memory) system, which used more advanced 16 kilobit chips to fit 512 kilobytes of storage onto 4 boards. Each memory board stored a 10 bit chunk—why 10 bits? Because memory chips were unreliable, the Alto used error correction. To store a 32-bit word pair, 6 bits of Hamming error correction were added, along with a parity bit, and one unused bit. The extra bits allow single-bit errors to be corrected and double-bit errors to be detected. The four memory boards in parallel stored 40 bits at a time—the 32 bit word pair and the extra bits for error correction.

A 128KB memory card from the Xerox Alto. The board has eighty 4116 DRAM chips, each with 16 kilobits of storage.

In addition to the 4 memory boards, the Alto has three circuit boards to control memory. The "MEAT" (Memory Extension And Terminator) is a trivial board to support four memory banks (the extended memory in the Alto XM). The "AIM" board (Address Interface Module) is a complex board that maps addresses to memory control signals, as well as handling memory-mapped peripherals such as the keyboard, mouse, and printer. Finally, the "DIM" board (Data Interface Module) generates the Hamming error correcting code signals, and performs error detection and correction.

More probing showed that the DIM board was always expressing a parity error. At this point, we're not sure if some of the memory chips are bad or if the complex circuitry on the DIM board is malfunctioning and reporting errors. As you can tell from the above description, the memory system on the Alto is complex. It may be a challenge to debug the memory and find out why we're getting errors.

A look at the microcode

In this section, I'll give a brief view of what the microcode looks like and how it appears in the logic analyzer. Microcode is generally hard to understand because it is at a very low level in the system, below the instruction set and running on the bare hardware. The Alto's microcode seems especially confusing.

Each Alto micro-instruction specifies an ALU operation and two "functions". A function can be something like "read a register" or "send an address to memory". But a function can also change meaning depending on what task is running. For instance, when the Ethernet task is running, a function might mean "do a four-way branch depending on the Ethernet state". But during the display task, the same function could mean "display these pixels on the screen". As a result, you can't figure out what an instruction does unless you know which task it is a part of.

The image below shows a small part of the logic analyzer output (as printed on Marc's vintage HP line printer). Each line corresponds to one executed micro-instruction. The "address" column shows the address of the micro-instruction in the 1K PROM storage. The task field shows which task is running. You can see the task switch midway through execution; 0 is the emulator and 13 is the parity task. Finally, the 32-bit micro-instruction is broken into fields such as RSEL (register select), ALUF (ALU function) and F1 (function 1).

The start of the logic analyzer trace from booting the Xerox Alto. The trace shows us each micro-instruction that was executed.

Note that the addresses jump around a lot; this is because the microcode isn't stored linearly in the PROM. Every micro-instruction has a "next instruction address" field in the instruction, so you can think of it as a GOTO inside every instruction. To make it worse, this field can be modified by the interface hardware, turning a GOTO into a computed GOTO. To make this work, the assembler shuffles instructions around in memory, so it's hard to figure out what code goes with a particular address. The point of this is that the logic analyzer output shows us every micro-instruction as it executes, but the output is somewhat difficult and tedious to interpret.

Fortunately we have the source code for the microcode, but understanding it is a challenge. The image below shows a small section of the boot code. I won't attempt to explain the microcode in detail, but want to give you a feel for what it is like. Labels (along the left) and jumps to labels are highlighted in blue. Things such as IR, L, and T are registers, and they get assigned values as indicated by the arrows. MAR is the memory address register (giving an address to memory) and MD is memory data, reading or writing the memory value.

A short section of the Xerox Alto's microcode. Labels and jumps are colored blue. Comments are gray.

Figuring out the control flow of the microcode requires detailed understanding of what is happening in the hardware. For example, in the last line above, ":Q0" indicates a jump to label "Q0". However the previous line says "BUS", which means the contents of the data bus are ORed into the address, turning the jump into a conditional jump to Q0, Q1, Q2, etc. depending on the bus value. And "TASK" indicates that a task switch can happen after the next instruction. So matching up the instructions in the logic analyzer output with instructions in the source code is non-trivial.

I should mention that the authors of the Alto's microcode were really amazing programers. An important feature for graphics displays is BITBLT, bit block transfer. The idea is to take an arbitrary rectangle of pixels in memory (such as a character, image, or window) and copy it onto the screen. The tricky part is that the regions may not be byte-aligned, so you may need to extract part of a byte, shift it over, and combine it with part of the destination byte. In addition, BITBLT supports multiple writing modes (copy, XOR, merge) and other features. So BITBLT is a difficult function to implement, even in a high-level language. The incredible part is that the Xerox Alto has BITBLT implemented in hundreds of lines of complex microcode! Using microcode for BITBLT made the operation considerably faster than implementing it in assembly code. (It also meant that BITBLT was used as a single machine language instruction.)

Conclusion

Hooking up the logic analyzer was time consuming, but succeeded in showing us exactly what was happening inside the Alto processor. Although interpreting the logic analyzer output and mapping it to the microcode source is difficult, we were able to follow the execution and determined that the parity task was running. It appears that memory parity errors are preventing the system from booting. Next step will be to understand the memory system in detail to determine where these errors are coming from and how to fix them.

PRU tips: Understanding the BeagleBone's built-in microcontrollers

The BeagleBone Black is an inexpensive, credit-card sized computer that has two built-in microcontrollers called PRUs. While the PRUs provide the real-time processing capability lacking in Linux, using these processors has a learning curve. In this article, I show how to run a simple program on the PRU, and then dive into the libraries and device drivers to show what is happening behind the scenes. Warning: this post uses the 3.8.13-bone79 kernel; many things have changed since then.

The BeagleBone uses the Sitara AM3358, an ARM processor chip running at 1 GHz—this is the thumbnail-sized chip in the center of the board below. If you want to perform real-time operations, the BeagleBone's ARM processor won't work well since Linux isn't a real-time operating system. However, the Sitara chip contains two 32-bit microcontrollers, called Programmable Realtime Units or PRUs. (It's almost fractal, having processors inside the processor.) By using a PRU, you can achieve fast, deterministic, real-time control of I/O pins and devices.

The BeagleBone computer is tiny. For some reason it was designed to fit inside an Altoids mint tin. The square thumbnail-sized chip in the center is the AM3358 Sitara processor chip. This chip contains an ARM processor as well as two 32-bit microcontrollers called PRUs.

The nice thing about using the PRU microcontrollers on the BeagleBone is that you get both real-time Arduino-style control, and "real computer" features such as a web server, WiFi, Ethernet, and multiple languages. The main processor on the BeagleBone can communicate with the PRUs, giving you the best of both worlds. The downside of the PRUs is there's a significant learning curve to use them since they have their own instruction set and run outside the familiar Linux world. Hopefully this article will help with the learning curve.

I wrote an article a couple weeks ago on The BeagleBone's I/O pins. That article discussed the pins controlled by the ARM processor, while this article focuses on the PRU microcontroller. If you think you've seen the present article before, they cover two different things, but I won't blame you for getting deja vu!

Running a "blink" program on the PRU

To motivate the discussion, I'll use a simple program that uses the PRU to flash an LED. This example is based on PRU GPIO example

Blinking an LED using the BeagleBone's PRU microcontroller.

The easiest way to compile and assemble PRU code is to do it on the BeagleBone, since the necessary tools are installed by default (at least if you get the Adafruit BeagleBone). Perform the following steps on the BeagleBone.

Connect an LED to BeagleBone header P8_11 through a 1K resistor.
Download the assembly code file blink.p:
Download the host file to load and run the PRU code, loader.c.
Download the device tree file, /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dts.

Compile and install the device tree file to enable the PRU:

# dtc -O dtb -I dts -o /lib/firmware/PRU-GPIO-EXAMPLE-00A0.dtbo -b 0 -@ PRU-GPIO-EXAMPLE-00A0.dts
# echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots
# cat /sys/devices/bone_capemgr.?/slots

Assemble blink.p and compile the loader:

# pasm -b blink.p
# gcc -o loader loader.c -lprussdrv

Run the loader to execute the PRU binary:
```
# ./loader blink.bin
```

If all goes well, the LED should blink 10 times.[1]

Documentation

The most important document that describes the Sitara chip is the 5041-page Technical Reference Manual (TRM for short). This article references the TRM where appropriate, if you want more information. Information on the PRU is inconveniently split between the TRM and the AM335x PRU-ICSS Reference Guide. For specifics on the AM3358 chip used in the BeagleBone, see the 253 page datasheet. Texas Instruments' has the PRU wiki with more information.

If you're looking to use the BeagleBone and/or PRU I highly recommend the detailed and informative book Exploring BeagleBone. Helpful web pages on the PRU include BeagleBone Black PRU: Hello World, Working with the PRU and BeagleBone PRU GPIO example.

The assembly code

If you're familiar with assembly code, the PRU's instruction set should be fairly straightforward. The instruction set is documented on the wiki and in the PRU reference manual (section 5).[2]

The demonstration blink code uses register R1 to count the number of blinks (10) and register R0 to provide the delay between flashes. The delay timing illustrates an important feature of the PRU: it is entirely deterministic. Most instructions takes 5 nanoseconds (at 200 MHz), although reads are slower. There is no pipelining, branch delays, memory paging, interrupts, scheduling, or anything else that interferes with instruction execution. This makes PRU execution predictable and suitable for real-time processing. (In contrast, the main ARM processor has all the scheduling and timing issues of Linux, making it unsuitable for real-time processing.)

Since the delay loop is two instructions long and executes 0xa00000 times (an arbitrary value that gave a visible delay), we can compute that each delay is 104.858 milliseconds, regardless of what else the processor is doing. (This is shown in the oscilloscope trace below.) A similar loop on the ARM processor would have variable timing, depending on system load.

LED output from the BeagleBone PRU demo, showing the 104ms oscillations.

The I/O pin

How do we know that pin P8_11 is controlled by bit 15 of R30? We're using one of the PRU output pins called "pr1_pru0_pru_r30_15", which is a PRU 0 output pin controlled by R30 bit 15. (Note that the PRU GPIO pins are separate from the "regular" GPIO pins.)[3] The BeagleBone header chart shows that this PRU I/O pin is connected to BeagleBone header pin P8_11.[4] To tell the BeagleBone we want to use this pin with the PRU, the device tree configuration is updated as discussed below.

Communication between the main processor and the PRU

The PRU microcontrollers are part of the processor chip, yet operate independently. In the blink demo, the main processor runs a loader program that downloads the PRU code to the microcontroller, signals the PRU to start executing, and then waits for the PRU to finish. But what happens behind the scenes to make this work? The PRU Linux Application Library (PRUSSDRV) provides the API between code running under Linux on the BeagleBone and the PRU.[5]

The prussdrv_exec_program() API call loads a binary program (in our case, the blink program) into the PRU so it can be executed. To understand how this works requires a bit of background on how memory is set up.

Each PRU microcontroller has 8K (yes, just 8K) of instruction memory, starting at address 0. Each PRU also has 8K of data memory, starting at address 0 (see TRM 4.3).[6] Since the main processor can't access all these memories at address 0, they are mapped into different parts of the main processor's memory as defined by the "global memory map".[7] For instance, PRU0's instruction memory is accessed by the host starting at physical address 0x4a334000. Thus, to load the binary code into the PRU, the API code copies the PRU binary file to that address.[8]

Then, to start execution, the API code starts the PRU running by setting the enable bit in the PRU's control register, which the host can access at a specific physical address (see TRM 4.5.1.1).

To summarize, the loader program running on the main processor loads the executable file into the PRU by writing it to special memory addresses. It then starts up the PRU by writing to another special memory address that is the PRU's control register.

The Userspace I/O (UIO) driver

At this point, you may wonder how the loader program can access to physical memory and low-level chip registers - usually is not possible from a user-level program. This is accomplished through UIO, the Linux Userspace I/O driver (details). The idea behind a UIO driver is that many devices (the PRU in our case) are controlled through a set of registers. Instead of writing a complex kernel device driver to control these registers, simply expose the registers to user code, and then user-level code can access the registers to control the device. The advantage of UIO is that user-level code is easier to write and debug than kernel code. The disadvantage is there's no control over device usage—buggy or malicious user-level code can easily mess up the device operation.[9]

For the PRU, the uio_pruss driver provides UIO access. This driver exposes the physical memory associated with the PRU as a device /dev/uio0. The user-level PRUSSDRV API library code does a mmap on this device to map the address space into its own memory. This gives the loader code access to the PRU memory through addresses in its own address space. The uio_pruss driver exposes information on the mapping to the user-level code through the directory /sys/class/uio/uio0/maps.

Another photo of the BeagleBone Black single-board computer inside an Altoids mint tin. At this point in the article, you may need a break from the text.

Device Tree

How does the uio_pruss driver know the address of the PRU? This is configured through the device tree, a complicated set of configuration files used to configure the BeagleBone hardware. I won't go into the whole explanation of device trees, but just the parts relevant to our story.[10] The pruss entry in the device tree is defined in am33xx.dtsi. Important excerpts are:

pruss: pruss@4a300000 {
 compatible = "ti,pruss-v2";
 reg = <0x4a300000 0x080000>;
 status = "disabled";
 interrupts = <20 21 22 23 24 25 26 27>;
};

The address 4a300000 in the device tree corresponds to the PRU's instruction/data/control space in the processor's address space. (See TRM Table 2-4 and section 4.3.2.) This address is provided to the uio_pruss driver so it knows what memory to access.

The "ti,pruss-v2" compatible line causes the matching uio_pruss driver to be loaded. Note that the status is "disabled", so the PRU is disabled by default until enabled in a device tree overlay.

The "interrupts" line specifies the eight ARM interrupts that are handled by the driver; these will be discussed in the Interrupts section below.

Device tree overlay

To enable the PRU and the GPIO output pin, we needed to load the PRU-GPIO-EXAMPLE-00A0.dts device tree overlay file.[11] This file is discussed in detail at credentiality, so I'll just give some highlights.

As described earlier, each header pin has eight different functions, so we need to tell the pin multiplexer driver which of the eight modes to use for the pin P8_11. Looking at the P8 header diagram, pin P8_11 has internal address 0x34 and has the PRU output function in mode 6. This is expressed in the overlay file:

exclusive-use = "P8.11", "pru0";
...
    target = <&am33xx_pinmux>;
...
       pinctrl-single,pins = <
         0x34 0x06

The above entry extends the am33xx_pinmux device tree description and is passed to the pinctrl-single driver, which configures the pins as desired by updating the appropriate chip control registers.

The device tree file also overrides the previous pruss entry, changing the status to "okay", which enables the uio_pruss device driver:

    target = <&pruss>;
    __overlay__ {
      status = "okay";

The cape manager

This device tree overlay was loaded with the command "echo PRU-GPIO-EXAMPLE > /sys/devices/bone_capemgr.?/slots". What does this actually do?

The idea of the cape manager is if you plug a board (called a cape) onto the BeagleBone, the cape manager reads an EEPROM on the board, figures out what resources the board needs, and automatically loads the appropriate device tree fragments (from /lib/firmware) to configure the board (source, details, documentation). The cape manager is configured in the device tree file am335x-bone-common.dtsi.

Examples of BeagleBone capes. Image from beaglebone.org, CC BY-SA 3.0.

In our case, we aren't installing a new board, but just want to manually install a new device tree file. The cape manager exposes a bunch of files in /sys/devices/bone_capemgr.*. Sending a name to the file "slots" causes the cape manager to load the corresponding device tree file. This enables the PRU and the desired output pin.

Interrupts

Interrupts can be used to communicate between the PRU code and the host code. At the end of the PRU blink code, it sends an interrupt to the host by writing 35 to register R31. This wakes up the loader program, which is waiting with the API call "prussdrv_pru_wait_event(PRU_EVTOUT_0)". This process seems like it should be straightforward, but it's surprisingly complex. In this section, I'll explain how PRU interrupts work.

I'll start with the host (loader program) side. The PRU can send eight different events to the host program. The prussdrv_pru_wait_event(N) call waits for event N (0 through 7) by reading from /dev/uioN. This read will block until an interrupt happens, and then return. (This seems like a strange way to receive interrupts, but using the file system fits the Unix philosophy.)[12]

If you look in the file system, there are 8 uio files: /dev/uio0 through /dev/uio7. You might expect /dev/uio1 provides access to PRU1, but that's not the case. Instead, there are 8 event channels between the PRUs and the main processor, allowing 8 different types of interrupts. Each one of these has a separate /dev/uioN file; reading from the file waits for the corresponding event, labeled PRU_EVTOUT0 through PRU_EVTOUT7.

Jumping now to the PRU side, the interrupt is triggered by a write to R31. Register R31 is a special register - writing to it sends an event to the interrupt system.[13] The blink code generates PRU internal event 3 (with the crazy name pr1_pru_mst_intr[3]_intr_req), which is mapped to system event 19.

The following diagram shows how an event (right) gets mapped to a channel, then a host, and finally generates an interrupt (left). In our case, system event 19 from the PRU is given the label PRU0_ARM_INTERRUPT. This interrupt is mapped to channel 2, then host 2 which goes to ARM interrupt PRU_EVTOUT0, which is ARM interrupt 20 (see TRM 6.3). The device tree configured the pruss_uio driver to handle events 20 through 27, so the driver receives the interrupt and unblocks /dev/uio0, informing the host process of the PRU event.

Interrupt handling on the BeagleBone between the PRU microcontrollers and the ARM processor. System events (right) are mapped to channels, then hosts, finally generating interrupts (left).

How does the above complex mapping get set up? You might guess the device tree, but it's done by the PRUSSDRV user-level library code. The interrupt configuration is defined in PRUSS_INTC_INITDATA, a struct that defines the various mappings shown above. This struct is passed to prussdrv_pruintc_init(), which writes to the appropriate PRU interrupt registers (mapped into memory).[14]

Next steps

A real application would use a host program much more powerful than the example loader. For instance, the host program could run a web server and actively communicate with the PRU. Or the host program could do computation on data received from the PRU. Such a system uses the ARM processor to get all the advantages of Linux, and the PRU microcontrollers to perform real-time activities impossible in Linux.

It's easier to program the PRU in C using an IDE. That's a topic for another post, but for now you can look here, here, here or here for information on using the C compiler.

Conclusion

The PRU microcontrollers give the BeagleBone real-time, deterministic processing. Combining the BeagleBone's full Linux system with the PRU microcontrollers yields a very powerful system. The Linux side can provide powerful processing, network connectivity, web servers, and so forth, while the PRUs can interface with the external world. If you're using an Arduino but want more (Ethernet, web connectivity, USB, WiFi), the BeagleBone is an alternative to consider.

There is a significant learning curve to using the PRU microcontrollers, however. Much of what happens may seem like magic, hidden behind APIs and device tree files. Hopefully this article has filled in the gaps, explaining what happens behind the scene.

Notes and references

[1] If you get the error "prussdrv_open() failed" the problem is the PRU didn't get enabled. Make sure you loaded the device tree file into the cape manager.

[2] One of the instructions in the PRUs instruction set is LBBO (load byte burst), which reads a block of memory into the register file. That sounded to me like an obscure instruction I'd never need, but it turns out that this instruction is used for any memory reads. So don't make my mistake of ignoring this instruction!

[3] To understand the pin name "pr1_pru0_pru_r30_15": pru0 indicates the pin is controlled by PRU 0 (rather than PRU 1), r30 indicates the pin is an output pin, controlled by the output register R30 (as opposed to the input register R31), and 15 indicates I/O pin 15, which is controlled by bit 15 of the register. A mnemonic to remember that R30 is the output register and R31 is the input register: 0 looks like O for output, 1 looks like I for input.

[4] Most PRU pins conflict with HDMI or another subsystem, so you need to disable HDMI to use them. To avoid this problem, I picked one of the few non-conflicting PRU pins. Disabling HDMI is described here and here. See this chart for a list of available PRU pins and potential conflicts.

[5] The PRU library is documented in PRU Linux Application Loader API Guide. The library is often installed on the BeagleBone by default but can also be downloaded in the am335x_pru_package along with PRU documentation and some coding examples. Library source code is here.

[6] The PRU, like many microcontrollers, uses a Harvard architecture, with independent memory for code and data.

[7] In the main processor's address space, PRU0's data RAM starts at address 0x4a300000, PRU1's data RAM starts at 0x4a302000, PRU0's instruction RAM starts at 0x4a334000, and PRU1's instruction RAM starts at 0x4a338000. (See TRM Table 2-4 and section 4.3.2.) The PRU control registers also have addresses in the main processor's physical address space; PRU0's control registers start at address 0x4a322000 and PRU1's at 0x4a324000.

[8] The API uses prussdrv_pru_write_memory() to copy the contents into the memory-mapped region for PRU's instruction memory. The prussdrv_load_datafile API call is used to load a file into the PRU's data memory.

[9] Note that using Userspace I/O to control the PRU is the exact opposite philosophy of how the regular GPIO pins on the BeagleBone are controlled. Most of the BeagleBone's I/O pins are controlled through complex device drivers that provide access through simple file system interfaces. This is easy to use: just write 1 to /sys/class/gpio/gpio60/value to turn on GPIO 60. On the other hand, the BeagleBone gives you raw access to the PRU control registers - you can do whatever you want, but you'll need to figure out how to do it yourself. It's a bit strange that two parts of the BeagleBone have such wildly different philosophies.

[10] Device trees are discussed in more detail in my previous BeagleBone article. Also see A Tutorial on the Device Tree, Device Tree for Dummies, or Introduction to the BeagleBone Black Device Tree.

[11] Why is the example device tree overlay (and most device tree files) labeled with version 00A0? I found out (thanks to Robert Nelson) that A0 is a common initial version for contract hardware. So 00A0 is the first version of something, followed by 00A1, etc.

[12] The blink example only sends events from the PRU to the host, but what about the other direction? The host can send an event (e.g. ARM_PRU0_INTERRUPT, 21) to a PRU using prussdrv_pru_send_event(). This writes the PRU's SRSR0 (System Event Status Raw/Set Register, TRM 4.5.3.13) to generate the event. The interrupt diagram shows this event flows to R31 bit 30. The PRUs don't receive interrupts as such, but must poll the top two bits of R31 to see if an interrupt signal has arrived.

[13] The PRU has 64 system events for all the different things that can trigger interrupts, such as timers or I/O (see TRM 4.4.2.2). When generating an event with R31, bit 5 triggers an event, while the lower 4 bits select the event number (see TRM 4.4.1.2). PRU system event numbers are offset by 16 from the internal event numbers, so internal event 3 is system event 19. See PRU interrupts for the full list of events.

[14] Multiple PRU registers are set up to initialize interrupt handling. The CMR (channel map registers, TRM 4.5.3.20) map from each system event to a channel. The HMR (host interrupt map registers, TRM 4.5.3.36) map from a channel to a host interrupt. The PRU interrupt registers (INTC) start at address 0x4a320000 (TRM Table 4-8).

The BeagleBone's I/O pins: inside the software stack that makes them work

The BeagleBone is a inexpensive, credit-card sized computer with many I/O pins. These pins can be easily controlled from software, but it can be very mysterious what is really happening. To control a general purpose input/output (GPIO) pin, you simply write a character to a special file and the pin turns on or off. But how do these files control the hardware pins? In this article, I dig into the device drivers and hardware and explain exactly what happens behind the scenes. Warning: this post uses the 3.8.13-bone79 kernel; many things have changed since then. (Various web pages describe the GPIO pins, but if you just want a practical guide of how to use the GPIO pins, I recommend the detailed and informative book Exploring BeagleBone.)

This article focuses on the BeagleBone Black, the popular new member of the BeagleBoard family. If you're familiar with the Arduino, the BeagleBone is much more complex; while the Arduino is a microcontroller, the BeagleBone is a full computer running Linux. If you need more than an Arduino can easily provide (more processing, Ethernet, WiFi), the BeagleBone may be a good choice.

Beaglebone Black single-board computer. Photo by Gareth Halfacree, CC BY-SA 2.0

The BeagleBone uses the Sitara AM3358 processor chip running at 1 GHz - this is the thumbnail-sized chip in the center of the board above. This chip is surprisingly complicated; it appears they threw every feature someone might want into the chip. The diagram below shows what is inside the chip: it includes a 32-bit ARM processor, 64KB of memory, a real-time clock, USB, Ethernet, an LCD controller, a 3D graphics engine, digital audio, SD card support, various networks, I/O pins, an analog-digital converter, security hardware, a touch screen controller, and much more.[1] To support real-time applications, the Sitara chip also includes two separate 32-bit microcontrollers (on the chip itself - processors within processors!).

Functional diagram of the complex processor powering the BeagleBone Black. The TI AM3358 Sitara processor contains many functional units. Diagram from Texas Instruments.

The main document that describes the Sitara chip is the Technical Reference Manual, which I will call the TRM for short. This is a 5041 page document describing all the feature of the Sitara architecture and how to control them. But for specifics on the AM3358 chip used in the BeagleBone, you also need to check the 253 page datasheet. I've gone through these documents, so you won't need to, but I reference the relevant sections in case you want more details.

Using a GPIO pin

The chip's pins can be accessed through two BeagleBone connectors, called P8 and P9. To motivate the discussion, I'll use an LED connected to GPIO 49, which is on pin 23 of header P9 (i.e. P9_23). (How do you know this pin is GPIO 49? I explain that below.) Note that the Sitara chip's outputs can provide much less current than the Arduino, so be careful not to overload the chip; I used a 1K resistor.

To output a signal to this GPIO pin, simply write some strings to some files:

# echo 49 > /sys/class/gpio/export
# echo out > /sys/class/gpio/gpio49/direction
# echo 1 > /sys/class/gpio/gpio49/value

The first line causes a directory for gpio49 to be created. The next line sets the pin mode as output. The third line turns the pin on; writing 0 will turn the pin off.

This may seem like a strange way to control the GPIO pins, using special file system entries. It's also strange that strings ("49", "out", and "1") are used to control the pin. However, the file-based approach fits with the Unix model, is straightforward to use from any language, avoids complex APIs, and hides the complexity of the chip.

The file-system approach is very slow, capable of toggling a GPIO pin at about 160 kHz, which is rather awful performance for a 1 GHz processor.[2] In addition, these oscillations may be interrupted for several milliseconds if the CPU is being used for other tasks, as you can see from the image below. This is because file system operations have a huge amount of overhead and context switches before anything gets done. In addition, Linux is not a real-time operating system. While the processor is doing something else, the CPU won't be toggling the pins.

An I/O pin can be toggled to form an oscillator. Unfortunately, oscillations can stop for several milliseconds if the processor is doing something else.

The moral here is that controlling a pin from Linux is fine if you can handle delays of a few milliseconds. If you want to produce an oscillation, don't manually toggle the pin, use a PWM (pulse width modulator) output instead. If you need more complex real-time outputs, use one of the microcontrollers (called a PRU) inside the processor chip.

What's happening internally with GPIOs?

Next, I'll jump to the low level, explaining what happens inside the chip to control the GPIO pins. A GPIO pin is turned on or off when a chip register at a particular address is updated. To access memory directly, you can use devmem2, a simple program that allows a memory register to be read or written. Using the devmem2 program, we can turn the GPIO pin first on and then off by writing a value to the appropriate register (assuming the pin has been initialized). The first number is the address, and the second number is value written to the address.

devmem2 0x4804C194 w 0x00020000
devmem2 0x4804C190 w 0x00020000

What are all these mystery numbers? GPIO 49 is also known as GPIO1_17, the 17th pin in the GPIO1 bank (as will be explained later). The value written to the register, 0x00020000, is a 1 bit shifted left 17 positions (1<<17) to control pin 17. The address 4804C194 is the address of the register used to turn on a GPIO pin. Specifically it is SETDATAOUT register for the 32 GPIO1 pins. By writing a bit to this register, the corresponding pin is turned on. Similarly, the address 4804C190 is the address of the CLEARDATAOUT register, which turns a pin off.

How do you determine these register addresses? It's all defined in the TRM document if you know where to look. The address 4804C000 is the starting address of GPIO1's registers (see Table 2-3 in the TRM). The offset 194h is the offset of the GPIO_SETDATAOUT register, and 190h is the GPIO_CLEARDATAOUT register (see TRM 25.4.1). Combining these, we get the address 4804C194 for GPIO1's SETDATAOUT and 4804C190 for CLEARDATAOUT. Writing a 1 bit to the SETDATAOUT register will set that GPIO, while leaving others unchanged. Likewise, writing a 1 bit to the CLEARDATOUT register will clear that GPIO. (Note that this design allows the selected register to be modified without risk of modifying other GPIO values.)

Putting this all together, writing the correct bit pattern to the address 0x4804C194 or 0x4804C190 turns the GPIO pin on or off. Even if you use the filesystem API to control the pin, these registers are what end up controlling the pin.

How does devmem2 work? It uses Linux's /dev/mem, which is a device file that is an image of physical memory. devmem2 maps the relevant 4K page of /dev/mem into the process's address space and then reads or writes the address corresponding to the desired physical address.

By using mmap and toggling the register directly from a C++ program, the GPIO can be toggled at about 2.8 MHz, much faster than the device driver approach.[3] If you think this approach will solve your problems, think again. As before, there are jitter and multi-millisecond dropouts if there is any other load on the system.

The names of pins

Confusingly, each pin has multiple names and numbers. For instance, pin 23 on header 9 is GPIO1_17, which is GPIO 49, which is pin entry 17 (unrelated to GPIO1_17) which is pin V14 on the chip itself. Meanwhile, the pin has 7 other functions including GMPC_A1, which is the name used for the pin in the documentation. This section explains the different names and where they come form. (If you just want to know the pin names, see the diagrams P8 header and P9 header from Exploring Beaglebone.)

The first complication is that each pin has eight different functions since the processor has many more internal functions than physical pins.[4] (The Sitara chip has about 639 I/O signals, but only 124 output pins available for these signals.) The solution is that each physical pin has eight different internal functions mapped to it. For each pin, you select a mode 0 through 7, which selects which of the eight possible functions you want.

Figuring out "from scratch" which functions are on each BeagleBone header pin is tricky and involves multiple documents. To start, you need to determine what functions are on each physical pin (actually ball) of the chip. Table 4-1 in the chip datasheet shows which signals are associated with each physical pin (see the "ZCZ ball number"). (Or see section 4.3 for the reverse mapping, from the signal to the pin.) Then look at the BeagleBone schematic to see the connection from each pin on the chip (page 3) to an external connector (page 11).

For instance, consider the GPIO pin used earlier. With the ZCZ chip package, chip ball V14 is called GPMC_A1 (in mode 0 this pin is used for the General Purpose Memory Controller). The Beaglebone documentation names the pin with the the more useful mode 7 name "GPIO1_17", indicating GPIO 17 in GPIO bank 1. (Each GPIO bank (0-3) has 32 pins. Bank 0 is numbered 0-31, bank 1 is 32-63, and so forth. So pin 17 in bank 1 is GPIO number 32+17=49.) The schematic shows this chip ball is connected to header P9 pin 23, earning it the name P9_23. This name is relevant when connecting wires to the board. It is also the name used in BeagleScript (discussed later).

Another pin identifier is the sequential pin number based on the the pin control registers. (This number is indicated under the column $PINS in the header diagrams.) Table 9-10 in the TRM lists all the pins in order along with their control address offsets. Inconveniently, the pin names are the mode 0 names (e.g. conf_gpmc_a1), not the more relevant names (e.g. gpio1_17). From this table, we can determine that the offset of con_gpmc_a1 is 844h. Counting the entries (or computing (844h-800h)/4) shows that this is pin 17 in the register list. (It's entirely coincidental that this number 17 matches GPIO1_17). This number is necessary when using the pin multiplexer (described later).

How writing to a file toggles a GPIO

When you write a string to /sys/class/gpio/gpio49/value, how does that end up modifying the GPIO? Now that some background has been presented, this can be explained. At the hardware level, toggling a GPIO pin simply involves setting a bit in a control register, as explained earlier. But it takes thousands of instructions in many layers to get from writing to the file to updating the register and updating the GPIO pin.

The write goes through the standard Linux file system code (user library, system call code, virtual file system layer) and ends up in the sysfs virtual file system. Sysfs is an in-memory file system that exposes kernel objects through virtual files. Sysfs will dispatch the write to the gpio device driver, which processes the request and updates the appropriate GPIO control register.

In more detail, the /sys/class/gpio filesystem entries are provided by the gpio device driver (documentation for sysfs and gpio). The main gpio driver code is gpiolib.c. There are separate drivers for many different types of GPIO chips; the Sitara chip (and the related OMAP family) uses gpio-omap.c. The Linux code is in C, but you can think of gpiolib.c as the superclass, with subclasses for each different chip. C function pointers are used to access the different "methods".

The gpiolib.c code informs sysfs of the various files to control GPIO pin attributes (/active_low, /direction/, /edge, /value), causing them to appear in the file system for each GPIO pin. Writes to the /value file are linked to the function gpio_value_store, which parses the string value, does error checking and calls gpio_set_value_cansleep, which calls chip->set(). This function pointer is where handling switches from the generic gpiolib.c code to the device-specific gpio-omap.c code. It calls gpio_set, which calls _set_gpio_dataout_reg, which determines the register and bit to set. This calls raw_writel, which is inline ARM assembler code to do a STR (store) instruction. This is the point at which the GPIO control register actually gets updated, changing the GPIO pin value.[5] The key point is that a lot of code runs to get from the file system operation to the actual register modification.

How does the code know the addresses of the registers to update? The file gpio-omah.h contains constants for the GPIO register offsets. Note that these are the same values we used earlier when manually updating the registers.

#define OMAP4_GPIO_CLEARDATAOUT  0x0190
#define OMAP4_GPIO_SETDATAOUT  0x0194

But how does the code know that the registers start at 0x4804C000? And how does the system know this is the right device driver to use? These things are specified, not in the code, but in a complex set of configuration files known as Device Trees, explained next.

Device Trees

How does the Linux kernel know what features are on the BeagleBone? How does it know what each pin does? How does it know what device drivers to use and where the registers are located? The BeagleBone uses a Linux feature called the Device Tree, where the hardware configuration is defined in a device tree file.

Linux used to define the hardware configuration in the kernel. But each new ARM chip variant required inconvenient kernel changes, which led to Linus Torvald's epic rant on the situation. The solution was to move this configuration out of kernel code and into files known as the Device Tree, which specifies the hardware associated with the device. This switch, in the 3.8 kernel, is described here.

As if device trees weren't complex enough, the next problem was that BeagleBone users wanted to change pin configuration dynamically. The solution to this was device tree overlays, which allow device tree files to modify other device tree files. With a device tree overlay, the base hardware configuration can be modified by configuration in the overlay file. Then the Capemgr was implemented to dynamically load device tree fragments, to manage the installation of BeagleBoard daughter cards (known as capes).

I won't go into the whole explanation of device trees, but just the parts relevant to our story. For more information see A Tutorial on the Device Tree, Device Tree for Dummies, or Introduction to the BeagleBone Black Device Tree.

The BeagleBone's device tree is defined starting with am335x-boneblack.dts, which includes am33xx.dtsi and am335x-bone-common.dtsi.

The relevant lines are in am33xx.dtsi:

gpio1: gpio@44e07000 {
  compatible = "ti,omap4-gpio";
  ti,hwmods = "gpio1";
  gpio-controller;
  interrupt-controller;
  reg = <0x44e07000 0x1000>;
  interrupts = <96>;
};

The "compatible" line is very important as it causes the correct device driver to be loaded. While processing a device tree file, the kernel checks all the device drivers to find one with a matching "compatible" line. In this case, the winning device driver is gpio-omap.c, which we looked at earlier.

The other lines of interest specify the register base address 44e07000. This is how the kernel knows where to find the necessary registers for the chip. Thus, the Device Tree is the "glue" that lets the device drivers in the kernel know the specific details of the registers on the processor chip.

BoneScript

One easy way to control the BeagleBone pins is by using JavaScript along with BoneScript, a Node.js library. The BoneScript API is based on the Arduino model. For instance, the following code will turn on the GPIO on pin 23 of header P9.

var b = require('bonescript');
b.pinMode('P9_23', b.OUTPUT);
b.digitalWrite('P9_23', b.HIGH);

Using BoneScript is very slow: you can toggle a pin at about 370 Hz, with a lot of jitter and the occasional multi-millisecond gap. But for programs that don't require high speed, BoneScript provides a convenient programming model, especially for applications with web interaction.

For the most part, the BoneScript library works by reading and writing the file system pseudo-devices described earlier. You might expect BoneScript to have some simple code to convert the method calls to the appropriate file system operations, but BoneScript is surprisingly complex. The first problem is BoneScript supports different kernel versions with different cape managers and pin multiplexers, so it implements everything four different ways (source).

A bit surprise is that BoneScript generates and installs new device tree files as a program is running. In particular, for the common 3.8.13 kernel, BoneScript creates a new device tree overlay (e.g. /lib/firmware/bspm_P9_23_2f-00A0.dts) on the fly from a template, runs it through the dtc compiler and installs the device tree overlay through the cape manager by writing to /sys/devices/bone_capemgr.N/slots (source). That's right, when you do a pinMode() operation in the code, BoneScript runs a compiler!

Conclusion

The BeagleBone's GPIO pins can be easily controlled through the file system, but a lot goes on behind the scenes, making it very mysterious what is actually happening. Examining the documentation and the device drivers reveals how these file system writes affect the pins by writing to various control registers.[6] Hopefully after reading this article, the internal operation of the Beaglebone will be less mysterious.

Notes and references

[1] Many of the modules of the Sitara chip have cryptic names. A brief explanation of some of them, along with where they are described in the TRM:

PRU-ICSS (Programmable Real-Time Unit / Industrial Communication Subsystem, chapter 4): this is the two real-time microcontrollers included in the chip. They contain their own modules.
Industrial Ethernet is an extension of Ethernet protocols for industrial environments that require real-time, predictable communication. The Industrial Ethernet module provides timers and I/O lines that can be used to implement these protocols.
SGX (chapter 5) is a 2D/3D graphics accelerator.
GPMC is the general-purpose memory controller. OCMC is the on-chip memory controller, providing access to 64K of on-chip RAM. EMIF is the external memory interface. It is used on the BeagleBone to access the DDR memory. ELM (error location module) is used with flash memory to detect errors. Memory is discussed in chapter 7 of the TRM.
L1, L2, L3, L4: The processor has L1 instruction and data caches, and a larger L2 cache. The L3 interconnect provides high-speed communication between many of the modules of the processor using a "NoC" (Network on Chip) infrastructure. The slower L4 interconnect is used for communication with low-bandwidth modules on the chip. (See chapter 10 of the TRM.)
EDMA (enhanced direct memory access, chapter 11) provides transfers between external memory and internal modules.
TS_ADC (touchscreen, analog/digital converter, chapter 12) is a general-purpose analog to digital converter subsystem that includes support for resistive touchscreens. This module is used on the BeagleBone for the analog inputs.
LCD controller (chapter 13) provides support for LCD screens (up to 2K by 2K pixels). This module provides many signals to the TDA19988 HDMI chip that generates the BeagleBone's HDMI output.
EMAC (Ethernet Media Access Controller, chapter 14) is the Ethernet subsystem. RMII (reduced media independent interface) is the Ethernet interface used by the BeagleBone. MDIO (management data I/O) provides control signals for the Ethernet interface. GMII and RGMII are similar for gigabit Ethernet (unused on the BeagleBone).
The PWMSS (Pulse-width modulation subsystem, chapter 15) contains three modules. The eHRPWM (enhanced high resolution pulse width modulator) generates PWM signals, digital pulse trains with selectable duty cycle. These are useful for LED brightness or motor speed, for instance. These are sometimes called analog outputs, although technically they are not. This subsystem also includes the eCAP (enhanced capture) module which measures the time of incoming pulses. eQEP (Enhanced quadrature encoder pulse) module is a surprisingly complex module to process optical shaft encoder inputs (e.g. an optical encoder disk in a mouse) to determine its rotation.
MMC (multimedia card, chapter 18) provides the SD card interface on the BeagleBone.
UART (Universal Asynhronous Receiver/Transmitter, chapter 19) handles serial communication.
I2C (chapter 21) is a serial bus used for communication with devices that support this protocol.
The McASP (Multichannel audio serial port, chapter 22) provides digital audio input and output.
CAN (controller area network, chapter 23) is a bus used for communication on vehicles.
McSPI (Multichannel serial port interface, chapter 24) is used for serial communication with devices supporting the SPI protocol.

[2] The following C++ code uses the file system to switch GPIO 49 on and off. Remove the usleeps for maximum speed. Note: this code omits pin initialization; you must manually do "echo 49 > /sys/class/gpio/export" and "echo out > /sys/class/gpio/gpio49/direction".

#include <unistd.h>
#include <fcntl.h>

int main() {
  int fd =  open("/sys/class/gpio/gpio49/value", O_WRONLY);
  while (1) {
    write(fd, "1", 1);
    usleep(100000);
    write(fd, "0", 1);
    usleep(100000);
  }
}

[3] Here's a program based on devmem2 to toggle the GPIO pin by accessing the register directly. The usleeps are delays to make the flashing visible; remove them for maximum speed. For simplicity, this program does not set the pin directions; you must do that manually.

#include <fcntl.h>
#include <sys/mman.h>

int main() {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    void *map_base = mmap(0, 4096, PROT_READ | PROT_WRITE, MAP_SHARED,
            fd, 0x4804C000);
    while (1) {
        *(unsigned long *)(map_base + 0x194) = 0x00020000;
        usleep(100000);
        *(unsigned long *)(map_base + 0x190) = 0x00020000;
        usleep(100000);
    }
}

For more on this approach, see BeagleBone Black GPIO through /dev/mem.

[4] Selecting one of the eight functions for each pin is done through the pin multiplexer (pinmux) which uses the pinctrl-single device driver. Pin usage is defined in the device tree. The details of this are complex and change from kernel to kernel. See GPIOs on the Beaglebone Black using the Device Tree Overlays or BeagleBone and the 3.8 Kernel for details.

[5] The function names tend to change in different kernel versions. I'm describing the Linux 3.8 kernel code.

[6] I've discussed just the GPIO pins, but other pins (LED, PWM, timer, etc) have similar file system entries, device drivers, and device tree entries.