As soon as we finished repairing a printer failure at the Computer History Museum, Murphy's law struck and the card reader started malfunctioning. The printer and card reader were attached to an IBM 1401, a business computer that was announced in 1959 and went on to become the best-selling computer of the mid-1960s. In that era, data records were punched onto 80-column punch cards and then loaded into the computer by the card reader, which read cards at the remarkable speed of 13 cards per second. This blog post describes how we debugged the card reader problem, eventually tracking down and replacing a faulty electromechanical relay inside the card reader.
The card reader malfunction started happening every time the "Non-Process Run-Out" mechanism (NPRO) was used. During normal use, if the card reader stopped in the middle of processing, unread cards could remain inside the reader. To remove them, the operator would press the "Non-Process Run-Out" switch, which would run the remaining cards through the reader without processing them. Normally, the "Reader Stop" light (below) would illuminate after performing an NPRO. The problem was the "Reader Check" light also came on, indicating an error in the card reader. Since there was no actual error and the light could be cleared simply by pressing the "Check Reset" button, this problem wasn't serious, but we still wanted to fix it.
To track down the problem, the 1401 restoration team started by probing the card reader error circuitry inside the computer with an oscilloscope. (The card reader itself is essentially electromechanical; the logic circuitry is all inside the computer.) The photo below shows the 1401 computer with a swing-out gate opened to access the circuitry. Finding a circuit inside the IBM 1401 is made possible by the binders full of documentation showing the location and wiring of all the computers circuitry. This documentation was computer generated (originally by a vacuum tube IBM 704 or 705 mainframe), so the diagrams were called Automated Logic Diagrams (ALD).
The reader check error condition is stored in a latch circuit; the latch is set when an error signal comes in, and cleared when you press the reset button. To find this circuit we turned to the ALD page that documented the card reader's error checking circuitry. The diagram below is a small part of this ALD page, showing the read check latch of interest: "READ CHK LAT". Each basic circuit (such as a logic gate) is drawn on the ALD as a box, with lines showing how they are connected. Inside the box, cryptic text indicates what the circuit does and where it is inside the computer. For example, the latch (lower two boxes) is constructed from a circuit card of type CQZV (an inverter) and a CHWW card (a NAND gate). The text inside the box also specifies the location of each card (slots D21 and D24 in gate 01B4), allowing us to find the cards in the computer. Note that the RD REL CHK signal comes from ALD page 56.70.21.2; this will be important later.
The schematic below shows the latch redrawn with modern symbols. If the RESET line goes low, it will force the inverted output high. Otherwise, the output will cycle around through the gates, latching the value. If any OR input is high (indicating an error), it will force the latch low. Note the use of wired-OR—instead of using an OR gate, signals are simply wired together so if any signal is high it will pull the line high. Because transistors were expensive when the IBM 1401 was built, IBM used tricks like wired-OR to reduce the transistor count. Unfortunately, the wired-OR made it harder to determine which error input was triggering the latch because the signals were all tied together.
Once we located the circuit cards for the latch, we used an oscilloscope to verify that the latch itself was operating properly.
Next, we needed to determine why it was receiving an error input.
After disconnecting wires to get around the wired-OR, we found that the error was not coming from the Read/punch error or the Feed error signal.
The RD REL CHK was the obvious suspect; this signal was part of the optional Read Punch Release feature1.
However, the team insisted that Read Punch Release wasn't installed in our 1401.
The source of this signal was ALD 56.70.21.2 and
our documentation didn't include ALD section 56, confirming that this feature wasn't present in our system.
Additional oscilloscope tracing showed a lot of noise on some of the signals from the card reader. This wasn't unexpected since the card reader is built from electromechanical parts: relays, cam switches, brushes, motors, solenoids and other components that generate noise and voltage spikes. I considered the possibility that a noise spike was triggering the latch, but the noise wasn't reaching that circuit.
At this point, I was at a dead end, so I took another look at the RD REL CHK signal to see if maybe it did exist. The 1401's documentation includes "plug charts," diagrams that show what circuit card is plugged into each position in the computer. I looked at the plug chart for the card reader circuitry in gate 01B4 (swing-out gate, not logic gate). The plug chart (above) showed cards assigned to the mysterious ALD 56.70.21.2, such as the cards in slots A15-A17 and B15. (The plug chart also had numerous pencil updates, crossing out cards and adding new ones, which didn't give me a lot of confidence in its accuracy.) I looked inside the computer and found that these cards, the Read Punch Release cards generating RD REL CHK, were indeed installed in the computer. So somehow our computer did have this feature.
The problem was that even though these cards were present in the system, we didn't have the ALDs that included them, leaving us in the dark for debugging. I checked the second 1401 at the Computer History Museum; although it too had the cards for Read Punch Release, its documentation binders also mysteriously lacked the section 56 ALDs. Fortunately, back in 2006, the Australian Computer Museum Society sent us scans of the ALDs for their 1401 computer. I took a look and found that the Australian scans included the mysterious section 56. There was no guarantee that their 1401 had the same wiring as ours (since the design changed over time), but this was all I had to track down RD REL CHK.
According to the Australian ALD above, RD REL CHK was generated by the CGVV card in slot A17 (upper right box above). The oscilloscope trace below confirmed that this card was generating the RD REL CHK signal (yellow), but its input (RD BR IMP CB) (cyan) looked bad. Notice that the yellow line jumps up suddenly (as you'd expect from a logic signal), but the cyan line takes a long time to drop from the high level to the low level. Perhaps a weak transistor in a circuit was pulling the signal down slowly, or some other component had failed.
We looked into the RD BR IMP CB signal, short for "ReaD BRush IMPulse Circuit Breakers". In IBM terminology, a "circuit breaker" is a cam-operated switch, not a modern circuit breaker that trips when overloaded. The read brush circuit breakers generate timing pulses when the read brushes that detect holes in the punch card are aligned with a row of holes, telling the computer to read the hole pattern.
We looked up yet another ALD page to find the source of the strangely slow RD BR IMP CB signal. That signal originated in the card reader and then passed through an NGXX integrator card (above). Earlier I mentioned that the signals from the card reader were full of noise. This isn't a big problem inside the card reader since brief noise spikes won't affect relays. But once signals reach the computer, the noise must be eliminated. This is done in the 1401 by putting the signal through a resistor-capacitor low-pass filter, which IBM calls an "integrator". That card eliminates noise by making the signal change very slowly. In other words, although the signal on the oscilloscope looked strange, it was the expected behavior and not a problem. But why was there any signal there at all?
After some discussion, the team hypothesized that the pulses on RD BR IMP CB shouldn't be getting to the 1401 at all doing a Non-Process Run-Out, since cards aren't being read. The schematic4 for the card reader (above) shows the complex arrangement of cams and microswitches that generates the pulses. During an NPRO, relay #4 will be energized, opening the "READ STOP 4-4" relay contacts. This will stop the BRUSH IMP CB pulses from reaching the 1401.53 In other words, relay #4 should have blocked the pulses that we were seeing.
The card reader contains rows of relays; the reader's hardware is a generation older than the 1401 and it implements its basic control functions with relays rather than logic gates. Frank pulled out relay #4 and inspected it.6 The relay (below) has 6 sets of contacts, activated by an electromagnet coil (yellow) and held in position by a second coil. Springs help move the contacts to the correct positions. One of the springs appeared to be weak, preventing the relay from functioning properly.7 Frank put in a replacement relay and found that the card reader now performed Non-Process Run-Outs without any errors. We loaded a program from cards just to make sure the card reader still performed its main task, and that worked too. We had fixed the problem, just in time for lunch.
Conclusions
It is still a mystery why section 56 of the ALDs was missing from our documentation. As for the presence of the Read Punch Release feature on our 1401, that feature turns out to be standard on 1401 systems like the ones at the museum.8 I think the belief that our 1401 didn't include this feature resulted from confusion with the Punch Feed Read feature, which we don't have. (That feature allowed a card to be read and then punched with additional data as it passed through the card reader.)
The team that fixed this problem included Frank King, Alexey Toptygin, Ron Williams and Bill Flora. My previous blog post about fixing the 1402 card reader is here, tracking down an elusive problem with a misaligned cam.
I announce my latest blog posts on Twitter, so follow me at @kenshirriff for future articles. I also have an RSS feed. The Computer History Museum in Mountain View runs demonstrations of the IBM 1401 on Wednesdays and Saturdays so if you're in the area you should definitely check it out (schedule).
Notes and references
-
Read Punch Release was a feature to allow the CPU to operate while reading a card. IBM's large 7000 series mainframes used "data channels," which were high-performance I/O connections using DMA and controlled by separate I/O processors. With a data channel, the CPU could process data while the channel performed I/O. But the 1401 was a much simpler machine, designed to replace electromechanical accounting machines that would read a card, process the card, and print out results. On the 1401, when the CPU executed an instruction to read a card, the CPU would wait while the card moved through the card reader and passed under the brushes to be read. The mechanical cycle to read a card took 75 ms (corresponding to 800 cards per minute), of which only 10 ms was available for the CPU to perform computation and the rest was wasted (from the CPU's perspective). The Read Punch Release feature was a workaround for this. The programmer could issue an SRF (Start Read Feed) instruction, which would cause a card to start moving through the card reader. The program had 21 ms to perform computation and execute the read instruction before the card reached the reading station. (If the program executed the read instruction too late, the computer wouldn't be able to read the card and would halt with an error.) This provided extra computation time in each card read cycle. IBM charged a monthly fee for additional features; Read Punch Release was relatively inexpensive at $25 per month (equivalent to about $200 today). ↩
-
The Read Punch Release feature also provided a similar instruction for punching a card, allowing an extra 37 ms of computation while punching a card. See page 16 of the IBM 1402 Card Read-Punch Manual for details on card timing and the read punch release operation. ↩
-
The card reader schematic shows that six separate cams were required to generate the RD BR IMP CB signal. The problem is that cards are read at high speed, so rows on the card are read just 3.75 ms apart. Cams and microswitches are too slow to generate pulses at this rate. To get around this, pulses for odd rows and even rows are generated separately. In addition, one set of switches closes for the start of a pulse and a second set opens for the end of a pulse. Needless to say, it is a pain to adjust all these cams so the pulses have the right timing and duration. If this timing is off, cards won't read correctly.
To improve reliability and reduce maintenance, IBM eventually replaced these cams with a "solar cell" (i.e. a photo-cell), slotted disk, and light. The light passing through the slotted disk triggered a pulse from the photo-cell. Our ALDs had some penciled-in modifications suggesting that our 1401 was originally configured to work with a solar cell card reader and then modified to work with the older circuit breaker card reader. ↩
-
The schematic for the 1402 card reader is here. The read brush impulse CB signal is generated on pdf page 8. This document also includes instructions on how to upgrade from the "circuit breaker" circuit to the "solar cell" circuit, a change that is indicated as taking 1.0 to 1.5 hours for the hardware installation, 1.5 to 2.0 hours for miscellaneous electrical changes, and 2.3 to 3.7 hours to wire up the new circuit. (See pdf pages 47-53.) ↩
-
The relays involved in an NPRO operation are documented in 1402 Card Read-Punch Customer Engineering Manual of Instruction, page 4-3 or pdf page 31. ↩
-
The relay is a "permissive make" relay, a type of relay that IBM designed to be twice as fast as regular relays. For a detailed discussion of IBM's relays, see Commutation and Control. The permissive make relay is discussed on page 59 (pdf page 18). ↩
-
Stan Paddock on the 1401 team built a relay tester that we could use to check the bad relay. Unfortunately, the 1401 workshop at the Computer History Museum is closed due to construction so we couldn't access the tester (or the collection of spare relays in the workshop). Fortunately, we had a spare relay that wasn't in the workshop for some reason. ↩
-
The IBM Sales Manual lists the various 1401 features and their prices (pdf page 50). It states the Read Punch Release feature is standard on the 1401 Model C, the "729 Tape/Card System". ↩
Fascinating. I did a lot of 1401 progranning at the UCLA Computing Facility in Boelter Hall in the early 1960's.
ReplyDeletePrograms I remember:
A 1401 assembler that has 4 passes, wrote intermediate output to tape, and was many times faster than the IBM disk-based assembler. It ended up replacing the IBM assembler for most users because of its speed advantage.
An assembler for the SWAC computer (see https://en.wikipedia.org/wiki/SWAC_(computer)). For a year starting in August, 1950, this was the fastest computer in the world. It was used until 1967. A tricky thing about this assembler is that the SWAC read cards in row binary and the 1401 punched cards in column binary, and the 1401 was not easy to do bit-fiddling with.
A multiple-precision floating point package. This was made easier by the 1401's arbitrary-precision integer operations.
A typewriter simulator which read punched cards and printed them to the 1403 printer one character at a time. Not useful, but fun.
There were probably others, but over fifty years is a long time.
Definitely a fun machine.
The biggest loss when card readers became obsolete were the fun girls in the punchroom.
ReplyDelete