Ken Shirriff's blog

TV-B-Gone for the Arduino

I've ported the TV-B-Gone code to run on the Arduino board. If you haven't seen a TV-B-Gone, it's a cute gadget that you point at a TV that's bothering you, and it turns the TV off. Internally, it's an infrared remote that broadcasts more than 100 different off codes that work on almost any TV. I figured it would be interesting to get the TV-B-Gone running on the Arduino.
TV-B-Gone running on an Arduinio

Update (November 2010)

I have a new, improved version of the software. Details and download are here.

The software

I started with the TV-B-Gone firmware, which is available under Creative Commons license. The TV-B-Gone runs on an ATtiny85 microcontroller, not the ATmega328 used by the Arduino, so some porting was required. One tricky part of the port is the IR signals are generated using the low-level hardware timer for PWM (pulse-width modulation). Since the ATtiny85 runs at 8MHz, and the Arduino runs at 16MHz, the timing code needed to be re-written, both the 10us delay loop and the timer register operations. The quick summary is I'm using Timer 2, scaling the clock by 8, using OCR2A to control the frequency and OCR2B to control the duty cycle, and have output on pin 3 (OC2B). (You are not expected to understand this. See my article on Secrets of Arduino PWM for details of how these timers work.)

#define freq_to_timerval(x) (F_CPU / 8 / x - 1)
...
    pinMode(IRLED, OUTPUT);
    TCCR2A = _BV(COM2A0) | _BV(COM2B1) | _BV(WGM21) | _BV(WGM20);
    TCCR2B = _BV(WGM22) | _BV(CS21);
...
    OCR2A = freq; 
    OCR2B = freq / 3; // 33% duty cycle

Another tricky thing is the original firmware stores the IR codes in program space, not RAM. The microcontrollers use a Harvard architecture, which means they have separate memory and data paths for code and data, unlike normal processors that use a von Neumann architecture. Because RAM storage is so small on these microcontrollers, the IR codes are stored in program space, with the result you need to use special methods to access them, rather than normal C pointers. Program memory storage is indicated with the PROGMEM macro and pulled out of memory using special functions such as pgm_read_byte. This makes the code somewhat confusing, for example:

const struct IrCode *NApowerCodes[] PROGMEM = {
  &code_na000Code,
...
}

data_ptr = (PGM_P)pgm_read_word(NApowerCodes+i);

The original firmware stores the codes as compressed indices into tables of durations. See the TV-B-Gone design for details. Unfortunately, the Arduino's compiler didn't like the zero-length array in the IrCode structure, so I needed to rewrite the long file of codes to include another level of indirection.

Apart from these factors, porting was straightforward, and I tried to keep the original code where possible. I ripped out all the power-up and watchdog code. I replaced the low-level serial code with the Arduino's Serial library. Finally, I packaged it up into an Arduino sketch, which you can download. (Update: get the improved version here.)

The hardware

The hardware is straightfoward. I use pin 13 for the status LED; some Arduinos conveniently have an LED already wired up. Pin 3 is the PWM output to the IR LED through a 100 ohm resistor. If you want more range, add a driver transistor and use multiple IR LEDs. Pin 0 (update: pin 5) selects North American or European codes; leave it unconnected for North America and ground it for European. Pin 1 (update: 2) is connected via a pushbutton to ground to trigger the code transmission. (Update: pins have changed in the new version.)

To use the Arduino TV-B-Gone, point the IR LED at your TV, push the button, and your TV should turn off when the circuit hits the right code, which could take up to a minute. The visible LED should flash for each code that is transmitted. If the circuit doesn't work, use a cellphone camera to verify that the IR LED is transmitting. Don't expect more than a few feet range unless you use a transistor to increase the power. The code will print debugging output to the serial port if you set DEBUG in main.h.

Summary

Is the Arduino TV-B-Gone practical? If you want a TV-B-Gone that you can carry around to amaze your friends, the kit or product is much more compact, has high-powered IR outputs for longer range, and is nicely packaged. However, the Arduino platform is convenient for experimenting, and many people already have it, so I expect people will find it interesting.

You might wonder why I'm not using my Arduino Infrared Library for this project. Actually, I plan to at some point, but I wanted to get the straightforward port working first. In addition, my library will need some extensions to support all the codes that the TV-B-Gone supports.

I hope you enjoy the Arduino TV-B-Gone, and remember to use it for good, not evil!

Controlling your stereo over the web with the Arduino infrared library

Here's how you can control your stereo over the web. Not only that, but any other electronics with a IR remote control can now be controlled through your browser. You can even use your smart phone's browser to control your stereo from across the room or around the world.
Remote control on G1 phone

This project has several pieces: a simple Python web server running on your computer, an Arduino with my infrared library, and an IR LED or other emitter. The Python web server provides a web page with a graphical remote control. Clicking on the web page sends a code to the web server, which sends it over the serial port to the Arduino, which sends it to the IR LED, which controls your device.

The web server

I used a rather trivial Python web server (based on SimpleHTTPServer) that performs two tasks. First, it provides the static HTML pages and images. Second, it receives the POST requests and sends them to the Arduino using the pyserial library.

The following code excerpt shows the handler that processes POSTs to /arduino by extracting the code value out of the POST data and sending it to the Arduino over the serial line. The Python server automatically provides static pages out of the current directory by default; this is how the HTML files and images are served. The code assumes the serial port is /dev/ttyUSB0, which is typically the case on Linux.

class MyHandler(SimpleHTTPRequestHandler):
  def do_POST(self):
    if self.path == '/arduino':
      form = cgi.FieldStorage(fp=self.rfile, headers=self.headers,
        environ={'REQUEST_METHOD':'POST'})
      code = form['code'].value
      arduino.write(code)
      self.send_response(200)
      self.send_header('Content-type', 'text/html')
      return
    return self.do_GET()

arduino = serial.Serial('/dev/ttyUSB0', 9600, timeout=2)
server = HTTPServer(('', 8080), MyHandler).serve_forever()

The server can be accessed locally at http://localhost:8080. You may need to mess around with your firewall and router to access it externally; this is left as an exercise for the reader. I found that using my cell phone's browser via Wi-Fi worked surprisingly well, and over the cellular network there was just a slight lag (maybe 1/3 second).

The Arduino code

The code on the Arduino is pretty simple. It reads a command from the serial port and makes the appropriate IR library call. Commands consist of a character indicating the type of code, followed by 8 hex characters. For instance, "S0000004d1" sends the code 4d1 using Sony protocol; this is "play" on my Sony CD player. "N010e03fc" sends 010e03fc using NEC protocol; this turns my Harman Kardon stereo on. The full code is here, but some highlights:

void processSerialCode() {
  if (Serial.available() < 9) return;
  char type = Serial.read();
  unsigned long code = 0;
  // Read 8 hex characters into code (omitted)
  if (type == 'N') {
    irsend.sendNEC(code, 32);
  } 
  else if (type == 'S') {
    // Send Sony code 3 times
    irsend.sendSony(code, 12);
    delay(50);
    irsend.sendSony(code, 12);
    delay(50);
    irsend.sendSony(code, 12);
  }
  // More code for RC5 and RC6
}

In more detail, the Arduino waits for 9 characters to be available on the serial port. It then parses the hex value and calls the appropriate IR library send routine. The Arduino code does some special-case stuff for the different code types. Sony codes are transmitted three times as the protocol requires. The RC5 and RC6 protocol uses a toggle bit that is flipped on each transmission. (Disclaimer: I don't have RC5/RC6 devices, so this code is untested.) If your device uses a different protocol that the library doesn't support, you're out of luck unless you add the protocol to the library. That's probably not too hard; a couple people have already implemented new protocols.
Arduino controlling stereo via IR

The web page

Most of the smarts of the system are in the web page. In my setup, I have a HK-3370 stereo, and a Sony CDP-CE335 CD player. I took a picture of the remote for each and used an HTML image map to make each button clickable. Clicking on a button uses Ajax to POST the appropriate IR code to the server.

I use Ajax to send the code to the server to avoid reloading the web page on every click. The Javascript code is verbose but straightforward. The first part of the code creates the XML request object; unfortunately different browsers use different objects. The next part of the code creates and sends the POST request. The actual data sent to the server is, for instance, "code=N12345678" to send 0x12345678 using NEC protocol.

function button(value) {
  if (window.XMLHttpRequest) {
   request = new XMLHttpRequest();
  } else if (window.ActiveXObject) {
    try {
      request = new ActiveXObject("Msxml2.XMLHTTP");
    } catch (e) {
      try {
        request = new ActiveXObject("Microsoft.XMLHTTP");
      } catch (e) {}
    }
  }
  request.open('POST', '/arduino', true);
  request.setRequestHeader('Content-type', 'application/x-www-form-urlencoded');
  request.setRequestHeader('Content-length', value.length);
  request.setRequestHeader('Connection', 'close');
  request.send('code=' + value);
}

The clickable buttons are defined in the HTML through an image and an image map. Generating the image and the image map is the hard part of the whole project:

<img src="hk3370.png" width="268" height="800" border="0" usemap="#map" />
<map name="map">
<area shape="rect" coords="60,97,85,111" href="#" alt="on" onClick="button('N010e03fc')" />
<area shape="rect" coords="104,98,129,109" href="#" alt="off" onClick="button('N010ef906')" />
...

Each line in the map defines a region of the image as a button with a particular code. When clicked, this region will call the button function with the appropriate code, causing the code to be sent to the web server, the Arduino, and finally to the stereo. The alt text isn't actually used, but I recommend it to keep track of what button is associated with each line. The href causes the cursor to change when you go over the region, but isn't necessary.
Image of remote control in Internet Explorer

Image of remote control in Internet Explorer

In this approach, the Arduino code and web server code are very simple, as they don't need to know the functions of the codes. A different way to implement this system would be to put the table of codes ("SONY_CD_ON" = 4d1, etc.) either in the web server code or the Arduino code.

To generate the web pages, I took a picture of the remote with my camera and cleaned up the picture with GIMP. I used the GIMP image map plugin to create the image map. I outlined each button, then filled in the URL with the appropriate IR code, and filled in the alt text with the name of the button. Finally, I copied the map file into the HTML file and edited it to use the Javascript function.

The easiest way to obtain the IR codes is to use the IRrecvDump example sketch included in my IR library. Simply press the button on your remote, see what code was sent, and put that code into the image map. Alternatively, you may be able to find codes in the LIRC database.

Once you get going, it's actually fairly quick to generate the image map. Select a button in the editor, click the physical button on the remote, copy the displayed value into the editor, and move on to the next button. As long as you don't get obsessive and start tweaking the regions to line up perfectly, it's pretty quick.

If you don't want to mess around with the image map, take a look at simple.html. This file shows how to use standard HTML buttons. Not as cool as the image, but much easier:

<input type="button" value="on" onClick="button('N010e03fc')" >
<input type="button" value="off" onClick="button('N010ef906')" >
...

The hardware

The Arduino does the work of converting the hex code into an IR signal. The IR library uses digital PWM pin 3 as output, which must be connected to your IR emitter. I use a 100 ohm resistor to limit current. Wiring it up is trivial.
Schematic of IR connection to Arduino

I used a Tivo IR blaster that I had lying around, but a plain IR LED will work too. The following picture shows an IR blaster attached to my sterero. Note that the blaster needs to be positioned about the stereo's IR receiver; it may be easier to see the receiver with a flashlight.
IR blaster attached to stereo

Putting it all together

To summarize the steps:

Download the files and unpack them.
Get the infrared library and install it.
Aim the IR source at the device(s) to control, and connect it to the Arduino. (You may want to use a visible LED for testing.)
Connect the Arduino to the computer's USB port.
Compile and install IRwebremote.pde using the Arduino IDE.
Create HTML files for your remote controls. (This is the hard step.)
Run the Python server: python server.py
Go to http://localhost:8080 and control the devices.

This project has multiple pieces and requires customization, so it's not a plug-and-play project. I only recommend trying this if you know what you're doing. Good luck!

Lorem Ipsue: when internationalization goes bad

I recently saw a Master cable lock for sale with the interesting text "Lore Ipsum!" and "Lorem Ipsue!" underneath the word "Pull". If you've done any graphic design or web mockups, you're probably familiar with the Lorem ipsum text that's traditionally used as a placeholder. This Latin-based text is used, for instance, when you want to show the style and layout of a website, but don't want people to get distracted by the words.
Lock with text 'Lore ipsum' and 'Lorem ipsue'

Lock with text 'Lore ipsum' and 'Lorem ipsue'

Apparently when they designed the lock packaging, they put in placeholders but forgot to replace them with the French and Spanish translations ("Tirez!" and "¡Tira!"). I find "Lorem Ipsue!" in place of "Lorem Ipsum" interesting; maybe it is supposed to be a more Spanish-sounding placeholder, but I would have been more impressed by ¡Lorem Ipsum!

The moral is: make sure you check your internationalization; just because it looks foreign doesn't mean it's right.

Notes on dual-booting Windows 7 and Linux

Dual-booting Windows 7 and Linux is mostly straightforward, but I ran into a few problems. The following notes describe my setup, mostly for my own benefit.

If you're looking for a comprehensive dual-boot tutorial, try apc or Loko.

My setup

I'm running Windows 7 and Fedora 11 Linux on a Gateway SX2800.

Partitioning

I use the Gnome Partition Editior (GParted). Windows 7 conveniently includes a partition software under Disk Management, but inconveniently it can't shrink a volume by a large amount.

My partitioning setup is a bit crazy. Because I bought my computer before Windows 7 was released, I used Windows 7 beta until I got the official upgrade DVD. One unpleasant surprise is you have to reinstall to get from the beta to Windows 7 Home Premium; you can't just upgrade. I had an empty E: partition, so I installed the offical release there, transferred my files with Windows Easy Transfer, and then spent an evening reinstalling all my software. I store most of my data on a FAT32 partition D:to avoid messing around with NTFS on Fedora.

You'll probably want something simpler than what I have, but my partition table is:

$fdisk /dev/sda
...
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1        1698    13631488   27  Unknown
/dev/sda2   *        1699       14445   102390277+   7  HPFS/NTFS
/dev/sda3           14446       27193   102398310    7  HPFS/NTFS
/dev/sda4           27194       77825   406701540    5  Extended
/dev/sda5           27194       28289     8803588+  82  Linux swap / Solaris
/dev/sda6           28290       70176   336457296    b  W95 FAT32
/dev/sda7           70177       77825    61440561   83  Linux

sda1 is a mysterious Gateway recovery partition, sda2 is C:, sda3 is E:, sda6 is D: for most of my data, and sda7 is the Linux partition.

`/etc/fstab`

/dev/sda7               /                       ext3    defaults        1 1
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
/dev/sda5  swap                    swap    defaults        0 0
/dev/sda6  /d   vfat auto,rw,user,sync,exec,dev,suid,uid=500,gid=500,umask=000 0 2

My GRUB config

default=1
timeout=5
splashimage=(hd0,2)/boot/grub/splash.xpm.gz
hiddenmenu
title Fedora (2.6.30.9-96.fc11.i586)
 root (hd0,6)
 kernel /boot/vmlinuz-2.6.30.9-96.fc11.i586 ro root=/dev/sda7 rhgb quiet
 initrd /boot/initrd-2.6.30.9-96.fc11.i586.img
title Windows 7
 rootnoverify (hd0,1)
 chainloader +1

Note the off-by-one: (hd0,6) boots off sda7. For Windows, you need to specify the partition with the Windows boot loader; if I try to boot E: directly with (hd0,2), I get BOOTMGR IS MISSING.

Editing the GRUB config from Linux rescue boot

Invariably I end up unable to boot Linux from my disk. To fix this, I boot from my Linux CD and then fix the GRUB config. The following assumes sda7 is the Linux partition:

mount /mnt /dev/sda7
cd /mnt/boot/grub
vi menu.lst
grub-install --root-directory=/mnt /dev/sda

Editing the Windows boot loader configuration

If you have multiple versions of Windows, you'll need to configure the Windows boot loader through bcdedit: details.

Other random errors

My Fedora Core boot failed with an infinite loop of "init: ttyx respawning too fast, stopped". The solution was to change selinux=enforcing to selinux=permissive in /etc/selinux/config.

Conclusion

The above notes are mostly for my own benefit the next time I need to configure GRUB, but maybe they'll help someone else. Disclaimer: it's very easy to lose data or mess up your system so it won't boot. Back everything up and make sure you have Linux and Windows rescue disks. Don't blame me if things go wrong.

IR Bubbles: Controlling a relay with an Arduino and an IR remote

An Arduino can be used to turn devices on and off via an infrared remote. As an entirely impractical example of how to use my Arduino infrared remote library to turn things on and off, I have set up my Arduino to turn a bubble maker on and off under remote control. The same techniques can be used to control an LED or anything else via IR remote.

Arduino blowing bubbles

The circuit is straightforward. An IR sensor is connected to the Arduino to detect the infrared signal. On the output side, I used a standard relay circuit. To control the bubble maker, I wired the relay in place of the power switch. A 5 volt source wasn't enough to close the relay I had handy, but 9 volts was. For this reason, I powered the Arduino off a 9V battery and connected the relay to Vin so the relay would get the full 9 volts. (A 9 volt AC adapter also works.)

Arduino IR relay schematic

The code

The following code controls the relay. Any detected IR code will toggle the output on or off. Press a button once on the remote and the output will turn on. Press a button again and the output will turn off. (You will, of course, need to download my Arduino infrared remote library to use this code.)

#include <IRremote.h>

int RECV_PIN = 11;
int OUTPUT_PIN = 4;

IRrecv irrecv(RECV_PIN);
decode_results results;

void setup()
{
  pinMode(OUTPUT_PIN, OUTPUT);
  pinMode(13, OUTPUT);
  irrecv.enableIRIn(); // Start the receiver
}

int on = 0;
unsigned long last = millis();

void loop() {
  if (irrecv.decode(&results)) {
    // If it's been at least 1/4 second since the last
    // IR received, toggle the relay
    if (millis() - last > 250) {
      on = !on;
      digitalWrite(OUTPUT_PIN, on ? HIGH : LOW);
    }
    last = millis();      
    irrecv.resume(); // Receive the next value
  }
}

The above code only toggles the output if there is at least a 1/4 second (250 ms) gap since the last received signal. The motivation behind this is that remotes often repeat the signal as long as the button is held down, and we don't want the output to toggle continuously. One other thing to notice is the call to resume(); after a code is received, the library stops detection until resume() is called.

I should make it clear that although the above example uses a relay output, the relay is just for example. The above code will work with any arbitrary output: you can use it to control a LED or anything else that can be controlled by an Arduino output pin.

Using two buttons

Another way you might want to use an IR remote is have one button turn the output on, and another button turn it off. Here's the loop code to use a Sony DVD remote; the "play" button turns the output on, and the "stop" button turns the output off:

void loop() {
  if (irrecv.decode(&results)) {
    if (results.value == 0x4cb92) { // Sony DVD play
      digitalWrite(OUTPUT_PIN, HIGH);
    } 
    else if (results.value == 0x1cb92) { // Sony DVD stop
      digitalWrite(OUTPUT_PIN, LOW);
    }   
    irrecv.resume(); // Receive the next value
  }
}

Just remember that you have to call resume after all decodes, not just the ones you're interested in.

Getting the codes

You may wonder how I got the Sony codes. One way is to look at the appropriate LIRC config file. (At some point I'll write up a full explanation of the LIRC config files.) An easier way is to just dump out to the serial port the codes that your Arduino receives. Press the buttons on your remote, and read out the values received:

void loop() {
  if (irrecv.decode(&results)) {
    Serial.println(results.value, HEX);
    irrecv.resume(); // Receive the next value
  }
}

Using a specific button

A third way you might want to use an IR remote is to have a specific button toggle the output on and off. Here's code that uses the Tivo button on a Tivo Series 2 remote to toggle the output on and off. Again, you can get the button code from a LIRC config file, or by dumping the output to your serial port.

void loop() {
  if (irrecv.decode(&results)) {
    if (results.value == 0xa10ce00f) { // TIVO button
      // If it's been at least 1/4 second since the last
      // IR received, toggle the relay
      if (millis() - last > 250) {
        on = !on;
        digitalWrite(OUTPUT_PIN, on ? HIGH : LOW);
      }
      last = millis();
    }    
    irrecv.resume(); // Receive the next value
  }
}

The hardware setup

The hardware setup is pretty simple. The above photo shows the IR detector on the left, and the relay on the right. The drive transistor, resistor, and diode are behind the relay, and the blue terminal strip is in front. The black box to the right of the Arduino is a 9V battery holder. The box underneath is just to hold everything up.

You may wonder why I placed the input and output components so far apart, and why I didn't use the protoboard power and ground buses. It turns out that the motor on the bubble maker is a nasty source of RFI interference that would jam the IR detector's signal if the wires to the bubble maker came near the IR detector wiring. Capacitors didn't help, but distance did. Most likely this won't be a problem for you, but I figured I should mention it.

In conclusion, happy hacking with the infrared library. Let me know if you do anything interesting with it!

Simple genome analysis with Arc: In which I examine the XMRV genome and discover little of interest

I decided to examine the genome of the retrovirus XMRV to see what I could learn. This virus, with the absurd name Xenotropic Murine leukemia virus-Related Virus, was found a few years ago in many cases of prostate cancer and very recently found in many people suffering from chronic fatigue syndrome (more, more). If these studies turn out to be true, it would be quite remarkable that an obscure new retrovirus causes such differing diseases.

My goal was to take the XMRV genome (the sequence of c's, g's, a's, and t's that make up its RNA), and see if the combination "cg" is a lot more rare than you'd expect. This probably seems like a pretty random thing to do, but there's actually a biological reason.

One way the human immune system detects invaders is by looking for DNA containing a cytosine followed by a guanine (which is called a CpG dinucleotide). In mammals, most CpG dinucleotides are methylated (specially tagged), so any CpG that shows up is a sign of something invading. My hypothesis was that XMRV might have a very low number of CpG dinucleotides, helping it avoid the immune system and contributing to its ability to cause disease.

Conveniently, the genome of XMRV has been sequenced and can be downloaded, consisting of 8185 bases:

        1 gcgccagtca tccgatagac tgagtcgccc gggtacccgt gttcccaata aagccttttg
       61 ctgtttgcat ccgaagcgtg gcctcgctgt tccttgggag ggtctcctca gagtgattga
...

It would be a simple matter to use Python to count the number of CG pairs in the genome, but I figured using the Arc language would be more interesting.

The first step is to write a function to parse out the DNA sequence from the genome file; this consists of the lines between ORIGIN and //, with the spaces and numbers removed.

(def skip-to-origin (infile)
  (let line (readline infile)
    (if (is 'eof line) line ;quit on eof
        (litmatch "ORIGIN" line) line ;quit on ORIGIN
        (skip-to-origin infile)))) ; recurse

; Reads the data from one line
; Returns nil on EOF or // line
(def read-data-line (infile)
  (let line (readline infile)
    (if (is 'eof line) nil
        (litmatch "//" line) nil
 (keep [in _ #\c #\g #\a #\t] line))))

; Read the sequence data from ORIGIN to //
; Returns a single string
(def readseq (infile)
   (skip-to-origin infile)
   (string (drain (read-data-line infile))))

This code simply uses skip-to-origin to read up to the ORIGIN line, and then read-data-line to read the cgat data from each following line.

The next step is a histogram function to count the number of times each nucleotide occurs into a table. (Edit: I've shortened my original code with some advice from arclanguage.org.)

((def hist (seq)
  (counts (coerce seq 'cons)))

After downloading the genome as "xmrv", I can load the sequence into seq and generate the histogram:

arc> (= seq (w/infile inf "xmrv" (readseq inf)))
"gcgccagtcatccgatagactgagtcgcccgggtacccgtgt ...etc"
arc> (len seq)
8185
arc>(hist seq)
#hash((#\t . 1732) (#\g . 2057) (#\a . 2078) (#\c . 2318))

The sequence is 8185 nucleotides long as expected, with 1732 T's, 2057 G's, 2078 A's, and 2318 C's. To count the number of times each pair of nucleotides occurs, I made a more general function that will handle pairs or any other sequence of n nucleotides:

(def histn (seq n)
  (w/table h
    (for i 0 (- (len seq) n)
      (++ (h (cut seq i (+ i n)) 0)))
    h))

Since I got tired of looking at raw hash tables, I made a short function to format the output as well as generating percentages.

(def prettyhist (h)
  (let count (reduce + (vals h))
    (let sorted (sort (fn ((k1 v1) (k2 v2)) (> v1 v2))
                      (accum addit (each elt h (addit elt))))
      (each (k v) sorted
        (prn k ": " v " (" (num (* (/ v count) 100.) 2) "%)")))))

Running these functions:

arc> (prettyhist (histn seq 2))
cc: 862 (10.53%)
gg: 639 (7.81%)
ag: 616 (7.53%)
ct: 612 (7.48%)
aa: 588 (7.18%)
ga: 585 (7.15%)
ca: 537 (6.56%)
ac: 529 (6.46%)
tg: 494 (6.04%)
tc: 469 (5.73%)
gc: 458 (5.6%)
tt: 401 (4.9%)
gt: 375 (4.58%)
ta: 368 (4.5%)
at: 344 (4.2%)
cg: 307 (3.75%)

This shows that CG is the most uncommon combination, but not super-low. How does this compare to the frequence if the C/G/A/T frequency was the same, but they were ordered randomly? We can compute this by computing all the possibilities of taking two letters from the original sequence. Instead of actually summing all the combinations, we can just take the cross-product of the original histogram:

(def cross (h1 h2)
  (let h (table)
    (each (k1 v1) h1
      (each (k2 v2) h2
        (let combo (string k1 k2)
   (if (no (h combo))
     (= (h combo) 0))
   (= (h combo) (+ (h combo) (* v1 v2))))))
    h))

This gives us the result:

arc> (prettyhist (cross (hist seq) (hist seq)))
cc: 5373124 (8.02%)
ac: 4816804 (7.19%)
ca: 4816804 (7.19%)
cg: 4768126 (7.12%)
gc: 4768126 (7.12%)
aa: 4318084 (6.45%)
ag: 4274446 (6.38%)
ga: 4274446 (6.38%)
gg: 4231249 (6.32%)
tc: 4014776 (5.99%)
ct: 4014776 (5.99%)
at: 3599096 (5.37%)
ta: 3599096 (5.37%)
gt: 3562724 (5.32%)
tg: 3562724 (5.32%)
tt: 2999824 (4.48%)

This tells us that CG would appear 7.12% of the time if the sequence were randomly shuffled, but instead it appears only 3.75% of the time, so CG shows up about half as often as expected. This is in line with known results for the related MuLV virus, so it seems that there's nothing special about XMRV.

On the other hand, it's known that the HIV genome has a severely low amount of CpG:

arc> (prettyhist (histn (w/infile inf "hivbru" (readseq inf)) 2))
aa: 1096 (11.88%)
ag: 971 (10.52%)
ca: 766 (8.3%)
ga: 762 (8.26%)
at: 691 (7.49%)
ta: 665 (7.21%)
gg: 631 (6.84%)
tg: 548 (5.94%)
ac: 530 (5.74%)
tt: 524 (5.68%)
gc: 430 (4.66%)
ct: 428 (4.64%)
gt: 409 (4.43%)
cc: 381 (4.13%)
tc: 315 (3.41%)
cg: 81 (.88%)

I also took a look at the H1N1 flu sequences. The influenza genome is on 8 separate strands of RNA, so there are 8 separate sequences to process. (The HA segment is the "H" and the NA segment is the "N" in H1N1.) Many H1N1 sequences can be downloaded. I somewhat arbitrarily picked A/Beijing/02/2009(H1N1) since all 8 strands were sequenced. After saving the strands as flu1, ..., flu8, I ran the histogram on all the strands:

arc> (for i 1 8 (prn "---" i "---")
  (prettyhist (histn (w/infile inf (string "flu" i) (readseq inf)) 2)))

On all strands, "cg" was at the bottom of the frequency chart. Especially low-frequency strands are 1 (PB2) at 1.93% CpG, 2 (PB1) at 1.52%, 4 (HA) at 1.57%, and 6 (NA) at 1.77%. Strand 8 (NEP) was highest at 3.09%. So it looks like influenza is pretty low in CpG frequency, but not as low as HIV. (Edit: I've since found a paper that examines CpG in influenza in more detail.)

To conclude, Arc can be used for simple genome analysis. My original hypothesis that XMRV would have low levels of CpG dinucleotides holds, but not as dramatically as for HIV or H1N1 influenza. Apologies to biologists for my oversimplifications, and apologies to statisticians for my lack of p-values :-)

An Arduino universal remote: record and playback IR signals

I've implemented a simple IR universal remote that will record an IR code and retransmit it on demand as an example for my IR library. Handling IR codes is a bit more complex than it might seem, as many protocols require more than simply recording and playing back the signal.

To use the universal remote, simply point your remote control at the IR module and press a button on the remote control. Then press the Arduino button whenever you want to retransmit the code. My example only supports a single code at a time, but can be easily extended to support multiple codes.

The hardware

The above picture shows the 9V battery pack, the Arduino board, and the proto board with (top-to-bottom) the IR LED, IR receiver, and pushbutton.

The circuitry is simple: an IR sensor module is connected to pin 11 to record the code, an IR LED is connected to pin 3 to transmit the code, and a control button is connected to pin 12. (My IR library article has details on the sensor and LED if you need them.)
Schematic of the IR recorder

The software

The code can be downloaded as part of my IRremote library download; it is the IRrecord example.

Handling different protocols

The code supports multiple IR protocols, each with its own slight complications:

Sony codes can be recorded and played back directly. The button must be held down long enough to transmit a couple times, as Sony devices typically require more than one transmission.

The common NEC protocol is complicated by its "repeat code". If you hold down a button, the remote transmits the code once followed by multiple transmissions of special repeat code. The universal remote records the code, not the repeat code. On playback, it transmits the code once, followed by the repeat code.

The RC5 and RC6 protocols handle repeated transmissions differently. They use two separate codes for each function, differing in a "toggle bit". The first time you hold down a button, the first code is transmitted repeatedly. The next time you hold down a button, the second code is transmitted repeatedly. Subsequent presses continue to alternate. The universal remote code flips the toggle bit each time it transmits.

The universal remote handles any other unknown protocol as a "raw" sequence of modulated IR on and off. The main complication is that IR sensor modules typically stretch out the length of the "on time" by ~100us, and shorten the "off time" correspondingly. The code compensates for this.

Most likely there are some codes that that can't handle, but it has worked with the remotes I've tried.

The code also prints debugging information to the serial console, which can be helpful for debugging any problems.

In conclusion, this is intended as a proof of concept rather than a useful product. The main limitation is supporting one code at a time, but it's straightforward to extend the code. Also note that the record and playback functions can be separated; if you know the IR codes you're dealing with, you can use just the necessary function.