Sequence comparison

By comparing genetic sequences, we can tell how they are related. While there are fancy programs (such as BLAST and ALIGN) to make these comparisons, even a very simple comparison will reveal a lot.

An easy way to compare two sequences is to make a big grid with one DNA sequence along the X axis and the other along the Y axis. Wherever the corresponding DNA bases from the two sequences match, you put a dot. Now, if the sequences are nearly the same, there will be a diagonal line where they match up. The closer the match, the stronger the line.
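The grid described above can be sketched in a few lines of C. This is only an illustration, not the program used for the images on this page: the function name dot_plot and the row-major character grid are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill a grid with '.' wherever a base of x
 * (X axis) equals a base of y (Y axis), and ' ' everywhere else.
 * The grid is row-major with strlen(y) rows and strlen(x) columns,
 * so the caller must supply strlen(x) * strlen(y) chars.
 * Returns the number of dots placed.
 */
size_t dot_plot(const char *x, const char *y, char *grid)
{
    assert(x != NULL && y != NULL && grid != NULL);

    size_t cols = strlen(x);
    size_t rows = strlen(y);
    size_t dots = 0;

    for (size_t j = 0; j < rows; j++) {
        for (size_t i = 0; i < cols; i++) {
            int match = (x[i] == y[j]);
            grid[j * cols + i] = match ? '.' : ' ';
            dots += (size_t)match;
        }
    }
    return dots;
}
```

When a sequence is compared with itself, every cell with i equal to j gets a dot, which is the diagonal line described above.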

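There will also be random dots where they coincidentally match. If you mark every match, about 1 in 4 positions will be a random match, since there are only four bases. If you only count spots where, say, 5 in a row match, then the number of random matches will be much lower: about 1 in 1024 positions (1/4 multiplied by itself 5 times), assuming the four bases are equally likely.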
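The arithmetic behind the random matches can be worked out with a short C sketch. It assumes the four bases are equally likely and independent at every position, which real sequences only approximate; the function names are made up for this example.

```c
#include <assert.h>

/*
 * Chance that k bases in a row match purely by coincidence,
 * assuming each base is A, C, G or T with probability 1/4.
 */
double random_match_probability(int k)
{
    assert(k >= 0);

    double p = 1.0;
    for (int i = 0; i < k; i++)
        p *= 0.25;
    return p;
}

/*
 * Expected number of coincidental dots when comparing sequences of
 * length m and n with a run length of k: one candidate position for
 * every pair of starting points where a full run of k bases fits.
 */
double expected_random_dots(long m, long n, int k)
{
    assert(k >= 1 && m >= k && n >= k);

    return (double)(m - k + 1) * (double)(n - k + 1)
           * random_match_probability(k);
}
```

For two hypothetical sequences of 10,000 bases each, this gives 25 million random dots when every single match is marked, but only about 98,000 when 5 in a row must match.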

I've written a short C program to do this matching. It puts a dot if 5 bases in a row match. This page shows the results of various matchings.
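The original program is not reproduced here. As a rough sketch of the idea only, the following C function marks a dot wherever a run of bases matches; the name windowed_dot_plot, the window parameter, and the grid layout are assumptions for this example and may differ from the actual program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: put a '.' at (i, j) when the `window` bases
 * starting at x[i] all equal the `window` bases starting at y[j].
 * The grid is row-major with (strlen(y) - window + 1) rows and
 * (strlen(x) - window + 1) columns. Returns the number of dots,
 * or 0 (leaving the grid untouched) if either sequence is shorter
 * than the window.
 */
size_t windowed_dot_plot(const char *x, const char *y,
                         size_t window, char *grid)
{
    assert(x != NULL && y != NULL && grid != NULL && window > 0);

    size_t xlen = strlen(x);
    size_t ylen = strlen(y);
    if (xlen < window || ylen < window)
        return 0;

    size_t cols = xlen - window + 1;
    size_t rows = ylen - window + 1;
    size_t dots = 0;

    for (size_t j = 0; j < rows; j++) {
        for (size_t i = 0; i < cols; i++) {
            /* memcmp stays in bounds because i + window <= xlen
               and j + window <= ylen. */
            int match = (memcmp(x + i, y + j, window) == 0);
            grid[j * cols + i] = match ? '.' : ' ';
            dots += (size_t)match;
        }
    }
    return dots;
}
```

Calling it with a window of 5 corresponds to the 5-in-a-row rule; a window of 1 marks every single matching base.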

HIV-1 compared with itself. There is a strong diagonal line where the sequence matches itself. There are also lots of random dots.
HIV-1 compared with HIV-2. Note that the diagonal line is weaker, indicating that there are many differences between HIV-1 and HIV-2. Since the line is still visible, they are clearly related, though.
HIV-1 compared with SIV. Note that the two viruses are related.
HIV-1 compared with visna. There is a very faint diagonal line visible in places, indicating there is some relationship between the viruses. Note, however, that HIV-1 is much closer to SIV than to visna.
HIV-1 compared with BLV. These viruses are far enough apart that my program can't detect any similarity.
HIV-1 compared with HTLV-1. These viruses are also too far apart for my program.
A BLV/visna splice compared with visna. For this image, I simulated a splice of BLV and visna by merging together parts of the two sequence files. Note from the image that the splicing is very clear. By comparing this image with the HIV-1 to visna comparison above, it should be very clear that HIV-1 was not made by splicing visna with anything.
HIV-2 compared with SIV. Note that HIV-2 is even closer to SIV than HIV-1 is.

(The exact sequences used were HIV-1=HIVBRUCG, HIV-2=HIVV2RODX, BLV=BLVCG, SIV=SIVAGMTYO, visna=VLVCG, HTLV-1=HTVPRCAR.)
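The simulated splice mentioned above can be imitated with a small C function. The page does not say exactly how the parts of the two sequence files were merged, so this sketch assumes the simplest case: the start of one sequence followed by the remainder of the other. The function name and the cut parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build a spliced sequence from the first `cut`
 * bases of a, followed by the bases of b from position `cut` onward.
 * out must have room for strlen(b) + 1 chars.
 * Returns the length of the spliced sequence.
 */
size_t splice(const char *a, const char *b, size_t cut, char *out)
{
    assert(a != NULL && b != NULL && out != NULL);

    size_t alen = strlen(a);
    size_t blen = strlen(b);
    assert(cut <= alen && cut <= blen);

    memcpy(out, a, cut);                    /* head taken from a */
    memcpy(out + cut, b + cut, blen - cut); /* tail taken from b */
    out[blen] = '\0';
    return blen;
}
```

Comparing the result with b should give a diagonal line only from the cut position onward, which is the kind of abrupt change that makes a splice easy to see.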

Conclusions

From these comparisons, several things are clear. Most importantly, HIV-1 is much closer to SIV than to visna, HTLV-1, or BLV. This indicates that HIV came from SIV (or that they both came from some closely related virus). Second, HIV does not show long sequences that closely match visna, HTLV-1, or BLV. This shows that HIV was not formed by splicing together parts of these viruses.

Ken Shirriff