Open Source–It’s in the Genes

What happens when you release 500,000 human genomes as open source? This.

DNA is digital. The three billion chemical bases that make up the human genome encode data not in binary, but in a quaternary system, using four compounds—adenine, cytosine, guanine, thymine—to represent four genetic "digits": A, C, G and T. Although this came as something of a surprise in 1953, when Watson and Crick proposed an A–T and C–G pairing as a "copying mechanism for genetic material" in their famous double helix paper, it's hard to see how hereditary information could have been transmitted efficiently from generation to generation in any other way. As anyone who has made photocopies of photocopies is aware, analog systems are bad at loss-free transmission, unlike digital encodings. Evolution of progressively more complex structures over millions of years would have been much harder, perhaps impossible, had our genetic material been stored in a purely analog form.

Although the digital nature of DNA was known more than half a century ago, it was only after many years of further work that quaternary data could be extracted at scale. The Human Genome Project, where laboratories around the world pieced together the three billion bases found in a single human genome, was completed in 2003, after 13 years of work, for a cost of around $750 million. However, since then, the cost of sequencing genomes has fallen—in fact, it has plummeted even faster than Moore's Law for semiconductors. A complete human genome now can be sequenced for a few hundred dollars, with sub-$100 services expected soon.

As costs have fallen, new services have sprung up offering to sequence—at least partially—anyone's genome. Millions have sent samples of their saliva to companies like 23andMe in order to learn things about their "ancestry, health, wellness and more". It's exciting stuff, but there are big downsides to using these companies. You may be giving a company the right to use your DNA for other purposes. That is, you are losing control of the most personal code there is—the one that created you in the boot-up process we call gestation. Deleting sequenced DNA can be hard.