COMPUTABILITY PARADIGMS BASED ON DNA COMPLEMENTARITY

Arto Salomaa
Department of Mathematics,University of Turku
SF-20014 Turku, Finland


Adleman demonstrated (Science 266,1994,1021-1024) how standard methods of molecular biology could be used to solve a difficult computational problem. Since then the interest on "DNA based computers" has been growing rapidly. There are still considerable obstructions to creating a practical molecular computer, notably physical obstructions arising primarily from difficulties in dealing with large scale systems and in coping with errors.

Although the real practical feasibility of molecular computers remains still in doubt, the field has opened new vistas and important research problems both for computer scientists and biologists. The computer scientist and mathematician is looking for new models of computation. I would like to call such models "Watson-Crick machines" because of the Watson-Crick complementarity in DNA: the four nucleotides A (adenine), T (thymine), C (cytosine) and G (guanine) form two complementary pairs (A,T) and (C,G). In a Watson-Crick machine, Turing's slow diligent clerk is replaced by DNA strands acting in a test tube. In this way the massive parallelism of DNA strands may render tractable some computational problems that were beyond the reach of the diligent clerk; not because the DNA strands are smarter but simple because they can make many tries at once. For the biologist, the unexpected results in DNA computing indicate that models of DNA computers could be significant also for the study of important biological problems such as evolution. Moreover, the techniques of DNA manipulation developed originally for computational purposes could find relevant applications in genetic engineering. However, because of the rather diverse research traditions, it will not be easy to establish common idioms, let alone common vocabulary, for the researchers in computer science and biology.

In the issue of Science containing Adleman's seminal work, David Gifford wrote (Science 266, 1994, 993-994): "If we are able to construct a universal machine out of biological macromolecular components, then we could perform any computation by means of biological techniques. There are certainly powerful practical motivations for this approach, including the information-encoding density offered by macromolecules and the high energy efficiency of enzyme systems. At present, there is no known way of creating a synthetic universal system based on macromolecules. Universal systems require the ability to store and retrieve information, and DNA is certainly up to the task if one could design appropriate molecular mechanisms to interpret and update the information in DNA. This ultimate goal remains elusive, but once solved, it will revolutionize the way we think about both computer science and molecular biology."

Although such an ultimate goal still remains elusive, there are by now already many DNA based universal computational models. (For instance, see G. Paun and A.Salomaa, Mathematica Japonica 43, 1996, 607-632.) That there are many diverse ways of constructing DNA based universal computers is due to the following overall observation. Watson-Crick complementarity guarantees universal computations in any model of DNA computers having sufficient capabilities of handling inputs and outputs. This is a consequence of the close similarity between the Watson-Crick complementarity and the TWIN- SHUFFLE LANGUAGE. (G.Rozenberg and A.Salomaa, Leiden University Computer Science TR 96-28) The latter is a language over four letters, known to be powerful enough to serve as a basis for arbitrary computations. This state of affairs can be viewed also as a mathematical explanation to the number of nucleotides in DNA being four, rather than three or five.

The purpose of this paper is to discuss the automata-theoretic significance of Watson-Crick complementarity. Our main reference is the forthcoming paper "Automata on DNA double strands" by R.Freund, G. Paun, G.Rozenberg and A.Salomaa. The essential difference between our automata and the customary ones lies in the data structures they handle. While the customary automata operate on linear (one-dimensional) strings of symbols, our automata take double strands as their objects. Moreover, the double strands resemble DNA molecules in the following sense. The matching letters (nucleotides) are COMPLEMENTARY, the relation of complementarity being defined for pairs of letters of the basic alphabet, similarly to the Watson-Crick complementarity of the pairs (A,T) and (C,G) of the DNA alphabet. Most importantly, we assume that such data structures, double strands satisfying the complementarity requirement mentioned, are freely available in the sense that we do not have to check in any way that the matching letters indeed are complementary. This assumption can be interpreted to mean, in language-theoretic terms, that our automata have the twin-shuffle language as a built-in feature.

It seems obvious that theoretical studies about DNA computing must make use of the following two advantages stemming from DNA molecules. (i) Watson-Crick complementarity which renders the power of the twin-shuffle language available, and (ii) the multitude of DNA molecules which brings massive parallelism to the computing scene. It seems that theoretical studies have so far concentrated on (i). It is still quite an open area to model (ii) mathematically, as well as to combine (i) and (ii) into one model of DNA computing.


Further links: