Adleman demonstrated (Science 266,1994,1021-1024) how standard
methods of molecular biology could be used to solve a difficult
computational problem. Since then the interest on "DNA based computers"
has been growing rapidly. There are still considerable obstructions
to creating a practical molecular computer, notably physical
obstructions arising primarily from difficulties in dealing with
large scale systems and in coping with errors.
Although the real practical feasibility of molecular computers
remains still in doubt, the field has opened new vistas and important
research problems both for computer scientists and biologists. The
computer scientist and mathematician is looking for new models of
computation. I would like to call such models "Watson-Crick machines"
because of the Watson-Crick complementarity in DNA: the four nucleotides
A (adenine), T (thymine), C (cytosine) and G (guanine) form two
complementary pairs (A,T) and (C,G). In a Watson-Crick machine,
Turing's slow diligent clerk is replaced by DNA strands acting in
a test tube. In this way the massive parallelism of DNA strands
may render tractable some computational problems that were beyond
the reach of the diligent clerk; not because the DNA strands are
smarter but simple because they can make many tries at once. For the
biologist, the unexpected results in DNA computing indicate that
models of DNA computers could be significant also for the study of
important biological problems such as evolution. Moreover, the
techniques of DNA manipulation developed originally for computational
purposes could find relevant applications in genetic engineering.
However, because of the rather diverse research traditions, it will
not be easy to establish common idioms, let alone common
vocabulary, for the researchers in computer science and biology.
In the issue of Science containing Adleman's seminal work,
David Gifford wrote (Science 266, 1994, 993-994): "If we are able to
construct a universal machine out of biological macromolecular
components, then we could perform any computation by means of
biological techniques. There are certainly powerful practical
motivations for this approach, including the information-encoding
density offered by macromolecules and the high energy efficiency
of enzyme systems. At present, there is no known way of creating
a synthetic universal system based on macromolecules. Universal
systems require the ability to store and retrieve information,
and DNA is certainly up to the task if one could design appropriate
molecular mechanisms to interpret and update the information in
DNA. This ultimate goal remains elusive, but once solved, it will
revolutionize the way we think about both computer science and
molecular biology."
Although such an ultimate goal still remains elusive, there
are by now already many DNA based universal computational models.
(For instance, see G. Paun and A.Salomaa, Mathematica Japonica 43,
1996, 607-632.) That there are many diverse ways of constructing
DNA based universal computers is due to the following overall
observation. Watson-Crick complementarity guarantees universal
computations in any model of DNA computers having sufficient capabilities
of handling inputs and outputs. This is a consequence of the close
similarity between the Watson-Crick complementarity and the TWIN-
SHUFFLE LANGUAGE. (G.Rozenberg and A.Salomaa, Leiden University
Computer Science TR 96-28) The latter is a language over four letters,
known to be powerful enough to serve as a basis for arbitrary
computations. This state of affairs can be viewed also as a
mathematical explanation to the number of nucleotides in DNA being
four, rather than three or five.
The purpose of this paper is to discuss the automata-theoretic
significance of Watson-Crick complementarity. Our main reference
is the forthcoming paper "Automata on DNA double strands" by R.Freund,
G. Paun, G.Rozenberg and A.Salomaa. The essential difference
between our automata and the customary ones lies in the data structures
they handle. While the customary automata operate on linear
(one-dimensional) strings of symbols, our automata take double
strands as their objects. Moreover, the double strands resemble
DNA molecules in the following sense. The matching letters
(nucleotides) are COMPLEMENTARY, the relation of complementarity
being defined for pairs of letters of the basic alphabet, similarly
to the Watson-Crick complementarity of the pairs (A,T) and (C,G)
of the DNA alphabet. Most importantly, we assume that such data
structures, double strands satisfying the complementarity
requirement mentioned, are freely available in the sense that we
do not have to check in any way that the matching letters indeed
are complementary. This assumption can be interpreted to mean,
in language-theoretic terms, that our automata have the twin-shuffle
language as a built-in feature.
It seems obvious that theoretical studies about DNA computing
must make use of the following two advantages stemming from DNA
molecules. (i) Watson-Crick complementarity which renders the
power of the twin-shuffle language available, and (ii) the
multitude of DNA molecules which brings massive parallelism
to the computing scene. It seems that theoretical studies have
so far concentrated on (i). It is still quite an open area to
model (ii) mathematically, as well as to combine (i) and (ii)
into one model of DNA computing.