On the Languages Accepted by Watson-Crick Finite Automata with Delays

Abstract: In this work, we analyze the computational power of Watson-Crick finite automata (WKFA) when restrictions are imposed on the transition function of the model. The restrictions bound the maximum length difference between the two input strands, which is called the delay. We prove that the class of languages accepted by WKFA with such restrictions is a proper subclass of the class of languages accepted by arbitrary WKFA. In addition, we initiate the study of the language classes characterized by WKFA with bounded delays. We prove some of the results by means of various relationships between WKFA and sticker systems.


Introduction
DNA computing is a research area that draws its inspiration from the processes that take place in nature based on the functioning and structure of biomolecules (DNA, RNA, and proteins) [1]. For more than 20 years, several models have been proposed in this research area, most of them universal and complete models, inspired by natural processes carried out in living cells. One of the first computational models proposed in the framework of DNA computing was the Watson-Crick finite automaton (WKFA) [2], a type of finite state abstract machine whose input tape is a DNA double strand and which takes into account the complementarity relationship between symbols, as happens in nature with the DNA nucleotides. That is, adenine (A) is related to thymine (T) and cytosine (C) is related to guanine (G), according to the Watson-Crick complementarity rule. The other aspect taken into consideration in this model is the ability of DNA to recombine. DNA recombination occurs at the ends of a molecule that has been previously denatured and fractionated, that is, at the primers of the molecule, whose length allows the molecules to remain stable. The length of these "sticky ends" will be used in this work to characterize language acceptance by WKFA. Several variants of the WKFA model have been proposed in recent years. We can mention, among others, the reversible WK automata [3], the unary WK automata [4], the WK jumping finite automata [5], and the 5′ → 3′ sensing WKFA [6]. In [7], a system with several WKFA and parallel communication is proposed. A survey on the WKFA model can be found in [8].
Another computation model proposed in the DNA computing framework is the sticker system [9]. The characteristics of this model are similar to those of WKFA (double strands and complementarity), but within a generative approach. In this case, we have a finite set of (possibly incomplete) double strands (the axioms) and a finite set of rules (dominoes) that can enlarge the axioms by using an operation over strands called sticking. The sticking operation is again based on the recombination of DNA double strands and on the complementarity relationship. Sticker systems and WKFA can be considered very similar models whose main difference is that WKFA are accepting systems (at least in their basic definition), while sticker systems are generating systems.
In this work, we propose an additional restriction in order to accept or reject a double strand in the WKFA. We require that the length of the primers of the double strand does not exceed a predefined positive integer value at any time during the computation. We can impose this feature on the computations performed during the application of transitions in the WKFA, as is the case in the delayed computations of sticker systems [10], or we can define it explicitly in the transitions of the model. In this work we take the latter approach and explicitly define the delays in the transition function of the model. Once this feature has been explicitly defined, we study its computational power by relating it to the classical theory of formal languages. Studying such restrictions is important because practical applications of the model, such as those in bioinformatics [11], would otherwise require solving extremely difficult problems, such as working with ambiguous and non-deterministic models. By imposing these types of constraints, the model can be put into a practical context that allows the application of machine learning techniques in order to solve problems of a very diverse nature [12]. Hence, it is important (1) to study the scope of the restrictions from a formal point of view (i.e., to study the classes of languages that can be defined with the restricted model), and (2) to explicitly define the restrictions imposed for better use in other application areas.
The structure of this paper is as follows: In the next section we introduce the basic concepts of formal language theory, WKFA, and sticker systems that we will use in the rest of the work. In addition, we define the delays in sticker systems and we relate sticker systems to WKFA. In Section 3, we introduce the WKFA model with bounded delays and we establish some relationships between the corresponding language classes. In addition, we provide a sufficient condition to define bounded delays as the initial delays in the WKFA. Finally, in Section 4, we provide some results about the language classes accepted by WKFA with only initial delays. We finish with some conclusions and future work on the topic addressed in this paper.

Basic Concepts and Notation
In this section, we introduce basic concepts from formal language theory according to [13,14] and from stickers according to [1] that we will use in the sequel.
An alphabet $\Sigma$ is a finite nonempty set of elements named symbols. A string defined over $\Sigma$ is a finite ordered sequence of symbols from $\Sigma$. The infinite set of all the strings defined over $\Sigma$ is denoted by $\Sigma^*$. Given a string $x \in \Sigma^*$, we denote its length by $|x|$. The empty string is denoted by $\lambda$, and $\Sigma^+$ denotes $\Sigma^* - \{\lambda\}$. Given a string $x$, we denote the reversal of $x$ by $x^r$. Given two strings $x = x_1 x_2 \cdots x_p$ and $y = y_1 y_2 \cdots y_m$, we denote the product (concatenation) of $x$ by $y$ as the string $xy = x_1 x_2 \cdots x_p y_1 y_2 \cdots y_m$. A language $L$ defined over $\Sigma$ is a set of strings over $\Sigma$.
A grammar is a construct $G = (N, \Sigma, P, S)$ where $N$ and $\Sigma$ are the alphabets of auxiliary and terminal symbols with $N \cap \Sigma = \emptyset$, $S \in N$ is the axiom of the grammar, and $P$ is a finite set of productions in the form $\alpha \to \beta$, where $\alpha \in (N \cup \Sigma)^* N (N \cup \Sigma)^*$ and $\beta \in (N \cup \Sigma)^*$. The language of the grammar is denoted by $L(G)$ and it is the set of terminal strings that can be obtained from $S$ by applying symbol substitutions according to $P$. Formally, $w_1 \Rightarrow_G w_2$ if $w_1 = u \alpha v$, $w_2 = u \beta v$, and $\alpha \to \beta \in P$. We denote the reflexive and transitive closure of $\Rightarrow_G$ by $\Rightarrow_G^*$. The language generated by $G$ is defined by the set $L(G) = \{w \in \Sigma^* : S \Rightarrow_G^* w\}$.
We say that a grammar $G = (N, \Sigma, P, S)$ is right (left) linear (regular) if every production in $P$ is in the form $A \to uB$ ($A \to Bu$) or $A \to w$ with $A, B \in N$ and $u, w \in \Sigma^*$. The class of languages generated by right (left) linear grammars is the class of regular languages and is denoted by $REG$. We say that a grammar $G = (N, \Sigma, P, S)$ is linear if every production in $P$ is in the form $A \to uBv$ or $A \to w$ with $A, B \in N$ and $u, v, w \in \Sigma^*$. The class of languages generated by linear grammars is denoted by $LIN$. A well-known result from formal language theory is the inclusion $REG \subset LIN$.

A deterministic finite automaton (DFA) is an abstract machine $A = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set of states, $\Sigma$ is an input alphabet, $\delta : Q \times \Sigma \to Q$ is a transition function, $q_0 \in Q$ is an initial state, and $F \subseteq Q$ is a set of final states. The extension of the transition function over strings, $\hat{\delta} : Q \times \Sigma^* \to Q$, is defined as $\hat{\delta}(q, \lambda) = q$ and $\hat{\delta}(q, xa) = \delta(\hat{\delta}(q, x), a)$ for every $x \in \Sigma^*$ and $a \in \Sigma$. In the following, we will not distinguish between $\delta$ and $\hat{\delta}$; the meaning will depend on the function arguments. The language accepted by a DFA $A$ is the set $L(A) = \{x \in \Sigma^* : \delta(q_0, x) \in F\}$. A classical result from formal language theory is that the class of languages accepted by deterministic finite automata is $REG$. For the case of non-deterministic finite automata, the transition function is defined as $\delta : Q \times \Sigma \to \mathcal{P}(Q)$, where $\mathcal{P}(Q)$ is the power set of $Q$. It is a classical result from formal language theory that the set of languages accepted by non-deterministic finite automata is again $REG$.
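As an illustration of the extended transition function, the following minimal Python sketch evaluates $\delta(q_0, x)$ symbol by symbol. The dictionary encoding of $\delta$, the function name `dfa_accepts`, and the example automaton `EVEN_A` are assumptions made only for this illustration.

```python
from typing import Dict, Set, Tuple


def dfa_accepts(delta: Dict[Tuple[str, str], str], q0: str,
                finals: Set[str], x: str) -> bool:
    """Apply delta(q, xa) = delta(delta(q, x), a), starting from delta(q, lambda) = q."""
    q = q0
    for a in x:
        if (q, a) not in delta:
            # Assumption of this sketch: an undefined transition rejects the input.
            return False
        q = delta[(q, a)]
    return q in finals


# Hypothetical DFA over {a, b}: accepts the strings with an even number of a's.
EVEN_A = {
    ("e", "a"): "o", ("e", "b"): "e",
    ("o", "a"): "e", ("o", "b"): "o",
}

print(dfa_accepts(EVEN_A, "e", {"e"}, "abab"))
print(dfa_accepts(EVEN_A, "e", {"e"}, "ab"))
```

The empty string is handled by the base case: the loop body never runs and the initial state is tested for finality.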
Regular languages can be defined by regular expressions. They are a language specification based on definition rules that consider letters of the alphabet, the union, product and closure operators, and the use of parentheses to reorder the priority of operators.
A homomorphism $h$ is defined as a mapping $h : \Sigma \to \Gamma^*$ where $\Sigma$ and $\Gamma$ are alphabets. We can extend the definition of homomorphisms over strings as $h(\lambda) = \lambda$ and $h(ax) = h(a)h(x)$ for every $a \in \Sigma$ and $x \in \Sigma^*$.

Given an alphabet $\Sigma = \{a_1, \cdots, a_n\}$, we use the symmetric (and injective) relation of complementarity $\rho \subseteq \Sigma \times \Sigma$. For any string $x \in \Sigma^*$, we denote by $\rho(x)$ the string obtained by substituting every symbol $a$ in $x$ by the symbol $b$ such that $(a, b) \in \rho$ (remember that $\rho$ is injective), with $\rho(\lambda) = \lambda$.
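The symbol-wise application of $\rho$ to a string can be sketched in Python as follows. The dictionary `DNA_RHO` encodes the Watson-Crick pairs mentioned in the Introduction; the function names and the dictionary encoding of $\rho$ are assumptions of this sketch.

```python
from typing import Dict


def complement(rho: Dict[str, str], x: str) -> str:
    """Return rho(x): replace each symbol a of x by the b with (a, b) in rho."""
    return "".join(rho[a] for a in x)


def is_symmetric_and_injective(rho: Dict[str, str]) -> bool:
    """Check that (a, b) in rho implies (b, a) in rho and that no two symbols share an image."""
    symmetric = all(rho.get(b) == a for a, b in rho.items())
    injective = len(set(rho.values())) == len(rho)
    return symmetric and injective


# Watson-Crick complementarity over the DNA alphabet.
DNA_RHO = {"A": "T", "T": "A", "C": "G", "G": "C"}

print(complement(DNA_RHO, "ACGT"))
print(is_symmetric_and_injective(DNA_RHO))
```

Because the relation is symmetric, applying `complement` twice returns the original string.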
Given an alphabet $V$, a sticker over $V$ is the pair $(x, y)$ such that $x = x_1 v x_2$, $y = y_1 w y_2$ with $x, y \in V^*$ and $\rho(v) = w$. The sticker $(x, y)$ will be denoted by $\binom{x}{y}$. A sticker $\binom{x}{y}$ will be a complete and complementary molecule if $|x| = |y|$ and $\rho(x) = y$. A complementary and complete molecule $\binom{x}{y}$ will be denoted as $\begin{bmatrix} x \\ y \end{bmatrix}$. The set of all complete and complementary molecules over $V$ will be called the Watson-Crick domain and denoted by $WK_\rho(V)$. We define the following sets, where $\rho$ denotes the symmetric relation between the symbols in $V$: In addition, we can consider the set

Kari et al. [9] defined the sticking operation $\mu$, which can be considered as a product operation over stickers. Basically, given two stickers $x$ and $y$, the operation $\mu(x, y)$ returns a new sticker by taking into account the complementarity relation $\rho$ between the upper string in $x$ and the lower string in $y$ and vice versa.
Sticker systems were defined by Freund et al. in [15] and by Kari et al. in [9] as a generative model that applies the sticking operation $\mu$ (a product-like operation over stickers) to obtain languages. At the same time, Păun and Rozenberg [10] introduced some aspects that are closely related to the present work, such as the (bounded) delayed languages defined by sticker systems. A sticker system is defined by the tuple $\gamma = (V, \rho, A, D)$, where $D$ is a finite set of pairs of stickers whose elements are called dominoes, and $A$ is a finite subset of $LR_\rho(V)$ whose elements are the axioms of the system. The derivation relation $\Rightarrow$ between stickers, according to a sticker system $\gamma$, can be defined as follows:

Given that $\Rightarrow$ is a relation between stickers subjected to a sticker system $\gamma$, we will denote the reflexive and transitive closure of $\Rightarrow$ by $\Rightarrow^*$. Now, we can define the language of complete and complementary molecules generated by the sticker system $\gamma = (V, \rho, A, D)$ with no restrictions as follows: In addition, the language generated by $\gamma$ is defined as: The family of languages generated by arbitrary sticker systems is denoted by $ASL(n)$.
The maximal length of an overhang in a sticker $z = \binom{x}{y}$ is called the delay of $z$.
Observe that the delay of a sticker z is the maximal length of the right or left overhang located in the upper or lower strand.
, then the delay of $z$ equals the maximum value between $|u|$ and $|w|$, where

The delay associated to a derivation in a sticker system is the maximal delay of the stickers that are produced during that derivation. For any sticker system $\gamma$, the language of strings generated by $\gamma$ with a delay $d$ is denoted by $L_d(\gamma)$. A sticker system $\gamma$ is said to have a bounded delay if there is $d > 1$ such that $L_d(\gamma) = L_n(\gamma)$. The family of languages generated by arbitrary sticker systems with a bounded delay is denoted by $ASL(b)$.

Watson-Crick finite automata (WKFA) were first defined by Freund et al. [2] as an acceptance model to deal with stickers. A WKFA can be defined by the tuple $M = (V, \rho, Q, s_0, F, \delta)$, where $Q$ and $V$ are disjoint alphabets (states and symbols), $\rho \subseteq V \times V$ is a symmetric (and injective) relation of complementarity, $s_0 \in Q$ is the initial state, $F \subseteq Q$ is a set of final states, and $\delta : Q \times \binom{V^*}{V^*} \to \mathcal{P}(Q)$. A transition step in an arbitrary WK finite

We can use a normal form such that for every transition q ∈ δ(q,

This normal form defines the so-called 1-limited WK finite automata and they were proved to be equivalent to arbitrary ones [2]. The language of complete and complementary molecules accepted by $M$ will be x y q, with $q \in F$ and $x, y \in V^*$ }, while the upper strand language accepted by $M$ will be defined as:

The class of languages accepted by arbitrary WKFA in the upper strand is denoted by $AWK$.
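To make the acceptance model concrete, the following Python sketch simulates a WKFA under the simplifying assumption that $\rho$ is the identity, so that both strands carry the same string $x$ and a configuration is a triple (state, upper head position, lower head position). The encoding of $\delta$ as a dictionary of `(upper, lower, next_state)` triples, the function name `wk_accepts`, and the example automaton `ANBN` are hypothetical and are not taken from the original paper.

```python
from collections import deque


def wk_accepts(transitions, start, finals, x):
    """Breadth-first search over configurations (state, i, j).

    i and j are the numbers of symbols already read on the upper and
    lower strand. A string is accepted when a final state is reached
    with both strands completely read.
    """
    n = len(x)
    seen = {(start, 0, 0)}
    queue = deque(seen)
    while queue:
        q, i, j = queue.popleft()
        if q in finals and i == n and j == n:
            return True
        for up, low, p in transitions.get(q, []):
            # The transition applies if `up` is next on the upper strand
            # and `low` is next on the lower strand.
            if x.startswith(up, i) and x.startswith(low, j):
                nxt = (p, i + len(up), j + len(low))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


# Hypothetical WKFA for { a^n b^n : n >= 1 }: the upper head reads the a's
# alone, then reads the b's while the lower head reads the a's, and
# finally the lower head reads the b's alone.
ANBN = {
    "q0": [("a", "", "q0"), ("b", "a", "q1")],
    "q1": [("b", "a", "q1"), ("", "b", "q2")],
    "q2": [("", "b", "q2")],
}

print(wk_accepts(ANBN, "q0", {"q2"}, "aabb"))
print(wk_accepts(ANBN, "q0", {"q2"}, "aab"))
```

The search terminates because the set of configurations is finite for a fixed input, even when transitions that read the empty string on both strands are present.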
We can use a normal form such that for every transition $q' \in \delta(q, \binom{x_1}{x_2})$, either $x_1 = \lambda$ or $x_2 = \lambda$. This normal form defines the so-called simple WKFA, which were proved to be equivalent to arbitrary ones [2]. Now, we will relate the languages in $AWK$ with the languages generated by sticker systems through the following result.

Proof. Let us take $M = (V, \rho, Q, s_0, F, \delta)$ to be a simple WKFA with $Q = \{s_0, q_1, \ldots, q_n\}$ and $V = \{a_1, \cdots, a_n\}$. Observe that this restriction over WKFA does not affect the computational capacity of the model, given that simple is a normal form for WKFA. We can define the sticker system $\gamma = (V', \rho', A, D)$ where the elements of $\gamma$ are defined as follows: $\rho'$ is defined as:

• The set of axioms $A$ is defined through the following rules:

1. $\# s_0 \#$ This axiom places the initial state of $M$ as the first state in the sequence defined to accept any string.

2. An axiom is created for every transition from the initial state of the WKFA, $s_0$. The overhang on the left of the lower strand of the axiom corresponds to the state of the WKFA that is reached from the initial state with a $\lambda$-movement.

3. If the initial state in the WKFA is also final, then this axiom is inserted. That is, the additional state $q_f$ is produced from the initial state without any input symbol.

• The set $D$ is defined by the pairs according to the following rules:

1. ) and $x = \lambda$ or $y = \lambda$. That means that, in $M$, it is possible to transit from $q$ to $p$ with the pair $\binom{x}{y}$.

2. For each final state $q_i \in F$ of the WKFA, these stickers mark the end of the state sequence in the WKFA to accept the input.

3. This rule is added to avoid overhangs on the left side of the sticker system. When this rule is applied, there is no way to add another one according to the automaton transitions.

where $q_{i_n} \in F$. The following derivation holds in $\gamma$:

$$\binom{\ldots q_{i_1} s_0 \# x_1 \ldots x_n}{q_{i_n} q_{i_{n-1}} \ldots q_{i_1} s_0 \# y_1 \ldots y_n} \Rightarrow_\gamma \cdots \Rightarrow_\gamma \binom{\# q_{i_n} q_{i_{n-1}} \ldots q_{i_1} s_0 \# x_1 \ldots x_n}{q_f \# q_{i_n} q_{i_{n-1}} \ldots q_{i_1} s_0 \# y_1 \ldots y_n} \Rightarrow_\gamma \begin{bmatrix} q_f \# q_{i_n} q_{i_{n-1}} \ldots q_{i_1} s_0 \# x_1 \ldots x_n \\ q_f \# q_{i_n} q_{i_{n-1}} \ldots q_{i_1} s_0 \# y_1 \ldots y_n \end{bmatrix},$$

and it can easily be shown that $x_1 \ldots x_n \in L_u(M)$ iff $q_f \# q_{i_n} q_{i_{n-1}} \ldots q_{i_1} s_0 \# x_1 \ldots x_n \in L_n(\gamma)$. A regular set $R$ is defined by the regular expression

$$q_f \# (s_0 + q_1 + \cdots + q_n)^* \# (a_1 + a_2 + \cdots + a_n)^*.$$
The above regular expression guarantees that only strings containing a single symbol q f can be produced and, therefore, in the sticker system, the molecules obtained correspond only to those that have reached a final state in the WKFA. On the other hand, note that without the set R acting as a control set, the sticker system could generate molecules that would not be recognized in the automaton. In other words, nothing prevents the sticker system from continuing to add stickers that could complete new molecules once the molecule accepted by the automaton has been generated.
It can be observed that the sequence of states needed to accept any string x in M is defined by the reverse of the sequence of states between the # marks in the stickers obtained in γ.
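The role of the control set $R$ and the reversed state sequence can be illustrated with a small Python sketch. It assumes a hypothetical encoding in which an upper strand generated by $\gamma$ is a list of tokens (state names, the marker `#`, the symbol `qf`, and input symbols); the function name `split_controlled` and the token names are illustrative only. Membership in $R$ is tested by hand rather than with a regular-expression library, since states are multi-character tokens in this encoding.

```python
def split_controlled(tokens, states, symbols):
    """Return (state_path, word) if `tokens` has the form
    qf # (states)* # (symbols)*, and None otherwise.

    The state path is the reverse of the sequence found between the
    two # marks, which is the order in which the WKFA visits the states.
    """
    if len(tokens) < 3 or tokens[0] != "qf" or tokens[1] != "#":
        return None
    try:
        second_mark = tokens.index("#", 2)
    except ValueError:
        return None
    between = tokens[2:second_mark]
    rest = tokens[second_mark + 1:]
    if all(t in states for t in between) and all(t in symbols for t in rest):
        return list(reversed(between)), "".join(rest)
    return None


STATES = {"s0", "q1", "q2"}
SYMBOLS = {"a", "b"}

print(split_controlled(["qf", "#", "q2", "q1", "s0", "#", "a", "b"], STATES, SYMBOLS))
```

A token sequence containing a second `qf` is rejected, which mirrors the observation that $R$ only admits strings with a single symbol $q_f$.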

Example 1.
Let us take the simple WKFA defined by the transition diagram in Figure 1. From the previous WKFA, we build a sticker system, such that A and D are defined as follows: The regular expression that controls the generation of stickers is defined as follows:

Accepting Languages with Bounded Delays in WKFA
In this section, we will introduce the delays during a WKFA computation in a way similar to sticker systems [10].
Let us suppose that we consider the following computation in a WKFA $M = (V, \rho, Q, s_0, F, \delta)$ with $x = x_1 x_2 \ldots x_n$ and $y = y_1 y_2 \ldots y_n$:

The definition of delays in the computation of the automaton is a constraint that reduces the computational power of the model. This can be formalized by the following result.

Proof. Let us consider the language $L = \{wcw : w \in \{a, b\}^*\}$. This language belongs to $AWK$ given that it can be accepted by the following WKFA.
Observe that the automaton first reads the string $w$ in the upper strand until it reaches the symbol $c$, always remaining in the state $q_0$. Once the symbol $c$ has been read, the automaton changes its state and then starts to match the string after the symbol $c$ with the string starting on the lower strand. If the symbol $c$ is found in the lower strand, then it proceeds to read the rest of the input string in the lower strand. It is easy to see that, for any delay value $d > 0$, a string $wcw$ with $|w| > d$ cannot be parsed with a delay less than or equal to $d$, given that the automaton needs to read the whole string $w$ before arriving at the symbol $c$. In such a case, the automaton produces a delay greater than $d$ given that $|w| > d$.

Now, we are going to consider the relationships between the upper, lower, and arbitrary delays. In this case, we will work with arbitrary WKFA and the identity as the relation $\rho$ defined in the automata. Observe that this does not detract from the generality of our results since the complementarity relationship can be reduced to the identity as shown in [16].
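The argument for $L = \{wcw\}$ can be checked experimentally. The Python sketch below is not the automaton of the original figure; it is a hypothetical three-state WKFA reconstructed from the textual description, with $\rho$ the identity. It also assumes that the delay of a configuration is the distance $|i - j|$ between the numbers of symbols read on the upper and lower strands, and it discards every configuration whose delay exceeds a given bound $d$.

```python
from collections import deque

# Hypothetical WKFA for { wcw : w in {a, b}* }, as (upper, lower, next_state).
WCW = {
    "q0": [("a", "", "q0"), ("b", "", "q0"), ("c", "", "q1")],
    "q1": [("a", "a", "q1"), ("b", "b", "q1"), ("", "c", "q2")],
    "q2": [("", "a", "q2"), ("", "b", "q2")],
}


def accepts_with_delay(transitions, start, finals, x, d):
    """Accept x using only configurations (state, i, j) with |i - j| <= d."""
    n = len(x)
    seen = {(start, 0, 0)}
    queue = deque(seen)
    while queue:
        q, i, j = queue.popleft()
        if q in finals and i == n and j == n:
            return True
        for up, low, p in transitions.get(q, []):
            ni, nj = i + len(up), j + len(low)
            if (x.startswith(up, i) and x.startswith(low, j)
                    and abs(ni - nj) <= d):
                nxt = (p, ni, nj)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


def min_delay(transitions, start, finals, x):
    """Smallest bound d that still allows x to be accepted, or None."""
    for d in range(len(x) + 1):
        if accepts_with_delay(transitions, start, finals, x, d):
            return d
    return None


for w in ["", "a", "ab", "abb"]:
    print(repr(w), min_delay(WCW, "q0", {"q2"}, w + "c" + w))
```

For this particular automaton the minimal delay grows with $|w|$, so no fixed bound $d$ suffices for the whole language, in line with the proof above.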
Proof. Let $M_1 = (V, \rho, Q, s_0, F, \delta_1)$ be an arbitrary WKFA with $\rho$ being the identity relation, and let $L$ be the language accepted by $M_1$ with a bounded upper (lower) delay $d$. Then, from $M_1$ we can obtain the WKFA $M_2 = (V, \rho, Q, s_0, F, \delta_2)$, where $\delta_2$ swaps the strands of every transition of $\delta_1$, that is, $p \in \delta_2(q, \binom{y}{x})$ iff $p \in \delta_1(q, \binom{x}{y})$. It is easy to see that if $w$ is accepted by $M_1$ with an upper (lower) delay $d$, then $w$ is accepted by $M_2$ with a lower (upper) delay $d$, given that the relation $\rho$ is the identity and the transitions in $M_2$ swap the contents of the upper and lower strands.
Observe that $\rho$ is the identity relation, and we can assume that $x_i \in V$, $1 \leq i \leq n$. Then, from $M$ we can obtain a non-deterministic finite automaton $A = (Q, V, s_0, f, F)$, such

It is easy to see that $x \in L(A) = L$. Hence, $AWK_{updel}(0) \subseteq REG$. In order to see that $REG \subseteq AWK_{updel}(0)$, we take an arbitrary deterministic finite automaton $A = (Q, \Sigma, f, q_0, F)$, and we obtain the WKFA $M = (\Sigma, \rho, Q, q_0, F, \delta)$ where $\rho$ is the identity relation, and $p \in \delta(q, \binom{a}{a})$ iff $f(q, a) = p$. Again, it is easy to see that $x \in L_{u,updel}(M, 0)$ iff $x \in L(A)$. Then, $REG \subseteq AWK_{updel}(0)$, and the lemma holds.
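The second half of the proof above, the construction of a WKFA from a DFA, can be sketched in Python as follows. The dictionary encodings, the function names, and the example automaton `F_EVEN` are assumptions of this sketch. Since every transition produced by the construction has the form $\binom{a}{a}$, both heads advance together and the delay stays at zero; the simulator below relies on that fact and is not a general WKFA simulator.

```python
from itertools import product


def dfa_accepts(f, q0, finals, x):
    """Run a DFA whose (total) transition function is the dictionary f."""
    q = q0
    for a in x:
        q = f[(q, a)]
    return q in finals


def dfa_to_wkfa(f):
    """Build delta with p in delta(q, (a over a)) iff f(q, a) = p."""
    delta = {}
    for (q, a), p in f.items():
        delta.setdefault(q, []).append((a, a, p))
    return delta


def wk_accepts_zero_delay(delta, q0, finals, x):
    """Simulate a WKFA whose transitions all read one identical symbol
    on both strands, so the two heads always move in lockstep."""
    states = {q0}
    for a in x:
        states = {p for q in states
                  for (up, low, p) in delta.get(q, [])
                  if up == a == low}
    return bool(states & finals)


# Hypothetical DFA over {a, b}: even number of a's.
F_EVEN = {
    ("e", "a"): "o", ("e", "b"): "e",
    ("o", "a"): "e", ("o", "b"): "o",
}
DELTA = dfa_to_wkfa(F_EVEN)

# Compare both machines on every string over {a, b} of length at most 4.
agree = all(
    dfa_accepts(F_EVEN, "e", {"e"}, "".join(w))
    == wk_accepts_zero_delay(DELTA, "e", {"e"}, "".join(w))
    for n in range(5) for w in product("ab", repeat=n)
)
print(agree)
```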

A Sufficient Condition for Bounded Delays in WKFA
We have seen how delays during computation, regardless of whether they occur on the upper or lower strand, can be variable. That is, given a delay value d, the computation can produce that delay incrementally (i.e., starting with a delay less than d and then increasing it to the required value). In this section, we will propose a way to produce the required delay instantaneously. Informally, the approach we are going to propose requires producing the desired delay in the first movement of the computation and, subsequently, maintaining it until a point in time where it is completely reduced. This, without being a normal form, can be interesting when dealing with WKFA inductive inference as proposed in [12].
We say that M accepts x in the upper strand with an initial delay k iff M applies the following transition sequences to accept x = x 1 x 2 · · · x l . Remember that we can use the identity relation without loss of generality.

3. Final movement: $\forall m \geq 0$ such that $l - m + 1 < k$

Therefore, $M$ accepts $x$ in the upper strand by first processing $k$ symbols of the upper strand, and then alternately processing $k$ symbols of the lower and upper strands. When there are fewer than $k$ symbols left in the upper strand, all the remaining symbols are processed. Figure 2 shows graphically how the stickers are organized as they are processed with initial delays. Observe that the movements of the automaton can be adapted to be simple: the movement $q \in \delta(p, \binom{x}{y})$, with $|x| = |y| = k$, can be replaced by the two movements $q' \in \delta(p, \binom{\lambda}{y})$ and $q \in \delta(q', \binom{x}{\lambda})$. Hence, the scheme to process the stickers in a simple WKFA with initial delays is shown in Figure 3. Observe that, according to the previous figure, the delay in the WKFA always equals $k$, given that it is first produced in the upper strand and then the lower strand is completed. In the following we will work with simple WKFA with only initial delays.
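The replacement of a movement that reads both strands by two simple movements can be sketched in Python as follows. The encoding of a movement $q \in \delta(p, \binom{x}{y})$ as a tuple `(p, x, y, q)`, the function name `split_into_simple`, and the naming scheme for the fresh intermediate state $q'$ are assumptions of this sketch.

```python
def split_into_simple(transitions):
    """Replace every movement (p, x, y, q) with x and y both nonempty by
    (p, "", y, q') followed by (q', x, "", q), where q' is a fresh state.

    Movements that already read from a single strand are kept as they are.
    """
    simple = []
    fresh = 0
    for p, x, y, q in transitions:
        if x and y:
            # Hypothetical naming of the intermediate state q'.
            mid = f"{p}_{q}_{fresh}"
            fresh += 1
            simple.append((p, "", y, mid))   # first read y on the lower strand
            simple.append((mid, x, "", q))   # then read x on the upper strand
        else:
            simple.append((p, x, y, q))
    return simple


moves = [("s0", "ab", "", "p"), ("p", "ba", "ab", "q")]
for move in split_into_simple(moves):
    print(move)
```

Every movement in the result has an empty upper or an empty lower component, which is the condition of the simple normal form.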

Languages Accepted by WKFA with Only Initial Delays
Let $M$ be a simple WKFA with only initial delay movements, and let $d$ be the delay at every computation step in $M$. If a sticker system $\gamma$ is built from $M$ as established in Theorem 1, then $L_n(\gamma) \in ASL(b)$. This is due to the fact that, according to the definition of a WKFA with an initial delay $d$, the delay that may appear in either of the two strands when processing a sequence cannot be longer than $d$. Therefore, during the construction of a molecule in the sticker system, the overhang on the right side cannot be longer than $d$, and the overhang on the left side cannot be longer than one.

Example 2.
Let us take the simple WKFA with initial delays defined by the transition diagram of Figure 4. Observe that the bounded delay in this case equals 2 and $L_u(M) = L_{u,updel}(M, 2)$. From the previous WKFA, we build a sticker system $\gamma$, such that A and D are defined as follows: In this case, $L_u(M) = L_n(\gamma)$ which can be defined by the regular expression: given that, once a molecule is completed in $\gamma$, after having reached the final state $q_4$, there is no possibility of lengthening the molecule and completing a larger one, as the upper and lower strands would not be complementary.
Thus, the definition of a regular expression for derivation control, as in the proof of Theorem 1, is unnecessary in the case of initial delays. This aspect will be formalized by the following theorem.

It is known that $ASL(b) \subseteq LIN$ (Theorem 4.3 in [10]). That theorem is proven by building a linear grammar $G = (N, T, S, P)$ from a sticker system $\gamma = (V, \rho, A, D)$. In the grammar $G$, the molecules are obtained by a derivation that proceeds from the ends of the molecule toward its central part, in a reverse process to the one that happens in the sticker system. We can take advantage of this result together with the results we have proven before to characterize the language accepted by the WKFA with only initial delays. It is established by the following result.

, S, P) be a linear grammar built from $\gamma$ as described in [10]. It can be observed that, for every production in $G$ in the form $A \to \alpha B \beta$, the sticker $\alpha$ is defined from the symbols in $Q \cup \{\#, q_f\}$ while the sticker $\beta$ is defined from the symbols in $V$. Now, a left linear grammar $G' = (N, V, S, P')$ can be constructed from $G$. The set of productions in $P'$ is defined as follows: where the morphism $h : V' \to V^*$ is defined as $h(a) = a$ if $a \in V$ and $h(a) = \lambda$ if $a \in Q \cup \{\#, q_f\}$. The language generated by the grammar $G'$ is the result of removing the symbols in $Q \cup \{\#, q_f\}$ from the strings generated by $G$ in the upper strand. These strings are the strings in $L_u(M)$. Since $G'$ is a left linear grammar and $L(G') = L_u(M)$, then $L_u(M) \in REG$.

Conclusions
In this paper we addressed some aspects of the languages defined by WKFA where the delay between the two strands in obtaining the upper strings is bounded. This aspect is interesting because, on the one hand, it allows us to relate two classical DNA computational models such as Watson-Crick finite automata and sticker systems. On the other hand, it allows us to establish a result that limits the computational capacity of the WKFA model, i.e., by introducing a bounded delay, we are reducing the class of languages that the model is able to accept.
Another aspect we want to highlight is that the study of language families characterized by bounded delays is a prior step to establishing efficient methods for their inductive inference, that is, the application of machine learning techniques to obtain DNA computational models. In this sense, some previous results allow us to approach this aspect in a reasonable way [12]. The imposition of constraints and normal forms on some classes of languages accepted by WKFA is important in establishing machine learning algorithms that infer WKFA from positive examples. It is well known in formal language theory that there exist inherently ambiguous languages among the linear languages [17]. Since WKFA can be represented by intersections of linear and even linear languages [18], the inherent ambiguity of some languages accepted by this model is unavoidable. This leads to multiple options when specifying a set of transitions during the construction of a model by machine learning techniques. The imposition of features such as initial delays makes the ambiguity disappear, given that a predefined form is imposed on the transitions, and makes it feasible to automatically learn some subclasses of this model.
There are additional aspects of this work that deserve to be explored in future research. For example, one aspect to consider is whether delays induce a hierarchy of languages in a gradual way, i.e., whether any language accepted by a WKFA with a bounded delay d can be accepted by a WKFA with a smaller delay. In the case of a negative answer to this question, we could obtain an infinite hierarchy, where the smallest class would be defined by the class of regular languages (null delays) which, in turn, would contain the languages that can be accepted only with initial delays.