Efficient Construction of the Equation Automaton

Abstract: This paper describes a fast algorithm for constructing the equation automaton directly from the well-known Thompson automaton associated with a regular expression. Allauzen and Mohri presented a unified construction of small automata and gave a construction of the equation automaton with time and space complexity in $O(m \log m + m^2)$, where $m$ denotes the number of transitions of the Thompson automaton. It is based on two classical automata operations, namely epsilon-removal and Hopcroft's algorithm for deterministic finite automaton (DFA) minimization. Using the notion of c-continuation, Ziadi et al. presented a fast computation of the equation automaton in $O(m^2)$ time. In this paper, we design an output-sensitive algorithm that combines the advantages of the previous algorithms and show that its complexity can be reduced to $O(m \times |Q_{\equiv_e}|)$, where $|Q_{\equiv_e}|$ denotes the number of states of the equation automaton, by using epsilon-removal and Bubenzer's minimization algorithm for acyclic deterministic finite automata (ADFA).


Introduction
The equation automaton (also known as the derived terms automaton or Antimirov automaton) was first introduced in Mirkin's paper [1]. In [2], Antimirov introduced the notion of partial derivative of a regular expression, which led to another definition and construction of the equation automaton. It is an ε-free NFA that in general has fewer states and transitions than the well-known position automaton [3][4][5]. The complexity of the original construction algorithm of [2], which is based on the computation of the set of partial derivatives of the expression, is in $O(n^5)$, where $n$ denotes the size of the regular expression. In 2001, Champarnaud and Ziadi [6] introduced the notion of canonical derivatives and constructed a new automaton called the c-continuation automaton. They also proved that this automaton is isomorphic to the position automaton and that the equation automaton is its quotient by some equivalence relation.
The notion of c-derivative was introduced in [6] to derive the equation automaton from the position automaton via the c-continuation automaton. A unique regular expression over indexed and ordered letters, called a c-continuation, is assigned to each state of the position automaton. The resulting automaton is called the c-continuation automaton [6]. One can then define an equivalence relation between two c-continuations, i.e., two states of the c-continuation automaton, as follows: if deleting the indices of the letters from two c-continuations results in the same regular expression, they correspond to the same partial derivative. Hence, the equation automaton is a quotient of the c-continuation automaton w.r.t. this equivalence relation. From the algorithmic point of view, this result allows the construction of the equation automaton in $O(n^2)$ time and space [6,7], which improves on Antimirov's algorithm by a factor of $O(n^3)$.
In [8], Allauzen and Mohri present simple and unified constructions of the position automaton [3][4][5], the follow automaton [9,10], and the equation automaton [2,6,7] from regular expressions. Their algorithms are based on two standard automata operations applied to the Thompson automaton [11,12]: epsilon-removal and Hopcroft's algorithm for DFA minimization [13]. The complexity of their construction of the equation automaton is in $O(m \log m + m^2)$, where $m$ is the number of transitions of the Thompson automaton. Notice that, by construction, the number of transitions $m$ of the Thompson automaton and the size $n$ of the regular expression are proportional. Thus, we have $m = O(n)$.
To improve the time complexity of computing the equation automaton from a regular expression E, we design an algorithm that combines the advantages of the previous methods [6][7][8], with a worst-case time complexity in $O(m \times |Q_{\equiv_e}|)$, where $|Q_{\equiv_e}|$ denotes the number of states of the equation automaton. Our approach is based on Bubenzer's minimization of an acyclic DFA instead of the Hopcroft minimization step used in Allauzen and Mohri's method. The main idea is to implicitly associate each c-continuation with a corresponding state of the Thompson automaton, called a position state, by a special marking of the ε-transitions. As a consequence of this marking, the right language of each position state in the Thompson automaton implicitly represents its c-continuation; we call this representation a pseudo-continuation. After that, we temporarily disable the cyclic ε-transitions in the Thompson automaton and apply Bubenzer's minimization of an acyclic DFA to compute efficiently the partial-derivative equivalence relation over the set of position states. Finally, we remove the indexed ε-transitions, re-enable the cyclic ε-transitions, and compute the ε-closure of the states of the automaton produced in the previous step to obtain the equation automaton. The implementation of the proposed algorithm is available in the repository https://github.com/FaissalOuardi/Equation-automaton (accessed on 27 May 2021).
The paper is organized as follows. Section 2 contains some basic definitions and necessary preliminaries. Section 3 summarizes the theoretical results that lead to the c-continuations of a regular expression and their relation to the partial derivatives. The definition of the c-continuation automaton is recalled, as well as the way it is connected to the equation automaton. Section 4 recalls the algorithm due to Allauzen and Mohri. In Section 5, we detail the algorithmic refinements leading to an $O(m \times |Q_{\equiv_e}|)$ time construction of the equation automaton, where $|Q_{\equiv_e}|$ is the number of its states.

Preliminaries
In this section, we briefly introduce the notion of finite automata. For further details on the formal aspects of finite automata theory, we recommend the classical books [14,15].

Regular Expressions and Finite Automata
Let A be a non-empty finite set of letters, called an alphabet. The set of all words over A is denoted by $A^*$, and ε is the empty word. A language over A is a subset of $A^*$.

Regular Expressions and Languages
A regular expression over the alphabet A is a term of the algebra $T_{reg}(A)$ defined over the set $A \cup \{0, 1\}$ with the function symbols $^*$, $+$, $\cdot$, where $^*$ is unary and $+$ and $\cdot$ are binary. Properties of the constants 0, 1 and of the operators $^*$, $+$, and $\cdot$ lead to identities on this algebra. Each regular expression denotes a language. $L$ is the function that assigns to each regular expression the regular language it denotes; $L : T_{reg}(A) \to reg(A^*)$ is defined as follows:
$$L(0) = \emptyset, \quad L(1) = \{\varepsilon\}, \quad L(a) = \{a\} \ \text{for all } a \in A,$$
$$L(F + G) = L(F) \cup L(G), \quad L(F \cdot G) = L(F) \cdot L(G), \quad L(F^*) = L(F)^*.$$
The following identities are classically used:
$$E + 0 = 0 + E = E, \quad E \cdot 0 = 0 \cdot E = 0, \quad E \cdot 1 = 1 \cdot E = E.$$
Let E be a regular expression. The set of letters occurring in E is denoted by $A_E$. To specify their position in the expression, letters are subscripted following the order of reading. The resulting expression is the linearized form of E, denoted by $\overline{E}$. For example, starting from $E = (a + b)^* aba + 1$, one obtains the linearized version $\overline{E} = (a_1 + b_2)^* a_3 b_4 a_5 + 1$. The subscripted letters are called positions; the set of all positions in the expression E is denoted by pos(E). For the previous example, we have pos(E) = $\{a_1, b_2, a_3, b_4, a_5\}$. If F is a subexpression of E, we denote by $pos_E(F)$ the subset of positions of E that are letters of F. We say that a regular expression is in linear form if each letter of the expression occurs only once. We denote by h the function that maps each position in pos(E) to the letter of $A_E$ that appears at this position in E.
The size of the regular expression E, denoted by | E |, is the number of nodes in its syntax tree. We call alphabetic width of E, denoted by || E ||, the number of occurrences of letters in the expression i.e., the cardinality of pos(E). The alphabetic width of the expression (a + b) * aba + 1 is equal to 5; its size is equal to 12.
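The two measures can be checked mechanically on the example above. The following is a minimal sketch, assuming an illustrative encoding of regular expressions as nested tuples (the encoding and the helper names `sym`, `cat`, `size` and `alphabetic_width` are not part of the paper):

```python
# Illustrative encoding (an assumption of this sketch):
#   ("0",), ("1",), ("sym", "a"), ("+", F, G), (".", F, G), ("*", F)

def size(e):
    """Number of nodes in the syntax tree of e."""
    return 1 + sum(size(c) for c in e[1:] if isinstance(c, tuple))

def alphabetic_width(e):
    """Number of occurrences of letters in e."""
    if e[0] == "sym":
        return 1
    return sum(alphabetic_width(c) for c in e[1:] if isinstance(c, tuple))

def sym(a):
    return ("sym", a)

def cat(*factors):
    """Left-associated concatenation of the given factors."""
    out = factors[0]
    for f in factors[1:]:
        out = (".", out, f)
    return out

# E = (a + b)* a b a + 1
E = ("+", cat(("*", ("+", sym("a"), sym("b"))), sym("a"), sym("b"), sym("a")), ("1",))

print(size(E), alphabetic_width(E))
```

The syntax tree has five letter leaves, one leaf for the constant 1, one star node, two union nodes and three concatenation nodes, which gives the size reported in the text.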
Notice that the alphabetic width and the size of a regular expression are independent parameters. Therefore, complexities are expressed w.r.t. both of these parameters. However, it is usual to preprocess the input expression in order to reduce its size and to make it proportional to its alphabetic width. In particular, for a regular expression E reduced w.r.t. the simplification rules of [16], we have $|E| = O(\|E\|)$. It is known that regular expressions can be transformed to star normal form (SNF) in linear time [16].
$\lambda(E)$ denotes the null term of E, that is, $\lambda(E) = 1$ if $\varepsilon \in L(E)$ and $\lambda(E) = 0$ otherwise. By T(E) we denote the syntax tree associated with the regular expression E. A node in T(E) is denoted by ν, and we write Nodes(E) for the set of nodes of T(E). If ν ∈ Nodes(E), then sym(ν), father(ν) and right(ν) denote respectively the symbol, the father and the right son of the node ν. If sym(ν) is an operator, $E_\nu$ denotes the subexpression that corresponds to the subtree rooted at ν.

Finite Automata and Recognizable Languages
A nondeterministic finite automaton (NFA) is a quintuple $\mathcal{A} = \langle Q, A, q_0, \delta, F \rangle$ where Q is a finite set of states, A is the alphabet, $q_0 \in Q$ is the initial state, $F \subseteq Q$ is the set of final states, and $\delta : Q \times (A \cup \{\varepsilon\}) \to 2^Q$ is the transition function. The size of an automaton $\mathcal{A}$, denoted by $|\mathcal{A}|$, is the number of its states. The automaton $\mathcal{A}$ is called deterministic (DFA) if there is only one initial state, $|\delta(q, \varepsilon)| = 0$ and $|\delta(q, a)| = 1$ for any $q \in Q$ and any $a \in A$. A path in $\mathcal{A}$ is a sequence $(q_i, a_i, q_{i+1})$, $i = 1, \dots, n$, of consecutive transitions. Its label is the word $w = a_1 a_2 \cdots a_n$. A word $w = a_1 a_2 \cdots a_n$ is recognized by the automaton $\mathcal{A}$ if there exists a path labeled w such that $q_1 = q_0$ and $q_{n+1} \in F$.
The language recognized by the automaton $\mathcal{A}$, denoted by $L(\mathcal{A})$, is the set of words it recognizes. The right language of a state q in the automaton $\mathcal{A}$, denoted by $\overrightarrow{L}_q(\mathcal{A})$, is obtained by setting q to be the initial state, i.e., $\overrightarrow{L}_q(\mathcal{A}) = \{w \in A^* \mid \delta(q, w) \cap F \neq \emptyset\}$. We say that $\mathcal{A}$ is acyclic if its underlying graph is acyclic. The language associated with an acyclic automaton is finite.
Let ∼ be an equivalence relation over Q. For q ∈ Q, [q] denotes the equivalence class of q w.r.t. ∼ and, for C ⊆ Q, C/ ∼ denotes the quotient set C/ ∼ = {[q]|q ∈ C}. We say that ∼ is right invariant w.r.t. A if and only if the following conditions hold:
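As an illustration of quotienting by a right-invariant relation, the sketch below assumes the usual formulation of right invariance: equivalent states agree on finality, and on every letter they reach the same set of equivalence classes. This formulation, the function names and the small ε-free automaton used as input are assumptions of the sketch rather than material quoted from the text.

```python
def is_right_invariant(delta, finals, cls):
    """cls maps each state to a class id; delta maps (state, letter) to a set of states."""
    letters = {a for (_, a) in delta}
    states = list(cls)
    for p in states:
        for q in states:
            if cls[p] != cls[q]:
                continue
            if (p in finals) != (q in finals):
                return False          # equivalent states must agree on finality
            for a in letters:
                succ_p = {cls[r] for r in delta.get((p, a), ())}
                succ_q = {cls[r] for r in delta.get((q, a), ())}
                if succ_p != succ_q:
                    return False      # and reach the same classes on each letter
    return True

def quotient(delta, initial, finals, cls):
    """Quotient automaton: states are class ids, transitions are lifted to classes."""
    qdelta = {}
    for (p, a), targets in delta.items():
        qdelta.setdefault((cls[p], a), set()).update(cls[r] for r in targets)
    return qdelta, cls[initial], {cls[f] for f in finals}

# Hand-built toy automaton with five states; every state is final.
delta = {
    (0, "a"): {1}, (0, "b"): {2, 4},
    (1, "a"): {1}, (1, "b"): {2, 4},
    (2, "a"): {1, 3}, (2, "b"): {2, 4},
    (3, "a"): {1, 3}, (3, "b"): {2, 4},
    (4, "a"): {1}, (4, "b"): {2, 4},
}
finals = {0, 1, 2, 3, 4}
cls = {0: 0, 1: 1, 2: 1, 3: 1, 4: 2}   # classes {0}, {1, 2, 3}, {4}

qdelta, q0, qfinals = quotient(delta, 0, finals, cls)
```

The pairwise check is quadratic in the number of states; it is meant to make the definition concrete, not to be efficient.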

Thompson Automaton
In [12], Thompson gave a linear time and space algorithm to convert a regular expression E into an NFA with ε-transitions, denoted by $T_E$. The recursive steps of the construction of the Thompson NFA are pictured in Figure 1. Thompson's NFA has some disadvantages when it is used in practice: it has many redundant states and ε-transitions, and its number of states is in $O(|E|)$, while other constructions offer NFAs with $O(\|E\|)$ states.
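The recursive construction can be sketched as follows. This is one common variant of Thompson's construction, written under the assumption that each subexpression gets a fresh initial and final state and that concatenation links the two sub-automata by an ε-transition; the exact shape drawn in Figure 1 may differ. The tuple encoding of expressions and all function names are illustrative.

```python
from itertools import count

def thompson(e, delta=None, fresh=None):
    """Return (initial, final, delta); delta is a list of (p, label, q), label None = ε."""
    if delta is None:
        delta, fresh = [], count()
    i, f = next(fresh), next(fresh)
    op = e[0]
    if op == "1":
        delta.append((i, None, f))
    elif op == "sym":
        delta.append((i, e[1], f))
    elif op == "+":
        for sub in e[1:]:
            si, sf, _ = thompson(sub, delta, fresh)
            delta += [(i, None, si), (sf, None, f)]
    elif op == ".":
        li, lf, _ = thompson(e[1], delta, fresh)
        ri, rf, _ = thompson(e[2], delta, fresh)
        delta += [(i, None, li), (lf, None, ri), (rf, None, f)]
    elif op == "*":
        si, sf, _ = thompson(e[1], delta, fresh)
        # entry, exit, skip, and the cyclic ε-transition back to the start of the body
        delta += [(i, None, si), (sf, None, f), (i, None, f), (sf, None, si)]
    # op == "0": no transition, so f is unreachable from i
    return i, f, delta

def eps_closure(states, delta):
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for (s, lab, t) in delta:
            if s == p and lab is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(e, word):
    i, f, delta = thompson(e)
    cur = eps_closure({i}, delta)
    for a in word:
        cur = eps_closure({t for (s, lab, t) in delta if s in cur and lab == a}, delta)
    return f in cur

a, b = ("sym", "a"), ("sym", "b")
# E = (a + b)* a b a + 1
E = ("+", (".", (".", (".", ("*", ("+", a, b)), a), b), a), ("1",))
```

Each node of the syntax tree contributes two states and at most four transitions, which is where the linear bound on the size of $T_E$ comes from.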
In the next section, we will present the construction of a reduced ε-free automaton, named equation automaton, sometimes called Antimirov automaton or derived terms automaton.

Equation Automaton
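The equation automaton was introduced for the first time by Mirkin in [1]. In 1996, Antimirov introduced the notion of partial derivatives and used it to define the equation automaton [2]. Champarnaud and Ziadi [6] defined the notion of canonical derivatives of a linear expression and constructed a new automaton called the c-continuation automaton. They also proved that this automaton is isomorphic to the position automaton, in the sense that the two automata have identical sets of states, identical initial and final states, and identical transitions (Theorem 6 in [6]). Using an equivalence relation over the set of states of the c-continuation automaton, they derive the equation automaton in quadratic time.
The definition of the equation automaton of a regular expression is based on that of the partial derivatives of regular expressions, which are sets of regular expressions over A. The partial derivative of E with respect to $a \in A$ is defined recursively on the structure of E as follows:
$$\partial_a(0) = \partial_a(1) = \emptyset, \qquad \partial_a(b) = \begin{cases} \{1\} & \text{if } a = b, \\ \emptyset & \text{otherwise,} \end{cases}$$
$$\partial_a(F + G) = \partial_a(F) \cup \partial_a(G), \qquad \partial_a(F^*) = \partial_a(F) \cdot F^*,$$
$$\partial_a(F \cdot G) = \begin{cases} \partial_a(F) \cdot G \cup \partial_a(G) & \text{if } \lambda(F) = 1, \\ \partial_a(F) \cdot G & \text{otherwise,} \end{cases}$$
where, for a set S of expressions, $S \cdot G = \{H \cdot G \mid H \in S\}$. The partial derivative of E with respect to the string $u \in A^*$ is denoted by $\partial_u(E)$ and recursively defined by $\partial_\varepsilon(E) = \{E\}$ and $\partial_{ua}(E) = \partial_a(\partial_u(E))$, where $\partial_a$ is extended to sets of expressions by union. The cardinality of the set D(E) of all partial derivatives of a regular expression E is less than or equal to $\|E\| + 1$.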

Example 2.
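Let us consider the regular expression $E = (a^* + ba^* + b^*)^*$. The partial derivatives of E are E, $a^* \cdot E$ and $b^* \cdot E$. The transitions of the equation automaton $\mathcal{E}_E$ are computed as follows:
$$\partial_a(E) = \{a^* \cdot E\}, \qquad \partial_b(E) = \{a^* \cdot E,\ b^* \cdot E\},$$
$$\partial_a(a^* \cdot E) = \{a^* \cdot E\}, \qquad \partial_b(a^* \cdot E) = \{a^* \cdot E,\ b^* \cdot E\},$$
$$\partial_a(b^* \cdot E) = \{a^* \cdot E\}, \qquad \partial_b(b^* \cdot E) = \{a^* \cdot E,\ b^* \cdot E\}.$$
The equation automaton $\mathcal{E}_E$ associated with E is shown in Figure 3. In the following, we recall the definition and properties of the c-continuation automaton. Next, we show how it is related to the equation automaton.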
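The example can be reproduced with a direct, non-optimized implementation of partial derivatives. The sketch below assumes the usual Antimirov rules, with concatenation taken modulo the identity 1·G = G, and builds the automaton whose states are the partial derivatives reachable from E; the tuple encoding and the function names are illustrative, and this naive exploration is not the efficient algorithm developed in the paper.

```python
def nullable(e):
    """True iff the empty word belongs to the language of e."""
    op = e[0]
    if op in ("1", "*"):
        return True
    if op in ("0", "sym"):
        return False
    if op == "+":
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])      # concatenation

def cat(f, g):
    """Concatenation modulo the identities 1·G = G and F·1 = F."""
    if f == ("1",):
        return g
    if g == ("1",):
        return f
    return (".", f, g)

def pd(e, a):
    """Set of partial derivatives of e with respect to the letter a."""
    op = e[0]
    if op in ("0", "1"):
        return set()
    if op == "sym":
        return {("1",)} if e[1] == a else set()
    if op == "+":
        return pd(e[1], a) | pd(e[2], a)
    if op == "*":
        return {cat(d, e) for d in pd(e[1], a)}
    left = {cat(d, e[2]) for d in pd(e[1], a)}    # op == "."
    return (left | pd(e[2], a)) if nullable(e[1]) else left

def equation_automaton(e, alphabet):
    """Explore the partial derivatives reachable from e."""
    states, todo, delta = {e}, [e], {}
    while todo:
        s = todo.pop()
        for a in alphabet:
            targets = pd(s, a)
            if targets:
                delta[(s, a)] = targets
            for t in targets - states:
                states.add(t)
                todo.append(t)
    finals = {s for s in states if nullable(s)}
    return states, delta, finals

a, b = ("sym", "a"), ("sym", "b")
# E = (a* + b a* + b*)*
E = ("*", ("+", ("+", ("*", a), (".", b, ("*", a))), ("*", b)))
states, delta, finals = equation_automaton(E, "ab")
```

Expressions are compared syntactically, so two derivatives are merged only when they are the same tree.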

C-Continuation Automaton
This automaton has been introduced by Champarnaud and Ziadi [6] to efficiently compute the equation automaton. Let us recall the notion of c-derivative, c-continuation and c-continuation automaton.

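Definition 1. (c-derivative with respect to a letter). The c-derivative of a regular expression E with respect to a letter a is the regular expression $d_a(E)$ defined by:
$$d_a(0) = d_a(1) = 0, \qquad d_a(x) = \begin{cases} 1 & \text{if } x = a, \\ 0 & \text{otherwise,} \end{cases}$$
$$d_a(F + G) = \begin{cases} d_a(F) & \text{if } d_a(F) \neq 0, \\ d_a(G) & \text{otherwise,} \end{cases} \qquad d_a(F \cdot G) = \begin{cases} d_a(F) \cdot G & \text{if } d_a(F) \neq 0, \\ \lambda(F) \cdot d_a(G) & \text{otherwise,} \end{cases}$$
$$d_a(F^*) = d_a(F) \cdot F^*.$$
The c-derivative with respect to a word $u = u_1 \cdots u_n$ is defined recursively by the rules: $d_\varepsilon(E) = E$ and $d_{u_1 \cdots u_n}(E) = d_{u_2 \cdots u_n}(d_{u_1}(E))$.

Theorem 2. (Theorem 4 in [6]). Let E be a linear regular expression and a be a letter from E. Then all non-zero c-derivatives of the form $d_{ua}(E)$, where u is an arbitrary word, are equal.

Theorem 2 allows us to define the c-continuation $c_a(E)$ of a in a linear expression E as the unique value of the non-zero c-derivatives $d_{ua}(E)$.

Proposition 1. (Proposition 6 in [6]). For every letter a of a linear expression E, the c-continuation $c_a(E)$ is such that:
$$c_a(a) = 1, \qquad c_a(F^*) = c_a(F) \cdot F^*,$$
$$c_a(F + G) = \begin{cases} c_a(F) & \text{if } a \text{ occurs in } F, \\ c_a(G) & \text{otherwise,} \end{cases} \qquad c_a(F \cdot G) = \begin{cases} c_a(F) \cdot G & \text{if } a \text{ occurs in } F, \\ c_a(G) & \text{otherwise.} \end{cases}$$

Corollary 1. (Corollary 5 in [6]). For every letter a of a linear expression E, the c-continuation $c_a(E)$ is either 1, or a subexpression of E, or a product of subexpressions.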
More precisely, for a linear regular expression E, we have $c_a(E) = H_0 \cdots H_k$, where $H_i$ is a subexpression of E for all $0 \le i \le k$.
We now consider a regular expression E over A. Let $\overline{E}$ be the linearized form of E over pos(E) and h be the mapping from pos(E) onto $A_E$.
To simplify the notation for a regular expression E, we adopt the convention that $c_0(E) = d_\varepsilon(\overline{E}) = \overline{E}$ and that $c_x(E)$ denotes $c_x(\overline{E})$.

Definition 2. (c-continuation automaton)
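The c-continuation automaton of E, $\mathcal{C}_E = \langle Q, A_E, i, \delta, F \rangle$, is defined by:
• $Q = \{(x, c_x(E)) \mid x \in pos(E) \cup \{0\}\}$,
• $i = (0, c_0(E))$,
• $F = \{(x, c_x(E)) \mid \lambda(c_x(E)) = 1\}$,
• $\delta((x, c_x(E)), a) = \{(y, c_y(E)) \mid h(y) = a \text{ and } d_y(c_x(E)) \equiv c_y(E)\}$, for all $a \in A_E$.
We note that the number of states of $\mathcal{C}_E$ is exactly $\|E\| + 1$.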

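Example 3. Let us consider the regular expression $E = (a^* + ba^* + b^*)^*$. Its linearized form is $\overline{E} = (a_1^* + b_2 a_3^* + b_4^*)^*$, and the c-continuations of E are as follows:
$$c_0(E) = \overline{E}, \quad c_{a_1}(E) = a_1^* \cdot \overline{E}, \quad c_{b_2}(E) = a_3^* \cdot \overline{E}, \quad c_{a_3}(E) = a_3^* \cdot \overline{E}, \quad c_{b_4}(E) = b_4^* \cdot \overline{E}.$$
The outgoing transitions from the state $(0, c_0(E))$ are computed using the c-derivatives of $c_0(E)$ as follows:
$$d_{a_1}(c_0(E)) = a_1^* \cdot \overline{E} = c_{a_1}(E), \quad d_{b_2}(c_0(E)) = a_3^* \cdot \overline{E} = c_{b_2}(E), \quad d_{a_3}(c_0(E)) = 0, \quad d_{b_4}(c_0(E)) = b_4^* \cdot \overline{E} = c_{b_4}(E).$$
Then we get the c-continuation automaton $\mathcal{C}_E$ in Figure 4.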

Equation Automaton as a Quotient of C-Continuation Automaton
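Champarnaud and Ziadi [6] have proved that the equation automaton is a quotient of the c-continuation automaton. Let us consider the equivalence relation $\equiv_e$ defined by
$$(x, c_x(E)) \equiv_e (y, c_y(E)) \Leftrightarrow h(c_x(E)) \equiv h(c_y(E)).$$
Sometimes we write $x \equiv_e y \Leftrightarrow h(c_x(E)) \equiv h(c_y(E))$.
Moreover, if two states are equivalent w.r.t. $\equiv_e$, then they are either both final or both non-final. The equivalence class of the state $(x, c_x(E))$ is represented by $C_x = h(c_x(E))$. Since the relation $\equiv_e$ is right-invariant, we can define the quotient automaton $\mathcal{C}_E / \equiv_e = \langle Q_{\equiv_e}, A_E, q_0, \delta, F \rangle$ as follows: $Q_{\equiv_e} = \{C_x \mid x \in pos(E) \cup \{0\}\}$, $q_0 = C_0$, $F = \{C_x \mid \lambda(c_x(E)) = 1\}$, and $\delta(C_x, a) = \{C_y \mid h(y) = a \text{ and } d_y(c_x(E)) \equiv c_y(E)\}$.
Theorem 3. (Theorem 10 in [6]). Let E be a regular expression. The automaton $\mathcal{C}_E / \equiv_e$ deduced from the c-continuation automaton is isomorphic to the equation automaton $\mathcal{E}_E$.
We note that the number of states of $\mathcal{E}_E$ is bounded by $\|E\| + 1$.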

Example 4.
Let us consider the regular expression $E = (a^* + ba^* + b^*)^*$ from Example 1. Applying the function h, which removes the indices from the letters, to the c-continuations of E yields three $\equiv_e$-equivalence classes: $\{c_0(E)\}$, $\{c_{a_1}(E), c_{b_2}(E), c_{a_3}(E)\}$ and $\{c_{b_4}(E)\}$. The c-continuation automaton $\mathcal{C}_E$ and the quotient automaton $\mathcal{C}_E / \equiv_e$, which is isomorphic to the equation automaton $\mathcal{E}_E$, are shown in Figure 5.
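The classes of this example can be recomputed with a short sketch. It assumes the inductive characterization of c-continuations in the form $c_a(a) = 1$, $c_a(F \cdot G) = c_a(F) \cdot G$ when a occurs in F (and $c_a(G)$ otherwise), $c_a(F + G) = c_a(F)$ or $c_a(G)$ depending on where a occurs, and $c_a(F^*) = c_a(F) \cdot F^*$. Positions are encoded as indexed symbols, and all names are illustrative. The computation copies subexpressions and is therefore quadratic; it only serves to make the equivalence concrete.

```python
def cat(f, g):
    """Concatenation modulo the identity 1·G = G."""
    return g if f == ("1",) else (".", f, g)

def continuations(e):
    """Map every position of the linear expression e to its c-continuation."""
    op = e[0]
    if op == "sym":
        return {e: ("1",)}
    if op in ("0", "1"):
        return {}
    if op == "+":
        return {**continuations(e[1]), **continuations(e[2])}
    if op == ".":
        out = {x: cat(c, e[2]) for x, c in continuations(e[1]).items()}
        out.update(continuations(e[2]))
        return out
    return {x: cat(c, e) for x, c in continuations(e[1]).items()}   # star

def h(e):
    """Remove the indices from the letters of e."""
    if e[0] == "sym":
        return ("sym", e[1])
    return (e[0],) + tuple(h(c) for c in e[1:])

def classes(e):
    """Group the positions (and 0) whose c-continuations coincide after removing indices."""
    conts = {0: e, **continuations(e)}
    groups = {}
    for x, c in conts.items():
        groups.setdefault(h(c), []).append(x)
    return list(groups.values())

a1, b2, a3, b4 = ("sym", "a", 1), ("sym", "b", 2), ("sym", "a", 3), ("sym", "b", 4)
# Linearized form of (a* + b a* + b*)*
E_lin = ("*", ("+", ("+", ("*", a1), (".", b2, ("*", a3))), ("*", b4)))
groups = classes(E_lin)
```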

Allauzen and Mohri's Algorithm
In [8], Allauzen and Mohri compute the equation automaton from the Thompson automaton of a regular expression E in $O(m \log m + m^2)$ time. Their algorithm combines ε-transition removal and Hopcroft's algorithm for DFA minimization [13], applied to the classical Thompson automaton. In what follows, we briefly describe their method. We denote by $\widetilde{T}_E$ the automaton obtained by recursively marking some of the ε-transitions of the Thompson automaton $T_E$. Allauzen and Mohri have shown that the equation automaton can be obtained by marking some ε-transitions of the Thompson automaton and then applying two classical automata operations: epsilon removal, denoted by the function rmeps (resp. $\widetilde{\mathrm{rmeps}}$ for marked epsilon removal), and Hopcroft's algorithm for DFA minimization [13], denoted by $\min_B$.
Note that after removing the unmarked ε-transitions from the automaton $\widetilde{T}_E$, we obtain a deterministic finite automaton $\mathrm{rmeps}(\widetilde{T}_E)$. Hopcroft's algorithm for DFA minimization is then applied to derive the automaton $\min_B(\mathrm{rmeps}(\widetilde{T}_E))$, whose set of states is in bijection with the set of partial derivatives of E. Finally, to compute the transitions of the equation automaton from $\min_B(\mathrm{rmeps}(\widetilde{T}_E))$, the marked ε-transitions are removed using the $\widetilde{\mathrm{rmeps}}$ operation.
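For readers unfamiliar with epsilon removal, the following sketch shows the textbook forward-closure version of the operation: a state p receives a transition on a letter a to r whenever some state in the ε-closure of p has such a transition, and p becomes final when its closure contains a final state. It is an assumption-laden illustration with hypothetical names and data layout, not the rmeps routine of [8], which in particular also has to deal with marked ε-transitions.

```python
def eps_closure(p, eps):
    """States reachable from p using ε-transitions only; eps maps state -> set of states."""
    seen, stack = {p}, [p]
    while stack:
        for r in eps.get(stack.pop(), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def rmeps(delta, eps, initial, finals):
    """Return (new_delta, new_finals) of an equivalent ε-free automaton.
    delta maps (state, letter) to a set of states."""
    states = {initial} | set(eps) | {p for (p, _) in delta}
    states |= {r for ts in delta.values() for r in ts}
    states |= {r for ts in eps.values() for r in ts}
    new_delta, new_finals = {}, set()
    for p in states:
        closure = eps_closure(p, eps)
        if closure & finals:
            new_finals.add(p)
        for (q, a), targets in delta.items():
            if q in closure:
                new_delta.setdefault((p, a), set()).update(targets)
    return new_delta, new_finals

# Toy automaton: 0 -ε-> 1, 1 -a-> 2, 2 -ε-> 0, with final state 2 (it recognizes a+).
delta = {(1, "a"): {2}}
eps = {0: {1}, 2: {0}}
new_delta, new_finals = rmeps(delta, eps, 0, {2})
```

In this version the cost is dominated by the closures, one per state, which is the source of the quadratic term in the bounds discussed above.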

Efficient Conversion Algorithm
In this section, we show that the equation automaton $\mathcal{E}_E$ of a regular expression E can be deduced from the associated Thompson automaton in $O(|E| \cdot |Q_{\equiv_e}|)$ time, where $|Q_{\equiv_e}|$ denotes the number of states of $\mathcal{E}_E$. Algorithm 1 summarizes the different steps of our approach.

Algorithm 1. Computation of the equation automaton.
Input: The Thompson automaton $T_E = \langle Q, A_E, I, \delta, F \rangle$ associated with a regular expression E.
Output: The equation automaton $\mathcal{E}_E$ associated with E.

/* Computation of states */
Compute $Id(T_E)$: subexpression identification over the states of $T_E$.
• Define the sub-automaton $Id(T_E)$ by recursively marking some ε-transitions of $T_E$ according to the following rules:
  - if $E = F + G$, then mark by $\varepsilon_{l+}$ (resp. $\varepsilon_{r+}$) the ε-transition from the initial state I to $I_{T_F}$ (resp. $I_{T_G}$);
  - if $E = F^*$, then mark by $\varepsilon_{1*}$ the ε-transitions from the initial state I (resp. the final state $F_{T_F}$) to $I_{T_F}$ (resp. F), mark by $\varepsilon_{2*}$ the ε-transition from the initial state I to F, and temporarily disable the ε-transition from $F_{T_F}$ to $I_{T_F}$.
• Compute the function N(q) that maps each state $q \in Q$ to a unique integer identifying the associated subexpression $E_q$, if it exists.
Compute $C_{\equiv_e}(Id(T_E))$:
• Compute the pseudo-continuations of all position states of $Id(T_E)$.
• Merge equivalent states having the same pseudo-continuation.
For convenience, we assume that the k states of a given finite automaton are identified by the integers $1, \dots, k$.
From Corollary 1, the c-continuation $c_x(E) = H_0 \cdots H_l$ associated with a position x is a concatenation of distinct subexpressions $H_i$ of E, possibly reduced to a single subexpression or to 1. In the Thompson automaton $T_E$, we can associate a position x with a particular state q, called a position state, and define the associated pseudo-continuation $C(q) = N(H_0) \cdots N(H_l)$, where $N(H_i)$ denotes the integer that identifies the initial state of the Thompson automaton $T_{H_i}$. The first step of our algorithm, Compute $Id(T_E)$, consists in computing the function $N(\cdot)$ such that for two subexpressions $H_i$ and $H_j$ of E, we have $N(H_i) = N(H_j) \Leftrightarrow H_i \equiv H_j$. This step can be done using a special marking of the ε-transitions of $T_E$ that makes it acyclic and deterministic, and such that the right languages of its states represent the structure of the corresponding subexpressions. In the next step, Compute $C_{\equiv_e}(Id(T_E))$, we re-mark the ε-transitions such that the resulting automaton is acyclic and deterministic and the right language of a position state in $Id(T_E)$ represents a pseudo-continuation. After that, one can merge equivalent position states having the same right language. The final step is the computation of the final states and transitions of the equation automaton by applying an epsilon removal operation, denoted by rmeps(·), to the automaton resulting from the previous step.
In what follows, we show that the equation automaton $\mathcal{E}_E$ can be computed efficiently from the Thompson automaton $T_E$ as $\mathrm{rmeps}(C_{\equiv_e}(Id(T_E)))$.

Computation of States
In the following, we show that the computation of the relation $\equiv_e$ over the states of the Thompson automaton can be performed in linear time w.r.t. the size of the expression using the minimization of an acyclic deterministic finite automaton. This minimization can be performed efficiently in $O(|E|)$ time using Bubenzer's algorithm [17,18].
Before computing the equivalence classes $C_{\equiv_e}$ over the states of the Thompson automaton, we perform a preprocessing step to identify all identical subexpressions of E. We show below that this identification can be done in $O(|E|)$ time.

Sub-Expressions Identification
Let Exp be the set of all subexpressions of E. In this preprocessing step, we mark each state in the Thompson automaton by a unique letter in the set $\{1, 2, \dots, |Exp|\}$.
Let us define a bijection N between the set Exp and the finite set of letters $\{1, 2, \dots, |Exp|\}$. Consequently, if $E_1$ and $E_2$ are two subexpressions of E, then we have $N(E_1) = N(E_2) \Leftrightarrow E_1 \equiv E_2$. Based on the parsing method introduced in Section 6 of [19], which derives an equivalent regular expression from a Thompson automaton, each subexpression $H_i$ of E is associated with an integer identifying the initial state $I_{T_{H_i}}$ in the Thompson automaton $T_E$. Let q be a state in $T_E$; we denote by $E_q$ the subexpression associated with q, if it exists. For brevity, N(q) stands for $N(E_q)$.
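The effect of the bijection N can be illustrated directly on the syntax tree. The sketch below numbers subexpressions by hash-consing in post-order, so that two subexpressions receive the same integer exactly when they are syntactically identical. This is a different technique from the automaton-based identification developed in this section, shown only to make the intended result concrete; the tuple encoding and the function name are assumptions.

```python
def number_subexpressions(e, table=None):
    """Return (N(e), table), where table maps a node signature to its integer.
    A signature is the operator, the letter (for symbols) and the numbers of the children."""
    if table is None:
        table = {}
    kids = tuple(number_subexpressions(c, table)[0] for c in e[1:] if isinstance(c, tuple))
    letters = tuple(c for c in e[1:] if not isinstance(c, tuple))
    key = (e[0],) + letters + kids
    if key not in table:
        table[key] = len(table) + 1      # next unused integer in {1, ..., |Exp|}
    return table[key], table

a, b = ("sym", "a"), ("sym", "b")
a_star = ("*", a)
# E = (a* + b a* + b*)*
E = ("*", ("+", ("+", a_star, (".", b, ("*", a))), ("*", b)))

n_E, table = number_subexpressions(E)
n_left_a_star, _ = number_subexpressions(a_star, table)
n_right_a_star, _ = number_subexpressions(("*", a), table)
```

With dictionary lookups counted as constant time on average, each node is processed once, which matches the linear bound targeted by the preprocessing step.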
In the following, we show that computing the function N over the states of $T_E$ amounts to minimizing an acyclic deterministic sub-automaton of the Thompson automaton, $Id(T_E) = \langle Q', A_E, I, \delta', F \rangle$, defined as follows, where l (resp. r) denotes left (resp. right):
• $Q' = \{(q, N(q)) \mid q \in Q\}$, i.e., a state in $T_E$ is augmented by the letter N(q).
• The transition function $\delta'$ is obtained from that of the Thompson automaton by indexing the ε-transitions according to the marking rules of Algorithm 1.
Notice that this automaton is an acyclic deterministic sub-automaton of the Thompson automaton in which the ε-transitions are indexed and the cyclic transitions arising in the case $E = F^*$ are temporarily disabled.
To compute identical subexpressions, we define the equivalence relation ∼ over the states of $Id(T_E)$ as follows: $q \sim q' \Leftrightarrow E_q \equiv E_{q'}$.
Lemma 1. Let q and q′ be two states in $Id(T_E)$. We have: $q \sim q' \Leftrightarrow \overrightarrow{L}_q(Id(T_E)) = \overrightarrow{L}_{q'}(Id(T_E))$.
Proof. Obvious, by construction.
Thus, the equivalence relation ∼ coincides with the Myhill-Nerode equivalence relation [20,21] over the states of $Id(T_E)$. Since the automaton $Id(T_E)$ is deterministic and acyclic, its minimization using Bubenzer's algorithm [17] requires $O(|E|)$ time and space.
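Merging states with equal right languages in an acyclic DFA can be sketched with a bottom-up signature computation: a state is characterized by its finality and by the classes of its successors, and states with equal signatures are merged through a register. This is a generic memoized depth-first sketch in the spirit of classical acyclic minimization, given under the assumption that the automaton is trim; it is not Bubenzer's algorithm [17] itself, and the data layout is hypothetical.

```python
def minimize_acyclic(delta, finals, states):
    """Return a dict state -> class id; equal ids mean equal right languages.
    delta maps each state to a dict label -> successor (deterministic, acyclic).
    Assumes every state can reach a final state (trim automaton)."""
    cls, register = {}, {}

    def visit(q):
        if q in cls:
            return cls[q]
        # Signature: finality plus the sorted list of (label, class of the successor).
        succ = tuple(sorted((a, visit(r)) for a, r in delta.get(q, {}).items()))
        sig = (q in finals, succ)
        cls[q] = register.setdefault(sig, len(register))
        return cls[q]

    for q in states:
        visit(q)
    return cls

# Toy acyclic DFA: 0 -a-> 1 -c-> 3 and 0 -b-> 2 -c-> 4, with final states 3 and 4.
delta = {0: {"a": 1, "b": 2}, 1: {"c": 3}, 2: {"c": 4}}
cls = minimize_acyclic(delta, {3, 4}, [0, 1, 2, 3, 4])
```

The recursion depth equals the length of the longest path, so a very deep automaton would call for an explicit stack; sorting the outgoing labels adds a logarithmic factor that a production implementation would avoid.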

Example 5.
Figure 6 shows the automaton $Id(T_E)$ obtained from $T_E$ after performing the subexpression identification step for the regular expression $E = (a^* + ba^* + b^*)^*$.
As shown in Figure 6, for the states 3 and 9 we have $\overrightarrow{L}_3(Id(T_E)) = \overrightarrow{L}_9(Id(T_E))$. As a consequence, we have $E_3 \equiv E_9$.
where ⊥ is an artificial node such that $f(\bot) = \bot$.
Using Proposition 5, the computation of the set of states requires $O(|E|^3)$ time and space. This is due to the fact that the size of a c-continuation is in $O(|E|^2)$. In order to reduce this complexity, we use a modified definition of the pseudo-continuation introduced in [6], over an acyclic deterministic sub-automaton of the Thompson automaton, denoted by $C_{\equiv_e}(Id(T_E))$. When merging $\equiv_e$-equivalent states over $Id(T_E)$, we get the automaton $C_{\equiv_e}(Id(T_E))$. This step requires linear time w.r.t. the size of E using Bubenzer's algorithm [17], since the automaton $Id(T_E)$ is acyclic and deterministic.
In the following, a state $(q, N(q))$ in the automaton $Id(T_E)$ is called a position state if there exist $a \in A_E$ and $(q', N(q')) \in Q'$ such that $\delta'((q', N(q')), a) = (q, N(q))$. The state $(0, N(0))$ is also considered as a position state.
Each position state is associated with a unique position in $pos(E) \cup \{0\}$, and can therefore be associated with a c-continuation. Let $(q, N(q))$ and $(q', N(q'))$ be two position states associated with the positions x and x′ in pos(E). One can extend the $\equiv_e$ relation to the position states in $Id(T_E)$ as follows: $(q, N(q)) \equiv_e (q', N(q')) \Leftrightarrow h(c_x(E)) \equiv h(c_{x'}(E))$. Next, we prove that the computation of the equivalence classes $C_{\equiv_e}$ can be performed over $Id(T_E)$ in linear time w.r.t. the size of the regular expression using the notion of pseudo-continuations.
For brevity, a state $(q, N(q))$ in $Id(T_E)$ is denoted by q. We denote by C(q) the pseudo-continuation associated with the position state q, which is an implicit representation of its c-continuation $c_x(E)$, where x is the position letter of q. We show that the computation of the equivalence classes $C_{\equiv_e}$ reduces to the computation of the pseudo-continuations C(q) over a particular marking of the ε-transitions of the automaton $Id(T_E)$.
Definition 3. The pseudo-continuation C(q) associated with a position state $(q, N(q)) \in Q'$ in $Id(T_E)$ is defined recursively by Formula (7).
In order to compute efficiently the set of pseudo-continuations associated with the position states in $Id(T_E)$, we define an acyclic deterministic sub-automaton $C_{\equiv_e}(Id(T_E)) = \langle Q_{\equiv_e}, A_E, I, \delta'', F \rangle$ of the Thompson automaton as follows:
• $Q_{\equiv_e} = \{(q, C(q)) \mid (q, N(q)) \in Q'\}$, i.e., a state in $Id(T_E)$ is replaced by $(q, C(q))$.
• The transition function $\delta''$ is obtained by re-marking the ε-transitions of $Id(T_E)$ so that the right language of each position state represents its pseudo-continuation.
Let us define the equivalence relation ≈ over the position states of $C_{\equiv_e}(Id(T_E))$ as follows: $(q, C(q)) \approx (q', C(q')) \Leftrightarrow C(q) \equiv C(q')$.
Let h be the mapping that sends a letter $\varepsilon_i$ to the letter i. By construction, the following proposition holds.
Proposition 6. Let $(q, C(q))$ be a position state in $C_{\equiv_e}(Id(T_E))$. We have: $h(\overrightarrow{L}_q(C_{\equiv_e}(Id(T_E)))) = C(q)$.
Example 6. Let us consider the Thompson automaton $T_E$ defined in the previous examples. Figure 7 shows the automaton derived from $Id(T_E)$ after the computation of the pseudo-continuations of the position states.
Notice that dotted ε-transitions are temporarily disabled and dashed ones are temporarily added. The position states in the automaton $Id(T_E)$ are {0, 5, 8, 11, 16}. Their pseudo-continuations are computed over the automaton $Id(T_E)$ according to the definition of a pseudo-continuation (see Formula (7)).
The following proposition is fundamental to prove that the equivalence relation $\equiv_e$ defined using the notion of c-continuation is the same as the one obtained using the pseudo-continuations C(q).

Proposition 7.
Let q (resp. q′) be a position state associated with a position $x \in pos(E)$ (resp. $x' \in pos(E)$) in $C_{\equiv_e}(Id(T_E))$. One has: $C(q) \equiv C(q') \Leftrightarrow h(c_x(E)) \equiv h(c_{x'}(E))$.
As a consequence, the following proposition holds.
Proposition 8. Let q and q′ be two position states in $C_{\equiv_e}(Id(T_E))$. One has: $q \equiv_e q' \Leftrightarrow C(q) \equiv C(q')$.
Theorem 5. Let E be a regular expression and $T_E$ the associated Thompson automaton. The relation $\equiv_e$ can be computed over $C_{\equiv_e}(Id(T_E))$ in $O(|E|)$ time.
Proof. From Proposition 8, one can deduce that computing the equivalence relation $\equiv_e$ reduces to applying the Myhill-Nerode relation to the states of the automaton $C_{\equiv_e}(Id(T_E))$. By definition, the latter is acyclic and deterministic. Hence, its minimization using Bubenzer's algorithm [17] requires $O(|E|)$ time and space.

Computation of Transitions and Final States
After merging $\equiv_e$-equivalent states in the previous step, we obtain a reduced automaton $C_{\equiv_e}(Id(T_E))$ having the same set of states as the equation automaton. To compute the transition function, we first re-enable on $C_{\equiv_e}(Id(T_E))$ the cyclic transitions that were disabled in the case $E = F^*$.
Let $Q_{\equiv_e}$ be the set of states of $C_{\equiv_e}(Id(T_E))$. Recall that the epsilon removal operation is denoted by rmeps(p) for a state $p \in Q$, and that $\mathrm{rmeps}(\mathcal{A})$ denotes the automaton obtained after removing the marked and non-marked ε-transitions from the automaton $\mathcal{A}$.
As a consequence of Lemma 5 from [8], the following lemma holds.

Lemma 2.
(Lemma 5 in [8]). Let q and q′ be two position states in $Q_{\equiv_e}$ associated respectively with the positions x and x′ in $pos(E) \cup \{0\}$. We have: $q \in \mathrm{rmeps}(q')$ iff $h(c_x(E)) \in \partial_a(h(c_{x'}(E)))$, for some $a \in A_E$.
The set of destination states of the outgoing transitions from a state $q \in Q_{\equiv_e}$ is then equal to $\{p \in Q_{\equiv_e} \mid p \in \mathrm{rmeps}(q) \text{ and } p \text{ is a position state}\}$.
Lemma 3. Let q be a position state in $Q_{\equiv_e}$ associated with a position x in $pos(E) \cup \{0\}$. We have: $F \in \mathrm{rmeps}(q)$ iff $\lambda(c_x(E)) = 1$.

Proposition 9.
We have $\mathcal{E}_E = \mathrm{rmeps}(C_{\equiv_e}(Id(T_E)))$.
Example 8. (Continued). Let us consider the automaton $C_{\equiv_e}(Id(T_E))$ of Example 7. The final states and the transitions of the equation automaton are computed over $C_{\equiv_e}(Id(T_E))$ using the epsilon removal operation rmeps as follows:
• There are two paths in $C_{\equiv_e}(Id(T_E))$ from the state 0 to the state {5, 8, 11}, labeled respectively by $\varepsilon \cdot \varepsilon \cdot \varepsilon \cdot \varepsilon \cdot a$ and $\varepsilon \cdot \varepsilon \cdot \varepsilon \cdot b$, then {5, 8
Since there are $O(|E|)$ states in $C_{\equiv_e}(Id(T_E))$ and the operation $\mathrm{rmeps}(C_{\equiv_e}(Id(T_E)))$ is performed on exactly $|Q_{\equiv_e}|$ states, the following theorem holds.
Theorem 6. Let E be a regular expression. The equation automaton of E can be computed in $O(|E| \cdot |Q_{\equiv_e}|)$ time.

Conclusions
In this paper, we presented a fast construction of the equation automaton of a regular expression from its associated Thompson automaton. The time complexity of our algorithm is at least as favorable as that of the best previously known algorithm. It is based on the minimization of acyclic deterministic finite automata and on epsilon removal operations. This allowed us to construct the equation automaton in $O(|E| \cdot |Q_{\equiv_e}|)$ time and space, where $|Q_{\equiv_e}|$ denotes the number of states of the produced automaton. The implementation of the proposed algorithm is available in the following repository: https://github.com/FaissalOuardi/Equation-automaton (accessed on 27 May 2021).