Answer Set Programming for Regular Inference

We propose an approach to the inductive synthesis of non-deterministic finite automata (NFAs) that is based on answer set programming (ASP) solvers. To that end, we explain how an NFA and its response to input samples can be encoded as rules in a logic program. We then ask an ASP solver to find an answer set for the program, from which we extract the automaton of the required size. We conduct a series of experiments on benchmark sets, using our implementation of the approach. The results show that our method outperforms, in terms of CPU time, a SAT approach and other exact algorithms on all benchmarks.


Introduction
The main problem investigated in this paper is as follows. Given a finite alphabet Σ, two finite subsets S+, S− ⊆ Σ*, and an integer k > 0, find a k-state NFA A that recognizes a language L ⊆ Σ* such that S+ ⊆ L and S− ⊆ Σ* − L. In other words, we are dealing with the process of learning a finite state machine based on a set of labeled strings, thus building a model reflecting the characteristics of the observations. Machine learning of automata and grammars has a wide range of applications in such fields as syntactic pattern recognition, computational biology, systems modeling, natural language acquisition, and knowledge discovery (see [1][2][3][4][5]).
It is well known that NFA or regular expression minimization is computationally hard: it is PSPACE-complete [6]. Moreover, even if we specify the regular language by a deterministic finite automaton (DFA), the problem remains PSPACE-complete [7]. Angluin [8] showed that there is no polynomial-time algorithm for finding a shortest regular expression compatible with arbitrary given data (assuming P ≠ NP). We therefore expect the complexity of inferring a minimal-size NFA that matches a labeled set of input strings to be exponential.
For the deterministic case, the problem is NP-complete [9]. Besides, in contrast to NFAs, for a given regular language there is always exactly one minimum-size DFA (i.e., there is no other non-isomorphic DFA with the same minimal number of states). Is NFA induction, then, harder than DFA induction? To answer this, let us compare the sizes of the search spaces, expressed by the number of automata with a fixed number of states. Let c be the size of the alphabet and k the number of automaton states. The number of pairwise non-isomorphic minimal k-state DFAs over a c-letter alphabet is of order k · 2^(k−1) · k^((c−1)k). The number of NFAs such that every state is reachable from the start state is of order 2^(ck²) [10]. Thus, switching from determinism to non-determinism enlarges the search space enormously. On the other hand, it is well known that NFAs are more compact: the minimal DFA can even be exponentially larger than a minimal NFA for the same language.
We compare our approach with four exact parallel algorithms, RA-PS1, RA-PS2, OA-PS1, and OA-PS2, and with a SAT encoding given in [5]; all of them are thoroughly described in Subsection 4.2. To enable comparisons with other methods in the future, the Python implementation of our approach is made available via GitHub. The Python scripting language is used only for generating the appropriate AnsProlog facts and running Clingo, an ASP solver.
Another line of research concerns the induction of DFAs. The original idea of SAT encoding in this context comes from the work of Heule and Verwer [12]. Their work, in turn, was based on the idea of transforming DFA identification into graph coloring, which was proposed by Coste and Nicolas [13]. Zakirzyanov et al. [14] proposed BFS-based symmetry-breaking predicates, instead of the original max-clique predicates, which improved the translation-to-SAT technique; the improvement was demonstrated in experiments on randomly generated input data. The core idea is as follows. Consider a graph G whose vertices are the states of an initial automaton, with edges between vertices whose states cannot be merged. Finding a minimum-size DFA is then equivalent to coloring this graph with a minimum number of colors. The graph coloring constraints, in turn, can be efficiently encoded into SAT, as shown by Walsh [15].
In a more recent approach, satisfiability modulo theories (SMT) is explored. Suppose that A = (Σ, Q = {0, 1, . . . , K − 1}, s = 0, F, δ) is a target automaton and P is the set of all prefixes of S+ ∪ S−. An SMT encoding proposed by Smetsers et al. [16] uses four functions: δ : Q × Σ → Q, m : P → Q, λ_A : Q → {⊥, ⊤}, and λ_T : S+ ∪ S− → {⊥, ⊤}, where {⊥, ⊤} represents logical {false, true}, together with five constraints, the first of which is m(ε) = 0, while the remaining ones tie the prefix map m to δ and to the labelings. They implemented the encoding using Z3Py, the Python front-end of the efficient SMT solver Z3.

This paper is organized into five sections. In Section 2, we present the necessary definitions and facts originating from automata, formal languages, and declarative problem-solving. Section 3 describes our inference algorithm based on solving an AnsProlog program. Section 4 shows the experimental results of our approach and describes in detail all reference methods. Concluding comments are made in Section 5.

Preliminaries
We assume the reader to be familiar with basic regular language and automata theory, for example from [17], so we introduce only the notations and notions used later in the paper.

Words and Languages
An alphabet Σ is a finite, non-empty set of symbols. A word w is a finite sequence of symbols chosen from an alphabet. The length of a word w is denoted by |w|. The empty word ε is the word of length zero. Let x and y be words. Then xy denotes the concatenation of x and y, that is, the word formed by making a copy of x and following it by a copy of y. As usual, Σ* denotes the set of words over Σ. A word w is called a prefix of a word u if there is a word x such that u = wx. It is a proper prefix if x ≠ ε. A set of words taken from some Σ*, where Σ is a particular alphabet, is called a language.
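In code, these definitions amount to the following trivial sketch (the function names are illustrative):

```python
# A trivial sketch of the definitions: u = wx for some word x makes w
# a prefix of u, and a proper prefix additionally requires x != ε.
def is_prefix(w, u):
    return u[: len(w)] == w

def is_proper_prefix(w, u):
    return is_prefix(w, u) and w != u   # i.e., the leftover x is non-empty
```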
A sample S is an ordered pair S = (S+, S−), where S+ and S− are finite languages with an empty intersection (i.e., having no common word). S+ is called the positive part of S (the examples), and S− the negative part of S (the counter-examples).

Non-Deterministic Finite Automata
A non-deterministic finite automaton (NFA) is a five-tuple A = (Σ, Q, s, F, δ), where Σ is an alphabet, Q is a finite set of states, s ∈ Q is the initial state, F ⊆ Q is a set of final states, and δ is a relation from Q × Σ to Q. Members of δ are called transitions. A transition ((q, a), r) ∈ δ, with q, r ∈ Q and a ∈ Σ, is usually written as r ∈ δ(q, a). The relation δ specifies the moves: the meaning of r ∈ δ(q, a) is that automaton A in the current state q reads a and can move to state r. If for given q and a there is no r such that ((q, a), r) ∈ δ, the automaton stops, and we can assume it enters a rejecting state. Ending in a state that is not final is also regarded as rejecting, although on the way such a state may serve merely as an intermediate state.
It is convenient to define δ̂ as a relation from Q × Σ* to Q by the following recursion: ((q, ya), r) ∈ δ̂ if ((q, y), p) ∈ δ̂ and ((p, a), r) ∈ δ, where a ∈ Σ, y ∈ Σ*, and requiring ((t, ε), t) ∈ δ̂ for every state t ∈ Q. The language accepted by an automaton A is then L(A) = {x ∈ Σ* : ((s, x), q) ∈ δ̂ for some q ∈ F}. Two automata are equivalent if they accept the same language. Let A = (Σ, Q, s, F, δ) be an NFA. Then we will say that x ∈ Σ* is: (a) recognized by accepting (or accepted) if there is q ∈ F such that ((s, x), q) ∈ δ̂, (b) recognized by rejecting if there is q ∈ Q − F such that ((s, x), q) ∈ δ̂, and (c) rejected if it is not accepted.
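As a quick illustration (a sketch, not our implementation), the relation δ̂ and the acceptance test can be coded directly, with δ stored as a map from (state, symbol) pairs to sets of states:

```python
# A minimal NFA sketch: delta maps (state, symbol) to a set of states.
def delta_hat(delta, states, word):
    """All states reachable from `states` by reading `word` (i.e., δ̂)."""
    current = set(states)
    for a in word:
        current = {r for q in current for r in delta.get((q, a), set())}
    return current

def accepts(nfa, word):
    """True iff some path for `word` ends in a final state."""
    sigma, states, start, final, delta = nfa
    return bool(delta_hat(delta, {start}, word) & final)

# Example: an NFA over {a, b} accepting exactly the words ending in 'ab'.
nfa = ({'a', 'b'}, {0, 1, 2}, 0, {2},
       {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}})
```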

Answer Set Programming
Let us shortly introduce the idea of answer set programming (ASP). The readers interested in the details of ASP, alternative definitions, and the formal specification of AnsProlog are referred to handbooks [18][19][20].
Let A be a set of atoms. A rule is of the form

a ← b_1, . . . , b_m, ∼c_1, . . . , ∼c_k,

where a, the b_i's, and the c_i's are atoms and k, m ≥ 0. The head of the rule, a, may be absent. The part on the right of '←' is called the body of the rule. The symbol ∼ is called default negation and, by analogy to database systems, in logic programming it refers to the absence of information. Informally, a ← . . . , ∼b means: if . . . and there is no evidence for b, then a should be included into a solution. A program Π is a finite set of rules. Let R be a set of rules without negated atoms, i.e., of the form

a ← b_1, . . . , b_m,

and let A be the set of atoms occurring in R. A model of R is a subset M ⊆ A which fulfills the following conditions: 1. every fact of R (a rule with the empty body) has its head in M; 2. for every headless rule, at least one body atom lies outside M (in the extreme case, when the head and the body of a rule r ∈ R are simultaneously absent, this condition does not hold, so no model exists); 3. for every remaining rule, if {b_1, . . . , b_m} ⊆ M then a ∈ M.
Alternatively, if all atoms were treated as Boolean variables (i.e., presence as true, absence as false), M would be a model of R exactly when all rules (i.e., clauses) are satisfied.
The semantics of a program is defined by an answer set as follows. The reduct Π^X of a program Π relative to a set X of atoms is defined by

Π^X = { a ← b_1, . . . , b_m : (a ← b_1, . . . , b_m, ∼c_1, . . . , ∼c_k) ∈ Π and {c_1, . . . , c_k} ∩ X = ∅ }.

The ⊆-smallest model of Π^X is denoted by Cn(Π^X). A set X of atoms is an answer set of Π if X = Cn(Π^X).
For the sake of simplicity, AnsProlog programs are written using variables (by convention, variables start with uppercase letters). Such programs are then grounded, i.e., transformed into programs with no variables, by applying a Herbrand substitution. Note, however, that clever grounding discards rules that are redundant, i.e., that can never apply, because some atoms in their bodies have no possibility to be derived [19]. For example, a program with variables can in this way be transformed into a ground program Π with a single answer set X; the minimal model Cn(Π^X) of its reduct is just X. In other words, a set X of atoms is an answer set of a logic program Π if: (i) X is a classical model of Π and (ii) all atoms in X are justified by some rule in Π.
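The grounding step can be sketched in a few lines of Python (a toy grounder for illustration only; real grounders such as Clingo's are far smarter about discarding redundant instances):

```python
from itertools import product

# Toy grounder sketch: atoms are tuples like ('reach', 'X'), and an
# uppercase string plays the role of an AnsProlog variable.
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def ground_rule(rule, domain):
    head, body = rule
    # collect the variables occurring anywhere in the rule
    vars_ = sorted({t for atom in [head] + body for t in atom[1:] if is_var(t)})
    for values in product(domain, repeat=len(vars_)):
        sub = dict(zip(vars_, values))
        inst = lambda atom: (atom[0],) + tuple(sub.get(t, t) for t in atom[1:])
        yield (inst(head), [inst(a) for a in body])

# Ground the rule  reach(X) <- q(X)  over the Herbrand domain {0, 1}:
rule = (('reach', 'X'), [('q', 'X')])
grounded = list(ground_rule(rule, (0, 1)))
```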
Recently, Answer Set Programming has emerged as a declarative problem-solving paradigm. This particular way of programming in AnsProlog is well-suited for modeling and solving problems that involve common sense reasoning. It has been fruitfully used in a range of applications.
Early ASP solvers used backtracking to find solutions. With the evolution of Boolean SAT solvers, several ASP solvers were built on top of them. The approach taken by these solvers was to translate the ASP program into propositional clauses, apply the SAT solver, and then convert the solutions back to ASP form. Newer systems, such as Clasp (which is a part of the Clingo solver, https://potassco.org/clasp/), take advantage of conflict-driven algorithms inspired by SAT, without the complete conversion into a Boolean-logic form. These approaches improve the performance significantly, often by an order of magnitude, over earlier backtracking algorithms [21].

Proposed Encoding for the Induction of NFA
Our translation reduces NFA identification to an AnsProlog program. Suppose we are given a sample S over an alphabet Σ and a positive integer k. We want to find a k-state NFA A = (Σ, {q_0, q_1, . . . , q_{k−1}}, q_0, F, δ) such that every w ∈ S+ is recognized by accepting and every w ∈ S− is recognized by rejecting. The parameter k can be regarded as the degree of data generalization. The smallest k, say k_0, for which our logic program has an answer set will give the most general automaton. As k increases, we obtain a set of nested languages, the largest for k_0 and the smallest for some k_m ≥ k_0. Usually, the running time for k > k_0 is shorter than for k_0.
Let Pref(S) be the set of all prefixes of S+ ∪ S−. The relationship between an automaton A and a sample S in terms of ASP is constrained as shown below in seven groups of rules. In rules (5)-(24) the following convention for naming variables is used: P stands for a prefix, N stands for a number (a state index), I, J, and M also represent state indexes, C stands for a character (an element of the alphabet), W stands for a word (which is also a prefix), and U represents another prefix.

1. We have the following domain specification, i.e., our AnsProlog facts.
join(u, a, v) ← . for all u, v ∈ Pref(S) and a ∈ Σ such that ua = v.
Facts (5) and (6) define the set of states Q and the input alphabet Σ, while facts (7)-(9) describe the input sample. In particular, they define the prefixes as well as the words to be recognized by accepting and by rejecting, respectively. Finally, fact (10) defines the concatenation operation, which, given a prefix u ∈ Pref(S) and a symbol a ∈ Σ, produces the prefix v ∈ Pref(S).
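A generator for facts of this shape can be sketched in Python as follows. The predicate names positive and negative and the constant e standing for ε are illustrative choices, not necessarily those of our actual generator:

```python
# Sketch: emit AnsProlog facts in the spirit of (5)-(10) from a sample.
def prefixes(words):
    return sorted({w[:i] for w in words for i in range(len(w) + 1)})

def emit_facts(s_plus, s_minus, sigma, k):
    pref = prefixes(s_plus | s_minus)
    name = lambda w: 'e' if w == '' else w            # 'e' denotes ε
    facts  = [f'q({i}).' for i in range(k)]           # states, cf. (5)
    facts += [f'sigma({a}).' for a in sorted(sigma)]  # alphabet, cf. (6)
    facts += [f'prefix({name(w)}).' for w in pref]    # prefixes, cf. (7)
    facts += [f'positive({name(w)}).' for w in sorted(s_plus)]   # cf. (8)
    facts += [f'negative({name(w)}).' for w in sorted(s_minus)]  # cf. (9)
    # join(u, a, v) whenever ua = v for u, v in Pref(S), cf. (10)
    facts += [f'join({name(w[:-1])},{w[-1]},{name(w)}).' for w in pref if w]
    return facts

facts = emit_facts({'ab'}, {'a'}, {'a', 'b'}, 2)
```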

2. The next rules ensure that in an automaton A every prefix reaches at least one state and that every state is either final or not final.
Rules (11) and (12) describe the reachability of states q ∈ Q by prefixes p ∈ Pref(S). State q is reachable by prefix p iff the prefix can be read by following a series of transitions from state q_0 to state q (this series of transitions builds a path for prefix p). The unreachable states are described by the default negation rule not_x. Clearly, for every prefix p ∈ Pref(S) and every state q ∈ Q, either (11) or (12) holds. Here P (a prefix) and N (a number, a state index) are variables, which means that during the grounding they will be substituted for, respectively, every p ∈ Pref(S), because of the atom prefix(P) in the body of the rule, and every i ∈ {0, 1, . . . , k − 1}, because of the atom q(N) in the body of the rule. Notice that for every p ∈ Pref(S) we already have the fact prefix(p) and for every i ∈ {0, 1, . . . , k − 1} we already have the fact q(i), which are the sources of this substitution.

Rules (13) and (14) declare that for every prefix p ∈ Pref(S) there has to be some reachable state q ∈ Q. These rules follow from the fact that the members of the sets S+ and S− have to be recognized by accepting or rejecting, respectively. In other words, for each w ∈ (S+ ∪ S−) there has to be at least one path in the inferred NFA.

Finally, rules (15) and (16) ensure that each state q ∈ Q is either accepting (final) or rejecting (not final). Rules such as the pair (15) and (16) are recommended in ASP textbooks to specify that each element either has some property or does not (refer, for example, to Chapter 4 of Baral's handbook [18]).

3. For encoding transitions we will use the delta predicates.
Rule (17) says that if there exists a transition between a pair of states q_i, q_j ∈ Q, marked with a symbol c ∈ Σ, then delta(I, C, J) is in the model. Otherwise, the default negation rule not_delta applies (rule (18)).

4. Without loss of generality, we can assume that q_0 is the initial state.
Rules (19) and (20) mean that only state q_0 is reachable by the empty word ε.

5. Every counter-example has to be recognized by rejecting.
Recall that in a headless rule at least one predicate present in the body cannot be satisfied. Hence, rule (21) means that there is no final state that is reachable by any word w ∈ S−.

6. Every example has to be recognized by accepting. In this rule we use an extension of the ASP syntax, a choice construction. Here, it means that the number of final states q_n for which ((q_0, w), q_n) ∈ δ̂ cannot be equal to 0 for any example w.

7. Finally, there are mutual constraints between the x and delta predicates.
Rule (23) says that if a state q_m is reachable by a word w = uc, then there exists some state q_i reachable by the word u and there is a transition from q_i to q_m with symbol c.
Similarly, rule (24) says that if there is a word w = uc leading to some state q i ∈ Q, then the number of transitions with symbol c outgoing from a state reachable by word u cannot be zero.
Rules (11)-(24) always remain unchanged; only the facts depend on the input sample. The program of Example 1 has an answer set {q(0), . . . , delta(1, b, 1), final(0)}. In order to construct the associated NFA it is enough to take all final and delta predicates, which define, respectively, the final states and the transitions of the resultant automaton. We have thus obtained the NFA depicted in Figure 1.
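The extraction step can be sketched in Python. The atom list below is illustrative (only delta(1, b, 1) and final(0) come from the answer set shown above; the elided atoms are made up for the sketch):

```python
import re

# Sketch: turn the final/delta atoms of an answer set, as printed by
# Clingo, back into the final states and transitions of an NFA.
def extract_nfa(atoms):
    final, delta = set(), {}
    for atom in atoms:
        m = re.fullmatch(r'final\((\d+)\)', atom)
        if m:
            final.add(int(m.group(1)))
        m = re.fullmatch(r'delta\((\d+),(\w+),(\d+)\)', atom)
        if m:
            i, c, j = int(m.group(1)), m.group(2), int(m.group(3))
            delta.setdefault((i, c), set()).add(j)   # transition i -c-> j
    return final, delta

answer_set = ['q(0)', 'q(1)', 'delta(0,a,1)', 'delta(1,b,1)', 'final(0)']
final, delta = extract_nfa(answer_set)
```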
Additionally, in Appendix A there is a description of how answer sets are determined. In Appendix B a larger illustration is given.

Experimental Results
In this section, we describe some experiments comparing the performance of our approach (the program can be found at https://gitlab.com/wojtek3dan/asp4nfa) with the methods mentioned in the introductory section and described in more detail in Subsection 4.2. We used an ASP solver, Clingo, which can be executed sequentially or in parallel [22]. While comparing our approach with RA-PS1, RA-PS2, OA-PS1, and OA-PS2, all programs were run on an 8-core processor. The ASP vs. SAT comparison was performed using a single core. For these experiments, we used a set of 40 samples (the samples can be found at https://gitlab.com/wojtek3dan/asp4nfa/-/tree/master/samples) based on randomly generated regular expressions.

Benchmarks
As far as we know, all standard benchmarks are too hard to be solved by pure exact algorithms. Thus, we generated problem instances using our own algorithm. This algorithm builds a set of words with the following parameters: the size |E| of a regular expression to be generated, the alphabet size |Σ|, the number |S| of words actually generated, and their minimum, d_min, and maximum, d_max, lengths. The algorithm is arranged as follows. First, construct a random regular expression E. Next, obtain the corresponding minimum-state DFA M. Then, as long as a sample S is not symmetrically structurally complete (refer to Chapter 6 of [3] for the formal definition of this concept) with respect to M, repeat the following steps: (a) using the Xeger library (https://pypi.org/project/xeger/) for generating random strings from a regular expression, get two words u and w; (b) truncate as few symbols from the end of w as possible in order to obtain a counter-example w̄; if this succeeds, add u to S+ and w̄ to S−. Finally, accept S = (S+, S−) as a valid sample if it is not too small, too large, or highly imbalanced. In order to ensure that these conditions are fulfilled, the inequalities |S+| ≥ 8, |S−| ≥ 8, and |S+| + |S−| ≤ 1000 hold for all our samples. When generating a random word from a regex or from an automaton, we encounter a problem with, respectively, the star operator and self-loops. Theoretically, there are infinitely many words matching these fragments, so we have to bound the number of repetitions. We set this bound to four.
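Step (b) can be sketched as follows, assuming a membership test for M; the function name and the toy target language are illustrative:

```python
# Sketch of step (b): cut as few symbols as possible from the end of w
# until the target automaton M rejects the result.  `accepts` is any
# membership test for M (here a toy stand-in).
def truncate_to_counter_example(accepts, w):
    for end in range(len(w), 0, -1):   # try w, then w minus one symbol, ...
        if not accepts(w[:end]):
            return w[:end]             # shortest truncation rejected by M
    return None                        # failure: every non-empty prefix accepted

# Toy target: M accepts exactly the words ending in 'b'.
accepts = lambda u: u.endswith('b')
```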

Compared Algorithms
As already mentioned, our algorithm was compared with a SAT-based algorithm and several exact parallel algorithms. To make the paper self-contained let us briefly describe these algorithms.
The SAT-based algorithm defines three types of binary variables, x_wq, y_apq, and z_q, for w ∈ Pref(S), a ∈ Σ, p, q ∈ Q. Variable x_wq = 1 iff state q is reachable by prefix w, otherwise x_wq = 0. Variable y_apq = 1 iff there exists a transition from state p to state q with symbol a, otherwise y_apq = 0. Finally, z_q = 1 iff state q is final, and z_q = 0 otherwise. The constraints involving these variables are as follows:

1. All examples have to be accepted, while none of the counter-examples should be.

2. All prefixes w = a, w ∈ Pref(S), a ∈ Σ, result from the transitions outgoing from state q_0.

3. For all states q ∈ Q reachable by prefixes w = va, v, w ∈ Pref(S), a ∈ Σ, there has to be some state r reachable by prefix v, and there has to be an outgoing transition from r to q with symbol a. By symmetry, if there exists a path for prefix v ending in some state r and there exists a transition from r to q with symbol a, then there exists a path to state q with prefix w = va. The latter condition is expressed as x_wq − x_vr · y_arq ≥ 0 for all q, r ∈ Q.
Additionally, it holds that z_q0 = 1 when ε ∈ S+, z_q0 = 0 when ε ∈ S−, and z_q0 is not predefined when ε ∉ (S+ ∪ S−). A solution to the presented problem formulation is then sought by a SAT solver.
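Two easy pieces of this formulation can be sketched in Python: numbering the variables DIMACS-style, and emitting the clauses that forbid accepting a counter-example (¬x_wq ∨ ¬z_q for every w ∈ S− and q ∈ Q). This is an illustration, not the code of [5]; the full clause set is more involved:

```python
from itertools import count, product

# Assign DIMACS-style integer ids to the variables x_wq, y_apq, z_q.
def number_variables(prefixes, sigma, k):
    ids, fresh = {}, count(1)            # DIMACS ids start at 1
    for w, q in product(prefixes, range(k)):
        ids['x', w, q] = next(fresh)     # x_wq: q reachable by prefix w
    for a, p, q in product(sorted(sigma), range(k), range(k)):
        ids['y', a, p, q] = next(fresh)  # y_apq: transition p -a-> q
    for q in range(k):
        ids['z', q] = next(fresh)        # z_q: q is final
    return ids

def reject_clauses(ids, s_minus, k):
    """Clauses (-x_wq v -z_q): no state reached by w in S- is final."""
    return [[-ids['x', w, q], -ids['z', q]] for w in s_minus for q in range(k)]

ids = number_variables(['a', 'ab'], {'a', 'b'}, 2)
clauses = reject_clauses(ids, ['a'], 2)
```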

Example 2.
Let us consider Example 1 again. In the SAT-based formulation we have the variables x_aq0, x_aq1, . . . , x_abcq1, y_aq0q0, y_aq0q1, . . . , y_cq1q1, z_q0, and z_q1. Constraints (25)-(29) remain unchanged. A set of assignments satisfying the constraints at hand is as follows: x_aq1 = 1, x_abq1 = 1, x_cq0 = 1, x_abcq0 = 1, y_aq0q1 = 1, y_bq1q1 = 1, y_cq0q0 = 1, y_cq1q0 = 1, z_q0 = 1. All remaining variables are zeros. The resulting NFA is shown in Figure 3. Note that the set of transitions in Figure 3 is smaller than in Figure 1; NFAs consistent with a sample need not be unique.

Identification of a k-state NFA by means of the exact algorithms RA-PS1, RA-PS2, OA-PS1, and OA-PS2 is based on the SAT formulation given before. Assuming k is fixed, we only need to determine the set of final states F and the transition function δ. Let us recall that a set of final states F is feasible iff F ≠ ∅, q_0 ∈ F whenever ε ∈ S+, and q_0 ∉ F whenever ε ∈ S−. Clearly, an NFA without final states cannot accept any word, and if the empty word ε is in S+ (resp. S−) the initial state q_0 has to be final (resp. not final). Since every feasible set F may lead to an NFA consistent with the sample S (as such NFAs need not be unique), we distribute the different sets F among processes and try to identify the δ function by means of a backtracking algorithm. While searching for the values of the y_apq variables, we apply different search orders. This is because there is no universal ordering method assuring fast convergence to the solution. The orderings used in the analyzed algorithms are deg, mmex, and mmcex. The deg ordering is a static method based on the variable degree, i.e., the number of constraints the variable is involved in; this ordering does not change as the algorithm progresses. The mmex and mmcex orderings change dynamically while the algorithm runs. They aim at satisfying first the constraints related to examples or counter-examples, respectively.
The Parallelization Scheme 1 (PS1) maximizes the number of sets F processed simultaneously. If the number of available processes is greater than the number of sets F to be analyzed, we assign multiple variable orderings (VOs) to each set. In the RA-PS1 algorithm this assignment is performed randomly, while in the OA-PS1 algorithm, the deg, mmex, and mmcex methods are ordered by their complexity and chosen in a round-robin fashion.
The Parallelization Scheme 2 (PS2) maximizes the number of variable orderings applied to the same set F. This way we shorten the time needed to obtain an answer whether an NFA exists for the given set F. If the number of available processes is smaller than the product of the number of sets F and the number of variable orderings used, we need to choose the sets F to be processed first. In the RA-PS2 algorithm we choose them at random, while in the OA-PS2 algorithm, we analyze first the sets F of smaller size.

Example 3. Let us consider the problem given in Example 1. Since k = 2 and ε ∉ (S+ ∪ S−), the following sets F can be defined: F_1 = {q_0}, F_2 = {q_1}, and F_3 = {q_0, q_1}. Let us also assume that we can use the three VOs discussed before. Finally, let the number of processes be p = 3 (denoted by p_i, for i = 0, 1, 2). We can have the following example configurations of the algorithms RA-PS1, RA-PS2, OA-PS1, and OA-PS2:

1. Algorithm RA-PS1: process p_0 gets (F_1, VO_3); process p_1 gets (F_2, VO_2); process p_2 gets (F_3, VO_3). Each process uses a single VO to analyze one of the possible sets F_i, i = 1, 2, 3. There is no guarantee that all VOs are used at least once.

2. Algorithm RA-PS2: process p_0 gets (F_2, VO_1); process p_1 gets (F_2, VO_2); process p_2 gets (F_2, VO_3). Each process uses a different VO to analyze just one set F at a time (chosen randomly). If there is no solution for set F_2, we need to repeat the above assignments, but this time for the set F_1 or F_3 (again chosen randomly). We repeat this procedure until a solution is found.

3. Algorithm OA-PS1: process p_0 gets (F_1, VO_1); process p_1 gets (F_2, VO_2); process p_2 gets (F_3, VO_3). Each process uses a single VO to analyze one of the possible sets F_i, i = 1, 2, 3, but this time the VOs are assigned according to a predefined order.

4. Algorithm OA-PS2: process p_0 gets (F_1, VO_1); process p_1 gets (F_1, VO_2); process p_2 gets (F_1, VO_3). Each process uses a different VO to analyze just one set F at a time, but this time we start with F_1, followed by F_2 and F_3 (unless a solution is found at some stage).
Note that in Parallelization Scheme 2, obtaining a negative answer, i.e., that an NFA does not exist for the given set F_i, by means of one VO allows us to stop the execution of the other VOs and move on to another set F_j, j ≠ i.

Performance Comparison
In all experiments, we used an Intel (Santa Clara, CA, USA) Xeon CPU E5-2650 v2, 2.6 GHz (8 cores, 16 threads), under the Ubuntu 18.04 operating system with 190 GB RAM. The time limit (TL) was set to 1000 s. The results are listed in Table 2. In order to determine whether the observed mean difference between ASP and the remaining methods is a real CPU time decrease, we used a paired-samples t-test [23] (pp. 1560-1565) for ASP vs. SAT, ASP vs. RA-PS1, ASP vs. RA-PS2, ASP vs. OA-PS1, and ASP vs. OA-PS2. As we can see from Table 3, the p-value is low in all cases, so we can conclude that our results did not occur by chance and that using our ASP encoding is likely to improve CPU time performance on the prepared benchmarks.
Let us explain how the mean values were computed. All TL cells were substituted by 1000. Notice that this procedure does not violate the significance of the statistical tests, because our program completed its computations within the time limit for all problems (files). Thus, determining all running times would only strengthen our hypothesis.
To make the advantage of the ASP-based approach over the exact parallel algorithms even more convincing, let us analyze the largest sizes of automata analyzed by the algorithms within the time limit TL = 1000 s. The summary of the obtained sizes is given in Table 4. Note that the table includes only the problems for which TL entries exist in Table 2. The entries marked with * denote executions in which the algorithms started running for the given k but were terminated due to the time limit, without producing the final NFA.

Appendix A

For a set of atoms X and a definite ground program P, let X^P denote X extended with the heads of all rules of P whose bodies are contained in X. This operation can be repeated, and we define X^{P_1} = X^P and X^{P_i} = (X^{P_{i−1}})^P. It is easy to see that Cn(P) = ⋃_{i≥1} ∅^{P_i}. Because for a certain i the equation X^{P_i} = X^{P_{i+1}} holds, determining Cn(P) is straightforward and fast.
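For a definite ground program this iteration is a few lines of Python (a sketch; rules are represented as (head, body) pairs and the operation X^P is the step function):

```python
# Iterate X^P from the empty set until the least fixpoint Cn(P).
def step(program, X):
    """X^P: add the head of every rule whose body is contained in X."""
    return X | {head for head, body in program if set(body) <= X}

def cn(program):
    X = set()
    while True:
        nxt = step(program, X)
        if nxt == X:        # X^{P_i} = X^{P_{i+1}}: fixpoint reached
            return X
        X = nxt

# p <- .   q <- p.   r <- s.
program = [('p', []), ('q', ['p']), ('r', ['s'])]
```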
Consider any normal program Π. We recall from Subsection 2.3 that a set X ⊆ A is an answer set of Π if X = Cn(Π^X) (do not confuse the reduct Π^X with the operation X^P acting on a set of atoms). Take two sets, L and U, such that L ⊆ X ⊆ U for an answer set X of Π. Observe that: (i) X ⊆ Cn(Π^L), and (ii) Cn(Π^U) ⊆ X. Thus we get L ∪ Cn(Π^U) ⊆ X ⊆ U ∩ Cn(Π^L). The last property is a recipe for expanding the lower bound L and cutting down the upper bound U. The procedure in which we replace L by L ∪ Cn(Π^U) and then U by U ∩ Cn(Π^L), as long as L or U changes, will be called narrowing. At some point we may get L = U = X. When we start from L = ∅ and U = A, there are also two more possibilities: L ⊈ U (there is no answer set) and L ⊂ U. In the latter case we can take any a ∈ U − L and check two paths: a is included into L, or a is excluded from U. This leads to Algorithm A1 [19], which outputs all answer sets of a program Π provided that it is invoked as SOLVE(Π, ∅, A). The worst-case time complexity of this algorithm can be assessed by the recurrence T(n) = 2T(n − 1) + n², where n = |U| − |L|, which gives the exponential bound T(n) = O(2^n).
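The narrowing-and-branching procedure can be sketched as follows. This is an illustration of the idea behind Algorithm A1, not a solver implementation; rules are (head, positive body, negated body) triples over hashable atoms:

```python
# Sketch of SOLVE with narrowing for a ground normal program.
def cn(definite):
    """Least model of a negation-free program given as (head, body) pairs."""
    X, changed = set(), True
    while changed:
        nxt = X | {h for h, body in definite if set(body) <= X}
        changed, X = nxt != X, nxt
    return X

def reduct(program, X):
    """Gelfond-Lifschitz reduct: drop rules whose negated body meets X."""
    return [(h, pos) for h, pos, neg in program if not (set(neg) & X)]

def solve(program, atoms, L=frozenset(), U=None, out=None):
    U = set(atoms) if U is None else U
    out = [] if out is None else out
    L = set(L)
    while True:                                   # narrowing
        L2 = L | cn(reduct(program, U))
        U2 = U & cn(reduct(program, L2))
        if (L2, U2) == (L, U):
            break
        L, U = L2, U2
    if not L <= U:
        return out                                # no answer set in [L, U]
    if L == U:
        if L == cn(reduct(program, L)):           # verify the candidate
            out.append(L)
        return out
    a = min(U - L)                                # branch on an undecided atom
    solve(program, atoms, L | {a}, set(U), out)   # path 1: a included in L
    solve(program, atoms, L, U - {a}, out)        # path 2: a excluded from U
    return out

# a <- not b.   b <- not a.
program = [('a', [], ['b']), ('b', [], ['a'])]
```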