Computational Complexity and ILP Models for Pattern Problems in the Logical Analysis of Data

Abstract: Logical Analysis of Data is a procedure aimed at identifying relevant features in data sets with both positive and negative samples. The goal is to build Boolean formulas, represented by strings over {0,1,-} called patterns, which can be used to classify new samples as positive or negative. Since a data set can be explained in alternative ways, many computational problems arise related to the choice of a particular set of patterns. In this paper we study the computational complexity of several of these pattern problems (showing that they are, in general, computationally hard) and we propose some integer programming models that appear to be effective. We describe an ILP model for finding the minimum-size set of patterns explaining a given set of samples and another one for the problem of determining whether two sets of patterns are equivalent, i.e., whether they explain exactly the same samples. We base our first model on a polynomial procedure that computes all patterns compatible with a given set of samples. Computational experiments substantiate the effectiveness of our models on fairly large instances. Finally, we conjecture that the existence of an effective ILP model for finding a minimum-size set of patterns equivalent to a given set of patterns is unlikely, because the problem is both NP-hard and co-NP-hard.


Introduction
One of the main consequences of the constant progress of technology, together with the massive use of computers in many aspects of our lives, has been the creation of large repositories of data storing information of all sorts. A major problem related to these huge data sets is that of discovering relevant patterns that separate noise from important information, and of deriving rules for clustering the data into classes sharing essential common features. To this end, the fields of study known as data mining [1,2] and feature selection [3][4][5] have recently emerged among the most relevant applications of modern computer science.
In this paper we focus on some mathematical issues that arise from data mining problems. A very common situation for data mining problems is to represent the starting information by a two-dimensional array, in which the rows correspond to samples (or individuals) while the columns correspond to their characteristics (also called features).
If the features are Boolean, one of the tools that can be used to extract interesting information is the so-called Logical Analysis of Data (LAD [6][7][8]). Consider for instance a data set consisting of a binary matrix of m rows and n columns, in which some rows are labeled as positive while the remaining rows are labeled as negative (for instance, in the case of a molecular biology experiment using a device called "microarray" which measures the level of gene expression in cells, the values 0 and 1 would be related to the level being, respectively, "normal" or "abnormal" [9][10][11]).
The Logical Analysis of Data has the objective of discovering a set of simple Boolean formulas (or "rules") that can be used to classify new binary vectors (b_1, ..., b_n). Each rule describes what the value of some bits must be for a vector to be classified as positive or negative. For instance, a "positive rule" could be $b_2 \wedge \neg b_5 \wedge \neg b_9$, meaning that any vector with a 1 in the 2nd component and a 0 in the 5th and 9th components is classified as positive. Similarly, there can be "negative rules" specifying which vectors should be classified as negative.
A rule such as the above can be conveniently represented by a pattern, which is a string over the alphabet {0, 1, -}. The characters 0 and 1 in the pattern specify which positions must be matched exactly by a binary vector to satisfy the rule, while the character - is a wildcard that can be matched by either 0 or 1. In particular, if n = 10 the pattern corresponding to the above rule is -1--0---0-. If r is a rule and p is the pattern corresponding to r, then a binary vector b satisfies the rule if and only if $b_k = p_k$ for all $k$ such that $p_k \in \{0,1\}$. We say that the pattern p covers all vectors b for which this condition holds. In view of the equivalence of rules and patterns, we can talk of positive/negative patterns in place of positive/negative rules.
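As a minimal sketch, the covering condition translates directly into code. Here patterns and vectors are assumed to be Python strings over the characters 0, 1 and -, and `covers` is an illustrative name, not part of the original work:

```python
def covers(pattern: str, vector: str) -> bool:
    """True if the binary `vector` satisfies the rule encoded by `pattern`.

    Positions holding '-' are wildcards; every '0'/'1' position must match exactly.
    """
    if len(pattern) != len(vector):
        raise ValueError("pattern and vector must have the same length")
    return all(p == "-" or p == b for p, b in zip(pattern, vector))


rule = "-1--0---0-"                    # b2 = 1, b5 = 0, b9 = 0 (1-based), n = 10
print(covers(rule, "0100000000"))      # True
print(covers(rule, "0100100000"))      # False, since b5 = 1
```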
The objective of LAD is to infer positive and negative patterns from the data in such a way that (i) each positive row is covered by at least one of the positive patterns, while no negative row is and (ii) each negative row is covered by at least one of the negative patterns, while no positive row is. This approach has been successfully applied to many contexts in both bioinformatics and biomedicine [8].
Since there might be many alternative sets of patterns explaining a given instance of LAD, one has to introduce a suitable criterion for choosing a specific solution. In particular, an Occam's razor strategy would suggest seeking the simplest possible solutions, i.e., the sets with a minimum number of patterns. Finding a minimum-size set of patterns covering a given set of vectors is called the Pattern Cover Minimality problem.
Other problems arising from the analysis of patterns are related to understanding whether two different sets of rules actually explain the same data set or, in other words, whether the two pattern sets are equivalent. In particular, we would also like to know whether a given set of rules explains all possible data, and is therefore in some sense "useless". Conversely, we would like to know whether there are some data that cannot be explained by a particular set of rules.
In addition, given a set of patterns we would like to know whether there exists a smaller set of patterns that explains the same data. We call this problem Pattern Equivalence Minimality; it looks similar to Pattern Cover Minimality, the difference being that here we start from a pattern set and not from a data set. Although the patterns could be expanded into strings and a Pattern Cover Minimality problem solved on these strings, this expansion is in general computationally intractable, since a pattern with k gaps generates 2^k strings. Hence we should be able to find a better set of patterns working directly on the given pattern set.
In the following we review the computational complexity of these pattern problems, which are, in general, quite complex [12] (see also [13] for a fixed-parameter analysis of some related pattern problems). We then give an integer linear programming (ILP) formulation for Pattern Cover Minimality and for Pattern Equivalence, and we assess the effectiveness of our formulations by means of extensive computational experiments. An ILP formulation for Pattern Cover Minimality is also given in Boccia et al. [14]. The formulation we propose in this paper reduces the problem to a Set Covering with a (low-degree) polynomial number of columns. Models for pattern covering with non-binary data include, e.g., a branch-and-price procedure described in [15] and the heuristic procedures proposed in [16].
Formulating a solution procedure for Pattern Equivalence Minimality seems quite challenging since, as we will prove in the paper, this problem is NP-hard and co-NP-hard at the same time.
The paper is organized as follows. In Section 2 we provide precise mathematical definitions of the concepts we are dealing with and of the related problems. In Section 3 we investigate the computational complexity of the problems defined in the previous section. In Section 4 we investigate strings and patterns that are mutually compatible. In particular, we provide a polynomial algorithm to list all patterns that are compatible with a given set of strings (the inverse problem of listing all strings compatible with a set of patterns is necessarily exponential). In Section 5 we give ILP models for both the Pattern Cover Minimality and the Pattern Equivalence problems. These models are tested in Section 6, devoted to the computational experiments. Some conclusions are drawn in Section 7.

Basic Definitions
A binary string (or, simply, a string) s is a sequence of symbols where each symbol is either 0 or 1. With n-binary string we denote a binary string of length n. A pattern p is a sequence of symbols where each symbol can take the values 0, 1 or -, i.e., $p \in \{0, 1, -\}^n$. We call the symbol - a gap. With n-pattern we denote a pattern of length n. Notice that a string is in fact a particular pattern, i.e., a pattern without gaps. A pattern p covers, or generates, a string $s = (s_1 \cdots s_n)$ if $s_k = p_k$ for each k such that $p_k \in \{0, 1\}$. The span of p is the set of all binary strings generated by p, i.e., $S(p) = \{ s \in \{0,1\}^n : s_k = p_k \text{ for all } k \text{ with } p_k \in \{0,1\} \}$. Given a set P of patterns, the span of P is the set $S(P) = \bigcup_{p \in P} S(p)$. Two sets P and Q of patterns are equivalent if S(P) = S(Q). A set of patterns P is a minimum set if |P| ≤ |Q| for each set of patterns Q equivalent to P.
We say that a pattern p is compatible with a set of strings S if S(p) ⊆ S. Similarly, we say that a string s is compatible for a pattern p if s ∈ S(p). We denote by P(S) the set of all compatible patterns for S. Let p be a pattern compatible with a given set S of strings. If there is no compatible pattern p′ such that S(p) ⊂ S(p′), we say that p is maximal (for S). We denote by P*(S) the subset of maximal patterns in P(S). Given a set S of strings and a set P of patterns, we say that P is compatible for S if each p ∈ P is compatible for S, so that P ⊆ P(S) and S(P) ⊆ S.
Moreover, a set of patterns P is called a cover of S if S(P) = S. Notice that all covers of S are equivalent to each other and S, viewed as a set of patterns, is equivalent to each of its covers.
We note that S is a set and so it does not contain duplicate strings. We assume this to be true also when we represent a set of m strings of length n as an m × n array of zeros and ones.
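To make the definitions concrete, the following brute-force Python sketch computes spans and tests compatibility and covers by explicit expansion. It is exponential in the number of gaps and is meant only for small examples; all function names are illustrative:

```python
from itertools import product


def span(pattern: str) -> set[str]:
    """S(p): every binary string generated by `pattern` (2**k strings for k gaps)."""
    choices = [("0", "1") if c == "-" else (c,) for c in pattern]
    return {"".join(bits) for bits in product(*choices)}


def span_of_set(patterns) -> set[str]:
    """S(P): the union of the spans of the patterns in P."""
    result: set[str] = set()
    for p in patterns:
        result |= span(p)
    return result


def is_compatible(pattern: str, strings: set[str]) -> bool:
    """p is compatible with S iff S(p) is a subset of S."""
    return span(pattern) <= strings


def is_cover(patterns, strings: set[str]) -> bool:
    """P is a cover of S iff S(P) = S."""
    return span_of_set(patterns) == strings


S = {"000", "001", "011"}
print(sorted(span("0-1")))          # ['001', '011']
print(is_compatible("00-", S))      # True
print(is_compatible("-0-", S))      # False: 100 and 101 are not in S
print(is_cover(["00-", "0-1"], S))  # True
```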

Computational Complexity Results
The previous definitions lead to the following decision problems [12]:
1. PATTERN COVER: given a set S of strings and a set P of patterns, is P a cover of S, i.e., S = S(P)?
2. PATTERN COVER MINIMALITY: given a set S of strings and a constant K, does there exist a cover P of S such that |P| ≤ K?
3. PATTERN EQUIVALENCE: given two sets P and Q of patterns, are they equivalent, i.e., S(P) = S(Q)?
4. PATTERN EQUIVALENCE MINIMALITY: given a set P of patterns and a constant K < |P|, does there exist an equivalent set of patterns Q such that |Q| ≤ K?
5. PATTERN INCOMPLETENESS: given a set P of patterns, is it not complete, i.e., does there exist a string s ∈ {0, 1}^n such that s ∉ S(P)?
6. PATTERN COMPLETENESS: given a set P of patterns, is it complete, i.e., S(P) = {0, 1}^n?
We first note that, given a set P of patterns and a string s ∈ {0, 1}^n, determining whether s ∈ S(P) is polynomial. Indeed, given a pattern p and a string s we may check in time O(n) whether s can be generated by p. Hence, given a set P of patterns, we repeat the check for each p ∈ P: if the check fails for every p ∈ P then s ∉ S(P), otherwise s ∈ S(P).

Proposition 1. PATTERN COVER is polynomial.
Proof. For each s ∈ S we check whether s ∈ S(P). Hence in time O(n |S| |P|) we may decide whether S ⊆ S(P). In order to decide whether S(P) ⊆ S, for each pattern p ∈ P we count the number n(p) of strings in S compatible for p. Let k be the number of gaps in p. Since p generates exactly 2^k strings and S contains no duplicates, p is compatible for S, i.e., S(p) ⊆ S, if and only if n(p) = 2^k. Computing n(p) takes O(n |S|) time. Overall, checking whether S(P) ⊆ S takes O(n |S| |P|) time.
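The two checks in this proof can be sketched as follows. The sketch assumes, as in the text, that S contains no duplicates; the function name `is_cover_poly` is ours:

```python
def is_cover_poly(patterns, strings) -> bool:
    """Decide whether S(P) == S in O(n |S| |P|) time, as in the proof of Proposition 1.

    `strings` is the set S (no duplicates); patterns use '0', '1', '-'.
    """
    strings = set(strings)  # the counting argument needs distinct strings

    def generates(p, s):
        return all(c == "-" or c == b for c, b in zip(p, s))

    # S subset of S(P): each string must be generated by at least one pattern.
    if not all(any(generates(p, s) for p in patterns) for s in strings):
        return False
    # S(P) subset of S: p generates exactly 2**k strings (k = number of gaps), so p
    # is compatible with S iff n(p), the number of strings of S it generates, is 2**k.
    for p in patterns:
        n_p = sum(1 for s in strings if generates(p, s))
        if n_p != 2 ** p.count("-"):
            return False
    return True


S = {"000", "001", "011"}
print(is_cover_poly(["00-", "0-1"], S))  # True
print(is_cover_poly(["0--"], S))         # False: 0-- also generates 010
```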

Proposition 2. PATTERN COVER MINIMALITY is NP-complete.
Proof. We observe that PATTERN COVER MINIMALITY is basically the same as MINIMUM DISJUNCTIVE NORMAL FORM (see [17] p. 261), which we repeat here for the sake of completeness: given a set U = {u_1, u_2, ..., u_n} of variables, a set A ⊆ {T, F}^n of "truth assignments", and an integer K > 0, does there exist a disjunctive normal form expression E over U, having no more than K disjuncts, which is true for precisely the assignments in A and no others?
We show the equivalence of the two problems by the following map, which builds a PATTERN COVER MINIMALITY instance. Each element a ∈ A becomes an input binary string (with 0 representing false and 1 representing true), while each disjunct d is mapped into a pattern p such that $p_i = 1$ if $u_i$ appears in d, $p_i = 0$ if $\neg u_i$ appears in d, and $p_i = -$ otherwise. A disjunct is true exactly on the assignments whose strings are generated by its pattern, so an expression with at most K disjuncts is true precisely on A if and only if the corresponding patterns form a cover of the strings with at most K patterns.
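The map used in this proof can be sketched as below. The literal encoding (+i for u_i, -i for ¬u_i, 1-based indices) is an assumption of the sketch, and disjuncts are assumed not to contain both u_i and ¬u_i:

```python
def disjunct_to_pattern(disjunct, n: int) -> str:
    """Map a DNF disjunct (list of literals, +i for u_i and -i for NOT u_i) to an n-pattern."""
    pattern = ["-"] * n                      # variables absent from d become gaps
    for lit in disjunct:
        pattern[abs(lit) - 1] = "1" if lit > 0 else "0"
    return "".join(pattern)


def assignment_to_string(assignment) -> str:
    """Map a truth assignment (tuple of bools) to a binary string: True -> 1, False -> 0."""
    return "".join("1" if v else "0" for v in assignment)


# E = (u1 AND NOT u3) OR u2 over U = {u1, u2, u3}
print([disjunct_to_pattern(d, 3) for d in [[1, -3], [2]]])  # ['1-0', '-1-']
```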

Proposition 3. PATTERN INCOMPLETENESS is NP-complete.
Proof. We reduce SAT to PATTERN INCOMPLETENESS. Given a SAT instance with n variables and m clauses, we derive a set P of m patterns $p^k$, k = 1, ..., m (one pattern for each clause), as follows: for each variable i and each clause k, if the literal $x_i$ is present in the clause we set $p^k_i = 0$, if the literal $\neg x_i$ is present in the clause we set $p^k_i = 1$, and if neither $x_i$ nor $\neg x_i$ is present in the clause we set $p^k_i = -$ (note that the pattern values $p^k_i \in \{0, 1\}$ are set opposite to the truth values that make the literals true). Assume the SAT instance is satisfiable and let x be a satisfying truth assignment. Define a string s by $s_i = 1$ if $x_i$ = TRUE and $s_i = 0$ if $x_i$ = FALSE. By assumption, at least one literal of each clause is true, so for each $p^k \in P$ at least one of the symbols $s_i$ in the 0/1 positions of $p^k$ differs from $p^k_i$, by the construction of $p^k$.
It follows that s is not in $S(p^k)$ for any k, and so s is not in S(P). In a similar way, given a string s not in S(P), we can reverse the previous reasoning and obtain a satisfying truth assignment for the SAT instance. To see that PATTERN INCOMPLETENESS is also in NP, it suffices to observe that verifying that a given string s does not belong to S(P) takes polynomial time, as previously described.
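The reduction can be tried on small formulas with the sketch below. Clauses use a DIMACS-like encoding (+i for x_i, -i for ¬x_i), which is our assumption; the brute-force search for an uncovered string is exponential and only illustrates that uncovered strings correspond to satisfying assignments:

```python
from itertools import product


def clauses_to_patterns(clauses, n: int) -> list[str]:
    """Clause k becomes pattern p^k with fixed bits opposite to its literals.

    A pattern then covers exactly the assignments that falsify its clause.
    """
    patterns = []
    for clause in clauses:
        p = ["-"] * n
        for lit in clause:
            p[abs(lit) - 1] = "0" if lit > 0 else "1"
        patterns.append("".join(p))
    return patterns


def uncovered_string(patterns, n: int):
    """Brute-force search for s in {0,1}^n not in S(P); None if P is complete."""
    for bits in product("01", repeat=n):
        s = "".join(bits)
        if not any(all(c == "-" or c == b for c, b in zip(p, s)) for p in patterns):
            return s
    return None


# (x1 OR NOT x2) AND (x2 OR x3)
P = clauses_to_patterns([[1, -2], [2, 3]], 3)
print(P)                       # ['01-', '-00']
print(uncovered_string(P, 3))  # 001, i.e. x1 = F, x2 = F, x3 = T satisfies both clauses
```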
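Since PATTERN INCOMPLETENESS and PATTERN COMPLETENESS are complements of each other, we have that PATTERN COMPLETENESS is co-NP-complete.

Proposition 4. PATTERN EQUIVALENCE is co-NP-complete.

Proof. We transform PATTERN COMPLETENESS into PATTERN EQUIVALENCE. Given a set P of patterns, instance of PATTERN COMPLETENESS, the corresponding instance of PATTERN EQUIVALENCE consists of the set P together with a set Q containing only the pattern (--···-), which generates {0, 1}^n; clearly, P is complete if and only if P and Q are equivalent. Moreover, PATTERN EQUIVALENCE is in co-NP: for a no-instance there exists a string s with s ∈ S(P) and s ∉ S(Q), or vice versa, and this s is a short certificate that can be checked in polynomial time.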

Proposition 5. PATTERN EQUIVALENCE MINIMALITY is co-NP-hard.
Proof. We describe a transformation from PATTERN COMPLETENESS. Given an instance P of PATTERN COMPLETENESS, we define a corresponding instance of PATTERN EQUIVALENCE MINIMALITY by choosing K = 1. Without loss of generality, we may assume that for each position i the values $p_i$ across all p ∈ P are not all identical (otherwise, we may discard each position where all symbols are equal and reduce the instance to an equivalent one). At this point, the only pattern that can be equivalent to P is (--···-): indeed, if a pattern q with $q_i \in \{0, 1\}$ were equivalent to P, every string of S(P) would have $s_i = q_i$, which forces $p_i = q_i$ for all p ∈ P. Hence P admits an equivalent set of K = 1 patterns if and only if P is complete.
Since PATTERN COVER MINIMALITY is a particular case of PATTERN EQUIVALENCE MINIMALITY (a set of strings is a set of patterns without gaps), we have by Proposition 2:

Proposition 6. PATTERN EQUIVALENCE MINIMALITY is NP-hard.

By Propositions 5 and 6, PATTERN EQUIVALENCE MINIMALITY is both NP-hard and co-NP-hard. To date it is not known whether the classes of NP-complete problems and co-NP-complete problems coincide or are disjoint. The widely believed conjecture is that they are disjoint, i.e., that NP ≠ co-NP. Under this conjecture, PATTERN EQUIVALENCE MINIMALITY belongs neither to NP nor to co-NP, since a problem that is both NP-hard and co-NP-hard and lies in NP (or in co-NP) would imply NP = co-NP. We therefore expect its complexity to lie beyond the classes NP and co-NP.
It is obvious that, given a set P of patterns, generating S(P) takes exponential time in general, simply because |S(P)| can be exponential. It is perhaps surprising that the reverse problem, i.e., generating P(S) from a given set S of strings, is polynomial: P(S) has polynomial size and can be generated by a polynomial algorithm. We devote the next section to this issue.

Compatible Patterns
We describe a procedure to compute P(S), that is, the set of all compatible patterns for a set S of strings. The analysis of this procedure shows that the number of compatible patterns is polynomial, namely $O(|S|^{\log_2 3})$.
We define a recursion that produces a set P′(S) of patterns and we will show that P′(S) = P(S). We assume there are no duplicates in S; the recursive calls create string sets that satisfy this property as well. The length of each string in a generic set R of strings (all of equal length) is denoted by n(R). For a generic set R and 1 ≤ c ≤ n(R), S(R, c) is the set of strings of length n(R) − c + 1 obtained from R by keeping, for each s ∈ R, only the elements $s_c, s_{c+1}, \ldots, s_{n(R)}$. Furthermore, for a string s and a set of strings X = {x_1, ..., x_k}, we denote by s • X = {sx_1, ..., sx_k} the set obtained by prepending s to all strings in X.
The recursive algorithm to compute P′(S) consists of:
• base case: if |S| = 1 then return S (its single string, seen as a pattern);
• recursion: given an input set R of strings, let c ≤ n(R) be the first index such that two strings in R have different c-th elements (if there were no such index, all strings in R would be identical, contradicting the hypothesis of no duplicates). Hence the prefix $s_1, \ldots, s_{c-1}$ is the same for each s ∈ R; let $\bar{s}$ be this common prefix. Let $P_0$, $P_1$, $P_*$ be defined as follows:
1. Let $R_0$ be the set of strings s ∈ R with $s_c = 0$, let $S_0 := S(R_0, c + 1)$ and $P_0 := P'(S_0)$ (recursive call).
2. Let $R_1$ be the set of strings s ∈ R with $s_c = 1$, let $S_1 := S(R_1, c + 1)$ and $P_1 := P'(S_1)$ (recursive call).
3. Let $R_*$ be the set of all strings s ∈ $R_0$ for which there exists s′ ∈ $R_1$ such that $s_i = s'_i$ for all i > c (note that s and s′ differ only at the c-th element, and that such strings s and s′ come in pairs due to the hypothesis of no duplicates), let $S_* := S(R_*, c + 1)$ and $P_* := P'(S_*)$ (recursive call; $P_* := \emptyset$ if $R_*$ is empty).
The algorithm returns $P'(R) := \bar{s} \bullet (0 \bullet P_0 \cup 1 \bullet P_1 \cup - \bullet P_*)$.
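The recursion can be sketched in Python as follows. This is our reading of the procedure, not the authors' C++ code: positions are 0-based, the common prefix is cut at the first column where the strings differ, and the three recursive calls are made on the suffix sets of the strings with a 0, of those with a 1, and on the intersection of the two (an empty set yields no patterns):

```python
def compatible_patterns(strings) -> set[str]:
    """P'(S): all patterns compatible with a set S of equal-length binary strings."""
    strings = set(strings)            # the procedure assumes no duplicates
    if not strings:
        return set()                  # empty S_*: no compatible pattern
    if len(strings) == 1:
        return set(strings)           # base case: the single string itself
    length = len(next(iter(strings)))
    # first column (0-based) where two strings differ; it exists since strings are distinct
    c = next(i for i in range(length) if len({s[i] for s in strings}) > 1)
    prefix = next(iter(strings))[:c]  # common prefix of all strings
    s0 = {s[c + 1:] for s in strings if s[c] == "0"}   # suffixes S_0
    s1 = {s[c + 1:] for s in strings if s[c] == "1"}   # suffixes S_1
    result = {prefix + "0" + q for q in compatible_patterns(s0)}
    result |= {prefix + "1" + q for q in compatible_patterns(s1)}
    result |= {prefix + "-" + q for q in compatible_patterns(s0 & s1)}  # S_* = S_0 ∩ S_1
    return result


print(sorted(compatible_patterns({"00", "01", "10"})))
# ['-0', '0-', '00', '01', '10']
```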
Theorem 1. Let S be a string set and let P′(S) be the set produced by the recursive algorithm on S. Then P(S) = P′(S).
Proof. We use induction on |S|. Base case: if |S| = 1 the assertion is clearly true. Inductive step: assume |S| > 1 and that the theorem holds for all sets with fewer than |S| strings (note that $S_0$, $S_1$ and $S_*$ all have fewer strings than S). With the notation of the algorithm, let p ∈ P(S). Since $\bar{s}$ (possibly empty) is a prefix of each string in S, p must start with $\bar{s}$ as well. Let $p = \bar{s}xq$ with x ∈ {0, 1, -}. If x = 0 then q is a pattern compatible for $S_0$ and, by induction, $q \in P'(S_0)$. Therefore $p \in \bar{s} \bullet (0 \bullet P'(S_0))$, so that p ∈ P′(S). A similar argument shows that p ∈ P′(S) also if x = 1. Now, if x = -, then for each string a generated by q, the pattern p generates both $\bar{s}0a$ and $\bar{s}1a$. Therefore a must be a string in both $S_0$ and $S_1$, i.e., in $S_*$. This means that q is a pattern compatible for $S_*$. Since, by induction, all such patterns are in $P_*$, we have $p \in \bar{s} \bullet (- \bullet P_*)$, so that p ∈ P′(S). Conversely, assume p ∈ P′(S) and let $p = \bar{s}xq$ with x ∈ {0, 1, -}. If x ∈ {0, 1}, let $\bar{s}xy$ be a string generated by p. Then y is generated by $q \in P'(S_x) = P(S_x)$, so $y \in S_x$ and therefore $\bar{s}xy \in S$; hence p ∈ P(S). If x = -, let $\bar{s}0y$ be a string generated by p; then $\bar{s}1y$ is also generated by p. Then y is generated by $q \in P_* = P(S_*)$, so $y \in S_* = S_0 \cap S_1$ and both $\bar{s}0y$ and $\bar{s}1y$ belong to S; hence p ∈ P(S).
Theorem 2. For every set S of strings, $|P(S)| = O(|S|^{\log_2 3})$.
Proof. Let T(m) be an upper bound on the cardinality of P(S) for a set S with m = |S|. From the previous theorem and from the recursive calls of the algorithm, with $|S_0| = k$, $|S_1| = m - k$ and $|S_*| \le \min(k, m - k)$, we see that we can take
$$T(1) = 1, \qquad T(m) = \max_{1 \le k \le \lfloor m/2 \rfloor} \{\, 2\,T(k) + T(m-k) \,\} \quad (m \ge 2).$$
This is the sequence A006046 as listed in the On-line Encyclopedia of Integer Sequences (OEIS), and our derivation seems to add a new meaning to that sequence. Note in particular that for $m = 2^k$ we have $T(m) = 3^k = m^{\log_2 3}$. Clearly, if S consists of all strings of length k, so that $|S| = 2^k$, then $|P(S)| = 3^k$, so the bound T(m) for |P(S)| is tight in this particular case. In fact, we can prove that the bound is tight for every m and not just for $m = 2^k$. Consider the following Procedure 1, which generates a set S of m strings with |P(S)| = T(m).

Procedure 1 f [m]
begin if m = 1 then return(1); else begin
It is easy to show that |P(S)| = T(m) for the sets produced by the above procedure, and so the bound T(m) is tight for all m. However, in most cases, as for covering problems, we may be interested only in the maximal patterns P*(S), and we may wonder how the number of maximal patterns grows with m.
In the particular case of the strings produced by the above procedure, which corresponds to the worst case in terms of generic patterns, the number of maximal patterns grows very slowly, indeed as log m.
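The growth of the bound can be checked numerically. The sketch below assumes that the recursion of the algorithm gives T(1) = 1 and T(m) = max over 1 ≤ k ≤ ⌊m/2⌋ of 2T(k) + T(m − k), since the strings with a 0 and with a 1 in the splitting column number k and m − k, and their common suffixes at most min(k, m − k). It also counts compatible patterns by brute force. Because Procedure 1 is not reproduced in full here, the test sets are our own choice, namely the first m strings of length 3 in lexicographic order:

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def T(m: int) -> int:
    """Upper bound on |P(S)| for |S| = m from the (assumed) recurrence."""
    if m == 1:
        return 1
    return max(2 * T(k) + T(m - k) for k in range(1, m // 2 + 1))


def count_compatible(strings: set[str]) -> int:
    """Brute-force |P(S)|: count patterns p over {0,1,-}^n whose span lies inside S."""
    n = len(next(iter(strings)))
    count = 0
    for pat in product("01-", repeat=n):
        choices = [("0", "1") if c == "-" else (c,) for c in pat]
        if all("".join(bits) in strings for bits in product(*choices)):
            count += 1
    return count


print([T(m) for m in range(1, 9)])  # [1, 3, 5, 9, 11, 15, 19, 27]

# Our candidate worst-case sets (not the paper's Procedure 1).
universe = ["".join(b) for b in product("01", repeat=3)]
print(all(count_compatible(set(universe[:m])) == T(m) for m in range(1, 9)))  # True
```

For these small cases the brute-force counts coincide with the first terms 1, 3, 5, 9, 11, 15, 19, 27 of the recurrence.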
In the following table we report the value m, |P(S)| = T(m), and the number |P*(S)| of maximal patterns for the above set of strings.
Although it may happen that |P*(S)| > m in general, we can show that a cover can always be obtained with fewer than m patterns if each string is covered by some compatible pattern with at least one gap. Let us call one-pattern a pattern with exactly one gap. Consider the set of all compatible one-patterns for a given set of m strings of n symbols. By assumption, each string is covered by a compatible one-pattern (if it is covered by a compatible pattern with gaps, it is also covered by a compatible one-pattern, obtained by fixing all gaps but one to the values of the string). Build a graph G = (V, E) with V the set of strings and E the set of string pairs spanned by a compatible one-pattern. This graph is bipartite, because we may partition the strings into "even" and "odd" strings according to the parity of the number of ones, and a one-pattern necessarily spans one even string and one odd string. Moreover, by assumption there are no isolated vertices. A cover consisting of one-patterns corresponds to an edge cover of G and, by Gallai's theorem, a minimum edge cover of a graph without isolated vertices has cardinality m − |M|, where M is a maximum matching. Since G has at least one edge, |M| ≥ 1, and so at most m − 1 one-patterns are needed to cover all strings. For the above example, the following are two alternative minimum covers with 6 < 12 patterns:
(00-01, 010-0, 1011-, 1-010, 1-101, -0000), (00-01, 0-000, 100-0, 1011-, 1-101, -1010)
Therefore, if a set of m strings cannot be covered with fewer than m patterns, then some strings, loosely speaking, cannot be explained by any rule and require a particular pattern that coincides with the string itself. Furthermore, the m − 1 bound is obtained by considering only one-patterns. If, as we presume, the data sets are explained by more interesting patterns with many gaps, the size of a minimum cover can be significantly less than m.
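The edge-cover argument can be turned into a procedure. The sketch below (illustrative names) finds a maximum matching with Kuhn's augmenting-path algorithm on the even/odd bipartition and adds one extra one-pattern for each unmatched string; it assumes, as in the argument, that every string has a neighbour at Hamming distance 1 in S. The test set is the 12-string set spanned by each of the two covers listed above:

```python
def one_pattern_cover(strings):
    """Minimum cover of S by compatible one-patterns (m - |M| of them).

    Assumes every string of S has a neighbour in S at Hamming distance 1.
    """
    S = sorted(set(strings))
    in_s = set(S)

    def neighbours(s):
        # strings of S at Hamming distance 1, i.e. pairs spanned by a one-pattern
        for i in range(len(s)):
            t = s[:i] + ("1" if s[i] == "0" else "0") + s[i + 1:]
            if t in in_s:
                yield t

    def merge(s, t):
        # the one-pattern spanning exactly s and t
        return "".join(a if a == b else "-" for a, b in zip(s, t))

    match = {}  # odd string -> even string matched to it

    def augment(u, seen):
        # Kuhn's augmenting-path step starting from the even string u
        for v in neighbours(u):
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    for u in S:
        if u.count("1") % 2 == 0:  # even side of the bipartition
            augment(u, set())

    cover = [merge(u, v) for v, u in match.items()]  # one pattern per matched edge
    matched = set(match) | set(match.values())
    for s in S:
        if s not in matched:  # one extra edge for each unmatched string
            cover.append(merge(s, next(neighbours(s))))
    return cover


S12 = {"00000", "00001", "00101", "01000", "01010", "10000",
       "10010", "10101", "10110", "10111", "11010", "11101"}
cover = one_pattern_cover(S12)
print(len(cover))  # 6
```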

ILP Models
In this section we provide ILP models for two of the problems defined in Section 3, namely PATTERN COVER MINIMALITY and PATTERN EQUIVALENCE. The first problem is NP-complete and the second one is co-NP-complete, so it is appropriate to use ILP models for their solution. On the contrary, we believe that PATTERN EQUIVALENCE MINIMALITY cannot be expressed as an ILP model of polynomial size. We have already stressed that this problem is both NP-hard and co-NP-hard, and we have observed that, given the current state of the art in computational complexity, we believe that it belongs neither to NP nor to co-NP. Since deciding whether the optimum of an ILP lies above or below a given threshold is a problem in NP or in co-NP, we have strong reasons to doubt that PATTERN EQUIVALENCE MINIMALITY can be solved by ILP models.

ILP for Pattern Cover Minimality
We approach the problem of finding a pattern set P of minimum cardinality that spans a given set of strings S as a 0-1 set covering ILP, in which each row is associated with a string of S, each column is associated with a compatible pattern for S, and the entry $a_{ij}$ of the 0-1 matrix is 1 if and only if pattern j covers string i. In view of Theorem 2 the matrix has a polynomial number of columns and can therefore be written explicitly. We note that it is not strictly necessary to generate the full matrix: we could use a column generation approach, adapting the algorithm that generates all patterns to the pricing problem given the dual variables associated with the strings. However, in our computational experiments we have observed that generating the full matrix and then solving the problem outperforms the column generation approach, which requires running the recursive algorithm at each column generation iteration, while only one run is necessary for generating the full matrix.
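For small instances the covering model can be illustrated without an ILP solver. The sketch below builds the columns by brute force over {0,1,-}^n (a stand-in for the recursive procedure of Section 4) and solves min Σ_j x_j subject to Σ_j a_ij x_j ≥ 1 exactly with a simple depth-first branch and bound. This is our substitute for solving the set covering model with CPLEX and does not scale to the instances of Section 6:

```python
from itertools import product


def compatible_columns(strings):
    """Brute-force P(S) for small n: maps each compatible pattern to the strings it
    covers, i.e. to one column of the 0-1 covering matrix."""
    S = set(strings)
    n = len(next(iter(S)))
    columns = {}
    for pat in product("01-", repeat=n):
        choices = [("0", "1") if c == "-" else (c,) for c in pat]
        span = frozenset("".join(bits) for bits in product(*choices))
        if span <= S:
            columns["".join(pat)] = span
    return columns


def min_pattern_cover(strings):
    """Exact minimum cover by depth-first branch and bound instead of an ILP solver."""
    columns = compatible_columns(strings)
    rows = sorted(set(strings))
    # columns covering each row, larger spans first so that good covers are found early
    by_row = {s: sorted(((p, c) for p, c in columns.items() if s in c),
                        key=lambda pc: -len(pc[1])) for s in rows}
    best = list(rows)  # S itself is always a cover

    def search(chosen, covered):
        nonlocal best
        if len(chosen) >= len(best):
            return  # cannot beat the incumbent
        uncovered = next((s for s in rows if s not in covered), None)
        if uncovered is None:
            best = list(chosen)
            return
        for pat, span in by_row[uncovered]:  # some chosen column must cover this row
            search(chosen + [pat], covered | span)

    search([], frozenset())
    return best


S12 = {"00000", "00001", "00101", "01000", "01010", "10000",
       "10010", "10101", "10110", "10111", "11010", "11101"}
print(len(min_pattern_cover(S12)))  # 6
```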

ILP for Pattern Equivalence
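We assume that two sets P and Q of patterns are given. We introduce the following models: model (3), with optimal value v, and model (4), obtained from (3) by exchanging the roles of P and Q, with optimal value w.
We have the following result:
Proposition 7. S(P) = S(Q) if and only if v > 0 and w > 0; S(P) ⊂ S(Q) if and only if v > 0 and w = 0; S(Q) ⊂ S(P) if and only if v = 0 and w > 0; neither set contains the other if and only if v = 0 and w = 0.
Proof. We recall that ⊂ denotes strict inclusion. It suffices to prove that S(P) ⊆ S(Q) if and only if v > 0 (the statement for w follows by exchanging P and Q). If $y_p = 1$ then x is generated by p ∈ P, and the constraint $\sum_{p \in P} y_p \ge 1$ implies that x is generated by at least one pattern in P. Hence the feasible x are exactly the strings of S(P). Consider now any such x. If x is generated by q ∈ Q then $z_q = 1$, while if x is not generated by q then $z_q = 0$ is feasible (along with possible integer values $z_q \ge 1$), and the objective function forces $z_q$ to zero. Therefore v = 0 if and only if there exists x ∈ S(P) with x ∉ S(Q). Equivalently, v > 0 if and only if every string x ∈ S(P) also belongs to S(Q), i.e., S(P) ⊆ S(Q).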
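The explicit models are not reproduced in this version of the text. As an illustration only, one 0-1 formulation consistent with the proof above can be written as follows for model (3); the constraint forms are our assumption and need not coincide with the authors' model. Here $x \in \{0,1\}^n$ is the candidate string, $y_p$ selects a pattern of P that generates x, and $z_q$ is forced to be positive exactly when q generates x:

```latex
\begin{aligned}
v = \min\ & \sum_{q \in Q} z_q \\
\text{s.t. } & y_p \le x_k && p \in P,\ k : p_k = 1 \\
& y_p \le 1 - x_k && p \in P,\ k : p_k = 0 \\
& \sum_{p \in P} y_p \ge 1 \\
& z_q \ge 1 - \sum_{k : q_k = 1} (1 - x_k) - \sum_{k : q_k = 0} x_k && q \in Q \\
& x \in \{0,1\}^n,\quad y_p \in \{0,1\},\quad z_q \in \mathbb{Z}_{\ge 0}
\end{aligned}
```

The right-hand side of the $z_q$ constraint equals 1 minus the number of fixed positions of q where x disagrees with q, so it is 1 when q generates x and at most 0 otherwise; the minimization then sets $z_q = 0$ for every q that does not generate x. Under this reading, model (4) is the same formulation with P and Q exchanged.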
Note that if S(P) ⊄ S(Q), i.e., when v = 0, model (3) also yields a string x in S(P) but not in S(Q), whereas if S(P) ⊆ S(Q), i.e., when v > 0, model (3) also yields a string x in both S(P) and S(Q). Similarly, if S(Q) ⊄ S(P), i.e., when w = 0, model (4) also yields a string x in S(Q) but not in S(P), whereas if S(Q) ⊆ S(P), i.e., when w > 0, model (4) also yields a string x in both S(Q) and S(P).
We may further distinguish, in the case v = 0, w = 0, whether S(P) and S(Q) are disjoint via the following model (5), with optimal value ŵ. The following proposition follows easily from Proposition 7. When S(P) and S(Q) are not disjoint, the model yields a string x shared by both sets. If we consider model (4) and take Q = {(--···-)}, we are actually solving PATTERN COMPLETENESS together with its complement PATTERN INCOMPLETENESS. Hence (4) becomes a model, which we denote by (6), with optimal value u, and we may conclude, as a corollary of Proposition 7, that S(P) = {0, 1}^n if and only if u > 0. As a simple example of the previous results, suppose we are given the following two sets of patterns P and Q.
We run (3) and (4) in sequence and obtain v = 0 and w > 0, which implies S(Q) ⊂ S(P) according to Proposition 7. In this case there is no need to run (5). As a byproduct, we obtain from (3) and (4), respectively, the strings x_1 and x_2. One can easily check that x_1 ∈ S(P) and x_1 ∉ S(Q), and also that x_2 ∈ S(Q) ⊂ S(P). Moreover, if we run (6) for P we obtain u = 0, which implies S(P) ≠ S((----)), and also the string (1110) as a certificate that S(P) does not contain all strings.
Suppose now that we are given two other sets of patterns P and Q, for which running (3) and (4) yields v = 0 and w = 0. Hence we also have to run (5), and we obtain ŵ = 0. This means that S(P) and S(Q) are not disjoint, and we may also exhibit a string x_3 ∈ S(P) ∩ S(Q) that belongs to their intersection. Moreover, from (3) and (4) we also have the two strings x_1 ∈ S(P), x_1 ∉ S(Q) and x_2 ∈ S(Q), x_2 ∉ S(P). As a final output, we may run (6) for P ∪ Q and obtain u = 0 and a string x_4 ∉ S(P) ∪ S(Q).

Computational Experiments
We have carried out computational experiments for PATTERN COVER MINIMALITY and PATTERN EQUIVALENCE. PATTERN COVER is polynomial, and we felt no need to perform computational experiments for it. At the opposite end, PATTERN EQUIVALENCE MINIMALITY seems to be intractable, and we have not devised any approach to solve it. PATTERN COMPLETENESS and PATTERN INCOMPLETENESS are particular cases of PATTERN EQUIVALENCE.
Our tests were run on an Intel Core i5 machine at 2.3 GHz with 8 GB RAM. The program was implemented in C++ and we used CPLEX 12.4 as the ILP solver.

Pattern Cover Minimality
We fix the length of the strings to n = 15. Each string is randomly generated by independently setting each bit to 1 with probability p (and to 0 with probability 1 − p). A random instance consists of a set S of m randomly generated strings without duplicates. We consider the following values: p ∈ {0.1, 0.25, 0.5} and m ∈ {100, 1000, 5000, 10,000}. For each combination of values of p and m we generate ten instances.
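A generator for such instances can be sketched as follows. How duplicates were avoided is not specified here, so rejecting and redrawing them is our assumption; `random_instance` is an illustrative name:

```python
import random


def random_instance(m: int, n: int = 15, p: float = 0.5, seed=None) -> set[str]:
    """m distinct random n-bit strings, each bit set to 1 independently with probability p.

    Duplicates are discarded and redrawn (an assumption of this sketch); this can need
    many draws when p is small and m is large.
    """
    if m > 2 ** n:
        raise ValueError("cannot draw more than 2**n distinct strings")
    if m > 1 and not 0.0 < p < 1.0:
        raise ValueError("with p = 0 or p = 1 only one distinct string exists")
    rng = random.Random(seed)
    strings: set[str] = set()
    while len(strings) < m:
        strings.add("".join("1" if rng.random() < p else "0" for _ in range(n)))
    return strings


S = random_instance(100, p=0.25, seed=1)
print(len(S))  # 100
```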
The strings generated with a value of p close to 0 (or equivalently close to 1) tend to be similar whereas they are much less similar for p = 0.5. Similar instances are expected to be covered with a few patterns with many gaps, whereas non-similar instances are expected to be covered with many patterns with few gaps.
By the recursive procedure described in Section 4 we compute, for each S, all compatible patterns P(S) and solve the corresponding set cover problem with CPLEX.
The computational results are reported in Table 1. For each combination of p and m we report, for each of the ten instances, the resulting number of compatible patterns (|P(S)|), the optimal value of the minimum cover problem (opt), the total CPU time in seconds, consisting of the pattern generation procedure plus the CPLEX run (time), and the number of nodes (root excluded) of the branch-and-bound process (#nodes). A value of #nodes equal to zero means that the solution of the LP relaxation was already integer. As can be seen from Table 1, all instances are solved at the branch-and-bound root node except in the case p = 0.5 and m = 10,000.

Pattern Equivalence
One of the main difficulties for testing the models for PATTERN EQUIVALENCE is creating sensible instances which show that the ILP model is indeed effective.
In fact, a major objection that one might raise against the use of ILP is that, when the maximum number of gaps in the patterns is not "large enough", a simple enumerative approach might prove quite effective, and much better than ILP, even if there are many patterns and n is quite big. Suppose, for example, that we compare two sets of about 1000 patterns each, with n = 100 and at most ten "-" in each pattern. Ten gaps can be expanded in 1024 ways, so each set of patterns yields at most about 1,000,000 strings, which most computers can generate in about a second. Then we just need to check whether the two sets of strings (after removing duplicates) have the same size (if not, we stop) and, if they do, check that each element of the first set is in the second, stopping as soon as one is not (for instance, by first sorting the two sets and then scanning the sorted lists). Some data structures might be more effective than others for these operations but, bottom line, it is a very fast process that ILP has a hard time beating.
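The enumerative approach just described can be sketched in a few lines. Function names are ours; the cost grows as 2^g per pattern with g gaps, which is exactly what makes it impractical when patterns have many gaps:

```python
from itertools import product


def expand_all(patterns) -> set[str]:
    """S(P) by complete expansion: a pattern with g gaps contributes 2**g strings."""
    out: set[str] = set()
    for p in patterns:
        choices = [("0", "1") if c == "-" else (c,) for c in p]
        out.update("".join(bits) for bits in product(*choices))
    return out


def equivalent_by_enumeration(P, Q) -> bool:
    """Naive test: expand both sets (duplicates vanish in the set), compare the sizes,
    then compare the sorted lists element by element."""
    sp, sq = expand_all(P), expand_all(Q)
    if len(sp) != len(sq):
        return False
    return sorted(sp) == sorted(sq)  # with Python sets this is just sp == sq


print(equivalent_by_enumeration(["0-", "10"], ["00", "01", "10"]))  # True
print(equivalent_by_enumeration(["0-", "10"], ["0-", "1-"]))        # False
```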
Therefore, we want to show that ILP is the way to go when the naïve approach cannot work, namely, when the patterns have so many gaps in them that a complete expansion (which exponentially increases the data size) is out of question. This poses the problem of how to create non-trivial, interesting instances of equivalent pattern sets which have a large number of gaps.

Diagonal Instances
A simple way of creating instances with equivalent pattern sets is as follows. For every n, we consider two equivalent sets of patterns, which generate all strings with the exception of the string 11 · · · 11. We call these diagonal instances.
The first set has n patterns, with a maximum number of gaps n − 1, and is the following (exemplified for n = 6):
The second set has 2n − 1 patterns, with a maximum number of gaps n − 2, and is
We perform a sequence of tests to compare the ILP approach with the complete enumeration algorithm for increasing values of n. These diagonal instances turn out to be very easy for the ILP model: they are all solved in less than 0.1 s for n ≤ 30, as can be seen from Table 2. The enumerative approach, however, soon becomes impractical: for n = 28 it already takes more than 15 min, while for n = 30 it has not finished after one hour (which we set as the maximum time limit). It is interesting to notice that the ILP approach solves this instance in less than a second also for n = 100, while the enumerative approach would have to generate $2^{100} - 1$ strings.
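The explicit pattern sets are not reproduced in this version of the text. As an assumption, one family matching the description of the first set (n patterns, at most n − 1 gaps, generating every string except 11 · · · 11) is the one whose i-th pattern fixes the first i bits to 1 and the next bit to 0; the sketch builds it and checks its span by enumeration for small n. The second family, with 2n − 1 patterns, is not reconstructed:

```python
from itertools import product


def first_zero_patterns(n: int) -> list[str]:
    """Hypothetical diagonal family: pattern i is 1^i 0 followed by n - i - 1 gaps.

    These n patterns have at most n - 1 gaps and partition the strings containing a 0.
    """
    return ["1" * i + "0" + "-" * (n - i - 1) for i in range(n)]


def diagonal_span_ok(n: int) -> bool:
    """Brute-force check that the family generates {0,1}^n minus the string 1...1."""
    generated = set()
    for p in first_zero_patterns(n):
        choices = [("0", "1") if c == "-" else (c,) for c in p]
        generated.update("".join(bits) for bits in product(*choices))
    everything = {"".join(bits) for bits in product("01", repeat=n)}
    return generated == everything - {"1" * n}


print(first_zero_patterns(4))  # ['0---', '10--', '110-', '1110']
print(diagonal_span_ok(10))    # True
```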

Generating Equivalent Pattern Sets in General
We can adopt the following strategy to create two sets of equivalent patterns:

1. We generate a small starting set P (e.g., |P| = 3 or 4) of random patterns. Each pattern is obtained by setting each bit to "-" with some probability q, to 0 with some probability p < 1 − q, and to 1 with probability 1 − p − q. Since we are interested in patterns with many gaps, we set q to a large value (e.g., 0.8). Let g be the minimum number of gaps in a pattern of P.

2. The patterns in P are expanded in all possible ways, yielding a set S of strings.

3. Using (a slightly modified version of) the recursive procedure described in Section 4, we compute the set 𝒫 of all patterns compatible with S that have at least g gaps each. Since every pattern of P is compatible with S and has at least g gaps, P ⊆ 𝒫, so it is always possible to cover S with patterns in 𝒫.

4. We compute two random solutions of the set covering problem. Namely, we select from 𝒫 two subsets P₁ and P₂ that cover S, each obtained by picking patterns at random until we have a cover.

5. P₁ and P₂ are equivalent by construction: both contain only patterns compatible with S and both cover S, so each of them generates exactly S. Moreover, each of their patterns has at least g gaps, i.e., a fairly large number.
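The five steps above can be sketched in Python for small n. This is a sketch under several assumptions: step 3 is done by brute force over all 3ⁿ strings in {0,1,-}ⁿ instead of with the recursive procedure of Section 4, so it is usable only for very small n (the default n = 8 is a hypothetical choice); the random cover skips picked patterns that add no new string, which is a choice made here to keep covers small; and all function names are hypothetical.

```python
import random
from itertools import product


def expand(pattern):
    """Yield every binary string generated by a pattern over {0,1,-}."""
    gaps = [i for i, c in enumerate(pattern) if c == "-"]
    for bits in product("01", repeat=len(gaps)):
        row = list(pattern)
        for pos, b in zip(gaps, bits):
            row[pos] = b
        yield "".join(row)


def random_pattern(n, q, p, rng):
    """Step 1: each bit is '-' w.p. q, '0' w.p. p, '1' w.p. 1 - p - q."""
    out = []
    for _ in range(n):
        r = rng.random()
        out.append("-" if r < q else "0" if r < q + p else "1")
    return "".join(out)


def compatible_patterns(S, n, g):
    """Step 3 by brute force: patterns with >= g gaps whose strings all lie in S."""
    return [pat for pat in map("".join, product("01-", repeat=n))
            if pat.count("-") >= g and all(s in S for s in expand(pat))]


def random_cover(candidates, S, rng):
    """Step 4: take candidates in random order until S is covered."""
    order = list(candidates)
    rng.shuffle(order)
    cover, covered = [], set()
    for pat in order:
        new = set(expand(pat)) - covered
        if new:  # skip picks that cover nothing new (a choice made here)
            cover.append(pat)
            covered |= new
            if covered == S:
                break
    return cover


def make_equivalent_pair(n=8, size=3, q=0.8, p=0.1, seed=0):
    rng = random.Random(seed)
    P = [random_pattern(n, q, p, rng) for _ in range(size)]  # step 1
    g = min(pat.count("-") for pat in P)
    S = {s for pat in P for s in expand(pat)}                # step 2
    candidates = compatible_patterns(S, n, g)                # step 3
    P1 = random_cover(candidates, S, rng)                    # step 4
    P2 = random_cover(candidates, S, rng)
    return P, S, P1, P2  # step 5: S(P1) == S(P2) == S by construction


P, S, P1, P2 = make_equivalent_pair()
print(len(S), len(P1), len(P2))
```

Because every candidate is compatible with S and the candidates include P itself, each random cover generates exactly S, so P₁ and P₂ are equivalent regardless of the random choices.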
Because of memory limitations, our implementation of the above procedure works only for n ≤ 20. To build larger instances we therefore use a trick: we start from instances built as above and combine them into larger and larger ones, as explained below.

How to Boost Instances of Pattern Equivalence
One way to increase the number of gaps in the instances would be to take two equivalent sets of patterns A and B and append a list of k gaps to each pattern. This, however, yields very special, uninteresting instances. To obtain more elaborate and harder pattern equivalence instances, we have developed the following scheme.
Given a set of patterns X, denote by n(X) the number of columns (i.e., the string length), by m(X) the number of rows (i.e., of patterns), and by G(X) the maximum number of gaps in a pattern of X.
Furthermore, given sets of patterns A and B, denote by C = A × B the set of patterns {ab : a ∈ A, b ∈ B}, where ab is the concatenation of a and b. Note that n(C) = n(A) + n(B), m(C) = m(A) · m(B) and G(C) = G(A) + G(B). We have the following.

Claim 1. Let A₁, A₂, B₁, B₂ be sets of patterns such that A₁ is equivalent to A₂ and B₁ is equivalent to B₂. Then A₁ × B₁ is equivalent to A₂ × B₂.

Proof. Let Cᵢ := Aᵢ × Bᵢ for i = 1, 2. We want to show that S(C₁) = S(C₂). A string (x, y), with x of length n(A₁), is generated by a pattern ab if and only if x is generated by a and y is generated by b; hence S(Cᵢ) = S(Aᵢ) × S(Bᵢ). We show S(C₁) ⊆ S(C₂); the other inclusion is symmetric. Let (x, y) ∈ S(C₁). Then x ∈ S(A₁) and y ∈ S(B₁). Since A₁ is equivalent to A₂, also x ∈ S(A₂), and similarly y ∈ S(B₂). Hence (x, y) ∈ S(A₂) × S(B₂) = S(C₂).
By using this trick repeatedly, we can snowball from small instances, e.g., 4 or 5 patterns with 5 or 6 gaps each, to instances with a few hundred patterns with more than 20 gaps each.
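The product construction and its repeated use can be checked on a toy example. The sketch assumes that A × B is the set of all concatenations ab with a ∈ A and b ∈ B, which is what the identities n(C) = n(A) + n(B), m(C) = m(A) · m(B) and G(C) = G(A) + G(B) indicate; the pattern sets A1, A2, B1, B2 and all function names are small hypothetical examples, and the brute-force `generated` is there only for verification.

```python
from itertools import product


def times(A, B):
    """A x B: all concatenations a + b with a in A and b in B."""
    return [a + b for a in A for b in B]


def length(X):     # n(X): string length
    return len(X[0])


def size(X):       # m(X): number of patterns
    return len(X)


def max_gaps(X):   # G(X): maximum number of gaps in a pattern
    return max(x.count("-") for x in X)


def generated(X):
    """S(X) by brute-force expansion (only for checking small cases)."""
    out = set()
    for x in X:
        gaps = [i for i, c in enumerate(x) if c == "-"]
        for bits in product("01", repeat=len(gaps)):
            row = list(x)
            for pos, b in zip(gaps, bits):
                row[pos] = b
            out.add("".join(row))
    return out


# Two small equivalent pairs (both sets in a pair generate the same strings).
A1, A2 = ["0-", "10"], ["-0", "01"]  # S = {00, 01, 10}
B1, B2 = ["-1", "10"], ["1-", "01"]  # S = {01, 10, 11}

C1, C2 = times(A1, B1), times(A2, B2)
assert length(C1) == length(A1) + length(B1)
assert size(C1) == size(A1) * size(B1)
assert max_gaps(C1) == max_gaps(A1) + max_gaps(B1)
assert generated(C1) == generated(C2)  # Claim 1 on this example

# Snowballing: keep multiplying by equivalent pairs to grow n, m and G.
X1, X2 = C1, C2
for Y1, Y2 in [(A1, A2), (B1, B2), (C1, C2)]:
    X1, X2 = times(X1, Y1), times(X2, Y2)
print(length(X1), size(X1), max_gaps(X1))  # prints 12 64 6
```

Appending k gaps to every pattern, the uninteresting boost mentioned earlier, is the special case `times(A, ["-" * k])`; multiplying by non-trivial equivalent pairs instead mixes fixed bits and gaps across the whole string.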

Experiments
We have created 10 instances of size n = 30 each. Each instance is built by combining, with the × operation described above, two equivalence instances of size n = 15 each, which were built with our procedure with parameters q = 0.8, p = 0.1 and |P| ∈ {3, 4}. The results are reported in Table 3. The enumerative approach could not solve any of these instances within half an hour. For each set of input patterns (i = 1, 2) we have:

1. mᵢ, the number of input patterns;

2. gᵢ, the minimum number of gaps per pattern;

3. Gᵢ, the maximum number of gaps per pattern;

4. aᵢ, the average number of gaps per pattern.
The ten instances are listed in Table 3 sorted by running time (9th column). The 10th and 11th columns report the number of branch-and-bound nodes (root node excluded) required to solve models (3) and (4), respectively.
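The four per-set statistics reported in Table 3 are straightforward to compute from a list of patterns. A minimal sketch follows; the function name `gap_stats` and the example patterns are hypothetical.

```python
from statistics import mean


def gap_stats(patterns):
    """Return (m, g, G, a) for a list of patterns over {0,1,-}."""
    gaps = [p.count("-") for p in patterns]
    return len(patterns), min(gaps), max(gaps), mean(gaps)


m, g, G, a = gap_stats(["0--1", "----", "10-1"])
print(m, g, G, round(a, 2))  # prints 3 1 4 2.33
```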
The results show that the ILP approach is effective also on instances that are large enough to be out of reach of enumerative approaches.
To test the same ILP models on pattern sets that are not equivalent, we randomly perturbed the previous data. The running times remained practically the same.

Conclusions
In order to use feature selection and LAD in the analysis of binary data consisting of positive and negative samples, one has to identify which computational problems might arise and how to overcome them. One of the issues we have addressed in this paper is the computational complexity of these problems, which we have shown to be, in general, very hard. As a viable approach to solving some of these problems effectively, we have described integer linear programming formulations. In particular, we have given ILP models for the problem of determining whether two sets of patterns are equivalent and for finding a minimum-size set of patterns that explains a given data set. A striking consequence of our complexity results is that a simple ILP model for finding a minimum-size set of patterns explaining the same data set as a given pattern set is unlikely to exist, since this problem is both NP-hard and co-NP-hard. Developing procedures for this last problem could be a line of future research.
Author Contributions: Conceptualization, methodology, software, validation, writing (original draft), writing (review and editing): G.L. and P.S. Both authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.