1. Introduction
One of the main consequences of the constant progress of technology, together with the massive use of computers in many aspects of our lives, has been the creation of large repositories of data storing information of all sorts. A major problem related to these huge data sets is that of discovering relevant patterns that separate the noise from important information, and of deriving rules for clustering the data into classes sharing essential common features. To this end, the fields of study known as data mining [1,2] and feature selection [3,4,5] have recently emerged among the most relevant applications of modern computer science.
In this paper we focus on some mathematical issues that arise from data mining problems. A very common situation is to represent the starting information by a two-dimensional array, in which the rows correspond to samples (or individuals) while the columns correspond to their characteristics (also called features).
If the features are Boolean, one of the tools that can be used to extract interesting information is the so-called Logical Analysis of Data (LAD) [6,7,8]. Consider for instance a data set consisting of a binary matrix of m rows and n columns, in which some rows are labeled as positive while the remaining rows are labeled as negative (for instance, in the case of a molecular biology experiment using a device called “microarray”, which measures the level of gene expression in cells, the values 0 and 1 would be related to the level being, respectively, “normal” or “abnormal” [9,10,11]).
The Logical Analysis of Data has the objective of discovering a set of simple Boolean formulas (or “rules”) that can be used to classify new binary vectors. Each rule describes what the value of some bits must be for a vector to be classified as positive or negative. For instance, a “positive rule” could be
$$ (b_2 = 1) \wedge (b_5 = 0) \wedge (b_9 = 0) \;\Longrightarrow\; b \text{ is positive}, $$
meaning that any vector with a 1 in the 2nd component, and a 0 in the 5th and 9th components, is classified as positive. Similarly, there can be some “negative rules” which specify which vectors should be classified as negative.
A rule such as the above can be conveniently represented by a pattern, which is a string over the alphabet {0, 1, -}. The characters 0 and 1 in the pattern specify which positions must be matched exactly by a binary vector to satisfy the rule, while the character - is a wildcard that can be indifferently matched by either 0 or 1. In particular, if for instance $n = 10$, the pattern corresponding to the above rule would be
$$ \texttt{-1--0---0-} $$
If r is a rule and p is the pattern corresponding to r, then a binary vector b satisfies the rule if and only if
$$ b_k = p_k \quad \text{for each } k \text{ such that } p_k \neq \texttt{-}. $$
We say that the pattern p covers all vectors b for which the above holds. In view of the equivalence of rules and patterns, we can talk of positive/negative patterns in place of positive/negative rules.
The objective of LAD is to infer positive and negative patterns from the data in such a way that (i) each positive row is covered by at least one of the positive patterns, while no negative row is, and (ii) each negative row is covered by at least one of the negative patterns, while no positive row is. This approach has been successfully applied to many contexts in both bioinformatics and biomedicine [8].
Since there might be many alternative sets of patterns explaining a given instance of LAD, one has to introduce a suitable criterion for choosing a specific solution. In particular, an Occam's razor strategy would suggest seeking the simplest possible solutions, i.e., the sets with a minimum number of patterns. Finding a minimum-size set of patterns which covers a given set of vectors is called the Pattern Cover Minimality problem.
Other problems arising from the analysis of patterns are related to understanding whether two different sets of rules actually explain the same data set, or, in other words, whether the two pattern sets are equivalent. In particular, we would also like to know whether a given set of rules explains all possible data, and so is in some sense “useless”. On the opposite side, we would like to know whether there are some data that cannot be explained by a particular set of rules.
In addition, given a set of patterns we would like to know whether there exists another, smaller set of patterns that explains the same data set. This problem, which we call Pattern Equivalence Minimality, looks similar to Pattern Cover Minimality; the difference is that here we start from a pattern set and not from a data set. Although the patterns could be expanded into strings, so as to solve a Pattern Cover Minimality problem on those strings, such an expansion is in general computationally intractable. Hence we should be able to find a better set of patterns starting directly from the given pattern set.
In the following we review the computational complexity of these pattern problems, which are, in general, quite hard [12] (see also [13] for a fixed-parameter analysis of some related pattern problems). We then give integer linear programming (ILP) formulations for Pattern Cover Minimality and for Pattern Equivalence, and we assess the effectiveness of our formulations by means of extensive computational experiments. An ILP formulation for Pattern Cover Minimality is also given in Boccia et al. [14]. The formulation we propose in this paper reduces the problem to a Set Covering problem with a (low-degree) polynomial number of columns. Pattern Cover models for non-binary data include a branch-and-price procedure described in [15] and heuristic procedures proposed in [16].
Formulating a solution procedure for Pattern Equivalence Minimality seems quite challenging since, as we will prove in the paper, this problem is NP-hard and co-NP-hard at the same time.
The paper is organized as follows. In Section 2 we provide precise mathematical definitions of the concepts we are dealing with and of the related problems. In Section 3 we investigate the computational complexity of the problems defined in the previous section. In Section 4 we investigate strings and patterns that are mutually compatible. In particular, we provide a polynomial algorithm to list all patterns that are compatible with a given set of strings (the inverse problem of listing all strings compatible with a set of patterns is necessarily exponential). In Section 5 we give ILP models for both the Pattern Cover Minimality and the Pattern Equivalence problems. These procedures are tested in Section 6, which is devoted to the computational experiments. Some conclusions are drawn in Section 7.
  2. Basic Definitions
A binary string (or, simply, a string) s is a sequence of symbols where each symbol can be either 0 or 1. With n-binary string we denote a binary string of length n. An n-pattern (or simply a pattern) p is a sequence of symbols where each symbol can take the values 0, 1 or -, i.e.,
$$ p \in \{0, 1, \texttt{-}\}^n. $$
We call the symbol - a gap. With n-pattern we denote a pattern of length n. Notice that a string is in fact a particular pattern, i.e., a pattern without gaps. A pattern p covers, or generates, a string $s \in \{0,1\}^n$ if $s_k = p_k$ for each k such that $p_k \neq \texttt{-}$. The span of p is the set of all binary strings generated by the pattern p, i.e.,
$$ \mathrm{span}(p) = \{\, s \in \{0,1\}^n : s_k = p_k \text{ for each } k \text{ such that } p_k \neq \texttt{-} \,\}. $$
Given a set P of patterns, the span $\mathrm{span}(P)$ of P is the set
$$ \mathrm{span}(P) = \bigcup_{p \in P} \mathrm{span}(p). $$
Two sets P and Q of patterns are equivalent if $\mathrm{span}(P) = \mathrm{span}(Q)$. A set of patterns P is a minimum set if $|P| \le |Q|$ for each set of patterns Q equivalent to P.
We say that a pattern p is compatible with a set of strings S if $\mathrm{span}(p) \subseteq S$. Similarly, we say that a string s is compatible for a pattern p if $s \in \mathrm{span}(p)$. We denote by $\Pi(S)$ the set of all compatible patterns for S. Let p be a pattern compatible with a given set S of strings. If there is no compatible pattern q such that $\mathrm{span}(p) \subset \mathrm{span}(q)$, we say that p is maximal (for S). We denote by $\Pi^*(S)$ the subset of maximal patterns in $\Pi(S)$. Given a set S of strings and a set P of patterns, we say that P is compatible for S if each $p \in P$ is compatible for S, so that $P \subseteq \Pi(S)$ and $\mathrm{span}(P) \subseteq S$.
Moreover, a set of patterns P is called a cover of S if $\mathrm{span}(P) = S$. Notice that all covers of S are equivalent to each other and that S, viewed as a set of patterns, is equivalent to each of its covers.
A set P of patterns is said to be complete if $\mathrm{span}(P) = \{0,1\}^n$, i.e., if P generates all possible n-binary strings. Clearly, the set consisting of the single all-gap pattern $\texttt{--}\cdots\texttt{-}$ is trivially complete.
We note that S is a set and so it does not contain duplicate strings. We assume this to be true also when we represent a set of m strings of length n as an $m \times n$ array of zeros and ones.
  3. Computational Complexity Results
The previous definitions lead to the following decision problems [12]:
- PATTERN COVER: given a set S of strings and a set P of patterns, is P a cover of S, i.e., is $\mathrm{span}(P) = S$?
- PATTERN COVER MINIMALITY: given a set S of strings and a constant K, does there exist a cover P of S such that $|P| \le K$?
- PATTERN EQUIVALENCE: given two sets P, Q of patterns, are they equivalent, i.e., is $\mathrm{span}(P) = \mathrm{span}(Q)$?
- PATTERN EQUIVALENCE MINIMALITY: given a set P of patterns and a constant K, does there exist an equivalent set of patterns Q such that $|Q| \le K$?
- PATTERN COMPLETENESS: given a set P of patterns, is it complete, i.e., is $\mathrm{span}(P) = \{0,1\}^n$?
- PATTERN INCOMPLETENESS: given a set P of patterns, is it not complete, i.e., does there exist a string $s \in \{0,1\}^n$ such that $s \notin \mathrm{span}(P)$?
We first note that, given a set P of patterns and a string s, determining whether $s \in \mathrm{span}(P)$ or $s \notin \mathrm{span}(P)$ is polynomial. Indeed, given a pattern p and a string s, we may check in time $O(n)$ whether s can be generated by p or not. Hence, given a set P of patterns, we have to repeat the check for each $p \in P$. If the check is false for each $p \in P$ we have $s \notin \mathrm{span}(P)$, otherwise we have $s \in \mathrm{span}(P)$.
Proposition 1. PATTERN COVER is polynomial.
Proof. For each $s \in S$ we check whether $s \in \mathrm{span}(P)$ or not. Hence in time $O(|S|\,|P|\,n)$ we may decide whether $S \subseteq \mathrm{span}(P)$ or not. In order to decide whether $\mathrm{span}(P) \subseteq S$ or not, for each pattern $p \in P$ we count the number $c_p$ of strings in S compatible for p. Let k be the number of gaps in p. Then p is compatible for S, i.e., $\mathrm{span}(p) \subseteq S$, if and only if $c_p = 2^k$. Computing $c_p$ can be done in time $O(|S|\,n)$. Overall, checking whether $\mathrm{span}(P) = S$ takes $O(|S|\,|P|\,n)$ time.    □
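The two checks used in this proof, membership of a string in $\mathrm{span}(P)$ and counting how many strings of S a given pattern covers, are immediate to code. A minimal illustrative sketch (strings and patterns are represented as Python strings over '0', '1', '-'; names are ours):

```python
def generated_by(p: str, s: str) -> bool:
    """True if pattern p generates string s: every non-gap position of p matches s."""
    return all(c == '-' or c == b for c, b in zip(p, s))

def in_span(P, s: str) -> bool:
    """Membership test s in span(P): check the patterns one by one, O(|P| * n) time."""
    return any(generated_by(p, s) for p in P)

def is_cover(P, S) -> bool:
    """PATTERN COVER: span(P) = S, for S a collection of distinct binary strings."""
    strings = set(S)
    if not all(in_span(P, s) for s in strings):          # is S contained in span(P)?
        return False
    for p in P:                                          # is span(p) contained in S?
        if sum(generated_by(p, s) for s in strings) != 2 ** p.count('-'):
            return False
    return True
```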
 Proposition 2. PATTERN COVER MINIMALITY is NP-complete.
Proof. We observe that PATTERN COVER MINIMALITY is basically the same as MINIMUM DISJUNCTIVE NORMAL FORM (see [17], p. 261), which we repeat here for the sake of completeness: given a set $U = \{u_1, \ldots, u_n\}$ of variables, a set $A \subseteq \{T, F\}^n$ of “truth assignments”, and an integer K, does there exist a disjunctive normal form expression E over U, having no more than K disjuncts, which is true for precisely the assignments in A and no others?
We show the equivalence of the two problems by the following map, which builds a PATTERN COVER MINIMALITY instance. Each assignment $a \in A$ becomes an input binary string (with 0 representing false and 1 representing true), while each disjunct d is mapped into a pattern p such that $p_i = 1$ if $u_i$ appears in d, $p_i = 0$ if $\bar{u}_i$ appears in d, and $p_i = \texttt{-}$ otherwise. Membership in NP follows from Proposition 1, since a cover of size at most K is a certificate that can be checked in polynomial time.    □
Proposition 3. PATTERN INCOMPLETENESS is NP-complete.
Proof. We reduce SAT to PATTERN INCOMPLETENESS. Given a SAT instance with n variables and m clauses we derive a set P of m patterns $p^1, \ldots, p^m$ (one pattern for each clause) as follows: for each variable i and each clause k, if the literal $x_i$ is present in the clause we set $p^k_i = 0$, if the literal $\bar{x}_i$ is present in the clause we set $p^k_i = 1$, and if neither $x_i$ nor $\bar{x}_i$ is present in the clause we set $p^k_i = \texttt{-}$ (note that the pattern values $p^k_i$ are set opposite to the truth values that make the corresponding literals true).
Assume the SAT instance is satisfiable and let x be a satisfying truth assignment. Define a string s by $s_i = 1$ if $x_i$ is true and $s_i = 0$ if $x_i$ is false. By assumption, at least one literal of each clause must be true, and so for each k at least one of the symbols of s corresponding to the 0, 1 positions of $p^k$ must be different from the corresponding symbol of $p^k$, due to the particular construction of $p^k$. It follows that s cannot be in $\mathrm{span}(p^k)$ for any k, and so s cannot be in $\mathrm{span}(P)$. In a similar way, given a string s not in $\mathrm{span}(P)$, we can reverse the previous reasoning and obtain a satisfying truth assignment for the SAT instance.
To see that PATTERN INCOMPLETENESS is also in NP it suffices to observe that verifying that a string $s \in \{0,1\}^n$ does not belong to $\mathrm{span}(P)$ takes polynomial time, as previously described.    □
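The construction of the patterns from the clauses can be illustrated with a short sketch (the encoding of clauses as signed integers is our own choice for the example):

```python
def sat_to_patterns(n, clauses):
    """Reduction of Proposition 3: one pattern per clause, with the value at position i
    set opposite to the truth value that makes the literal on variable i true.
    A clause is a list of nonzero ints: +i stands for x_i, -i for its negation."""
    P = []
    for clause in clauses:
        p = ['-'] * n
        for lit in clause:
            p[abs(lit) - 1] = '0' if lit > 0 else '1'
        P.append(''.join(p))
    return P

# The formula (x1 or x2) and (not x1 or x3) is satisfiable, so span(P) below misses
# at least one string: for instance 101, which encodes x1=1, x2=0, x3=1.
P = sat_to_patterns(3, [[1, 2], [-1, 3]])   # ['00-', '1-0']
```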
  Since PATTERN INCOMPLETENESS and PATTERN COMPLETENESS are complements of each other, we have:
      
Corollary 1. PATTERN COMPLETENESS is co-NP-complete.
Proposition 4. PATTERN EQUIVALENCE is co-NP-complete.
Proof. We transform PATTERN COMPLETENESS into PATTERN EQUIVALENCE. Given a set P of patterns, instance of PATTERN COMPLETENESS, the corresponding instance of PATTERN EQUIVALENCE consists of the set P plus a set Q containing only the all-gap pattern $\texttt{--}\cdots\texttt{-}$ (which generates $\{0,1\}^n$). Hence P is complete if and only if P and Q are equivalent. Moreover, for a no-instance of PATTERN EQUIVALENCE there exists a string s belonging to the span of one of the two sets but not to the other, and this s is a short certificate, so the problem belongs to co-NP.   □
Proposition 5. PATTERN EQUIVALENCE MINIMALITY is co-NP-hard.
Proof. We describe a transformation from PATTERN COMPLETENESS. Given an instance P of PATTERN COMPLETENESS, we define a corresponding instance of PATTERN EQUIVALENCE MINIMALITY by choosing $K = 1$. Without loss of generality, we may assume that, for each position i, the values $p_i$ over all $p \in P$ are not all identical (since, otherwise, we may discard each position where all symbols are equal and reduce the instance to an equivalent one). At this point, the only pattern that can be equivalent to P is the all-gap pattern $\texttt{--}\cdots\texttt{-}$, and so the answer to the PATTERN EQUIVALENCE MINIMALITY instance is yes if and only if P is complete.    □
 Since PATTERN COVER MINIMALITY is a particular case of PATTERN EQUIVALENCE MINIMALITY we have by Proposition 2:
      
Proposition 6. PATTERN EQUIVALENCE MINIMALITY is NP-hard.
 By Propositions 5 and 6, PATTERN EQUIVALENCE MINIMALITY is both NP-hard and co-NP-hard. To date it is not known whether the classes of NP-complete problems and co-NP-complete problems coincide or are disjoint. The widely believed conjecture is that they are disjoint. Following this conjecture we conclude that it is unlikely that PATTERN EQUIVALENCE MINIMALITY is in NP or in co-NP, and so we expect its complexity to be beyond the classes NP and co-NP.
It is obvious that, given a set P of patterns, generating $\mathrm{span}(P)$ takes exponential time in general, for the mere fact that $\mathrm{span}(P)$ can be of exponential size. It is perhaps surprising that the reverse task, i.e., given a set of strings S, generating $\Pi(S)$, is polynomial. Indeed, it turns out that $\Pi(S)$ is of polynomial size and also the algorithm that generates $\Pi(S)$ is polynomial. We devote the next section to this issue.
  4. Compatible Patterns
We describe a procedure to compute $\Pi(S)$, that is, the set of all compatible patterns for a set S of strings. The analysis of this procedure shows that the number of compatible patterns is polynomial, namely $O(|S|^{\log_2 3})$.
We define a recursion that produces a set $\hat{\Pi}(S)$ of patterns, and we will show that $\hat{\Pi}(S) = \Pi(S)$. We assume there are no duplicates in S; the recursive calls create string sets that preserve this property. The length of the strings in a generic set R (clearly all of equal length) is denoted by $\ell(R)$. Furthermore, for a string w (or a pattern) and a set X of strings (or patterns), we denote by
$$ w \cdot X = \{\, w x : x \in X \,\} $$
the set obtained by prepending w to all elements of X.
The recursive algorithm to compute $\hat{\Pi}(R)$ consists of:
- base case: if $|R| = 1$ then return the single string of R (seen as a gap-free pattern);
- recursion: given an input set R of strings, let c be the first index such that there are two strings in R whose c-th elements are different (if there were no such index all strings in R would be identical, contradicting the hypothesis of no duplicates). Hence all prefixes $s_1 \cdots s_{c-1}$ are equal for each $s \in R$; let w be this common prefix. Then:
  - let $R_0 = \{\, s_{c+1} \cdots s_{\ell(R)} : s \in R,\ s_c = 0 \,\}$ and $P_0 = \hat{\Pi}(R_0)$ (recursive call);
  - let $R_1 = \{\, s_{c+1} \cdots s_{\ell(R)} : s \in R,\ s_c = 1 \,\}$ and $P_1 = \hat{\Pi}(R_1)$ (recursive call);
  - let $R_\cap = R_0 \cap R_1$, i.e., the set of suffixes x such that both $w0x$ and $w1x$ belong to R (note that such strings go in pairs due to the hypothesis of no duplicates); if $R_\cap \neq \emptyset$ then $P_\cap = \hat{\Pi}(R_\cap)$ (recursive call), else $P_\cap = \emptyset$;
  - then return $\hat{\Pi}(R) = w0 \cdot P_0 \,\cup\, w1 \cdot P_1 \,\cup\, w\texttt{-} \cdot P_\cap$.
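A compact Python sketch of this recursion is the following (purely illustrative; strings and patterns are plain Python strings over '0', '1', '-'):

```python
def compatible_patterns(strings):
    """All patterns p over {0,1,-} with span(p) contained in the given set of
    distinct, equal-length binary strings (the recursion described above)."""
    S = sorted(set(strings))
    if len(S) == 1:
        return S                          # base case: a single string, no gap is possible
    n = len(S[0])
    c = next(i for i in range(n) if len({s[i] for s in S}) > 1)   # first differing column
    w = S[0][:c]                          # common prefix
    S0 = [s[c + 1:] for s in S if s[c] == '0']
    S1 = [s[c + 1:] for s in S if s[c] == '1']
    Sb = sorted(set(S0) & set(S1))        # suffixes appearing with both a 0 and a 1
    out = [w + '0' + q for q in compatible_patterns(S0)]
    out += [w + '1' + q for q in compatible_patterns(S1)]
    if Sb:
        out += [w + '-' + q for q in compatible_patterns(Sb)]
    return out

# Example: the four strings of length 2 yield all 3^2 = 9 patterns.
assert len(compatible_patterns(['00', '01', '10', '11'])) == 9
```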
 
Theorem 1. Let S be a string set and let $\hat{\Pi}(S)$ be the set produced by the recursive algorithm on S. Then $\hat{\Pi}(S) = \Pi(S)$.
Proof. We use induction on $|S|$. Base case: if $|S| = 1$ the assertion is clearly true.
Inductive step: assume $|S| = m \ge 2$ and that the theorem holds for all sets with fewer than m strings. With the notation of the algorithm,
$$ \hat{\Pi}(S) = w0 \cdot \hat{\Pi}(R_0) \,\cup\, w1 \cdot \hat{\Pi}(R_1) \,\cup\, w\texttt{-} \cdot \hat{\Pi}(R_\cap). $$
Let $p \in \Pi(S)$. Since w (possibly null) is a prefix of each string in S, p must start with w as well. Assume $p = w\,a\,q$ with $a \in \{0, 1, \texttt{-}\}$. If $a = 0$ then q is a pattern compatible for $R_0$ and, by induction, $q \in \hat{\Pi}(R_0)$. Therefore, $p \in w0 \cdot \hat{\Pi}(R_0)$, so that $p \in \hat{\Pi}(S)$. A similar argument shows that $p \in \hat{\Pi}(S)$ also if $a = 1$. Now, if $a = \texttt{-}$, then for each string x generated by q, the pattern p generates both $w0x$ and $w1x$. Therefore, x had to be a string both in $R_0$ and in $R_1$. This means that q had to be a pattern compatible for $R_\cap = R_0 \cap R_1$. Since, by induction, all such patterns are in $\hat{\Pi}(R_\cap)$, then $q \in \hat{\Pi}(R_\cap)$, so that $p \in \hat{\Pi}(S)$.
Now, assume $p \in \hat{\Pi}(S)$ and let $p = w\,a\,q$, with $a \in \{0, 1, \texttt{-}\}$. If $a = 0$, let $w0x$ be a string generated by p. Then x is generated by $q \in \hat{\Pi}(R_0) = \Pi(R_0)$, and therefore $x \in R_0$, so $w0x \in S$. The case $a = 1$ is analogous. If $a = \texttt{-}$, let $w0x$ be a string generated by p. Then also $w1x$ is generated by p. Since $x \in \mathrm{span}(q) \subseteq R_\cap \subseteq R_0 \cap R_1$, both $w0x$ and $w1x$ belong to S. In all cases $\mathrm{span}(p) \subseteq S$, i.e., $p \in \Pi(S)$.    □
Theorem 2. $|\Pi(S)| = O(m^{\log_2 3})$, where $m = |S|$.
Proof. Let $f(m)$ be an upper bound on the cardinality of $\Pi(S)$ for a set S with $|S| = m$. From the previous theorem and from the recursive calls of the algorithm, observing that $|R_0| + |R_1| = m$ and $|R_\cap| \le \min(|R_0|, |R_1|) \le \lfloor m/2 \rfloor$, we see that we may take
$$ f(1) = 1, \qquad f(m) = 2\,f(\lfloor m/2 \rfloor) + f(\lceil m/2 \rceil) \quad (m \ge 2). \qquad (1) $$
By applying the Master Theorem (see [18], p. 66) to $f(m) \le 3\,f(\lceil m/2 \rceil)$ we get
$$ f(m) = O\big(m^{\log_2 3}\big) = O\big(m^{1.585}\big). \qquad (2) $$
   □
These are the first values of $f(m)$ for $m = 1, \ldots, 8$, according to (1): 1, 3, 5, 9, 11, 15, 19, 27. This is the sequence A006046 as listed in the On-line Encyclopedia of Integer Sequences (OEIS), and our derivation seems to add a new meaning to that sequence. Note that in particular for $m = 2^k$ we have $f(m) = 3^k$. Clearly, if S consists of all strings of length k, so that $m = 2^k$, then $|\Pi(S)| = 3^k$, since every pattern of length k is compatible with S. So the bound $f(m)$ on $|\Pi(S)|$ is tight in this particular case. In fact, we can prove that the bound is tight for every m and not just for $m = 2^k$. Consider the following Procedure 1, which generates a set S of m strings attaining the bound.
      
| Procedure 1 |
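One recursive construction in the spirit of Procedure 1 (whose exact details may differ from the sketch below) splits the m strings into $\lfloor m/2 \rfloor$ strings starting with 0 and $\lceil m/2 \rceil$ strings starting with 1, the former suffixes being a subset of the latter, so that the recursion (1) holds with equality:

```python
def worst_case_set(m):
    """m distinct binary strings whose set of compatible patterns has exactly f(m)
    elements (a construction in the spirit of Procedure 1; details may differ)."""
    if m == 1:
        return ['']                          # a single (empty) string
    B = worst_case_set((m + 1) // 2)         # ceil(m/2) worst-case suffixes
    A = B[:m // 2]                           # floor(m/2) of them, so that A is a subset of B
    return ['0' + s for s in A] + ['1' + s for s in B]

# Example: worst_case_set(3) == ['00', '10', '11'], which has 5 = f(3) compatible patterns.
```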
It is easy to show that $|\Pi(S)| = f(m)$ for the sets produced by the above procedure, and so the bound $f(m)$ is tight for all m. However, in most cases, as for covering problems, we may be interested only in the maximal patterns $\Pi^*(S)$, and we may wonder how the number of maximal patterns grows with m. In the particular case of the strings produced by the above procedure, which corresponds to the worst case in terms of generic patterns, the number of maximal patterns grows very slowly, namely as $O(\log m)$.
In the following table we report, for the above sets of strings, the value of m, the number $|\Pi(S)|$ of compatible patterns, and the number $|\Pi^*(S)|$ of maximal patterns:

| m | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| compatible patterns | 1 | 3 | 5 | 9 | 11 | 15 | 19 | 27 |
| maximal patterns | 1 | 1 | 2 | 1 | 2 | 2 | 3 | 1 |

It turns out that $|\Pi^*(S)|$ is equal to the number of ones in the binary representation of m. However, the number $|\Pi^*(S)|$ can be much larger for other sets of strings; there are cases in which $|\Pi^*(S)| > m$.
Although it may happen that $|\Pi^*(S)| > m$, we may show that a cover can always be obtained with fewer than m patterns, provided that each string is covered by some compatible pattern with at least one gap. Let us call one-pattern a pattern with exactly one gap, and consider the set of all compatible one-patterns for a given set of m strings of n symbols. By assumption, each string is covered by a one-pattern (if it is covered by a pattern with gaps it is also covered by a one-pattern). Build a graph $G = (V, E)$ with V the set of strings and E the set of string pairs spanned by a one-pattern. This graph is bipartite, because we may partition the strings into “even” and “odd” strings according to the number of ones in the string, and a one-pattern necessarily spans an even string and an odd one. Moreover, there are no isolated vertices by assumption. A cover consisting of one-patterns corresponds to an edge cover of G. The cardinality of a minimum edge cover is $m - M$, with M the cardinality of a maximum matching. Since $M \ge 1$, we need at most $m - 1$ one-patterns to cover all strings.
Therefore, if more than m patterns are needed to cover a set of m strings, this means that there are some special strings that, loosely speaking, cannot be explained by any rule and require a particular pattern that coincides with the string itself. Furthermore, the $m - 1$ bound is obtained by considering only one-patterns. If, as we presume, the data sets are explained by more interesting patterns with many gaps, the size of a minimum cover can be significantly smaller than m.
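The argument above translates directly into a small algorithm: build the bipartite graph of one-patterns, compute a maximum matching, and extend it to an edge cover. A sketch (our own implementation, using Kuhn's augmenting-path algorithm for the matching):

```python
from itertools import combinations

def one_pattern_cover(strings):
    """Cover a set of distinct equal-length binary strings with one-gap patterns,
    assuming every string is covered by some compatible one-pattern.
    Returns at most m - 1 patterns (m - M, with M the size of a maximum matching)."""
    S = list(strings)
    adj = {s: [] for s in S}
    for s, t in combinations(S, 2):            # edges: strings differing in one position
        if sum(a != b for a, b in zip(s, t)) == 1:
            adj[s].append(t)
            adj[t].append(s)
    match = {}                                 # matched partner of every matched vertex

    def augment(u, seen):                      # Kuhn's augmenting-path step
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[u], match[v] = v, u
                    return True
        return False

    for u in (s for s in S if s.count('1') % 2 == 0):   # "even" side of the bipartition
        if u not in match:
            augment(u, set())
    edges = {frozenset((u, v)) for u, v in match.items()}
    for s in S:                                # unmatched vertices: add any incident edge
        if s not in match:
            edges.add(frozenset((s, adj[s][0])))         # assumes adj[s] is non-empty
    def to_pattern(u, v):
        return ''.join('-' if a != b else a for a, b in zip(u, v))
    return [to_pattern(*sorted(e)) for e in edges]
```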
  5. ILP Models
In this section we provide ILP models for two of the problems defined in Section 3, namely PATTERN COVER MINIMALITY and PATTERN EQUIVALENCE. The first problem is NP-complete and the second one is co-NP-complete, so it is not inappropriate to use ILP models for their solution. On the contrary, we believe that PATTERN EQUIVALENCE MINIMALITY cannot be expressed as an ILP model. We have already stressed that this problem is both NP-hard and co-NP-hard, and we have also observed that, given the current state of the art in computational complexity, it presumably belongs neither to NP nor to co-NP. Since the decision versions of ILP problems belong to these classes, we have strong reasons to doubt the possibility of solving PATTERN EQUIVALENCE MINIMALITY by an ILP model.
  5.1. ILP for Pattern Cover Minimality
We approach the problem of finding a pattern set P of minimum cardinality that spans a given set S of strings as a 0-1 set covering problem, in which each row is associated with a string of the string set, each column is associated with a compatible pattern for S, and the entry $a_{ij}$ of the 0-1 matrix is 1 if and only if pattern j covers string i. In view of Theorem 2, the matrix has a polynomial number of columns and therefore it can be written explicitly. We note that it is not strictly necessary to generate the full matrix: we may use a column generation approach by adapting the algorithm that generates all patterns to the pricing problem, given the dual variables associated with the strings. However, in our computational experiments we have seen that generating the full matrix and then solving the problem outperforms the column generation approach, which requires running the recursive algorithm for each column generation, while only one run is necessary for generating the full matrix.
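For illustration, the set covering model can be stated in a few lines. The sketch below uses PuLP and its default solver purely as an example interface (the experiments in Section 6 use C++ with CPLEX), and it assumes that `patterns` already contains all compatible patterns for `strings`, so that only the covering constraints are needed:

```python
import pulp

def covers(pattern: str, string: str) -> bool:
    """True if the pattern generates the string."""
    return all(c in ('-', b) for c, b in zip(pattern, string))

def min_pattern_cover(strings, patterns):
    """Minimum-cardinality subset of `patterns` whose spans cover `strings`."""
    prob = pulp.LpProblem("pattern_cover_minimality", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x_{j}", cat="Binary") for j in range(len(patterns))]
    prob += pulp.lpSum(x)                                   # minimize the number of patterns
    for s in strings:                                       # every string covered at least once
        prob += pulp.lpSum(x[j] for j, p in enumerate(patterns) if covers(p, s)) >= 1
    prob.solve()
    return [p for j, p in enumerate(patterns) if x[j].value() > 0.5]
```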
  5.2. ILP for Pattern Equivalence
We assume that two sets 
P and 
Q of patterns are given. We introduce the following models
        
We have the following result:
        
Proposition 7. - –  if and only if  and ; 
- –  if and only if  and ; 
- –  if and only if  and . 
 Proof.  We reiterate that ⊂ means strict inclusion. It is sufficient to prove that  if and only if . If  then x is generated by . The constraint  implies that x is generated by at least one pattern in P. Hence feasible x are in . Consider now any . If x is generated by  then , while if x is not generated by  then  is feasible (along with possible integer values ). The objective function forces  to be zero in this case.
Therefore,  if and only if  and . If , for any pattern  we have that , i.e., .    □
 Note that, if 
, i.e., when 
, the model (
3) yields also a string 
x in 
 but not in 
, whereas if 
, i.e., when 
, model (
3) yields also a string 
x in both 
 and 
. Similarly if 
, i.e., when 
, model (
4) yields also a string 
x in 
 but not in 
, whereas if 
, i.e., when 
, model (
4) yields also a string 
x in both 
 and 
.
We may further distinguish the case 
, 
, via the following model
        
The following proposition follows easily from Proposition 7.
Proposition 8.  and  are disjoint if and only if .
 When  and  are not disjoint, the model yields a string x shared by both sets.
If we consider model (
4) and take 
 we are actually solving the problem FULL PATTERN COVERAGE together with its complement PARTIAL PATTERN COVERAGE. Hence (
4) becomes
        
        and we may conclude, as a Corollary of Proposition 7, that
Proposition 9.  if and only if .
 As a simple example of the previous results suppose we are given the two following sets of patterns 
P and 
Q.
        
We run in sequence (
3) and (
4) and obtain 
 and 
, that implies 
, according to Proposition 7. In this case there is no need of running (
5). As a byproduct we obtain from (
3) and (
4) respectively the strings
        
One can easily check that 
 and 
 and also that 
. Moreover, if we run (
6) for 
P we obtain 
, that implies 
 and also the string 
 as a certificate that 
 does not contain all strings.
If we are given the two sets of patterns
        
        and run in sequence (
3) and (
4) we obtain 
 and 
. Hence we have to run also (
5) and obtain 
. This means that 
 and 
 are not disjoint. We may exhibit also the string 
 that belongs to their intersection. Moreover, from the previous (
3) and (
4) we also have the two strings 
, 
 and 
, 
. As a final output we may run (
6) for 
 and obtain 
 and 
: 
  6. Computational Experiments
We have carried out computational experiments for PATTERN COVER MINIMALITY and PATTERN EQUIVALENCE. The problem PATTERN COVER is polynomial and we felt no need to perform computational experiments for it. On the opposite side, problem PATTERN EQUIVALENCE MINIMALITY seems to be intractable, and we have not even devised ideas of how to solve it. Problems PATTERN COMPLETENESS and PATTERN INCOMPLETENESS are particular cases of PATTERN EQUIVALENCE.
Our tests were run on a 2.3 GHz Intel Core i5 machine with 8 GB of RAM. The program was implemented in C++ and we used CPLEX 12.4 as the ILP solver.
  6.1. Pattern Cover Minimality
As described in Section 5.1, we formulate the problem of finding a pattern set P of minimum cardinality that spans a given set S of strings as a 0-1 set covering problem whose rows are the strings of S and whose columns are the compatible patterns for S; in view of Theorem 2 the full constraint matrix has a polynomial number of columns and can be generated explicitly. As anticipated, generating the full matrix and then solving the problem outperformed the column generation approach, which requires running the recursive algorithm at each pricing step, while only one run is necessary for generating the full matrix.
We fix the length n of the strings. Each string is randomly generated by independently setting each bit to 1 with probability p (and to 0 with probability $1 - p$). A random instance consists of a set S of m randomly generated strings without duplicates. We consider several values of p and m, and for each combination of p and m we generate ten instances.
The strings generated with a value of p close to 0 (or, equivalently, close to 1) tend to be similar, whereas they are much less similar for p close to 0.5. Similar instances are expected to be covered by a few patterns with many gaps, whereas non-similar instances are expected to be covered by many patterns with few gaps.
By the recursive procedure described in Section 4 we compute, for each S, all compatible patterns $\Pi(S)$ and solve with CPLEX the corresponding set covering problem.
The computational results are reported in Table 1. For each combination of p and m we report, for each one of the ten instances, the resulting number of compatible patterns ($|\Pi(S)|$), the optimal value of the minimal cover problem (opt), the total CPU time in seconds consisting of the pattern generation procedure plus the CPLEX run (time), and the number of nodes (root excluded) of the branch-and-bound process (#nodes). A value of #nodes equal to zero means that the solution of the LP relaxation was already integer.
As can be seen from Table 1, all instances are solved at the branch-and-bound root node except for a single combination of p and m.
  6.2. Pattern Equivalence
One of the main difficulties in testing the models for PATTERN EQUIVALENCE is creating sensible instances which show that the ILP model is indeed effective.
In fact, a major objection that one might raise against the use of ILP is that, when the maximum number of gaps in the patterns is not “large enough”, a simple enumerative approach might prove quite effective, and much better than ILP, even if there are a lot of patterns and n is quite big. Assume, for example, that we compare two sets of about 1000 patterns each, where each pattern has at most ten “-” in it. Ten gaps can be expanded in 1024 ways, and so each set of patterns yields at most about 1,000,000 strings, which most computers can generate in a second. Then, we just need to check whether these two sets of strings have the same size (if not, we stop) and, if they do, we compare each element of the first to each one of the second and stop as soon as one element of the first is not in the second (perhaps by first sorting the two sets and then scanning the sorted lists). Some data structures might be more effective than others for these operations, but, bottom line, it is a very fast process that ILP has a hard time beating.
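For reference, the naïve enumerative check described in this argument is only a few lines long (our own illustrative code):

```python
from itertools import product

def expand(pattern):
    """All strings generated by a pattern over {0,1,-}."""
    gaps = [i for i, c in enumerate(pattern) if c == '-']
    for bits in product('01', repeat=len(gaps)):
        s = list(pattern)
        for i, b in zip(gaps, bits):
            s[i] = b
        yield ''.join(s)

def naive_equivalent(P, Q):
    """PATTERN EQUIVALENCE by full expansion: practical only when every pattern has
    few gaps, since a pattern with k gaps expands into 2^k strings."""
    return {s for p in P for s in expand(p)} == {s for q in Q for s in expand(q)}
```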
Therefore, we want to show that ILP is the way to go when the naïve approach cannot work, namely, when the patterns have so many gaps in them that a complete expansion (which exponentially increases the data size) is out of question. This poses the problem of how to create non-trivial, interesting instances of equivalent pattern sets which have a large number of gaps.
  6.3. Diagonal Instances
A simple way of creating instances with equivalent pattern sets is as follows. For every n, we consider two equivalent sets of patterns, which generate all strings with the exception of the string . We call these diagonal instances.
The first set has 
n patterns, with a maximum number of gaps 
, and is the following (exemplified for 
):
The second set has 
 patterns, with a maximum number of gaps 
, and is
        
We perform a sequence of tests to compare the ILP approach with the complete enumeration algorithm, for increasing values of n. These diagonal instances turn out to be very easy for the ILP model: they are all solved in less than 0.1 s, as can be seen from Table 2. The enumerative approach, however, very quickly becomes impractical: as n grows it soon takes more than 15 min, and then fails to finish within one hour (which we set as a maximum time limit). It is interesting to notice that the ILP approach solves the instance in less than a second even for much larger values of n, for which the enumerative approach would have to generate an exponential number of strings.
  6.4. Generating Equivalent Pattern Sets in General
We can adopt the following strategy to create two sets of equivalent patterns (a sketch of the main steps in code is given below):
- We generate a small starting set P of random patterns. Each pattern is obtained by setting each bit to “-” with some probability q, and to 0 or 1 with the remaining probability. Since we are interested in patterns with many gaps, we set q to a large value (e.g., 0.8). Let g be the minimum number of gaps appearing in some pattern of P.
- The patterns in P are expanded in all possible ways, yielding a set S of strings.
- We compute, with the recursive procedure described in Section 4 (slightly modified), the set of all patterns compatible with S which have at least g gaps each (this ensures that it is always possible to cover S with patterns of this set).
- We compute two random solutions of the set covering problem. Namely, from this set we select (picking patterns at random until we have a cover) two subsets that are covers of S.
- The two subsets are equivalent by construction and have a fairly large number of gaps in each pattern.
In our implementation, because of memory problems, the above procedure works only for small values of n. Thus, to build larger instances we use a trick: we create instances starting from instances built as above, and then combine them into larger and larger ones, as explained below.
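A sketch of the first and last steps of this procedure (random pattern generation and random cover extraction; parameter names and the equiprobable 0/1 split are illustrative assumptions) is the following:

```python
import random
from itertools import product

def random_pattern(n, q):
    """One random pattern: '-' with probability q, otherwise 0 or 1 equiprobably
    (our reading of the generation step; the exact 0/1 split is illustrative)."""
    return ''.join('-' if random.random() < q else random.choice('01') for _ in range(n))

def span_of(p):
    """The set of strings generated by pattern p."""
    gaps = [i for i, c in enumerate(p) if c == '-']
    out = set()
    for bits in product('01', repeat=len(gaps)):
        s = list(p)
        for i, b in zip(gaps, bits):
            s[i] = b
        out.add(''.join(s))
    return out

def random_cover(S, candidates):
    """Pick candidate patterns in random order, keeping those that cover new strings,
    until the whole set S is covered (assumes the candidates can cover S)."""
    S = set(S)
    pool = list(candidates)
    random.shuffle(pool)
    cover, covered = [], set()
    for p in pool:
        if covered >= S:
            break
        new = span_of(p) - covered
        if new:
            cover.append(p)
            covered |= new
    return cover
```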
  6.5. How to Boost Instances of Pattern Equivalence
One way to increase the number of gaps in the instances would be to take two sets of equivalent patterns A and B and append a list of k gaps to each pattern. This, however, yields very particular, uninteresting instances. In order to obtain more elaborate, hard pattern equivalence instances, we have developed the following scheme.
Given a set of patterns X, denote by
- c(X) the number of columns (i.e., the string length),
- r(X) the number of rows (i.e., of patterns),
- g(X) the maximum number of gaps in some pattern of X.
Furthermore, given two sets of patterns A and B, denote by $A \cdot B$ the set of patterns
$$ A \cdot B = \{\, ab : a \in A,\ b \in B \,\}, $$
obtained by concatenating each pattern of A with each pattern of B. Note that $c(A \cdot B) = c(A) + c(B)$, $r(A \cdot B) = r(A)\,r(B)$ and $g(A \cdot B) = g(A) + g(B)$. We have
Claim 1. Let $A_1, A_2, B_1, B_2$ be sets of patterns such that $A_1$ is equivalent to $A_2$ and $B_1$ is equivalent to $B_2$. Then $A_1 \cdot B_1$ is equivalent to $A_2 \cdot B_2$.
Proof. Let $s \in \mathrm{span}(A_1 \cdot B_1)$. We want to show that $s \in \mathrm{span}(A_2 \cdot B_2)$; the other direction is symmetrical. We can write $s = ab$ with $a \in \mathrm{span}(A_1)$ and $b \in \mathrm{span}(B_1)$. Since $A_1$ is equivalent to $A_2$, also $a \in \mathrm{span}(A_2)$, and similarly $b \in \mathrm{span}(B_2)$. Hence $s \in \mathrm{span}(A_2 \cdot B_2)$.    □
 By using this trick repeatedly, we can snowball from small instances, e.g., 4 or 5 patterns with 5 or 6 gaps each, to instances with a few hundred patterns with more than 20 gaps each.
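In code, the combination step of the claim is just the concatenation product of the two pattern lists (a one-line sketch):

```python
def combine(A, B):
    """Concatenation product: every pattern of A followed by every pattern of B.
    Its span is the Cartesian product of the two spans, so replacing A (or B) by an
    equivalent set leaves the span of the product unchanged (Claim 1)."""
    return [a + b for a in A for b in B]
```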
  6.6. Experiments
We have created 10 instances of size 
 each. Each instance is built by combining two equivalent instances of size 
 each, which were built with our procedure with parameters 
, 
 and 
. The results are reported in 
Table 3. These instances turned out to be too difficult to be solved by the enumerative approach in less than half an hour each.
For each set of input patterns we report:
- the number of input patterns;
- the minimum number of gaps per pattern;
- the maximum number of gaps per pattern;
- the average number of gaps per pattern.
The ten instances are reported in 
Table 3 sorted by running times (9-th column). The 10-th and 11-th columns report the number of branch-and-bound nodes required (root node excluded) for solving models (
3) and (
4), respectively.
The results show that the ILP approach is effective also for instances which are large enough that they cannot be tackled by enumerative approaches.
In order to test the same ILP models in case the pattern sets are not equivalent we have randomly perturbed the previous data. The running times remained practically the same.
  7. Conclusions
In order to use feature selection and LAD in the analysis of binary data consisting of positive and negative samples, one has to identify which computational problems might arise and how to overcome them. One of the issues addressed in this paper is in fact the computational complexity of these problems, which we have shown to be, in general, very hard. As a viable approach to the effective solution of some of them, we have described integer linear programming formulations. In particular, we have given ILP models for the problem of determining whether two sets of patterns are equivalent and for finding a minimum-size set of patterns which explains a given data set. A striking consequence of our complexity results is that there could be no simple ILP model for finding a minimal set of patterns explaining the same data as a given pattern set. Developing procedures for this last problem could be a line of future research.