Article

Computational Complexity and ILP Models for Pattern Problems in the Logical Analysis of Data

by Giuseppe Lancia *,† and Paolo Serafini †
Department of Mathematics, Computer Science and Physics, University of Udine, 33100 Udine, Italy
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Algorithms 2021, 14(8), 235; https://doi.org/10.3390/a14080235
Submission received: 7 July 2021 / Revised: 30 July 2021 / Accepted: 6 August 2021 / Published: 9 August 2021
(This article belongs to the Special Issue 2021 Selected Papers from Algorithms Editorial Board Members)

Abstract

Logical Analysis of Data is a procedure aimed at identifying relevant features in data sets with both positive and negative samples. The goal is to build Boolean formulas, represented by strings over {0,1,-} called patterns, which can be used to classify new samples as positive or negative. Since a data set can be explained in alternative ways, many computational problems arise related to the choice of a particular set of patterns. In this paper we study the computational complexity of several of these pattern problems (showing that they are, in general, computationally hard) and we propose some integer programming models that appear to be effective. We describe an ILP model for finding the minimum-size set of patterns explaining a given set of samples and another one for the problem of determining whether two sets of patterns are equivalent, i.e., they explain exactly the same samples. We base our first model on a polynomial procedure that computes all patterns compatible with a given set of samples. Computational experiments substantiate the effectiveness of our models on fairly large instances. Finally, we conjecture that the existence of an effective ILP model for finding a minimum-size set of patterns equivalent to a given set of patterns is unlikely, due to the problem being NP-hard and co-NP-hard at the same time.

1. Introduction

One of the main consequences of the constant progress of technology, together with the massive use of computers in many aspects of our lives, has been the creation of large repositories of data storing information of all sorts. A major problem related to these huge data sets is that of discovering relevant patterns that separate the noise from the important information, and of deriving rules for clustering the data into classes sharing essential common features. To this end, the fields of study known as data mining [1,2] and feature selection [3,4,5] have recently emerged among the most relevant applications of modern computer science.
In this paper we focus on some mathematical issues that arise from data mining problems. A very common situation for data mining problems is to represent the starting information by a two-dimensional array, in which the rows correspond to samples (or individuals) while the columns correspond to their characteristics (also called features).
If the features are Boolean, one of the tools that can be used to extract interesting information is the so-called Logical Analysis of Data (LAD [6,7,8]). Consider for instance a data set consisting of a binary matrix of m rows and n columns, in which some rows are labeled as positive while the remaining rows are labeled as negative (for instance, in the case of a molecular biology experiment using a device called “microarray” which measures the level of gene expression in cells, the values 0 and 1 would be related to the level being, respectively, “normal” or “abnormal” [9,10,11]).
The Logical Analysis of Data has the objective of discovering a set of simple Boolean formulas (or “rules”) that can be used to classify new binary vectors $(b_1, \dots, b_n)$. Each rule describes what the value of some bits must be for a vector to be classified as positive or negative. For instance, a “positive rule” could be
$$(b_2 = 1) \wedge (b_5 = 0) \wedge (b_9 = 0)$$
meaning that any vector with a 1 in the 2nd component and a 0 in the 5th and 9th components is classified as positive. Similarly, there can be some “negative rules” which specify which vectors should be classified as negative.
A rule such as the above can be conveniently represented by a pattern, which is a string over the alphabet $\{0, 1, -\}$. The characters 0 and 1 in the pattern specify which positions must be matched exactly by a binary vector to satisfy the rule, while the character - is a wildcard that can be matched by either 0 or 1. In particular, if $n = 10$ the pattern corresponding to the above rule would be
-1--0---0-
If r is a rule and p is the pattern corresponding to r, then a binary vector b satisfies the rule if and only if
$$\bigwedge_{k \,:\, p_k \in \{0,1\}} (b_k = p_k)$$
We say that the pattern p covers all vectors b for which the above holds. In view of the equivalence of rules and patterns, we can talk of positive/negative patterns in place of positive/negative rules.
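The satisfaction condition above is a position-by-position comparison that skips the wildcard positions. A minimal Python sketch, assuming patterns and vectors are encoded as Python strings over the characters '0', '1' and '-' (an encoding chosen here only for illustration); `covers` is a hypothetical helper name, not code from the paper:

```python
def covers(pattern: str, b: str) -> bool:
    """Return True if the binary vector b satisfies the rule encoded by pattern.

    Positions holding '0' or '1' must match b exactly; '-' matches either bit.
    """
    if len(pattern) != len(b):
        raise ValueError("pattern and vector must have the same length")
    return all(pk == "-" or pk == bk for pk, bk in zip(pattern, b))


# The positive rule (b2 = 1) and (b5 = 0) and (b9 = 0) with n = 10.
rule = "-1--0---0-"
print(covers(rule, "0100000000"))  # True
print(covers(rule, "0100100000"))  # False: b5 = 1
```

The check runs in time linear in n, which is the cost used later for deciding whether a string belongs to the span of a pattern.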
The objective of LAD is to infer positive and negative patterns from the data in such a way that (i) each positive row is covered by at least one of the positive patterns, while no negative row is and (ii) each negative row is covered by at least one of the negative patterns, while no positive row is. This approach has been successfully applied to many contexts in both bioinformatics and biomedicine [8].
Since there might be many alternative sets of patterns explaining a given instance of LAD, one has to introduce a suitable criterion for choosing a specific solution. In particular, an Occam's razor strategy would suggest seeking the simplest possible solutions, i.e., the sets with a minimum number of patterns. Finding a minimum-size set of patterns that covers a given set of vectors is called the Pattern Cover Minimality problem.
Other problems arising from the analysis of patterns are related to understanding whether two different sets of rules actually explain the same data set or, in other words, whether the two pattern sets are equivalent. In particular, we would also like to know whether a given set of rules explains all possible data, and is therefore in some sense “useless”. At the other extreme, we would like to know whether there are some data that cannot be explained by a particular set of rules.
In addition, given a set of patterns, we would like to know whether there exists a smaller set of patterns that explains the same data. We call this problem Pattern Equivalence Minimality; it looks similar to Pattern Cover Minimality, the difference being that here we start from a set of patterns rather than from a data set. Although the patterns could be expanded into strings and a Pattern Cover Minimality problem solved on these strings, the expansion is in general computationally intractable, since a pattern with k gaps generates $2^k$ strings. Hence we should be able to find a better set of patterns working directly on the given pattern set.
In the following we review the computational complexity of these pattern problems, which are, in general, computationally hard [12] (see also [13] for a fixed-parameter analysis of some related pattern problems). We then give integer linear programming (ILP) formulations for Pattern Cover Minimality and for Pattern Equivalence, and we assess the effectiveness of our formulations by means of extensive computational experiments. An ILP formulation for Pattern Cover Minimality is also given in Boccia et al. [14]. The formulation we propose in this paper reduces the problem to a Set Covering problem with a (low-degree) polynomial number of columns. For non-binary data, Pattern Cover approaches include, e.g., the branch-and-price procedure described in [15] and the heuristic procedures proposed in [16].
Formulating a solution procedure for Pattern Equivalence Minimality seems quite challenging since, as we will prove in the paper, this problem is NP-hard and co-NP-hard at the same time.
The paper is organized as follows. In Section 2 we provide precise mathematical definitions of the concepts we are dealing with and of the related problems. In Section 3 we investigate the computational complexity of the problems defined in the previous section. In Section 4 we investigate strings and patterns that are mutually compatible. In particular, we provide a polynomial algorithm to list all patterns that are compatible with a given set of strings (the inverse problem of listing all strings compatible with a set of patterns is necessarily exponential). In Section 5 we give ILP models for both the Pattern Cover Minimality and the Pattern Equivalence problems. These models are tested in Section 6, devoted to the computational experiments. Some conclusions are drawn in Section 7.

2. Basic Definitions

A binary string (or, simply, a string) s is a sequence of symbols where each symbol can be either 0 or 1. With n-binary string we denote a binary string of length n. An n-pattern (or, simply, a pattern) p is a sequence of symbols where each symbol can take the values 0, 1 or -, i.e.,
$$s_i \in \{0, 1\}, \qquad p_i \in \{0, 1, -\}, \qquad i = 1, \dots, n.$$
We call the symbol - a gap. With n-pattern we denote a pattern of length n. Notice that a string is in fact a particular pattern, i.e., a pattern without gaps. A pattern p covers, or generates, a string $s = (s_1 \dots s_n)$ if $s_k = p_k$ for each k such that $p_k \in \{0, 1\}$. The span of p is the set of all binary strings generated by the pattern p, i.e.,
$$S(p) = \{\, s \in \{0,1\}^n : s_i = p_i \text{ if } p_i \in \{0,1\} \,\}.$$
Given a set P of patterns, the span $S(P)$ of P is the set
$$S(P) = \bigcup_{p \in P} S(p).$$
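For small n the span can be enumerated directly by expanding each gap into both bit values, which also makes the size $|S(p)| = 2^k$ for a pattern with k gaps explicit. A brute-force Python sketch, assuming patterns are Python strings over '0', '1' and '-'; `span` and `span_of_set` are illustrative names, and the enumeration is exponential in the number of gaps, so it is meant only for toy instances:

```python
from itertools import product


def span(pattern: str) -> set:
    """S(p): all binary strings generated by the pattern (each gap becomes 0 and 1)."""
    choices = [("0", "1") if c == "-" else (c,) for c in pattern]
    return {"".join(bits) for bits in product(*choices)}


def span_of_set(patterns) -> set:
    """S(P): union of the spans of the patterns in P."""
    result = set()
    for p in patterns:
        result |= span(p)
    return result


print(sorted(span("1-0-")))              # ['1000', '1001', '1100', '1101']
print(len(span_of_set(["0--", "-0-"])))  # 6
```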
Two sets P and Q of patterns are equivalent if $S(P) = S(Q)$. A set of patterns P is a minimum set if $|P| \le |Q|$ for each set of patterns Q equivalent to P.
We say that a pattern p is compatible with a set of strings S if $S(p) \subseteq S$. Similarly, we say that a string s is compatible with a pattern p if $s \in S(p)$. We denote by $P(S)$ the set of all patterns compatible with S. Let p be a pattern compatible with a given set S of strings. If there is no compatible pattern $p'$ such that $S(p) \subset S(p')$, we say that p is maximal (for S). We denote by $P^*(S)$ the subset of maximal patterns in $P(S)$. Given a set S of strings and a set P of patterns, we say that P is compatible with S if each $p \in P$ is compatible with S, so that $P \subseteq P(S)$ and $S(P) \subseteq S$.
Moreover, a set of patterns P is called a cover of S if $S(P) = S$. Notice that all covers of S are equivalent to each other, and that S, viewed as a set of patterns, is equivalent to each of its covers.
A set P of patterns is said to be complete if $S(P) = \{0,1\}^n$, i.e., if P generates all possible n-binary strings. Clearly, the set consisting of the single pattern $(- - \cdots -)$ is trivially complete.
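The definitions of compatibility, maximality, cover and completeness translate into set comparisons between spans. The following Python sketch checks them by brute force, assuming patterns are Python strings over '0', '1' and '-'; all names are hypothetical, and the $2^n$ and $3^n$ enumerations restrict it to very small n (Section 4 gives a polynomial way to obtain $P(S)$):

```python
from itertools import product


def span(pattern):
    """S(p), by expanding every gap into both bit values."""
    choices = [("0", "1") if c == "-" else (c,) for c in pattern]
    return {"".join(bits) for bits in product(*choices)}


def is_compatible(pattern, strings):
    """p is compatible with S when S(p) is a subset of S."""
    return span(pattern) <= set(strings)


def is_cover(patterns, strings):
    """P is a cover of S when S(P) = S."""
    return set().union(*(span(p) for p in patterns)) == set(strings)


def is_complete(patterns, n):
    """P is complete when S(P) = {0,1}^n."""
    return is_cover(patterns, ("".join(b) for b in product("01", repeat=n)))


def maximal_patterns(strings, n):
    """P*(S): compatible patterns whose span is not strictly inside another one."""
    compatible = [p for p in ("".join(t) for t in product("01-", repeat=n))
                  if is_compatible(p, strings)]
    spans = {p: span(p) for p in compatible}
    return {p for p in compatible if not any(spans[p] < spans[q] for q in compatible)}


S = ["001", "011", "101", "111"]
print(maximal_patterns(S, 3))          # {'--1'}
print(is_complete(["0--", "1--"], 3))  # True
```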
We note that S is a set and so it does not contain duplicate strings. We assume this to be true also when we represent a set of m strings of length n as an m × n array of zeros and ones.

3. Computational Complexity Results

The previous definitions lead to the following decision problems [12]:
  • PATTERN COVER: given a set S of strings and a set P of patterns, is P a cover of S, i.e., $S = S(P)$?
  • PATTERN COVER MINIMALITY: given a set S of strings and a constant K, does there exist a cover P of S such that $|P| \le K$?
  • PATTERN EQUIVALENCE: given two sets P, Q of patterns, are they equivalent, i.e., $S(P) = S(Q)$?
  • PATTERN EQUIVALENCE MINIMALITY: given a set P of patterns and a constant $K < |P|$, does there exist an equivalent set of patterns Q such that $|Q| \le K$?
  • PATTERN COMPLETENESS: given a set P of patterns, is it complete, i.e., $S(P) = \{0,1\}^n$?
  • PATTERN INCOMPLETENESS: given a set P of patterns, is it not complete, i.e., does there exist a string $s \in \{0,1\}^n$ such that $s \notin S(P)$?
We first note that, given a set P of patterns and a string $s \in \{0,1\}^n$, determining whether $s \in S(P)$ or $s \notin S(P)$ is polynomial. Indeed, given a pattern p and a string s, we may check in time $O(n)$ whether s can be generated by p or not. Hence, given a set P of patterns, we repeat the check for each $p \in P$, for a total time of $O(n|P|)$. If the check is false for each $p \in P$ we have $s \notin S(P)$, otherwise we have $s \in S(P)$.
Proposition 1.
PATTERN COVER is polynomial.
Proof. 
For each $s \in S$ we check whether $s \in S(P)$ or not. Hence in time $O(n|S||P|)$ we may decide whether $S \subseteq S(P)$ or not. In order to decide whether $S(P) \subseteq S$ or not, for each pattern $p \in P$ we count the number $n(p)$ of strings in S compatible with p. Let k be the number of gaps in p. Then p is compatible with S, i.e., $S(p) \subseteq S$, if and only if $n(p) = 2^k$ (recall that S contains no duplicate strings). Computing $n(p)$ can be done in time $O(n|S|)$. Overall, checking whether $S(P) \subseteq S$ takes $O(n|S||P|)$ time.    □
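The two checks in the proof can be coded without ever expanding a span, which is what keeps PATTERN COVER polynomial. A Python sketch of that procedure, assuming strings and patterns are Python strings over '0', '1' and '-' and, as stated in Section 2, that S has no duplicates (the test $n(p) = 2^k$ relies on it); the function names are hypothetical:

```python
def generates(p: str, s: str) -> bool:
    """O(n) test of s in S(p)."""
    return all(pc == "-" or pc == sc for pc, sc in zip(p, s))


def is_pattern_cover(strings: list, patterns: list) -> bool:
    """Decide S = S(P) in O(n |S| |P|) time, following the proof of Proposition 1."""
    # S is a subset of S(P): every string must be generated by some pattern.
    if not all(any(generates(p, s) for p in patterns) for s in strings):
        return False
    # S(P) is a subset of S: a pattern with k gaps generates 2**k strings,
    # so it is compatible with S exactly when 2**k distinct strings of S match it.
    for p in patterns:
        n_p = sum(1 for s in strings if generates(p, s))
        if n_p != 2 ** p.count("-"):
            return False
    return True


S = ["001", "011", "101", "111"]
print(is_pattern_cover(S, ["-01", "-11"]))         # True
print(is_pattern_cover(S, ["0-1", "--1", "-0-"]))  # False: -0- also generates 000
```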
Proposition 2.
PATTERN COVER MINIMALITY is NP-complete.
Proof. 
We observe that PATTERN COVER MINIMALITY is basically the same as MINIMUM DISJUNCTIVE NORMAL FORM (see [17] p. 261), which we repeat here for the sake of completeness: given a set $U = \{u_1, u_2, \dots, u_n\}$ of variables, a set $A \subseteq \{T, F\}^n$ of “truth assignments”, and an integer $K > 0$, does there exist a disjunctive normal form expression E over U, having no more than K disjuncts, which is true for precisely the assignments in A and no others?
We show the equivalence of the two problems by the following map, which builds a PATTERN COVER MINIMALITY instance. Each element $a \in A$ becomes an input binary string (with 0 representing false and 1 representing true), while each disjunct d is mapped into a pattern p such that $p_i = 1$ if $u_i$ appears in d, $p_i = 0$ if $\neg u_i$ appears in d, and $p_i = -$ otherwise. A disjunct is then true exactly on the assignments generated by the corresponding pattern, so a DNF expression with K disjuncts is true precisely on A if and only if the corresponding K patterns form a cover of A.    □
Proposition 3.
PATTERN INCOMPLETENESS is NP-complete.
Proof. 
We reduce SAT to PATTERN INCOMPLETENESS. Given a SAT instance with n variables and m clauses, we derive a set P of m patterns $p^k$, $k = 1, \dots, m$ (one pattern for each clause), as follows: for each variable i and each clause k, if the literal $x_i$ is present in the clause we set $p^k_i = 0$, if the literal $\neg x_i$ is present in the clause we set $p^k_i = 1$, and if neither $x_i$ nor $\neg x_i$ is present in the clause we set $p^k_i = -$ (note that the pattern values $p^k_i \in \{0, 1\}$ are set opposite to the truth values that make the literals true).
Assume the SAT instance is satisfiable and let x be a satisfying truth assignment. Define a string s as $s_i = 1$ if $x_i = \mathrm{TRUE}$ and $s_i = 0$ if $x_i = \mathrm{FALSE}$. By assumption, at least one literal of each clause is true, and so, due to the construction of $p^k$, for each $p^k \in P$ at least one of the symbols $s_i$ corresponding to the $\{0, 1\}$ positions of $p^k$ differs from $p^k_i$. It follows that $s \notin S(p^k)$ for every k, and so $s \notin S(P)$. In a similar way, given a string $s \notin S(P)$, we can reverse the previous reasoning and obtain a satisfying truth assignment for the SAT instance.
To see that PATTERN INCOMPLETENESS is also in NP, it suffices to observe that verifying that a given string s does not belong to $S(P)$ takes polynomial time, as previously described.    □
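The construction in the proof is mechanical. The Python sketch below builds $p^k$ from each clause and then looks for a string outside $S(P)$ by exhaustive search, so that a satisfying assignment can be read off directly. Clauses are written as lists of nonzero integers (i for $x_i$, -i for $\neg x_i$), a DIMACS-like convention adopted here only for illustration, and all function names are hypothetical. The exhaustive search is exponential and only illustrates the equivalence; it is not an efficient way to decide PATTERN INCOMPLETENESS.

```python
from itertools import product


def clause_to_pattern(clause: list, n: int) -> str:
    """Pattern p^k for a clause: x_i -> '0', not x_i -> '1', absent variable -> '-'.

    Assumes no clause contains both x_i and not x_i.
    """
    p = ["-"] * n
    for lit in clause:
        p[abs(lit) - 1] = "0" if lit > 0 else "1"
    return "".join(p)


def generates(p: str, s: str) -> bool:
    return all(pc == "-" or pc == sc for pc, sc in zip(p, s))


def uncovered_string(patterns: list, n: int):
    """Return some s not in S(P), or None if P is complete (exhaustive search)."""
    for bits in product("01", repeat=n):
        s = "".join(bits)
        if not any(generates(p, s) for p in patterns):
            return s
    return None


# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
clauses = [[1, -2], [2, 3], [-1, -3]]
P = [clause_to_pattern(c, 3) for c in clauses]
print(P)                       # ['01-', '-00', '1-1']
print(uncovered_string(P, 3))  # 001, i.e. x1 = F, x2 = F, x3 = T satisfies all clauses
```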
Since PATTERN INCOMPLETENESS and PATTERN COMPLETENESS are complements of each other, we have:
Corollary 1.
PATTERN COMPLETENESS is co-NP-complete.
Proposition 4.
PATTERN EQUIVALENCE is co-NP-complete.
Proof. 
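We transform PATTERN COMPLETENESS into PATTERN EQUIVALENCE. Given a set P of patterns, instance of PATTERN COMPLETENESS, the corresponding instance of PATTERN EQUIVALENCE consists of the set P plus a set Q containing only the pattern $(- - \cdots -)$, which generates $\{0,1\}^n$. Clearly, P is complete if and only if P and Q are equivalent, so PATTERN EQUIVALENCE is co-NP-hard by Corollary 1. It is also in co-NP: for a no-instance, there exists a string s with $s \in S(P)$ and $s \notin S(Q)$, or vice versa, and this s is a short certificate that can be checked in polynomial time.   □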
Proposition 5.
PATTERN EQUIVALENCE MINIMALITY is co-NP-hard.
Proof. 
We describe a transformation from PATTERN COMPLETENESS. Given an instance P of PATTERN COMPLETENESS, we define a corresponding instance of PATTERN EQUIVALENCE MINIMALITY by choosing $K = 1$. Without loss of generality, we may assume that $|P| \ge 2$ and that, for each i, the values $p_i$ across all $p \in P$ are not all identical (otherwise, each position where all symbols are equal to - can be discarded, reducing the instance to an equivalent one, while a position where all symbols are equal to 0, or all equal to 1, shows at once that P is not complete). At this point, the only pattern that can be equivalent to P is $(- - \cdots -)$, so P is complete if and only if there exists an equivalent set of patterns Q with $|Q| \le 1$.    □
Since PATTERN COVER MINIMALITY is a particular case of PATTERN EQUIVALENCE MINIMALITY (a set of strings is a set of patterns without gaps), by Proposition 2 we have:
Proposition 6.
PATTERN EQUIVALENCE MINIMALITY is NP-hard.
By Propositions 5 and 6, PATTERN EQUIVALENCE MINIMALITY is both NP-hard and co-NP-hard. To date it is not known whether the classes of NP-complete problems and co-NP-complete problems coincide or are disjoint; the widely believed conjecture is that they are disjoint, i.e., that NP ≠ co-NP. A co-NP-hard problem belonging to NP would imply NP = co-NP, and so would an NP-hard problem belonging to co-NP. Following this conjecture, we conclude that it is unlikely that PATTERN EQUIVALENCE MINIMALITY is in NP or in co-NP, and so we expect its complexity to be beyond the classes NP and co-NP.
It is obvious that, given a set P of patterns, generating $S(P)$ takes exponential time in general, simply because $|S(P)|$ can be of exponential size. It is perhaps surprising that the reverse, i.e., generating $P(S)$ from a given set of strings S, is polynomial. Indeed, it turns out that $P(S)$ has polynomial size and that it can be generated by a polynomial algorithm. We devote the next section to this issue.

4. Compatible Patterns

We describe a procedure to compute $P(S)$, that is, the set of all patterns compatible with a set S of strings. The analysis of this procedure shows that the number of compatible patterns is polynomial, namely $O(|S|^{\log_2 3})$.
We define a recursion that produces a set $P'(S)$ of patterns and we will show that $P'(S) = P(S)$. We assume there are no duplicates in S; the recursive calls create string sets that satisfy this property as well. The length of each string in a generic set R of strings (clearly all of equal length) is denoted by $n(R)$. For a generic set R and $1 \le c \le n(R) + 1$, $S(R, c)$ is the set of strings of length $n(R) - c + 1$ obtained from R by taking, for each $s \in R$, only the elements $s_c, s_{c+1}, \dots, s_{n(R)}$ (for $c = n(R) + 1$ these are empty strings). Furthermore, for s a string and $X = \{x_1, \dots, x_k\}$ a set of strings, we denote by
$$s\,X = \{\, s\,x_1, \dots, s\,x_k \,\}$$
the set obtained by prepending s as a prefix to all strings in X.
The recursive algorithm to compute P ( S ) consists of:
  • base case: if $|S| = 1$, then $P'(S) := S$ (the single string of S, seen as a pattern);
  • recursion: given an input set R of strings, let $c \le n(R)$ be the first index such that there are two strings in R whose c-th elements are different (if there were no such index, all strings in R would be identical, contradicting the hypothesis of no duplicates). Hence the prefixes $s_1, \dots, s_{c-1}$ are equal for all $s \in R$; let $\bar{s}$ be this common prefix (possibly empty). Let $P_0$, $P_1$, $P^*$ be defined as follows:
    • Let $R_0 = \{ s \in R : s_c = 0 \}$ and $S_0 := S(R_0, c+1)$. Then $P_0 := P'(S_0)$ (recursive call).
    • Let $R_1 = \{ s \in R : s_c = 1 \}$ and $S_1 := S(R_1, c+1)$. Then $P_1 := P'(S_1)$ (recursive call).
    • Let $R^*$ be the set of all strings $s \in R_0$ for which there exists $s' \in R_1$ such that $s_i = s'_i$ for all $i > c$ (note that s and s' differ only at the c-th element, and that such strings s and s', if any, come in pairs due to the hypothesis of no duplicates), and let $S^* := S(R^*, c+1)$.
    If $R^* = \emptyset$ then $P^* := \emptyset$, else $P^* := P'(S^*)$ (recursive call).
    Then, return $\bar{s}\,\big( (0\,P_0) \cup (1\,P_1) \cup (-\,P^*) \big)$.
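The recursion can be transcribed almost line by line. In the Python sketch below, strings are Python str values, positions are 0-based, and $S^*$ is computed as $S_0 \cap S_1$, which is the same set as $S(R^*, c+1)$ because the strings of R share the prefix $\bar{s}$. The function name is hypothetical, the input is assumed to be a nonempty set of equal-length strings, and the code is an illustration rather than the authors' C++ implementation.

```python
def compatible_patterns(strings) -> set:
    """Recursive construction of all patterns compatible with a set of strings."""
    S = sorted(set(strings))                 # duplicates are discarded
    if len(S) == 1:                          # base case: the string itself
        return {S[0]}
    n = len(S[0])
    # c: first position where two strings differ (exists since S has no duplicates).
    c = next(i for i in range(n) if len({s[i] for s in S}) > 1)
    prefix = S[0][:c]                        # common prefix s-bar
    S0 = {s[c + 1:] for s in S if s[c] == "0"}
    S1 = {s[c + 1:] for s in S if s[c] == "1"}
    S_star = S0 & S1                         # suffixes following both a 0 and a 1
    result = {prefix + "0" + q for q in compatible_patterns(S0)}
    result |= {prefix + "1" + q for q in compatible_patterns(S1)}
    if S_star:
        result |= {prefix + "-" + q for q in compatible_patterns(S_star)}
    return result


print(sorted(compatible_patterns(["001", "011", "101", "111"])))
# ['--1', '-01', '-11', '0-1', '001', '011', '1-1', '101', '111']
```

Sets are used for $S_0$, $S_1$ and $S^*$ so that the no-duplicates hypothesis holds automatically in every recursive call.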
Theorem 1.
Let S be a string set and let $P'(S)$ be the set produced by the recursive algorithm on S. Then $P'(S) = P(S)$.
Proof. 
We use induction on $|S|$. Base case: if $|S| = 1$ the assertion is clearly true.
Inductive step: assume $|S| > 1$ and that the theorem holds for all sets with fewer than $|S|$ strings. With the notation of the algorithm,
$$P'(S) = \bar{s}\,\big( (0\,P_0) \cup (1\,P_1) \cup (-\,P^*) \big).$$
Let $p \in P(S)$. Since $\bar{s}$ (possibly empty) is a prefix of each string in S, p must start with $\bar{s}$ as well. Assume $p = \bar{s}\,x\,q$ with $x \in \{0, 1, -\}$. If $x = 0$, then q is a pattern compatible with $S_0$ and, by induction, $q \in P'(S_0)$. Therefore $p \in \bar{s}\,(0\,P'(S_0))$, so that $p \in P'(S)$. A similar argument shows that $p \in P'(S)$ also if $x = 1$. Now, if $x = -$, then for each string a generated by q, the pattern p generates both $\bar{s}\,0\,a$ and $\bar{s}\,1\,a$. Therefore a must be a string both in $S_0$ and in $S_1$, which means that q is a pattern compatible with $S^*$. Since, by induction, all such patterns are in $P^*$, we have $p \in \bar{s}\,(-\,P^*)$, so that $p \in P'(S)$.
Now, assume $p \in P'(S)$ and let $p = \bar{s}\,x\,q$, with $x \in \{0, 1, -\}$. If $x \in \{0, 1\}$, let $\bar{s}\,x\,y$ be a string generated by p. Then y is generated by $q \in P_x = P'(S_x)$, which by induction is compatible with $S_x$; hence $y \in S_x$, so that $\bar{s}\,x\,y \in S$ and $p \in P(S)$. If $x = -$, let $\bar{s}\,0\,y$ be a string generated by p; then $\bar{s}\,1\,y$ is also generated by p. Since $q \in P^* = P'(S^*)$, by induction $y \in S^* = S_0 \cap S_1$, so that $\bar{s}\,0\,y, \bar{s}\,1\,y \in S$ and $p \in P(S)$.    □
Theorem 2.
$|P(S)| \le |S|^{\log_2 3}$.
Proof. 
Let $T(m)$ be an upper bound to the cardinality of $P(S)$ for a set S with $m = |S|$. From the previous theorem and from the recursive calls of the algorithm (where $|S_0| + |S_1| = m$ and $|S^*| \le \min\{|S_0|, |S_1|\}$) we see that
$$T(m) = \max\{\, T(m-k) + 2\,T(k) : k = 1, \dots, \lfloor m/2 \rfloor \,\}, \qquad T(1) = 1. \tag{1}$$
By applying the Master Theorem (see [18], p. 66) we get
$$T(m) \le m^{\log_2 3}. \tag{2}$$
   □
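Recurrence (1) is easy to evaluate directly, which gives a quick numerical check of its first values and of bound (2). A short memoized Python sketch; the function name `T` simply mirrors the notation, and the exponent $\log_2 3$ is computed in floating point, hence the small tolerance in the final check:

```python
from functools import lru_cache
from math import log2


@lru_cache(maxsize=None)
def T(m: int) -> int:
    """Upper bound on |P(S)| for |S| = m, from recurrence (1)."""
    if m == 1:
        return 1
    return max(T(m - k) + 2 * T(k) for k in range(1, m // 2 + 1))


print([T(m) for m in range(1, 17)])
# [1, 3, 5, 9, 11, 15, 19, 27, 29, 33, 37, 45, 49, 57, 65, 81]

# Bound (2), T(m) <= m ** log2(3), with equality when m is a power of 2.
assert all(T(m) <= m ** log2(3) * (1 + 1e-12) for m in range(1, 257))
```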
These are the first values of $T(m)$ for $m = 1, \dots, 16$, according to (1):
$T(m) = 1, 3, 5, 9, 11, 15, 19, 27, 29, 33, 37, 45, 49, 57, 65, 81$
This is the sequence A006046 of the On-line Encyclopedia of Integer Sequences (OEIS), and our derivation seems to add a new meaning to that sequence. In particular, for $m = 2^k$ we have $T(m) = 3^k = m^{\log_2 3}$, so that (2) holds with equality. Indeed, if S consists of all strings of length k, so that $|S| = 2^k$, then $|P(S)| = 3^k$, and the bound $T(m)$ for $|P(S)|$ is tight in this particular case. In fact, we can prove that the bound is tight for every m and not just for $m = 2^k$. Consider the following Procedure 1, which generates a set S of m strings with $|P(S)| = T(m)$.
Procedure 1 f[m]
  begin
       if m = 1
       then return (1);
       else begin
           h1 := ⌈m/2⌉; h0 := ⌊m/2⌋;
           S0 := f[h0]; S1 := S0;
           if h1 ≠ h0 then S1 := (0, 0, …, 0) ∪ S1;
           S0 := 0 S0;
           S1 := 1 S1;
           return (S0 ∪ S1);
       end
  end
These are the sets produced by the procedure for $m = 1, \dots, 6$, together with the corresponding sets of compatible patterns:
m = 1: S = (1), P(S) = S = (1)
m = 2: S = (01, 11), P(S) = S ∪ (-1)
m = 3: S = (01, 10, 11), P(S) = S ∪ (1-, -1)
m = 4: S = (001, 011, 101, 111), P(S) = S ∪ (0-1, -01, 1-1, -11, --1)
m = 5: S = (001, 011, 100, 101, 111), P(S) = S ∪ (0-1, 10-, 1-1, -01, -11, --1)
m = 6: S = (001, 010, 011, 101, 110, 111), P(S) = S ∪ (01-, 0-1, -11, 1-1, -01, -10, 11-, --1, -1-)
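Procedure 1 translates directly into Python. The sketch below uses hypothetical names and makes explicit an assumption the procedure leaves implicit, namely that the all-zero string has the same length as the strings of $S_0$. It also counts compatible patterns by brute force over all $3^n$ patterns, which is affordable here because f[m] produces strings of length only $\lfloor \log_2 m \rfloor + 1$, so the sets listed above can be regenerated and $|P(S)|$ compared with $T(m)$.

```python
from itertools import product


def f(m: int) -> list:
    """Procedure 1: a set of m strings with T(m) compatible patterns."""
    if m == 1:
        return ["1"]
    h1, h0 = (m + 1) // 2, m // 2            # ceil(m/2), floor(m/2)
    S0 = f(h0)
    S1 = list(S0)
    if h1 != h0:
        S1.append("0" * len(S0[0]))          # all-zero string, same length as S0's
    return ["0" + s for s in S0] + ["1" + s for s in S1]


def span(p: str) -> set:
    return {"".join(b) for b in product(*[("0", "1") if c == "-" else (c,) for c in p])}


def count_compatible(strings: list) -> int:
    """|P(S)| by brute force: patterns whose span lies inside S."""
    S = set(strings)
    n = len(strings[0])
    return sum(span("".join(p)) <= S for p in product("01-", repeat=n))


print(sorted(f(5)))                                    # ['001', '011', '100', '101', '111']
print([count_compatible(f(m)) for m in range(1, 17)])
# [1, 3, 5, 9, 11, 15, 19, 27, 29, 33, 37, 45, 49, 57, 65, 81]
```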
It is easy to show that $|P(S)| = T(m)$ for the sets produced by the above procedure, and so the bound $T(m)$ is tight for all m. However, in most cases, as for covering problems, we may be interested only in the maximal patterns $P^*(S)$, and we may wonder how the number of maximal patterns grows with m.
In the particular case of the strings produced by the above procedure, which correspond to the worst case in terms of generic patterns, the number of maximal patterns grows very slowly, namely at most logarithmically in m.
In the following table we report, for the above sets of strings, the value m, $|P(S)| = T(m)$ and the number $|P^*(S)|$ of maximal patterns:
m:        1  2  3  4  5   6   7   8   9   10  11  12  13  14  15  16
|P(S)|:   1  3  5  9  11  15  19  27  29  33  37  45  49  57  65  81
|P*(S)|:  1  1  2  1  2   2   3   1   2   2   3   2   3   3   4   1
It turns out that $|P^*(S)|$ is equal to the number of ones in the binary expansion of m. However, the number $|P^*(S)|$ can be much larger for other sets of strings, and there are cases with $|P^*(S)| > m$. For instance, consider the following case, for which $|S| = 12$, $|P(S)| = 25 < T(12) = 45$ and $|P^*(S)| = 13$:
S = (00000, 00001, 00101, 01000, 01010, 10000, 10010, 10101, 10110, 10111, 11010, 11101)
P(S) = (00000, 00001, 0000-, 00101, 00-01, 01000, 01010, 010-0, 0-000, 10000, 10010, 100-0, 10101, 10110, 10111, 1011-, 101-1, 10-10, 11010, 11101, 1-010, 1-101, -0000, -0101, -1010)
P*(S) = (0000-, 00-01, 010-0, 0-000, 100-0, 1011-, 101-1, 10-10, 1-010, 1-101, -0000, -0101, -1010)
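The counts in this example are small enough to confirm by brute force. The Python sketch below (hypothetical names, exhaustive over the $3^5$ patterns of length 5, so only suitable for toy sizes) recomputes $P(S)$ and keeps the patterns whose span is not strictly contained in the span of another compatible pattern:

```python
from itertools import product

S = {"00000", "00001", "00101", "01000", "01010", "10000",
     "10010", "10101", "10110", "10111", "11010", "11101"}


def span(p: str) -> set:
    return {"".join(b) for b in product(*[("0", "1") if c == "-" else (c,) for c in p])}


patterns = ["".join(t) for t in product("01-", repeat=5)]
P = {p: span(p) for p in patterns if span(p) <= S}          # P(S) with spans
P_star = {p for p in P if not any(P[p] < P[q] for q in P)}  # maximal patterns

print(len(P), len(P_star))                     # 25 13
print(all(p.count("-") == 1 for p in P_star))  # True: every maximal pattern has one gap
```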
Although $|P^*(S)| > m$ may happen in general, we can show that a cover with at most $m - 1$ patterns always exists if each string is covered by some compatible pattern with at least one gap. Let us call a one-pattern a pattern with exactly one gap. Consider the set of all compatible one-patterns for a given set of m strings of n symbols. By assumption, each string is covered by a one-pattern (if a string is covered by a compatible pattern with gaps, it is also covered by a compatible one-pattern). Build a graph $G = (V, E)$ with V the set of strings and E the set of string pairs spanned by a one-pattern. This graph is bipartite, because we may partition the strings into “even” and “odd” strings according to the parity of the number of ones in the string, and a one-pattern necessarily spans an even string and an odd one. Moreover, G has no isolated vertices by assumption. A cover consisting of one-patterns corresponds to an edge cover of G, and the cardinality of a minimum edge cover is $m - |M|$, where M is a maximum matching. Since $|M| \ge 1$, we need at most $m - 1$ one-patterns to cover all strings. For the above example, these are two alternative minimum covers, each with $6 < 12$ patterns:
(00-01, 010-0, 1011-, 1-010, 1-101, -0000), (00-01, 0-000, 100-0, 1011-, 1-101, -1010)
Therefore, if no cover with fewer than m patterns exists for a set of m strings, this means that there are some special strings that, loosely speaking, cannot be explained by any rule and require a particular pattern that coincides with the string. Furthermore, the $m - 1$ bound is obtained by considering only one-patterns. If, as we presume, the data sets are explained by more interesting patterns with many gaps, the size of a minimum cover can be significantly smaller than m.
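The edge-cover argument is constructive. The Python sketch below builds G from the pairs of strings at Hamming distance one, computes a maximum matching with a simple augmenting-path routine (Kuhn's algorithm) from the even side to the odd side of the bipartition, and then adds one incident edge for every unmatched string, which yields a minimum edge cover of size $m - |M|$. The names are hypothetical, it assumes as in the text that every string has a compatible one-pattern, and which of several optimal covers is returned may vary between runs because Python set iteration order is not fixed.

```python
def one_pattern(s: str, t: str) -> str:
    """The one-pattern spanning two strings that differ in exactly one position."""
    return "".join(a if a == b else "-" for a, b in zip(s, t))


def min_one_pattern_cover(strings) -> list:
    S = set(strings)
    flip = {"0": "1", "1": "0"}
    # Edges of G: neighbours of s at Hamming distance 1 that also belong to S.
    adj = {s: [t for t in (s[:i] + flip[s[i]] + s[i + 1:] for i in range(len(s))) if t in S]
           for s in S}
    if any(not nbrs for nbrs in adj.values()):
        raise ValueError("some string has no compatible one-pattern")
    match = {}  # odd string -> even string matched to it

    def augment(u, seen):
        """Search an augmenting path starting from the even string u."""
        for v in adj[u]:                    # neighbours of an even string are odd
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    for u in S:
        if u.count("1") % 2 == 0:
            augment(u, set())
    cover = {one_pattern(u, v) for v, u in match.items()}
    matched = set(match) | set(match.values())
    for s in S - matched:                   # unmatched string: any incident edge
        cover.add(one_pattern(s, adj[s][0]))
    return sorted(cover)


S12 = ["00000", "00001", "00101", "01000", "01010", "10000",
       "10010", "10101", "10110", "10111", "11010", "11101"]
cover = min_one_pattern_cover(S12)
print(len(cover))  # 6
```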

5. ILP Models

In this section we provide ILP models for two of the problems defined in Section 3, namely PATTERN COVER MINIMALITY and PATTERN EQUIVALENCE. The first problem is NP-complete and the second one is co-NP-complete, so it is appropriate to use ILP models for their solution. On the contrary, we believe that PATTERN EQUIVALENCE MINIMALITY cannot be expressed as an ILP model. We have already stressed that this problem is both NP-hard and co-NP-hard, and we have also observed that, given the current state of the art in computational complexity, we believe that it belongs to neither NP nor co-NP. Since deciding the feasibility of an ILP is in NP, we have strong reasons to doubt that PATTERN EQUIVALENCE MINIMALITY can be solved by an ILP model of polynomial size.

5.1. ILP for Pattern Cover Minimality

We approach the problem of finding a pattern set P of minimum cardinality that spans a given set of strings S as a 0-1 LP set cover problem, in which each row corresponds to a string of S, each column corresponds to a pattern compatible with S, and the entry $a_{ij}$ of the 0-1 LP matrix is 1 if and only if pattern j covers string i, i.e.,
$$\min \sum_{j} x_j \quad \text{s.t.} \quad \sum_{j} a_{ij}\, x_j \ge 1 \;\; \text{for each string } i, \qquad x_j \in \{0, 1\}.$$
In view of Theorem 2, the matrix has a polynomial number of columns and therefore it can be written explicitly. We note that it is not strictly necessary to generate the full matrix: we may use a column generation approach, adapting the algorithm that generates all patterns to the pricing problem given the dual variables associated with the strings. However, in our computational experiments, generating the full matrix and then solving the problem outperformed the column generation approach, which requires running the recursive algorithm at each column generation iteration, while only one run is necessary to generate the full matrix.
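The set cover formulation can be illustrated on toy instances. In the Python sketch below the 0-1 matrix is built from all compatible patterns (found here by brute force over the $3^n$ patterns rather than by the recursion of Section 4), and the ILP solver is replaced by an exhaustive search over column subsets of increasing size. This stand-in is an assumption made only to keep the example self-contained; it is exponential, so a real instance would be handed to an ILP solver such as CPLEX instead. All names are hypothetical.

```python
from itertools import combinations, product


def span(p: str) -> set:
    return {"".join(b) for b in product(*[("0", "1") if c == "-" else (c,) for c in p])}


def set_cover_matrix(strings):
    """Rows: strings of S. Columns: compatible patterns. a[i][j] = 1 iff column j covers row i."""
    rows = sorted(set(strings))
    S = set(rows)
    spans = {}
    for t in product("01-", repeat=len(rows[0])):
        p = "".join(t)
        sp = span(p)
        if sp <= S:
            spans[p] = sp
    cols = list(spans)
    a = [[int(r in spans[p]) for p in cols] for r in rows]
    return rows, cols, a


def min_pattern_cover(strings) -> list:
    """Fewest columns covering every row (exhaustive search, tiny inputs only)."""
    rows, cols, a = set_cover_matrix(strings)
    masks = [sum(a[i][j] << i for i in range(len(rows))) for j in range(len(cols))]
    full = (1 << len(rows)) - 1
    for k in range(1, len(cols) + 1):
        for combo in combinations(range(len(cols)), k):
            covered = 0
            for j in combo:
                covered |= masks[j]
            if covered == full:
                return [cols[j] for j in combo]
    return []  # not reached: the strings themselves are columns and cover every row


print(min_pattern_cover(["001", "011", "100", "101", "111"]))  # ['100', '--1']
```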

5.2. ILP for Pattern Equivalence

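We assume that two sets P and Q of patterns are given. We introduce the following models:
$$
\begin{aligned}
v = \min \; & \sum_{q \in Q} z_q \\
\text{s.t.} \; & \sum_{i : q_i = 0} x_i + \sum_{i : q_i = 1} (1 - x_i) \ge 1 - z_q, && q \in Q \\
& y_p \le 1 - x_i, && i : p_i = 0,\; p \in P \\
& y_p \le x_i, && i : p_i = 1,\; p \in P \\
& \sum_{p \in P} y_p \ge 1 \\
& x_i \in \{0,1\},\; y_p \in \{0,1\},\; z_q \ge 0 \text{ integer}
\end{aligned}
\tag{3}
$$
$$
\begin{aligned}
w = \min \; & \sum_{p \in P} y_p \\
\text{s.t.} \; & \sum_{i : p_i = 0} x_i + \sum_{i : p_i = 1} (1 - x_i) \ge 1 - y_p, && p \in P \\
& z_q \le 1 - x_i, && i : q_i = 0,\; q \in Q \\
& z_q \le x_i, && i : q_i = 1,\; q \in Q \\
& \sum_{q \in Q} z_q \ge 1 \\
& x_i \in \{0,1\},\; z_q \in \{0,1\},\; y_p \ge 0 \text{ integer}
\end{aligned}
\tag{4}
$$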
We have the following result:
Proposition 7.
  • $S(P) \subset S(Q)$ if and only if $v > 0$ and $w = 0$;
  • $S(Q) \subset S(P)$ if and only if $w > 0$ and $v = 0$;
  • $S(Q) = S(P)$ if and only if $v > 0$ and $w > 0$.
Proof. 
We reiterate that ⊂ means strict inclusion. It is sufficient to prove that $S(P) \subseteq S(Q)$ if and only if $v > 0$ (the statements on w follow by exchanging the roles of P and Q). If $y_p = 1$, the constraints $y_p \le 1 - x_i$ and $y_p \le x_i$ force x to be generated by $p \in P$. The constraint $\sum_{p \in P} y_p \ge 1$ implies that x is generated by at least one pattern in P. Hence the feasible x are exactly the strings of $S(P)$. Consider now any $x \in \{0,1\}^n$. If x is generated by $q \in Q$, the left-hand side of the first constraint of (3) is 0 and so $z_q \ge 1$, while if x is not generated by q the left-hand side is at least 1 and $z_q = 0$ is feasible (along with possible integer values $z_q \ge 1$). The objective function forces $z_q$ to be zero in this case, and one when x is generated by q.
Therefore, $v = 0$ if and only if there exists x with $x \in S(P)$ and $x \notin S(Q)$. If $v > 0$, then for any string $x \in S(P)$ we have $x \in S(Q)$, i.e., $S(P) \subseteq S(Q)$. □
Note that, if $S(P) \not\subseteq S(Q)$, i.e., when $v = 0$, model (3) also yields a string x in $S(P)$ but not in $S(Q)$, whereas if $S(P) \subseteq S(Q)$, i.e., when $v > 0$, model (3) also yields a string x in both $S(P)$ and $S(Q)$. Similarly, if $S(Q) \not\subseteq S(P)$, i.e., when $w = 0$, model (4) yields a string x in $S(Q)$ but not in $S(P)$, whereas if $S(Q) \subseteq S(P)$, i.e., when $w > 0$, model (4) yields a string x in both $S(Q)$ and $S(P)$.
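On small instances the optimal values of (3) and (4) can be reproduced by enumerating all $x \in \{0,1\}^n$, which is a convenient way to see what the models compute. In the Python sketch below (hypothetical names; exponential in n, so it only illustrates the models and does not replace an ILP solver), `min_overlap(P, Q, n)` returns the minimum, over the strings x of $S(P)$, of the number of patterns of Q generating x, which is the optimum v of (3); model (4) is (3) with the roles of P and Q exchanged. It assumes P and Q are nonempty, so that both models are feasible.

```python
from itertools import product


def generates(p: str, x: str) -> bool:
    return all(c == "-" or c == b for c, b in zip(p, x))


def min_overlap(P, Q, n):
    """Optimum of model (3) by enumeration: min over x in S(P) of #{q in Q : x in S(q)}."""
    return min(sum(generates(q, x) for q in Q)
               for x in ("".join(b) for b in product("01", repeat=n))
               if any(generates(p, x) for p in P))


def relation(P, Q, n):
    """Classify S(P) versus S(Q) as in Proposition 7."""
    v = min_overlap(P, Q, n)   # model (3)
    w = min_overlap(Q, P, n)   # model (4)
    if v > 0 and w > 0:
        return "S(P) = S(Q)"
    if v > 0:
        return "S(P) strictly contained in S(Q)"
    if w > 0:
        return "S(Q) strictly contained in S(P)"
    return "neither span contains the other"


P = ["--11", "100-", "00--", "-1-1"]
Q = ["00-0", "0-11", "-0-1"]
print(relation(P, Q, 4))   # S(Q) strictly contained in S(P)
```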
We may further distinguish the case $w = 0$, $v = 0$ via the following model:
$$
\begin{aligned}
\hat{w} = \min \; & w_p + w_q \\
\text{s.t.} \; & y_p \le 1 - x_i, && i : p_i = 0,\; p \in P \\
& y_p \le x_i, && i : p_i = 1,\; p \in P \\
& \sum_{p \in P} y_p \ge 1 - w_p \\
& z_q \le 1 - x_i, && i : q_i = 0,\; q \in Q \\
& z_q \le x_i, && i : q_i = 1,\; q \in Q \\
& \sum_{q \in Q} z_q \ge 1 - w_q \\
& x_i, y_p, z_q \in \{0,1\},\; w_p, w_q \in \{0,1\}
\end{aligned}
\tag{5}
$$
The following proposition follows easily from Proposition 7.
Proposition 8.
$S(P)$ and $S(Q)$ are disjoint if and only if $\hat{w} > 0$.
When $S(P)$ and $S(Q)$ are not disjoint, the model yields a string x shared by both sets.
If we consider model (4) and take $Q = \{(- - \cdots -)\}$, we are actually solving the problem PATTERN COMPLETENESS together with its complement PATTERN INCOMPLETENESS. Hence (4) becomes
$$
\begin{aligned}
u = \min \; & \sum_{p \in P} y_p \\
\text{s.t.} \; & \sum_{i : p_i = 0} x_i + \sum_{i : p_i = 1} (1 - x_i) \ge 1 - y_p, && p \in P \\
& x_i \in \{0,1\},\; y_p \ge 0 \text{ integer}
\end{aligned}
\tag{6}
$$
and we may conclude, as a corollary of Proposition 7, that
Proposition 9.
$S(P) = \{0,1\}^n$ if and only if $u > 0$.
As a simple example of the previous results, suppose we are given the following two sets of patterns P and Q:
P = (--11, 100-, 00--, -1-1),  Q = (00-0, 0-11, -0-1)
We run (3) and (4) in sequence and obtain $v = 0$ and $w > 0$, which implies $S(Q) \subset S(P)$ according to Proposition 7. In this case there is no need to run (5). As a byproduct, we obtain from (3) and (4), respectively, the strings
$$x_1 = (1111), \qquad x_2 = (0010)$$
One can easily check that $x_1 \in S(P)$ and $x_1 \notin S(Q)$, and also that $x_2 \in S(Q) \cap S(P)$. Moreover, if we run (6) for P we obtain $u = 0$, which implies $S(P) \subset S(\{(- - \cdots -)\}) = \{0,1\}^n$, together with the string (1110) as a certificate that $S(P)$ does not contain all strings.
If we are given the two sets of patterns
P = (-111, 100-, 000-, -1-1),  Q = (00-0, 0-11, -0-1)
and run (3) and (4) in sequence, we obtain $v = 0$ and $w = 0$. Hence we also have to run (5), and we obtain $\hat{w} = 0$. This means that $S(P)$ and $S(Q)$ are not disjoint, and the model also exhibits a string $x_3 \in S(P) \cap S(Q)$. Moreover, from the previous runs of (3) and (4) we also have the two strings $x_1 \in S(P)$, $x_1 \notin S(Q)$ and $x_2 \in S(Q)$, $x_2 \notin S(P)$. As a final output, we may run (6) for $P \cup Q$ and obtain $u = 0$ and $x_4 \notin S(P) \cup S(Q)$:
$$x_1 = (1111), \qquad x_2 = (0010), \qquad x_3 = (0001), \qquad x_4 = (1110)$$
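The certificates in both examples can be double-checked by enumeration. The Python sketch below (hypothetical names, exponential in n, meant only for such toy checks) evaluates the optima of (5) and (6) directly: in (5), $w_p$ (respectively $w_q$) must be 1 exactly when x lies outside $S(P)$ (respectively $S(Q)$), and (6) counts the patterns of P generating x.

```python
from itertools import product


def generates(p: str, x: str) -> bool:
    return all(c == "-" or c == b for c, b in zip(p, x))


def in_span(P, x) -> bool:
    return any(generates(p, x) for p in P)


def all_strings(n):
    return ("".join(b) for b in product("01", repeat=n))


def w_hat(P, Q, n) -> int:
    """Optimum of model (5): 0 exactly when some x lies in both S(P) and S(Q)."""
    return min(int(not in_span(P, x)) + int(not in_span(Q, x)) for x in all_strings(n))


def u_value(P, n) -> int:
    """Optimum of model (6): 0 exactly when some x lies outside S(P)."""
    return min(sum(generates(p, x) for p in P) for x in all_strings(n))


P = ["-111", "100-", "000-", "-1-1"]
Q = ["00-0", "0-11", "-0-1"]
print(w_hat(P, Q, 4), u_value(P + Q, 4))       # 0 0
print(in_span(P, "0001"), in_span(Q, "0001"))  # True True
print(in_span(P + Q, "1110"))                  # False
```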

6. Computational Experiments

We have carried out computational experiments for PATTERN COVER MINIMALITY and PATTERN EQUIVALENCE. The problem PATTERN COVER is polynomial and we felt no need to perform computational experiments for this problem. On the opposite side, problem PATTERN EQUIVALENCE MINIMALITY seems to be intractable and we have not even devised ideas of how to solve it. Problems PATTERN COMPLETENESS and PATTERN INCOMPLETENESS are particular cases of PATTERN EQUIVALENCE.
Our tests were run on an Intel Core i5 machine (2.3 GHz) with 8 GB of RAM. The program was implemented in C++, and we used CPLEX 12.4 as the ILP solver.

6.1. Pattern Cover Minimality

We approach the problem of finding a pattern set $P$ of minimum cardinality that spans a given set $S$ of strings as a 0-1 ILP set cover problem. Each row is associated with a string of $S$, each column is associated with a pattern compatible with $S$, and the entry $a_{ij}$ of the 0-1 matrix is 1 if and only if pattern $j$ covers string $i$. By Theorem 2, the matrix has a polynomial number of columns and can therefore be written explicitly. Generating the full matrix is not strictly necessary: a column generation approach could be used instead, adapting the algorithm that generates all patterns to the pricing problem defined by the dual variables associated with the strings. However, we found that generating the full matrix and then solving the problem outperforms column generation, which requires running the recursive algorithm at every column generation iteration, whereas a single run suffices to generate the full matrix.
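Written out, the set cover model reads $\min \sum_j z_j$ subject to $\sum_j a_{ij} z_j \ge 1$ for every string $i$, with $z_j \in \{0,1\}$. Here $z_j$ is our own notation for the variable that selects pattern $j$. The Python sketch below builds the matrix $(a_{ij})$ and, only to illustrate on toy data, replaces the ILP solver with an exhaustive search over column subsets of increasing size. The candidate list is assumed to be given; in the experiments it is $P(S)$.

```python
from itertools import combinations
from typing import List, Tuple


def covers(pattern: str, x: str) -> bool:
    return all(c == "-" or c == b for c, b in zip(pattern, x))


def coverage_matrix(strings: List[str], patterns: List[str]) -> List[List[int]]:
    """a[i][j] = 1 iff pattern j covers string i (rows = strings, columns = patterns)."""
    return [[int(covers(p, s)) for p in patterns] for s in strings]


def min_cover(strings: List[str], patterns: List[str]) -> Tuple[str, ...]:
    """Exact minimum set cover by exhaustive search (exponential; toy sizes only)."""
    a = coverage_matrix(strings, patterns)
    for size in range(1, len(patterns) + 1):
        for cols in combinations(range(len(patterns)), size):
            if all(any(row[j] for j in cols) for row in a):
                return tuple(patterns[j] for j in cols)
    raise ValueError("the candidate patterns do not cover all strings")


S = ["000", "001", "010", "011", "111"]
candidates = S + ["00-", "01-", "0-0", "0-1", "-11", "0--"]  # all patterns compatible with S
print(min_cover(S, candidates))  # ('111', '0--')
```

In practice the search loop is what CPLEX replaces; the matrix construction is the part that carries over.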
We fix the string length to $n = 15$. Each string is randomly generated by independently setting each bit to 1 with probability $p$ (and to 0 with probability $1 - p$). A random instance consists of a set $S$ of $m$ randomly generated strings, without duplicates. We consider the values $p \in \{0.1, 0.25, 0.5\}$ and $m \in \{100, 1000, 5000, 10{,}000\}$. For each combination of $p$ and $m$ we generate ten instances.
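A minimal sketch of this instance generator follows. It assumes that duplicate strings are rejected and redrawn; the function name and the seed parameter are our own choices.

```python
import random
from typing import List


def random_instance(m: int, n: int = 15, p: float = 0.25, seed: int = 0) -> List[str]:
    """Return m distinct random strings of length n; each bit is 1 with probability p."""
    if m > 2 ** n:
        raise ValueError("cannot draw more than 2**n distinct strings")
    rng = random.Random(seed)  # seeded for reproducibility (our choice)
    seen: set = set()
    strings: List[str] = []
    while len(strings) < m:
        s = "".join("1" if rng.random() < p else "0" for _ in range(n))
        if s not in seen:  # duplicates are rejected and redrawn
            seen.add(s)
            strings.append(s)
    return strings


print(random_instance(5, n=15, p=0.1, seed=42))
```

With $p$ equal to 0 or 1 only one string can be drawn, so the loop would never finish for $m > 1$. The values used here (0.1, 0.25, 0.5) avoid this, although small $p$ with large $m$ may need many redraws.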
Strings generated with a value of $p$ close to 0 (or, symmetrically, close to 1) tend to be similar, whereas they are much less similar for $p = 0.5$. Sets of similar strings are expected to be covered by a few patterns with many gaps, whereas sets of dissimilar strings are expected to require many patterns with few gaps.
Using the recursive procedure described in Section 4, we compute for each $S$ the set $P(S)$ of all compatible patterns and solve the corresponding set cover problem with CPLEX.
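The recursive procedure of Section 4 is not reproduced here. As a stand-in, the sketch below computes the same set $P(S)$ with a different technique, a Quine–McCluskey-style level-wise merge. It relies on the fact that a pattern with $k+1$ gaps is compatible with $S$ exactly when, for any one of its gaps, the two patterns obtained by setting that gap to 0 and to 1 are compatible patterns with $k$ gaps. The function name is ours.

```python
from typing import Iterable, Set


def compatible_patterns(strings: Iterable[str]) -> Set[str]:
    """All patterns p whose expansion S(p) is contained in the given string set."""
    level: Set[str] = set(strings)  # 0 gaps: the strings themselves
    result: Set[str] = set(level)
    while level:  # level holds the compatible patterns with k gaps
        nxt: Set[str] = set()
        for q in level:
            for i, c in enumerate(q):
                # Merge q (0 at position i) with its twin (1 at position i).
                if c == "0" and q[:i] + "1" + q[i + 1:] in level:
                    nxt.add(q[:i] + "-" + q[i + 1:])
        result |= nxt
        level = nxt  # compatible patterns with k + 1 gaps
    return result


S = ["000", "001", "010", "011", "111"]
print(sorted(compatible_patterns(S)))
```

Each compatible pattern is generated from any of its gaps, so the set removes duplicates; the memory footprint grows with $|P(S)|$, in line with the column counts in Table 1.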
The computational results are reported in Table 1. For each combination of $p$ and $m$, and for each of the ten instances, we report the number of compatible patterns ($|P(S)|$), the optimal value of the minimum cover problem (Opt), the total CPU time in seconds, consisting of the pattern generation procedure plus the CPLEX run (Time), and the number of branch-and-bound nodes, root excluded (#nodes). A value of #nodes equal to zero means that the solution of the LP relaxation was already integer.
As can be seen from Table 1, all instances are solved at the branch-and-bound root node except the case p = 0.5 and m = 10 , 000 .

6.2. Pattern Equivalence

One of the main difficulties in testing the models for PATTERN EQUIVALENCE is creating meaningful instances that show whether the ILP model is indeed effective.
In fact, a major objection one might raise against the use of ILP is that, when the maximum number of gaps in the patterns is not "large enough", a simple enumerative approach might prove quite effective, and much better than ILP, even if there are many patterns and $n$ is quite large. Suppose, for example, that we compare two sets of about 1000 patterns each, with $n = 100$, where each pattern contains at most ten "-". Ten gaps can be expanded in $2^{10} = 1024$ ways, so each set of patterns yields at most about 1,000,000 strings, which most computers can generate in a second. Then we just need to check whether the two sets of strings have the same size (if not, we stop) and, if they do, check whether every element of the first is also in the second, stopping as soon as one is not (for instance, by first sorting the two sets and then scanning the sorted lists). Some data structures may be more effective than others for these operations but, bottom line, this is a very fast process that ILP has a hard time beating.
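The enumerative baseline described above can be sketched as follows. This is a hypothetical Python helper, not the code used in our experiments: it expands every pattern, compares the sizes of the two string sets, and then compares the sorted lists. Time and memory grow as $2^g$ per pattern, where $g$ is its number of gaps, which is why this approach breaks down on the instances below.

```python
from itertools import product
from typing import Iterator, List


def expand(pattern: str) -> Iterator[str]:
    """Yield the 2^g strings obtained by filling the g gaps of pattern."""
    gaps = [i for i, c in enumerate(pattern) if c == "-"]
    chars = list(pattern)
    for bits in product("01", repeat=len(gaps)):
        for i, b in zip(gaps, bits):
            chars[i] = b
        yield "".join(chars)


def equivalent_by_enumeration(P: List[str], Q: List[str]) -> bool:
    SP = {s for p in P for s in expand(p)}  # S(P), duplicates removed
    SQ = {s for q in Q for s in expand(q)}
    if len(SP) != len(SQ):  # different sizes: not equivalent
        return False
    # Same size: sort both and scan them in parallel, stopping at the first mismatch.
    return all(x == y for x, y in zip(sorted(SP), sorted(SQ)))


print(equivalent_by_enumeration(["0-", "1-"], ["-0", "-1"]))  # True
```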
Therefore, we want to show that ILP is the way to go when the naïve approach cannot work, namely, when the patterns have so many gaps that a complete expansion (which increases the data size exponentially) is out of the question. This raises the problem of how to create non-trivial, interesting instances of equivalent pattern sets with a large number of gaps.

6.3. Diagonal Instances

A simple way of creating instances with equivalent pattern sets is as follows. For every $n$, we consider two equivalent sets of patterns that generate all strings except $11\cdots1$. We call these diagonal instances.
The first set has $n$ patterns, each with $n - 1$ gaps, and is the following (shown for $n = 6$):
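$$
\begin{array}{cccccc}
0 & - & - & - & - & - \\
- & 0 & - & - & - & - \\
- & - & 0 & - & - & - \\
- & - & - & 0 & - & - \\
- & - & - & - & 0 & - \\
- & - & - & - & - & 0
\end{array}
$$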
The second set has $2n - 1$ patterns, with a maximum number of gaps $n - 2$, and is
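$$
\begin{array}{cccccc}
0 & 1 & - & - & - & - \\
0 & - & 1 & - & - & - \\
0 & - & - & 1 & - & - \\
0 & - & - & - & 1 & - \\
0 & - & - & - & - & 1 \\
1 & 0 & - & - & - & - \\
1 & - & 0 & - & - & - \\
1 & - & - & 0 & - & - \\
1 & - & - & - & 0 & - \\
1 & - & - & - & - & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array}
$$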
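The diagonal instances are easy to generate for any $n$. In the sketch below, `diagonal_instance` (our name) returns both pattern lists in the order displayed above. A brute-force check, feasible only for small $n$, confirms that each list covers every string except $11\cdots1$.

```python
from itertools import product
from typing import List, Tuple


def diagonal_instance(n: int) -> Tuple[List[str], List[str]]:
    # First set: pattern i has a 0 in position i and gaps elsewhere (n - 1 gaps).
    first = ["-" * i + "0" + "-" * (n - i - 1) for i in range(n)]
    # Second set: a leading bit followed, in position j, by the opposite bit ...
    second = []
    for lead, other in (("0", "1"), ("1", "0")):
        for j in range(1, n):
            second.append(lead + "-" * (j - 1) + other + "-" * (n - j - 1))
    second.append("0" * n)  # ... plus the all-zero string: 2n - 1 patterns in total
    return first, second


def uncovered(patterns: List[str], n: int) -> List[str]:
    """All strings of {0,1}^n covered by no pattern (brute force)."""
    result = []
    for bits in product("01", repeat=n):
        x = "".join(bits)
        if not any(all(c in ("-", b) for c, b in zip(p, x)) for p in patterns):
            result.append(x)
    return result


first, second = diagonal_instance(6)
print(second)
print(uncovered(first, 6), uncovered(second, 6))  # ['111111'] ['111111']
```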
We performed a sequence of tests comparing the ILP approach with the complete enumeration algorithm for increasing values of $n$. These diagonal instances turn out to be very easy for the ILP model: as can be seen from Table 2, they are all solved in less than 0.1 s for $n \le 30$. The enumerative approach, however, soon becomes impractical. For $n = 28$ the algorithm already takes more than 15 min, while for $n = 30$ it has not finished after one hour (which we set as the maximum time limit). Notably, the ILP approach solves this instance in less than a second even for $n = 100$, whereas the enumerative approach would have to generate $2^{100} - 1$ strings.

6.4. Generating Equivalent Pattern Sets in General

We can adopt the following strategy to create two sets of equivalent patterns:
  • We generate a small starting set $P$ (e.g., $|P| = 3, 4$) of random patterns. Each pattern is obtained by setting each bit to "-" with some probability $q$, to 0 with some probability $p < 1 - q$, and to 1 with probability $1 - p - q$. Since we are interested in patterns with many gaps, we set $q$ to a large value (e.g., 0.8). Let $g$ be the minimum number of gaps over the patterns of $P$.
  • The patterns in $P$ are expanded in all possible ways, yielding a set $S$ of strings.
  • Using the recursive procedure described in Section 4 (slightly modified), we compute the set $P'$ of all patterns compatible with $S$ that have at least $g$ gaps each. This ensures that $S$ can always be covered with patterns in $P'$, since every pattern of $P$ belongs to $P'$.
  • We compute two random solutions of the set covering problem. Namely, from $P'$ we select (picking patterns at random until we have a cover) two subsets $P_1, P_2$ that are covers of $S$.
  • $P_1$ and $P_2$ are equivalent by construction and have a fairly large number of gaps in each pattern.
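The generation steps above can be sketched end to end for small $n$. This is an illustration under assumptions: a level-wise merge stands in for the modified recursive procedure of Section 4, `equivalent_pair` and its defaults are our own choices, and candidates are sorted before shuffling so that a fixed seed gives reproducible output.

```python
import random
from itertools import product
from typing import Iterable, List, Set, Tuple


def random_pattern(n: int, q: float, p: float, rng: random.Random) -> str:
    """Each position is '-' w.p. q, '0' w.p. p, '1' w.p. 1 - p - q."""
    chars = []
    for _ in range(n):
        r = rng.random()
        chars.append("-" if r < q else ("0" if r < q + p else "1"))
    return "".join(chars)


def expand(pattern: str) -> Iterable[str]:
    """All strings obtained by filling the gaps of pattern."""
    gaps = [i for i, c in enumerate(pattern) if c == "-"]
    chars = list(pattern)
    for bits in product("01", repeat=len(gaps)):
        for i, b in zip(gaps, bits):
            chars[i] = b
        yield "".join(chars)


def compatible_patterns(strings: Set[str]) -> Set[str]:
    """All patterns whose expansion lies inside strings (level-wise merging)."""
    level, result = set(strings), set(strings)
    while level:
        nxt = set()
        for s in level:
            for i, c in enumerate(s):
                if c == "0" and s[:i] + "1" + s[i + 1:] in level:
                    nxt.add(s[:i] + "-" + s[i + 1:])
        result |= nxt
        level = nxt
    return result


def random_cover(S: Set[str], candidates: List[str], rng: random.Random) -> List[str]:
    """Pick candidates in random order until every string of S is covered."""
    order = list(candidates)
    rng.shuffle(order)
    chosen, covered = [], set()
    for pat in order:
        chosen.append(pat)
        covered.update(expand(pat))
        if covered >= S:
            return chosen
    raise ValueError("candidates do not cover S")


def equivalent_pair(n: int = 8, size: int = 3, q: float = 0.8, p: float = 0.1,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    rng = random.Random(seed)
    start = [random_pattern(n, q, p, rng) for _ in range(size)]  # the set P
    g = min(pat.count("-") for pat in start)
    S = {s for pat in start for s in expand(pat)}
    # P': compatible patterns with at least g gaps (sorted for reproducibility)
    cands = sorted(c for c in compatible_patterns(S) if c.count("-") >= g)
    return random_cover(S, cands, rng), random_cover(S, cands, rng)


P1, P2 = equivalent_pair(seed=1)
print(len(P1), len(P2))
```

Because every candidate is compatible with $S$, each cover expands to exactly $S$, so the two lists are equivalent whatever the random order.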
In our implementation, because of memory limitations, the above procedure works only for $n \le 20$. Thus, to build larger instances we use a trick: we start from instances built as above and combine them into larger and larger ones, as explained below.

6.5. How to Boost Instances of Pattern Equivalence

One way to increase the number of gaps in the instances would be to take two sets of equivalent patterns $A$ and $B$ and append a list of $k$ gaps to each pattern. This, however, yields very particular, uninteresting instances. In order to obtain more elaborate, hard pattern equivalence instances, we have developed the following scheme.
Given a set of patterns X, denote by
- $n(X)$ the number of columns (i.e., the string length),
- $m(X)$ the number of rows (i.e., of patterns),
- $G(X)$ the maximum number of gaps in a pattern.
Furthermore, given sets of patterns $A$ and $B$, denote by $C = A \times B$ the set of patterns
$$C = \{(a, b) : a \in A,\ b \in B\},$$
where $(a, b)$ denotes the concatenation of $a$ and $b$.
Note that $n(C) = n(A) + n(B)$, $m(C) = m(A) \cdot m(B)$ and $G(C) = G(A) + G(B)$. We have
Claim 1.
Let $A_1, A_2, B_1, B_2$ be sets of patterns such that $A_1$ is equivalent to $A_2$ and $B_1$ is equivalent to $B_2$. Then $A_1 \times B_1$ is equivalent to $A_2 \times B_2$.
Proof. 
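Let $C_i := A_i \times B_i$. We want to show that $S(C_1) = S(C_2)$. Observe first that $S(A \times B) = S(A) \times S(B)$ for any sets of patterns $A$ and $B$, since a pattern $(a, b)$ covers a string $(x, y)$ if and only if $a$ covers $x$ and $b$ covers $y$. We show $S(C_1) \subseteq S(C_2)$; the other inclusion is symmetric. Let $(x, y) \in S(C_1)$. Then $x \in S(A_1)$ and $y \in S(B_1)$. Since $A_1$ is equivalent to $A_2$, also $x \in S(A_2)$, and similarly $y \in S(B_2)$. Hence $(x, y) \in S(A_2) \times S(B_2) = S(C_2)$. □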
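Claim 1 and the size formulas can be checked on a toy example. The sketch below (helper names are ours) forms $A \times B$ by concatenation and builds $A_1 \times B_1$ and $A_2 \times B_2$ from two equivalent pairs: a pair of two-column sets that both cover $\{0,1\}^2$, and the $n = 4$ diagonal instance. It then verifies by expansion that the two products are equivalent.

```python
from itertools import product
from typing import List, Set


def expand_all(patterns: List[str]) -> Set[str]:
    """S(patterns): expand every gap of every pattern in both ways."""
    out: Set[str] = set()
    for p in patterns:
        options = [("0", "1") if c == "-" else (c,) for c in p]
        out.update("".join(t) for t in product(*options))
    return out


def times(A: List[str], B: List[str]) -> List[str]:
    """A x B: concatenate every pattern of A with every pattern of B."""
    return [a + b for a, b in product(A, B)]


def max_gaps(X: List[str]) -> int:
    return max(p.count("-") for p in X)


A1, A2 = ["0-", "1-"], ["--"]          # equivalent: both cover {0,1}^2
B1 = ["0---", "-0--", "--0-", "---0"]  # diagonal instance, n = 4
B2 = ["01--", "0-1-", "0--1", "10--", "1-0-", "1--0", "0000"]
C1, C2 = times(A1, B1), times(A2, B2)
print(len(C1), len(C2), max_gaps(C1), max_gaps(C2))  # 8 7 4 4
print(expand_all(C1) == expand_all(C2))               # True
```

Applying `times` repeatedly to equivalent pairs is the snowballing step described next.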
By using this trick repeatedly, we can snowball from small instances, e.g., 4 or 5 patterns with 5 or 6 gaps each, to instances with a few hundred patterns with more than 20 gaps each.

6.6. Experiments

We created ten instances of size $n = 30$. Each instance is built by combining two equivalent instances of size $n = 15$, which were built with our procedure with parameters $q = 0.8$, $p = 0.1$ and $|P| \in \{3, 4\}$. The results are reported in Table 3. These instances turned out to be too difficult for the enumerative approach, which could not solve any of them within half an hour.
For each set of input patterns ($i = 1, 2$) we report:
  • $m_i$ is the number of input patterns.
  • $g_i$ is the minimum number of gaps per pattern.
  • $G_i$ is the maximum number of gaps per pattern.
  • $a_i$ is the average number of gaps per pattern.
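These per-set statistics are straightforward to compute. The helper below (our own naming) returns $m_i$, $g_i$, $G_i$, and $a_i$ for a list of patterns, with $a_i$ rounded to two decimals as in Table 3.

```python
from typing import Dict, List, Union


def gap_stats(patterns: List[str]) -> Dict[str, Union[int, float]]:
    """m, g, G, a: number of patterns and min, max, average number of gaps."""
    gaps = [p.count("-") for p in patterns]
    return {
        "m": len(patterns),
        "g": min(gaps),
        "G": max(gaps),
        "a": round(sum(gaps) / len(gaps), 2),
    }


print(gap_stats(["01--", "0-1-", "0--1", "10--", "1-0-", "1--0", "0000"]))
# {'m': 7, 'g': 0, 'G': 2, 'a': 1.71}
```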
The ten instances are reported in Table 3 sorted by running time (ninth column). The tenth and eleventh columns report the number of branch-and-bound nodes (root node excluded) required to solve models (3) and (4), respectively.
The results show that the ILP approach is effective also for instances that are large enough that they cannot be tackled by enumerative approaches.
To test the same ILP models on pattern sets that are not equivalent, we randomly perturbed the previous data. The running times remained practically the same.

7. Conclusions

In order to use feature selection and LAD in the analysis of binary data consisting of positive and negative samples, one has to identify which computational problems might arise and how to overcome them. One of the issues addressed in this paper is the computational complexity of these problems, which we have shown to be, in general, very hard. As a viable approach to the effective solution of some of them, we have described integer linear programming formulations. In particular, we have given ILP models for determining whether two sets of patterns are equivalent and for finding a minimum-size set of patterns that explains a given data set. A striking consequence of our complexity results is that a simple ILP model for finding a minimal set of patterns explaining the same data set as a given pattern set is unlikely to exist. Developing solution procedures for this last problem is a possible line of future research.

Author Contributions

Conceptualization, methodology, software, validation, writing—original draft, review and editing: G.L. and P.S. Both authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Han, J.; Kamber, M.; Pei, J. Data Mining: Concepts and Techniques; Morgan Kaufmann: Burlington, MA, USA, 2011.
  2. Kantardzic, M. Data Mining: Concepts, Models, Methods, and Algorithms; John Wiley & Sons: Hoboken, NJ, USA, 2003.
  3. Dash, M.; Liu, H. Feature Selection for Classification. Intell. Data Anal. 1997, 1, 131–156.
  4. Felici, G.; de Angelis, V.; Mancinelli, G. Feature Selection for Data Mining. In Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques; Felici, G., Triantaphyllou, E., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 227–252.
  5. Stańczyk, U.; Zielosko, B.; Jain, L.C. Advances in Feature Selection for Data and Pattern Recognition: An Introduction. In Advances in Feature Selection for Data and Pattern Recognition; Intelligent Systems Reference Library; Stańczyk, U., Zielosko, B., Jain, L., Eds.; Springer: Berlin/Heidelberg, Germany, 2018; Volume 138.
  6. Alexe, G.; Alexe, S.; Bonates, T.O.; Kogan, A. Logical analysis of data—The vision of Peter L. Hammer. Ann. Math. Artif. Intell. 2007, 49, 265–312.
  7. Chikalov, I.; Lozin, V.; Lozina, I.; Moshkov, M.; Son Nguyen, H.; Skowron, A.; Zielosko, B. Logical Analysis of Data: Theory, Methodology and Applications. In Three Approaches to Data Analysis; Springer: Berlin/Heidelberg, Germany, 2013; pp. 147–192.
  8. Hammer, P.; Bonates, T. Logical Analysis of Data: From Combinatorial Optimization to Medical Applications; RUTCOR Research Report, 10-05; Rutgers University: New Brunswick, NJ, USA, 2005.
  9. Bertolazzi, P.; Felici, G.; Festa, P.; Lancia, G. Logic classification and feature selection for biomedical data. Comput. Math. Appl. 2008, 55, 889–899.
  10. Golub, T.R.; Slonim, D.K.; Tamayo, P.; Huard, C.; Gaasenbeek, M.; Mesirov, J.P.; Coller, H.; Loh, M.L.; Downing, J.R.; Caligiuri, M.A.; et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 1999, 286, 531–537.
  11. Li, T.; Zhang, C.; Ogihara, M. A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics 2004, 20, 2429–2437.
  12. Lancia, G.; Serafini, P. The Complexity of Some Pattern Problems in the Logical Analysis of Large Genomic Data Sets. In Bioinformatics and Biomedical Engineering. IWBBIO 2016; Lecture Notes in Computer Science; Ortuno, F., Rojas, I., Eds.; Springer: Berlin/Heidelberg, Germany, 2016; Volume 9656.
  13. Lancia, G.; Mathieson, L.; Moscato, P. Separating sets of strings by finding matching patterns is almost always hard. Theor. Comput. Sci. 2017, 665, 73–86.
  14. Boccia, M.; Sforza, A.; Sterle, C. Simple Pattern Minimality Problems: Integer Linear Programming Formulations and Covering-Based Heuristic Solving Approaches. Informs J. Comput. 2020.
  15. Serafini, P. Classifying negative and positive points by optimal box clustering. Discret. Appl. Math. 2014, 165, 270–282.
  16. Boros, E.; Hammer, P.; Ibaraki, T.; Kogan, A.; Mayoraz, E.; Muchnik, I. An implementation of Logical Analysis of Data. IEEE Trans. Knowl. Data Eng. 2000, 12, 292–306.
  17. Garey, M.R.; Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness; W.H. Freeman and Company: San Francisco, CA, USA, 1979.
  18. Cormen, T.; Leiserson, C.E.; Rivest, R.L.; Stein, C. Introduction to Algorithms, 3rd ed.; MIT Press: Cambridge, MA, USA, 2009.
Table 1. Results for Pattern Cover Minimality.
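| m | \|P(S)\|, p = 0.1 | Opt | Time (s) | #nodes | \|P(S)\|, p = 0.25 | Opt | Time (s) | #nodes | \|P(S)\|, p = 0.5 | Opt | Time (s) | #nodes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 300 | 60 | 0.038 | 0 | 136 | 83 | 0.002 | 0 | 102 | 98 | 0.001 | 0 |
| | 341 | 57 | 0.023 | 0 | 137 | 84 | 0.002 | 0 | 104 | 96 | 0.001 | 0 |
| | 322 | 58 | 0.009 | 0 | 119 | 88 | 0.002 | 0 | 103 | 97 | 0.001 | 0 |
| | 314 | 56 | 0.011 | 0 | 117 | 91 | 0.003 | 0 | 103 | 97 | 0.004 | 0 |
| | 302 | 66 | 0.008 | 0 | 123 | 85 | 0.002 | 0 | 104 | 96 | 0.002 | 0 |
| | 298 | 58 | 0.007 | 0 | 139 | 80 | 0.003 | 0 | 102 | 98 | 0.002 | 0 |
| | 317 | 55 | 0.014 | 0 | 112 | 92 | 0.002 | 0 | 102 | 98 | 0.001 | 0 |
| | 298 | 58 | 0.008 | 0 | 126 | 82 | 0.008 | 0 | 103 | 97 | 0.002 | 0 |
| | 292 | 63 | 0.008 | 0 | 128 | 85 | 0.002 | 0 | 101 | 99 | 0.002 | 0 |
| | 282 | 66 | 0.011 | 0 | 129 | 87 | 0.002 | 0 | 100 | 100 | 0.002 | 0 |
| 1000 | 9118 | 380 | 0.220 | 0 | 3729 | 546 | 0.103 | 0 | 1234 | 843 | 0.031 | 0 |
| | 9131 | 381 | 0.219 | 0 | 3770 | 544 | 0.097 | 0 | 1197 | 853 | 0.031 | 0 |
| | 8567 | 400 | 0.215 | 0 | 3912 | 550 | 0.092 | 0 | 1245 | 825 | 0.032 | 0 |
| | 8963 | 384 | 0.205 | 0 | 3704 | 557 | 0.093 | 0 | 1213 | 845 | 0.034 | 0 |
| | 8757 | 392 | 0.215 | 0 | 3540 | 565 | 0.085 | 0 | 1214 | 838 | 0.035 | 0 |
| | 8651 | 396 | 0.203 | 0 | 3722 | 543 | 0.098 | 0 | 1218 | 837 | 0.047 | 0 |
| | 8735 | 369 | 0.207 | 0 | 3813 | 553 | 0.095 | 0 | 1226 | 838 | 0.031 | 0 |
| | 8888 | 389 | 0.225 | 0 | 3520 | 556 | 0.094 | 0 | 1219 | 846 | 0.030 | 0 |
| | 8709 | 389 | 0.220 | 0 | 3636 | 572 | 0.101 | 0 | 1232 | 838 | 0.034 | 0 |
| | 8874 | 381 | 0.349 | 0 | 3677 | 566 | 0.098 | 0 | 1224 | 828 | 0.056 | 0 |
| 5000 | 117,558 | 1323 | 7.282 | 0 | 70,726 | 1614 | 5.421 | 0 | 11,211 | 2767 | 0.758 | 0 |
| | 118,365 | 1277 | 7.080 | 0 | 69,997 | 1614 | 4.771 | 0 | 11,206 | 2752 | 0.801 | 0 |
| | 116,936 | 1282 | 6.668 | 0 | 68,989 | 1623 | 3.882 | 0 | 11,232 | 2770 | 0.746 | 0 |
| | 118,087 | 1281 | 6.592 | 0 | 69,964 | 1608 | 9.634 | 0 | 11,144 | 2757 | 0.708 | 0 |
| | 118,435 | 1282 | 9.959 | 0 | 68,920 | 1618 | 4.576 | 0 | 11,042 | 2787 | 0.737 | 0 |
| | 115,737 | 1293 | 6.254 | 0 | 68,609 | 1631 | 5.847 | 0 | 11,284 | 2755 | 0.746 | 0 |
| | 117,605 | 1301 | 6.473 | 0 | 68,655 | 1626 | 5.419 | 0 | 11,226 | 2753 | 0.711 | 0 |
| | 116,015 | 1301 | 6.539 | 0 | 68,483 | 1631 | 4.817 | 0 | 11,136 | 2786 | 0.685 | 0 |
| | 116,910 | 1252 | 6.996 | 0 | 68,242 | 1598 | 9.475 | 0 | 11,213 | 2744 | 0.708 | 0 |
| | 116,534 | 1301 | 7.444 | 0 | 69,466 | 1613 | 7.985 | 0 | 11,317 | 2749 | 0.747 | 0 |
| 10,000 | 431,876 | 1944 | 52.0 | 0 | 295,360 | 2197 | 397.0 | 0 | 40,574 | 3740 | 281.4 | 1302 |
| | 430,941 | 1989 | 47.8 | 0 | 295,703 | 2250 | 435.9 | 0 | 40,647 | 3728 | 513.8 | 1020 |
| | 433,162 | 1997 | 39.3 | 0 | 294,783 | 2236 | 388.0 | 0 | 40,664 | 3738 | 485.2 | 847 |
| | 426,007 | 1992 | 60.8 | 0 | 304,432 | 2199 | 397.3 | 0 | 40,438 | 3759 | 333.1 | 914 |
| | 431,029 | 1996 | 66.7 | 0 | 293,031 | 2255 | 404.6 | 0 | 40,467 | 3728 | 1004.4 | 2254 |
| | 434,909 | 1935 | 40.8 | 0 | 295,167 | 2238 | 442.9 | 0 | 40,686 | 3751 | 1141.8 | 6319 |
| | 431,853 | 1943 | 40.6 | 0 | 293,073 | 2221 | 418.7 | 0 | 40,134 | 3765 | 307.8 | 729 |
| | 430,508 | 1951 | 70.7 | 0 | 297,234 | 2242 | 386.9 | 0 | 40,253 | 3750 | 381.1 | 1163 |
| | 431,020 | 1958 | 58.7 | 0 | 292,270 | 2230 | 399.4 | 0 | 40,793 | 3696 | 2608.6 | 4682 |
| | 432,664 | 1957 | 44.1 | 0 | 293,697 | 2233 | 442.3 | 0 | 40,538 | 3703 | 1802.4 | 3784 |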
Table 2. Computational results for Pattern Equivalence—Diagonal instances.
| n | ILP | enum |
|---|---|---|
| 18 | 0.03 s | 0.34 s |
| 20 | 0.05 s | 1.38 s |
| 22 | 0.07 s | 6.43 s |
| 24 | 0.08 s | 29.89 s |
| 26 | 0.08 s | 138 s |
| 28 | 0.05 s | 1005 s |
| 30 | 0.08 s | did not finish |
Table 3. Computational results for Pattern Equivalence—boosted instances.
m 1 g 1 G 1 a 1 m 2 g 2 G 2 a 2 Time(s)BB Nodes1BB Nodes 2
264222322.18266222422.190.43023
270242524.03150242524.040.551021
273222222.00220222322.130.700327
266232423.10279232623.202.2148275
735222422.151089222522.184.0230264
342232523.32667232523.074.286840
456222422.30453222522.2915.1621251512
735222522.35540222522.2350.529273095
688202120.09360202120.1676.2416941940
784212221.18261212421.64172.28135041811
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Lancia, G.; Serafini, P. Computational Complexity and ILP Models for Pattern Problems in the Logical Analysis of Data. Algorithms 2021, 14, 235. https://doi.org/10.3390/a14080235

AMA Style

Lancia G, Serafini P. Computational Complexity and ILP Models for Pattern Problems in the Logical Analysis of Data. Algorithms. 2021; 14(8):235. https://doi.org/10.3390/a14080235

Chicago/Turabian Style

Lancia, Giuseppe, and Paolo Serafini. 2021. "Computational Complexity and ILP Models for Pattern Problems in the Logical Analysis of Data" Algorithms 14, no. 8: 235. https://doi.org/10.3390/a14080235
