Article

The Set Covering and Other Problems: An Empiric Complexity Analysis Using the Minimum Ellipsoidal Width

Industrial Engineering Department, Universidad de Santiago, Ave. Victor Jara 3769, Santiago 9170124, Chile
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2023, 11(13), 2794; https://doi.org/10.3390/math11132794
Submission received: 26 April 2023 / Revised: 2 June 2023 / Accepted: 13 June 2023 / Published: 21 June 2023
(This article belongs to the Special Issue Operations Research and Optimization)

Abstract

This research aims to explain the intrinsic difficulty of Karp's list of twenty-one problems through the use of empirical complexity measures based on the ellipsoidal width of the polyhedron generated by the constraints of the relaxed linear programming problem. The variables used as complexity measures are the number of nodes visited by the B & B algorithm and the CPU time spent solving the problems. The measurements used as explanatory variables correspond to the eigenvalues of the Dikin ellipse within the polyhedron; other variables correspond to the constraint slacks with respect to the analytical center used as the center of the ellipse. The results of these variables in terms of the number of nodes and CPU time are particularly satisfactory, showing strong correlations, above 60% in most cases.

1. Introduction

The NP-completeness of Karp's list of 21 problems dates back to 1972 [1]. It is a list of classical problems that share computational complexity characteristics. The problems from the list solved in this paper include the set covering problem, the set packing problem, the multi-demand knapsack problem, and some general integer programming problems. We found statistically significant relationships between the number of nodes in the branch and bound tree and the resolution time. The explanatory variables are geometric measurements corresponding to an inner Dikin ellipse that replicates the shape of the linear polyhedron. The test problems used are classics in combinatorics, computer science, and computational complexity theory.
The set covering problem, also known as SCP, is an NP-complete problem. Solving it consists of finding a set of solutions that covers, totally or partially, a set of needs at the lowest possible cost. In many cases, the distance or the response time between customers and service delivery points is critical to customer satisfaction. For example, if a building catches fire, the fire station response time is vital; the longer the delay, the greater the building damage. In this case, the SCP model ensures that at least one fire station is at a close enough distance for fire engines to reach the building within a certain time. Set packing is also a classic problem. It consists of selecting pairwise disjoint subsets from a given collection of k subsets. The problem is visibly in NP because, given k subsets, checking that they are pairwise disjoint can be done in polynomial time [2,3]. The optimization problem consists of finding the maximum number of pairwise disjoint sets in a list. It is a maximization problem formulated as a packing integer program, and its dual linear problem is the set covering problem [4]. The multi-dimensional knapsack problem (MKP) involves selecting a set of items to carry in a knapsack subject to one or more restrictions, such as the knapsack's weight or volume capacity. The objective function of this problem seeks to maximize a linear function in 0–1 variables subject to knapsack constraints. Finally, the multi-demand multi-dimensional knapsack problem is the multi-dimensional knapsack problem with added constraints that impose demand conditions [5]. The CPU time is key when solving an MIP/BIP problem using the branch and bound ( B & B ) algorithm. It depends on the size of the search tree associated with the algorithm. B & B finds the solution by recursively dividing the search space. The search space is a tree whose root node is associated with the integer solution space, and whose sibling nodes form a partition of the solution space of the parent node.
At each node, the subspace is an MIP/BIP whose linear programming (LP) relaxation is solved to supply a linear solution. If the LP is infeasible, or its bound is no better than the incumbent (the best integer value found), the procedure prunes the node. A previous article proposed complexity indices to estimate the B & B tree size, applied to the multi-dimensional knapsack problem [6].

1.1. B & B Tree Counting Literature Review

Knuth [7] proposed the first method to estimate the search tree size of the branch and bound ( B & B ) algorithm. This method works by repeatedly sampling paths in the search tree and estimating the number of nodes to be expanded; it estimates the B & B tree size by repeatedly following random paths from the root. As the search may prove to be extensive, Purdom [8] improved the method by following more than one path from a node, estimating the tree size by performing a partial backtracking search. This modification improves the results exponentially in the tree height. Purdom called this method Partial Backtracking. Chen [9] improved Knuth's proposed method using heuristic sampling to estimate its efficiency. This updated version produces significantly better efficiency estimates for commonly used tree search strategies: depth first, breadth first, best first, and iterative deepening. Belov et al. [10] combined Knuth's sampling procedure with the abstract B & B tree developed by Le Bodic and Nemhauser [11]. Knuth's original method exploits knowledge of the node distribution in a B & B tree to reduce the variance of tree size estimates, while Le Bodic and Nemhauser provide a theoretical B & B tree model. Belov et al. combined these two methods to obtain a significant increase in estimation accuracy. According to Belov et al. and the conducted experiments, the error in the a priori estimation decreased by more than half. The abstract tree developed by Le Bodic and Nemhauser [11] is a formula based on the concept of the gain obtained by branching at any node. They use an a priori estimate of the gain obtained by branching left or right at any node, together with what they call the gap, which is the gain value that allows the optimal integer solution to be reached by branching. Their use of the abstract tree seeks to find the best variable to branch on, i.e., the one that yields a tree of minimum size.
Other estimation methods are the Weighted Backtrack Estimator [12], Profile Estimation [13], and the Sum of Subtree Gaps [14]. Recently, Refs. [15,16,17] developed methods based on machine learning in the context of integer programming. Fischetti [18] proposed a classifier to predict specific points online. Finally, Hendel et al. [19] developed a new version of the old B & B tree estimation method. They integrated Le Bodic and Nemhauser's [11] theoretical tree with new measures such as "leaf frequency". A leaf in the B & B tree is a node that is not expanded further, either because it delivers an integer solution, is infeasible, or is pruned. They use this and other measures of algorithm progress in machine learning's random forest model to estimate the size of the B & B tree. Next, they integrated this technique into the SCIP constraint integer programming software [20]. These methods are applied throughout the algorithm, since they require a few iterations to start estimating, with further iterations for recalculation. Their accuracy also grows as the algorithm progresses and more information becomes available.

1.2. Our Contribution to the Problem of Estimating the B&B Tree

Our research line on estimating the B & B tree, including the methods developed in this work, follows the conditioning concept in integer programming [21,22,23]. Vera and Derpich [24] proposed measures of the polyhedron width based on m (the number of constraints) and n (the number of variables). These measures are used to estimate upper bounds on the number of iterations of the B & B algorithm and of the Lenstra algorithm [25]. Vera and Derpich also proposed two measures concerning the polyhedron's ellipsoidal width: the maximum slack and a term called the "distance to ill-posedness of the integer problem", which Vera documented in [22]. The bounds proposed in [25] on the number of iterations of the B & B algorithm correspond to worst-case bounds. They are similar to those proposed by Le Bodic and Nemhauser [11], since both give values far from the real values obtained. The measures proposed by Vera and Derpich [22] are based on concepts that reflect the shape and spatial orientation of the polyhedron, factors that earlier approaches do not capture. Therefore, these measures also predict the number of B & B tree nodes and the CPU time, and they are the conceptual basis of this work. The indices developed in this work use a Dikin ellipse inscribed in the polyhedron. They show a good correlation with the B & B algorithm's CPU time and the number of nodes visited. The Dikin ellipse allows the polyhedron's ellipsoidal width to be estimated, which in turn is used to estimate the B & B tree. This enables new, related geometric indices, computed from the constraints of the linear programming problem generated by relaxing the integer variables. The proposed indices are based on the concept of polyhedron flatness: if a polyhedron is thin in some direction, the B & B algorithm might run faster. In this article, we seek to test how much this idea influences the B & B tree size, characterized by the number of nodes visited and the CPU time.
The underlying conjecture of the proposed experimental design is that a narrower polyhedron will be faster to traverse, so the B & B tree will be smaller, and vice versa: in a wider polyhedron, the B & B tree will be larger. These new factors are related to the dimensions of the matrices associated with the polyhedron of the relaxed problem, as well as to the maximum and minimum slacks with respect to the center of the ellipse. To test the relationship between these measures and the number of nodes in the B & B tree, we designed an experimental study and found a strong linear correlation. The empirical study included set covering, set packing, and other general integer programming problems, with data from the public library MIPLIB [26,27,28]. We also assessed the multi-demand multi-dimensional knapsack (MDMKP) problem, with data taken from the OR-Library [28].
Following this introduction, Section 2 develops the concepts related to the polyhedron's ellipsoidal width, which support the proposed complexity indices. Section 3 presents the experimental design and the mathematical formulations of the problems under study: set covering, set packing, and the multi-demand multi-dimensional knapsack. Section 4 presents the results obtained on the test problems used. Section 5 presents a discussion comparing this work with related ones, and, finally, Section 6 summarizes the main findings and future work.

2. Methods and Materials

Polyhedron Ellipsoidal Width

Let K ⊆ R^n be a convex set; we define the integer width of K as follows:
w_z ( K ) = min { w ( v , K ) : v ∈ Z^n , v ≠ 0 }
w ( v , K ) = max { v^T x : x ∈ K } − min { v^T x : x ∈ K }
If we restrict the vectors v to unit vectors of the Euclidean space, we obtain the width along the coordinate axes { x_1 , x_2 , … , x_n }. The integer width is a very interesting geometric measure. It is related to the existence of at least one integer point: if K contains no integer point, this width cannot be too large. This is an important result because [29] stated that w_Z ( K ) ≤ f ( n ) if K does not contain an integer point. Lenstra considered f ( n ) to be of the order of c_0 n^2 , where c_0 is a constant. The problem that we study in this paper is as follows.
max { c^T x : A x ≤ b , x ≥ 0 , x ∈ Z^n }
We assume that the polyhedron given by A x ≤ b is bounded and denote the problem data by the letter d, so that d = ( A , b ) ∈ R^{m × n + m}. This is the problem-specific instance. We denote by P ( d ) the polyhedron { x : A x ≤ b }, and by α_1 , … , α_m the row vectors of A. Next, we analyze an application of Lenstra's flatness theorem [30]. We obtain a measure that depends on geometry rather than on dimensional factors alone. As in the classical flatness theorem analysis, the basis of the estimate lies in rounding the polyhedron using inscribed and circumscribed ellipses, which are intended to capture the polyhedron's shape. Therefore, let us build a pair of ellipses with a common center x_0 .
E = { x ∈ R^n : ( x − x_0 )^T Q ( x − x_0 ) ≤ 1 }
and
E′ = { x ∈ R^n : ( x − x_0 )^T Q ( x − x_0 ) ≤ γ^2 }
so that
E ⊆ P ( d ) ⊆ E′
where Q is a positive definite matrix. Several matrices are possible, depending on the value of γ . John [31] proposes an ellipse E′ of minimal volume by taking γ = n . However, computing x_0 then becomes a hard problem. We use an approach based on the classic setup of interior point methods in convex optimization [32]. Suppose that we know a self-concordant barrier function Φ on a convex body, with parameter v , as in Nesterov and Nemirovsky [33]. Then, let
x_0 = argmin { Φ ( x ) : x ∈ int P ( d ) }
Let Q = ∇^2 Φ ( x_0 ) and let E be the inner ellipse of unit radius, known as the Dikin ellipse. Thus, if we take γ = m + 1 , for example, we use the traditional logarithmic barrier function
Φ ( x ) = − ∑_{i=1}^{m} log ( b_i − α_i^T x ) ,
with v = m . The point x_0 is the analytical center of K = P ( d ) and the matrix Q is
Q = A^T D ( x_0 )^{−2} A
with
D ( x ) = diag ( b_1 − α_1^T x , b_2 − α_2^T x , … , b_m − α_m^T x )
where diag ( ) denotes a diagonal matrix constructed with the corresponding elements. The fact that the matrix Q connects naturally with the polyhedron's geometric properties justifies this choice of ellipse construction. The following is the ellipses' geometrical result.
Proposition 1. 
Let Q be a positive definite symmetric real matrix defining a pair of ellipses as in (4). Then,
W_z ( P ( d ) ) ≤ 2 ( m + 1 ) min { √( u^T Q^{−1} u ) : u ∈ Z^n , u ≠ 0 }
Demonstration.
The term √( u^T Q^{−1} u ) is the radius of ellipse E in the direction u.
2 √( u^T Q^{−1} u ) is the width of ellipse E along the vector u.
Multiplying by ( m + 1 ) , we obtain the expanded ellipse E′.
Then, 2 ( m + 1 ) √( u^T Q^{−1} u ) is the width of ellipse E′ along the vector u.
Since P ( d ) ⊆ E′ , this value is greater than or equal to the width of the polyhedron P ( d ) along the vector u. □
Proposition 2. 
Let v_1 , … , v_n be the orthonormal eigenvectors of the positive definite matrix Q , and let λ_min be the smallest eigenvalue of Q. Then, for any u ∈ R^n ,
u^T Q^{−1} u ≤ ( 1 / λ_min ) ∑_{i=1}^{n} ( v_i^T u )^2
Demonstration.
Since Q is symmetric and positive definite, the result follows from the fact that
Q^{−1} = ∑_{i=1}^{n} ( 1 / λ_i ) v_i v_i^T
It follows that
u^T Q^{−1} u = ∑_{i=1}^{n} ( 1 / λ_i ) ( u^T v_i )^2
Then, taking the minimum eigenvalue, we have
u^T Q^{−1} u ≤ ( 1 / λ_min ) ∑_{i=1}^{n} ( v_i^T u )^2 □
Because we use it in our analysis, we describe a result of Vera [22], which relates the eigenvalues of the matrix Q to the eigenvalues of the matrix A^T A and other problem data.
Proposition 3. 
Let Q = A^T D ( x_0 )^{−2} A with D ( x ) = diag ( b_1 − α_1^T x , … , b_m − α_m^T x ) , where b_i − α_i^T x > 0 , i = 1 , … , m . Let λ_min and λ_max be the smallest and largest eigenvalues of Q , respectively, and let μ_min and μ_max be the smallest and largest eigenvalues of A^T A , respectively. Additionally, let h_max ( x_0 ) and h_min ( x_0 ) be the largest and smallest diagonal entries of D ( x_0 ) . Then, it fulfills
λ_min ≥ μ_min / ( h_max ( x_0 ) )^2
λ_max ≤ μ_max / ( h_min ( x_0 ) )^2
Demonstration. See [22,24].
Proposition 4. 
Let λ_i and v_i , i = 1 , … , n , be the eigenvalues and eigenvectors of the matrix Q , respectively, and let u be a feasible direction. Then, we have the following:
w ( u , P ( d ) ) ≤ 2 ( m + 1 ) ‖ v_max ‖_2 h_max ( x_0 ) / √( μ_min )
Demonstration.
From the demonstration of Proposition 1, we have
W_z ( P ( d ) ) ≤ 2 ( m + 1 ) min { √( u^T Q^{−1} u ) : u ∈ Z^n , u ≠ 0 }
From Proposition 2, we have
u^T Q^{−1} u ≤ ( 1 / λ_min ) ∑_{i=1}^{n} ( v_i^T u )^2
Bounding each term through the eigenvector v_max of largest norm-2, we have
∑_{i=1}^{n} ( v_i^T u )^2 ≤ ∑_{i=1}^{n} ( v_max^T u )^2
Assuming that ‖ u ‖ ≤ 1 , then
∑_{i=1}^{n} ( v_i^T u )^2 ≤ ∑_{i=1}^{n} ( v_max^T u )^2 ≤ ‖ v_max ‖_2^2
Additionally, from Proposition 3, we have
1 / λ_min ≤ ( h_max ( x_0 ) )^2 / μ_min
Therefore,
w ( u , P ( d ) ) ≤ 2 ( m + 1 ) √( ( 1 / λ_min ) ∑_{i=1}^{n} ( v_i^T u )^2 ) ≤ 2 ( m + 1 ) h_max ( x_0 ) ‖ v_max ‖_2 / √( μ_min ) □
In a previous paper, Ref. [34] built a disjunction to branch variables in the B & B , based on the ellipsoidal width of the associated linear polyhedron. It branches several variables simultaneously, as a super rule that uses the ellipsoidal width √( u^T Q^{−1} u ) . This rule proved to be more efficient than the well-known strong branching rule; the latter often leads to a smaller search tree, although it requires much more time to select branching variables. Figure 1 shows the ellipses used in this rounding approach, where Q is a positive definite matrix. Minimizing u^T Q^{−1} u over nonzero integer vectors is a version of the shortest vector problem, which Micciancio [35] considers a difficult problem. We use Proposition 4 as a measure only; therefore, we do not solve it optimally. Instead, we use an upper bound of the optimal value, which captures some aspects of the original problem that reproduce the intrinsic difficulty of a particular instance. Based on Proposition 4, we propose the following measures related to the geometry of the polyhedron P ( d ) .
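The quantities underlying these measures can be computed directly. The sketch below (numpy assumed; a damped Newton method on the logarithmic barrier of this section; the toy instance, the unit square, is our own illustrative assumption, not one of the paper's test problems) computes the analytical center x_0 and the Dikin matrix Q = A^T D ( x_0 )^{−2} A:

```python
import numpy as np

def analytic_center(A, b, x, iters=50):
    """Damped Newton method for min Phi(x) = -sum_i log(b_i - a_i^T x),
    starting from a strictly interior point x of {x : Ax <= b}."""
    for _ in range(iters):
        s = b - A @ x                        # slacks b_i - a_i^T x, all > 0
        g = A.T @ (1.0 / s)                  # gradient of Phi
        H = A.T @ np.diag(1.0 / s**2) @ A    # Hessian: A^T D(x)^-2 A
        dx = np.linalg.solve(H, -g)          # Newton direction
        t = 1.0
        while np.any(b - A @ (x + t * dx) <= 0):
            t *= 0.5                         # damp to stay strictly inside
        x = x + t * dx
    return x

# Assumed toy polyhedron: the unit square 0 <= x <= 1 written as Ax <= b.
A = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
b = np.array([1., 1., 0., 0.])
x0 = analytic_center(A, b, np.array([0.3, 0.7]))   # converges to (0.5, 0.5)
s0 = b - A @ x0
Q = A.T @ np.diag(1.0 / s0**2) @ A                 # Dikin ellipse matrix
```

For the unit square, the analytical center is (0.5, 0.5) and Q = diag(8, 8), so the Dikin ellipse is a circle, reflecting the symmetry of the box.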

3. Experimental Design

This section describes the experimental design and shows the mathematical structure of the test problems considered, which are part of Karp's list. The previous section derived the variables related to geometric aspects. On this basis, a linear search established relationships between these variables and two measures of the B & B tree: the number of explored nodes and the algorithm's CPU time. Thus, the relationship-searching work is fully experimental, and the results relate to the test problems only; they are not generalizable to other cases without re-running a similar experiment. The statistical model employed is a multiple regression model validated with an ANOVA test, using the F statistic to check the overall model significance. The explained variables are the following:
  • The CPU time to solve the instance using the B & B algorithm.
  • The number of nodes scanned by the B & B algorithm.
The explanatory variables studied were x i , i = 1 , 2 , 3 , 4 , 5 , 6 as follows:
  • λ_max ( Q ) is the maximum eigenvalue of the matrix Q = A^T D ( x_0 )^{−2} A ;
  • λ_min ( Q ) is the minimum eigenvalue of the matrix Q = A^T D ( x_0 )^{−2} A ;
  • μ_max is the maximum eigenvalue of the matrix A^T A ;
  • μ_min is the minimum eigenvalue of the matrix A^T A ;
  • h_max ( x_0 ) = max_i { b_i − α_i^T x_0 } is the maximum slack with respect to the center x_0 ;
  • h_min ( x_0 ) = min_i { b_i − α_i^T x_0 } is the minimum slack with respect to the center x_0 .
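In terms of the matrices defined in Section 2, the six predictors can be computed as follows (a minimal numpy sketch; the unit-square instance and its analytical center (0.5, 0.5) are assumed toy data, not one of the paper's instances):

```python
import numpy as np

# Assumed toy instance: unit square 0 <= x <= 1, analytical center x0.
A = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
b = np.array([1., 1., 0., 0.])
x0 = np.array([0.5, 0.5])

s = b - A @ x0                          # slacks b_i - alpha_i^T x0
Q = A.T @ np.diag(1.0 / s**2) @ A       # barrier Hessian at x0
lam = np.linalg.eigvalsh(Q)             # eigenvalues of Q
mu = np.linalg.eigvalsh(A.T @ A)        # eigenvalues of A^T A

x1, x2 = lam.max(), lam.min()           # lambda_max(Q), lambda_min(Q)
x3, x4 = mu.max(), mu.min()             # mu_max, mu_min
x5, x6 = s.max(), s.min()               # h_max(x0), h_min(x0)
```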
We constructed two multiple regression models, named Model 1 and Model 2. The first uses the number of nodes as the explained variable; the second uses the CPU time. The models are as follows:
Model 1: Number of nodes = β 0 + β 1 x 1 + β 2 x 2 + β 3 x 3 + β 4 x 4 + β 5 x 5 + β 6 x 6
Model 2: CPU time = β 0 + β 1 x 1 + β 2 x 2 + β 3 x 3 + β 4 x 4 + β 5 x 5 + β 6 x 6
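Given the predictors for a batch of instances, either model can be fitted by ordinary least squares. A minimal sketch (the data here are random placeholders, not the paper's measurements):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((30, 6))                 # rows: instances, cols: x1..x6
y = rng.random(30)                      # explained variable (e.g., nodes)

Xd = np.column_stack([np.ones(len(y)), X])      # prepend intercept beta_0
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # beta_0 .. beta_6
y_hat = Xd @ beta
rho = np.corrcoef(y, y_hat)[0, 1]               # multiple correlation R
```

In practice, the F-test of the ANOVA table (not shown here) decides whether the fitted model is statistically significant as a whole.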

3.1. Used Test Problems

3.1.1. The Set Covering Problem

This problem seeks a minimum-cost selection of sets that covers all elements at least once; [35,36] provide a formal definition. Let us look at this problem through an example. Suppose that we must cover a set of geographical areas in a city with antennas. Then, we define x_j as a binary variable, i.e., 1 if an antenna is located in area j and 0 otherwise. Installing an antenna in area j has the cost c_j ≥ 0 , with c ∈ R^n . There are n geographical areas; then, j = 1 , … , n . Formally, the set covering problem is expressed as follows:
min z = ∑_{j=1}^{n} c_j x_j
subject to
∑_{j : i ∈ M_j} x_j ≥ 1 , ∀ i = 1 , … , n
x_j = 1 if an antenna is in area j , and 0 otherwise
M_j : areas served by an antenna located in area j .
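A toy illustration of this formulation (hypothetical data of our own; brute-force enumeration rather than B & B , purely to make the covering constraint concrete):

```python
from itertools import product

# M[j]: areas served by an antenna located in area j; c[j]: its cost.
M = [{0, 1}, {1, 2}, {0, 2, 3}, {3}]
c = [3, 2, 4, 1]
areas = {0, 1, 2, 3}

best_cost, best_x = None, None
for x in product((0, 1), repeat=len(M)):      # all 0-1 assignments
    covered = set().union(*(M[j] for j in range(len(M)) if x[j]))
    if covered >= areas:                      # every area covered >= once
        cost = sum(cj * xj for cj, xj in zip(c, x))
        if best_cost is None or cost < best_cost:
            best_cost, best_x = cost, x
```

Here the cheapest cover places antennas in areas 1 and 2 (serving {1, 2} and {0, 2, 3}) at a total cost of 6.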

3.1.2. The Set Packing Problem

Vazirani [36] formulated the maximum set packing integer linear program as follows:
max ∑_{S ∈ 𝒮} x_S
subject to
∑_{S : e ∈ S} x_S ≤ 1 , ∀ e ∈ U
x_S ∈ { 0 , 1 } , ∀ S ∈ 𝒮
where U is the universe set and 𝒮 is the given collection of subsets of U.

3.1.3. The Multi-Dimensional Knapsack Problem

All coefficients are non-negative. More precisely, we can assume, without loss of generality, c_j ≥ 0 , b_i ≥ 0 and ∑_{j=1}^{n} a_{ij} ≥ b_i , i ∈ M . Furthermore, any MKP with at least one parameter a_{ij} equal to 0 may be replaced by an equivalent MKP with positive parameters, i.e., both problems have the same feasible solutions [29].
z = max ∑_{j=1}^{n} c_j x_j
subject to
∑_{j=1}^{n} a_{ij} x_j ≤ b_i , i ∈ M = { 1 , … , m } ,
x_j ∈ { 0 , 1 } , j ∈ N = { 1 , … , n } ,
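On a tiny hypothetical MKP instance (two knapsack constraints, four items; again a brute-force sketch of our own, not the B & B used in the experiments), the feasibility test and objective above read as follows:

```python
from itertools import product

c = [10, 7, 5, 8]            # profits c_j
a = [[4, 3, 2, 5],           # a_1j: weight of item j in constraint 1
     [2, 4, 6, 1]]           # a_2j: weight of item j in constraint 2
b = [8, 8]                   # capacities b_i

best_z, best_x = 0, (0, 0, 0, 0)
for x in product((0, 1), repeat=len(c)):
    # check every knapsack constraint sum_j a_ij x_j <= b_i
    if all(sum(a[i][j] * x[j] for j in range(len(c))) <= b[i]
           for i in range(len(b))):
        z = sum(cj * xj for cj, xj in zip(c, x))
        if z > best_z:
            best_z, best_x = z, x
```

For this instance, the optimum packs items 1 and 2 (profits 10 and 7) for z = 17; any third item violates a capacity.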

3.1.4. The Multi-Demand Multi-Dimensional Knapsack Problem

This problem comes from the OR-Library and was documented by Beasley in 1990. There are nine data files: MDMKPC T1, …, MDMKPC T9. Each file has fifteen instances, for a total of 90 test problems, which are the test problems of [29]. The MDMKP problem to solve is
z = max ∑_{j=1}^{n} c_j x_j
subject to
∑_{j=1}^{n} a_{ij} x_j ≤ b_i , i = 1 , … , m
∑_{j=1}^{n} a_{ij} x_j ≥ b_i , i = m + 1 , m + 2 , … , m + q
x_j ∈ { 0 , 1 } , j ∈ N = { 1 , … , n } ,
MDMKP instances result from appropriately modifying the MKP instances, solved for each combination of cost type (either positive or mixed) and number q of demand constraints ( q = 1 , q = m / 2 , and q = m , respectively). There are K = 15 test problems and 6 cost coefficient vectors c_j , j = 1 , … , n . The first 3 correspond to the positive cost case for q = 1 , q = m / 2 , and q = m constraints, respectively; the last 3 correspond to the mixed cost case for the same values of q [29].
We took the test problems from two public libraries, MIPLIB and OR-Library. Table 1 shows problems from the MIPLIB library, including binary and MIP programs as well as other types. The considered problems are knapsack, set covering, set packing, and others. The test problems from the OR-Library correspond to the multi-dimensional knapsack problem and the multi-demand multi-dimensional knapsack problem.

4. Results

Table 1 shows the problems solved using the MIPLIB library. These were inequality-type constraint problems. They included set covering, set packing, multi-dimensional knapsack, multi-demand multi-dimensional knapsack, general integer, binary, and integer problems. It should be noted that some resolution times were lengthy.

4.1. MIPLIB Library Test Problem Results

Table 2 and Table 3 show the results of the nodes and CPU times obtained by solving the problems optimally with different threads for instances of the MIPLIB library. The most time-consuming problem was mas74, which took 25,447 s to solve with two threads.
High CPU times coincide with high numbers of scanned nodes, which is a sign of the consistency of the results. However, the average CPU time per node scanned is 0.017 s, with a standard deviation of 0.0631. This gives a coefficient of variation of 3.7, which indicates that these results are highly dispersed. The average number of constraints of the instances is 5850.82 and the average number of variables is 1521.11, while the standard deviations are 8390.56 and 2711.22, respectively. This gives coefficients of variation of 1.43 and 1.78 for constraints and variables, respectively. Comparing these dispersions with the CPU time/node results, it can be concluded that the instances are less dispersed than the resolution results.
Table 4 shows the calculated predictor geometric values, which are λ m i n ( Q ) , λ m a x ( Q ) , μ m i n , μ m a x , h m i n ( x 0 ) , and h m a x ( x 0 ) . It is observed that there is a group of high values of λ m a x ( Q ) , μ m a x , and h m a x ( x 0 ) . However, no relationship is observed with the values of nodes visited, nor with CPU time.
The results of Table 4 were obtained through the construction of the ellipses explained in Section 2. It should be noted that the main difficulty in these calculations lies in computing the analytical center (Expression 7). This requires solving a nonlinear problem, which was done using Newton's method, and it is not very efficient. Only those problems in which the calculation of these values took less than an hour were included.
Table 4 shows very high h_max ( x_0 ) values, which coincide in instances 21 and 22 with low numbers of constraints and variables.
Table 5 shows the results of MIPLIB library Model 1 (nodes) for resolutions with different threads. Comparing the experiments with two, four, and eight threads, the correlation values were similar. Looking at the fitted values of the explanatory variables, the x_1 , x_3 , and x_4 coefficients were similar across all experiments. Table 5 shows that, for the MIPLIB instances, the regressions with two, four, and eight threads have F-test values that are statistically significant at the 95% confidence level, while the experiment with 12 threads is significant only at the 90% level. All threads show a good fit, with correlation coefficients ranging from a maximum of 0.611 to a minimum of 0.541. The variables λ_min ( Q ) and h_max ( x_0 ) are related to the polyhedron's minimum ellipsoidal width through Proposition 4. The λ_min ( Q ) coefficients are similar for all threads, as are the coefficients of the explanatory variable x_5 = h_max ( x_0 ) . The regression coefficients show negative values for almost all variables, except for x_3 = μ_max , which has positive coefficients for all threads. The negative coefficients show an inverse relationship with the number of B & B nodes; for example, the higher the λ_min ( Q ) value, the smaller the number of nodes generated, and vice versa. Additionally, the correlation coefficients for the different threads differ only slightly: the highest value is 0.611 and the lowest is 0.541. We must note that this last value comes from the regression with 12 threads, whose F-test value of F = 1.939 indicates that this experiment is not statistically significant at the 95% level.
Figure 2 shows the CPU time results for different MIPLIB library problems. We used the Cplex software with 2, 4, 8 and 12 threads. The two-threaded resolution offered the lowest CPU time. Figure 2 data are shown in Table 2. Figure 3 shows the Cplex software results of the nodes scanned for the MIPLIB library problems with 2, 4, 8, and 12 threads. The two-thread resolution offered the fewest visited nodes in most cases. Figure 3 data are shown in Table 3. In both figures, the great dispersion of values can be observed between the different problems solved. It can also be seen that the values that give high numbers correspond to the same resolved instances and that the different threads show similar results.
Table 6 shows the results of MIPLIB library Model 2 (CPU times) for resolutions with different threads. For the experiments with two, four, and eight threads, the correlation values were similar, while the runs with 12 threads showed a value of R = 0.38. The regressions with two, four, and eight threads had F-test values demonstrating statistical significance at the 95% confidence level. Correlation coefficients ranged from a maximum of 0.63 to a minimum of 0.38. Looking at the fitted values of the explanatory variables, the x_1 , x_3 , and x_4 coefficients are similar across all experiments, and the λ_min ( Q ) coefficients are similar for all threads. The regression coefficients show negative values for all variables, indicating an inverse relationship with the CPU time; for example, the higher the λ_min ( Q ) value, the smaller the CPU time, and vice versa. Additionally, the correlations for the different threads differ only slightly: the highest value was 0.63 and the lowest was 0.38. We must note that this last value comes from the regression with 12 threads, whose F-test value of F = 0.804 indicates that that experiment was not statistically significant.
The multiple linear regression model for the best coefficient F according to the data in Table 5 and Table 6 is as follows.
Model 1: Number of nodes = 244,916.2219 − 9.986 × 10^6 x_1 − 18.197 x_2 + 1.820 × 10^6 x_3 − 8.746 × 10^7 x_4 − 10,190.00 x_5 + 626,257 x_6
Model 2: CPU time = 220,408 − 4.445 × 10^8 x_1 − 0.08 x_2 − 4.851 × 10^10 x_3 − 5.067 × 10^9 x_4 − 23.83 x_5 − 2828.07 x_6

4.2. OR-Library Problems with MDMKP Problem Results

We solved a set of 30 problems divided into two sets of 15 problems each, named Ct1 and Ct2, respectively. Table 7 shows the number of nodes for each instance of the sets Ct1 and Ct2 for different threads. Table 8 shows the CPU times in seconds for each instance of the sets Ct1 and Ct2 for different threads.
We estimated Models 1 and 2 with the 30 results obtained from the instances of Ct1 and Ct2. The regression results of Model 1 are shown in Table 9, and those of Model 2 in Table 10. The first notable result is that the regression coefficients are higher than 0.86 for Model 1 and 0.57 for Model 2, a medium-to-high correlation. It can be observed that in Model 1 the correlation coefficients are high and the F-test shows critical values lower than 1% for all threads, which indicates that the experiment is statistically significant in every case. Regarding Model 2, the experiments with two and four threads show critical values lower than 1%, while the experiments resolved with 8 and 12 threads present critical values higher than 5%, which makes them less reliable. This is curious, since one would expect the estimate to be more reliable with more threads. Finally, the most reliable model for estimating the complexity of solving an integer programming model is Model 1, since its explained variable is the number of nodes visited by the B & B , while Model 2 uses the CPU time, which depends on the computer used.
The multiple linear regression model for the best coefficient F according to the data in Table 9 and Table 10 is as follows.
Model 1: Number of nodes = −3.5531 × 10^14 − 8516.669 x_1 − 1,099,153.01 x_2 + 0.0444 x_3 + 1.776 x_4 − 8197.559 x_5 − 200,762.974 x_6 .
Model 2: CPU time = −64,363,757,062 − 1.901 x_1 + 269.798 x_2 + 9.826 × 10^6 x_3 + 32,181,877,396 x_4 − 2.0594 x_5 + 411.355 x_6 .

4.3. Estimated Multiple Linear Regression Model Validation

To confirm the developed models, we first calculated the determination coefficient values corresponding to the correlation coefficient square R 2 . Second, we performed an F-test value analysis of variance and obtained the corresponding critical value
R 2 = cov ( y , y 1 ) / ( sd ( y ) sd ( y 1 ) )
where y is the observed value and y 1 is the value estimated by the regression model. The typically used multiple correlation coefficient is ρ = √ R 2 .
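As a concrete illustration, the following sketch computes R 2 and ρ exactly as defined above, using the sample covariance and sample standard deviations; the y and y1 vectors are illustrative data, not values from the paper.

```python
# Sketch of the goodness-of-fit computation in Section 4.3, following the
# paper's convention: R^2 = cov(y, y1) / (sd(y) * sd(y1)), where y holds the
# observed values and y1 the values estimated by the regression, and the
# multiple correlation coefficient is rho = sqrt(R^2).
import math
import statistics

def r_squared(y, y1):
    n = len(y)
    my, m1 = statistics.fmean(y), statistics.fmean(y1)
    cov = sum((a - my) * (b - m1) for a, b in zip(y, y1)) / (n - 1)  # sample covariance
    return cov / (statistics.stdev(y) * statistics.stdev(y1))

y  = [1.0, 2.0, 3.0, 4.0]   # observed values (illustrative)
y1 = [1.1, 1.9, 3.2, 3.8]   # fitted values (illustrative)
R2 = r_squared(y, y1)
rho = math.sqrt(R2)
```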
Table 11 summarizes the implemented regression models; each case shows the regression coefficient and its corresponding F-test value. Only one model has a multiple correlation coefficient below 0.5: Model 2 (CPU time) solved with 12 threads, whose F-test value is accordingly low, with a high probability of a type I error. All the remaining models show coefficients above 0.5, and nine of the values exceed 0.6. Therefore, we conclude that the explanatory variables used are adequate to explain the B & B algorithm's number of nodes and CPU time.

4.4. Reliability and Generality Level

To estimate the experiments' reliability and show their generality level, we conducted a reliability analysis, summarized in Table 12. We considered reliability in terms of two values: the multiple linear correlation coefficient Rho and the F-statistic. The former measures the quality of the estimate determined by the explanatory variables x 1 to x 6 ; the latter measures the reliability of the performed experiments. The two measures complement each other, as the experiments must be reliable and the estimation must have a high correlation. The first block of Table 12 shows the 95% and 90% confidence intervals of the Rho coefficient and the ANOVA F-value for the MIPLIB public library problem instances. The Rho and F values are random variables in an experimental sense, as they result from the conducted experiments. We applied multiple linear regressions between the explained variable nodes and the explanatory variables x 1 to x 6 , and between the explained variable CPU time and the same explanatory variables.
We ran a total of 34 instances of the MIPLIB library corresponding to the set covering problem and problems with similar structures, and 30 instances of the OR-Library solving the MDMKP problem. We solved each set of 34 and 30 instances with different numbers of operating system threads. Each thread count generated one result, and these constituted the sample, whose size was 4. As usual, we assumed that the variables Rho and F followed a normal distribution with unknown mean and variance. Thus, we used Student's t-distribution to find the critical values needed to construct confidence intervals for the means of both variables from the MIPLIB and OR-Library results. The results in Table 12 show that the average value of the Rho correlation coefficient for the explained variable nodes was 0.576 for MIPLIB instances and 0.776 for MDMKP instances. Both values show that the set of explanatory variables has a good capacity to estimate the number of nodes visited by the B & B algorithm.
The 95% confidence interval for Rho in the MIPLIB instances, when the explained variable is the number of nodes, has a width of 16.8% with respect to the mean; with 95% probability, the Rho value lies between 0.528 and 0.625. For MDMKP instances with the same explained variable, the confidence interval width is 16.6% of the mean, and with 95% probability, Rho lies between 0.712 and 0.841. These values show that the explanatory variables x 1 to x 6 are good predictors of the number of nodes visited by the B & B .
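The interval construction described above can be reproduced with a short calculation. The sketch below uses the four per-thread Rho values for MIPLIB with nodes as the explained variable (0.611, 0.602, 0.553, 0.541, as in Table 12) and a two-sided 95% Student's t critical value for df = n − 1 = 3 taken from a standard table; the slightly narrower interval reported in Table 12 suggests a different degrees-of-freedom convention was used there.

```python
# Sketch of the Student's t confidence interval for the mean Rho
# (MIPLIB, explained variable = nodes, values from Table 12).
import math
import statistics

rho_values = [0.611, 0.602, 0.553, 0.541]   # one value per thread count
n = len(rho_values)
mean = statistics.fmean(rho_values)          # ~0.577
sd = statistics.stdev(rho_values)            # ~0.0348 (sample sd, n - 1)
t_crit = 3.182                               # t(0.975, df = 3), from a t-table
half = t_crit * sd / math.sqrt(n)            # half-width of the interval
ci = (mean - half, mean + half)
```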
The average value of the Rho correlation coefficient, when the explained variable was the CPU time, was 0.542 for MIPLIB instances and 0.639 for MDMKP instances. Both values show a good capability to estimate the CPU time. The 95% confidence interval for this variable has a width of 23.1% with respect to the mean for MIPLIB instances, and of 14.9% for MDMKP instances. The 90% confidence interval has a width of 12.8% with respect to the mean; it is narrower than the previous one, at a lower confidence level. Table 11 shows that the number of visited nodes is the explained variable with the better estimation capacity, which supports the use of the B&B node tree as a measure of computational effort.
Regarding the F-statistic, Table 12 shows that its variability is low when the explained variable is the number of nodes: the coefficient of variation is 0.17 for MIPLIB instances and 0.08 for MDMKP instances. When the explained variable is the CPU time, the coefficients of variation are 0.44 for MIPLIB and 0.54 for MDMKP instances. This analysis confirms that the estimation of the number of nodes, using the variables x 1 to x 6 , is highly reliable, while the CPU time estimation with the same variables is moderately reliable.
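The coefficient of variation used in this comparison is simply the sample standard deviation divided by the mean. A sketch with the four per-thread F values for MIPLIB with nodes as the explained variable (from Table 12):

```python
# Sketch: coefficient of variation (CV) of the F-statistic across thread
# counts, for MIPLIB with nodes as the explained variable (Table 12).
# Small CV values indicate low variability across thread counts.
import statistics

f_values = [2.78, 2.66, 2.06, 1.93]
cv = statistics.stdev(f_values) / statistics.fmean(f_values)
```

This yields roughly 0.18, consistent with the reported 0.17 up to rounding of the tabulated values.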

5. Discussion

We compared this work's results with other researchers' findings. The only comparable published result we found is that of Hendel et al., published in 2021 [19]. There is a substantive difference from our work: Hendel et al. presented estimation methods that draw on the results of the B & B algorithm's execution, whereas our estimators are applicable before the B & B is executed. Hendel et al.'s methods implemented four predictors of the tree size in the SCIP integer linear programming software [20]. These predictors estimate, during the B & B algorithm's execution, the gap between the current number of nodes and the unknown final tree size. The prediction uses a single explanatory variable, the number of leaves of the tree, where a leaf is an end node that no longer branches. The estimation methods include the tree weight, leaf frequency, Weighted Backtrack Estimator (WBE), and Sum of Subtree Gaps (SSG). Each of them uses a time series with double exponential smoothing (DES), with a level value and a trend value, and the software feeds data to the models during the B & B execution. Hendel et al. [19] applied this to the danoint instance of the MIPLIB 2017 library. The results showed that the methods were unsuccessful until the execution was partially completed; after this point the estimation improved, with good results after 80% of the execution. The prediction methods improve with greater data availability.
Our linear regression method uses geometric variables to estimate the tree size and the CPU time, and is therefore comparable to Hendel et al.'s estimates at early iterations. Our method has roughly 60% reliability, as given by the coefficient of determination. This percentage is higher than that of the methods in [19] for estimates made up to 66% of the algorithm's execution. In addition, computing the explanatory variables of our method is mathematically simple: it requires the eigenvalues of the matrix Q and other low-complexity calculations. Therefore, these complexity measures can be embedded into available software to predict the resolution time a priori. This topic has great practical importance for solver efficiency; however, few researchers have examined the area, the volume of scientific production is restricted, and we found no more than ten publications, most of them outdated.
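To illustrate why the eigenvalue computation is cheap: the dominant eigenvalue λmax(Q) can be approximated by power iteration, which needs only matrix-vector products. The sketch below uses a small illustrative symmetric matrix, not one of the Dikin matrices from the paper.

```python
# Sketch: estimating the dominant eigenvalue lambda_max(Q) by power
# iteration. Only matrix-vector products are required, which supports the
# claim that the explanatory variables are inexpensive to compute.
def power_iteration(Q, iters=200):
    n = len(Q)
    v = [1.0] * n                 # starting vector
    lam = 0.0
    for _ in range(iters):
        # w = Q v
        w = [sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)   # infinity-norm scaling factor
        v = [x / lam for x in w]       # renormalize the iterate
    return lam

Q = [[2.0, 1.0], [1.0, 2.0]]          # eigenvalues 1 and 3 (illustrative)
```

For the matrix above, the iteration converges to the dominant eigenvalue 3.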
Finally, this study has some limitations. The first is that the results are valid for the data obtained with the tested problems; this limited sample allows us to see a trend, but it is not generalizable to a larger context without the risk of extrapolation errors. Another limitation is that, for some problems, obtaining the analytic center presents computational complications, because it is computed via a nonlinear, Newton-type method. For many problems, our algorithm took longer than ten hours to deliver the analytic center. This ten-hour limitation matters because this study provides problem complexity indicators, and if the analytic center calculation takes that long, it is no longer feasible to use it for these purposes.
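For intuition about the Newton-type computation behind the analytic center, here is the one-dimensional analogue: the analytic center of the strip 0 ≤ x ≤ 1 maximizes log(x) + log(1 − x), whose maximizer is x = 0.5. The n-dimensional version has the same structure but requires solving a linear system at every Newton step, which is where the long running times reported above come from.

```python
# Sketch: Newton's method for the 1-D analytic center of {x : 0 <= x <= 1},
# i.e., the maximizer of f(x) = log(x) + log(1 - x). The exact answer is 0.5.
def analytic_center_1d(x=0.9, iters=50):
    for _ in range(iters):
        g = 1.0 / x - 1.0 / (1.0 - x)              # gradient f'(x)
        h = -1.0 / x**2 - 1.0 / (1.0 - x) ** 2     # Hessian f''(x), negative
        x = x - g / h                              # Newton step
    return x
```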

6. Conclusions

In this work, we investigated integer programming based on the flatness theorem and on conditioning in integer programming; it was both a theoretical and an applied work. We developed geometric measures and then implemented and tested them as predictors of the B & B tree. Within the integer programming context, these measurements estimate the CPU time and the number of nodes visited by the B & B algorithm, based on the concept of conditioning in integer programming. The results showed high values of the multiple correlation coefficients. The explanatory variables come from the dimensions proposed for the width of the relaxed polyhedron's ellipsoid, constructed from the problem's constraints; they correspond to expressions associated with a Dikin ellipse matrix, centered at the analytic center, that replicates the shape of the polyhedron. One limitation of this work is the analytic center calculation, since solving the underlying nonlinear problem requires a large amount of CPU time; in some problems, it exceeded the ten-hour limit. The calculation of the center of the polyhedron is typical of interior point methods for linear programming, such as the Karmarkar algorithm and the ellipsoidal method, which use nonlinear analysis and programming techniques. However, this is a bottleneck when we wish to obtain B & B effort estimation measures that must be computed quickly. Thus, one line of future work is to study how to speed up the calculation of these indices so that they can be incorporated into linear optimization software. To achieve this, other centers of the polyhedron can be explored, such as the center produced by the central path method. This center can be used directly as a feasible center of the Dikin ellipse, or, under certain conditions, to approximate the analytic center; its computation then no longer requires solving a nonlinear problem, but only a classic simplex.

Author Contributions

Conceptualization, I.D.; methodology, I.D. and J.V.; software, J.V.; validation, I.D. and J.V.; formal analysis, M.L.; investigation, I.D. and J.V.; writing—original draft preparation, I.D.; writing—review and editing, M.L.; visualization, M.L.; supervision, I.D.; project administration, M.L.; funding acquisition, I.D. and M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by DICYT-USACH, Grant No. 062117DC.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors gratefully acknowledge the support of the University of Santiago, Chile, and the Center of Operations Management and Operations Research CIGOMM.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Karp, R.M. Reducibility among combinatorial problems. In Complexity of Computer Computations; Springer: Berlin/Heidelberg, Germany, 1972; pp. 85–103.
  2. Skiena, S. The Algorithm Design Manual; Springer: New York, NY, USA, 1997; pp. 32–58.
  3. Crescenzi, P.; Kann, V.; Halldórsson, M.; Karpinski, M.; Woeginger, G. A compendium of NP optimization problems. Available online: http://www.nada.kth.se/~viggo/problemlist/compendium.html (accessed on 12 June 2023).
  4. Garey, M.; Johnson, D. Computers and Intractability: A Guide to the Theory of NP-Completeness; W.H. Freeman: New York, NY, USA, 1979.
  5. Fréville, A. The multidimensional 0–1 knapsack problem: An overview. Eur. J. Oper. Res. 2004, 155, 1–21.
  6. Derpich, I.; Herrera, C.; Sepulveda, F.; Ubilla, H. Complexity indices for the multidimensional knapsack problem. Cent. Eur. J. Oper. Res. 2021, 29, 589–609.
  7. Knuth, D. Estimating the efficiency of backtrack programs. Math. Comput. 1975, 29, 122–136.
  8. Purdom, P.W. Tree size by partial backtracking. SIAM J. Comput. 1978, 7, 481–491.
  9. Chen, P.C. Heuristic sampling: A method for predicting the performance of tree searching programs. SIAM J. Comput. 1992, 21, 295–315.
  10. Belov, G.; Esler, S.; Fernando, D.; Le Bodic, P.; Nemhauser, G.L. Estimating the size of search trees by sampling with domain knowledge. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Melbourne, Australia, 19–25 August 2017; pp. 473–479.
  11. Le Bodic, P.; Nemhauser, G.L. An abstract model for branching and its application to mixed integer programming. Math. Program. 2015, 166, 369–405.
  12. Lelis, L.H.; Otten, L.; Dechter, R. Predicting the size of depth-first branch and bound search trees. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, Beijing, China, 3–9 August 2013; pp. 594–600.
  13. Ozaltın, Y.; Hunsaker, B.; Schaefer, A.J. Predicting the solution time of branch-and-bound algorithms for mixed-integer programs. INFORMS J. Comput. 2011, 23, 392–403.
  14. Alvarez, M.; Louveaux, Q.; Wehenkel, L. A supervised machine learning approach to variable branching in branch-and-bound. Technical Report, Université de Liège, 2014. Available online: https://orbi.uliege.be/handle/2268/167559 (accessed on 12 June 2023).
  15. Benda, F.; Braune, R.; Doerner, K.F.; Hartl, R.F. A machine learning approach for flow shop scheduling problems with alternative resources, sequence-dependent setup times, and blocking. OR Spectr. 2019, 41, 871–893.
  16. Lin, J.C.; Zhu, J.L.; Wang, H.G.; Zhang, T. Learning to branch with tree-aware branching transformers. Knowl.-Based Syst. 2022, 252, 109455.
  17. Kilby, P.; Slaney, J.; Thiebaux, S.; Walsh, T. Estimating search tree size. In Proceedings of the Twenty-First National Conference on Artificial Intelligence and the Eighteenth Innovative Applications of Artificial Intelligence Conference, Boston, MA, USA, 16–20 July 2006; pp. 1–7.
  18. Fischetti, M.; Monaci, M. Exploiting erraticism in search. Oper. Res. 2014, 62, 114–122.
  19. Hendel, G.; Anderson, D.; Le Bodic, P.; Pfetsch, M.E. Estimating the size of branch-and-bound trees. INFORMS J. Comput. 2021, 34, 934–952.
  20. Bestuzheva, K.; Besançon, M.; Chen, W.-K.; Chmiela, A.; Donkiewicz, T.; van Doornmalen, J.; Eifler, L.; Gaul, O.; Gamrath, G.; Gleixner, A.; et al. The SCIP Optimization Suite 8.0. 2021. Available online: https://optimization-online.org/2021/12/8728/ (accessed on 12 June 2023).
  21. Renegar, J.; Belloni, A.; Freund, R.M. A geometric analysis of Renegar’s condition number, and its interplay with conic curvature. Math. Program. 2007, 119, 95–107.
  22. Vera, J. On the complexity of linear programming under finite precision arithmetic. Math. Program. 1998, 80, 91–123.
  23. Cai, Z.; Freund, R.M. On two measures of problem instance complexity and their correlation with the performance of SeDuMi on second-order cone problems. Comput. Optim. Appl. 2006, 34, 299–319.
  24. Vera, J.; Derpich, I. Incorporating condition measures in the context of combinatorial optimization. SIAM J. Optim. 2006, 16, 965–985.
  25. Lenstra, H.W., Jr. Integer programming with a fixed number of variables. Math. Oper. Res. 1983, 8, 538–548.
  26. Koch, T.; Achterberg, T.; Andersen, E.; Bastert, O.; Berthold, T.; Bixby, R.E.; Danna, E.; Gamrath, G.; Gleixner, A.M.; Heinz, S.; et al. MIPLIB 2010: Mixed integer programming library version 5. Math. Program. Comput. 2011, 3, 103–163.
  27. Gleixner, A.; Hendel, G.; Gamrath, G.; Achterberg, T.; Bastubbe, M.; Berthold, T.; Christophel, P.; Jarck, K.; Koch, T.; Linderoth, J.; et al. MIPLIB 2017: Data-driven compilation of the 6th mixed-integer programming library. Math. Program. Comput. 2021, 13, 443–490.
  28. Beasley, J.E. OR-Library: Distributing test problems by electronic mail. J. Oper. Res. Soc. 1990, 41, 1069–1072.
  29. Khintchine, A. A quantitative formulation of Kronecker’s theory of approximation. Izv. Ross. Akad. Nauk Seriya Mat. 1948, 12, 113–122. (In Russian)
  30. Freund, R.M.; Vera, J.R. Some characterizations and properties of the “distance to ill-posedness” and the condition measure of a conic linear system. Math. Program. 1999, 86, 225–260.
  31. John, F. Extremum problems with inequalities as subsidiary conditions. In Studies and Essays; Interscience: New York, NY, USA, 1948; pp. 187–204.
  32. Schrijver, A. Chapter 14: The ellipsoid method for polyhedra more generally. In Theory of Linear and Integer Programming; Wiley-Interscience Series; John Wiley & Sons: Hoboken, NJ, USA, 1986; pp. 172–189.
  33. Nesterov, Y.; Nemirovsky, A. Acceleration and parallelization of the path-following interior point method for a linearly constrained convex quadratic problem. SIAM J. Optim. 1991, 1, 548–564.
  34. Elhedhli, S.; Naoum-Sawaya, J. Improved branching disjunctions for branch-and-bound: An analytic center approach. Eur. J. Oper. Res. 2015, 247, 37–45.
  35. Micciancio, D. The shortest vector in a lattice is hard to approximate to within some constant. SIAM J. Comput. 2001, 30, 2008–2035.
  36. Vazirani, V. Approximation Algorithms; Springer-Verlag: Berlin, Germany, 2001; ISBN 3-540-65367-8.
Figure 1. Ellipsoidal rounding using a pair of Dikin ellipses.
Figure 2. CPU time for MIPLIB library problems.
Figure 3. Nodes scanned for MIPLIB library problems.
Table 1. List of problems studied from MIPLIB library.
Number | Instance | Constraints | Variables | Nonzeroes | Integers | Binaries | Constraint Classification | MIPLIB Version
1opm2-z7-s231,798202379,76202023Knapsack2010
2mine 90-106270900154070900Knapsack2010
3mine 166-58429830194120830Knapsack2010
4opm2-z6-s115,533135041,84401350Knapsack2017
5opm2-z7-s831,798202379,75602023Knapsack2017
6reblock67252367074950670Knapsack2010
7m100n500k4r10050020000500Set covering2010
8iis-100-0-cov383110022,9860100Set covering2010
9iis-pima-cobv720176871,9410768Set covering2010
10iis-glass-cov537521463,9180214Set covering2017
11iis-hc-cov9727297142,9710297Set covering2010
12glass-sc611921463,9180214Set covering2017
13iis-bupa-cov480334538,3920345Set covering2017
14reblock16617,024166039,44201660knapsack2010
15macrophage31642260949202260Mip2010
16mik-250-20-75-1195270927017575Mip2017
17mik-250-20-75-2195270927017575Mip2017
18mik-250-20-75-3195270927017575Mip2017
19mik-250-20-75-4195270927017575Mip2017
20toll-like4408288313,22402883Knapsack2017
21mas761215116400150Knapsack2017
22mas741315117060150Set covering2017
23cod1051024102457,34401024Knapsack2017
24reblock1154735115013,72401150Mip2017
25neos563632016053Mip2017
26pg5_34225260077000100Mip2017
27gen-ip03646291303290Mip2017
28mik-250-20-75-51952709270175750Mip2017
29rmine6842983019,4120830Knapsack2017
30mik-250-1-100.1195251-150100Set covering2017
31sp98ic82510,894316,317010,894Mip2017
32neos1320,8521827253,84201815Set covering2017
33sp7ic103312,497316,629012,497Set covering2017
34cv08r139-9423981864645601864Set covering2017
Table 2. Nodes explored vs. different threads of Cplex (nodes) of MIPLIB library.
ProblemCplex 2 Thread
Nodes 2
Cplex 4 Thread
Nodes 4
Cplex 8 Thread
Nodes 8
Cplex 12 Thread
Nodes 12
1opm2-z7-s1941112016722178
2mine 90-1028,15726,95729,06876,974
3mine 166-583710511142449
4opm2-z6-s774914181046890
5opm2-z7-s83382313029333881
6reblock67107,99491,05379,944125,561
7m100n500k4r1152,66546,19069,23285,296
8iis-100-0-cov223,353217,678148,831148,153
9iis-pima-cov22,17231,78042,21820,296
10iis-glass-cov79,065142,79680,25067,054
11iis-hc-cov160,515149,142134,323177,704
12glass-sc499,757507,667561,998501,703
13iis-bupa-cov353,578377,314295,739382,852
14reblock16670,24880,19248,87072,225
15macrophage101504976
16mik-250-20-75-131,40815,16010,35011,222
17mik-250-20-75-24185433664627590
18mik-250-20-75-3557012,73414,60418,770
19mik-250-20-75-4145,57556,77785,05659,346
20toll-like25,252114,548290,787149,336
21mas76180,932232,596327,2312,440,166
22mas743,717,7953,296,0233,167,1092,440,166
23cod 10583494747
24reblock 1551,418,0571,518,8102,315,2101,544,191
25neos 5288,450306,724167,241932,326
26pg5 _ 342534173838914182
27mik-250-20-75-5603014,24215,5729705
28gen-ip0361,668,1031,715,8371,646,3202,105,963
29mik-250-1-100.149,76369,32930,17236,646
30rmine 6137843150,624187,319223,924
31sp98ic27,73947,29146,40146,401
32neos 136221392611,1999622
33sp97ic823,181580,5001,001,8821,001,882
34cvs08r139-94374,311200,833231,599248,878
Table 3. CPU time explored vs. different threads of Cplex (seconds) of MIPLIB library.
ProblemCplex 2 Thread
CPU Time 2
Cplex 4 Thread
CPU Time 4
Cplex 8 Thread
CPU Time 8
Cplex 12 Thread
CPU Time 12
1opm2-z7-s142426888
2mine 90-1068302561
3mine 166-53225
4opm2-z6-s711121613
5opm2-z7-s894536578
6reblock67156674880
7m100n500k4r168121118
8iis-100-0-cov1329444305327
9iis-pima-cov420222399236
10iis-glass-cov1413675974660
11iis-hc-cov3369187918372226
12glass-sc5758598312,61318,354
13iis-bupa-cov38073202535414,624
14reblock1661759958101
15macrophage3359
16mik-250-20-75-17226
17mik-250-20-75-22226
18mik-250-20-75-32216
19mik-250-20-75-4256710
20toll-like6624393922749
21mas7628223737
22mas7425,447839519,80119,142
23cod 10524182327
24reblock 15513,023321965753647
25neos 545181068
26pg5 _ 34126813
27mik-250-20-75-52226
28gen-ip036213216189430
29mik-250-1-100.19738
30rmine 6315166175223
31sp98ic280121126132
32neos 1376769295
33sp97ic11,082209041327607
34cvs08r139-942947664618781
Table 4. Calculated predictor geometric values from MIPLIB library.
NumberInstance λ max ( Q ) λ min ( Q ) μ max μ min h max ( x 0 ) h min ( x 0 )
1opm2-z7-s23,813,267.004113.60370,946,050,355.7728.629393629.46040.00159
2mine 90-102,612,605.48957.883295,100,968,602.4226.70572132,272.50310.000756306
3mine 166-53,758,044.72992.091141,842,115,832.7946.78679293,878.69230.000567547
4opm2-z6-s11,775,288.42988.69447,211,835,995.6166.498742867.09930.002310646
5opm2-z7-s85,697,695.777113.29470,896,040,857.2448.611211200.28890.001615675
6reblock671,051,736.61826.3805,573,004,555.5612.7688310,809.50030.001093823
7m100n500k4r10,548.397253.66786.7091.999990.97850.021469971
8iis-100-0-cov55,742.52410.1221947.4456.858974.94430.007242645
9iis-pima-cobv24,068.2848.0005569.0042.000008.89930.006447786
10iis-glass-cov25,658.2198.1086918.3113.874449.88360.006246341
11iis-hc-cov50,325.5978.00021,423.8502.0000013.89570.004458005
12glass-sc31,716.7508.0957730.6763.867989.89530.00561613
13iis-bupa-cov16,840.5758.0003007.7812.000006.90480.007710287
14reblock16619,423,492.77693.083150,648,772,421.0796.78679147,052.37460.000250464
15macrophage148.269834380.9022.192820.75000.24999999
16mik-250-20-75-11096.5000.0027,346,325,551.7372.000004020.93880.039778441
17mik-250-20-75-21081.6350.0027,205,424,894.9602.000004022.47880.039345742
18mik-250-20-75-31048.3190.0027,023,485,477.6382.000003987.90960.03950111
19mik-250-20-75-41097.6680.0027,352,045,046.7212.000003914.21600.0394924
20toll-like262.4488.651145.1282.366270.75000.2499990
21mas7612,157.6915.96 × 10 17 33,588,731,506.8252.00000923,076,932,681.4550.099256344
22mas746215.0151.7496 × 10 15 29,517,476,806.1721.99999928,571,434,175.4720.132292728
23cod10525,090.54512,658.7883138.0002.000000.99110.008888355
24reblock1153,002,050.10219.4522,736,204,400.7532.0006111,185.85400.000635524
25neos517.16313.1421026.00018.0000019.48340.296536577
26pg5-341585.1088.0003,389,781.0361.99999153.80320.143846007
27gen-ip0361047.1460.0027,026,695,879.0812.000003866.83730.03958145
28mik-250-20-75-58.9990.00211,172.4502.37624282.11300.344303931
29rmine68.0022.0829 × 10 6 4,983,686,994.1011.9999999,999.38360.496593313
30mik-250-1-100.1160,625.52659.9174,144,778.2023.024451144.74640.004304036
31sp98ic15,501,44714.7172,405,591.6841.999992135.09390.039243544
32neos1321,435,625,658.174213.20239,262,984.2401.9999925.15481.10633E-05
33sp7ic17,032.28114.4173,028,596.2271.999993240.26130.037956414
34cv08r139-9430,771.085200.813143.1271.9999914.92290.016345049
Table 5. Model 1 (node) results for resolutions with different threads in the MIPLIB library.
Regression Statistics2 Thread Nodes4 Thread Nodes8 Thread Nodes12 Thread Nodes
Multiple correlation coefficient0.6110.6020.5530.541
test F2.7832.6622.0631.939
Remarks34343434
Variable X 1  =  λ m a x ( Q ) −9.986 × 10 6 −9.141 × 10 6 −1.240 × 10 5 −8.797 × 10 6
Variable X 2  =  λ m i n ( Q ) −18.197−16.941−23.072−17.103
Variable X 3  =  μ m a x 1.820 × 10 6 1.625 × 10 6 1.553 × 10 6 1.191 × 10 6
Variable X 4  =  μ m i n −8.746 × 10 7 −7.899 × 10 7 −9.315 × 10 7 −1.329 × 10 6
Variable X 5  =  h m a x ( x 0 ) −10,190.000−8273.803−20,718.45513,965.216
Variable X 6  =  h m i n ( x 0 ) 626,257.256730,807.629517,075.1771,069,004.67
Table 6. Model 2 (CPU time) results for resolutions with different threads in MIPLIB library.
Regression Statistics2 Thread Time4 Thread Time8 Thread Time12 Thread Time
Multiple correlation coefficient0.580.630.5790.38
test F2.4743.1302.3550.804
Remarks34343434
Variable X 1  =  λ m a x ( Q ) −1.961 × 10 7 −4.445 × 10 8 −8.248 × 10 8 −1.159 × 10 7
Variable X 2  =  λ m i n ( Q ) −0.34−0.08−0.15−0.20
Variable X 3  =  μ m a x −1.913 × 10 9 −4.851 × 10 10 −1.137 × 10 9 −1.357 × 10 9
Variable X 4  =  μ m i n −1.837 × 10 8 −5.067 × 10 9 −9.012 × 10 9 −1.318 × 10 8
Variable X 5  =  h m a x ( x 0 ) −160.39−23.83−59.59−56.69
Variable X 6 = h m i n ( x 0 ) −11,127.64−2828.07−3790.14−7251.32
Table 7. Nodes vs. different Cplex threads of the multi-demand multi-dimensional knapsack problem (MDMKP OR-Library) set Ct1 and set Ct2 (nodes).
SetProblemCplex 2 ThreadCplex 4 ThreadCplex 8 ThreadCplex 12 Thread
Nodes 2Nodes 4Nodes 8Nodes 12
Ct1p146233144
Ct1p2138723
Ct1p323141427
Ct1p46325
Ct1p535141620
Ct1p6148516283
Ct1p7168917
Ct1p881212444
Ct1p945252541
Ct1p107547
Ct1p116559
Ct1p1264510
Ct1p131491218
Ct1p141076225
Ct1p1519131520
Ct2p121651315754656
Ct2p2436226182243
Ct2p335202311
Ct2p424610782104
Ct2p5976461427432
Ct2p65023219625455700
Ct2p71368639541440
Ct2p83730146717981414
Ct2p960012914922920,732
Ct2p10264182716601522
Ct2p111137559453476
Ct2p12407120133145
Ct2p1333181727
Ct2p141130415411445
Ct2p15348331133851645
Table 8. CPU times vs. different Cplex threads of the multi-demand multi-dimensional knapsack problem (MDMKP OR-Library) sets Ct1 and Ct2 (CPU time in seconds).
ProblemCplex 2 ThreadCplex 4 ThreadCplex 8 ThreadCplex 12 Thread
CPU Time 2CPU Time 4CPU Time 8CPU Time 12
Ct1p1294,370370,529354,355391,377
Ct1p282,096105,68983,213141,773
Ct1p3150,846166,753155,003223,589
Ct1p423,42428,7881,385,71128,982
Ct1p5177,854190,111176,672171,314
Ct1p61,060,425901,9511,096,551931,373
Ct1p7101,832101,978109,418128,464
Ct1p8611,902317,520321,045509,262
Ct1p9362,827422,673380,613490,323
Ct1p1034,76149,79641,53841,412
Ct1p1137,64353,34658,84245,170
Ct1p1233,89140,98768,94544,145
Ct1p1390,914107,400131,613115,225
Ct1p1464,11774,44958,948125,946
Ct1p15124,365163,650183,902155,928
Ct2p112,090,44813,819,24411,527,54210,447,674
Ct2p22,909,3473,608,1013,045,4523,623,566
Ct2p32,681,96330,824423,06981,786
Ct2p41,610,7901,656,4641,385,71114,631,259
Ct2p56,097,8677,033,6266,156,8216,904,771
Ct2p618,649,80618,544,43618,656,81918,649,806
Ct2p76,323,0896,616,8056,210,1146,160,737
Ct2p816,071,91915,531,81316,578,61816,089,911
Ct2p925,154,95323,455,34224,639,30228,994,168
Ct2p1013,818,28311,907,28715,849,30215,051,050
Ct2p116,560,3717,598,2827,379,0916,779,751
Ct2p121,855,1481,779,3102,112,0431,670,868
Ct2p13181,362218,802218,258223,567
Ct2p147,086,5637,259,4207,398,1447,571,377
Ct2p1513,782,66414,672,78413,406,2439,236,575
Table 9. Model 1 OR-Library problem results with MDMKP problems. Sets Ct1 and Ct2 (nodes).
ProblemModel 1Model 1Model 1Model 1
Nodes 2 ThreadNodes 4 TthreadNodes 8 ThreadNodes 12 Thread
Multiple correlation coefficient0.79050.78600.79960.8126
Remark30303030
Test F6.386.196.7907.45
Critical value of F0.00040.00050.00030.0001
Intercept−4.20141 × 10 14 −4.63514 × 10 14 −3.5531 × 10 14 −5.12549 × 10 14
Variable X 1  =  λ m a x ( Q ) −8365.365−8091.3780−8516.669−12,128.824
Variable X 2  =  λ m i n ( Q ) 4,476,702.8268,589,483.231−1,099,153.01−2,331,837.54
Variable X 3  =  μ m a x 0.04280.04150.04440.0530
Variable X 4  =  μ m i n 2.1007 × 10 14 2.31757 × 10 14 1.77655 × 10 14 2.563 × 10 14
Variable X 5  =  h m a x ( x 0 ) −6563.823−4033.059−8197.559−8861.325
Variable X 6  =  h m i n ( x 0 ) −692,266.948−2,125,303.665−200,762.974303,964.749
Table 10. Model 2 OR-Library problem results with MDMKP problems. Sets Ct1 and Ct2 (CPU time).
ProblemModel 2Model 2Model 2Model 2
CPU Time 2 ThreadCPU Time 4 ThreadCPU Time 8 ThreadCPU Time 12 Thread
Multiple correlation coefficient0.7620.6280.5780.591
Remark30303030
Test F5.312.501.922.06
Critical value of F0.00140.050.1190.097
Intercept−64,363,757,062−21,401,703,335−78,486,829,134−3173 × 10 11
Variable X 1  =  λ m a x ( Q ) −1.901−0.204−1.661−7.636
Variable X 2  =  λ m i n ( Q ) 269.7983365.8017322.2013134.783
Variable X 3  =  μ m a x 9.826 × 10 6 2.690 × 10 6 7.498 × 10 6 2.457 × 10 5
Variable X 4  =  μ m i n 32,181,877,39610,700,850,993392434102161.586 × 10 11
Variable X 5  =  h m a x ( x 0 ) −2.0594−0.0214−0.3841−3.0742
Variable X 6  =  h m i n ( x 0 ) 411.35541.793946.7291542.990
Table 11. Multiple correlation coefficients for problems of MIPLIB library.
Regression Statistics2 Thread Nodes4 Thread Nodes8 Thread Nodes12 Thread Nodes
Model 1 Miplib
Multiple correlation0.6110.6020.5530.541
coefficient
test F2.7832.6622.0631.939
Model 2 Miplib
Multiple correlation0.580.630.5790.38
coefficient
test F2.4743.1302.3550.804
Model 1 MDMKP
Multiple correlation0.7900.7860.7990.812
coefficient
Test F6.386.196.7907.45
Model 2 MDMKP
Multiple correlation0.7620.6280.5780.591
coefficient
Remark30303030
Test F5.312.501.922.06
Table 12. Reliability of the statistical parameter estimation process.
Database2 Thread4 Thread8 Thread12 ThreadMediaStandard95% Confidence95% Confidence95% Confidence90% Confidence90% Confidence90% Confidence
MIPLIBNodesNodesNodesNodes DeviationIntervalIntervalIntervalIntervalIntervalInterval
Left LimitRight LimitWidth (%)Left LimitRight LimitWidth (%)
Coefficient ρ 0.6110.6020.5530.5410.5760.03480.5280.62516.80.5390.61312.8
Statistic F2.782.662.061.932.360.4221.7742.94949.71.912.8138.1
Database2 thread4 thread8 thread12 threadMediaStandard95% confidence95% confidence95% confidence90% confidence90% confidence90% confidence
MIPLIBCPU TimeCPU TimeCPU TimeCPU Time deviationintervalintervalintervalintervalintervalinterval
left limitright limitwidth (%)left limitright limitwidth (%)
Coefficient ρ 0.580.630.5790.380.5420.1100.3880.69656.780.4240.6643.5
Statistic F23.132.3550.8042.1910.9850.8213.56125.0321.143.2495.79
Database2 thread4 thread8 thread12 threadMediaStandard95% confidence95% confidence95% confidence90% confidence90% confidence90% confidence
OR-Librarynodesnodesnodesnodes deviationintervalintervalintervalintervalintervalinterval
left limitright limitwidth (%)left limitright limitwidth (%)
Coefficient ρ 0.7090.7860.7990.8120.7760.0460.7120.84116.60.7270.82612.7
Statistic F6.386.196.797.456.700.5575.9277.48723.16.17.2917.7
Database2 thread4 thread8 thread12 threadMediaStandard95% confidence95% confidence95% confidence90% confidence90% confidence90% confidence
OR-LibraryCPU TimeCPU TimeCPU TimeCPU Time deviationintervalintervalintervalintervalintervalinterval
left limitright limitwidth (%)left limitright limitwidth (%)
Coefficient ρ 0.7620.6280.5780.5910.6390.0840.5230.75736.50.550.72928.0
Statistic F5.02.51.922.062.941.5940.7315.164150.3671.244.64115.2
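The mean, standard deviation, and interval columns of Table 12 can be reproduced from the four per-thread values in each row. A minimal sketch, assuming the interval half-width is t · s/√n with two-sided Student's t critical values of 2.776 (95%) and 2.132 (90%), which correspond to 4 degrees of freedom and are consistent with the tabulated limits; the function name is illustrative:

```python
import math
import statistics

def confidence_interval(values, t_crit):
    """Mean +/- t * s / sqrt(n) for a small sample; width is
    reported as a percentage of the mean, as in Table 12."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)          # sample standard deviation
    margin = t_crit * s / math.sqrt(n)
    left, right = mean - margin, mean + margin
    width_pct = 100.0 * (right - left) / mean
    return mean, s, left, right, width_pct

# Coefficient rho for the MIPLIB node models (first row of Table 12).
rho = [0.611, 0.602, 0.553, 0.541]
mean, s, left, right, width = confidence_interval(rho, 2.776)
print(f"mean={mean:.4f} s={s:.4f} 95% CI=[{left:.3f}, {right:.3f}] width={width:.1f}%")
```

With t = 2.776 this yields [0.528, 0.625] and a width of 16.8% of the mean, matching the table; repeating the call with t = 2.132 recovers the 90% interval.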
Derpich, I.; Valencia, J.; Lopez, M. The Set Covering and Other Problems: An Empiric Complexity Analysis Using the Minimum Ellipsoidal Width. Mathematics 2023, 11, 2794. https://doi.org/10.3390/math11132794