Article

A Surprisal-Based Greedy Heuristic for the Set Covering Problem

Department of Engineering for Innovation, University of Salento, Via per Monteroni, 73100 Lecce, Italy
*
Author to whom correspondence should be addressed.
Algorithms 2023, 16(7), 321; https://doi.org/10.3390/a16070321
Submission received: 1 June 2023 / Revised: 23 June 2023 / Accepted: 28 June 2023 / Published: 29 June 2023

Abstract

In this paper we exploit concepts from Information Theory to improve the classical Chvatal greedy algorithm for the set covering problem. In particular, we develop a new greedy procedure, called Surprisal-Based Greedy Heuristic (SBH), incorporating the computation of a “surprisal” measure when selecting the solution columns. Computational experiments, performed on instances from the OR-Library, showed that SBH yields a 2.5% improvement in terms of the objective function value over Chvatal’s algorithm while retaining similar execution times, making it suitable for real-time applications. The new heuristic was also compared with Kordalewski’s greedy algorithm, obtaining similar solutions in much shorter times on large instances, and with Grossman and Wool’s algorithm for unicost instances, where SBH obtained better solutions.

1. Introduction

The Set Covering Problem (SCP) is a classical combinatorial optimization problem which, given a collection of elements, aims to find the minimum number of sets that incorporate (cover) all of these elements. More formally, let $I$ be a set of $m$ items and $J = \{S_1, S_2, \ldots, S_n\}$ a collection of $n$ subsets of $I$, where each subset $S_j$ ($j = 1, \ldots, n$) is associated with a non-negative cost $c_j$. The SCP seeks a sub-collection of $J$ that covers all the elements of $I$ at minimum cost, the cost being defined as the sum of the costs of the selected subsets.
The SCP finds applications in many fields. One of the most important is crew scheduling, where the SCP provides a minimum-cost set of crews in order to cover a given set of trips. These problems include airline crew scheduling (see, for example, Rubin [1] and Marchiori [2]) and railway crew scheduling (see, for example, Caprara [3]). Other applications are the winner determination problem in combinatorial auctions, a class of sales mechanisms (Abrache et al. [4]), and vehicle routing (Foster et al. [5], Cacchiani et al. [6] and Bai et al. [7]). The SCP is also relevant in a number of production planning problems, as described by Vemuganti in [8], where a solution is often required in real time. In addition, it is worth noting that the set covering problem is equivalent to the hitting set problem [9]. Indeed, we can view an instance of set covering as a bipartite graph in which vertices on the left represent the items, whilst vertices on the right represent the sets, and edges represent the inclusion of items in sets. The goal of the hitting set problem is to find a subset with the minimum number of right vertices such that all left vertices are covered.
Garey and Johnson [10] have proven that the SCP is NP-hard in the strong sense. Exact algorithms are mostly based on branch-and-bound and branch-and-cut techniques. Etcheberry [11] utilizes sub-gradient optimization in a branch-and-bound framework. Balas and Ho [12] present a procedure based on cutting planes from conditional bounds, i.e., valid lower bounds if the constraint set is amended by certain inequalities. Beasley [13] introduces an algorithm which blends dual ascent, sub-gradient optimization and linear programming. In [14], Beasley and Jornsten incorporate the algorithm of [13] into a Lagrangian heuristic. Fisher and Kedia [15] use continuous heuristics applied to the dual of the linear programming relaxation, obtaining lower bounds for a branch-and-bound algorithm. Finally, we mention Balas and Carrera [16] with their procedure based on a dynamic sub-gradient optimization and branch-and-bound. These algorithms were tested on instances involving up to 200 rows and 2000 columns in the case of Balas and Fisher’s algorithms and 400 rows and 4000 columns in [13,14,16]. Among these algorithms the fastest one is Balas and Carrera’s algorithm, with an average time in the order of 100 s on small instances and 1000 s on the largest ones (on a Cray-1S computer). Caprara [17] compared these methods with the general-purpose ILP solvers CPLEX 4.0.8 and MINTO 2.3, observing that the latter have execution times competitive with those of the best exact algorithms for the SCP in the literature.
In most industrial applications it is important to rely on heuristic methods in order to obtain “good” solutions quickly enough to meet the expectations of decision-makers. To this purpose, many heuristics have been presented in the literature. The classical greedy algorithm proposed by Chvatal [18] sequentially inserts the set with a minimum score in the solution. Chvatal proved that the worst case performance ratio does not exceed $H(d) = \sum_{i=1}^{d} \frac{1}{i}$, where $d$ is the size of the largest set. More recently, Kordalewski [19] described a new approximation heuristic for the SCP. His algorithm follows the same scheme as Chvatal’s procedure, but modifies the score by including a new parameter, named difficulty. Wang et al. [20] presented the TS-IDS algorithm designed for deep web crawling and, then, Singhania [21] tested it in a resource management application. Feo and Resende [22] presented a Greedy Randomized Adaptive Search Procedure (GRASP), in which they first constructed an initial solution through an adaptive randomized greedy function and then applied a local search procedure. Haouari and Chaouachi [23] introduced PROGRES, a probabilistic greedy search heuristic which uses diversification schemes along with a learning strategy.
Regarding Lagrangian heuristics, we mention the algorithm developed by Beasley [24] and later improved by Haddadi [25], which consists of a sub-gradient optimization procedure coupled with a greedy algorithm and Lagrangian cost fixing. A similar procedure was designed by Caprara et al. [26], which includes three phases, sub-gradient, heuristic and column fixing, followed by a refining procedure. Beasley and Chu [27] proposed a genetic algorithm in which a variable mutation rate and two new operators are defined. Similarly, Aickelin [28] describes an indirect genetic algorithm. In this procedure actual solutions are found by an external decoder function and then another indirect optimization layer is used to improve the result. Lastly, we mention Meta-Raps, introduced by Lan et al. [29], an iterative search procedure that uses randomness as a way to avoid local optima. All the mentioned heuristics present calculation times not compatible with real-time contexts. For example, Caprara’s algorithm [26] produces solutions with an average computing time of about 400 s (on a DECstation 5000/240 CPU), if executed on non-unicost instances from Beasley’s OR Library, with 500 × 5000 and 1000 × 10,000 as matrix sizes. Indeed, the difficulty of the problem leads to very high computational costs, which has led academics to research heuristics and meta-heuristics capable of obtaining good solutions, as close as possible to the optimal, in a very short time, in order to tackle real-time applications. In this respect, it is worth noting the paper by Grossman and Wool [30], which presents a comparative study of eight approximation algorithms for the unicost SCP. Among these there were several greedy variants, fractional relaxations and randomized algorithms. Other investigations carried out over the years include the following: Galinier et al. [31], who studied a variant of SCP, called the Large Set Covering Problem (LSCP), in which sets are possibly infinite; Lanza-Gutierrez et al. [32], who were interested in the difficulty of applying metaheuristics designed for solving continuous optimization problems to the SCP; Sundar et al. [33], who proposed another algorithm to solve the SCP by combining an artificial bee colony (ABC) algorithm with local search; Maneengam et al. [34], who, in order to solve the green ship routing and scheduling problem (GSRSP), developed a set covering model based on route representation which includes berth time-window constraints; finally, an empirical complexity analysis of the set covering problem, and other problems, was recently conducted by Derpich et al. [35].
In this paper, we exploit concepts from Information Theory (see Borda [36]) to improve Chvatal’s greedy algorithm. Our purpose is to devise a heuristic able to improve the quality of the solution while retaining similar execution times to those of Chvatal’s algorithm, making it suitable for real-time applications. The main contributions of the current work can be summarized as follows.
  • The development of a real-time algorithm, named Surprisal-Based Greedy Heuristic (SBH), for the fast computation of high quality solutions for the set covering problem. In particular, our algorithm introduces a surprisal measure, also known as self-information, to partly account for the problem structure while constructing the solution.
  • A comparison of the proposed heuristic with three other greedy algorithms, namely Chvatal’s greedy procedure [18], Kordalewski’s algorithm [19] and the Altgreedy procedure [30] for unicost problems. SBH improves the classical Chvatal greedy algorithm [18] in terms of objective function and has the same scalability in computation time, while Kordalewski’s algorithm produces slightly better solutions but has computation times that are much higher than those of the SBH algorithm, making it impractical for real-time applications.
We emphasize that there is a plethora of other methods in the literature for solving the SCP, but most of them are time-consuming. We are only interested in fast heuristics that are compatible with real-time applications.
The remainder of the article is organized as follows. In Section 2 we describe the three algorithms involved in our analysis and illustrate SBH. Section 3 presents an experimental campaign which compares the greedy algorithms mentioned above. Finally, Section 4 reports on some of the conclusions.

2. Surprisal-Based Greedy Heuristic

2.1. Problem Formulation

The SCP can be formulated as follows. In addition to the notation introduced in Section 1, let $a_{ij}$ be a constant equal to 1 if item $i$ is covered by subset $j$ and 0 otherwise. Moreover, let $x_j$ denote a binary variable defined as follows:
$$x_j = \begin{cases} 1 & \text{if column } j \text{ is selected}, \\ 0 & \text{otherwise}. \end{cases}$$
An SCP formulation is:
$$\text{minimize} \quad \sum_{j \in J} c_j x_j \tag{1}$$
$$\sum_{j \in J} a_{ij} x_j \geq 1 \quad i \in I, \tag{2}$$
$$x_j \in \{0, 1\} \quad j \in J, \tag{3}$$
where (1) aims to minimize the total cost of the selected columns and (2) imposes the condition that every row is covered by at least one column.
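To make the formulation concrete, the following minimal sketch evaluates objective (1) and checks constraint (2) for a candidate 0-1 vector. The function names `cover_cost` and `is_cover`, as well as the small 3 × 3 instance, are illustrative assumptions and do not come from the paper.

```python
from typing import List, Sequence


def cover_cost(c: Sequence[float], x: Sequence[int]) -> float:
    """Objective (1): total cost of the selected columns."""
    return sum(cj * xj for cj, xj in zip(c, x))


def is_cover(a: List[List[int]], x: Sequence[int]) -> bool:
    """Constraint (2): every row i has at least one selected column j with a[i][j] = 1."""
    return all(sum(aij * xj for aij, xj in zip(row, x)) >= 1 for row in a)


# Hypothetical instance with 3 rows (items) and 3 columns (subsets).
a = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 1]]
c = [2, 3, 4]

print(is_cover(a, [1, 1, 0]), cover_cost(c, [1, 1, 0]))  # True 5
print(is_cover(a, [1, 0, 0]), cover_cost(c, [1, 0, 0]))  # False 2
```

The second vector is cheaper but leaves the third row uncovered, so it is not a feasible solution.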

2.2. Greedy Algorithms

As we explained in the previous section, we were interested in greedy procedures in order to produce good solutions in a very short time, suitable for real-time applications. SCP greedy algorithms are sequential procedures that identify the best unselected column with respect to a given score and then insert this in the solution set.
Let $I_j$ be the set of rows covered by column $j$ and $J_i$ the set of columns covering row $i$. Algorithm 1 shows the pseudocode of Chvatal’s greedy algorithm [18]. Each column $j$ is given a score equal to the column cost $c_j$ divided by the number of rows $|I_j|$ covered by $j$. At each step, the algorithm inserts the column $j^*$ with the minimum score in the solution set.
Algorithm 1 Chvatal’s greedy algorithm
$S \leftarrow \emptyset$                           ▹ initially empty set
while $I \neq \emptyset$ do
    $j^* \leftarrow \arg\min_{j \in J} \frac{c_j}{|I_j|}$                 ▹ selection of the best column
    add $j^*$ to $S$
    $I \leftarrow I \setminus I_{j^*}$
    for $j \in J$ do               ▹ remove the already covered rows
        $I_j \leftarrow I_j \setminus I_{j^*}$
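The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors' C++ code: the function name `chvatal_greedy`, the dictionary-based data layout, the tie-breaking rule (smallest column index) and the small instance are all hypothetical choices. Columns that no longer cover any uncovered row are skipped, since their score is undefined.

```python
from typing import Dict, List, Set


def chvatal_greedy(rows: Set[int], cols: Dict[int, Set[int]],
                   cost: Dict[int, float]) -> List[int]:
    """Return the columns chosen by the greedy rule, in selection order.

    rows : row indices that must be covered (the set I)
    cols : maps each column j to the rows it covers (the set I_j)
    cost : maps each column j to its cost c_j
    """
    uncovered = set(rows)
    remaining = {j: set(r) & uncovered for j, r in cols.items()}
    solution: List[int] = []
    while uncovered:
        # Only columns that still cover an uncovered row are candidates.
        candidates = [j for j, r in remaining.items() if r]
        if not candidates:
            raise ValueError("infeasible instance: some row cannot be covered")
        # Score = cost per newly covered row; ties go to the smallest index.
        best = min(candidates, key=lambda j: (cost[j] / len(remaining[j]), j))
        solution.append(best)
        newly_covered = set(remaining[best])
        uncovered -= newly_covered
        for j in remaining:  # remove the already covered rows
            remaining[j] -= newly_covered
    return solution


# Hypothetical instance with 5 rows and 4 columns.
cols = {1: {1, 2, 3}, 2: {3, 4}, 3: {4, 5}, 4: {1, 5}}
cost = {1: 3, 2: 1, 3: 2, 4: 2}
chosen = chvatal_greedy({1, 2, 3, 4, 5}, cols, cost)
print(chosen, sum(cost[j] for j in chosen))  # [2, 4, 1] 6
```

On this instance the greedy rule returns a cover of cost 6, whereas columns 1 and 3 together cover all rows at cost 5, which illustrates that the procedure is a heuristic and not an exact method.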
A variant of Chvatal’s procedure for unicost problems, named Altgreedy, was suggested by Grossman and Wool [30]. This algorithm is composed of two main steps: in the first phase, the column with the highest number of covered rows is inserted in the solution; then, in the second phase, some columns are removed from the solution set according to lexicographic order, as long as the number of the new uncovered rows remains smaller than the last number of new rows covered.
More recently, Kordalewski [19] proposed a new greedy heuristic, a recursive procedure that introduces two new terms: valuation and difficulty. In the first step, a valuation is computed for every column $j$ by dividing the number of rows covered by $j$ by the column cost, i.e., the reciprocal of Chvatal’s score. A parameter called difficulty is then defined for each row $i$ as the inverse of the sum of the valuations of the sets covering $i$; it indicates how difficult it might be to cover that row. This is based on the observation that a low valuation implies a low probability of selection. The valuation $v$ can then be computed as:
$$v_j = \frac{\sum_{i \in I_j} d_i}{c_j}$$
while the difficulty $d$ is simply updated using the new valuations.
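The two quantities described above can be sketched as follows. This is only an illustration of the valuation and difficulty formulas as stated in the text, not an implementation of Kordalewski's complete recursive procedure [19]; the function names and the small instance are hypothetical, and costs are assumed to be positive integers so that exact fractions can be used.

```python
from fractions import Fraction
from typing import Dict, Set


def initial_valuation(cols: Dict[int, Set[int]],
                      cost: Dict[int, int]) -> Dict[int, Fraction]:
    """First-step valuation: number of rows covered by j divided by c_j."""
    return {j: Fraction(len(rows), cost[j]) for j, rows in cols.items()}


def difficulty(cols: Dict[int, Set[int]],
               valuation: Dict[int, Fraction]) -> Dict[int, Fraction]:
    """d_i = 1 / (sum of the valuations of the columns covering row i)."""
    total: Dict[int, Fraction] = {}
    for j, rows in cols.items():
        for i in rows:
            total[i] = total.get(i, Fraction(0)) + valuation[j]
    return {i: 1 / t for i, t in total.items()}


def updated_valuation(cols: Dict[int, Set[int]], cost: Dict[int, int],
                      d: Dict[int, Fraction]) -> Dict[int, Fraction]:
    """v_j = (sum of d_i over the rows i covered by j) / c_j."""
    return {j: sum((d[i] for i in rows), Fraction(0)) / cost[j]
            for j, rows in cols.items()}


# Hypothetical instance with 3 rows and 3 columns.
cols = {1: {1, 2}, 2: {2, 3}, 3: {3}}
cost = {1: 1, 2: 2, 3: 1}
v0 = initial_valuation(cols, cost)     # {1: 2, 2: 1, 3: 1}
d = difficulty(cols, v0)               # {1: 1/2, 2: 1/3, 3: 1/2}
v1 = updated_valuation(cols, cost, d)  # {1: 5/6, 2: 5/12, 3: 1/2}
print(v1)
```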

2.3. The SBH Algorithm

In this section, we describe the SBH greedy heuristic, which constitutes an improvement on the classic Chvatal greedy procedure. As illustrated in Section 2.2, Chvatal’s algorithm assigns each column $j$ a score equal to the unit cost to cover the rows in $I_j$. Then it iteratively inserts the column with the lowest score in the solution set. However, this approach is flawed when rows in $I_j$ are poorly covered. Indeed, it does not consider the probability that rows $i \in I_j$ are covered by other columns in $J_i$. Our algorithm aims to correct this by introducing an additional term expressing the “surprisal” that a column $j$ is selected. Therefore, our score considers two aspects: the cost of a column $j$ and the probability that the rows in $I_j$ can be covered by other columns.
To formally describe our procedure, we introduce some concepts from Information Theory. The term information refers to any message which gives details in an uncertain problem and is closely related to the probability of occurrence of an uncertain event. Information is an additive and non-negative measure which is equal to 0 when the event is certain and grows as its probability decreases. More specifically, given an event $A$ that occurs with probability $p_A$, the self-information $I_A$ is defined as:
$$I_A = -\log p_A.$$
Self-information is also called surprisal because it expresses the “surprise” of seeing event $A$ as the outcome of an experiment. In the SBH algorithm, at each stage we compute the surprisal of each column. The columns covering row $i$ are considered independent of each other and equally likely to be selected, so the probability of selecting a given one of them (denoted as event $\bar{A}$) is
$$p_{\bar{A}} = \frac{1}{|J_i|}.$$
Therefore, the probability of the opposite event $A$, i.e., covering row $i$ with a column different from the current one, is:
$$p_A = 1 - \frac{1}{|J_i|} = \frac{|J_i| - 1}{|J_i|}.$$
The self-information measure contained in this event is:
$$I_i = -\log \frac{|J_i| - 1}{|J_i|}.$$
Thanks to the additivity of the self-information measure, the surprisal of a column $j$ can be written as:
$$I_j = \sum_{i \in I_j} I_i = -\sum_{i \in I_j} \log \frac{|J_i| - 1}{|J_i|}.$$
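As a quick numerical illustration of the row and column surprisal defined above, the sketch below evaluates them for a few coverage counts. The base of the logarithm is not specified in the text; base 2 (surprisal measured in bits) is an assumption made here, and the function names and the counts are hypothetical.

```python
import math
from typing import Dict, Set


def row_surprisal(n_covering: int) -> float:
    """Surprisal -log2((n - 1) / n) of covering a row with another column,
    where n is the number of columns covering the row (|J_i|)."""
    if n_covering < 1:
        raise ValueError("every row must be covered by at least one column")
    if n_covering == 1:
        return math.inf  # probability 0: no other column can cover the row
    return -math.log2((n_covering - 1) / n_covering)


def column_surprisal(rows_of_col: Set[int], n_covering: Dict[int, int]) -> float:
    """Additivity: the surprisal of a column is the sum over its rows."""
    return sum(row_surprisal(n_covering[i]) for i in rows_of_col)


# Hypothetical counts: row 1 is covered by 2 columns, row 2 by 4 columns.
n_covering = {1: 2, 2: 4}
print(row_surprisal(2))                      # 1.0
print(column_surprisal({1, 2}, n_covering))  # equals -log2(1/2 * 3/4)
```

The surprisal of a row grows as fewer alternative columns cover it, and it is unbounded when the current column is the only one covering the row.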
We modify Chvatal’s cost of column $j$, i.e., $\frac{c_j}{|I_j|}$, by introducing the surprisal of $j$ to the denominator, in order to favor columns with high self-information. In particular, at each step we select the column that minimizes:
$$\min_{j \in J} \frac{c_j}{|I_j| \cdot I_j}, \tag{9}$$
which is equivalent to
$$\min_{j \in J} \frac{c_j}{|I_j|} \prod_{i \in I_j} \frac{|J_i| - 1}{|J_i|}. \tag{10}$$
This formulation is the same as minimizing the probability of the intersection of independent events, each of which selects a column, other than the current one, covering row $i$. Two extreme cases can occur:
  • if column $j$ is the only one covering a row $i \in I_j$, it is no surprise that it is selected: in this case $I_j$ is high and the modified cost (9) of column $j$ is 0 so that column $j$ is included in the solution;
  • if, on the other hand, all rows $i \in I_j$ are covered by a high number of other columns in $J_i$, surprisal $I_j$ is very low. In this case, the cost attributed to column $j$ is greater than its Chvatal cost.
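The product form of the score, i.e., Chvatal's unit cost multiplied by the factors $(|J_i| - 1)/|J_i|$ over the rows covered by the column, can be computed exactly with rational arithmetic. The sketch below is illustrative only: the function name `sbh_score` and the sample numbers are assumptions, and costs are taken to be integers.

```python
from fractions import Fraction
from typing import Dict, Set


def sbh_score(cost_j: int, rows_j: Set[int], n_covering: Dict[int, int]) -> Fraction:
    """Unit cost c_j / |I_j| times the product, over the rows i covered by the
    column, of (|J_i| - 1) / |J_i|, where n_covering[i] plays the role of |J_i|."""
    if not rows_j:
        raise ValueError("the column covers no row")
    score = Fraction(cost_j, len(rows_j))
    for i in rows_j:
        score *= Fraction(n_covering[i] - 1, n_covering[i])
    return score


# Hypothetical column of cost 5 covering rows 1 and 2.
# Row 1 is covered by this column only, row 2 by three columns.
print(sbh_score(5, {1, 2}, {1: 1, 2: 3}))  # 0
# Same column when both rows are covered by three columns each.
print(sbh_score(5, {1, 2}, {1: 3, 2: 3}))  # 10/9
```

The first call reproduces the first extreme case: a column that is the only one covering some row receives score 0 and is therefore selected immediately.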
To illustrate this concept, we now present a numerical example. Let
$$(a_{ij}) = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \end{pmatrix}, \quad (c_j) = \begin{pmatrix} 3 & 1 & 2 & 5 \end{pmatrix}$$
be the coverage matrix and the column cost vector. We denote by $\mathit{CHscore}_i$ and $\mathit{SBHscore}_i$ the Chvatal and SBH score vectors, respectively, at the $i$-th iteration. A hyphen indicates that the corresponding column can no longer be considered because it either has already been selected or is empty, meaning that the column does not cover rows that still need to be covered. At the first iteration of Chvatal’s algorithm we have the following scores:
$$\mathit{CHscore}_1 = \left[ 1; \tfrac{1}{2}; 1; \tfrac{5}{2} \right].$$
The second column (the one with the lowest score) is selected. Subsequently, at the second iteration the scores are as follows:
$$\mathit{CHscore}_2 = \left[ 3; -; 2; \tfrac{5}{2} \right].$$
At this point, the third column is selected. The first column now covers only rows already covered by the selected columns, so, at the third iteration, the scores become:
$$\mathit{CHscore}_3 = \left[ -; -; -; 5 \right].$$
Therefore, column 4 is selected and the total cost for the current solution (columns 2, 3 and 4) amounts to 8 units. On the other hand, computing the SBH score for each column $j$ according to (10), our SBH algorithm produces, at the first iteration:
$$\mathit{SBHscore}_1 = \left[ \tfrac{2}{9}; \tfrac{1}{6}; \tfrac{4}{9}; 0 \right].$$
The fourth column has the lowest score and is inserted in the current solution. At the second iteration, the scores are the following:
$$\mathit{SBHscore}_2 = \left[ \tfrac{1}{2}; \tfrac{1}{6}; \tfrac{4}{3}; - \right].$$
Column 2 is selected and the procedure ends. In conclusion, our algorithm selects only two columns (4 and 2), with a total cost of 6 units, in contrast to Chvatal’s greedy algorithm, which ends up with a greater solution cost. On this instance, SBH outperforms Chvatal’s procedure because the latter does not recognize from the start that column 4 must necessarily be part of the solution.
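The numerical example can be reproduced with the short script below, which runs the same greedy loop with the two scoring rules. It is a sketch under stated assumptions rather than the authors' implementation: the helper names, the tie-breaking rule (smallest column index) and the use of exact fractions are illustrative choices, and the SBH score is taken in the product form used in the example.

```python
from fractions import Fraction
from typing import Callable, Dict, List, Set

# Instance of the numerical example; rows and columns are numbered from 1.
A = [[1, 1, 0, 0],
     [0, 0, 0, 1],
     [1, 1, 1, 0],
     [1, 0, 1, 1]]
COST: Dict[int, int] = {1: 3, 2: 1, 3: 2, 4: 5}
COLS: Dict[int, Set[int]] = {j + 1: {i + 1 for i in range(4) if A[i][j]}
                             for j in range(4)}
N_COVERING: Dict[int, int] = {i + 1: sum(A[i]) for i in range(4)}  # |J_i|


def chvatal_score(j: int, rows_j: Set[int]) -> Fraction:
    """Cost per row still to be covered."""
    return Fraction(COST[j], len(rows_j))


def sbh_score(j: int, rows_j: Set[int]) -> Fraction:
    """Chvatal score times the product of (|J_i| - 1) / |J_i| over rows_j."""
    score = Fraction(COST[j], len(rows_j))
    for i in rows_j:
        score *= Fraction(N_COVERING[i] - 1, N_COVERING[i])
    return score


def greedy(score: Callable[[int, Set[int]], Fraction]) -> List[int]:
    """Repeatedly select the column with the lowest score until all rows are covered."""
    uncovered = set(range(1, 5))
    solution: List[int] = []
    while uncovered:
        # Columns restricted to the rows that still need to be covered.
        live = {j: rows & uncovered for j, rows in COLS.items() if rows & uncovered}
        best = min(live, key=lambda j: (score(j, live[j]), j))
        solution.append(best)
        uncovered -= COLS[best]
    return solution


ch = greedy(chvatal_score)
sbh = greedy(sbh_score)
print(ch, sum(COST[j] for j in ch))    # [2, 3, 4] 8
print(sbh, sum(COST[j] for j in sbh))  # [4, 2] 6
```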
It is worth noting that SBH has the same computational complexity as Chvatal’s algorithm, since they require the same number of steps in order to compute the score measure.

3. Experimental Results

The aim of our computational experiments was to assess the performance of the SBH heuristic procedure with respect to the other greedy heuristics proposed in the literature. We implemented the heuristics in C++ and performed our experiments on a stand-alone Linux machine with a 4 core processor clocked at 3 GHz and equipped with 16 GB of RAM. The algorithm was tested on 77 instances from Beasley’s OR Library [37]. Table 1 describes the main features of the test instances and, in particular, the column density, i.e., the percentage of ones in matrix $a$, and the column range, i.e., the minimum and maximum values of the objective function coefficients. The remaining column headings are self-explanatory. Instances are divided into sets having sizes ranging from 200 × 1000 to 1000 × 10,000. Set E contains small unicost instances of size 50 × 500. Sets 4, 5 and 6 were generated by Balas and Ho [12] and consist of small instances with low density, while sets A to E come from Beasley [13]. The remaining instances (sets NRE to NRH) are from [24]. Such instances are significantly larger and optimal solutions are not available. Similarly, Table 2 reports features of seven large scale real-world instances derived from the crew-scheduling problem [26].
We compared SBH with Chvatal’s procedure [18] (CH) and the heuristic by Kordalewski [19] (KORD). Table 3, Table 4 and Table 5 report the computational results for each instance under the following headings:
  • Instance: the name of the instance, where the string before the dot refers to the set to which the instance belongs;
  • BS: objective function value of the best known solution;
  • SOL: the objective function value of the best solution determined by the heuristic;
  • TIME: the execution time in seconds;
  • GAP: percentage gap between BS and the SOL value, i.e.,
$$\text{GAP} = 100 \times \frac{\text{SOL} - \text{BS}}{\text{BS}}$$
Columns “SBH vs. CH” and “SBH vs. KORD” report the percentage difference between the SBH solution value and those of CH and KORD, respectively; a negative value means that SBH found the cheaper solution. Regarding Table 3, it is worth noting that our heuristic, compared to Chvatal’s greedy procedure, had a smaller gap, ranging from 12.65% to 11.03%, with an average improvement of 1.42%. Among these instances, SBH provided a better solution than [18] in 19 out of 24 instance problems. We point out that the best objective function value was given by Kordalewski’s algorithm, which was slightly better than our SBH procedure (by only 0.59%), but was slower.
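As a worked check of the table headings, the sketch below recomputes the GAP and comparison columns for instance 4.1 of Table 3 (BS = 429, CH = 463, KORD = 458, SBH = 471). The function names are illustrative, and the formula used for the comparison columns, a relative difference in solution value, is an assumption that is consistent with the values reported for this row.

```python
def gap(sol: float, bs: float) -> float:
    """Percentage gap between a heuristic solution value and the best known value."""
    return 100.0 * (sol - bs) / bs


def relative_difference(sol_a: float, sol_b: float) -> float:
    """Percentage difference of solution a with respect to solution b
    (negative when a is cheaper than b)."""
    return 100.0 * (sol_a - sol_b) / sol_b


bs, ch, kord, sbh = 429, 463, 458, 471  # instance 4.1 of Table 3
print(round(gap(ch, bs), 2))                     # 7.93
print(round(gap(kord, bs), 2))                   # 6.76
print(round(gap(sbh, bs), 2))                    # 9.79
print(round(relative_difference(sbh, ch), 2))    # 1.73
print(round(relative_difference(sbh, kord), 2))  # 2.84
```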
Similar observations can be derived from Table 4. Here, SBH performed better, even though it differed from the Kordalewski algorithm by only 0.07%. Comparing SBH with CH, it is worth noting that only in 4 instances out of 45 did SBH obtain a worse solution. SBH came close to the optimal solution, with an average gap of 10.69%, and was better than CH by 2.62%. The execution time for all the instances averaged 0.113 s for CH, 0.230 s for the Kordalewski procedure and 0.564 s for SBH. As the size of the instances increased (which is the case in the rail problems), Kordalewski’s algorithm became much slower. Consequently, on these instances we compared only the CH and SBH heuristics. Here, our SBH algorithm provided an average objective function improvement of 5.82% with comparable execution times. In conclusion, this first analysis showed that the new SBH heuristic generally produced very similar results with respect to Kordalewski’s heuristic. This is due to the fact that both heuristics consider the degree of row coverage, although in different ways, and, thus, the difficulty in covering them. However, the large amount of time the KORD algorithm took to solve the rail instances indicates that SBH, unlike KORD, meets the requirements of real-time applications. Finally, the average percentage improvement of SBH with respect to CH, taking into account all instances, i.e., sets 4–6, scp and rail, amounted to 2.5%.
We next compared the algorithms on unicost instances, obtained by setting the cost of all columns equal to 1, as in Grossman and Wool’s paper [30]. In particular, we compared SBH with the Altgreedy (ALTG) algorithm proposed by Grossman and Wool [30], introduced in Section 2.2. The results are shown in Table 6, Table 7 and Table 8, where the subdivision of instances was the same as before. The additional column “SBH vs. ALTG” reports the percentage improvement of SBH with respect to the ALTG algorithm. Looking at Table 6, it is worth noting that the heuristic which performed better was that of Kordalewski. Indeed, our heuristic SBH was worse than KORD by about 3.49%, while it was better than the other two greedy procedures, with a gap of 1.15%. Here, computation times were all comparable and ranged between 0.002 and 0.007 s. SBH improved its performance in larger instances, as shown in Table 7 and Table 8. We would like to point out that ALTG and CH produced the same solution cost for all of the instances, except for the rail ones. In particular, SBH yielded an average improvement of 1.50% on CH and ALTG ([30]) on scp instances, and, respectively, 1.39% and 12.97% on rail instances. Comparing SBH and KORD on the scp instances, we observed that they were very similar with a 0.07% improvement. In the largest instances (Table 8), as said before, it emerged that the computational time of KORD made it impractical for real-time applications. The analysis showed that, in most cases, SBH produced better solutions than the classical Chvatal algorithm. However, in a few instances CH presented a better solution. This phenomenon was attributable to the features of the instances. As shown in the example provided in Section 2.3, SBH immediately recognizes columns that must necessarily be present in the solution, while CH only selects them when they exhibit the lowest unit cost. In conclusion, the computational campaign revealed that SBH generally outperformed CH when considering instances containing columns with few covered rows.

4. Conclusions

In this paper, we proposed a new greedy heuristic, SBH, an improvement on the classical greedy algorithm proposed by Chvatal [18]. We showed that, in the vast majority of the test instances, SBH generated better solutions than other greedy algorithms, such as Kordalewski’s algorithm [19] and Altgreedy [30]. Computational tests also showed that Kordalewski’s algorithm is not suitable for real-time application, since it presents very large execution times, while our SBH algorithm runs in a few seconds, even on very large instances.

Author Contributions

Conceptualization, G.G. and E.G.; methodology and validation, T.A.; formal analysis and software, D.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data sharing is not applicable to this article. No new data were created or analyzed in this study.

Acknowledgments

This work was partly supported by Ministero dell’Università e della Ricerca (MUR) of Italy. This support is gratefully acknowledged (“Decreto Ministeriale n. 1062 del 10-08-2021. PON Ricerca e Innovazione 14-20 nuove risorse per contratti di ricerca su temi dell’innovazione” contract number 12-I-13147-10).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Rubin, J. A technique for the solution of massive set covering problems, with application to airline crew scheduling. Transp. Sci. 1973, 7, 34–48.
  2. Marchiori, E.; Steenbeek, A. An evolutionary algorithm for large scale set covering problems with application to airline crew scheduling. In Proceedings of the Real-World Applications of Evolutionary Computing: EvoWorkshops 2000: EvoIASP, EvoSCONDI, EvoTel, EvoSTIM, EvoRob, and EvoFlight Edinburgh, Scotland, UK, 17 April 2000; Springer: Berlin/Heidelberg, Germany, 2000; pp. 370–384.
  3. Caprara, A.; Fischetti, M.; Toth, P.; Vigo, D.; Guida, P.L. Algorithms for railway crew management. Math. Program. 1997, 79, 125–141.
  4. Abrache, J.; Crainic, T.G.; Gendreau, M.; Rekik, M. Combinatorial auctions. Ann. Oper. Res. 2007, 153, 131–164.
  5. Foster, B.A.; Ryan, D.M. An integer programming approach to the vehicle scheduling problem. J. Oper. Res. Soc. 1976, 27, 367–384.
  6. Cacchiani, V.; Hemmelmayr, V.C.; Tricoire, F. A set-covering based heuristic algorithm for the periodic vehicle routing problem. Discret. Appl. Math. 2014, 163, 53–64.
  7. Bai, R.; Xue, N.; Chen, J.; Roberts, G.W. A set-covering model for a bidirectional multi-shift full truckload vehicle routing problem. Transp. Res. Part B Methodol. 2015, 79, 134–148.
  8. Vemuganti, R.R. Applications of set covering, set packing and set partitioning models: A survey. In Handbook of Combinatorial Optimization: Volume 1–3; Springer: Boston, MA, USA, 1998; pp. 573–746.
  9. Karp, R.M. Reducibility among Combinatorial Problems; Miller, R.E., Thatcher, J.W., Eds.; Complexity of Computer Computations; Plenum Press: New York, NY, USA, 1972; Volume 10, pp. 978–981.
  10. Garey, M.R.; Johnson, D.S. Computers and Intractability; Freeman: San Francisco, CA, USA, 1979; Volume 174.
  11. Etcheberry, J. The set-covering problem: A new implicit enumeration algorithm. Oper. Res. 1977, 25, 760–772.
  12. Balas, E.; Ho, A. Set Covering Algorithms Using Cutting Planes, Heuristics, and Subgradient Optimization: A Computational Study; Springer: Berlin/Heidelberg, Germany, 1980.
  13. Beasley, J.E. An algorithm for set covering problem. Eur. J. Oper. Res. 1987, 31, 85–93.
  14. Beasley, J.E.; Jörnsten, K. Enhancing an algorithm for set covering problems. Eur. J. Oper. Res. 1992, 58, 293–300.
  15. Fisher, M.L.; Kedia, P. Optimal solution of set covering/partitioning problems using dual heuristics. Manag. Sci. 1990, 36, 674–688.
  16. Balas, E.; Carrera, M.C. A dynamic subgradient-based branch-and-bound procedure for set covering. Oper. Res. 1996, 44, 875–890.
  17. Caprara, A.; Toth, P.; Fischetti, M. Algorithms for the set covering problem. Ann. Oper. Res. 2000, 98, 353–371.
  18. Chvatal, V. A greedy heuristic for the set-covering problem. Math. Oper. Res. 1979, 4, 233–235.
  19. Kordalewski, D. New greedy heuristics for set cover and set packing. arXiv 2013, arXiv:1305.3584.
  20. Wang, Y.; Lu, J.; Chen, J. Ts-ids algorithm for query selection in the deep web crawling. In Proceedings of the Web Technologies and Applications: 16th Asia-Pacific Web Conference, APWeb 2014, Changsha, China, 5–7 September 2014; Proceedings 16. Springer: Berlin/Heidelberg, Germany, 2014; pp. 189–200.
  21. Singhania, S. Variations in Greedy Approach to Set Covering Problem. Ph.D. Thesis, University of Windsor (Canada), Windsor, ON, Canada, 2019.
  22. Feo, T.A.; Resende, M.G. Greedy randomized adaptive search procedures. J. Glob. Optim. 1995, 6, 109–133.
  23. Haouari, M.; Chaouachi, J. A probabilistic greedy search algorithm for combinatorial optimisation with application to the set covering problem. J. Oper. Res. Soc. 2002, 53, 792–799.
  24. Beasley, J.E. A lagrangian heuristic for set-covering problems. Nav. Res. Logist. NRL 1990, 37, 151–164.
  25. Haddadi, S. Simple Lagrangian heuristic for the set covering problem. Eur. J. Oper. Res. 1997, 97, 200–204.
  26. Caprara, A.; Fischetti, M.; Toth, P. A heuristic method for the set covering problem. Oper. Res. 1999, 47, 730–743.
  27. Beasley, J.E.; Chu, P.C. A genetic algorithm for the set covering problem. Eur. J. Oper. Res. 1996, 94, 392–404.
  28. Aickelin, U. An indirect genetic algorithm for set covering problems. J. Oper. Res. Soc. 2002, 53, 1118–1126.
  29. Lan, G.; DePuy, G.W.; Whitehouse, G.E. An effective and simple heuristic for the set covering problem. Eur. J. Oper. Res. 2007, 176, 1387–1403.
  30. Wool, A.; Grossman, T. Computational Experience with Approximation Algorithms for the Set Covering Problem; Technical Report CS94-25; Weizmann Institute of Science; Elsevier: Amsterdam, Netherlands, 1997.
  31. Galinier, P.; Hertz, A. Solution Techniques for the Large Set Covering Problem. Les Cah. Du GERAD ISSN 2003, 7112440, 1–19.
  32. Lanza-Gutierrez, J.M.; Crawford, B.; Soto, R.; Berrios, N.; Gomez-Pulido, J.A.; Paredes, F. Analyzing the effects of binarization techniques when solving the set covering problem through swarm optimization. Expert Syst. Appl. 2017, 70, 67–82.
  33. Sundar, S.; Singh, A. A hybrid heuristic for the set covering problem. Oper. Res. 2012, 12, 345–365.
  34. Maneengam, A.; Udomsakdigool, A. A set covering model for a green ship routing and scheduling problem with berth time-window constraints for use in the bulk cargo industry. Appl. Sci. 2021, 11, 4840.
  35. Derpich, I.; Valencia, J.; Lopez, M. The set covering and other problems: An empiric complexity analysis using the minimum ellipsoidal width. Mathematics 2023, 11, 2794.
  36. Borda, M. Fundamentals in Information Theory and Coding; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011.
  37. Beasley, J.E. OR-Library: Distributing test problems by electronic mail. J. Oper. Res. Soc. 1990, 41, 1069–1072.
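Table 1. Instance features: sets 4–6, A–E and NRE–NRH.

| Set | \|I\| | \|J\| | Density | Cost range | Count |
|---|---|---|---|---|---|
| 4 | 200 | 1000 | 2% | 1–100 | 10 |
| 5 | 200 | 2000 | 2% | 1–100 | 10 |
| 6 | 200 | 1000 | 5% | 1–100 | 5 |
| A | 300 | 3000 | 2% | 1–100 | 5 |
| B | 300 | 3000 | 5% | 1–100 | 5 |
| C | 400 | 4000 | 2% | 1–100 | 5 |
| D | 400 | 4000 | 5% | 1–100 | 5 |
| E | 50 | 500 | 20% | 1–100 | 5 |
| NRE | 500 | 5000 | 10% | 1–100 | 5 |
| NRF | 500 | 5000 | 20% | 1–100 | 5 |
| NRG | 1000 | 10,000 | 2% | 1–100 | 5 |
| NRH | 1000 | 10,000 | 5% | 1–100 | 5 |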
Table 2. Instance features: rail sets.

| Instance | \|I\| | \|J\| | Cost range | Density |
|---|---|---|---|---|
| rail516 | 516 | 47,311 | 1–2 | 1.3% |
| rail582 | 582 | 55,515 | 1–2 | 1.2% |
| rail2536 | 2536 | 1,081,841 | 1–2 | 0.4% |
| rail507 | 507 | 63,009 | 1–2 | 1.3% |
| rail2586 | 2586 | 920,683 | 1–2 | 0.3% |
| rail4284 | 4284 | 1,092,610 | 1–2 | 0.2% |
| rail4872 | 4872 | 968,672 | 1–2 | 0.2% |
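Table 3. Results for instance sets 4–6.

| Instance | BS | CH SOL | CH TIME | CH GAP | KORD SOL | KORD TIME | KORD GAP | SBH SOL | SBH TIME | SBH GAP | SBH vs. CH | SBH vs. KORD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4.1 | 429 | 463 | 0.002 | 7.93% | 458 | 0.011 | 6.76% | 471 | 0.002 | 9.79% | 1.73% | 2.84% |
| 4.2 | 512 | 582 | 0.002 | 13.67% | 569 | 0.010 | 11.13% | 587 | 0.002 | 14.65% | 0.86% | 3.16% |
| 4.3 | 516 | 598 | 0.002 | 15.89% | 576 | 0.011 | 11.63% | 577 | 0.003 | 11.82% | −3.51% | 0.17% |
| 4.4 | 494 | 548 | 0.002 | 10.93% | 540 | 0.009 | 9.31% | 542 | 0.002 | 9.72% | −1.09% | 0.37% |
| 4.5 | 512 | 577 | 0.002 | 12.70% | 572 | 0.009 | 11.72% | 571 | 0.003 | 11.52% | −1.04% | −0.17% |
| 4.6 | 560 | 615 | 0.002 | 9.82% | 603 | 0.008 | 7.68% | 599 | 0.002 | 6.96% | −2.60% | −0.66% |
| 4.7 | 430 | 476 | 0.003 | 10.70% | 480 | 0.008 | 11.63% | 474 | 0.002 | 10.23% | −0.42% | −1.25% |
| 4.8 | 492 | 533 | 0.003 | 8.33% | 520 | 0.009 | 5.69% | 553 | 0.002 | 12.40% | 3.75% | 6.35% |
| 4.9 | 641 | 747 | 0.003 | 16.54% | 721 | 0.010 | 12.48% | 723 | 0.003 | 12.79% | −3.21% | 0.28% |
| 4.10 | 514 | 556 | 0.002 | 8.17% | 551 | 0.010 | 7.20% | 548 | 0.002 | 6.61% | −1.44% | −0.54% |
| 5.1 | 253 | 289 | 0.005 | 14.23% | 289 | 0.016 | 14.23% | 289 | 0.005 | 14.23% | 0.00% | 0.00% |
| 5.2 | 302 | 348 | 0.005 | 15.23% | 345 | 0.019 | 14.24% | 337 | 0.006 | 11.59% | −3.16% | −2.32% |
| 5.3 | 226 | 246 | 0.004 | 8.85% | 246 | 0.017 | 8.85% | 243 | 0.005 | 7.52% | −1.22% | −1.22% |
| 5.4 | 242 | 265 | 0.004 | 9.50% | 264 | 0.016 | 9.09% | 266 | 0.004 | 9.92% | 0.38% | 0.76% |
| 5.5 | 211 | 236 | 0.004 | 11.85% | 228 | 0.016 | 8.06% | 230 | 0.004 | 9.00% | −2.54% | 0.88% |
| 5.6 | 213 | 251 | 0.004 | 17.84% | 249 | 0.016 | 16.90% | 245 | 0.004 | 15.02% | −2.39% | −1.61% |
| 5.7 | 293 | 326 | 0.004 | 11.26% | 314 | 0.017 | 7.17% | 322 | 0.004 | 9.90% | −1.23% | 2.55% |
| 5.8 | 288 | 323 | 0.004 | 12.15% | 316 | 0.016 | 9.72% | 315 | 0.005 | 9.38% | −2.48% | −0.32% |
| 5.9 | 279 | 312 | 0.004 | 11.83% | 304 | 0.015 | 8.96% | 304 | 0.005 | 8.96% | −2.56% | 0.00% |
| 5.10 | 265 | 293 | 0.003 | 10.57% | 285 | 0.016 | 7.55% | 286 | 0.008 | 7.92% | −2.39% | 0.35% |
| 6.1 | 138 | 159 | 0.004 | 15.22% | 156 | 0.010 | 13.04% | 156 | 0.006 | 13.04% | −1.89% | 0.00% |
| 6.2 | 146 | 170 | 0.004 | 16.44% | 164 | 0.009 | 12.33% | 167 | 0.007 | 14.38% | −1.76% | 1.83% |
| 6.3 | 145 | 161 | 0.004 | 11.03% | 152 | 0.009 | 4.83% | 163 | 0.006 | 12.41% | 1.24% | 7.24% |
| 6.4 | 131 | 149 | 0.004 | 13.74% | 147 | 0.009 | 12.21% | 138 | 0.007 | 5.34% | −7.38% | −6.12% |
| 6.5 | 161 | 196 | 0.004 | 21.74% | 190 | 0.009 | 18.01% | 194 | 0.006 | 20.50% | −1.02% | 2.11% |
| Average | | | 0.003 | 12.65% | | 0.012 | 10.42% | | 0.004 | 11.03% | −1.42% | 0.59% |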
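
The percentage columns of Table 3 are not redefined next to the table. The reported values are consistent with plain relative deviations: GAP as the deviation of a heuristic's solution cost from the best known solution BS, and "SBH vs. X" as the deviation of the SBH cost from that of algorithm X. The short Python sketch below checks this reading on row 4.1. The formula is an assumption inferred from the numbers, and the helper name `pct_change` is hypothetical.

```python
def pct_change(value, reference):
    """Relative deviation of `value` from `reference`, in percent.

    Assumed definition, inferred from the reported numbers.
    """
    return 100.0 * (value - reference) / reference


# Row 4.1 of Table 3: best known solution (BS) and the three heuristic costs.
bs, ch, kord, sbh = 429, 463, 458, 471

gap_ch = pct_change(ch, bs)            # reported CH GAP: 7.93%
gap_kord = pct_change(kord, bs)        # reported KORD GAP: 6.76%
gap_sbh = pct_change(sbh, bs)          # reported SBH GAP: 9.79%
sbh_vs_ch = pct_change(sbh, ch)        # reported SBH vs. CH: 1.73%
sbh_vs_kord = pct_change(sbh, kord)    # reported SBH vs. KORD: 2.84%

for label, value in [
    ("CH GAP", gap_ch),
    ("KORD GAP", gap_kord),
    ("SBH GAP", gap_sbh),
    ("SBH vs. CH", sbh_vs_ch),
    ("SBH vs. KORD", sbh_vs_kord),
]:
    print(f"{label}: {value:.2f}%")
```

Under this reading, a negative "SBH vs. X" entry means that SBH found a cheaper cover than algorithm X on that instance.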
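Table 4. Results for instance sets scp.

| Instance | BS | CH SOL | CH TIME | CH GAP | KORD SOL | KORD TIME | KORD GAP | SBH SOL | SBH TIME | SBH GAP | SBH vs. CH | SBH vs. KORD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A.1 | 253 | 288 | 0.008 | 13.83% | 279 | 0.033 | 10.28% | 281 | 0.012 | 11.07% | −2.43% | 0.72% |
| A.2 | 252 | 284 | 0.008 | 12.70% | 276 | 0.035 | 9.52% | 282 | 0.011 | 11.90% | −0.70% | 2.17% |
| A.3 | 232 | 270 | 0.008 | 16.38% | 253 | 0.037 | 9.05% | 253 | 0.012 | 9.05% | −6.30% | 0.00% |
| A.4 | 234 | 278 | 0.008 | 18.80% | 265 | 0.037 | 13.25% | 273 | 0.012 | 16.67% | −1.80% | 3.02% |
| A.5 | 236 | 271 | 0.008 | 14.83% | 255 | 0.033 | 8.05% | 258 | 0.012 | 9.32% | −4.80% | 1.18% |
| B.1 | 69 | 77 | 0.019 | 11.59% | 75 | 0.044 | 8.70% | 75 | 0.034 | 8.70% | −2.60% | 0.00% |
| B.2 | 76 | 86 | 0.018 | 13.16% | 84 | 0.036 | 10.53% | 86 | 0.051 | 13.16% | 0.00% | 2.38% |
| B.3 | 80 | 89 | 0.019 | 11.25% | 85 | 0.039 | 6.25% | 85 | 0.038 | 6.25% | −4.49% | 0.00% |
| B.4 | 79 | 89 | 0.021 | 12.66% | 89 | 0.046 | 12.66% | 87 | 0.035 | 10.13% | −2.25% | −2.25% |
| B.5 | 72 | 78 | 0.019 | 8.33% | 78 | 0.037 | 8.33% | 79 | 0.052 | 9.72% | 1.28% | 1.28% |
| C.1 | 227 | 258 | 0.014 | 13.66% | 254 | 0.059 | 11.89% | 255 | 0.028 | 12.33% | −1.16% | 0.39% |
| C.2 | 219 | 258 | 0.017 | 17.81% | 251 | 0.061 | 14.61% | 249 | 0.023 | 13.70% | −3.49% | −0.80% |
| C.3 | 243 | 276 | 0.014 | 13.58% | 271 | 0.059 | 11.52% | 270 | 0.021 | 11.11% | −2.17% | −0.37% |
| C.4 | 219 | 257 | 0.014 | 17.35% | 252 | 0.059 | 15.07% | 256 | 0.030 | 16.89% | −0.39% | 1.59% |
| C.5 | 215 | 233 | 0.013 | 8.37% | 229 | 0.060 | 6.51% | 230 | 0.026 | 6.98% | −1.29% | 0.44% |
| D.1 | 60 | 74 | 0.049 | 23.33% | 68 | 0.066 | 13.33% | 71 | 0.086 | 18.33% | −4.05% | 4.41% |
| D.2 | 66 | 74 | 0.042 | 12.12% | 70 | 0.070 | 6.06% | 71 | 0.088 | 7.58% | −4.05% | 1.43% |
| D.3 | 72 | 83 | 0.037 | 15.28% | 81 | 0.081 | 12.50% | 79 | 0.104 | 9.72% | −4.82% | −2.47% |
| D.4 | 62 | 71 | 0.042 | 14.52% | 67 | 0.071 | 8.06% | 65 | 0.085 | 4.84% | −8.45% | −2.99% |
| D.5 | 61 | 69 | 0.037 | 13.11% | 70 | 0.070 | 14.75% | 74 | 0.098 | 21.31% | 7.25% | 5.71% |
| E.1 | 5 | 5 | 0.002 | 0.00% | 5 | 0.001 | 0.00% | 5 | 0.005 | 0.00% | 0.00% | 0.00% |
| E.2 | 5 | 5 | 0.003 | 0.00% | 6 | 0.002 | 20.00% | 5 | 0.003 | 0.00% | 0.00% | −16.67% |
| E.3 | 5 | 5 | 0.002 | 0.00% | 5 | 0.002 | 0.00% | 5 | 0.003 | 0.00% | 0.00% | 0.00% |
| E.4 | 5 | 6 | 0.002 | 20.00% | 5 | 0.001 | 0.00% | 5 | 0.005 | 0.00% | −16.67% | 0.00% |
| E.5 | 5 | 5 | 0.002 | 0.00% | 5 | 0.002 | 0.00% | 5 | 0.003 | 0.00% | 0.00% | 0.00% |
| NRE.1 | 29 | 30 | 0.150 | 3.45% | 32 | 0.217 | 10.34% | 30 | 0.772 | 3.45% | 0.00% | −6.25% |
| NRE.2 | 30 | 36 | 0.163 | 20.00% | 34 | 0.202 | 13.33% | 35 | 0.836 | 16.67% | −2.78% | 2.94% |
| NRE.3 | 27 | 31 | 0.145 | 14.81% | 31 | 0.204 | 14.81% | 30 | 0.661 | 11.11% | −3.23% | −3.23% |
| NRE.4 | 28 | 32 | 0.153 | 14.29% | 33 | 0.211 | 17.86% | 31 | 0.622 | 10.71% | −3.13% | −6.06% |
| NRE.5 | 28 | 33 | 0.151 | 17.86% | 31 | 0.202 | 10.71% | 32 | 0.579 | 14.29% | −3.03% | 3.23% |
| NRF.1 | 14 | 16 | 0.324 | 14.29% | 15 | 0.312 | 7.14% | 16 | 2.216 | 14.29% | 0.00% | 6.67% |
| NRF.2 | 15 | 16 | 0.316 | 6.67% | 16 | 0.369 | 6.67% | 16 | 2.544 | 6.67% | 0.00% | 0.00% |
| NRF.3 | 14 | 17 | 0.318 | 21.43% | 15 | 0.328 | 7.14% | 16 | 2.346 | 14.29% | −5.88% | 6.67% |
| NRF.4 | 14 | 17 | 0.322 | 21.43% | 16 | 0.318 | 14.29% | 16 | 2.510 | 14.29% | −5.88% | 0.00% |
| NRF.5 | 13 | 16 | 0.320 | 23.08% | 15 | 0.312 | 15.38% | 15 | 2.465 | 15.38% | −6.25% | 0.00% |
| NRG.1 | 176 | 203 | 0.120 | 15.34% | 197 | 0.545 | 11.93% | 197 | 0.287 | 11.93% | −2.96% | 0.00% |
| NRG.2 | 154 | 182 | 0.136 | 18.18% | 176 | 0.512 | 14.29% | 171 | 0.297 | 11.04% | −6.04% | −2.84% |
| NRG.3 | 166 | 192 | 0.123 | 15.66% | 186 | 0.549 | 12.05% | 186 | 0.322 | 12.05% | −3.13% | 0.00% |
| NRG.4 | 168 | 191 | 0.137 | 13.69% | 191 | 0.518 | 13.69% | 193 | 0.307 | 14.88% | 1.05% | 1.05% |
| NRG.5 | 168 | 194 | 0.120 | 15.48% | 188 | 0.528 | 11.90% | 190 | 0.312 | 13.10% | −2.06% | 1.06% |
| NRH.1 | 63 | 76 | 0.330 | 20.63% | 74 | 0.826 | 17.46% | 72 | 1.453 | 14.29% | −5.26% | −2.70% |
| NRH.2 | 63 | 74 | 0.340 | 17.46% | 72 | 0.824 | 14.29% | 74 | 1.432 | 17.46% | 0.00% | 2.78% |
| NRH.3 | 59 | 65 | 0.335 | 10.17% | 71 | 0.785 | 20.34% | 67 | 1.516 | 13.56% | 3.08% | −5.63% |
| NRH.4 | 58 | 69 | 0.322 | 18.97% | 65 | 0.784 | 12.07% | 65 | 1.610 | 12.07% | −5.80% | 0.00% |
| NRH.5 | 55 | 63 | 0.327 | 14.55% | 61 | 0.779 | 10.91% | 61 | 1.399 | 10.91% | −3.17% | 0.00% |
| Average | | | 0.113 | 13.78% | | 0.230 | 10.83% | | 0.564 | 10.69% | −2.62% | −0.07% |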
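Table 5. Results for instance set rail.

| Instance | BS | CH SOL | CH TIME | CH GAP | SBH SOL | SBH TIME | SBH GAP | SBH vs. CH |
|---|---|---|---|---|---|---|---|---|
| rail507 | 174 | 216 | 0.193 | 24.14% | 199 | 0.277 | 14.37% | −7.87% |
| rail516 | 182 | 204 | 0.160 | 12.09% | 196 | 0.211 | 7.69% | −3.92% |
| rail582 | 211 | 251 | 0.214 | 18.96% | 240 | 0.310 | 13.74% | −4.38% |
| rail2536 | 691 | 894 | 7.276 | 29.38% | 828 | 10.206 | 19.83% | −7.38% |
| rail2586 | 952 | 1166 | 5.521 | 22.48% | 1089 | 8.224 | 14.39% | −6.60% |
| rail4284 | 1065 | 1376 | 8.284 | 29.20% | 1311 | 12.165 | 23.10% | −4.72% |
| rail4872 | 1538 | 1902 | 7.318 | 23.67% | 1790 | 10.199 | 16.38% | −5.89% |
| Average | | | 4.138 | 22.84% | | 5.942 | 15.64% | −5.82% |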
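Table 6. Results for unicost instance sets 4–6.

| Instance | CH SOL | CH TIME | ALTG SOL | ALTG TIME | KORD SOL | KORD TIME | SBH SOL | SBH TIME | SBH vs. CH | SBH vs. ALTG | SBH vs. KORD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4.1 | 41 | 0.003 | 41 | 0.001 | 41 | 0.005 | 42 | 0.003 | 2.44% | 2.44% | 2.44% |
| 4.2 | 41 | 0.002 | 41 | 0.001 | 38 | 0.004 | 42 | 0.002 | 2.44% | 2.44% | 10.53% |
| 4.3 | 43 | 0.002 | 43 | 0.001 | 39 | 0.004 | 43 | 0.002 | 0.00% | 0.00% | 10.26% |
| 4.4 | 44 | 0.002 | 44 | 0.001 | 42 | 0.005 | 45 | 0.002 | 2.27% | 2.27% | 7.14% |
| 4.5 | 44 | 0.002 | 44 | 0.001 | 40 | 0.004 | 41 | 0.002 | −6.82% | −6.82% | 2.50% |
| 4.6 | 43 | 0.003 | 43 | 0.001 | 40 | 0.006 | 42 | 0.002 | −2.33% | −2.33% | 5.00% |
| 4.7 | 43 | 0.002 | 43 | 0.001 | 41 | 0.005 | 43 | 0.003 | 0.00% | 0.00% | 4.88% |
| 4.8 | 42 | 0.002 | 42 | 0.001 | 40 | 0.005 | 39 | 0.003 | −7.14% | −7.14% | −2.50% |
| 4.9 | 42 | 0.002 | 42 | 0.001 | 42 | 0.005 | 42 | 0.003 | 0.00% | 0.00% | 0.00% |
| 4.10 | 43 | 0.002 | 43 | 0.001 | 41 | 0.006 | 41 | 0.002 | −4.65% | −4.65% | 0.00% |
| 5.1 | 37 | 0.007 | 37 | 0.002 | 37 | 0.009 | 38 | 0.005 | 2.70% | 2.70% | 2.70% |
| 5.2 | 38 | 0.005 | 38 | 0.004 | 36 | 0.008 | 37 | 0.007 | −2.63% | −2.63% | 2.78% |
| 5.3 | 37 | 0.004 | 37 | 0.003 | 35 | 0.012 | 38 | 0.005 | 2.70% | 2.70% | 8.57% |
| 5.4 | 39 | 0.003 | 39 | 0.002 | 36 | 0.008 | 37 | 0.004 | −5.13% | −5.13% | 2.78% |
| 5.5 | 37 | 0.004 | 37 | 0.002 | 37 | 0.008 | 37 | 0.007 | 0.00% | 0.00% | 0.00% |
| 5.6 | 40 | 0.004 | 40 | 0.002 | 36 | 0.008 | 37 | 0.005 | −7.50% | −7.50% | 2.78% |
| 5.7 | 38 | 0.005 | 38 | 0.002 | 37 | 0.008 | 36 | 0.006 | −5.26% | −5.26% | −2.70% |
| 5.8 | 39 | 0.005 | 39 | 0.002 | 37 | 0.010 | 39 | 0.005 | 0.00% | 0.00% | 5.41% |
| 5.9 | 38 | 0.003 | 38 | 0.002 | 37 | 0.009 | 39 | 0.005 | 2.63% | 2.63% | 5.41% |
| 5.10 | 39 | 0.003 | 39 | 0.002 | 36 | 0.009 | 38 | 0.004 | −2.56% | −2.56% | 5.56% |
| 6.1 | 23 | 0.004 | 23 | 0.002 | 22 | 0.005 | 23 | 0.006 | 0.00% | 0.00% | 4.55% |
| 6.2 | 22 | 0.005 | 22 | 0.003 | 21 | 0.005 | 21 | 0.006 | −4.55% | −4.55% | 0.00% |
| 6.3 | 23 | 0.005 | 23 | 0.002 | 23 | 0.005 | 23 | 0.007 | 0.00% | 0.00% | 0.00% |
| 6.4 | 22 | 0.004 | 22 | 0.002 | 22 | 0.005 | 23 | 0.008 | 4.55% | 4.55% | 4.55% |
| 6.5 | 23 | 0.005 | 23 | 0.002 | 22 | 0.006 | 23 | 0.006 | 0.00% | 0.00% | 4.55% |
| Average | | 0.003 | | 0.002 | | 0.007 | | 0.004 | −1.15% | −1.15% | 3.49% |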
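Table 7. Results for unicost instance sets scp.

| Instance | CH SOL | CH TIME | ALTG SOL | ALTG TIME | KORD SOL | KORD TIME | SBH SOL | SBH TIME | SBH vs. CH | SBH vs. ALTG | SBH vs. KORD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A.1 | 42 | 0.009 | 42 | 0.004 | 41 | 0.019 | 43 | 0.011 | 2.38% | 2.38% | 4.88% |
| A.2 | 42 | 0.008 | 42 | 0.005 | 41 | 0.020 | 42 | 0.011 | 0.00% | 0.00% | 2.44% |
| A.3 | 43 | 0.009 | 43 | 0.004 | 41 | 0.020 | 42 | 0.011 | −2.33% | −2.33% | 2.44% |
| A.4 | 41 | 0.008 | 41 | 0.005 | 39 | 0.018 | 41 | 0.011 | 0.00% | 0.00% | 5.13% |
| A.5 | 43 | 0.007 | 43 | 0.004 | 41 | 0.017 | 41 | 0.011 | −4.65% | −4.65% | 0.00% |
| B.1 | 24 | 0.019 | 24 | 0.010 | 23 | 0.027 | 23 | 0.044 | −4.17% | −4.17% | 0.00% |
| B.2 | 23 | 0.020 | 23 | 0.013 | 24 | 0.028 | 22 | 0.038 | −4.35% | −4.35% | −8.33% |
| B.3 | 23 | 0.019 | 23 | 0.011 | 23 | 0.026 | 23 | 0.036 | 0.00% | 0.00% | 0.00% |
| B.4 | 24 | 0.024 | 24 | 0.011 | 23 | 0.031 | 23 | 0.037 | −4.17% | −4.17% | 0.00% |
| B.5 | 25 | 0.021 | 25 | 0.011 | 24 | 0.029 | 24 | 0.038 | −4.00% | −4.00% | 0.00% |
| C.1 | 47 | 0.015 | 47 | 0.008 | 46 | 0.041 | 46 | 0.023 | −2.13% | −2.13% | 0.00% |
| C.2 | 47 | 0.018 | 47 | 0.009 | 47 | 0.037 | 45 | 0.023 | −4.26% | −4.26% | −4.26% |
| C.3 | 47 | 0.017 | 47 | 0.007 | 46 | 0.038 | 46 | 0.023 | −2.13% | −2.13% | 0.00% |
| C.4 | 46 | 0.013 | 46 | 0.008 | 45 | 0.036 | 46 | 0.023 | 0.00% | 0.00% | 2.22% |
| C.5 | 47 | 0.013 | 47 | 0.012 | 46 | 0.040 | 46 | 0.023 | −2.13% | −2.13% | 0.00% |
| D.1 | 27 | 0.036 | 27 | 0.020 | 26 | 0.047 | 27 | 0.078 | 0.00% | 0.00% | 3.85% |
| D.2 | 26 | 0.037 | 26 | 0.021 | 26 | 0.048 | 27 | 0.082 | 3.85% | 3.85% | 3.85% |
| D.3 | 27 | 0.040 | 27 | 0.020 | 27 | 0.049 | 26 | 0.077 | −3.70% | −3.70% | −3.70% |
| D.4 | 26 | 0.038 | 26 | 0.020 | 26 | 0.048 | 27 | 0.080 | 3.85% | 3.85% | 3.85% |
| D.5 | 27 | 0.039 | 27 | 0.020 | 26 | 0.050 | 27 | 0.091 | 0.00% | 0.00% | 3.85% |
| E.1 | 5 | 0.002 | 5 | 0.001 | 5 | 0.001 | 5 | 0.003 | 0.00% | 0.00% | 0.00% |
| E.2 | 5 | 0.002 | 5 | 0.001 | 6 | 0.001 | 5 | 0.004 | 0.00% | 0.00% | −16.67% |
| E.3 | 5 | 0.002 | 5 | 0.001 | 5 | 0.001 | 5 | 0.003 | 0.00% | 0.00% | 0.00% |
| E.4 | 6 | 0.002 | 6 | 0.001 | 5 | 0.001 | 5 | 0.004 | −16.67% | −16.67% | 0.00% |
| E.5 | 5 | 0.002 | 5 | 0.001 | 5 | 0.001 | 5 | 0.003 | 0.00% | 0.00% | 0.00% |
| NRE.1 | 18 | 0.144 | 18 | 0.089 | 18 | 0.178 | 18 | 0.577 | 0.00% | 0.00% | 0.00% |
| NRE.2 | 18 | 0.150 | 18 | 0.088 | 18 | 0.188 | 18 | 0.570 | 0.00% | 0.00% | 0.00% |
| NRE.3 | 18 | 0.145 | 18 | 0.089 | 18 | 0.172 | 18 | 0.560 | 0.00% | 0.00% | 0.00% |
| NRE.4 | 18 | 0.142 | 18 | 0.087 | 18 | 0.174 | 18 | 0.552 | 0.00% | 0.00% | 0.00% |
| NRE.5 | 18 | 0.148 | 18 | 0.088 | 18 | 0.180 | 18 | 0.551 | 0.00% | 0.00% | 0.00% |
| NRF.1 | 11 | 0.311 | 11 | 0.201 | 11 | 0.321 | 11 | 2.513 | 0.00% | 0.00% | 0.00% |
| NRF.2 | 11 | 0.309 | 11 | 0.214 | 11 | 0.315 | 11 | 2.609 | 0.00% | 0.00% | 0.00% |
| NRF.3 | 11 | 0.307 | 11 | 0.211 | 11 | 0.309 | 11 | 2.560 | 0.00% | 0.00% | 0.00% |
| NRF.4 | 11 | 0.299 | 11 | 0.203 | 11 | 0.339 | 11 | 2.313 | 0.00% | 0.00% | 0.00% |
| NRF.5 | 11 | 0.309 | 11 | 0.204 | 11 | 0.306 | 11 | 2.320 | 0.00% | 0.00% | 0.00% |
| NRG.1 | 65 | 0.116 | 65 | 0.077 | 64 | 0.463 | 64 | 0.262 | −1.54% | −1.54% | 0.00% |
| NRG.2 | 65 | 0.115 | 65 | 0.125 | 65 | 0.402 | 65 | 0.258 | 0.00% | 0.00% | 0.00% |
| NRG.3 | 66 | 0.125 | 66 | 0.110 | 64 | 0.442 | 64 | 0.273 | −3.03% | −3.03% | 0.00% |
| NRG.4 | 66 | 0.124 | 66 | 0.136 | 65 | 0.437 | 65 | 0.279 | −1.52% | −1.52% | 0.00% |
| NRG.5 | 66 | 0.115 | 66 | 0.076 | 64 | 0.490 | 64 | 0.271 | −3.03% | −3.03% | 0.00% |
| NRH.1 | 36 | 0.340 | 36 | 0.217 | 36 | 0.712 | 35 | 1.460 | −2.78% | −2.78% | −2.78% |
| NRH.2 | 36 | 0.327 | 36 | 0.247 | 35 | 0.658 | 35 | 1.424 | −2.78% | −2.78% | 0.00% |
| NRH.3 | 36 | 0.323 | 36 | 0.236 | 35 | 0.640 | 35 | 1.458 | −2.78% | −2.78% | 0.00% |
| NRH.4 | 36 | 0.334 | 36 | 0.216 | 35 | 0.653 | 35 | 1.436 | −2.78% | −2.78% | 0.00% |
| NRH.5 | 36 | 0.324 | 36 | 0.211 | 35 | 0.644 | 35 | 1.427 | −2.78% | −2.78% | 0.00% |
| Average | | 0.110 | | 0.075 | | 0.193 | | 0.544 | −1.50% | −1.50% | −0.07% |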
Table 8. Results for unicost instance sets rail.

| Instance | CH SOL | CH TIME | ALTG SOL | ALTG TIME | KORD SOL | KORD TIME | SBH SOL | SBH TIME | SBH vs. CH | SBH vs. ALTG | SBH vs. KORD |
|---|---|---|---|---|---|---|---|---|---|---|---|
| rail2536 | 894 | 7.263 | 975 | 5.561 | 821 | 126.091 | 847 | 10.030 | −5.26% | −13.13% | 3.17% |
| rail2586 | 1166 | 5.562 | 1253 | 4.539 | 1112 | 172.448 | 1139 | 7.300 | −2.32% | −9.10% | 2.43% |
| rail4284 | 1376 | 8.372 | 1563 | 6.637 | 1285 | 260.187 | 1339 | 12.740 | −2.69% | −14.33% | 4.20% |
| rail4872 | 1902 | 7.399 | 2137 | 6.178 | 1848 | 315.863 | 1860 | 11.312 | −2.21% | −12.96% | 0.65% |
| rail507 | 216 | 0.193 | 237 | 0.144 | 211 | 1.276 | 211 | 0.267 | −2.31% | −10.97% | 0.00% |
| rail516 | 204 | 0.156 | 259 | 0.121 | 232 | 1.432 | 211 | 0.218 | 3.43% | −18.53% | −9.05% |
| rail582 | 251 | 0.215 | 289 | 0.148 | 265 | 1.729 | 255 | 0.300 | 1.59% | −11.76% | −3.77% |
| Average | | 4.166 | | 3.333 | | 125.575 | | 6.024 | −1.39% | −12.97% | −0.34% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


Adamo, T.; Ghiani, G.; Guerriero, E.; Pareo, D. A Surprisal-Based Greedy Heuristic for the Set Covering Problem. Algorithms 2023, 16, 321. https://doi.org/10.3390/a16070321
