An Exact Algorithm for Minimum Vertex Cover Problem

Abstract: In this paper, we propose a branch-and-bound algorithm that solves the minimum vertex cover (MVC) problem exactly. Since a tight lower bound for MVC has a significant influence on the efficiency of a branch-and-bound algorithm, we define two novel lower bounds to help prune the search space. One is based on the degrees of vertices, and the other is based on MaxSAT reasoning. Experiments confirm that our algorithm is faster than previous exact algorithms and can find better results than heuristic algorithms.


Introduction
Let G = (V, E) be an undirected graph, where V is the set of vertices and E is the set of edges. A vertex cover (VC) is a subset S ⊆ V such that each edge in G has at least one endpoint in S. The minimum vertex cover (MVC) problem is to find a vertex cover of minimum size in a graph. The MVC problem is a classical NP-hard problem (its decision version is NP-complete) that plays an important role in many practical applications, such as network security, parallel machine scheduling [1], financial networks [2], and economics [3]. The MVC problem is closely related to the well-known maximum clique (MC) problem: a minimum vertex cover of the graph G can be obtained by finding a maximum clique of the complement graph of G (denoted by Ḡ) and taking the remaining vertices. Unfortunately, when G is large and sparse, Ḡ is dense, and searching for a maximum clique in Ḡ is very hard in practice, while it is usually relatively easy to find a minimum vertex cover in G. Thus, an algorithm that solves MVC directly is needed.
A considerable number of swarm intelligence algorithms exist for the MVC problem. For example, the work in [4] provided an algorithm for MVC based on ant colony optimization, and the work in [5] gave an algorithm based on simulated annealing. In addition to these, many other algorithms have been developed. The work in [6] proposed an algorithm using DNA molecules. The work in [7] adopted a quantum algorithm for MVC. The work in [8] presented a deterministic data structure with approximation ratio (2 + ε) for the MVC problem, where ε is a small constant. The work in [9] presented a learning automaton-based algorithm. The work in [10] integrated reinforcement learning to solve the MVC problem.
Compared with the algorithms mentioned above, exact algorithms can prove the optimality of their solutions. Studies on exact algorithms have mainly concentrated on fixed-parameter algorithms; we highlight those described in [11][12][13][14][15][16]. Although these studies have made tremendous theoretical progress, there is still a huge gap between theory and practice. To the best of our knowledge, SBMS [17] is the only exact algorithm with an empirical evaluation, and it is especially good at solving the minimum weight vertex cover (MWVC) problem. The satisfiability problem (SAT) is to decide whether a Boolean formula is satisfiable. SBMS solves MWVC with efficient SAT solvers.
In this paper, we introduce a new exact algorithm, EMVC, for solving the MVC problem, which is based on the branch-and-bound (BnB) search scheme. In EMVC, we provide two approaches, DegLB and SatLB, to compute the lower bound. DegLB is based on the degrees of vertices, and SatLB is based on MaxSAT reasoning. We carry out experiments on the DIMACS benchmark set, which originates from the DIMACS Implementation Challenge on the maximum clique problem. The results show that EMVC is an extremely competitive exact solver.
In Section 2, we introduce some basic concepts related to the MVC problem. We present the framework of the EMVC algorithm in Section 3. In Section 4, we present the two novel methods for calculating lower bounds to prune the search. Section 5 reports the experimental results of EMVC on instances from the DIMACS Implementation Challenge, which validate the effectiveness of the proposed ideas. Finally, we give some conclusions and directions for future work.

Basic Definition and Notation
Given an undirected graph G = (V, E), where V = {v_1, v_2, ..., v_n} is a set of vertices and E = {e_1, e_2, ..., e_m} is a set of edges, a vertex cover (VC) is a subset S ⊆ V such that each edge in G has at least one endpoint in S. An edge is said to be covered if at least one of its endpoints is in the vertex cover S. The minimum vertex cover (MVC) problem is to find a vertex cover of minimum size in a graph. For a vertex v in G, the neighborhood of v, denoted by N(v), is the set containing all vertices adjacent to v, i.e., N(v) = {u ∈ V | (u, v) ∈ E}. The degree of a vertex v, denoted by d(v), is the number of edges incident to v. The density D of G is computed as 2 × |E|/(|V| × (|V| − 1)). G \ T is the sub-graph derived from G by removing all vertices of T and all edges with at least one endpoint in T. An independent set I of G is a subset of vertices such that no two vertices in I are adjacent, i.e., for any (u, v) ∈ I × I, (u, v) ∉ E. The maximum independent set (MIS) problem is to find an independent set of maximum size in a graph. A clique C is a subset of vertices of G in which every two vertices are adjacent, i.e., for any (u, v) ∈ C × C with u ≠ v, (u, v) ∈ E. The maximum clique (MC) problem consists of finding a clique with the maximum number of vertices. The complement graph of G is Ḡ = (V, Ē), where Ē = {(u, v) | u, v ∈ V, u ≠ v, (u, v) ∉ E}. A clique of G is an independent set of Ḡ and vice versa.
Example 1. Figure 1 gives an undirected graph
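The definitions above can be checked mechanically on a small graph. The following Python sketch is illustrative only: the function names are hypothetical, and the five-vertex cycle used as input is an assumption chosen to be consistent with the worked examples later in the text (Figure 1 itself is not reproduced here).

```python
from itertools import combinations

def is_vertex_cover(edges, subset):
    """True if every edge has at least one endpoint in `subset`."""
    return all(u in subset or v in subset for u, v in edges)

def is_independent_set(edges, subset):
    """True if no edge has both endpoints in `subset`."""
    return not any(u in subset and v in subset for u, v in edges)

def density(vertices, edges):
    """Density D = 2|E| / (|V| (|V| - 1))."""
    n = len(vertices)
    return 2 * len(edges) / (n * (n - 1))

def complement_edges(vertices, edges):
    """Edges of the complement graph: all unordered pairs that are not in E."""
    present = {frozenset(e) for e in edges}
    return [p for p in combinations(sorted(vertices), 2)
            if frozenset(p) not in present]

def brute_force_mvc_size(vertices, edges):
    """Smallest k such that some k-subset is a vertex cover (exponential time)."""
    for k in range(len(vertices) + 1):
        for candidate in combinations(vertices, k):
            if is_vertex_cover(edges, set(candidate)):
                return k

# Assumed example graph: a cycle on five vertices (not taken from Figure 1).
V = [1, 2, 3, 4, 5]
E = [(1, 2), (1, 3), (2, 4), (3, 5), (4, 5)]
cover = {1, 3, 4}
print(is_vertex_cover(E, cover))               # S covers every edge
print(is_independent_set(E, set(V) - cover))   # V \ S is an independent set
print(brute_force_mvc_size(V, E))
```

On this assumed graph, the complement of the cover {1, 3, 4} is the independent set {2, 5}, which is in turn a clique of the complement graph. The brute-force search is only meant to make the definitions concrete, since its running time is exponential in |V|.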

EMVC Algorithm for Solving the MVC Problem
In this section, we propose the EMVC algorithm, which employs the branch-and-bound framework. The EMVC algorithm explores the search tree in a depth-first manner: it searches for the optimal solution by recursively branching on a vertex, thus creating two new nodes, until there is no vertex left in G. In other words, the algorithm initially considers the complete solution space. When it branches on a vertex, the solution space is divided into smaller subsets, and the algorithm computes upper and lower bounds for each node to further reduce the search space.
The outline of the algorithm is presented in Algorithm 1. The algorithm takes the graph G, the upper bound UB, and a growing partial vertex cover C as inputs. The upper bound UB, which overestimates the size of the minimum vertex cover, is the size of the smallest vertex cover found so far. When the size of the current partial vertex cover plus the lower bound of the sub-graph is equal to or greater than UB, we prune the search and return UB as the size of the minimum vertex cover (Lines 1-2). In the algorithm, we use three methods to compute the lower bound of the sub-graph: DegLB, SatLB, and ClqLB. DegLB and SatLB are the two new methods, which we describe in the next section. ClqLB uses cliques to derive the lower bound [18]. It decomposes the graph G into a set of disjoint cliques C_1, C_2, ..., C_r. Since any vertex cover must contain all but at most one vertex of each clique, the lower bound of G is $\mathrm{ClqLB} = \sum_{i=1}^{r} (|C_i| - 1) = |V| - r$, where r is the number of cliques. If there is no vertex in G, the current partial vertex cover is the minimum vertex cover S_min found so far (Lines 3-4). Otherwise, EMVC extends the current partial vertex cover by selecting a vertex v with the maximum degree and generates two branches, i.e., two relaxed graphs G \ N*(v) and G \ v (Lines 5-7). Finally, EMVC repeats until it finds the best solution (Line 8).
Input: a graph G = (V, E), an upper bound UB = |V|, and a growing partial vertex cover C = ∅.
Output: the size of the minimum vertex cover S_min of G.
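The branch-and-bound scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Java implementation: the graph is a dictionary mapping each vertex to its set of neighbors, only the clique-based bound ClqLB is used (with a simple greedy clique partition), N*(v) is assumed to mean N(v) together with v, and all function names are hypothetical.

```python
def greedy_clique_partition(adj):
    """Greedily split the vertices into disjoint cliques (one possible heuristic)."""
    cliques = []
    for v in sorted(adj, key=lambda x: -len(adj[x])):
        for clique in cliques:
            if all(u in adj[v] for u in clique):
                clique.append(v)
                break
        else:
            cliques.append([v])
    return cliques

def clq_lb(adj):
    """ClqLB = |V| - r: a cover needs all but at most one vertex of each clique."""
    return len(adj) - len(greedy_clique_partition(adj))

def remove_vertices(adj, removed):
    """Sub-graph obtained by deleting `removed` and all incident edges."""
    return {v: nbrs - removed for v, nbrs in adj.items() if v not in removed}

def emvc(adj, ub, cover_size=0):
    """Return min(ub, size of the best cover reachable from this node)."""
    if cover_size + clq_lb(adj) >= ub:   # prune: this node cannot improve on UB
        return ub
    if not adj:                          # no vertex left: a smaller cover was found
        return cover_size
    v = max(adj, key=lambda x: len(adj[x]))   # branch on a maximum-degree vertex
    nbrs = adj[v]
    # Branch 1: v is outside the cover, so all its neighbors must be inside it.
    ub = emvc(remove_vertices(adj, nbrs | {v}), ub, cover_size + len(nbrs))
    # Branch 2: v is inside the cover.
    ub = emvc(remove_vertices(adj, {v}), ub, cover_size + 1)
    return ub

# Assumed example graph: a cycle on five vertices.
cycle5 = {1: {2, 3}, 2: {1, 4}, 3: {1, 5}, 4: {2, 5}, 5: {3, 4}}
print(emvc(cycle5, ub=len(cycle5)))
```

The sketch tracks only the size of the partial cover, matching the stated output of Algorithm 1. Copying the sub-graph at every node keeps the code short; an efficient solver would more likely update the graph incrementally and undo the changes on backtracking.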

Two Novel Lower Bounds for the MVC Problem
The lower bound is an important aspect of the branch-and-bound algorithm because explicit enumeration is normally impossible due to the exponentially increasing number of potential solutions. The use of a lower bound lets the branch-and-bound algorithm search only parts of the solution space explicitly and handle the rest implicitly. Thus, designing a tight lower bound is crucial to enhance the efficiency of the branch-and-bound algorithm. In this section, we provide two novel methods to compute the lower bound.

DegLB: The Degree-Based Lower Bound for MVC
The first lower bound, DegLB, is computed from the degrees of vertices. We select the vertex with the largest degree, name it v_1, and update the degrees of the other vertices. Then, we select the vertex with the largest degree among the remaining vertices and name it v_2, and so on. We find a vertex v_i such that $\sum_{j=1}^{i} d(v_j) \le |E| < \sum_{j=1}^{i+1} d(v_j)$. Then, the lower bound of G for the MVC problem is defined as $\mathrm{DegLB} = i + |E'| / d(v_{i+1})$, where $|E'| = |E| - \sum_{j=1}^{i} d(v_j)$ is the number of edges that are not yet covered.
Since the number of edges covered by the i vertices is equal to or smaller than |E|, there may exist some edges that are not covered. Thus, we need at least |E'| / d(v_{i+1}) vertices to cover the remaining |E'| edges.
Example 2. Figure 1 illustrates an undirected graph G. In G, we sort the vertices by their dynamic degrees. We find that d(v_1) = 2, so we choose v_1 as the first vertex and update the degrees of the other vertices. Then, we choose v_4 as the second vertex and update the degrees again. The lower bound of G obtained by DegLB is three, where i = 3 and |E'| = 0.
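The DegLB procedure can be sketched in Python as follows, under several assumptions that are not stated in the text: ties between equal degrees are broken by dictionary order, the fractional term |E'| / d(v_{i+1}) is rounded up, and the hypothetical flag `update_degrees` switches between the dynamic degrees described above and the original (static) degrees. The input is again an assumed five-vertex cycle. The sketch only mirrors the described steps and does not by itself establish that the returned value is a valid bound for every graph.

```python
import math

def selection_degrees(adj, update_degrees=True):
    """Degrees d(v_1), d(v_2), ... in the order in which vertices are selected."""
    work = {v: set(nbrs) for v, nbrs in adj.items()}   # private copy
    degrees = []
    while work:
        v = max(work, key=lambda x: len(work[x]))      # largest current degree
        degrees.append(len(work[v]))
        neighbors = work.pop(v)
        if update_degrees:                             # dynamic degrees
            for u in neighbors:
                work[u].discard(v)
    return degrees

def deg_lb(adj, update_degrees=True):
    """DegLB = i + ceil(|E'| / d(v_{i+1})); the ceiling is an assumption."""
    m = sum(len(nbrs) for nbrs in adj.values()) // 2   # |E|
    degrees = selection_degrees(adj, update_degrees)
    covered, i = 0, 0
    # Largest i with d(v_1) + ... + d(v_i) <= |E|.
    while i < len(degrees) and degrees[i] > 0 and covered + degrees[i] <= m:
        covered += degrees[i]
        i += 1
    remaining = m - covered                            # |E'|
    if remaining == 0:
        return i
    return i + math.ceil(remaining / degrees[i])

# Assumed example graph: a cycle on five vertices.
cycle5 = {1: {2, 3}, 2: {1, 4}, 3: {1, 5}, 4: {2, 5}, 5: {3, 4}}
print(selection_degrees(cycle5))   # degrees in selection order
print(deg_lb(cycle5))
```

With dynamic degrees, the sketch selects vertex 1, then vertex 4, then vertex 3 on this graph, which gives i = 3 and |E'| = 0, in line with Example 2.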

SatLB: The Max-SAT-Based Lower Bound for MVC
Before describing the lower bound SatLB, we introduce some notions. A variable x may take the value zero (false) or one (true). A literal l is a variable x or its negation ¬x. A clause is a disjunction of literals, e.g., c = l_1 ∨ l_2 ∨ ... ∨ l_k. A unit clause is a clause containing only one literal. A conjunctive normal form (CNF) formula F is a conjunction of clauses, denoted as F = c_1 ∧ c_2 ∧ ... ∧ c_m. A truth assignment is a map that assigns a value to each variable. A clause is satisfied iff at least one of its literals takes the value true. A partial MaxSAT formula is a conjunction of hard and soft clauses, where a hard clause is a clause that must be satisfied, while a soft clause is a clause that may or may not be satisfied. The partial MaxSAT problem is to determine a truth assignment that satisfies all hard clauses and maximizes the number of satisfied soft clauses. A subset of soft clauses is called an inconsistent subset iff the subset together with all hard clauses results in a contradiction (an empty clause).
Unit propagation (UP), which is based on unit clauses, is a successful technique widely used in SAT and MaxSAT. Since each clause needs to be satisfied, the single literal of a unit clause must be true. Thus, if a set of clauses contains the unit clause l, the other clauses are simplified by applying the following UP rule:

• Every clause (other than the unit clause itself) containing l is removed;
• In every clause that contains ¬l, the literal ¬l is deleted.
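The UP rule can be sketched in Python as follows. The representation is an assumption made for illustration: a literal is a non-zero integer (x_i is written i and its negation -i), a clause is a collection of literals, and the hypothetical function `unit_propagate` also removes the unit clause itself and records its literal in a list instead of keeping it in the formula.

```python
def unit_propagate(clauses):
    """Apply the UP rule until no unit clause is left or an empty clause appears.

    Returns (remaining_clauses, forced_literals, conflict), where `conflict`
    is True iff an empty clause was produced.
    """
    clauses = [frozenset(c) for c in clauses]
    forced = []                                  # literals forced to true, in order
    while True:
        if any(len(c) == 0 for c in clauses):    # empty clause: contradiction
            return clauses, forced, True
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:                         # nothing left to propagate
            return clauses, forced, False
        (lit,) = unit
        forced.append(lit)
        simplified = []
        for c in clauses:
            if lit in c:
                continue                         # clause containing l is removed
            simplified.append(c - {-lit})        # the negation of l is deleted
        clauses = simplified

# Hypothetical formula: x4, (not x2 or not x4), (not x4 or not x5), (x1 or x2), (x3 or x5)
formula = [[4], [-2, -4], [-4, -5], [1, 2], [3, 5]]
remaining, forced, conflict = unit_propagate(formula)
print(forced, conflict)
```

Scanning the whole clause list for a unit clause in every round keeps the sketch short. The stack-based formulation mentioned in the text avoids this rescanning.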
In the following, we introduce a lower bound for the MVC problem based on MaxSAT reasoning. First, we reduce a graph to a MaxSAT instance. Given a graph G = (V, E), G can be partitioned into a set of cliques {C_1, C_2, ..., C_k}, where C_1 ∪ C_2 ∪ ... ∪ C_k = V and C_i ∩ C_j = ∅ (i ≠ j). The graph is encoded into a partial MaxSAT instance as follows.
(1) Each vertex v_i is encoded into a variable x_i.
(2) Each pair of connected vertices (v_i, v_j) ∈ E is represented by a hard clause ¬x_i ∨ ¬x_j, because an independent set cannot contain both endpoints of an edge.
(3) Each clique C_i is represented by a soft clause consisting of the disjunction of the variables of its vertices.
Then, we compute the inconsistent subsets of the transformed MaxSAT instance using UP. Specifically, given a CNF formula F = c_1 ∧ c_2 ∧ ... ∧ c_m, where c_i (1 ≤ i ≤ m) is a clause (hard or soft) of the transformed MaxSAT instance and m is the total number of hard and soft clauses, we use a stack ST to store all unit clauses, including the ones generated by the UP rule. In the beginning, ST stores all unit clauses in F. Then, we repeatedly pop a unit clause from ST and apply the UP rule, which may produce new unit clauses, until the stack is empty or an empty clause is produced. The soft clauses used to derive the empty clause form an inconsistent subset. After finding an inconsistent subset, its soft clauses are removed from F, and we continue to search for other inconsistent subsets in the remaining formula until none can be found. UP can be enhanced by failed literal detection [19]. A literal l is called a failed literal in the formula F if applying the UP rule to F ∧ l produces an empty clause. When both l and ¬l are failed literals, the union of the soft clauses used to generate the two empty clauses constitutes an inconsistent subset. For the MaxSAT instance encoding an MVC instance, one does not need to test whether a negative literal has failed, because a variable only occurs positively in soft clauses.
After obtaining the number of inconsistent subsets, we can compute the lower bound of G for the MVC problem as follows.
Theorem 1. Let G = (V, E) be a graph. If G can be partitioned into k disjoint cliques and s disjoint inconsistent subsets are found in the transformed MaxSAT instance, then a lower bound of G for the MVC problem is SatLB = |V| − k + s.
Proof. Since |S_MVC| = |V| − |S_MIS|, where S_MVC and S_MIS are optimal solutions of the MVC and MIS problems, respectively, we can compute a lower bound for the MVC problem after obtaining an upper bound for the MIS problem. If G can be partitioned into k disjoint cliques, an independent set contains at most one vertex from each clique, so k is an upper bound for the MIS problem. Assuming that an instance contains k soft clauses, the upper bound is less than or equal to k − s if s disjoint inconsistent clause sets can be detected. After using MaxSAT reasoning, we can find s inconsistent subsets, which further decreases the upper bound of MIS to k − s. Thus, the lower bound of the MVC problem is |V| − (k − s) = |V| − k + s.
Example 3. Let us also employ Figure 1 to illustrate how to obtain a lower bound using the SatLB method based on MaxSAT reasoning. We first encode the graph in Figure 1 into a partial MaxSAT formula. Since the graph can be partitioned into three disjoint cliques {v_1, v_2}, {v_3, v_5}, and {v_4}, the transformed partial MaxSAT formula is composed of the hard clauses of the encoding above and the soft clauses c_1 = x_1 ∨ x_2, c_2 = x_3 ∨ x_5, and c_3 = x_4, and the stack ST is initialized to c_3. Then, we apply the UP rule to c_3. Thus, x_4 is true. We then remove x_4 from the hard clauses. Since hard clauses must be satisfied, x_2 and x_5 are false. At this point, the soft clauses become c_1 = x_1 and c_2 = x_3. When x_1 is true, x_2 and x_3 must be false to satisfy the hard clauses. There is one conflict at this point because the soft clauses cannot all be satisfied at the same time. Therefore, the upper bound on the size of the independent set is narrowed to 3 − 1 = 2. Thus, the lower bound of MVC is 5 − 3 + 1 = 3.
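The computation in Example 3 can be sketched in Python as follows. The sketch makes several assumptions: the input is a five-vertex cycle chosen to be consistent with the example, the clique partition is a simple greedy one, hard clauses are handled implicitly by setting all neighbors of a true vertex to false, and failed literal detection is omitted. All function names are hypothetical.

```python
def greedy_clique_partition(adj):
    """Greedily split the vertices into disjoint cliques (one possible heuristic)."""
    cliques = []
    for v in sorted(adj, key=lambda x: -len(adj[x])):
        for clique in cliques:
            if all(u in adj[v] for u in clique):
                clique.append(v)
                break
        else:
            cliques.append([v])
    return cliques

def find_conflict(adj, cliques, active):
    """Unit propagation over the active soft clauses (one per clique).

    Returns the indices of the soft clauses used to derive an empty clause,
    or None if propagation stops without a contradiction.
    """
    why = {}   # vertex -> (value, soft clauses used to derive that value)
    while True:
        unit = None
        for i in sorted(active):
            if any(why.get(v, (False,))[0] for v in cliques[i]):
                continue                          # soft clause already satisfied
            free = [v for v in cliques[i] if v not in why]
            used = frozenset({i}).union(
                *(why[v][1] for v in cliques[i] if v in why))
            if not free:
                return used                       # empty clause: inconsistent subset
            if len(free) == 1:
                unit = (free[0], used)            # soft clause became a unit clause
                break
        if unit is None:
            return None
        v, used = unit
        why[v] = (True, used)
        for u in adj[v]:                          # hard clauses force neighbors false
            why.setdefault(u, (False, used))

def sat_lb(adj):
    """SatLB = |V| - k + s for k cliques and s disjoint inconsistent subsets."""
    cliques = greedy_clique_partition(adj)
    active = set(range(len(cliques)))
    s = 0
    while True:
        conflict = find_conflict(adj, cliques, active)
        if conflict is None:
            break
        active -= conflict                        # remove the subset, keep searching
        s += 1
    return len(adj) - len(cliques) + s

# Assumed example graph: a cycle on five vertices.
cycle5 = {1: {2, 3}, 2: {1, 4}, 3: {1, 5}, 4: {2, 5}, 5: {3, 4}}
print(greedy_clique_partition(cycle5))
print(sat_lb(cycle5))
```

On the assumed graph, the greedy partition produces the cliques {1, 2}, {3, 5}, and {4}, and propagation from the unit soft clause of {4} empties the soft clause of {3, 5}. This yields one inconsistent subset and the bound 5 − 3 + 1 = 3.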

Experimental Results
In this section, we carry out extensive experiments to test the performance of our algorithm EMVC on DIMACS instances obtained from NetworkRepository [20]. We compare EMVC with the exact algorithm SBMS and the heuristic algorithm FastVC. FastVC is the most competitive heuristic solver for MVC so far [21]. SBMS is the only existing exact solver for the MVC problem, and it encodes MVC into SAT [17]. In the experiments, the cutoff time of all solvers was set to 1000 seconds. For the heuristic solver FastVC, we report the best solution (best) and the average time (avgt) over 10 runs. For the exact solvers, we record the running time on each instance and the size of the minimum vertex cover of each instance, denoted by S_m. For the sake of space, we only show the results on graphs with more than 200 vertices and densities greater than 0.7, because dense graphs are more challenging for the MVC problem. We implemented our algorithm in Java and compiled it using JDK 1.7. All experiments were run on CentOS Linux with a 3.1 GHz CPU and 64 GB of memory.
Experimental results on the classical random benchmark are shown in Table 1, which compares the three algorithms. In the table, "↑" means that the heuristic algorithm did not find the optimal value, the runtimes are in seconds, and "-" means timeout or out of memory. From the table, we see that EMVC found the optimum for most graphs, while SBMS failed on 27 instances and FastVC could not obtain the optimum on four instances. In addition, although the runtime of EMVC was longer than that of FastVC, FastVC cannot guarantee an optimal result, and EMVC solved these instances in a reasonable time. In general, these results indicate that EMVC is an extremely competitive exact solver.

Summary and Future Work
In this paper, we proposed a new exact algorithm EMVC for MVC. In the algorithm, we defined two tight lower bounds to reduce the search space. Extensive experiments showed that our algorithm achieved good performance. In the future, we will study variants of the reduction rules to improve the performance of the MVC solver.

Conflicts of Interest:
The authors declare no conflict of interest.