TIVC: An Efficient Local Search Algorithm for Minimum Vertex Cover in Large Graphs

The minimum vertex cover (MVC) problem is a canonical NP-hard combinatorial optimization problem that asks for the smallest set of vertices such that every edge has at least one endpoint in the set. This problem has extensive applications in cybersecurity, scheduling, and monitoring link failures in wireless sensor networks (WSNs). Numerous local search algorithms have been proposed to obtain “good” vertex covers. However, because the problem is NP-hard, solving MVC efficiently is challenging, especially on large graphs. In this paper, we propose an efficient local search algorithm for MVC called TIVC, which is based on two main ideas: a 3-improvements (TI) framework with a tiny perturbation, and an edge selection strategy. We conducted experiments on real-world large instances of a massive graph benchmark. Compared with three state-of-the-art MVC algorithms, TIVC shows superior accuracy and finds significantly smaller vertex covers on many graphs.


Introduction
Given an undirected graph G = (V, E), a vertex cover (VC) C ⊆ V of G is a subset of vertices such that every edge e ∈ E has at least one endpoint belonging to C. The minimum vertex cover (MVC) problem is to find a VC with the smallest size in a graph; it is a classical NP-hard problem that is also NP-hard to approximate within a factor of 1.3606 [1]. MVC plays an important role in graph theory because of its extensive applications, including scheduling [2], cybersecurity [3], and wireless sensor networks (WSNs) [4]. For example, a WSN can be modeled as an undirected graph in which vertices and edges represent infrastructures and communication links, respectively. Then, elements in VCs can be used for various purposes such as monitoring link failures, facility location, clustering, and data aggregation, since each communication link (edge) is incident with at least one vertex in a VC. Figure 1 shows an example of simulating a wireless sensor network using a unit disk graph [5,6], where each vertex is the center of a circle and there is an edge between it and each of the other vertices within its radius. Another, more specific example involves the installation of electronic cameras for a city road network to ensure that each road segment is monitored by at least one camera, thereby observing the traffic conditions. To minimize costs, it is necessary to deploy the fewest possible electronic cameras, which is equivalent to finding an MVC in the graph representing the city road network.

Background
The MVC problem has been extensively studied, and many algorithms have been proposed, including exact and approximate algorithms. Regarding exact algorithms, branch-and-reduce methods currently have the best time complexity [7,8]. However, exact algorithms still take exponential time and therefore cannot solve the MVC problem in a reasonable time, especially on large graphs. Therefore, approximate algorithms have been proposed to solve MVC. Greedy algorithms are a common method for approximately solving intractable problems, such as connected dominating sets [9], weighted vertex covers [10], and independent sets [11]. While greedy algorithms can quickly produce feasible solutions, the quality of these solutions is generally not high enough to meet real-world requirements.
In practice, intractable problems are often tackled with heuristic approaches that obtain a high-quality solution within a reasonable time, and a number of such algorithms have been proposed to address various problems, such as job shop scheduling [12], partition coloring [13], and the critical nodes problem [14]. Local search is one of the extensively studied heuristics for solving NP-hard problems [15][16][17][18][19]. Regarding the MVC problem, it has been shown that local search outperforms other heuristics [20]. The primary idea of local search algorithms for solving graph theory problems can be described as follows: start with a feasible solution and iteratively update it by removing, adding, or swapping vertices until a cutoff time is reached. A common strategy is (j, k)-swaps, i.e., removing j vertices from a solution and adding k vertices to it. We refer to a (j, k)-swap as a j-improvement [21]. Local search algorithms have the advantages of simple implementation and effective performance. However, they do suffer from a few challenges: the cycling phenomenon (i.e., revisiting recently visited vertices) [22,23] wastes computational time and can leave the algorithm stuck in a local optimum; moreover, complex vertex selection strategies may diminish the efficiency of the local search, resulting in poor performance on large graphs. To address these issues, researchers have proposed many strategies, which will be described in detail in Section 3.

Our Contributions
This paper proposes an efficient algorithm named TIVC for the MVC problem on large graphs. TIVC involves two main ideas. The first one is a 3-improvements framework with a tiny perturbation, which has a chance to directly search for a solution of size (k − 2) after adding one vertex based on a k-sized feasible solution. The second is an effective edge selection strategy that accelerates the search; it combines the edge-age-based best from multiple selections (EABMS) technique [20] with a random selection method to choose the uncovered edges to be covered. We conduct experiments to compare TIVC with state-of-the-art local search algorithms for MVC on the Network Repository benchmark, including 72 real-world large instances. TIVC shows the best accuracy and significantly outperforms the other algorithms on many instances.

Organization of This Paper
The remainder of this paper is organized as follows. Section 2 presents basic definitions. Section 3 gives a brief review of the related work on MVC. Section 4 describes the TIVC algorithm. Section 5 is devoted to the design and analysis of experiments, and Section 6 provides concluding remarks.

Preliminaries
This section introduces some preliminary knowledge. Specifically, Section 2.1 describes some notations and terminologies, and Section 2.2 briefly summarizes local search.

Notations and Terminologies
Denote by G = (V, E) an undirected graph with vertex set V and edge set E. For an edge e = (u, v), the two vertices u and v are called endpoints of e. A vertex is adjacent to another vertex if they are the two endpoints of an edge, and one is called a neighbor of the other. An edge is incident with each of its endpoints. The set consisting of all neighbors of a vertex v ∈ V, denoted by N(v), is the neighborhood of v, and N[v] = N(v) ∪ {v} is the closed neighborhood of v. The degree of v is the number of edges incident with v. For a graph G = (V, E) and a set of vertices S ⊆ V, an edge e ∈ E is covered by S if at least one endpoint of e belongs to S; otherwise, e is uncovered by S. If all edges of G are covered by S, then S is called a vertex cover (VC) of G. A VC with the smallest cardinality is called a minimum vertex cover (MVC) of G. Note that a graph G = (V, E) may have more than one MVC. We use E_u(S) ⊆ E to denote the set of edges uncovered by S and E_c(S) ⊆ E to denote the set of edges covered by S. The MVC problem is to find an MVC of a graph.
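As a concrete illustration of these definitions, the following minimal Python sketch computes the uncovered edge set of a vertex set S and uses it to test whether S is a VC. The function names and the toy graph are illustrative assumptions, not part of the paper.

```python
def uncovered_edges(edges, S):
    """Return E_u(S): the edges with neither endpoint in S."""
    return [(u, v) for (u, v) in edges if u not in S and v not in S]


def is_vertex_cover(edges, S):
    """S is a vertex cover exactly when E_u(S) is empty."""
    return not uncovered_edges(edges, S)


# Hypothetical toy graph: the path 0-1-2-3 plus the edge 1-3.
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]

print(is_vertex_cover(edges, {1, 2}))  # True: every edge touches 1 or 2
print(uncovered_edges(edges, {1}))     # [(2, 3)]: the only edge missing vertex 1
```

In this toy graph no single vertex covers all four edges, so {1, 2} and {1, 3} are both MVCs, which illustrates that an MVC need not be unique.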

Local Search
From this section on, C represents a candidate (or partial) solution of the MVC problem. The general scheme of local search for MVC is to construct an initial VC first and then iteratively improve the solution to a smaller one by vertex swapping. Generally, local search algorithms use gain(v) and loss(v) to measure the importance of a vertex v, where gain(v) denotes the number of edges uncovered by C but covered by C ∪ {v}, and loss(v) is the number of edges covered by C but uncovered by C \ {v}. The age of a vertex v, denoted by age(v), is the number of steps since it was last removed from C. The age values are usually used to break ties, where ties mean the existence of multiple vertices with the same gain or loss. In addition, the age of an edge e, denoted by age(e), is the number of steps since it was last uncovered by C, which is often used as a criterion for selecting edges [20].
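The two scores can be computed directly from the definitions. The sketch below is a minimal illustration assuming a simple graph stored as adjacency lists; the helper names are hypothetical. Both scores count the neighbors of v that lie outside C, and they differ only in whether v itself is currently in C.

```python
def build_adj(n, edges):
    """Adjacency lists of a simple undirected graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def gain(adj, C, v):
    """Edges uncovered by C that become covered once v (not in C) is added."""
    assert v not in C
    return sum(1 for u in adj[v] if u not in C)


def loss(adj, C, v):
    """Edges covered by C that become uncovered once v (in C) is removed."""
    assert v in C
    return sum(1 for u in adj[v] if u not in C)


# Hypothetical toy graph: the path 0-1-2-3 plus the edge 1-3.
adj = build_adj(4, [(0, 1), (1, 2), (2, 3), (1, 3)])
print(loss(adj, {1, 2}, 1))  # 2: edges (0,1) and (1,3) rely on vertex 1 alone
print(gain(adj, {1}, 3))     # 1: adding 3 covers the uncovered edge (2,3)
```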

Related Work
This section provides a brief review of heuristic algorithms for MVC. In 2013, Cai et al. [24] proposed a two-stage strategy that selects the two vertices of an exchange separately and performs the exchange in two stages. The resulting algorithm, NuMVC, addresses the main drawback (high time consumption) of previous algorithms that select the pair of vertices simultaneously [22,25,26]. However, with the rapid development of the Internet and the widespread deployment of sensors, the size of datasets has dramatically increased, and many algorithms fail to solve MVC on large instances. To address this, Cai et al. [27] introduced the Best from Multiple Selections (BMS) heuristic, which randomly samples k vertices in C and removes the one with the minimum loss value from C. This heuristic aims at a trade-off between efficiency and accuracy. Based on BMS, an algorithm named FastVC was developed for solving MVC well on large instances. By combining BMS and the best-picking strategy [24], Ma et al. [28] proposed best-picking with a noisy strategy and developed an algorithm named NoiseVC; they also proposed a BMS with random walk strategy (WalkBMS) in another study [29], which selects (with a probability) BMS or random walk as the vertex selection strategy so that FastVC is less easily trapped in a local optimum. Subsequently, Cai et al. [30] proposed an improved version of FastVC, named FastVC2+p, by integrating some processing techniques and initial solution construction methods. In 2019, Luo et al. [31] proposed a highly parametric local search framework for MVC, called MetaVC, which incorporates many effective local search techniques. In addition, the authors used an automatic algorithm configurator that sets parameters for the type of instances to maximize the performance of MetaVC.
In 2021, Quan et al. [20] proposed a new edge weighting method based on edge age (EABMS), which randomly samples a edges in E_u(C), where a is a parameter, and selects the edge with the maximum age value for covering (by adding one of its endpoints to C). Based on EABMS, an algorithm named EAVC and its variant EAVC2+p have been developed for MVC. Both EAVC and EAVC2+p showed superiority on large graphs compared with FastVC and its variants. To date, FastVC, MetaVC, EAVC, and their variants are the state-of-the-art MVC local search algorithms for large instances. To demonstrate the effectiveness of TIVC, we compared our algorithm with the baseline algorithms, i.e., FastVC, MetaVC, and EAVC.
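The two sampling heuristics reviewed above are simple to state in code. The following is a minimal sketch rather than the reference implementations of FastVC or EAVC: it assumes sampling with replacement, that C and the uncovered edges are kept in lists, and that loss and age values are available in dictionaries. All names are hypothetical.

```python
import random


def bms_select(C, loss, k, rng):
    """BMS: sample k vertices of C and return the one with minimum loss."""
    candidates = [rng.choice(C) for _ in range(k)]  # with replacement (assumption)
    return min(candidates, key=lambda v: loss[v])


def eabms_select(uncovered, age, a, rng):
    """EABMS: sample a uncovered edges and return the one with maximum age."""
    candidates = [rng.choice(uncovered) for _ in range(a)]
    return max(candidates, key=lambda e: age[e])


rng = random.Random(1)
v = bms_select([1, 2, 3], {1: 5, 2: 0, 3: 2}, k=50, rng=rng)
e = eabms_select([(0, 1), (2, 3)], {(0, 1): 5, (2, 3): 9}, a=24, rng=rng)
print(v, e)
```

Because only a constant number of candidates is inspected, each selection costs O(k) or O(a) regardless of the graph size, which is the source of the efficiency-accuracy trade-off mentioned above.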

Main Algorithm
In this section, we describe our algorithm TIVC. We first introduce the top-level architecture of TIVC and then describe the algorithm in detail. Finally, we give a complexity analysis for TIVC. Note that in this section, C represents a candidate (or partial) solution.

Top-Level Architecture
The top-level architecture of TIVC is shown in Algorithm 1. TIVC starts by constructing an initial VC C for the graph G (line 1) and then enters a loop for finding a VC as small as possible within a given cutoff time (lines 2-10). Specifically, when obtaining a VC C, it updates the best solution C* and then removes a vertex from C (lines 3-6). If C is not a feasible solution, then the algorithm iteratively exchanges vertices until C becomes a VC. First, it removes vertices from C until |C| = |C*| − 3 (line 7). Next, it selects an uncovered edge and adds one of its endpoints to C (line 8). If C remains infeasible, it selects another vertex to add to C in the same way (lines 9-10). Finally, the best-found vertex cover C* is returned when the cutoff time is reached (line 11).

The TIVC Algorithm
Our TIVC algorithm is shown in Algorithm 2, which encompasses two stages, i.e., construction and search. In the construction stage, the algorithm constructs an initial VC C of G (line 1) by EdgeGreedyVC [20,27,31], which is a commonly used approach in MVC algorithms. The process starts with an empty set C and proceeds iteratively by checking and covering edges to extend C. Once a VC is obtained, redundant vertices are removed from C, where redundant vertices are those with loss = 0, i.e., those whose removal does not produce new uncovered edges.
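The construction stage can be sketched as follows. This is a minimal illustration rather than the authors' code: the rule of covering each uncovered edge by its higher-degree endpoint is an assumption borrowed from the usual description of EdgeGreedyVC, and the function name and the simple-graph input format are hypothetical.

```python
def construct_vc(n, edges):
    """Greedy edge-based construction followed by removal of redundant vertices."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Extension: scan the edges once and cover each uncovered edge
    # by its higher-degree endpoint (assumed rule; ties go to u).
    in_c = [False] * n
    for u, v in edges:
        if not in_c[u] and not in_c[v]:
            in_c[u if len(adj[u]) >= len(adj[v]) else v] = True

    # loss[v] for v in C: number of edges covered by v alone.
    loss = [0] * n
    for u, v in edges:
        if in_c[u] and not in_c[v]:
            loss[u] += 1
        elif in_c[v] and not in_c[u]:
            loss[v] += 1

    # Shrinking: a vertex with loss 0 is redundant. Once it leaves C,
    # each of its neighbors (all in C) becomes the sole cover of one more edge.
    for v in range(n):
        if in_c[v] and loss[v] == 0:
            in_c[v] = False
            for u in adj[v]:
                loss[u] += 1

    return {v for v in range(n) if in_c[v]}


# Hypothetical toy graph: the path 3-1-0-2-4.
print(construct_vc(5, [(0, 1), (0, 2), (1, 3), (2, 4)]))  # {1, 2}
```

On this toy path, the extension step first picks {0, 1, 2}; vertex 0 then has loss 0 and is dropped, leaving the cover {1, 2}.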

Algorithm 2: TIVC
In the search phase, the algorithm iteratively removes vertices from C and adds vertices to C to search for a VC smaller than the current best solution C*. First, it repeatedly removes a vertex with the minimum loss from C until C is not a VC, i.e., |C| = |C*| − 1 (lines 3-6). Second, the algorithm repeatedly performs vertex swapping. Each swapping step contains a removing phase (lines 7-11) and an adding phase (lines 12-17). In the removing phase, the first vertex u_1 is selected by the BMS heuristic and removed from C (lines 7-8); then, the second vertex is selected randomly in C and removed from C to perturb the solution slightly (lines 9-10). These two removals lead to |C| = |C*| − 3. In the adding phase, it first selects an uncovered edge e ∈ E_u(C) and adds the endpoint of e with the greater gain to C (lines 11-13); if there are still uncovered edges (E_u(C) ≠ ∅), then it chooses one edge in E_u(C) randomly and adds its endpoint with the greater gain to C (lines 14-17). If C is a feasible VC at this time, then the algorithm completes a 3-improvement; otherwise, the algorithm continues to perform lines 7-17. Finally, the best-found VC C* is returned when the cutoff time is reached (line 18).
An important implementation detail is that when a vertex v is removed from or added to C, the gain or loss of vertices in N[v] needs to be updated accordingly.
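To make the search phase and the incremental updates concrete, the following toy Python sketch mirrors the steps described above. It is a simplified illustration under several assumptions that are not taken from the paper: it starts from the trivial cover C = V instead of EdgeGreedyVC, uses a step budget (`max_steps`) instead of a cutoff time, rebuilds the vertex list of C in O(n) per step for brevity, and breaks gain ties arbitrarily. It keeps a single counter `out[v]` (the number of neighbors of v outside C), which equals loss(v) when v is in C and gain(v) otherwise, so adding or removing v only requires touching the neighbors of v.

```python
import random


def tivc_sketch(n, edges, max_steps=1000, k=50, a=24, seed=1):
    """Toy TIVC-style search on a simple graph; returns the best cover found."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    in_c = [True] * n   # assumption: start from the trivial cover C = V
    out = [0] * n       # out[v] = number of neighbors of v outside C
    uncovered = {}      # uncovered edge -> step at which it became uncovered
    step = 0

    def remove(v):
        in_c[v] = False
        for u in adj[v]:
            out[u] += 1                       # v has just left C
            if not in_c[u]:                   # both endpoints are outside C
                uncovered[(min(u, v), max(u, v))] = step

    def add(v):
        in_c[v] = True
        for u in adj[v]:
            out[u] -= 1                       # v has just joined C
            if not in_c[u]:
                uncovered.pop((min(u, v), max(u, v)), None)

    def pick_edge(samples):
        # Sample uncovered edges and keep the one uncovered for the longest time.
        pool = list(uncovered)
        cands = [rng.choice(pool) for _ in range(samples)]
        return min(cands, key=lambda e: uncovered[e])

    def add_better_endpoint(e):
        u, v = e
        add(u if out[u] >= out[v] else v)     # larger gain; ties go to u

    best = set(range(n))
    while step < max_steps:
        step += 1
        C = [v for v in range(n) if in_c[v]]  # O(n) rebuild, for brevity only
        if not uncovered:                     # C is a vertex cover
            if len(C) < len(best):
                best = set(C)
            if not C:
                break
            remove(min(C, key=lambda v: out[v]))  # drop a minimum-loss vertex
            continue
        if len(C) >= 2:
            u1 = min((rng.choice(C) for _ in range(k)), key=lambda v: out[v])
            remove(u1)                        # BMS removal
            C.remove(u1)
            remove(rng.choice(C))             # random removal: the tiny perturbation
        add_better_endpoint(pick_edge(a))     # EABMS-style edge choice
        if uncovered:
            add_better_endpoint(pick_edge(1))  # one random uncovered edge

    if not uncovered and sum(in_c) < len(best):
        best = {v for v in range(n) if in_c[v]}
    return best


ring = [(i, (i + 1) % 8) for i in range(8)]   # hypothetical 8-cycle
print(sorted(tivc_sketch(8, ring, max_steps=200)))
```

A production implementation would keep C and E_u(C) in arrays with position indices so that sampling and membership updates take constant time, which this sketch does not attempt.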

Complexity Analysis
In this section, we analyze the time complexity of TIVC. For a given graph G = (V, E), let |V| = n and |E| = m.
TIVC (Algorithm 2) runs in O(m + n) time.
Proof. First, the ConstructVC procedure (line 1) constructs an initial solution by EdgeGreedyVC, which has a time complexity of O(m) [27]. Second, lines 3-6 take O(m + n) time since the procedure traverses E and V once. Third, lines 7-17 take O(m) time: the time complexity of BMS and EABMS has already been proven to be O(1) [20,27]; removing a vertex from a solution C or adding a vertex to C takes O(1) time; updating gain or loss takes O(∆) time, where ∆ is the maximum degree of the graph; and line 14 needs to traverse E once to check whether E_u(C) = ∅. Thus, the time complexity of TIVC is O(m + n).
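The three contributions in the argument can be summed explicitly. The display below is only a restatement of the bounds given above, under the reading that they describe the construction plus one pass through the search loop; it uses the fact that ∆ ≤ m.

```latex
\begin{align*}
T_{\text{line 1}}       &= O(m), \\
T_{\text{lines 3--6}}   &= O(m + n), \\
T_{\text{lines 7--17}}  &= O(1) + O(\Delta) + O(m) = O(m) \quad (\Delta \le m), \\
T_{\text{total}}        &= O(m) + O(m + n) + O(m) = O(m + n).
\end{align*}
```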

Results and Discussion
In this section, we evaluate TIVC on the Network Repository benchmark (https://networkrepository.com, accessed on 9 March 2023) [32]. This benchmark includes a large number of graphs from various areas. To assess the performance of TIVC on large graphs, we specifically selected instances with vertex numbers ranging from 10^4 to 10^7, encompassing 72 instances. Section 5.1 introduces the design of the experiment; Section 5.2 reports the results of the experiment; Section 5.3 discusses and analyzes the experimental results.

Experiment Setup
TIVC is implemented in C++ and compiled by gcc 7.1.0 with the '-O3' optimization option. All experiments are run under CentOS Linux release 7.6.1810 with an Intel(R) Xeon(R) Gold 6254 CPU @ 3.10 GHz and 128 GB RAM. The parameters of FastVC and EAVC are set to be the same as those used in the original literature [20,27], and the parameters of MetaVC are set according to the values recommended in reference [31] for large instances (REAL-WORLD). However, no processing techniques are employed, aligning with the other algorithms. TIVC incorporates two tunable parameters: k for the BMS and a for the EABMS. These parameters are set to 50 and 24, respectively, aligning with the settings of EAVC.
We compare TIVC with three state-of-the-art local search algorithms for MVC: FastVC [27], MetaVC [31], and EAVC [20]. All three algorithms are suitable for solving large MVC instances. FastVC combines the two-stage exchange framework and the BMS heuristic to balance the algorithm's accuracy and efficiency, which achieves good performance on large graphs. MetaVC integrates many local search techniques based on the two-stage framework and incorporates an automatic configurator to select and set parameters. For large instances, MetaVC involves BMS, reconstruction, and random walk mechanisms, where reconstruction means removing t vertices from the solution C during the search phase and then adding t vertices with the greatest gains in V \ C to C. EAVC is based on the two-stage framework and combines WalkBMS and EABMS to provide good guidance for improving the quality of solutions (and also increasing the diversity of solutions) in the vertex and edge selection phase.
Table 1 shows the details of these four algorithms. The construction procedures of the four algorithms are based on EdgeGreedyVC. For vertex selection, FastVC, MetaVC, EAVC, and TIVC utilize BMS, BMS + Random, WalkBMS, and BMS + Random strategies, respectively. Regarding edge selection, FastVC, MetaVC, EAVC, and TIVC use Random, Random, EABMS, and EABMS + Random strategies, respectively. In addition, FastVC, MetaVC, and EAVC are all based on 2-improvements, while TIVC is based on 3-improvements. Note that MetaVC also incorporates a reconstruction mechanism during the search phase. For each instance, all algorithms are executed 10 times with seeds 1, 2, 3, . . ., 10. The cutoff time for each run is set at 1000 s. For each instance, we present the best (i.e., smallest) solution as Min, the average solution as Avg, and the average running time (over the 10 runs) as t_avg.
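The three reported statistics are straightforward aggregates over the runs of one instance. A minimal sketch is given below; the function name, the input format, and the sample numbers are hypothetical and do not come from the experiments.

```python
from statistics import mean


def summarize(runs):
    """runs: one (cover_size, seconds) pair per seed for a single instance."""
    sizes = [size for size, _ in runs]
    times = [seconds for _, seconds in runs]
    return {"Min": min(sizes), "Avg": mean(sizes), "t_avg": mean(times)}


# Hypothetical results of three runs on one instance (not real data).
print(summarize([(1005, 12.0), (1003, 30.0), (1004, 18.0)]))
```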

Experimental Result
Results on the Network Repository benchmark are reported in Table 2. TIVC shows superior accuracy on the majority of instances, outperforming FastVC, MetaVC, and EAVC. Specifically, TIVC obtains the best solution on 50 (out of 72) instances, while FastVC, MetaVC, and EAVC obtain 26, 28, and 37 best solutions, respectively. In particular, TIVC shows a remarkable ability to find strictly optimal solutions (20 in total). In comparison, FastVC, MetaVC, and EAVC find 3, 12, and 7 strictly optimal solutions, respectively. Regarding the average solution, TIVC also outperforms the other algorithms: FastVC, MetaVC, EAVC, and TIVC obtain the best average VC size on 7, 12, 13, and 20 instances, respectively. In addition, for large instances with 10^7 vertices, TIVC performs remarkably well and finds much smaller VCs than the other algorithms on many instances.
Moreover, we report summary results for each algorithm on instances with orders (the number of vertices) from 10^4 to 10^7, as shown in Table 3. There are 7, 26, 27, and 12 instances whose orders are 10^4, 10^5, 10^6, and 10^7, respectively. All algorithms found the best solutions on the instances of order 10^4. Regarding the instances of order 10^5, the numbers of instances on which TIVC, FastVC, MetaVC, and EAVC found the best solutions (strictly optimal solutions) are 18, 7, 10, and 12 (8, 0, 6, 2), respectively. For the instances of order 10^6, TIVC and EAVC had similar performance, each finding the best solutions on 17 instances and strictly optimal solutions on 5 instances, while FastVC and MetaVC found the best solutions on 10 and 9 instances, respectively, and obtained strictly optimal solutions on 1 and 4 instances. Finally, regarding the instances of order 10^7, the numbers of instances on which TIVC, FastVC, MetaVC, and EAVC found the best solutions (strictly optimal solutions) are 8, 2, 2, and 1 (7, 2, 2, 0), respectively.
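The per-order counts can be cross-checked against the overall totals reported for Table 2. The sketch below transcribes the counts stated in the text, taking the order-10^6 counts of 10 and 9 best solutions (and 1 and 4 strictly optimal solutions) to belong to FastVC and MetaVC, which is the reading consistent with the totals. The zero strictly optimal counts at order 10^4 are an inference from the statement that all algorithms found the best solutions there.

```python
# Counts per order 10^4, 10^5, 10^6, 10^7, transcribed from the text.
best = {
    "TIVC":   [7, 18, 17, 8],
    "FastVC": [7, 7, 10, 2],
    "MetaVC": [7, 10, 9, 2],
    "EAVC":   [7, 12, 17, 1],
}
strict = {
    "TIVC":   [0, 8, 5, 7],   # order-10^4 zeros are inferred, not stated
    "FastVC": [0, 0, 1, 2],
    "MetaVC": [0, 6, 4, 2],
    "EAVC":   [0, 2, 5, 0],
}

totals = {name: (sum(best[name]), sum(strict[name])) for name in best}
for name, (n_best, n_strict) in totals.items():
    print(f"{name}: {n_best} best, {n_strict} strictly optimal")
```

The sums reproduce the overall figures of 50, 26, 28, and 37 best solutions and 20, 3, 12, and 7 strictly optimal solutions for TIVC, FastVC, MetaVC, and EAVC, respectively.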
As mentioned in Section 1, MVC has a real-world application in electronic camera installation on road networks. To capture visual information on a road network, such as traffic conditions, vehicle positions and speeds, and pedestrian flows, we need to install cameras at the intersections of roads, guaranteeing that every road can be monitored by at least one camera. Observe that a camera can monitor more than one road. In practice, to save costs, it is sufficient to install cameras at a small number of intersections. Now, a problem arises: given a road network, at which intersections should cameras be installed so that the number of cameras is minimized and all roads are monitored? By modeling a road network as a graph whose vertices represent intersections and in which two vertices are connected by an edge if and only if they are connected by a road (without considering the length and width of roads), this problem is equivalent to finding an MVC in a graph. As an example, we consider the instance "inf-road-usa", which is a graph abstracted from the road network in the United States. As shown in Table 2, the solution for this instance provided by our TIVC algorithm has size 11,950,231, outperforming the second-best solution of size 11,989,552; i.e., compared with the other state-of-the-art algorithms, TIVC can save at least 39,321 cameras when installing cameras for monitoring this network.
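The saving quoted for "inf-road-usa" follows directly from the two cover sizes; the relative figure below is derived here from those two numbers and is not reported in the paper.

```latex
\[
11{,}989{,}552 - 11{,}950{,}231 = 39{,}321,
\qquad
\frac{39{,}321}{11{,}989{,}552} \approx 0.33\%.
\]
```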

Discussion
The experimental results demonstrate the effectiveness of our TIVC algorithm for solving MVC on large graphs. From Section 5.2, it can be seen that TIVC fails to find the best solution mainly on instances of orders 10^5 and 10^6. A possible explanation is that the tiny perturbation steers the algorithm mainly toward solution diversity when it falls into a local optimum, at the expense of intensifying the improvement of the current solution. Nevertheless, the gap between a solution returned by TIVC and the best solution is small, generally within 10^2. In comparison, on instances of order 10^7, the algorithms in our experiments are less likely to become trapped in local optima due to the large solution space (the t_avg values of all four algorithms are close to 1000 s). In this case, the advantage of the 3-improvements framework is showcased prominently, i.e., it has a chance to directly search for a solution of size (k − 2) after adding one vertex based on a k-sized feasible solution, whereas 2-improvements cannot achieve this.
TIVC has certain limitations. First, for small instances, the diversity of the solution space may be limited and the perturbation could potentially lead the search in a suboptimal direction, which hinders TIVC from finding the best solution within the time threshold. In contrast, probabilistic (rather than fixed) perturbations are more likely to guide the algorithm in an optimal direction. Nevertheless, the difference between the solutions found by TIVC and the best solutions obtained by other algorithms is not significant (e.g., on the instances "wave", "rec-dating", "citationCitesee", "web-Stanford", etc.). In addition, all algorithms considered in this paper except MetaVC have a common limitation, i.e., parameter setting. For different types of instances, different parameter settings may affect the performance of the algorithms to a certain extent. MetaVC has an automatic configurator that can tune parameters according to the types of instances; this flexibility could be a contributing factor to its outstanding performance on some instances, such as "hugetrace-00020" and "hugebubbles-00000".

Conclusions
In this paper, we propose an efficient local search algorithm for the MVC problem called TIVC, which consists of a 3-improvements framework with a tiny perturbation and an edge selection strategy. The experimental results show that TIVC significantly outperforms state-of-the-art algorithms for MVC on large graphs, especially those with orders exceeding 10^7 (although such instances are scarce currently, they will become increasingly prevalent with the development of the internet and big data). This provides an enhanced approach for designing and analyzing large WSNs.
In the future, we aim to explore more efficient approaches for solving various graph theoretical problems on large-scale graphs, especially algorithms with theoretical guarantees. Additionally, we plan to integrate local search with advanced machine learning methods to accelerate the convergence of local search algorithms, allowing them to terminate the search autonomously instead of relying solely on time constraints. Finally, we will also consider exploring more real-world problems that can be modeled as MVC-related problems and applying our algorithm to solve them.

Figure 1. An example of a WSN.

Table 1. Equipment of four algorithms.

Table 2. The results on the Network Repository benchmark.

Table 3. Summary results. "A/B" indicates that the corresponding algorithm found A best solutions and B strictly optimal solutions in instances of the corresponding scale.