Clustering Improves the Goemans–Williamson Approximation for the Max-Cut Problem

Abstract: MAX-CUT is one of the most intensively studied NP-hard combinatorial optimization problems. It can be formulated as an integer quadratic program and admits a simple relaxation obtained by replacing the integer "spin" variables x_i by unit vectors v_i. The Goemans–Williamson rounding algorithm assigns each solution vector of the relaxed quadratic program to an integer spin depending on the sign of the scalar product v_i · r with a random vector r. Here, we investigate whether better graph cuts can be obtained by instead using a more sophisticated clustering algorithm. We answer this question affirmatively. Different initializations of k-means and k-medoids clustering produce better cuts for the graph instances of the best-known benchmark for MAX-CUT. In particular, we found a strong correlation between cluster quality and cut weight during the evolution of the clustering algorithms. Finally, since in general the maximal cut weight of a graph is not known beforehand, we derived instance-specific lower bounds for the approximation ratio, which indicate how close a solution is to the global optimum for a particular instance. For the graphs in our benchmark, the instance-specific lower bounds significantly exceed the Goemans–Williamson guarantee.


Introduction
The MAX-CUT problem is a well-known NP-hard [1] and APX-hard [2] combinatorial optimization problem. An instance consists of a graph G with vertex set V, edge set E, and edge weights w : E → R. A cut is a bipartition {A, V \ A} of the vertex set V, where A ⊆ V. The MAX-CUT problem consists of finding a cut that maximizes the total weight of the edges that span the cut. That is, the function

f(A) = ∑_{ {i,j} ∈ E : i ∈ A, j ∈ V\A } w({i,j})    (1)

is to be maximized over all A ⊆ V. While MAX-CUT is NP-hard in general, polynomial-time algorithms exist for restricted graph classes, including planar graphs [3], graphs without long odd cycles [4], and cographs [5]. MAX-CUT can also be written as a spin glass model; see, e.g., [6] for an overview.
Using an arbitrary numbering of the n vertices of the graph G, we write x_i = +1 if vertex i ∈ A, x_i = −1 if vertex i ∈ V \ A, and x = (x_1, x_2, ..., x_n). Furthermore, we set w_ij := w({i,j}) if {i,j} is an edge of G and w_ij = 0 otherwise. With this notation, we can rewrite f(A) in terms of the "spin vector" as

f(x) = (1/2) ∑_{i<j} w_ij (1 − x_i x_j).    (2)

Maximizing f(x) amounts to the integer quadratic programming (IQP) formulation of the MAX-CUT problem. Without loss of generality, we assume w_ij = w_ji. Solutions of large MAX-CUT problems are of considerable practical interest in network design, statistical physics, and data clustering, and hence a broad array of computational techniques has been customized to their solution. Broadly, they can be subdivided into combinatorial heuristics (see, e.g., [7][8][9] and the references therein) and methods involving relaxations of the integer constraints in Equation (2).
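The equivalence of the cut formulation and the spin formulation can be checked numerically. The following minimal sketch uses a made-up toy graph and illustrative helper names (not from the paper):

```python
# Sketch: the cut weight f(A) and its spin formulation f(x) agree on a toy graph.

def cut_weight(edges, A):
    """f(A): total weight of edges with exactly one endpoint in A."""
    return sum(w for (i, j, w) in edges if (i in A) != (j in A))

def spin_objective(edges, x):
    """f(x) = (1/2) * sum_{i<j} w_ij (1 - x_i x_j) with x_i in {-1, +1}."""
    return 0.5 * sum(w * (1 - x[i] * x[j]) for (i, j, w) in edges)

# Toy instance: a 4-cycle with unit weights; the optimal cut has weight 4.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
A = {0, 2}
x = {i: (+1 if i in A else -1) for i in range(4)}
assert cut_weight(edges, A) == spin_objective(edges, x) == 4.0
```

Each edge contributes w_ij exactly when x_i and x_j disagree, which is why the two formulations coincide.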
Replacing the integer vector by an arbitrary x ∈ R^n \ {0} leads to an equivalent continuous optimization problem, for which an excellent heuristic is described in [10]; there is, moreover, a close connection with the largest eigenvalues of generalized Laplacians [11] and their corresponding eigenvectors [12]. On this basis, algorithms with uniform approximation guarantees have been devised [13,14]. Goemans and Williamson [15] replaced the integers x_i by unit vectors v_i of dimension n = |V| and considered the relaxed problem

max ∑_{i<j} w_ij (1 − v_i · v_j)/2  subject to ‖v_i‖ = 1 for all i,    (3)
where · denotes the scalar product of the unit vectors. This problem contains all instances of the original problem (2), as seen by setting v_i = (x_i, 0, ..., 0) for all i. The relaxed problem (3) is a particular instance of vector programming or, after a change of variables, an instance of semidefinite programming (SDP), and thus can be solved in polynomial time (up to an arbitrarily small additive error); see, e.g., [16,17].
The solutions of the relaxed problem (3) are translated into solutions of the original IQP. Goemans and Williamson [15] proposed to use a random unit vector r and to set

x_i = sgn(v_i · r),    (4)

with sgn(t) = −1 for t < 0 and sgn(t) = +1 for t ≥ 0. This amounts to cutting the sphere at the hyperplane with normal vector r and assigning x_i depending on whether v_i lies in the "upper" or "lower" hemisphere defined by this hyperplane. The Goemans–Williamson rounding yields an approximation bound of

α := min_{0 ≤ θ ≤ π} (2/π) · θ/(1 − cos θ) > 0.878

for the expected value E[f(x)]/max f, where the expectation is taken over the choices of r [15]. At present, this is the best randomized approximation algorithm for the integer quadratic programming formulation of the MAX-CUT problem. A clever derandomization [18] shows that deterministic algorithms can attain the same approximation bound. On the other hand, it is NP-hard to approximate MAX-CUT better than the ratio 16/17 [19]. Furthermore, if the Unique Games Conjecture [20] is true, the approximation ratio cannot be improved beyond the Goemans–Williamson bound for general graphs. However, better ratios are achievable, e.g., for (sub)cubic graphs [21].
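The hyperplane rounding of Equation (4) is easily sketched in code. Here, the unit vectors V (which would come from an SDP solver in practice) are made-up random data; the function name is illustrative:

```python
# Sketch of Goemans-Williamson hyperplane rounding, assuming the rows of V are
# the unit vectors v_i from the relaxed problem (here: synthetic placeholders).
import numpy as np

def gw_round(V, rng):
    """Round unit vectors to spins via a random hyperplane: x_i = sgn(v_i . r)."""
    r = rng.standard_normal(V.shape[1])
    r /= np.linalg.norm(r)
    return np.where(V @ r >= 0, 1, -1)   # sgn(0) = +1, as in the text

rng = np.random.default_rng(0)
V = rng.standard_normal((6, 6))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # project rows onto the unit sphere
x = gw_round(V, rng)
assert set(np.unique(x)) <= {-1, 1}
```

In practice, the expectation bound is approached by repeating the rounding with many independent vectors r and keeping the best cut.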
The translation of the solution of the relaxed problem (3) in the Goemans–Williamson approximation relies on the choice of a random vector r. Naturally, we ask whether the performance can be improved by expending more effort on a better choice of r. The key observation is that the purpose of r is to separate the solution vectors v_i of Equation (3) into two disjoint sets of points on the sphere. The two sets A+ := {i : v_i · r ≥ 0} and A− := {i : v_i · r < 0} can thus be thought of as a pair of clusters. Indeed, two vectors v_i and v_j tend to be anti-parallel if w_ij is large, while pairs of points i and j with small or even negative weights w_ij are likely to wind up on the same side of the maximal cut. Of course, the random vector r is just one way of expressing this idea: if v_i and v_j are similar, then we will "usually" have sgn(v_i · r) = sgn(v_j · r). The "randomized rounding" of the Goemans–Williamson method can therefore also be regarded as a clustering method. This immediately raises the question of whether the solutions of the MAX-CUT problem can be improved by replacing the random choice of r by first clustering the solution set {v_i} of the relaxed problem (3). We shall see empirically that the answer is affirmative, even though the theoretical performance bound is not improved.
In practical applications, solutions of relaxed problems are often post-processed by local search heuristics. Therefore, local searches starting from the final results of both Goemans–Williamson rounding and two of our best clustering approaches were performed in order to further improve the cut values.
This contribution is organized as follows. In the following section, we briefly summarize the data sets and clustering algorithms with their relevant properties, as well as the details of the local search used to improve the cut values. In Section 3.1, we describe an initial analysis of the effect of clustering on the cut weights, showing that the quality of near-optimal clusters correlates well with the cut weights. Since for most clustering methods we were not able to show that they retain the Goemans–Williamson performance bound, we derive an instance-specific bound in Section 3.2 that provides a convenient intrinsic quality measure. In Section 3.3, we extend the empirical analysis to the benchmark set, which also contains very large graphs. We show that the use of clustering methods indeed provides a consistent performance gain. We also see that the instance-specific performance bounds are much closer to 1 than the uniform Goemans–Williamson constant α. Finally, in Section 3.4, we consider the improvement of the cut values achieved by local search starting from the Goemans–Williamson solutions and from the two best clustering approaches.

Benchmark Data
In order to assess the utility of clustering as a rounding method, we used the benchmark set of graphs generated with Rinaldi's machine-independent graph generator. Both the generator and the graphs can be downloaded from Ye's web page http://web.stanford.edu/~yyye/yyye/Gset/. The graphs range from 800 to 20,000 nodes and have edge weights of ±1. The graph topology can be toroidal, almost planar, or random. The first 21 G-set graphs form a standard benchmark for the MAX-CUT problem; the full G-set benchmark consists of graphs G1 to G67, G70, G72, G77, and G81. The optimal cuts are not known for most of these graphs. We therefore use the best-known cut values compiled in [9] for comparison.
The relaxed problem (3) was solved using the CVX package in Matlab for graphs G1 to G21. For graphs G22 to G54, G57 to G59, G62 to G67, and G72, we used Mathematica's function SemidefiniteOptimization. For graphs G55, G56, G60, G61, G70, G77, and G81, neither SemidefiniteOptimization nor CVX was able to find a solution of the SDP problem; we therefore had to exclude these instances from further consideration. From the SDP solution of each instance, we computed 50 iterations of the randomized clustering algorithms, including Goemans–Williamson randomized rounding, and report the best solution for the seven best algorithms.
In order to obtain the spherical variants of k-means and fuzzy c-means, it is necessary to define the cluster centroids as points on the sphere. This can be achieved by rescaling the centroids to unit length [23]. On the sphere, cosine similarity is a more natural choice than Euclidean dissimilarity. It is not difficult to see that the two variants are actually equivalent:

Lemma 1. Spherical k-means clustering with cosine similarity is equivalent to k-means clustering with Euclidean distances and normalization of the centroid vectors in each step.
Proof. For unit vectors, the square of the Euclidean distance can be expressed as ‖x − y‖² = 2 − 2 cos θ in terms of the angle θ between x and y. The centroid c of a given cluster C minimizes ∑_{i∈C} ‖x_i − c‖² and thus maximizes the sum ∑_{i∈C} cos θ_i. Analogously, each x_i is assigned to the cluster C_j with minimal value of ‖x_i − c_j‖² and thus maximal cos θ_i. Thus, the squared Euclidean distance and the cosine similarity optimize the same objective function.
The same argument can be made for fuzzy c-means clustering, since its cost function is also a linear combination of squared Euclidean distances and thus, equivalently, a linear combination of the corresponding cosines. For MST clustering, no adjustment is necessary, since the relationship between Euclidean distances and cosines is monotonic and thus the transformation does not affect the MST. By the same token, k-medoids clustering is unaffected by the restriction to the sphere, since the medoids are by definition always among the unit vectors v_i.
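Lemma 1 can be verified numerically: for points on the unit sphere, nearest-centroid assignment by squared Euclidean distance to normalized centroids coincides with assignment by maximal cosine similarity. The data below are synthetic, for illustration only:

```python
# Numerical check of Lemma 1: since ||x - c||^2 = 2 - 2 x.c for unit vectors,
# minimizing Euclidean distance to normalized centroids and maximizing cosine
# similarity produce identical cluster assignments.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)     # 50 unit vectors
C = rng.standard_normal((2, 5))
C /= np.linalg.norm(C, axis=1, keepdims=True)     # 2 unit centroids

by_euclid = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
by_cosine = np.argmax(X @ C.T, axis=1)
assert np.array_equal(by_euclid, by_cosine)
```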
An important ingredient of the clustering procedures is the initialization. For k-means, k-medoids, and fuzzy c-means, we consider both a deterministic and a non-deterministic version. In the deterministic version, the pair v*_i and v*_j with maximal distance from each other is chosen. In the non-deterministic variant, the two initial cluster centroids are selected from the solution vectors v_i at random. For k-means, we observed that choosing the initial cluster centroids among the vectors v_i does not work well, since the optimization quickly gets stuck in a local minimum. We therefore devised two alternative random initialization methods: (1) we generate a random unit vector r and use c_1 = r and c_2 = −r as initial centroids; (2) we use two independently generated random unit vectors r_1 and r_2 as initial centroids. The main advantage of method (1) is that this initialization is equivalent to starting k-means from solutions obtained by Goemans–Williamson rounding. To see this, assume (without loss of generality) that the random vector chosen as initial centroid points towards the north pole, so that its negative points towards the south pole. Then, any point in the lower hemisphere of the hypersphere is clustered with the south-pole vector, since this is the closest centroid. The same happens for points in the upper hemisphere; this is therefore equivalent to splitting the hypersphere into two hemispheres, which is exactly Goemans–Williamson rounding.
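The equivalence of initialization method (1) and Goemans–Williamson rounding can be sketched directly, with synthetic unit vectors standing in for the SDP solution:

```python
# Sketch of initialization method (1): with centroids c1 = r and c2 = -r, the
# first nearest-centroid assignment reproduces Goemans-Williamson rounding,
# because ||v - r||^2 = 2 - 2 v.r <= ||v + r||^2 = 2 + 2 v.r  iff  v.r >= 0.
import numpy as np

rng = np.random.default_rng(2)
V = rng.standard_normal((40, 8))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # synthetic "SDP" unit vectors
r = rng.standard_normal(8)
r /= np.linalg.norm(r)

gw_labels = V @ r >= 0                          # GW hemispheres
d1 = np.linalg.norm(V - r, axis=1)              # distance to c1 = r
d2 = np.linalg.norm(V + r, axis=1)              # distance to c2 = -r
kmeans_labels = d1 <= d2                        # first k-means assignment
assert np.array_equal(gw_labels, kmeans_labels)
```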
In summary, we consider a total of nine clustering procedures. In order to quantify the quality of the clusters, we use the distortion [27,28], a measure of cluster coherence defined as

D(C) = ∑_{C_j ∈ C} ∑_{x_i ∈ C_j} ‖x_i − µ_{C_j}‖²,

where µ_{C_j} := (1/|C_j|) ∑_{x_i ∈ C_j} x_i is the centroid of the cluster C_j ∈ C. We note that k-means clustering minimizes the distortion.

As a simple local search, we consider moves that transfer a single vertex to the other side of the bipartition [12]. In terms of the spin vectors x, this amounts to "flipping" (changing the sign of) exactly one spin x_i. An adaptive walk iteratively accepts a spin flip if the cut value f(A) improves. By construction, an adaptive walk therefore terminates in a locally optimal solution, i.e., in a cut A* for which there is no adjacent cut with a strictly larger cut weight. In general, neither the Goemans–Williamson solutions nor any of the clustering results are locally optimal. We therefore use adaptive walks as a simple way to further improve solutions. We performed local improvement for each of the 50 repetitions of Goemans–Williamson randomized rounding and for the two best-performing clustering algorithms, K-MeansNM and K-Means2N.
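The adaptive walk can be sketched compactly: flipping spin x_i changes the cut value of Equation (2) by ∑_j w_ij x_i x_j, so any flip with positive gain is accepted until no improving flip remains. The toy graph below is illustrative:

```python
# Minimal sketch of the adaptive-walk local search on single spin flips.

def adaptive_walk(n, edges, x):
    """edges: list of (i, j, w); x: list of +/-1 spins. Returns a local optimum."""
    x = list(x)
    nbr = [[] for _ in range(n)]
    for i, j, w in edges:
        nbr[i].append((j, w))
        nbr[j].append((i, w))
    improved = True
    while improved:
        improved = False
        for i in range(n):
            gain = sum(w * x[i] * x[j] for j, w in nbr[i])  # change in f from flipping x_i
            if gain > 0:
                x[i] = -x[i]
                improved = True
    return x

def cut_value(edges, x):
    return 0.5 * sum(w * (1 - x[i] * x[j]) for i, j, w in edges)

edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)]
x0 = [1, 1, 1, 1]                 # cut value 0: every flip improves
x_opt = adaptive_walk(4, edges, x0)
assert cut_value(edges, x_opt) == 4.0
```

By construction, the returned cut cannot be improved by moving any single vertex, matching the local optimality described above.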

Cluster Quality Correlations with Solution Quality
In a preliminary evaluation on the first 21 G-set graphs (see Materials and Methods), we observed that clustering instead of randomized rounding (RR) yields systematically larger cuts. In order to better understand the reason for this beneficial effect of clustering, we investigated the relationship between a quality measure for the clustering and the resulting weight of the cut. Since k-means clustering minimizes distortion, distortion serves as a natural measure of cluster quality, irrespective of the clustering method that is actually used. We chose K-MeansNM for this initial analysis because it uses RR solutions to initialize the clusters and thus allows a direct evaluation of the effect of the clustering heuristics.
The RR solutions fall into a very narrow range of distortion values that is clearly separated from the near-optimal range achievable by the clustering methods. The cut weights of the RR solutions do not appear to be correlated with the distortion of the corresponding clusters. However, after only a few clustering steps, k-means enters a regime in which distortion and cut weight are strongly correlated, see Figure 1. We observe that there are two groups of graphs. In the first group, exemplified by G43 (Figure 1a), we consistently observe lower RR cut values at the starting point of K-MeansNM (right) than at the endpoint (top left), and the cut values mostly increase monotonically with decreasing distortion. In the second group of graphs, exemplified by G31 (Figure 1b), this is still the case on average; however, the optimal cut weights are observed at sub-optimal distortions. This observation motivates us to record the cut weights at intermediary steps of the clustering procedures, not only at their endpoints. In the case of K-MeansNM, this guarantees that we retain the performance guarantee of the Goemans–Williamson bound.
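Recording cut weights at intermediate clustering steps can be sketched as a spherical 2-means loop that evaluates the induced cut after every centroid update and keeps the best cut seen; started from an RR assignment, the result is then never worse than RR. All data here are synthetic placeholders:

```python
# Sketch: spherical 2-means that tracks the best induced cut at every step,
# assuming V holds unit "SDP" vectors and W a symmetric weight matrix (both
# synthetic here, for illustration only).
import numpy as np

def cut_value(W, x):
    """Cut weight from symmetric W (zero diagonal) and spins x."""
    return 0.25 * float(np.sum(W * (1.0 - np.outer(x, x))))

def spherical_2means_best_cut(W, V, labels, steps=20):
    best_x = np.where(labels, 1, -1)
    best_f = cut_value(W, best_x)
    for _ in range(steps):
        if labels.all() or not labels.any():            # a cluster emptied: stop
            break
        C = np.stack([V[labels].mean(axis=0), V[~labels].mean(axis=0)])
        C /= np.linalg.norm(C, axis=1, keepdims=True)   # re-normalize (Lemma 1)
        labels = (V @ C.T).argmax(axis=1) == 0          # cosine assignment
        x = np.where(labels, 1, -1)
        f = cut_value(W, x)
        if f > best_f:                                  # keep best intermediate cut
            best_f, best_x = f, x
    return best_f, best_x

rng = np.random.default_rng(3)
n = 12
A = rng.standard_normal((n, n))
W = np.abs(A + A.T)
np.fill_diagonal(W, 0.0)
V = rng.standard_normal((n, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)
labels0 = V @ rng.standard_normal(n) >= 0               # RR-style starting assignment
f0 = cut_value(W, np.where(labels0, 1, -1))
best_f, best_x = spherical_2means_best_cut(W, V, labels0)
assert best_f >= f0
```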

An Instance-Specific Approximation Guarantee
Considering a fixed instance of MAX-CUT, let {v_i} be the solution of the relaxed problem (3) and let {A, V \ A} be a discrete solution. Denote by ∂A := {(i,j) ∈ E(G) | i ∈ A, j ∈ V \ A} the set of cut edges. The value S of the relaxed solution can be written as

S = ∑_{(i,j) ∈ E} w_ij (1 − v_i · v_j)/2
  = f(A) − ∑_{(i,j) ∈ ∂A} w_ij (1 + v_i · v_j)/2 + ∑_{(i,j) ∈ E\∂A} w_ij (1 − v_i · v_j)/2.    (5)

Setting g_cut := ∑_{(i,j) ∈ ∂A} w_ij (1 + v_i · v_j)/2 and g_in := ∑_{(i,j) ∈ E\∂A} w_ij (1 − v_i · v_j)/2, we thus have f(A) = S + g_cut − g_in. Writing f* for the weight of the optimal cut, we know that the value of the relaxed problem is an upper bound, i.e., S ≥ f*. We therefore have

f(A) ≥ f* − (g_in − g_cut).    (6)

Note that g_in − g_cut = S − f(A) ≥ 0 since S ≥ f* ≥ f(A). First, consider the case of positive edge weights. Then, f(A) > 0, and we can estimate the approximation ratio for the solution A as

α(A) := f(A)/S = f(A)/(f(A) + g_in − g_cut) ≤ f(A)/f*.    (7)

If there are negative edge weights, we follow [15], define W− := ∑_{(i,j)} min(w_ij, 0) ≤ 0, and make use of the fact that f* − W− ≥ 0. From Equation (6), we immediately obtain f(A) − W− ≥ f* − W− − (g_in − g_cut), and thus a generalized version of the approximation ratio can be computed as

α(A) := (f(A) − W−)/(S − W−) ≤ (f(A) − W−)/(f* − W−).    (8)

The bounds in Equations (7) and (8) can be seen as instance-specific versions of the general approximation ratio derived in [15]. Empirically, we found in benchmarking experiments (see below) that the instance-specific bound α(A) substantially exceeds the uniform Goemans–Williamson bound of α ≈ 0.878. For the Goemans–Williamson algorithm, of course, we necessarily have E[α(A)] ≥ α.
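The bound of Equation (8) only needs some upper bound S ≥ f*. The sketch below does not solve the SDP; instead, purely for illustration, it uses the trivial upper bound S = sum of positive edge weights on a tiny made-up instance, where f* can be brute-forced:

```python
# Toy check of the instance-specific bound alpha(A) = (f(A) - W-)/(S - W-):
# it lower-bounds the true ratio (f(A) - W-)/(f* - W-) for any S >= f*.
# Here S is the trivial bound sum of positive weights (an assumption for
# illustration; the paper uses the SDP value).
import itertools

def cut_value(edges, x):
    return 0.5 * sum(w * (1 - x[i] * x[j]) for i, j, w in edges)

edges = [(0, 1, 2.0), (1, 2, -1.0), (2, 3, 3.0), (3, 0, 1.0), (0, 2, 1.0)]
n = 4
f_star = max(cut_value(edges, x)
             for x in itertools.product((-1, 1), repeat=n))   # brute force
W_minus = sum(min(w, 0.0) for _, _, w in edges)               # sum of negative weights
S = sum(max(w, 0.0) for _, _, w in edges)                     # trivial bound S >= f*

x = (1, -1, 1, -1)                                            # some rounded solution
fA = cut_value(edges, x)
alpha = (fA - W_minus) / (S - W_minus)
assert S >= f_star >= fA
assert alpha <= (fA - W_minus) / (f_star - W_minus)
```

The tighter the upper bound S, the closer α(A) gets to the true ratio, which is why the SDP value of Equation (3) is the natural choice.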

Benchmarking Experiments
In order to compare the cut values of RR with each of the clustering algorithms X, we use the relative cut-weight difference f̄_X := (f_X − f_RR)/f_RR, expressed in percent, as measure of performance. Figure 2 shows that clustering yields a significant improvement on most instances for most clustering approaches. K-MeansNM by construction always finds a solution that is at least as good as RR; empirically, K-Means2N also never returned a solution worse than RR. K-MeansRand improves on all graphs except G12, Fuzzy on all except G12 and G72, and K-MedRand on all except G32, G39, and G72; finally, for K-MeansDet, a better solution was not found on nine graphs.
The same trend is also observed for the instance-specific performance bounds, see Figure 3. The performance bounds for the individual solutions are well above 0.9, i.e., they exceed the Goemans–Williamson guarantee even in those few cases where clustering is worse than RR. For bipartite graphs with non-negative edge weights, the entire edge set forms a maximal cut [11]. This is the case for the two unweighted graphs G48 and G49 and explains why, for these two instances, no difference between RR and clustering solutions is observed in Figure 2. On average, all clustering algorithms yield improvements over RR. Interestingly, the average solution quality depends noticeably on the clustering algorithm. The clustering algorithm with the largest average improvement was K-MeansNM, as expected. On the benchmark set, it achieves an average increase in cut weight of f̄_K-MeansNM = 1.541% compared with RR. Other variants of 2-means performed similarly well: we found f̄_K-Means2N = 1.533% and f̄_K-MeansRand = 1.471%. For fuzzy clustering, we only observed an improvement of f̄_Fuzzy = 1.247%. With f̄_K-MedRand = 0.841% and f̄_K-MeansDet = 0.789%, the performance of the medoid and deterministic methods was less encouraging. For individual instances, the improvement can be substantial; for example, we obtained an improvement of 5.81% for graph G10 and 5.12% for G28 with K-MeansNM.
Despite subtle differences between the clustering methods, the clusters are quite similar in their characteristics. One measure of interest is the mean clustering angle

θ_A := (1/N) ∑ arccos(v_i · v_j),

where the sum runs over all N pairs of vectors assigned to the same cluster; it measures the average angle between two unit vectors in the same cluster. We found that the cardinalities |A| and |V \ A| are nearly even and that θ_A lies between 60 and 75 degrees for the graphs in the benchmark set. We observed no convincing trends connecting these parameters with the weight of the maximal cut. Figure 5 shows the ratio f_rounding / f_best, where f_rounding is the best solution after local search found either by clustering or RR for that instance, and f_best is the best solution known in the literature, taken from [9]. The corresponding values are also available in tabular form as Additional Material. Graphs G11 to G13, G32 to G34, G57, G62, G65 to G67, and G72 have toroidal topology; both the clustering algorithms and randomized rounding show a comparably low approximation ratio for these instances. For graphs with other topologies, we obtain approximation ratios exceeding 0.96, sometimes closely approaching 1. The combination of the Goemans–Williamson relaxation with clustering thus at least comes close to the best solutions known in the literature.
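The mean clustering angle is straightforward to compute from the solution vectors; the sketch below uses synthetic unit vectors and an illustrative bipartition in place of an actual clustering result:

```python
# Sketch: mean clustering angle theta_A, the average of arccos(v_i . v_j)
# over all pairs of unit vectors assigned to the same cluster.
import numpy as np

def mean_cluster_angle(V, labels):
    angles = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                cos = np.clip(V[idx[a]] @ V[idx[b]], -1.0, 1.0)  # guard rounding
                angles.append(np.degrees(np.arccos(cos)))
    return float(np.mean(angles))

rng = np.random.default_rng(4)
V = rng.standard_normal((20, 6))
V /= np.linalg.norm(V, axis=1, keepdims=True)
labels = (V[:, 0] >= 0).astype(int)          # illustrative bipartition
theta = mean_cluster_angle(V, labels)
assert 0.0 <= theta <= 180.0
```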

Conclusions
As the results show, using clustering methods other than the randomized rounding of [15] on average leads to better cut values. Using k-means with an initialization equivalent to starting from Goemans–Williamson rounding solutions (K-MeansNM), and keeping track of the cuts visited by k-means at all times, we can guarantee that the approximation guarantee is maintained, with the possibility of finding larger cut values. For the other clustering algorithms, this is not true; however, one version of k-means (K-Means2N) always found the same or better solutions than RR, even without such a guarantee. On average, the remaining clustering algorithms also yield larger cut values than RR, and the fraction of instances on which they find lower cuts is less than 15% in the worst case (K-MeansDet) and less than 5% for the others. Our approach is not guaranteed to improve all instances. In particular, it does not result in a theoretical improvement of the Goemans–Williamson approximation guarantee.
We have derived, however, an instance-specific lower bound for the approximation ratio that depends both on the instance and the solution, i.e., the cut itself. It provides a plausible measure of performance also for instances with unknown maximal cut value.
The success of k-means-related clustering approaches suggests extending this idea to other clustering methods. For spectral clustering, for instance, a natural starting point would be an auxiliary graph with weights ω_ij = max(v_i · v_j, 0) and edge set {(i, j) | ω_ij > 0}.