Doubly Nonnegative and Semidefinite Relaxations for the Densest k-Subgraph Problem

The densest k-subgraph (DkS) maximization problem is to find a set of k vertices with maximum total weight of edges in the subgraph induced by this set. This problem is NP-hard in general. In this paper, two relaxation methods for solving the DkS problem are presented. One is a doubly nonnegative relaxation, and the other is a semidefinite relaxation that is tighter than the standard semidefinite relaxation. The two relaxation problems are shown to be equivalent under suitable conditions. Moreover, approximation ratio results are given for these relaxation problems. Finally, numerical examples are tested to compare these relaxation problems, and the results show that the doubly nonnegative relaxation is more promising than the semidefinite relaxation for solving some DkS problems.


Introduction
In this paper, the densest k-subgraph (DkS) problem [1,2] is considered. For a given graph G and a parameter k, the DkS problem consists of finding a set of k vertices whose induced subgraph has maximum average degree. This problem was first introduced by Corneil and Perl as a natural generalization of the maximum clique problem [3]. It remains NP-hard even on restricted graph classes such as chordal graphs [3], bipartite graphs [3] and planar graphs [4]. The DkS problem is a classical problem in combinatorial optimization and arises in several applications, such as facility location [5], community detection in social networks, and the identification of protein families and molecular complexes in protein-protein interaction networks [6]. Since the DkS problem is NP-hard in general, several approximation methods [7][8][9] have been proposed for solving it. Semidefinite relaxation is well known to be a powerful and computationally efficient approximation technique for a host of very difficult optimization problems, for instance, the max-cut problem [10] and the boolean quadratic programming problem [11]. It has also been at the center of some very exciting developments in signal processing [12,13].
Optimization problems over the doubly nonnegative cone arise, for example, as a strengthening of the Lovász ϑ-number for approximating the largest clique in a graph [14]. The recent work by Burer [15] stimulated interest in optimization problems over the completely positive cone; a tractable approximation to such a problem can be defined as an optimization problem over the doubly nonnegative cone. Using the technique of doubly nonnegative relaxation, Bai and Guo proposed an effective and promising method for solving multiple objective quadratic programming problems in [16]. For more details and developments of this technique, one may refer to [17][18][19] and the references therein. It is worth pointing out that the cone of doubly nonnegative matrices is a subset of the cone of positive semidefinite matrices. Thus, the doubly nonnegative relaxation is at least as tight as the basic semidefinite relaxation. Moreover, such relaxation problems can be solved efficiently by popular software packages.
In this paper, motivated by the ideas of doubly nonnegative relaxation and semidefinite relaxation, two relaxation methods for solving the DkS problem are presented: one is a doubly nonnegative relaxation, and the other is a tighter semidefinite relaxation. Furthermore, we prove that the two relaxation problems are equivalent under suitable conditions. Some approximation accuracy results for these relaxation problems are also given. Finally, we report numerical examples comparing the two relaxation problems. The numerical results show that the doubly nonnegative relaxation is more promising than the semidefinite relaxation for solving some DkS problems.
The paper is organized as follows. We present the doubly nonnegative relaxation and a new, tighter semidefinite relaxation for the DkS problem in Sections 2.1 and 2.2, respectively. In Section 3, we prove that the two new relaxations proposed in Section 2 are equivalent. In Section 4, some approximation accuracy results for the proposed relaxation problems are given. Comparative numerical results are reported in Section 5 to show the efficiency of the proposed relaxations. Finally, concluding remarks are given in Section 6.

Two Relaxations for the Densest k-Subgraph Problem
First of all, the definition of the densest k-subgraph (DkS) problem is given as follows.
Definition 1 (Densest k-subgraph). Let G(V, E) be a given graph, where V is the vertex set and E is the edge set. The DkS problem on G(V, E) is the problem of finding a vertex subset of V of size k with the maximum induced average degree.
Given a symmetric n × n matrix $A = (a_{ij})$, we associate with A the weighted graph on the vertex set $V = \{1, 2, \ldots, n\}$ in which the edge $[i, j]$ carries the weight $a_{ij}$; that is, A is interpreted as the weighted adjacency matrix of this graph. Based on Definition 1, the DkS problem consists of determining a subset $V_1 \subseteq V$ of k vertices such that the total weight of edges in the subgraph spanned by $V_1$ is maximized. To select subgraphs, assign a decision variable $y_i \in \{0, 1\}$ to each node ($y_i = 1$ if the node is taken, and $y_i = 0$ otherwise). The weight of the subgraph given by y is $y^T A y$. Thus, with e denoting the all-ones vector, the DkS problem can be phrased as the 0-1 quadratic problem

$$(\mathrm{DkS})\qquad \max_{y}\; y^T A y \quad \text{s.t.}\quad e^T y = k,\quad y \in \{0, 1\}^n.$$

It is known that the (DkS) problem is NP-hard [20], even when A is positive semidefinite; its feasible set is discrete and hence nonconvex. To handle this difficulty, we present two new relaxations of the (DkS) problem in the following subsections.
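For very small instances, the (DkS) problem can be solved by enumerating all $\binom{n}{k}$ vertex subsets, which gives a reference value against which relaxation bounds can be compared. The following sketch is our own NumPy illustration, not code from the paper (the function name is hypothetical). Note that, with a zero diagonal, $y^T A y$ counts each induced edge twice.

```python
from itertools import combinations

import numpy as np


def densest_k_subgraph_bruteforce(A: np.ndarray, k: int):
    """Exhaustively solve max y^T A y s.t. e^T y = k, y in {0,1}^n.

    Only practical for small n: it enumerates all C(n, k) vertex subsets.
    Returns the best objective value and the chosen (0-based) vertex indices.
    """
    n = A.shape[0]
    best_val, best_set = -np.inf, None
    for subset in combinations(range(n), k):
        y = np.zeros(n)
        y[list(subset)] = 1.0
        val = y @ A @ y  # sum of a_ij over i, j in the subset
        if val > best_val:
            best_val, best_set = val, subset
    return best_val, best_set


# Toy graph: a triangle {0, 1, 2} plus a pendant vertex 3 attached to vertex 2.
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(densest_k_subgraph_bruteforce(A, 3))
```

On this toy graph the triangle (0, 1, 2) is returned with value 6.0: three unit-weight edges, each counted twice.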

Doubly Nonnegative Relaxation
Note that the quadratic term $y^T A y$ in the (DkS) problem can also be expressed as $A \bullet yy^T$. By introducing a new variable $Y = yy^T$ and applying lifting techniques, we can reformulate the (DkS) problem as the following completely positive programming problem:

$$(\mathrm{CPP}_{DkS})\qquad \max_{y,\,Y}\; A \bullet Y \quad \text{s.t.}\quad e^T y = k,\quad e^T Y e = k^2,\quad \operatorname{diag}(Y) = y,\quad \begin{pmatrix} 1 & y^T \\ y & Y \end{pmatrix} \in \mathcal{C}_{1+n},$$

where $\mathcal{C}_{1+n}$ is defined as

$$\mathcal{C}_{1+n} = \Big\{ X \in \mathbb{R}^{(1+n)\times(1+n)} : X = \sum_{h \in H} z_h z_h^T \Big\}$$

for some finite set of vectors $\{z_h\}_{h \in H} \subset \mathbb{R}^{1+n}_+ \setminus \{0\}$. The following theorem shows the relationship between the (DkS) problem and the (CPP DkS) problem. Its proof is similar to that of Theorem 2.6 in [15] and is omitted here. Theorem 1. (i) The (DkS) problem and the (CPP DkS) problem have the same optimal value, i.e., Opt(DkS) = Opt(CPP DkS); (ii) if $(y^*, Y^*)$ is an optimal solution of the (CPP DkS) problem, then $y^*$ is in the convex hull of optimal solutions of the (DkS) problem.
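As a sanity check on the lifting, the sketch below (NumPy, illustrative only) builds $W = \begin{pmatrix} 1 & y^T \\ y & yy^T \end{pmatrix}$ for a feasible 0-1 vector and verifies the lifted constraints. It assumes that the lifted problem keeps $e^T y = k$ and adds $\operatorname{diag}(Y) = y$ and $e^T Y e = k^2$, as in Burer's construction [15]. W itself is completely positive because it is the single term $w w^T$ with $w = (1, y) \ge 0$.

```python
import numpy as np


def lift(y: np.ndarray) -> np.ndarray:
    """Return the (1+n) x (1+n) matrix [[1, y^T], [y, y y^T]] = w w^T with w = (1, y)."""
    w = np.concatenate(([1.0], y))
    return np.outer(w, w)


n, k = 5, 2
y = np.array([0.0, 1.0, 0.0, 1.0, 0.0])  # a feasible 0-1 vector with e^T y = k
W = lift(y)
Y = W[1:, 1:]
e = np.ones(n)

# Lifted constraints (assumed form, following Burer's construction):
assert np.allclose(np.diag(Y), y)       # y_i^2 = y_i for binary y
assert np.isclose(e @ y, k)             # e^T y = k
assert np.isclose(e @ Y @ e, k**2)      # e^T Y e = (e^T y)^2 = k^2
# W = w w^T with w >= 0 is completely positive (one rank-one term z_h z_h^T),
# hence in particular entrywise nonnegative and positive semidefinite.
assert np.all(W >= 0) and np.linalg.eigvalsh(W).min() >= -1e-12
```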
On the one hand, according to Definition 2, it is clear that the (CPP DkS) problem is equivalent to the (DkS) problem. On the other hand, $\mathcal{C}_{1+n}$ is a closed convex cone, called the cone of completely positive matrices. Thus, the (CPP DkS) problem is convex. However, since checking whether a given matrix belongs to $\mathcal{C}_{1+n}$ is NP-hard, as shown by Dickinson and Gijben [21], the (CPP DkS) problem is still NP-hard. Thus, $\mathcal{C}_{1+n}$ has to be replaced or approximated by computable cones. For example, the nonnegative orthant $\mathbb{R}^n_+$ and the positive semidefinite cone $\mathcal{S}^n_+$ are computable cones, and so is the cone $\mathcal{N}^n_+$ of symmetric entrywise nonnegative matrices. Diananda's decomposition theorem [22] can be restated as follows; its proof can be found in [22].
Theorem 2. $\mathcal{C}_n \subseteq \mathcal{S}^n_+ \cap \mathcal{N}^n_+$ holds for all n. If $n \le 4$, then $\mathcal{C}_n = \mathcal{S}^n_+ \cap \mathcal{N}^n_+$.
The cone $\mathcal{S}^n_+ \cap \mathcal{N}^n_+$ is called the cone of doubly nonnegative matrices. For $n \ge 5$, however, there exist matrices that are doubly nonnegative but not completely positive; a counterexample can be found in [23].
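Because membership in $\mathcal{C}_n$ is NP-hard to check, a practical relaxation can only test the two computable conditions in Theorem 2. A small NumPy helper (the name `is_doubly_nonnegative` is hypothetical) that checks symmetry, positive semidefiniteness via the smallest eigenvalue, and entrywise nonnegativity, up to a tolerance:

```python
import numpy as np


def is_doubly_nonnegative(X: np.ndarray, tol: float = 1e-9) -> bool:
    """Check membership in S^n_+ ∩ N^n_+ (symmetric, PSD, entrywise nonnegative)."""
    if not np.allclose(X, X.T, atol=tol):
        return False
    psd = np.linalg.eigvalsh(X).min() >= -tol
    nonneg = X.min() >= -tol
    return bool(psd and nonneg)


rng = np.random.default_rng(0)
Z = rng.random((4, 3))          # each column is a nonnegative vector z_h
X_cp = Z @ Z.T                  # sum_h z_h z_h^T: completely positive by construction
X_psd_only = np.array([[1.0, -0.5], [-0.5, 1.0]])  # PSD but with a negative entry

print(is_doubly_nonnegative(X_cp))        # True, consistent with C_n ⊆ S^n_+ ∩ N^n_+
print(is_doubly_nonnegative(X_psd_only))  # False
```

A `True` answer certifies only doubly nonnegative membership; by the counterexamples for $n \ge 5$, it does not certify complete positivity.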
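By Theorem 2, the (CPP DkS) problem can be relaxed to the problem

$$(\mathrm{DNNP}_{DkS})\qquad \max_{y,\,Y}\; A \bullet Y \quad \text{s.t.}\quad e^T y = k,\quad e^T Y e = k^2,\quad \operatorname{diag}(Y) = y,\quad \begin{pmatrix} 1 & y^T \\ y & Y \end{pmatrix} \in \mathcal{S}^{1+n}_+ \cap \mathcal{N}^{1+n}_+,$$

which is called the doubly nonnegative relaxation of the (DkS) problem. Some remarks on this relaxation problem are given below.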

Remark 1.
Obviously, the (DNNP DkS) problem has a linear objective function, linear constraints and a convex conic constraint, so it is a linear conic programming problem. Meanwhile, $\mathcal{S}^{1+n}_+ \cap \mathcal{N}^{1+n}_+ \subseteq \mathcal{S}^{1+n}_+$ and the variables in both sets are of the same type (symmetric matrices), which implies that the (DNNP DkS) problem can be solved by popular software packages for semidefinite programs.

New Semidefinite Relaxation
Semidefinite relaxation is well known to be a powerful approximation technique for a host of combinatorial optimization problems. In this subsection, we present a new semidefinite relaxation with a tighter bound for the (DkS) problem.
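The idea of the standard lifting is to introduce the rank-one symmetric matrix $Y = yy^T$. With the help of Y, the integer constraints $y_i \in \{0, 1\}$ can be expressed as $Y_{ii} = y_i$, and the quadratic objective function $y^T A y$ as $A \bullet Y$. Thus, we obtain the following equivalent formulation of the (DkS) problem:

$$\max_{y,\,Y}\; A \bullet Y \quad \text{s.t.}\quad e^T y = k,\quad e^T Y e = k^2,\quad \operatorname{diag}(Y) = y,\quad \begin{pmatrix} 1 & y^T \\ y & Y \end{pmatrix} \succeq 0,\quad \operatorname{rank}(Y) = 1.$$

The hard constraint in this problem is $\operatorname{rank}(Y) = 1$, which is difficult to handle. Dropping the rank-one constraint yields the following standard semidefinite relaxation problem:

$$(\mathrm{I}-\mathrm{SDR}_{DkS})\qquad \max_{y,\,Y}\; A \bullet Y \quad \text{s.t.}\quad e^T y = k,\quad e^T Y e = k^2,\quad \operatorname{diag}(Y) = y,\quad \begin{pmatrix} 1 & y^T \\ y & Y \end{pmatrix} \succeq 0.$$

Some remarks on the (I − SDR DkS) problem are given below.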
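Dropping the rank constraint is what makes the relaxation weaker than (DkS): averages of lifted feasible points remain feasible. The NumPy sketch below assumes the relaxation keeps the constraints $e^T y = k$, $e^T Y e = k^2$, $\operatorname{diag}(Y) = y$ and $\begin{pmatrix} 1 & y^T \\ y & Y \end{pmatrix} \succeq 0$. It averages the lifts of two 2-subsets of a 4-vertex graph (the matrix A is made-up illustration data); the result satisfies all of these constraints, and is even entrywise nonnegative, but has rank 2.

```python
import numpy as np


def lift(y):
    """[[1, y^T], [y, y y^T]] for a vector y."""
    w = np.concatenate(([1.0], y))
    return np.outer(w, w)


n, k = 4, 2
A = np.array([[0, 3, 1, 0],
              [3, 0, 0, 1],
              [1, 0, 0, 2],
              [0, 1, 2, 0]], dtype=float)
y1 = np.array([1.0, 1.0, 0.0, 0.0])
y2 = np.array([0.0, 0.0, 1.0, 1.0])

W = 0.5 * lift(y1) + 0.5 * lift(y2)   # convex combination of two rank-one lifts
y, Y = W[0, 1:], W[1:, 1:]
e = np.ones(n)

# The linear constraints and the PSD block constraint still hold ...
assert np.isclose(e @ y, k) and np.isclose(e @ Y @ e, k**2) and np.allclose(np.diag(Y), y)
assert np.linalg.eigvalsh(W).min() >= -1e-12
# ... but W is not rank one, so Y is not of the form y y^T.
print(np.linalg.matrix_rank(W))                            # 2
print(np.sum(A * Y), 0.5 * (y1 @ A @ y1 + y2 @ A @ y2))    # equal objective values
```

Since the objective $A \bullet Y$ is linear, the averaged point has the average of the two subgraph weights as its objective value.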

Remark 2.
(i) Obviously, the (I − SDR DkS) problem is also a linear conic programming problem, with the same objective function and equality constraints as the (DNNP DkS) problem. The only difference between the (I − SDR DkS) problem and the (DNNP DkS) problem is that the (DNNP DkS) problem has $\frac{n(n+1)}{2} + n$ more nonnegativity constraints than the (I − SDR DkS) problem.
Thus, since both are maximization problems, the bound given by the (DNNP DkS) problem is not larger than that of the (I − SDR DkS) problem. In Section 5, we carry out numerical experiments to compare the (I − SDR DkS) problem and the (DNNP DkS) problem from a computational point of view.
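The count of extra nonnegativity constraints in Remark 2 (i) can be verified directly: the symmetric matrix $\begin{pmatrix} 1 & y^T \\ y & Y \end{pmatrix}$ has $\frac{(n+1)(n+2)}{2}$ distinct entries, and its corner entry is fixed to 1, so the number of entries that must be constrained to be nonnegative is

```latex
\frac{(n+1)(n+2)}{2} - 1 \;=\; \frac{n^2 + 3n}{2} \;=\; \frac{n(n+1)}{2} + n,
```

namely the $\frac{n(n+1)}{2}$ entries $Y_{ij}$ with $i \le j$ plus the n entries of y.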
Note that the (DkS) problem is inhomogeneous, but it can be homogenized as follows. First, let $z = 2y - e$ in the (DkS) problem; it follows that $z \in \{-1, 1\}^n$. Since $y = \frac{1}{2}(z + e)$ and A is symmetric, the change of variable $y \to z$ gives the following equivalent formulation of the (DkS) problem:

$$\max_{z}\; \tfrac{1}{4}\left(z^T A z + 2 e^T A z + e^T A e\right) \quad \text{s.t.}\quad e^T z = 2k - n,\quad z \in \{-1, 1\}^n.$$

Then, with the introduction of an extra variable t, the (DkS) problem can be expressed as a homogeneous problem, where 0 denotes a zero matrix of appropriate dimension.
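The substitution $y = \frac{1}{2}(z + e)$ can be checked numerically: for symmetric A, $y^T A y = \frac{1}{4}(z^T A z + 2 e^T A z + e^T A e)$, and $e^T y = k$ becomes $e^T z = 2k - n$. A NumPy sketch on a random symmetric matrix (illustrative data only):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
B = rng.standard_normal((n, n))
A = (B + B.T) / 2              # any symmetric weight matrix
e = np.ones(n)

y = np.zeros(n)
y[rng.choice(n, size=k, replace=False)] = 1.0   # random feasible 0-1 vector
z = 2 * y - e                                   # z in {-1, +1}^n

lhs = y @ A @ y
rhs = 0.25 * (z @ A @ z + 2 * e @ A @ z + e @ A @ e)
assert np.isclose(lhs, rhs)
assert np.isclose(e @ z, 2 * k - n)             # e^T y = k  <=>  e^T z = 2k - n
```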
Remark 3. The homogenized (DkS) problem is equivalent to the (DkS) problem in the z variable in the following sense: if $\begin{pmatrix} t^* \\ z^* \end{pmatrix}$ is an optimal solution of the homogenized problem, then $z^*$ (resp. $-z^*$) is an optimal solution of the (DkS) problem in the z variable when $t^* = 1$ (resp. $t^* = -1$).
By the standard semidefinite relaxation technique, letting $S = \begin{pmatrix} t \\ z \end{pmatrix}\begin{pmatrix} t \\ z \end{pmatrix}^T$, the homogenized (DkS) problem can be relaxed to a semidefinite program in the matrix variable S. Moreover, applying the standard semidefinite relaxation technique directly to the (DkS) problem in the z variable, with $Z = zz^T$, yields a second relaxation in the variables (z, Z). The (SDR DkS) problem and the (SDR DkS) problem are both standard semidefinite relaxation problems for the (DkS) problem. The upshot of these two formulations is that they can be solved conveniently and efficiently, to arbitrary accuracy, by readily available software packages such as CVX. Note that there is only one difference between these two relaxation problems: the (SDR DkS) problem has one more equality constraint than the (SDR DkS) problem. In Section 5, comparative numerical results are reported to show the effectiveness of these two relaxation problems for solving some random (DkS) problems.
It is worth noting that the constraint $z \in \{-1, +1\}^n$ in the (DkS) problem further implies that Formula (1) always holds. Thus, adding Formula (1) to the (SDR DkS) problem, we arrive at the new semidefinite relaxation problem (II − SDR DkS). Obviously, Opt(II − SDR DkS) ≤ Opt(SDR DkS), since the feasible set of the (II − SDR DkS) problem is a subset of the feasible set of the (SDR DkS) problem and the two problems have the same objective function.
Up to now, three semidefinite relaxation problems for the (DkS) problem have been established, namely the (SDR DkS) problem, the (SDR DkS) problem and the (II − SDR DkS) problem, where the upper bound given by the (II − SDR DkS) problem is at least as tight as that of the (SDR DkS) problem. In the following sections, we further investigate the relationship between these three problems and the (DNNP DkS) problem.

The Equivalence between the Relaxation Problems
The previous section establishes the doubly nonnegative relaxation (i.e., the (DNNP DkS) problem) and the semidefinite relaxation with a tighter bound (i.e., the (II − SDR DkS) problem) for the (DkS) problem. Note that the (DNNP DkS) problem has n more inequality constraints than the (II − SDR DkS) problem. In this section, we prove the equivalence between the two relaxations. First of all, the definition of the equivalence of two optimization problems is given as follows. In order to establish the equivalence of the (DNNP DkS) problem and the (II − SDR DkS) problem, a crucial theorem is given below; the details of its proof can be found in [24] (Appendix A.5.5). Using Definition 2 and Theorem 3, we obtain the following main equivalence theorem (Theorem 4).
Proof. First of all, we prove that Opt(DNNP DkS) ≥ Opt(II − SDR DkS).
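Suppose that $(z^*, Z^*)$ is an optimal solution of the (II − SDR DkS) problem, and let

$$y = \tfrac{1}{2}\left(z^* + e\right), \qquad Y = \tfrac{1}{4}\left(Z^* + z^* e^T + e z^{*T} + ee^T\right). \qquad (2)$$

Directly from $e^T z^* = 2k - n$ and Equation (2), we have $e^T y = \frac{1}{2}(2k - n + n) = k$ (3). By Equation (2) and $e^T Z^* e = (2k - n)^2$, it holds that $e^T Y e = \frac{1}{4}\left((2k - n)^2 + 2n(2k - n) + n^2\right) = k^2$ (4). Since $\operatorname{diag}(Z^*) = e$, Equation (2) further implies that $\operatorname{diag}(Y) = \frac{1}{4}(e + 2z^* + e) = y$ (5). Combining this with Equation (5), Formula (6) yields the entrywise nonnegativity of y and Y. By Theorem 3 (ii) and Equation (2), it follows that $Y - yy^T = \frac{1}{4}\left(Z^* - z^* z^{*T}\right) \succeq 0$, i.e., $\begin{pmatrix} 1 & y^T \\ y & Y \end{pmatrix} \in \mathcal{S}^{1+n}_+$.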
Again from Equation (9) and $e^T Y^* e = k^2$, it is true that $e^T Z e = (2k - n)^2$. From Equation (9) and Theorem 3 (ii), it holds that $\begin{pmatrix} 1 & z^T \\ z & Z \end{pmatrix} \succeq 0$. By Equations (11)-(14), we conclude that (z, Z) defined by Equation (9) is a feasible solution of the (II − SDR DkS) problem. Furthermore, the objective value of (z, Z) in the (II − SDR DkS) problem equals $A \bullet Y^*$, the optimal value of the (DNNP DkS) problem, which gives Opt(DNNP DkS) ≤ Opt(II − SDR DkS). Summarizing the analysis above, we obtain Opt(DNNP DkS) = Opt(II − SDR DkS). From Equations (2) and (9), we observe that (y, Y) defined by Equation (2) is an optimal solution of the (DNNP DkS) problem and that (z, Z) defined by Equation (9) is an optimal solution of the (II − SDR DkS) problem. According to Definition 2, we conclude that the (DNNP DkS) problem and the (II − SDR DkS) problem are equivalent.
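The two changes of variables used in the proof can be checked numerically. This sketch assumes that Equation (2) is $y = \frac{1}{2}(z + e)$, $Y = \frac{1}{4}(Z + ze^T + ez^T + ee^T)$ and that Equation (9) is its inverse $z = 2y - e$, $Z = 4Y - 2ye^T - 2ey^T + ee^T$, i.e., the natural maps induced by $z = 2y - e$ (the helper names are hypothetical). Starting from a rank-two point built from two $\pm 1$ vectors, it confirms the transfer of constraints, the identity $Y - yy^T = \frac{1}{4}(Z - zz^T)$ behind the use of Theorem 3, and the equality of objective values $A \bullet Y = \frac{1}{4}(A \bullet Z + 2e^T A z + e^T A e)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 3
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
e = np.ones(n)


def z_to_y(z, Z):
    """Assumed Eq. (2): y = (z + e)/2, Y = (Z + z e^T + e z^T + e e^T)/4."""
    return 0.5 * (z + e), 0.25 * (Z + np.outer(z, e) + np.outer(e, z) + np.outer(e, e))


def y_to_z(y, Y):
    """Assumed Eq. (9): z = 2y - e, Z = 4Y - 2 y e^T - 2 e y^T + e e^T."""
    return 2 * y - e, 4 * Y - 2 * np.outer(y, e) - 2 * np.outer(e, y) + np.outer(e, e)


# Feasible, non-rank-one (z, Z): average of two lifted +-1 vectors with e^T z = 2k - n = 0.
z1 = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
z2 = np.array([-1.0, 1.0, -1.0, 1.0, 1.0, -1.0])
z = 0.5 * (z1 + z2)
Z = 0.5 * (np.outer(z1, z1) + np.outer(z2, z2))

y, Y = z_to_y(z, Z)
assert np.isclose(e @ y, k) and np.isclose(e @ Y @ e, k**2)
assert np.allclose(np.diag(Y), y) and Y.min() >= 0
# Schur complements agree up to the factor 1/4: Y - yy^T = (Z - zz^T)/4.
assert np.allclose(Y - np.outer(y, y), 0.25 * (Z - np.outer(z, z)))
# Objective values coincide (A symmetric): A • Y = (A • Z + 2 e^T A z + e^T A e)/4.
assert np.isclose(np.sum(A * Y), 0.25 * (np.sum(A * Z) + 2 * e @ A @ z + e @ A @ e))
# The two maps are inverse to each other.
z_back, Z_back = y_to_z(y, Y)
assert np.allclose(z_back, z) and np.allclose(Z_back, Z)
```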
Theorem 4 shows that Opt(DNNP DkS) = Opt(II − SDR DkS). Since the (DNNP DkS) problem has n more inequality constraints than the (II − SDR DkS) problem, the computational cost of solving the (DNNP DkS) problem may be greater than that of solving the (II − SDR DkS) problem.
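Appendix A.5.5 of [24] concerns Schur complements, and both directions of the proof rely on Theorem 3 (ii). Assuming that statement is the usual one (because the (1,1) entry is positive, $\begin{pmatrix} 1 & y^T \\ y & Y \end{pmatrix} \succeq 0$ if and only if $Y - yy^T \succeq 0$), the following randomized NumPy check illustrates it. The perturbations are kept away from zero so that the eigenvalue test is not sensitive to round-off.

```python
import numpy as np


def lifted_block(y: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Assemble [[1, y^T], [y, Y]]."""
    return np.block([[np.ones((1, 1)), y[None, :]], [y[:, None], Y]])


def is_psd(M: np.ndarray, tol: float = 1e-9) -> bool:
    return bool(np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol)


rng = np.random.default_rng(2)
n = 5
for _ in range(200):
    y = rng.standard_normal(n)
    G = rng.standard_normal((n, n))
    P = G @ G.T + np.eye(n)                      # positive definite perturbation
    s = rng.choice([-1.0, 1.0]) * rng.uniform(0.5, 1.0)
    Y = np.outer(y, y) + s * P                   # so that Y - yy^T = s P
    # Schur complement: block is PSD  <=>  Y - y y^T is PSD.
    assert is_psd(lifted_block(y, Y)) == is_psd(Y - np.outer(y, y))
    assert is_psd(lifted_block(y, Y)) == (s > 0)
```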

The Approximation Accuracy
The previous section shows that the (DNNP DkS) problem is equivalent to the (II − SDR DkS) problem (see Theorem 4), whose upper bound is at least as tight as that of the (SDR DkS) problem. In this section, we further investigate, under some conditions, the approximation accuracy of the (DNNP DkS) problem for solving the (DkS) problem, in comparison with the standard semidefinite relaxation problems proposed in the previous sections.
To simplify the expression, we introduce some shorthand notation, with which the (SDR DkS) problem and the (II − SDR DkS) problem can be written in simplified form. Combining Theorem 3 in [25] with the known approximation accuracy of semidefinite relaxation for some quadratic programming problems [26], we immediately obtain the following theorem.
In the following analysis, we assume that $k = \frac{n}{2}$, so that $2k - n = 0$. We first observe the following. Obviously, $\operatorname{diag}(S) = e$ implies that $\sum_i S_{ii} = n + 1$, i.e., $I \bullet S = n + 1$, but the converse does not hold: $I \bullet S = n + 1$ does not imply $\operatorname{diag}(S) = e$. These results yield further relations between the relaxation problems. Similarly to Theorem 4.2 in [27], the following approximation accuracy theorem holds.
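A small example of the one-way implication between $\operatorname{diag}(S) = e$ and $I \bullet S = n + 1$: with n = 4 (so k = n/2 = 2 and 2k − n = 0), the identity matrix satisfies both conditions, whereas a diagonal matrix with trace n + 1 but unequal diagonal entries satisfies only the trace condition. A NumPy sketch with made-up matrices:

```python
import numpy as np

n = 4
k = n // 2
assert 2 * k - n == 0                          # k = n/2 turns e^T z = 2k - n into e^T z = 0

S_diag = np.eye(n + 1)                         # diag(S) = e, hence I • S = n + 1
S_trace = np.diag([2.0, 0.5, 1.0, 1.0, 0.5])   # I • S = n + 1, yet diag(S) != e

for S in (S_diag, S_trace):
    print(np.isclose(np.trace(S), n + 1), np.allclose(np.diag(S), 1.0))
# prints "True True" then "True False"
```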
Up to now, we have established the equivalence between the (DNNP DkS) problem and the (II − SDR DkS) problem, and we have given some approximation accuracy results for the (DNNP DkS) problem and some standard semidefinite relaxation problems. In Section 5, we carry out numerical experiments to give a flavour of the actual behaviour of the (DNNP DkS) problem and some semidefinite relaxation problems.

Numerical Experiments
In this section, some random (DkS) examples are tested to show the efficiency of the proposed relaxation problems. All relaxation problems are solved with CVX [28] in MATLAB R2010a under Windows XP, on a PC with a 2.53 GHz CPU. The comparative numerical results are reported below.
To give a flavour of the behaviour of the above relaxation problems, we consider the following test examples, whose data are given in Table 1. The first column of Table 1 gives the name of each test example; n and k stand for the number of vertices of the given graph and of the sought subgraph, respectively. The last column describes the procedure for generating the coefficient matrix A in the (DkS) problem. The procedures are explained in more detail as follows:
• P25. 50 random examples are generated from 'seed = 1,2,...,50'. The corresponding coefficient matrices A of order n = 25 have integer weights drawn from {0, 1, ..., 10}.
• P30. This example is generated by the MATLAB function randn from 'seed = 2012'. The elements of A are drawn from the standard normal distribution.
• P40. This example is generated by the MATLAB function rand from 'seed = 2017'. The elements of A are drawn from the standard uniform distribution on the interval (0, 1).
• P60. This example is generated by the MATLAB function rand from 'seed = 2020'. The elements of A are drawn from {0, 1}.
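The MATLAB seeds above cannot be reproduced outside MATLAB, and the text does not say how the symmetry or the diagonal of A is handled. A hypothetical NumPy generator in the spirit of P25, assuming a mirrored upper triangle and a zero diagonal, might look like this:

```python
import numpy as np


def random_integer_weights(n: int, seed: int, high: int = 10) -> np.ndarray:
    """Hypothetical P25-style generator: symmetric integer weights in {0, ..., high}.

    Assumptions (not stated in the text): the strict upper triangle is sampled and
    mirrored, and the diagonal is zero (no self-loops). NumPy's generator does not
    reproduce MATLAB's random streams for the same seed.
    """
    rng = np.random.default_rng(seed)
    U = np.triu(rng.integers(0, high + 1, size=(n, n)), k=1)
    return U + U.T


instances = [random_integer_weights(25, seed) for seed in range(1, 51)]
```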
First of all, the performances of the (DNNP DkS) problem, the (II − SDR DkS) problem and the (I − SDR DkS) problem for solving P25 and P50 are compared. We use the performance profiles described by Dolan and Moré [29]. Our profiles are based on the optimal values (i.e., average degree) and the number of iterations of these relaxation problems. The cumulative probability $\rho_s(\tau)$ is the cumulative distribution function of the performance ratio of solver s within a factor $\tau \in \mathbb{R}$, i.e., the fraction of test problems on which solver s is within a factor $\tau$ of the best solver; in particular, $\rho_s(1)$ is the probability that solver s wins over the rest of the solvers. The comparative results are shown in Figures 1 and 2.
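A minimal sketch of how Dolan and Moré's profiles can be computed from a table of costs (our own NumPy helper with a hypothetical name, not the authors' code). It assumes strictly positive costs where smaller is better, which fits iteration counts and, when they are positive, the upper bounds returned by a maximization relaxation:

```python
import numpy as np


def performance_profile(T: np.ndarray, taus: np.ndarray) -> np.ndarray:
    """Dolan-Moré performance profiles.

    T[p, s] is the cost of solver s on problem p (smaller is better, all entries > 0).
    Returns rho[s, i] = fraction of problems with T[p, s] <= taus[i] * min_s T[p, s].
    """
    ratios = T / T.min(axis=1, keepdims=True)          # performance ratios r_{p,s} >= 1
    return np.array([[np.mean(ratios[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])


# Toy data: 3 problems x 2 solvers (e.g. iteration counts).
T = np.array([[10.0, 12.0],
              [20.0, 15.0],
              [30.0, 45.0]])
rho = performance_profile(T, np.array([1.0, 1.5]))
print(rho)   # rho[:, 0] is the share of problems each solver wins (ties count for both)
```

Here solver 0 wins two of the three problems and solver 1 wins one, while both are within a factor 1.5 of the best on every problem.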
The comparative results for P25 are shown in Figure 1. The (DNNP DkS) problem and the (II − SDR DkS) problem clearly have the same performance, which is slightly better than that of the (I − SDR DkS) problem in terms of optimal values. In terms of the number of iterations, the (I − SDR DkS) problem performs best, and the (II − SDR DkS) problem performs better than the (DNNP DkS) problem.
The performance of the three relaxation problems for solving P50 is shown in Figure 2. The performance of the (DNNP DkS) problem is the same as that of the (II − SDR DkS) problem, and both are much better than that of the (I − SDR DkS) problem in terms of optimal values, although the (I − SDR DkS) problem performs better than the (DNNP DkS) problem and the (II − SDR DkS) problem in terms of the number of iterations. All of the results shown in Figures 1 and 2 further imply that, compared with the (I − SDR DkS) problem, the (DNNP DkS) problem and the (II − SDR DkS) problem generate more promising bounds for P25 and P50 at the cost of slightly more iterations. Moreover, for P25 and P50, the (DNNP DkS) problem and the (II − SDR DkS) problem perform the same in terms of optimal values, although the (II − SDR DkS) problem performs better than the (DNNP DkS) problem in terms of the number of iterations.
To further show the computational efficiency of the (DNNP DkS) problem, we compare it with the (II − SDR DkS) problem and with some other types of semidefinite relaxation problems proposed in [30] for solving some (DkS) problems. The test examples A50 and A100 are taken from [30], and (R-20), (R-24) and (R-MET) denote the three semidefinite relaxation problems proposed in [30]. The corresponding numerical results are shown in Table 2, where "−" means that the number of iterations is not reported in [30]. The results show that the (DNNP DkS) problem is more efficient than the (II − SDR DkS) problem in terms of both optimal values and number of iterations. Note that the (DNNP DkS) problem and the (II − SDR DkS) problem both perform much better than (R-20) and (R-24). Moreover, the (DNNP DkS) problem is competitive with (R-MET) for solving these two problems. Further numerical results are reported in Table 3. The results show that the (DNNP DkS) problem is always more efficient than the (II − SDR DkS) problem in terms of optimal values, number of iterations and CPU time for solving these examples. The performance of the (I − SDR DkS) problem and that of the (SDR DkS) problem are almost the same on these examples. Moreover, note that the optimal value of the (DNNP DkS) problem for solving P80 is larger than that of the (II − SDR DkS) problem. Thus, we conclude that it may be more promising to use the (DNNP DkS) problem than the (II − SDR DkS) problem for solving some specific (DkS) problems in practice.

Conclusions
In this paper, the DkS problem is studied, whose goal is to find a k-vertex subgraph such that the total weight of edges in this subgraph is maximized. This problem is NP-hard even on bipartite graphs, chordal graphs and planar graphs. By exploiting the structure of the DkS problem, a doubly nonnegative relaxation and a new, tighter semidefinite relaxation for solving the DkS problem are established. Moreover, we prove that the two relaxation problems are equivalent under suitable conditions, and give some approximation accuracy results for these relaxation problems. Finally, the comparative numerical results show that the doubly nonnegative relaxation is more efficient than the semidefinite relaxation for solving some DkS problems.
Acknowledgments: The authors thank the reviewers for their very helpful suggestions, which led to substantial improvements of the paper.

Conflicts of Interest:
The authors declare no conflict of interest.