Clustering Vertex-Weighted Graphs by Spectral Methods

Spectral techniques are often used to partition the set of vertices of a graph, or to form clusters. They are based on the Laplacian matrix. These techniques make it easy to integrate weights on the edges. In this work, we introduce a p-Laplacian, or generalized Laplacian matrix with potential, which also allows us to take into account weights on the vertices. These vertex weights are independent of the edge weights. In this way, we can cluster according to the importance of the vertices, assigning more weight to some vertices than to others, instead of considering only the number of vertices. We also provide some bounds, similar to those of Cheeger, for the value of the minimal cut cost with weights at the vertices, as a function of the first non-zero eigenvalue of the p-Laplacian (an analog of the Fiedler eigenvalue).


Introduction
Informally, a cluster in a graph is a subgraph whose vertices are tightly linked to each other and loosely linked to the vertices outside the subgraph. Such a vague concept is useful in the description of several phenomena: walks, searches, and decompositions of graphs [1]. The concept of a cluster is closely related to that of community [2]. Depending on the intended application, the meaning of the concept can be specified either to better reflect the aspects to be modeled or to ease its calculation. In our case, we were interested in the subdivision of the set of vertices into two parts, of similar size, in such a way that the number of edges between the two parts is kept to a minimum. To solve this problem, there are many references in the literature about spectral tools, based on a certain eigenvector of the Laplacian operator (the Fiedler vector) [3]. These references also show how to consider weights at the edges of the graph so that the minimization of the cut edges takes these weights into account. In this work, we extended this tool to the case where there are also weights on the vertices, independent of those on the edges, so that the partition is made into parts of similar total weight, not necessarily a similar number of vertices, minimizing the total edge weight of the cut. Other proposals in the literature deal with this problem using generalized eigenvectors (see [4] and the references therein). In this article, we remain in the context of the usual eigenvectors. Our interest in avoiding generalized eigenvalues is that one of the possible applications is spectral clustering, along the lines of [5]. This method uses several eigenvectors to build a template for each vertex and forms the clusters with these templates. The eigenvectors are orthogonal to each other, but the generalized eigenvectors are not, so the templates formed with these will be less effective. In other words, we needed larger templates with generalized eigenvectors than with usual eigenvectors.
An example of our work related to spectral clustering is [6].
A similar problem is studied in [7], where a Laplacian that incorporates both weights at the vertices and at the edges is defined. The vertex weights are multiplicatively integrated directly as matrix factors. The novelty of our approach is that our weights (called ρ) derive from a potential p that is additively integrated on the diagonal. The relationship between the desired weights ρ and the potential p that must be introduced to obtain them is specified in Theorem 3. This is significant because it allows us to overcome a technical difficulty of the cited work, obtaining error bounds (Theorem 4) as a function of the potential, not of the weights. A case of application of this method of weights from a potential is our contribution (Chapter 2.4) to the collective work [8]. In that work, we used the Laplacian matrix with potentials to perform spectral graph partitioning for process placement on heterogeneous computing platforms.
In the next section, we review standard notations and concepts about graphs and matrix representations. In Section 3, we describe known facts about spectral partitioning using the Laplacian matrix. Section 4 contains our contribution: we define Laplacian matrices with potential (p-Laplacian matrices) and show that a certain matrix of this type can be used to partition the set of vertices into parts of similar total weight, minimizing the edge cut (Theorem 3). It also contains a Cheeger-style bound on the difference between the true value of the minimum partition and the approximated value obtained using the Fiedler eigenvector of the p-Laplacian matrix (Theorem 4). The value considered in this result is the ratio cut of the partition, instead of the total cut. Our purpose was to show that it is possible to give bounds analogous to those that appear in the literature but adapted to this approach of weights at the vertices independent of the weights at the edges. We emphasize that this bound is a Mohar-style bound adapted to the spectral analysis of vertex-and-edge-weighted graphs.

Graphs and Transfer Matrices
A graph G = (V, E) consists of a set V of vertices and a set E of subsets of two vertices (the edges). That is, E ⊂ P_2(V). For u, v ∈ V, the edge {u, v}, also noted u ∼ v, is said to go between u and v. In this study, we used this definition of a graph, which does not model the direction of the edges or loops.
A weight on edges is a map w : E → R.
The weight of the edge u ∼ v is noted w(u, v). If a weight on the edges is not specified, the constant unit weight is implicitly assumed (that is, w(u, v) = 1 for each {u, v} ∈ E). A weight on vertices is a map s : V → R. The set of these maps, that is, the set of all vertex weights, is denoted R^V. Clearly, it is a vector space.
To represent graphs using matrices, we choose an ordering of the set of vertices, V = {v_1, . . . , v_n}. The adjacency matrix of G (for this conventional ordering) is the n × n matrix A = (a_ij)_{i,j=1,...,n} with values a_ij = w(v_i, v_j) if {v_i, v_j} ∈ E, and a_ij = 0 otherwise. We represent a vertex weight s ∈ R^V as the vector (s_i)_{i=1,...,n} of its values s_i = s(v_i). The adjacency matrix operates on R^V, the set of vertex weights, as a right matrix product (postmultiplication): the product sA is the transference (or shift) of the vertex weight (s_i) by the graph G.
The postmultiplication of s, as a row vector, by A is usual in the matrix analysis of finite Markov chains [9]. We can see the shift as the following action: the shift sA represents that each edge v_i ∼ v_j takes the content of v_i, that is, s(v_i), and transports it to the vertex v_j (modifying its amount by the factor of transference a_ij). The sum of the values transferred to v_j is ∑_i s(v_i) a_ij, with a gain a_ij from each adjacent vertex v_i. So, the vertex weight sA has entries (sA)_j = ∑_i s(v_i) a_ij. In the literature, the interest is focused on symmetric matrices (because they model undirected graphs), so there is no difference between pre- and postmultiplication by A. Besides, note that the main diagonal is zero because the graphs do not have loops. In Section 4, we introduce potentials that can be viewed as a method to use the diagonal entries to carry vertex weights, independently of the edge weights.
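As an illustration, the shift of a vertex weight by the adjacency matrix is a plain row-vector product. This is a minimal sketch; the 3-vertex path graph and the weight s are our own illustrative choices, not taken from the text:

```python
import numpy as np

# Adjacency matrix of the path graph v1 - v2 - v3 (unit edge weights).
A = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])

# A vertex weight s, as a row vector: some substance placed on each vertex.
s = np.array([2.0, 0.0, 1.0])

# The shift sA: each edge vi ~ vj transports s(vi) to vj with gain a_ij,
# so (sA)_j = sum_i s(v_i) a_ij.
shifted = s @ A
print(shifted)  # [0. 3. 0.]
```

The middle vertex collects the substance of both its neighbors (2 + 1 = 3), while the endpoints receive only the (empty) content of the middle vertex.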

Laplacian and Partitions
This section summarizes standard notions about graphs, partitions, and the Laplacian operator, which can be seen in [3] or [10], although the notation is adapted to our particular purposes, and Lemma 3 is proved in a novel way, without using summations.
A partition of a set V is a pair (V_1, V_2) of disjoint nonempty subsets such that V_1 ∪ V_2 = V. The cut of the partition is the number of edges between the two parts; if the graph is weighted, the total cut is cut(V_1, V_2) = ∑_{u∈V_1, v∈V_2, u∼v} w(u, v). Any subset U ⊂ V defines a cut, (U, U^c), which is called the cut of U. Among the several partitions of a graph, one with a minimum number of cut edges (or minimum total cut, if weighted) is usually preferred. The cut of the graph is cut(G) = min_{∅≠U⊊V} cut(U, U^c). In this study, we were interested in partitions with minimal cuts but with a balanced number of vertices, that is, |V_1| = |V_2| for an even number of vertices or |V_1| = ⌊|V|/2⌋ for any number of vertices (a bipartition). The bipartition width [11] is bw(G) = min_{|U|=⌊|V|/2⌋} cut(U, U^c). We also used the cut ratio of U (the quotient i(U) = cut(U, U^c)/|U|) and the isoperimetric number i(G) = min_{0<|U|≤|V|/2} i(U). We expressed, using linear algebra, the combinatorial problem of finding the partitions that realize these minimums.
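On a small graph, these combinatorial quantities can be computed by brute force. A sketch, where the two-triangle graph joined by a bridge is our own illustrative choice:

```python
import numpy as np
from itertools import combinations

# Two triangles {0,1,2} and {3,4,5} joined by the single bridge edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

def cut(U):
    """Total weight of the edges between U and its complement U^c."""
    Uc = [v for v in range(n) if v not in U]
    return A[np.ix_(list(U), Uc)].sum()

# Bipartition width: minimal cut over balanced parts (|U| = n//2).
bw = min(cut(U) for U in combinations(range(n), n // 2))
print(bw)  # 1.0: separating the two triangles cuts only the bridge 2-3

# Isoperimetric number: minimum cut ratio cut(U, U^c)/|U| over |U| <= n/2.
iG = min(cut(U) / len(U) for k in range(1, n // 2 + 1)
         for U in combinations(range(n), k))
print(iG)  # 1/3, attained by the triangle {0,1,2}
```

The exponential enumeration is exactly what the spectral relaxation of the following sections avoids.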
Let us suppose we are given an ordering V = {v_1, v_2, . . . , v_n} of the set of vertices. A vector x ∈ R^V has an entry x_i for each v_i. The characteristic vector c_S of a set S ⊂ V is c_S = (x_i)_{i=1,...,n} with x_i = 1 if v_i ∈ S and x_i = 0 otherwise. Sometimes it is preferable to use values other than 0 or 1 in the vector expression of a combinatorial object like a subset or partition [4]. For two real values b_1, b_2, the (b_1, b_2)-indicator of a partition (V_1, V_2) is the vector with value b_1 on the entries of V_1 and b_2 on those of V_2. For example, the (0,1)-indicator is the characteristic vector of the second set of the partition. We mainly use (1,−1)-indicators. We denote by x · y the standard scalar product in R^n. In this way, a matrix A has an associated bilinear form x · Ay. The vector 1 has the value 1 in each component. The degree vector is g = A1: if A is the adjacency matrix of a graph, the vector A1 has, in the i-th entry, the degree d_i of the vertex v_i. Likewise, Ac_S has, in the i-th entry, the number of edges to v_i from vertices in S.
Proof. They are straightforward.

Lemma 2.
Being c_1 and c_2 the characteristic vectors of the sets of a partition (V_1, V_2): cut(V_1, V_2) = c_1 · Ac_2. Proof. By Lemma 1 (iv), Ac_2 contains in the i-th entry the total weight of the edges between v_i and V_2. Hence, c_1 · Ac_2 is the sum of the weights of the edges in the set V_1 ∼ V_2, defined as the set of edges with one end in V_1 and the other in V_2. Calling D_g = diag(g) the matrix with the degree vector g in the diagonal and zero off-diagonal, we have 1 · D_g 1 = ∑_i d_i. Being x the (1,−1)-indicator of any partition, we also have x · D_g x = ∑_i d_i, because the squares x_i² = 1 cancel the signs.
Defining the Laplacian as L = D_g − A, we have cut(V_1, V_2) = (1/4) x · Lx, with c_1 and c_2 the characteristic vectors of V_1 and V_2, respectively, and x = c_1 − c_2 their (1,−1)-indicator. We deduced this well-known identity in matrix form, instead of the usual summation form. So we not only avoided the index chasing but also made explicit the role of the matrices involved. This deduction also clarifies the role of the diagonal degree matrix D_g.
In addition to the expression of the cost as a bilinear form of L, we expressed the requirement that the partition (V_1, V_2) be balanced as 1 · x = 0. So, the problem of finding the bipartition of minimal cost is the following problem of combinatorial optimization: minimize (1/4) x · Lx subject to 1 · x = 0 and x_i = ±1. This combinatorial problem is NP-complete [12]. To approximate a solution, with polynomial computational cost, it is customary to relax the constraints x_i = ±1. This relaxed problem is a numerical one that has several features that ease its resolution: L is symmetric, hence its eigenvalues are real and there is an orthonormal basis of eigenvectors [13]. Besides, 1 is an eigenvector of eigenvalue 0, because D_g 1 − A1 = 0. Additionally, L is weakly diagonally dominant with nonnegative diagonal, hence (by the theorem of the Geršgorin discs [14]) its eigenvalues are nonnegative: 0 = λ_0 ≤ λ_1 ≤ · · · ≤ λ_{n−1}. If G is connected, λ_0 < λ_1 [3]. These features of L are generally deduced from its expression as a summation of squares, which we have avoided. The result about diagonal dominance that we used instead is also easy to see.
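The spectral facts listed above are easy to check numerically. A minimal sketch (the 4-vertex connected graph is an assumed example of ours):

```python
import numpy as np

# Adjacency matrix of a small connected graph: triangle 0-1-2 plus leaf 3 on 2.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

g = A.sum(axis=1)        # degree vector g = A1
L = np.diag(g) - A       # Laplacian L = D_g - A

# L is symmetric, so its spectrum is real and eigh applies.
eigvals, eigvecs = np.linalg.eigh(L)

# 1 is an eigenvector of eigenvalue 0, since D_g 1 - A 1 = 0.
ones = np.ones(4)
print(np.allclose(L @ ones, 0))   # True

# All eigenvalues nonnegative; G connected, so 0 = lambda_0 < lambda_1.
print(eigvals[0] >= -1e-9 and eigvals[1] > 1e-9)   # True
```

`np.linalg.eigh` returns the eigenvalues in ascending order, so `eigvals[1]` is the Fiedler value used below.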
The Rayleigh quotient of an n × n symmetric matrix M is R_M(x) = (x · Mx)/(x · x), defined for x ≠ 0 in R^n. It plays a role in the following min-max theorem of Courant-Weyl [13], which we use without proof: Theorem 1. Let S_k be the set of subspaces of R^n of dimension less than or equal to k, for k = 1, 2, . . . , n, and let M be symmetric with eigenvalues λ_0 ≤ λ_1 ≤ · · · ≤ λ_{n−1} and corresponding eigenvectors f_0, f_1, . . . , f_{n−1}. Then λ_{k−1} = min_{E∈S_k} max_{x∈E, x≠0} R_M(x). Besides, the argument E giving the minimum is E = span(f_0, f_1, . . . , f_{k−1}), and an argument x giving the maximum is f_{k−1}.
In particular, λ_0 = min_{x∈R^n, ‖x‖=1} R_M(x), because each x ≠ 0 spans an E ∈ S_1. In the case M = L, the eigenvector f_0 is (a scalar multiple of) 1, as commented above, and the others are orthogonal to it: 1 · f_i = 0, i = 1, . . . , n − 1. In the case k = 2, λ_1 = min_{1·x=0, x≠0} (x · Lx)/(x · x). To relate this result with the cut value of partitions, note that if x is an indicator vector of a partition (x_i = ±1), the cut (x · Lx)/4 is proportional to the Rayleigh quotient, since x · x = n. The minimum λ_1 = min_{1·x=0} (x · Lx)/(x · x) is reached in a vector f_1 that is an eigenvector for λ_1. That is, f_1 is a solution to the relaxed problem, although it may not be an indicator vector. The first non-null eigenvalue λ_1 is termed the Fiedler value, and its eigenvector f_1 is the Fiedler vector.
There are several rounding or truncation methods to obtain an indicator vector x = (x_i)_i (i.e., with integer values −1, 1 for x_i) from f_1 (whose entries are not necessarily integers). The most direct rounding is the partition by sign: x_i = 1 if (f_1)_i > 0 and x_i = −1 otherwise. This rounding can give a partition that is not a bipartition. Another rounding method uses the median, achieving precisely bipartitions: if m is the median value of the entries of f_1, then x_i = 1 if (f_1)_i > m and x_i = −1 otherwise. For these rounding methods and others that appear in the literature [15], there is an error bound. The error is the difference between the partition obtained by the rounding and the partition that actually minimizes the cut, which is obtainable by combinatorial methods. These bounds are known as discrete Cheeger bounds since they involve the first nonzero eigenvalue. In particular, we use the following Cheeger bound developed by Mohar [16]. It compares the cut ratio i(U_s) = cut(U_s, U_s^c)/|U_s| of the partition induced by the sign rounding, U_s = {v_i | (f_1)_i > 0}, and the isoperimetric number i(G).
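Both roundings can be sketched in a few lines. This is a minimal example on an assumed 5-vertex path graph, whose Fiedler vector is monotone along the path:

```python
import numpy as np

# Path graph on 5 vertices.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)
f1 = eigvecs[:, 1]                     # Fiedler vector (second eigenvector)

# Sign rounding: may give parts of unequal size.
x_sign = np.where(f1 > 0, 1, -1)

# Median rounding: always a bipartition, with parts of size n//2 and n - n//2.
m = np.median(f1)
x_median = np.where(f1 > m, 1, -1)

print(x_sign, x_median)
```

For the path, both roundings split it near the middle vertex; the median version guarantees the ⌊n/2⌋ / ⌈n/2⌉ balance even when the Fiedler entries are skewed.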

Theorem 2.
If G has more than three vertices and its maximal degree is ∆, then λ_1/2 ≤ i(G) ≤ i(U_s) ≤ √(λ_1(2∆ − λ_1)). This bound uses the sign rounding and the cut ratio, i(U_s), instead of the median rounding and the edge cut, cut(U, U^c), which we used to describe our framework. However, we chose it because it makes it easier to present the generalization for vertex-weighted graphs in the next section. The cut ratio has also been studied in the stochastic setting [17]. In principle, it is also possible to generalize for vertex-weighted graphs the similar bounds that exist in the literature for median rounding and edge cut [18].

Laplacians with Potential
To motivate our contribution, we recall here the usual interpretation, for instance in [19], of the values of a weight s on vertices using flows on graphs. The vertex weight s(v_i) corresponds to the amount or magnitude of some physical substance placed at v_i. The weights at the edges w_ij, i ≠ j, between different vertices correspond, in this interpretation, to a transmission factor or gain that affects the substance when it flows from one vertex v_i to another v_j, increasing or decreasing its amount. Following this interpretation of the weight matrix as a transference or shift, the weights at the loop edges w_ii are the gain suffered by the substance that stays in the same vertex v_i.
In this diffusion process interpretation, the eigenvectors of a shift matrix are the stationary substance distributions. In particular, the Laplacian matrix has a stationary distribution that is uniform (corresponding to the null eigenvalue) because the degree values on the diagonal make the total gain of substance equal to zero. In this sense, the Laplacian process is conservative. Another example is the shift by the adjacency matrix of a connected graph, which has a positive stationary distribution (the random walk limit distribution, corresponding to the Perron eigenvector [3]), with a substance gain given by the Perron eigenvalue.
Following this vein, we show how to control the weights in the diagonal w ii to obtain any positive ρ : V −→ R as a stationary distribution, the eigenvector of a matrix similar to the Laplacian.
As commented, a weight on vertices is a function p : V −→ R. Its diagonal form is the matrix D_p = diag(p(v_i)). The generalized Laplacian with potential p (or p-Laplacian) is L_p = D_g − A + D_p. In particular, with p = 0, the 0-Laplacian is the ordinary Laplacian. Some properties of p-Laplacians are similar to those of ordinary Laplacians, as can be viewed in [20] under the name of generalized Laplacian. For our purposes, we highlight the following ones, whose proof requires some technicalities: Lemma 4. If the graph G is connected, then: (a) The eigenvalues of L_p are real, and the minimum eigenvalue λ_0 has multiplicity 1. (b) There is a positive eigenvector corresponding to λ_0, unique up to a scalar multiple. Proof. The matrix L_p is symmetric, with nonpositive off-diagonal entries [21]. As G is connected, L_p is irreducible. By Observation 1.4.3 of [21], the claims follow.
The minimum eigenvalue is the Perron eigenvalue λ_0. It has multiplicity 1, and there is an eigenvector of the Perron eigenvalue with all positive entries. To fix one such eigenvector, we define the Perron eigenvector ρ as the one with ‖ρ‖ = 1.
The min-max theorem applied to the operator L_p gives us that λ_1 = min_{ρ·x=0, x≠0} (x · L_p x)/(x · x), and the minimum is reached in an eigenvector of λ_1 of norm 1 (the Fiedler vector φ).
With these properties, we can replicate the spectral partition methodology, because the spectral decomposition of L_p assures that φ · ρ = 0. This can be understood, as for the ordinary Laplacian L above, as saying that the positive and negative values of the Fiedler vector give us an indicator of two sets of vertices. This indicator cuts V into two parts of equal absolute sum of Perron values. That is, if x is a (1,−1)-indicator of (V_1, V_2) with x · ρ = 0, then ∑_{v_i∈V_1} ρ_i = ∑_{v_i∈V_2} ρ_i. However, in this case, the Perron vector is not the constant distribution 1, but a positive distribution ρ. We use the distribution ρ as a measure of the relative importance of the vertices in a partition or clustering.
Note that the Perron eigenvector ρ is defined up to a constant factor, in the sense that ρ′ = kρ, for k > 0, is also a positive eigenvector of the same eigenvalue. Our choice of ‖ρ‖ = 1 is conventional. Note also that, as in the ordinary case commented on in Section 3, the Fiedler vector φ does not necessarily have ±1 components and should be considered as an approximation to the optimal partition.
To build a potential p such that the Perron distribution of L p will be a predefined given ρ, we apply the formula of the following theorem. Note that it is scale-invariant, in the sense that ρ and kρ, for k > 0, will produce the same potential p.
Remember that for a vector x, we denote its i-th component as x_i, and a function x : V → R is identified with the vector x_i = x(v_i). We can build a potential p such that the Perron vector ρ of L_p has predetermined positive values ρ_i: Theorem 3. For any vector ρ such that ρ_i > 0 for i = 1, . . . , n, if p(v_i) = (1/ρ_i) ∑_j a_ij ρ_j − deg(v_i), being A = (a_ij) the adjacency matrix, then the Perron vector of L_p is ρ, and the Perron value is 0.
Proof. By Lemma 4, L_p has a Perron eigenvalue. Note that for each i = 1, . . . , n, (L_p ρ)_i = deg(v_i)ρ_i − ∑_j a_ij ρ_j + p(v_i)ρ_i = 0. So, L_p ρ = 0. To conclude that 0 and ρ are the Perron eigenvalue and eigenvector, we use the fact that in a symmetric matrix the eigenvectors of different eigenvalues are orthogonal. Consequently, only one eigenvalue can have an associated positive eigenvector such as ρ, and this is 0.
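The construction can be checked numerically. A sketch assuming the potential p(v_i) = (1/ρ_i) ∑_j a_ij ρ_j − deg(v_i), which makes L_p ρ = 0 by construction; the 4-vertex graph and the weights ρ are our own illustrative choices:

```python
import numpy as np

# Connected 4-vertex graph: triangle 0-1-2 plus leaf 3 on vertex 2.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
deg = A.sum(axis=1)

rho = np.array([1.0, 2.0, 1.0, 4.0])   # arbitrary positive vertex weights

# Potential of Theorem 3: p_i = (sum_j a_ij rho_j)/rho_i - deg(v_i).
p = (A @ rho) / rho - deg
Lp = np.diag(deg) - A + np.diag(p)     # L_p = D_g - A + D_p

print(np.allclose(Lp @ rho, 0))        # True: rho is an eigenvector of 0

# 0 is the minimal (Perron) eigenvalue, with multiplicity 1.
eigvals = np.linalg.eigvalsh(Lp)
print(abs(eigvals[0]) < 1e-9 and eigvals[1] > 1e-9)   # True
```

Note that scaling rho by any k > 0 leaves p, and hence L_p, unchanged, matching the scale invariance remarked above.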
With this result, we can perform spectral partitioning with preassigned weights ρ on the vertices. By the above discussion, the p-Laplacian for the potential p corresponding to the given ρ has a Fiedler vector orthogonal to the Perron vector ρ (that is, it produces a partition into parts of equal total weight at the vertices). The value of the cut is also well expressed by the p-Laplacian, in the following lemma. For a vertex set U ⊂ V, we denote p(U) = ∑_{v∈U} p(v).
Lemma 5. If x is the (1,−1)-indicator of a partition (V_1, V_2), then x · L_p x = 4 cut(V_1, V_2) + p(V). Proof. As L_p = L + D_p, we have x · L_p x = x · Lx + x · D_p x = 4 cut(V_1, V_2) + ∑_i p(v_i) x_i² = 4 cut(V_1, V_2) + p(V).
Therefore, to find a partition (i.e., an indicator x) that minimizes x · L_p x is equivalent to minimizing cut(V_1, V_2), because p(V)/4 is a constant, given the potential p.
For a predefined weight on the vertices ρ = (ρ_1, ρ_2, . . . , ρ_n), we build the potential p by Theorem 3. The p-Laplacian L_p specifies the vertex ρ-weighted bipartition problem: minimize x · L_p x subject to x_i = ±1 and ρ · x = 0. This combinatorial problem is at least as hard as the conventional bipartition problem (which is NP-complete), since the latter is the particular case ρ = 1. In any case, we consider the relaxed problem (without the restriction x_i = ±1), which has as its solution, by the min-max Theorem 1, the Fiedler vector φ of L_p.
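The whole ρ-weighted pipeline can be sketched end to end: build the potential from ρ (assuming the formula p_i = (1/ρ_i) ∑_j a_ij ρ_j − deg(v_i) of Theorem 3), take the Fiedler vector of L_p, and round by sign. The two-triangle graph and the weights ρ are our own illustrative choices:

```python
import numpy as np

# Two triangles joined by the bridge 2-3.
n = 6
A = np.zeros((n, n))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)

rho = np.array([3.0, 1.0, 1.0, 1.0, 1.0, 1.0])  # vertex 0 counts as three

p = (A @ rho) / rho - deg                       # potential of Theorem 3
Lp = np.diag(deg) - A + np.diag(p)

eigvals, eigvecs = np.linalg.eigh(Lp)
phi = eigvecs[:, 1]                             # Fiedler vector of L_p

# The relaxed solution is orthogonal to the Perron vector rho, so the
# sign rounding tends to balance the total rho-weight, not the sizes.
print(np.allclose(phi @ rho, 0))                # True

V1 = [i for i in range(n) if phi[i] > 0]
V2 = [i for i in range(n) if phi[i] <= 0]
print(sorted(V1), sorted(V2))
```

Since ρ > 0 and φ · ρ = 0, the Fiedler vector must change sign, so both parts are nonempty; as in the unweighted case, the rounded partition is only an approximation to the combinatorial optimum.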
By taking this Fiedler vector as an approximation to the combinatorial solution, that is, to the unrelaxed problem, the error can be bounded with a Cheeger expression similar to that of Mohar (Theorem 2). To express this bound, in the following Theorem 4, we define the cut ratio of U with respect to p as i_p(U) = cut(U, U^c)/p(U), and the isoperimetric number with respect to p as i_p(G) = min_{0<p(U)≤p(V)/2} i_p(U). Given a vector x ∈ R^V, x_i = x(v_i), we order the set of its values as t_1 > t_2 > · · · > t_m. That is, for each i in 1, . . . , n there is one and only one j in 1, . . . , m with t_j = x(v_i). The level sets of x are, for each k = 1, . . . , m, V_k = {v ∈ V | x(v) ≥ t_k}. The sweep cut h_p(x) is the minimum cut ratio along the level sets of x, that is: h_p(x) = min_k i_p(V_k). It is clear that, for any x, i_p(G) ≤ h_p(x). We also define S(x), the increment of square values along x, as S(x) = ∑_{v_i∼v_j} |x_i² − x_j²|. Lemma 6. Let L be the ordinary Laplacian; for any x: (a) h_p(x)(x · D_p x) ≤ S(x); (b) S(x) ≤ √(x · Lx) √(2 x · D_g x). Proof. For (a), consider the level sets of x, that is, the vertices sorted such that x_1 ≥ x_2 ≥ · · · ≥ x_n, being t_1 > t_2 > · · · > t_m the different values of x. For each level set V_k, by the definition of the sweep cut we have cut(V_k, V_k^c) ≥ h_p(x) p(V_k) (1). Besides, extending with a value t_{m+1} = 0 to ease the notation, S(x) = ∑_{k=1}^m cut(V_k, V_k^c)(t_k² − t_{k+1}²) (2).
Combining (1) and (2): S(x) = ∑_k cut(V_k, V_k^c)(t_k² − t_{k+1}²) ≥ h_p(x) ∑_k p(V_k)(t_k² − t_{k+1}²) = h_p(x)(x · D_p x). The equality (2) holds because, for each pair of vertices v_i ∼ v_j, if x_i = t_{k_0} and x_j = t_{k_1} with 1 ≤ k_0 < k_1 ≤ m, the double summation includes a chain of terms (t_{k_0}² − t_{k_0+1}²) + · · · + (t_{k_1−1}² − t_{k_1}²) that telescopes to x_i² − x_j². For (b), we consider vectors u, v ∈ R^E, that is, indexed by the edges: u_{ij} = |x_i − x_j| and v_{ij} = |x_i + x_j| for each edge v_i ∼ v_j. If we apply the Cauchy-Schwarz inequality u · v ≤ ‖u‖ ‖v‖ to these particular vectors, S(x) = ∑_{v_i∼v_j} |x_i² − x_j²| = u · v ≤ ‖u‖ ‖v‖. For the first root factor, it is known that ‖u‖² = ∑_{v_i∼v_j} (x_i − x_j)² = x · Lx for the ordinary Laplacian (see, for example, [15]). For the second root factor, by the trivial fact that (a + b)² ≤ 2(a² + b²), we have ‖v‖² = ∑_{v_i∼v_j} (x_i + x_j)² ≤ 2 ∑_{v_i∼v_j} (x_i² + x_j²) = 2 x · D_g x. To prove the following claim, we apply the above lemma to φ, the Fiedler eigenvector of L_p. Theorem 4. Being m_p = min_i(p(v_i)), ∆ = max_i(deg(v_i)), and λ_1 and φ the Fiedler eigenvalue and eigenvector of L_p, then: (nλ_1 − p(V))/4 ≤ i_p(G) ≤ h_p(φ) ≤ (1/m_p)√(2∆(λ_1 − m_p)). Proof. For the first inequality, calling x_b the (−1,1)-indicator of the minimal bipartition (V_1, V_2), that is, such that i_p(G) = cut(V_1, V_2), by Lemma 5 we have x_b · L_p x_b = 4 cut(V_1, V_2) + p(V) = 4 i_p(G) + p(V). Since λ_1 ≤ (x · L_p x)/(x · x) for each x with ρ · x = 0, we have λ_1 x · x ≤ x · L_p x. In particular, λ_1 n = λ_1 x_b · x_b ≤ x_b · L_p x_b, hence λ_1 n ≤ 4 i_p(G) + p(V) and (nλ_1 − p(V))/4 ≤ i_p(G). For the second inequality, being h_p(φ) a particular cut (the sweep cut of φ), the minimal cut i_p(G) is lesser than or equal to it.
Finally, for the third inequality, we use the Lemma 6 inequalities for the particular case x = φ: from (a), h_p(φ)(φ · D_p φ) ≤ S(φ), which from (b) is lesser than or equal to √(φ · Lφ) √(2 φ · D_g φ). The values of these terms are φ · D_g φ = ∑_i deg(v_i)φ_i² and φ · D_p φ = ∑_i p(v_i)φ_i². Additionally, as L_p = L + D_p and L_p φ = λ_1 φ, we have φ · Lφ = φ · L_p φ − φ · D_p φ = λ_1 − φ · D_p φ. Substituting those values in the inequalities: h_p(φ) ≤ √(λ_1 − φ · D_p φ) √(2 φ · D_g φ)/(φ · D_p φ) ≤ (1/m_p)√(2∆(λ_1 − m_p)). The last inequality is because, by the definition of m_p and ∆, for each i = 1, . . . , n, we have deg(v_i) ≤ ∆ and p(v_i) ≥ m_p, and also (λ_1 − φ · D_p φ) ≤ (λ_1 − m_p).
This result is similar to the above Theorem 2, in this case bounding the cost of the minimal cut i p (G) and the sweep cut h p (φ) of the Fiedler eigenvector of L p .

Conclusions
We introduced a spectral partitioning method for arbitrary (positive) vertex weights, not related to the edge weights or the vertex degrees. We also provided a bound of the Cheeger type for the error incurred in taking the eigenvector numerical solution as an approximation to the combinatorial problem. This bound is similar to others that appear in the literature for cuts with uniform vertex weights.
In this work, we did not take into account a distribution of weights on the edges. The usual methods of considering ordinary Laplacians with edge weights (for example, [3]) can be extended to generalized Laplacians as we have described (that is, including a vertex potential that induces vertex weighting in the spectral partitioning). The two types of weights, on vertices and on edges, are independent. This was not considered here for simplicity, but it will be the content of future work.

Funding: This work was jointly supported by the European Regional Development Fund "A way to achieve Europe" (ERDF) and the Extremadura Local Government (Ref. IB20040) and by the Spanish Ministerio de Ciencia e Innovación through project PID2019-110315RB-I00 (APRISA).

Conflicts of Interest:
The authors declare no conflict of interest.