Cluster Persistence for Weighted Graphs

Persistent homology is a natural tool for probing the topological characteristics of weighted graphs, essentially focusing on their 0-dimensional homology. While this area has been thoroughly studied, we present a new approach to constructing a filtration for cluster analysis via persistent homology. The key advantages of the new filtration are that (a) it provides richer signatures for connected components by introducing non-trivial birth times, and (b) it is robust to outliers. The key idea is that nodes are ignored until they belong to sufficiently large clusters. We demonstrate the computational efficiency of our filtration, its practical effectiveness, and explore its properties when applied to random graphs.


Introduction
Clustering data is a fundamental task in unsupervised machine learning and exploratory data analysis. It has been the subject of countless studies over the last 50 years, with many definitions and algorithms proposed, e.g., [13, 15]. Persistent homology [8, 22] is a powerful topological tool that provides multi-scale structural information about data. Given an increasing sequence of spaces (a filtration), persistent homology tracks the formation of connected components (0-dimensional cycles), holes (1-dimensional cycles), cavities (2-dimensional cycles), and their higher-dimensional extensions. The information encoded in persistent homology is often represented by a persistence diagram - a collection of points in R^2 representing the births and deaths of homology classes, and providing an intuitive numerical representation of topological information (see Figure 1). The connection between clustering and 0-dimensional persistent homology has been well established under various scenarios, including the relationship with functoriality [4, 5] and density-based methods [2, 6]. An important motivating factor for connecting these methods is stability. Namely, given small perturbations of the input data, persistent homology can provide guarantees on the number of output clusters.

Graph Filtrations and Persistent Homology
In this section, we introduce the required topological notions. As we focus on the special case of graphs and connected components, i.e., 0-dimensional homology, we restrict our definitions to this case. For a general description of k-dimensional homology we refer the reader to [11, 16].
Let G = (V, E) be an undirected graph. Our main object of study is a graph filtration, or an increasing sequence of graphs. This can be constructed by defining a function τ : (V ∪ E) → [0, ∞), under the restriction that if e = (u, v) ∈ E, then τ(e) ≥ max(τ(u), τ(v)). This restriction ensures that the sublevel sets of τ define a subgraph. The filtration {G_t}_{t≥0} is then defined via

G_t = ({v ∈ V : τ(v) ≤ t}, {e ∈ E : τ(e) ≤ t}).

As we increase t from 0 to ∞, we can track the connected components of G_t as they appear and merge, events referred to as births and deaths, respectively. When two components merge, we use the 'elder rule' to determine that the later-born component is the one that dies. Note that at least one component has an infinite death time in any graph filtration. We refer the reader to [8] for further details.
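As a concrete illustration, the sublevel-set construction above can be sketched in a few lines of Python. The function and variable names here are ours, chosen for exposition, and do not come from any accompanying code:

```python
# Hedged sketch: the sublevel-set subgraph G_t for a vertex/edge filtration tau.

def sublevel_subgraph(vertices, edges, tau_v, tau_e, t):
    """Return G_t = ({v : tau(v) <= t}, {e : tau(e) <= t})."""
    Vt = {v for v in vertices if tau_v[v] <= t}
    # The monotonicity condition tau(e) >= max(tau(u), tau(v)) guarantees
    # that both endpoints of every surviving edge are already present.
    Et = [(u, v) for (u, v) in edges if tau_e[(u, v)] <= t]
    return Vt, Et

# Toy example: a path a-b-c with tau(a) = 0, tau(b) = 1, tau(c) = 0.
V = ["a", "b", "c"]
E = [("a", "b"), ("b", "c")]
tau_v = {"a": 0.0, "b": 1.0, "c": 0.0}
tau_e = {("a", "b"): 1.0, ("b", "c"): 1.0}

Vt, Et = sublevel_subgraph(V, E, tau_v, tau_e, 0.5)
# At t = 0.5 only the two local minima a and c are present, with no edges:
# two components, matching the two births in Figure 1's line-graph example.
```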
These birth-death events are tracked by an algebraic object called the 0-dimensional persistent homology. Its most common visualization is the persistence diagram - a collection of points in R^2, where each point corresponds to a single connected component. The coordinates of a point encode this information, with the x-coordinate representing the birth time and the y-coordinate the death time. An example for a function on a line-graph is shown in Figure 1. Note that one component is infinite, which we denote with a dashed line at the top of the diagram.
In a more general context, given a filtration of higher-dimensional objects (e.g., simplicial complexes), we can study the k-dimensional persistent homology. This object tracks the formation of k-dimensional cycles (various types of holes), and its definition is a natural extension of the 0-dimensional persistent homology we study here. We refer the reader to [8] for more information.

The k-Cluster Filtration
Let G = (V, E, W) be an undirected weighted graph. In computing 0-dimensional persistent homology, the filtration values are commonly taken to be τ(v) = 0 for all v ∈ V, and τ(e) = W(e) for all e ∈ E. We will denote this filtration by G*_t. In other words, we assume all vertices are present at time zero, and edges are gradually added according to the weight function W. This has been the practice in the TDA literature in almost all studies, and in particular in geometric settings where W represents the distance between points (i.e., the geometric graph, which is the skeleton of both the Čech and Vietoris-Rips complexes). While in many models this choice of τ seems reasonable, it has two significant drawbacks:
• The produced persistence diagrams are degenerate, as the birth times of all 0-cycles are t = 0.
This significantly reduces the amount of information we can extract from persistence diagrams.
• The generated persistence diagrams are superfluous, in the sense that they contain a point for each vertex in V, while obviously not all vertices contribute significant structural information.
In this paper we propose a modification to the standard graph filtration that resolves both of these issues, and leads to more concise and informative persistence diagrams.
We will first define the filtration values for the vertices. For every vertex v and value t ≥ 0, we define N_t(v) to be the number of vertices in the connected component of G*_t that contains v. Fix k ≥ 1, and define

τ_k(v) = min{t ≥ 0 : N_t(v) ≥ k}.

The edge values are then

τ_k(e) = max(τ_k(u), τ_k(v), W(e)), for e = (u, v),

and we denote the corresponding filtration by G^(k)_t. In other words, compared to G*_t, in G^(k)_t we delay each vertex's appearance until the first time it is contained in a component with at least k vertices (and adjust the edge appearances to be compatible). Effectively, the assignment of the new filtration values to the vertices introduces two changes to the persistence diagrams:

1. All the points that are linked to components of size smaller than k are removed.

2. Each death time corresponds to an edge merging two components of size at least k.
We call this filtration the 'k-cluster filtration', reflecting the fact that it tracks the formation and merging of clusters of size at least k. The parameter k determines what we consider a sufficiently meaningful cluster. In G*_t, every vertex is considered a cluster, but statistically speaking, this is overkill. The chosen value of k should depend on the application as well as the sample size.
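The definition of τ_k can be read off directly as a deliberately naive, quadratic-time computation. The sketch below, with our own naming, is for exposition only; the next sections give the efficient one-pass version:

```python
def k_cluster_values(n, weighted_edges, k):
    """tau_k(v) for vertices 0..n-1, read directly off the definition.

    weighted_edges: list of (w, u, v) tuples. Quadratic time; for exposition
    only -- the one-pass algorithm below is the efficient alternative.
    """
    tau = {v: 0.0 if k == 1 else None for v in range(n)}
    edges = sorted(weighted_edges)
    for i, (t, _, _) in enumerate(edges):
        # components of G*_t, where all edges of weight <= t are present
        adj = {v: [] for v in range(n)}
        for w, u, v in edges[: i + 1]:
            adj[u].append(v)
            adj[v].append(u)
        seen = set()
        for s in range(n):
            if s in seen:
                continue
            comp, stack = [], [s]
            seen.add(s)
            while stack:
                x = stack.pop()
                comp.append(x)
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            if len(comp) >= k:
                for x in comp:
                    if tau[x] is None:
                        tau[x] = t  # first time x lies in a size >= k cluster
    return tau

# Path 0-1-2 with weights 0.3 and 0.7: for k = 2, vertices 0 and 1 form a
# 2-cluster at t = 0.3, while vertex 2 joins a 2-cluster only at t = 0.7.
tau = k_cluster_values(3, [(0.3, 0, 1), (0.7, 1, 2)], 2)
# tau == {0: 0.3, 1: 0.3, 2: 0.7}
```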
We conclude this section by noting that the k-cluster filtrations are decreasing (in a set sense) as we increase k, i.e., G^(k+1)_t ⊆ G^(k)_t for all t. This can be useful, for example, in the context of multi-parameter persistence, which we briefly mention but leave for future work.

Algorithm
In this section, we describe an efficient one-pass algorithm for computing the filtration and the persistence diagram at the same time. The time complexity of the algorithm is O(|E| · α(|V|)), where α(·) is the inverse Ackermann function [7]. This is the same complexity as computing the 0-dimensional persistence diagram if we were given the filtration as input.
We begin with the (standard) terminology and data structures. For simplicity of the description, we assume that the weights on the edges are unique and that the vertices have a lexicographical order. We first define a total order on the vertices as follows: the filtration function determines the ordering, where undefined filtration values are taken to be ∞. If the function is the same or undefined for both vertices, the order is determined by lexicographical ordering. It is straightforward to check that this is a total ordering.

Remark 4.1. In the case of a total ordering, one can choose a representative of 0-dimensional persistent homology classes - notably, in the total ordering a unique vertex is the earliest generator for the homology class (i.e., the cluster), which we denote as the canonical representative of the persistent component.
To track components as we proceed incrementally through the filtration, we use the union-find data structure, which supports two operations:
• ROOT(v): returns the canonical representative of the connected component containing v.
• MERGE(u, v): merges the connected components containing u and v into one component, including updating the root.
We augment the data structure by keeping track of two additional records:
• SIZE(v): returns the size of the connected component containing v.
• COMPONENT(v): returns the list of vertices in the same component as v.
To track the size of the component, we store the size at the root (i.e., the canonical representative) of each component, updating it each time a merge occurs. To access a connected component, recall that the union-find data structure is implemented as a rooted tree. For each vertex, we store a list of children in the tree. To recover the list of vertices in the component, we perform a depth-first search of the tree starting from the root (although any other traversal method could be used). All update operations have O(1) cost (cf. [7]).
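A minimal Python sketch of this augmented union-find might look as follows. We use union by size and omit path compression so that the stored child lists remain valid (finds are then O(log n) rather than inverse-Ackermann); this is a simplification for exposition, not the authors' implementation:

```python
# Hedged sketch of the augmented union-find: ROOT, MERGE, SIZE, and
# COMPONENT (the last via a DFS over stored child lists).

class AugmentedUnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n                     # valid at roots only
        self.children = [[] for _ in range(n)]  # tree children, for COMPONENT

    def root(self, v):
        # no path compression here, so the child lists stay consistent
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def merge(self, u, v):
        ru, rv = self.root(u), self.root(v)
        if ru == rv:
            return ru
        if self.size[ru] < self.size[rv]:       # union by size
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.children[ru].append(rv)
        self.size[ru] += self.size[rv]
        return ru

    def component(self, v):
        """All vertices in v's component: DFS from the root through children."""
        out, stack = [], [self.root(v)]
        while stack:
            x = stack.pop()
            out.append(x)
            stack.extend(self.children[x])
        return out

uf = AugmentedUnionFind(5)
uf.merge(0, 1)
uf.merge(2, 3)
uf.merge(0, 2)
# SIZE at the root of {0,1,2,3} is 4; vertex 4 is still its own component.
```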
Note that when k = 1, the filtration value of every vertex is 0, and so the problem reduces to finding the minimum spanning tree of a weighted graph. Hence, we will assume that k > 1. Initially, we set the filtration function τ(v) = 0 for all vertices and τ(e) = W(e) for all edges, and assume the edges are sorted by increasing weight. If this is not the case, the sorting step is the bottleneck, with a cost of O(|E| log |E|). Thus, we begin with a forest where each component is a single vertex, i.e., all components are initially born at 0. We proceed as in the case of standard 0-dimensional persistence, adding edges incrementally. As no new components are created after time zero, we are only concerned with merges, and the problem reduces to updating the birth times as we proceed, by keeping track of 'active' components (i.e., those of size at least k). We omit points in the persistence diagram which lie on the diagonal (birth = death), but these can be included with some additional book-keeping.
Assume we are adding the edge e = (u, v). If e is internal to a connected component (i.e., ROOT(u) = ROOT(v)), then it does not affect the 0-persistence. Otherwise, it connects two components, denoted C_u and C_v. There are a few cases to consider:
1. |C_u ∪ C_v| < k: The merged component is too small to affect the persistence diagram. We only perform a merge of the components.
2. |C_u ∪ C_v| ≥ k and |C_u| < k: In this case, C_u becomes active. We merge the components, and the update τ(x) ← W(e) for all x ∈ C_u is performed. We take similar action if |C_v| < k (or both are smaller than k).
3. |C_u| ≥ k and |C_v| ≥ k: Both components are active. The components are again merged, and by the elder rule the later-born component dies at W(e).
We note that for any v, the value τ_k(v) is the weight of some edge. The full procedure is given in Algorithm 1. Note that we only compute the filtration for the vertices, as the correct edge values can then be computed by Equation 3.2.
Algorithm 1: One-pass Algorithm.

Proof of Correctness. We first argue that the function τ is correctly computed. This follows directly from the fact that the algorithm explicitly tests when a component contains at least k vertices. The fact that the persistence diagram is correctly computed is a consequence of the following result.
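A hedged Python rendering of the one-pass procedure is given below. It is our own reconstruction of the case analysis, not Algorithm 1 verbatim, and the names are illustrative:

```python
import math

def k_cluster_persistence(n, weighted_edges, k):
    """One-pass sketch: returns (tau, diagram) for the k-cluster filtration.

    tau[v] is the vertex filtration value (math.inf if never activated);
    diagram lists finite (birth, death) pairs plus one (birth, inf) point.
    Assumes k > 1; weighted_edges is a list of (w, u, v) tuples.
    """
    parent = list(range(n))
    size = [1] * n
    birth = [None] * n                  # birth time, stored at active roots
    members = [[x] for x in range(n)]   # vertex lists of inactive components
    tau = [math.inf] * n
    diagram = []

    def root(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for w, u, v in sorted(weighted_edges):
        ru, rv = root(u), root(v)
        if ru == rv:
            continue                    # internal edge: no effect
        active_u, active_v = birth[ru] is not None, birth[rv] is not None
        if active_u and active_v:
            # Case 3: two active clusters merge -> the younger dies (elder rule)
            b_old, b_young = sorted((birth[ru], birth[rv]))
            if b_young < w:             # skip diagonal points
                diagram.append((b_young, w))
            new_birth = b_old
        elif active_u or active_v:
            # Case 2: an inactive component joins an active one
            inactive = rv if active_u else ru
            for x in members[inactive]:
                tau[x] = w              # newly activated vertices
            new_birth = birth[ru] if active_u else birth[rv]
        elif size[ru] + size[rv] >= k:
            # Case 2: the merged component becomes active right now
            for x in members[ru] + members[rv]:
                tau[x] = w
            new_birth = w
        else:
            new_birth = None            # Case 1: still too small
        if size[ru] < size[rv]:         # merge by size
            ru, rv = rv, ru
        parent[rv] = ru
        size[ru] += size[rv]
        members[ru] = members[ru] + members[rv] if new_birth is None else []
        birth[ru] = new_birth

    r = root(0)                         # surviving component: infinite point
    if birth[r] is not None:
        diagram.append((birth[r], math.inf))
    return tau, diagram

# Path 0-1-2-3: pairs {0,1} and {2,3} activate at 0.2 and 0.3, and the
# younger cluster dies when the heavy middle edge arrives at 0.9.
tau, dgm = k_cluster_persistence(4, [(0.2, 0, 1), (0.3, 2, 3), (0.9, 1, 2)], 2)
# tau == [0.2, 0.2, 0.3, 0.3]; dgm == [(0.3, 0.9), (0.2, inf)]
```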
Lemma 4.2.The minimum spanning tree for k = 1 is a minimum spanning tree for any k.
Proof. The key observation is that until a component contains k vertices, any spanning tree of it is a minimum spanning tree, as all of its edges will be assigned the same value at the time the component becomes active. The remaining edges do not have their values changed, and so remain in the MST.
The equivalence of the MST and the persistence diagram [21] then implies correctness of the algorithm.
Proof of Running Time. The analysis of the merging carries over verbatim from the standard analysis of the union-find data structure. As described above, the updates to the size of the component and to the list of children during a merge are O(1) operations. All that remains is to bound the cost of updating the function τ. We observe that each vertex is only updated once, for a total cost of O(|V|), and the edges can be updated at a cost of O(1) per edge (although there is no practical need for this). This implies that the overall running time is O(|E| · α(|V|)).

Extracting the Clusters. To obtain clusters, we can use the algorithm in [6]. This algorithm extracts the ℓ most persistent clusters by performing merges only when the resulting persistence is less than a threshold. The threshold can be chosen such that there are only ℓ points above it in the diagram. Finally, we note that the cluster extraction can be done on the MST rather than the full graph.
Experiments and Applications

Simulated point-clouds
We start by generating point-clouds from a mixture of Gaussians, resulting in several blobs of points (Figure 2). We first show the effect of the parameter k on the filtration function and the corresponding persistence diagrams. For the two point-clouds in Figure 2, we show the resulting persistence diagrams for the k-cluster filtrations in Figure 3. Notice that the correct number of persistent clusters is evident, especially for k = 10, 20, and 50. An important phenomenon evident in the figures is that higher values of k filter out more of the 'noise'.
To place the behaviour of the persistence diagrams into further context, we compare the k-cluster filtration with a related construction from the applied topology literature, which has been suggested for dealing with outliers in clustering (and in higher homological dimensions) - the k-degree Vietoris-Rips filtration [14]. Given a weighted graph G = (V, E, W), we define the k-degree filtration, denoted δ_k, by assigning to each vertex v the smallest t at which v has at least k neighbors connected by edges of weight at most t; this is part of a bifiltration obtained by varying k and increasing the edge weight (commonly, Euclidean distance). In this paper, we do not explore the multi-parameter setting. Rather, we focus on the properties of the persistence diagrams for a fixed k. We make two observations before investigating the differences:
1. The k-degree filtration function is determined completely by the local neighborhood of a vertex (i.e., its immediate neighbors in the graph). The same is not true for the k-cluster filtration.
2. For a fixed value of k, we have τ_k(v) ≤ δ_{k−1}(v) for all v ∈ V. In other words, the value of the k-cluster function is less than or equal to the value of the (k − 1)-degree function. This follows from the fact that if a vertex has k − 1 neighbors, then it is part of a cluster of at least k vertices.
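Observation 2 can be checked numerically on a small example. In the sketch below, δ_k(v) is taken to be the k-th smallest weight among edges incident to v (our reading of the k-degree convention; see [14] for the original), and τ_3 is computed by hand for a path graph:

```python
# Hedged numeric check of observation 2 on the path 0-1-2-3 with edge
# weights 0.2, 0.5, 0.9, taking k = 3.

def k_degree_values(n, edges, k):
    """delta_k(v): k-th smallest weight among edges incident to v
    (inf if v has fewer than k incident edges)."""
    inc = [[] for _ in range(n)]
    for w, u, v in edges:
        inc[u].append(w)
        inc[v].append(w)
    return [sorted(ws)[k - 1] if len(ws) >= k else float("inf") for ws in inc]

edges = [(0.2, 0, 1), (0.5, 1, 2), (0.9, 2, 3)]
# tau_3 by hand: at t = 0.5 the component {0, 1, 2} reaches size 3, and
# vertex 3 joins a size >= 3 component only at t = 0.9.
tau_3 = [0.5, 0.5, 0.5, 0.9]
delta_2 = k_degree_values(4, edges, 2)   # [inf, 0.5, 0.9, inf]
assert all(t <= d for t, d in zip(tau_3, delta_2))
```

Note that the inequality is strict at the endpoints of the path, which have degree 1 and hence infinite (k − 1)-degree values but finite k-cluster values.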
In Figure 4, we show the relative persistence diagrams for two non-convex clusters for both the k-degree and k-cluster filtrations, for different values of k. In this example, especially for larger k, the persistent clusters are much more prominent in the k-cluster filtration compared to the k-degree filtration. This may be explained by the fact that a much larger radius is needed to obtain the required number of neighbors. In Figure 5, we show the same comparison for relative persistence diagrams for 3 and 4 blobs, where the difference between the two methods is less clear. However, Figure 6 highlights an additional difference in the behaviors of the two filtrations. In this figure, we compare the persistence (death/birth) of the second most persistent cluster over a wide range of k values. In the left and center plots, the second most persistent cluster corresponds to a true cluster in the data. We observe that the persistence value decays much more slowly for the k-cluster filtration, i.e., the true cluster remains more persistent for increasing values of k. The plot on the right presents the same comparison, but for uniformly distributed random points. In this case, the second most persistent cluster is by construction noise (i.e., not a real cluster in the data). Here, although the k-cluster filtration decays more slowly, it is comparable to the k-degree filtration. Hence we can conclude that persistent clusters show more stable behavior over ranges of k for the k-cluster filtration compared to the k-degree filtration.

Universality
Our results in [3] are divided into two main parts. Given a point-cloud of size n, we compute the persistence diagram for either the Čech or the Vietoris-Rips filtration. In weak universality we consider the empirical measure of the π-values, and conjecture that for iid samples it converges to a limiting distribution that depends only on d, k, and T, where d is the dimension of the point-cloud, k is the degree of homology, and T is the filtration type (i.e., Čech or Vietoris-Rips). In other words, the limiting distribution of the π-values depends on d, k, T but is independent of the probability distribution generating the point-cloud.
In strong universality we present a much more powerful and surprising conjecture. Here, we define ℓ(p) := A log log(π(p)) + B (the values of A and B are specified in [3]), together with the corresponding empirical measure. Our conjecture is that for a wide class of random point-clouds (including non-iid and real data), this empirical measure converges to a unique universal limit L*. Furthermore, we conjecture that L* might be the left-skewed Gumbel distribution.
As stated, the results in [3] are not applicable to the 0-th persistence diagram of random point-clouds, as the birth times are all zero. However, once we replace the standard filtration with the k-cluster filtration, we obtain new persistence diagrams with non-trivial birth times that we can study. In Figure 7 we demonstrate both the weak and strong universality properties for the k-cluster persistent homology.
We generated iid point-clouds across different dimensions, with different distributions (uniform in a box, exponential, normal). The results show that both weak and strong universality hold in these cases as well. We note that for weak universality, the limiting distribution depends on both d (the dimension of the point-cloud) and k (the minimum cluster size).

Clustering
As mentioned in the introduction, a key motivation for this work was to apply the k-cluster filtration to clustering. To obtain a clustering from a 0-dimensional persistence diagram, we use the algorithm proposed in [6]. Roughly speaking, given a threshold α, it extracts all clusters which are more than α-persistent. We note that the original measure of persistence in [6] was d − b; however, the change to using d/b in the algorithm is trivial.
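A simplified sketch of the extraction step uses the MST observation from the previous section: given a target number of clusters ℓ (chosen, e.g., from the persistence diagram), cutting the ℓ − 1 heaviest MST edges recovers the clusters. This is the single-linkage view; the algorithm in [6] additionally uses the persistence values to decide which merges to undo, which we do not reproduce here:

```python
def cut_mst(n, mst_edges, ell):
    """mst_edges: list of (w, u, v) forming a spanning tree of 0..n-1.
    Returns cluster labels in 0..ell-1 after cutting the ell-1 heaviest edges."""
    keep = sorted(mst_edges)[: len(mst_edges) - (ell - 1)]  # drop heaviest
    parent = list(range(n))

    def root(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _, u, v in keep:
        parent[root(u)] = root(v)
    labels = [root(v) for v in range(n)]
    relabel = {r: i for i, r in enumerate(dict.fromkeys(labels))}
    return [relabel[r] for r in labels]

# Two tight pairs joined by one long edge: asking for ell = 2 cuts it,
# so vertices 0,1 get one label and vertices 2,3 the other.
mst = [(0.1, 0, 1), (0.9, 1, 2), (0.1, 2, 3)]
labels = cut_mst(4, mst, 2)
```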

Statistical Testing. An important consequence of the universality results in Section 5.2 is that the limiting distribution (after normalization) appears to be a known distribution, i.e., the left-skewed Gumbel.
We can thus perform statistical testing on the number of clusters, as in [3]. The null hypothesis, denoted H^(i)_0, is that the i-th most persistent cluster is due to noise. Assuming the universality conjectures hold, the null hypothesis is stated in terms of the ℓ-values, where p_i represents the i-th most persistent cluster in terms of death/birth, and the corresponding p-value is computed from the left-skewed Gumbel distribution. Note that since we are testing sorted values, we must use a multiple hypothesis testing correction. In the experiments we describe below, we use the Bonferroni correction.
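The test can be sketched as follows. The constants A and B below are placeholders (the actual values are specified in [3] and not reproduced here), and we assume the standard left-skewed Gumbel, whose survival function is P(L* ≥ x) = exp(−e^x):

```python
import math

# Hedged sketch of the significance test; A and B are placeholder
# normalization constants, NOT the values from [3].
A, B = 1.0, 0.0

def ell_value(pi):
    """ell(p) = A * log log pi(p) + B, with pi(p) = death/birth > 1."""
    return A * math.log(math.log(pi)) + B

def significant_clusters(pis, alpha=0.05):
    """pis: pi-values sorted in decreasing order. Returns, for each cluster,
    whether H0 ("this cluster is noise") is rejected, using a Bonferroni
    correction over the m sorted hypotheses."""
    m = len(pis)
    decisions = []
    for pi in pis:
        p_val = math.exp(-math.exp(ell_value(pi)))  # P(L* >= ell) under H0
        decisions.append(p_val < alpha / m)
    return decisions

# A very persistent cluster is rejected as noise; a weak one is not.
print(significant_clusters([1000.0, 1.1]))  # [True, False]
```

With the actual constants from [3], only the numeric cutoffs change; the structure of the test is the same.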
In Figure 8, we compare the k-cluster filtration and the k-degree filtration, using the persistence-based clustering from [6], with other common clustering algorithms. For the other approaches, we used the standard implementations found in [17], which have associated techniques for choosing the number of clusters. In the cases of the k-cluster filtration and the k-degree filtration, the number of clusters was chosen using the statistical testing described above. Note that since the number of points in the standard examples was quite small, we limited k to 5 and 10. The best result is for the k-cluster filtration with k = 10 (k = 5 fails to identify one of the clusters in the third example). The k-degree filtration performs well, but the additional "noise" points in the diagram mean that some clusters are not identified as significant.
Clustering on Trees. As a second example, we describe clustering on weighted trees. We generated a uniform random tree on n vertices, and assigned uniformly distributed random weights (between 0 and 1) to the edges. We show an example in Figure 9. The method seems to capture certain structure in the tree, although we leave further investigation of this structure for future work.
Note that in the tree case, it is often impossible to use k-degree filtrations, as the tree will have vertices with degree smaller than k that will never be included in the filtration; for the k-cluster filtration, in contrast, all nodes are included as long as the underlying graph is connected (or all components have at least k vertices). We note that it is possible to use an alternative definition of the k-degree filtration by embedding the tree into a metric space (i.e., using the graph metric induced by the weights). However, this amounts to studying the complete graph induced by the metric, which is somewhat different from studying the graph directly. We use this method in the rightmost plot of Figure 9.

Probabilistic Analysis
In this section we wish to revisit some of the fundamental results known for the (persistent) homology of random graphs and simplicial complexes, and show that analogous statements hold for our new k-cluster filtration. We provide here the main statements; proofs are available in the appendix.

Connectivity
We will consider two models here. In the G(n, p) random graph we have n vertices, and each edge is placed independently with probability p. In the G(n, r) random geometric graph, we take a homogeneous Poisson process P_n on the d-dimensional flat torus, with rate n. Edges are then placed between vertices that are less than distance r apart. In both models, connectivity results are tied to the expected degree. For the G(n, p) model we define Λ = np, and for G(n, r) we take Λ = nω_d r^d. Then in [9] and [19] the following was proved.

Theorem 6.1. Let G_n be either G(n, p) or G(n, r). Then, for any w(n) → ∞,

lim_{n→∞} P(G_n is connected) = 1 if Λ = log n + w(n), and 0 if Λ = log n − w(n).

A key element in proving connectivity (for either model) is to show that around Λ = log n, the random graph consists of a single giant component, a few isolated vertices, and nothing else. Thus, connectivity is achieved when the last isolated vertex gets connected.
Our goal in this section is to analyze connectivity in the G(n, p) and G(n, r) models via our new k-cluster filtration. Note that for a fixed n, we can view both models as filtrations over the complete graph. For the G(n, p) model, the weights of the edges are independent random variables, uniformly distributed in [0, 1]. For G(n, r), the weight of an edge is given by the distance between the corresponding points in the torus. We define G^(k)(n, p) and G^(k)(n, r) to be the random filtrations generated by changing the filtration function to τ_k. Our goal here is to explore the phase transition for k-cluster connectivity. As opposed to connectivity in the original random graphs, the results here differ between the models.

Theorem 6.2. For the G^(k)(n, p) filtered graph we have, for any w(n) → ∞,

lim_{n→∞} P(G^(k)(n, p) is connected) = 1 if Λ = (1/k)(log n + (k − 1) log log n + w(n)), and 0 if Λ = (1/k)(log n + (k − 1) log log n − w(n)).

For the G^(k)(n, r) model, proving connectivity is a much more challenging task, and beyond the scope of this paper. The following statement, however, is relatively straightforward to prove.

Proposition 6.3. Let N_k = N_k(n, r) be the number of connected components of size k in G(n, r). Then, for any w(n) → ∞, we have N_k > 0 with high probability when Λ = log n − (d − 1)(k − 1) log log n − w(n), and N_k = 0 with high probability when Λ = log n − (d − 1)(k − 1) log log n + w(n).

From this we conclude that when Λ = log n − (d − 1)(k − 1) log log n − w(n), the graph G(n, r) has components of size k, which implies that G^(k)(n, r) is not connected. On the other hand, when Λ = log n − (d − 1)(k − 1) log log n + w(n), we have N_j = 0 for all fixed j ≥ k, which indicates that G^(k)(n, r) should be connected. This leads to the following conjecture.
Conjecture 6.4. For the G^(k)(n, r) filtered graph we have, for any w(n) → ∞,

lim_{n→∞} P(G^(k)(n, r) is connected) = 1 if Λ = log n − (d − 1)(k − 1) log log n + w(n), and 0 if Λ = log n − (d − 1)(k − 1) log log n − w(n).

Note that both phase transitions occur before the ones for the original graph models. This is due to the fact that for k > 1 the k-cluster filtration does not allow any isolated vertices. Also note that for k = 1 both results coincide with Theorem 6.1.
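The G^(k)(n, p) threshold of Theorem 6.2 can be probed numerically. The sketch below (our own simulation, with hypothetical function names) records the last time the number of active (size ≥ k) components drops to one, which is one natural notion of the time at which the k-cluster filtration becomes, and stays, connected:

```python
import itertools
import math
import random

def k_connectivity_time(n, k, rng):
    """Last time the number of size->=k components of G*_t drops to one, for
    the complete graph on n vertices with iid Uniform[0,1] edge weights."""
    edges = sorted((rng.random(), u, v)
                   for u, v in itertools.combinations(range(n), 2))
    parent, size = list(range(n)), [1] * n

    def root(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    active, t_conn = 0, None  # active = number of components of size >= k
    for w, u, v in edges:
        ru, rv = root(u), root(v)
        if ru == rv:
            continue
        was = (size[ru] >= k) + (size[rv] >= k)
        parent[rv] = ru
        size[ru] += size[rv]
        prev = active
        active += (1 if size[ru] >= k else 0) - was
        if active == 1 and prev != 1:
            t_conn = w  # (re-)entered the connected regime
    return t_conn

rng = random.Random(1)
n, k = 200, 2
t = k_connectivity_time(n, k, rng)
# Theorem 6.2 predicts n * t to concentrate around
# (1/k) * (log n + (k - 1) * log log n); for n = 200, k = 2 this is about
# 3.48, earlier than the classical log n ~ 5.30 threshold of Theorem 6.1.
predicted = (math.log(n) + (k - 1) * math.log(math.log(n))) / k
```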

Limiting Persistence Diagrams
In [12], it was shown that for stationary point processes, persistence diagrams have a non-random limit (in the sense of vague convergence of measures). A similar statement holds for the k-cluster persistence diagrams.
Let Dgm^(k)(P) be the k-cluster persistence diagram of a point-cloud P. We define the discrete measure on R^2 obtained by placing a unit Dirac mass δ_(b,d) at each point (b, d) of the diagram. The following is an analogue of Theorem 1.5 in [12].
Theorem 6.5. Assume that P is a stationary point process in R^d with all finite moments. For any k, there exists a deterministic measure μ_k such that the expected diagram measures, normalized by the number of points, converge to μ_k, where the limit is in the sense of vague convergence. Furthermore, if P is ergodic, then the convergence holds almost surely.

For 4k ≤ j ≤ n/2, we have a similar bound. To conclude, this implies that for Λ = (1/k)(log n + (k − 1) log log n) + c, we have

P(G^(k)(n, p) is connected) ≈ P(N_k > 0).
Similar estimates to the ones above show that when c = w(n) → ∞, we have P(N_k > 0) → 0. Together with a second-moment argument, we can similarly show that when c = −w(n), we have P(N_k > 0) → 1. This concludes the proof.
A second moment argument then shows the corresponding statement for all ϵ > 0, completing the proof.

C Limiting Persistence Diagram
The key part of the proof in [12] is bounding the add-one cost of the persistent Betti numbers. Let G = (V, E, W) be a weighted graph, and {G^(k)_t} the corresponding k-cluster filtration. Define β^{r,s}_0(G^(k)) as the 0-th persistent Betti number, i.e., the number of components born in t ∈ [0, r] that die in (s, ∞] (for a formal definition, see [12]). Fix an edge e_0 ∉ E with a given weight W(e_0) = w_0, and let G̃ = (V, Ẽ, W) be the weighted graph with Ẽ = E ∪ {e_0}. Let {G̃^(k)_t} denote the corresponding k-cluster filtration. The entire proof of Theorem 6.5 follows verbatim from the proofs in [12], provided that we prove the following lemma:

|β^{r,s}_0(G̃^(k)) − β^{r,s}_0(G^(k))| ≤ 1.

In other words, if we add a single edge to the filtration, the number of persistent clusters can change by at most 1. Note that the proof here is not a straightforward application of Lemma 2.10 in [12], since in our case, when a single edge is added to the filtration, the filtration values of other vertices and edges might be affected.

Figure 1 :
Figure 1: An example of a graph filtration on a line-graph. The filtration values of the vertices are given by τ (the y-axis). The filtration value of each edge is taken as the highest value between its plotted endpoints. The bars in the middle represent the tracking of the components. The vertices which are local minima, i.e., a, c, f, and i, generate new components, and so τ(a), τ(c), τ(f), and τ(i) correspond to birth times. The first merge occurs at τ(b) = τ((a, b)) = τ((b, c)), merging {a} with {c, d}. In this case we declare the latter as dead, since τ(a) < τ(c). Next, at τ((d, e)), the components {a, b, c, d} and {e, f} are merged, and the latter dies. Finally, at τ((g, h)), the components {a, b, c, d, e, f, g} and {h, i} are merged, killing the former. The component containing i has the earliest birth time, and thus is declared infinite.

2. |C_u ∪ C_v| ≥ k and |C_u| < k: In this case, C_u becomes active. Thus, we merge the components, and update the value of τ for all vertices in C_u: τ(x) ← W(e) for all x ∈ C_u. We take similar action if |C_v| < k (or both are less than k).

Figure 2 :
Figure 2: Two example point-clouds, consisting of i.i.d. samples from mixtures of three and four Gaussians, with 1000 and 2000 points respectively.

Figure 3 :
Figure 3: The persistence diagrams, with death/birth on the y-axis, for different choices of k for the points sampled from the two mixtures of Gaussians: (top row) 3 blobs, (bottom row) 4 blobs. Note that the number of outstanding features in the diagrams corresponds to the number of clusters in the data.

Figure 4 :
Figure 4: A comparison of the k-cluster and k-degree filtrations for the two moons data set. On the right we show the death/birth ratios for different values of k.

Figure 5 :
Figure 5: (top row) 3 blobs, (bottom row) 4 blobs. The relative persistence diagrams for each point-cloud, with the k-degree filtration in yellow and the k-cluster filtration in blue, for k = 5, 10, 20, and 50.

In [3], we published a comprehensive experimental work showing that the distribution of persistence values is universal. We consider a persistence diagram as a finite collection of points in R^2, dgm = {(b_1, d_1), ..., (b_M, d_M)}. For each point p_i = (b_i, d_i) we consider the multiplicative persistence value π(p_i) = d_i/b_i. Our goal is to study the distribution of the π-values across an entire diagram.

Figure 6 :
Figure 6: The effect on the second most persistent cluster for different values of k. On the left and center, this corresponds to a true cluster (left: two moons; center: mixture of 3 Gaussians). On the right: uniform random points. Here the noise cluster drops nearly as quickly in both cases.

Figure 7 :
Figure 7: Universal distribution for k-cluster persistence. The labels in the legend are structured as distribution/d/k, where d is the point-cloud dimension and k is the cluster size. The distributions taken are uniform in a unit box, exponential, and normal. The first two plots show that weak universality holds, and that the limit depends on d and k, but not on the distribution. The rightmost plot demonstrates that strong universality holds under a proper normalization. We also include the left-skewed Gumbel distribution (dashed line) for comparison.

Figure 8 :
Figure 8: A comparison of standard clustering examples for different clustering approaches. In the cases of the k-cluster filtration (PD) and the k-degree filtration (Deg), the number of clusters was chosen using statistical significance testing.

Figure 9 :
Figure 9: Clustering on a uniform random tree. The threshold with k-clustering gives 4 clusters, while only 3 with the (metric) k-degree.

Proof. Let e_0 = (u, v) with W(e_0) = w_0. Let C_u and C_v denote the components of the endpoints of e_0 at w_0 in the original filtration {G^(k)_t}. There are three possible cases which can occur.

Case I: Both |C_u| < k and |C_v| < k. Note that in this case τ_k(u), τ_k(v) > w_0. Let C′_u be the cluster of u at τ_k(u), i.e., the component of u when it first appears in G^(k)_t. Similarly define C′_v. Note that aside from C′_u ∪ C′_v, the filtration values of all other vertices remain unchanged by adding e_0.