1. Introduction
Topological data analysis (TDA) has gathered significant interest from a wide range of researchers because of its novel approach and use of classical tools from algebraic topology for extracting descriptive features from data. Succinctly, topological data analysis captures and records the persistence [
2] of algebraically computable topological signatures, and regards it as a measure of significance for different features embedded in the structure of data. For the zero dimensional case, these signatures correspond to clusters within data that merge based on a filtration of the data points. One of the most common filtrations used in practice is the
Rips filtration, where pairs of points are considered merged at a given filtration slice
$\epsilon$ when the points are at most
$\epsilon$ apart. Hence, as opposed to other filtrations that require additional parameter choices, the Rips filtration only depends on intrinsic distances between data points and reveals the underlying multi-scale connectivity information about natural clusters existing within data. The Rips filtration produces summaries of topological signatures all beginning at the start of the filtration, capturing cluster merging dynamics akin to those observed by hierarchical clustering methods (see
Figure 1). This is the setting we will be working in.
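To make the link between dimension 0 persistence and cluster merging concrete, the following is a minimal sketch rather than any library's implementation. It assumes that every point is born at scale 0 and that two clusters merge at the length of the shortest edge joining them (other conventions halve this value). The function name `zero_dim_death_times` is hypothetical.

```python
from itertools import combinations
from math import dist


def zero_dim_death_times(points):
    """Finite death times of the dimension 0 classes of a point cloud.

    Assumption: clusters merge at the length of the shortest edge joining
    them, so the death times are the edge lengths of a minimum spanning tree.
    """
    parent = list(range(len(points)))

    def find(i):
        # Cluster representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process pairwise distances from shortest to longest (Kruskal's scheme).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two clusters merge, one class dies
            deaths.append(length)
    return deaths  # n - 1 finite deaths; one class persists forever


print(zero_dim_death_times([(0, 0), (1, 0), (5, 0)]))  # [1.0, 4.0]
```

The returned values are the second coordinates of the non-trivial points of the diagram; the single perpetual class is not listed.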
Meriting the growing popularity of this approach, and central to its relevance and viability in interrogating real-world data, is its stability under slight perturbations: small discrepancies between measurements within data lead to small differences in the recorded persistence of features. This cornerstone stability result [
3] relies on classic bottleneck matchings to evaluate, measure, and bound changes between two records of feature persistence. These records, called
persistence diagrams, are a collection of points in the extended plane where the coordinates represent the birth and death times of the recorded features. In these diagrams, points that have multiplicity capture distinct features with the same birth-death profile, and points with infinite persistence capture perpetual features. For diagrams induced by the Rips filtration, the sole constant perpetual feature appears in dimension 0, capturing the eventual single cluster that merges all components (see
Figure 1).
Given two persistence diagrams $X$ and $Y$, the bottleneck distance between them is defined as
$$d_B(X, Y) = \inf_{\eta} \sup_{x \in X \cup \Delta} \|x - \eta(x)\|_\infty,$$
where the infimum is taken over all bijections $\eta : X \cup \Delta \to Y \cup \Delta$ and $\Delta$ is the diagonal. In general terms, the bottleneck distance measures the cost to transform one diagram to another. The first, and for a long time the only, publicly available implementation of the bottleneck distance for persistence diagrams is in the library
Dionysus, released in 2010, by Morozov [
4]. This implementation uses a variant of the Hungarian algorithm [
5] for the assignment problem.
Understandably, because of the overwhelming matching step in the computation, this first implementation of the bottleneck distance between two persistence diagrams was considerably slow by practical standards. Consequently, while the theoretical side of topological data analysis has made extensive use of the bottleneck distance for advancing the theory [
8], early computational uses have been few and far between. Some notable examples include applications to classification of hepatic lesions [
9], and analysis of time-series data [
10] and simulated hippocampal networks [
11]. Most applications of TDA, instead, tap into persistence-based topological features via another class of objects, called
persistence landscapes [
12], that record the persistence of features as a function, thus affording access to desirable properties of the underlying function space. A major motivation for this detour to landscapes is the ability to generate topological summaries that are compatible with classical tools in statistics, and even machine learning.
In 2017, Morozov et al. [
13] provided an improved implementation of the bottleneck distance in the library
Hera by exploiting geometry. Their approach follows closely the work of Efrat et al. [
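14]. For the sets $X'$ and $Y'$ of orthogonal projections onto the diagonal $\Delta$ of points respectively from $X$ and $Y$, and the sets $U = X \cup Y'$ and $V = Y \cup X'$, they consider the weighted complete bipartite graph $G = (U \cup V, U \times V)$, where the weight of an edge $(u, v)$ is given by
$$c(u, v) = \begin{cases} \|u - v\|_\infty & \text{if } u \in X \text{ or } v \in Y, \\ 0 & \text{otherwise.} \end{cases}$$
With this, the bottleneck computation problem can be recast in the following manner: if $G[r]$ is the subgraph of $G$ with all edges $e$ of weight at most $r$, then the bottleneck distance of $G$ is the minimal value $r$ such that $G[r]$ contains a perfect matching. Hence the bottleneck distance can be recovered by combining a binary search on the edge weights of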
G with a test for a perfect matching. For the matching step, they augment the Hopcroft-Karp algorithm [
15] by appealing to a near-neighbor data structure (a k-d tree) to search for the best candidate pair for a query point, pruning from the search the subtrees (and hence all other candidates within them) whose enclosing box is further away from the query than the current best candidate. This circumvents the overwhelming matching problem by significantly shrinking down the combination pool to retrieve the best matching. To approximate complexity, they fit curves of the form
and found a best fit with
. This translates to a speed-up over
Dionysus by a factor of 400 already on diagrams with 2800 points, and opened opportunities for several works that examine larger [
16] or more complex [
18] data sets.
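The recasting above (a binary search over edge weights combined with a perfect matching test) can be sketched in a few lines. This is an illustrative sketch only, not the Hera implementation: it swaps in a plain augmenting-path matching test in place of the Hopcroft-Karp algorithm with a k-d tree, so it is only practical for small diagrams. Diagrams are assumed to be lists of finite `(birth, death)` pairs, and all function names are hypothetical.

```python
def linf(p, q):
    """L-infinity distance between two points of the plane."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def project(p):
    """Orthogonal projection of a point onto the diagonal."""
    m = (p[0] + p[1]) / 2
    return (m, m)


def has_perfect_matching(weights, r):
    """Test whether the subgraph with edges of weight <= r has a perfect matching."""
    n = len(weights)
    match_of_v = [-1] * n

    def augment(u, seen):
        # Look for an augmenting path starting at u (simple DFS).
        for v in range(n):
            if weights[u][v] <= r and v not in seen:
                seen.add(v)
                if match_of_v[v] == -1 or augment(match_of_v[v], seen):
                    match_of_v[v] = u
                    return True
        return False

    return all(augment(u, set()) for u in range(n))


def bottleneck_distance(X, Y):
    """Bottleneck distance via binary search on the sorted edge weights."""
    # Each vertex is (point, is_projection); projections lie on the diagonal.
    U = [(p, False) for p in X] + [(project(q), True) for q in Y]
    V = [(q, False) for q in Y] + [(project(p), True) for p in X]
    if not U:
        return 0.0
    # Two diagonal points are matched at no cost.
    weights = [[0.0 if (du and dv) else linf(u, v) for v, dv in V] for u, du in U]
    candidates = sorted({w for row in weights for w in row})
    lo, hi = 0, len(candidates) - 1  # the largest weight always admits a matching
    while lo < hi:
        mid = (lo + hi) // 2
        if has_perfect_matching(weights, candidates[mid]):
            hi = mid
        else:
            lo = mid + 1
    return candidates[lo]


print(bottleneck_distance([(0, 4)], [(0, 3)]))   # 1
print(bottleneck_distance([(0, 10)], [(0, 1)]))  # 5.0
```

In the second call, matching the two points directly would cost 9, while sending both to the diagonal costs max(5, 0.5) = 5, so the search settles on 5.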
We take inspiration from this idea of exploiting the geometry of persistence diagrams to extract computational speed-up. By considering dimension 0 persistence diagrams induced from the Rips filtration, we can approach the problem via a different framework, which yields a new efficient algorithm for computing the bottleneck distance. The key idea is to begin with a specific initial bijection that one can methodically modify to optimize the norm between matched points. This process allows us to identify all possible instances where the bottleneck matching is achieved, and the exact value of the bottleneck distance, bypassing the overwhelming matching step in previous implementations. We remark that while this strategy only works for persistence diagrams of a specific kind (those whose detected signatures all begin at the same time), this class is in no way less significant than diagrams induced from other filtrations. Moreover, in addition to diagrams induced from the above setting, this class also includes diagrams obtained from the output of any hierarchical clustering algorithm applied to point cloud data. Hence, the computational speed-up for the bottleneck distance we obtain benefits the comparison of these diagrams as well. Furthermore, we note that there are other metrics used in the literature to compare persistence diagrams, and we make no preference claim in favor of the bottleneck distance. In fact, it is a good question to ask whether the above strategy can be followed to generate computational speed-up for these metrics as well (we credit Katharine Turner for raising this question first in relation to the Wasserstein distance).
We name this algorithm Lumáwig. Lumáwig is significantly faster than the state-of-the-art and provides significantly sharper approximations of the output of the original algorithm than any other available algorithm. We benchmark Lumáwig against all available algorithms in terms of running time and accuracy.
Our motivation for this work is to clear the computational obstruction in the use of bottleneck distance in applications. In the Filipino language, Lumáwig also means to extend, broaden, or expand. Our hope is that this contribution will serve as a catalyst in the further development of the theory that leverages persistence diagrams and the bottleneck distance similar to what has been achieved for persistence landscapes, and will usher in a new era of integrating TDA into the science of big data. As a proof of concept, we use Lumáwig to generate features for the classification of digit images from the MNIST data set.
2. Bypassing Matchings
We propose to bypass the overwhelming matching problem in the computation of 0-dimensional bottleneck distance by showing that the value produced by the bottleneck distance formula can be recovered by considering only a few cases. We will show that these cases naturally come up in the process of minimizing the output of the norm.
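We first note that for most practical applications to data analysis of 0-dimensional persistence diagrams, where all components are assumed to be born at the beginning of the filtration for persistent homology, all non-trivial points lie on the vertical axis (or equivalently for persistence barcodes, all bars begin at time $0$). Hence, in this case, if $x \in X$ is a non-trivial point, and $d_x$ and $d_{\eta(x)}$ are the death times respectively for $x$ and its matched point $\eta(x)$, we have that
$$\|x - \eta(x)\|_\infty = \begin{cases} |d_x - d_{\eta(x)}| & \text{if } \eta(x) \text{ is off the diagonal}, \\ d_x / 2 & \text{if } \eta(x) \text{ is on the diagonal}. \end{cases} \qquad (1)$$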
This suggests that while it is natural to do a point-to-point matching between diagrams, there are cases when we are better off matching a point to the diagonal. For a point
, this happens precisely when
Figure 2a illustrates this point. Therefore, unless (
2) is satisfied, it is our priority to match a non-trivial point in a diagram
X with a non-trivial point in another diagram. This supports the interpretation that the bottleneck distance is the cost of transforming one diagram to another.
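The trade-off between a point-to-point match and a match to the diagonal can be checked numerically. The sketch below only encodes elementary L-infinity geometry for points on the vertical axis; it is not a transcription of the paper's condition, and the function names are hypothetical. It considers a single pair of points in isolation.

```python
def point_to_point_cost(d_x, d_y):
    """L-infinity distance between (0, d_x) and (0, d_y)."""
    return abs(d_x - d_y)


def diagonal_cost(d):
    """L-infinity distance from (0, d) to its projection (d/2, d/2)."""
    return d / 2


def cheaper_to_use_diagonal(d_x, d_y):
    """True when sending both points to the diagonal beats matching them."""
    both_to_diagonal = max(diagonal_cost(d_x), diagonal_cost(d_y))
    return both_to_diagonal < point_to_point_cost(d_x, d_y)


print(cheaper_to_use_diagonal(10, 1))  # True: max(5, 0.5) = 5 < 9
print(cheaper_to_use_diagonal(4, 3))   # False: max(2, 1.5) = 2 > 1
```

For an isolated pair, the diagonal wins exactly when the smaller death time is less than half the larger one.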
We are now ready to present our algorithm for computing 0-dimensional bottleneck distance between two persistence diagrams. We first induce an ordering of the death times in both diagrams and define a bijection that we can methodically modify to optimize the norm between matched points and recover the desired matching that achieves the bottleneck distance. The proof of Lemma 1 provides the basic argument that allows us to bypass the overwhelming matching problem. Lemma 2 proceeds in the same manner and identifies all other possible instances where the bottleneck matching is achieved, and the exact bottleneck distance in each case.
Let X and
Y be two 0-dimensional persistence diagrams whose death time entries are arranged from largest to smallest. Equivalently,
X and
Y can be thought of as persistence barcodes whose bars are arranged from longest to shortest. Without loss of generality, assume that
X has at most as many points as
Y has. We remark that this pre-processing is equivalent to considering the bijection ϕ
that matches points between
X and
Y according to the relative ranking of death times from largest to smallest, and where unmatched points in
Y are matched to the diagonal. Let
and define
Lemma 1. Let X, Y, Z, N and ϕ be defined as above. If and , then where is the largest death time of a point in Y matched to the diagonal.
Proof. For the bijection ϕ
corresponding to the pre-processing described above, it follows that
Figure 2b illustrates this case where the point matched to the diagonal maximizes the norm. To see why ϕ
achieves the infimum over all bijections between
X and
Y, note that any other bijection
produces a death time for a point in
Y matched to the diagonal that is at least as large as
. Therefore
. □
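The rank-based initial bijection described before Lemma 1 is straightforward to evaluate. The sketch below computes only the cost of that initial matching (points paired by rank of death time, leftover points sent to the diagonal), which is an upper bound on the bottleneck distance. It is not the Lumáwig algorithm, and the function name `rank_matching_cost` is hypothetical.

```python
def rank_matching_cost(deaths_x, deaths_y):
    """Cost of matching death times by rank, largest to smallest.

    Unmatched points of the larger diagram go to the diagonal at cost d / 2.
    The result is an upper bound on the bottleneck distance.
    """
    xs = sorted(deaths_x, reverse=True)
    ys = sorted(deaths_y, reverse=True)
    if len(xs) > len(ys):
        xs, ys = ys, xs  # ensure xs is the smaller diagram
    paired = [abs(a - b) for a, b in zip(xs, ys)]
    leftover = [b / 2 for b in ys[len(xs):]]
    return max(paired + leftover, default=0.0)


print(rank_matching_cost([5, 2], [4, 3, 1]))  # 1
print(rank_matching_cost([10], [1]))          # 9
```

In the second call the initial matching costs 9, although sending both points to the diagonal would cost only max(5, 0.5) = 5. Gaps of this kind are why the initial bijection has to be modified before its cost equals the bottleneck distance.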
Lemma 2. Let X, Y, Z, N, l and ϕ be defined as above, and let ζ be the second largest entry of Z.
- 1.
If , then
- 2.
If , then
- 3.
If and for every m such that , then
- 4.
If and there exists such that , then there exists a bijection τ between X and Y such that one of the three preceding cases holds and where
Proof. - 1.
It follows from our remark immediately after (
1) that
is the bijection that matches both
to the diagonal, and coincides with
Figure 2c illustrates this comparison between the two matchings. For any other bijection
, if
such that
is maximum among all non-trivial matchings, either
, or
. If
, then a similar argument as that in Lemma 1 holds. The conclusion now follows.
- 2.
In this case, the same bijection
in the previous case yields
The same argument in the previous case holds for any other bijection . Hence, the inequality above implies the conclusion.
- 3.
For the bijection
that sends
to the diagonal for all such
m, and coincides with
otherwise (see
Figure 2d), we have that
Again, since the same argument in the first case holds for any other bijection , the previous inequality implies the conclusion.
- 4.
Define the bijection
that sends
to the diagonal for all
, and coincides with
otherwise. Then we have that
Moreover, note that depends only on for non-trivially matched x and . Therefore, we can consider only the subsets and respectively of X and Y whose points are non-trivially matched by . In this case, and one of the three previous cases above holds.
The proof is now complete. □
The two lemmas above provide the theoretical basis for the bypass approach of the
Lumáwig algorithm. Together, they take advantage of the specific form of dimension zero persistence diagrams being considered, and the methodical approach to optimize norms induced by a specific matching. The complete pseudocode is given in Algorithm 1 below.
Algorithm 1. Lumáwig algorithm for computing 0-dimensional bottleneck distance between two persistence diagrams |
1: Input: Two dimension zero persistence diagrams X and Y such that and where X has fewer than or as many points as Y. |
2: Output: The bottleneck distance between X and Y. |
3: Initialization , death times of points from X sorted from largest to smallest, death times of points from Y sorted from largest to smallest, , vector , , |
4: if and then |
5: ; |
6: else |
7: while do |
8: if then |
9: |
10: |
11: else if then |
12: if For every m for which , then |
13: |
14: |
15: else |
16: Trim off all for ; update l and |
17: if then |
18: |
19: |
20: end if |
21: end if |
22: else |
23: |
24: |
25: end if |
26: end while |
27: end if |
4. Lumáwig in Digit Classification
With new access to a fast algorithm for computing dimension zero bottleneck distance, we leverage persistence and other clustering-based diagrams to craft features for digit classification. This application illustrates how the significant computational speed-up for the dimension 0 bottleneck distance affords a way to examine intrinsic differences in the multi-scale clustering dynamics of point clouds from the perspective of persistent homology and in conjunction with hierarchical clustering algorithms. In addition, we will show that information captured by dimension 0 bottleneck distance can be a source of a good feature base for point cloud classification.
We classify 10,000 28 × 28-pixel digit images in the MNIST data set via a random forest classifier. Similar to Garin and Tauzin [
16], we train the classifier using features based on topological summaries. However, we depart from Garin and Tauzin’s approach in that we only extract features from dimension zero persistence diagrams and other related clustering-based diagrams. In particular, we craft statistical summaries from distributions of bottleneck distances computed from diagrams resulting from dimension zero persistent homology and clustering of multiple sub-collections of points. It is in this light that the eventual classifier performance must be viewed—the classification of digits, viewed as point clouds with higher dimensional characteristics, is done using lower dimensional clustering information. We summarize this procedure next. For a detailed account of this procedure, we point the interested reader to [
21] where it is used to recover higher dimensional shape information of digits from intrinsic clustering behavior.
The first step in the procedure is to generate multiple collections of points from the digits via samples extracted based on point distributions referenced from nine pre-selected landmark points. We use the same landmark points introduced in [
16]. Sampling is also done across multiple resolutions by varying the number of points selected in every bin of every distribution histogram. Then, for each sampled sub-collection of points in each sampling resolution, persistent homology and clustering algorithms are respectively used to generate persistence and clustering diagrams. We gather diagrams by their sampling setting and compute pairwise bottleneck distances using
Lumáwig. Finally, we compute statistical summaries from the distributions of computed bottleneck distances, and use these to train a random forest classifier with 1000 trees.
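The last step of the procedure, turning a distribution of pairwise distances into a fixed-length feature vector, might look as follows. This is a sketch under assumptions: the source does not list which statistical summaries are used, so the five below are placeholders, and `toy_distance` is a hypothetical stand-in for a bottleneck distance routine such as Lumáwig.

```python
from itertools import combinations
from statistics import mean, median, pstdev


def distance_summary_features(diagrams, distance):
    """Summaries of all pairwise distances among a group of diagrams.

    `distance` is any callable taking two diagrams. The choice of summaries
    (min, max, mean, median, population standard deviation) is an assumption.
    """
    dists = [distance(a, b) for a, b in combinations(diagrams, 2)]
    if not dists:
        return [0.0, 0.0, 0.0, 0.0, 0.0]
    return [min(dists), max(dists), mean(dists), median(dists), pstdev(dists)]


def toy_distance(a, b):
    # Hypothetical stand-in: compares only the largest death times.
    return abs(max(a) - max(b))


features = distance_summary_features([[1], [2], [4]], toy_distance)
print(features[:4])  # [1, 3, 2, 2]
```

One such vector per sampling setting, concatenated across settings, would give one row of the training matrix for a classifier.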
We perform a 10-fold cross validation on our training set of 10,000 digit images from MNIST, and report the summary of obtained
scores in
Table 2. The average class predictions of the random forest are summarized in the confusion matrix in
Figure 9.
The results above show that the random forest classifier is able to use our crafted bottleneck-based features to classify, at a respectable level of accuracy, the 10 digits in the MNIST data set despite all digits possessing the same dimension zero topological signature of having only one connected component. In particular, we infer from the exceptionally high score on the classification of the simplest digit 1 that the differences captured by the bottleneck distance in the clustering behavior across multiple point samples of this digit are outstandingly subtle, and hence different from the rest.
5. Discussions and Conclusions
Our benchmarking experiments reveal that Lumáwig outperforms, by several orders of magnitude, all currently available implementations of dimension zero bottleneck distance in terms of running time. Lumáwig also recovers the exact bottleneck distance produced by Dionysus. We believe this is a significant contribution as it affords a viable tool to process and use dimension zero persistence diagrams in comparing evolving connectivity information between large data sets in a manner that goes beyond the simple use of the most persistent components. Even now, a truly comprehensive and holistic treatment of information embedded in dimension zero persistence diagrams has been left unexplored due primarily to the lack of feasible machinery that can handle significant scaling up in data size. In fact, this note presents the first instance in which the bottleneck distance is used in practice for data on the order of up to a million points. In particular, we see that Lumáwig only takes an average of 2 to 3 tenths of a second to compute the bottleneck distance between diagrams each with one million points.
A natural question is whether a similar strategy of methodically modifying a specific initial bijection can recover all possible cases that yield the best matching in the general case, where birth times of features need not be at the beginning of the filtration (this covers the bottleneck distance for higher dimensional features). We note that an important first step is to induce an appropriate partial order on the points in each diagram that can accommodate a case-exhaustive approach to optimize the norm. Moreover, the added degree of freedom will naturally introduce cases we have not considered in our optimization step. We are currently exploring generalization strategies that leverage shift-invariant versions of the bottleneck distance due to Cavanna et al. [
Our empirical tests suggest that
Lumáwig enjoys linear complexity for the case where both diagrams have an equal number of points. Moreover, we also see that even for the special case revealed by
Figure 7, where there is an apparent slowdown in computational time, the trend seen when data size scales up is also practically linear (see
Figure 8c,d). In a future note, we plan to provide a more comprehensive analysis for complexity. Nevertheless, we are confident that
Lumáwig can be useful in practical applications of TDA at this stage.
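An empirical scaling claim of this kind can be checked by fitting a power law to timing measurements. The sketch below assumes a model of the form t ≈ c·n^α and fits it by least squares on log-log axes; the model form and the function name `fit_power_law` are assumptions, and the timings in the usage lines are synthetic values generated from a formula, not measurements.

```python
from math import exp, log


def fit_power_law(sizes, times):
    """Least-squares fit of t = c * n**alpha on log-log axes."""
    xs = [log(n) for n in sizes]
    ys = [log(t) for t in times]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    alpha = sxy / sxx               # slope of the log-log line
    c = exp(y_bar - alpha * x_bar)  # intercept mapped back to linear scale
    return c, alpha


# Synthetic timings that are exactly linear in n, for illustration only.
sizes = [1_000, 10_000, 100_000, 1_000_000]
times = [2e-7 * n for n in sizes]
c, alpha = fit_power_law(sizes, times)
print(round(alpha, 6))  # 1.0
```

An estimated exponent close to 1 on real timings would be consistent with linear scaling, although a fit over a finite range of sizes cannot by itself establish asymptotic complexity.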
Finally, our application on digit classification showcases, in the same significant way as Weber et al. did in [
17], the potential in leveraging persistence diagrams and bottleneck distance as sources of novel features for machine learning tasks. It is our hope that
Lumáwig contributes to paving the way for this direction in TDA research.