Topological Study of β-Sparsified d-Uniform Hypergraph-Based Simplicial Complexes

Singh, Rohit P.; Malott, Nicholas O.; Rafeek, Raihan; Wilsey, Philip A.

doi:10.3390/math14081339


¹ Department of Electrical and Computer Engineering, University of Cincinnati, Cincinnati, OH 45221-0030, USA

² Convergent Analytics, Cincinnati, OH 45217, USA

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(8), 1339; https://doi.org/10.3390/math14081339

Submission received: 9 March 2026 / Revised: 6 April 2026 / Accepted: 10 April 2026 / Published: 16 April 2026

Abstract

Persistent Homology (PH) is a method of Topological Data Analysis that characterizes the topological structure of a space. Unfortunately, the computation of PH for high-dimensional and big data is not possible due to the exponential growth of the constructed complex. Fortunately, sparsification techniques can substantially reduce the size of the complex. This paper examines a sparsification technique ($β$-Sparsification) that produces a complex reduction capability that is scalable to a user-specified value $β$. At $β = 0$ this scaling generates complexes that can have the same 1-Skeleton as the Vietoris–Rips complex; $β = 1$ produces a Delaunay complex, and other values of $β$ produce a range of (unnamed) complexes. Experiments with $β$-Sparsification reveal that the topology of the sparsified simplicial complex is preserved for $0 \leq β \leq 1$; for $β > 1$, the complex begins to lose (potentially insignificant) topological features.

Keywords:

d-uniform hypergraphs; simplicial complex; sparsification; persistent homology; data mining

MSC:

55-04; 55N31; 55U05; 62R40

1. Introduction

Topological Data Analysis (TDA), and specifically Persistent Homology (PH), is a well-studied discipline of data science [1,2,3]. PH produces a topological summary of a point cloud by characterizing the lifetimes of connected components, loops, voids, and other higher-dimensional features present in the data [1,4,5,6].

Recent introductions of TDA (specifically PH) for Machine Learning (ML) algorithms have been shown to increase the accuracy of ML models [7,8]. PH provides valuable additions to analyze data when used in conjunction with traditional data science approaches for hypothesis testing [9,10], Bayesian statistics [11], and persistent homology transforms [12,13]. Several studies use PH to analyze genomics data to infer important biological processes. For example, analyses of cell differentiation trajectories and of gene expression aberrations in tumors and cancer cohorts have shown favorable results [14,15,16,17,18,19].

An increase in the popularity of PH has motivated researchers to address the computational issues of evaluating the PH of large point clouds. The computation of PH requires the construction of a filtered simplicial complex to characterize the topological features of a point cloud. (PH can be computed on complexes other than simplicial complexes; however, the focus of this work is simplicial complexes. In the remainder of this paper, the term complex denotes a simplicial complex unless otherwise specified.) Unfortunately, the size of the simplicial complex grows exponentially and is hyper-sensitive to the dimension of homology to be computed. (The dimension of homology, $H_{d}$, refers to the maximum dimension of topological features to be characterized in the point cloud; in particular, $H_{0}$ characterizes connected components, $H_{1}$ characterizes loops, $H_{2}$ characterizes voids, and so on.) In fact, the space complexity is such that even with only a few hundred points in higher dimensions, the simplicial complex can easily exceed the memory limits of a well-equipped computer. The inability to compute PH on moderately sized point clouds in higher dimensions has restricted most of the work in this field to homologies below $H_{2}$; only in rare cases are higher-dimensional features examined.

This paper develops and examines the use of a d-uniform hypergraph-based complex sparsification technique inspired by the graph-based $β$-Skeleton [20,21] to reduce the size of a simplicial complex. Sparsification is a complex reduction technique that works to preserve the significant topological features in the complex while reducing the overall complex size. While there are several techniques available for sparsification [22,23,24], the d-uniform hypergraph-based $β$-Sparsification presented here has the advantage that it provides a family of sparsified simplicial complexes that increase the scale of the reduction based on the value of $β \geq 0$.

$β$-Sparsification reduces the memory footprint of the simplicial complex in a controlled manner that is useful for higher-dimensional studies. The proposed method requires significantly less memory than the VR-complex; however, it takes more time to finish on large datasets. Current graph-based $β$-Skeletons in $R^{d}$ lose significance for higher-dimensional topological features. To develop $β$-Sparsification in higher dimensions, this paper introduces generalized d-uniform hypergraph-based $β$-Skeletons in $R^{d}$ for $d \geq 2$. This generalization is defined in terms of d-hyperedges (d-simplices in $R^{d}$). The new technique enumerates all possible combinations of high-dimensional simplices to determine their validity as part of the high-dimensional sparse hypergraph. Hypergraphs and their associated simplicial complexes are discussed in [25,26,27,28,29].

The impact that $β$-Sparsification has on the topological structure of the data is experimentally assessed in Section 5 of this paper. The results suggest that for $0 \leq β \leq 1$ the topology remains largely unaffected; for values of $β > 1$, the results suggest caution as the complex begins to lose some topological features. Fortunately, there is only a gradual decay in accuracy with increasing $β$. Thus, $β$-Sparsification for $β > 1$ may still be useful for big data analysis when trading some accuracy for reduced memory is acceptable.

The remainder of this section discusses some widely used simplicial complexes. The complexity of PH is directly dependent on the number of simplices in the complex; complex size can vary based on the complex type constructed (e.g., Vietoris–Rips, Čech, and Alpha [5,30]). However, the computational complexity also varies by the complex type; often complex types with smaller memory complexity have higher computational complexity. Thus, it is not always a simple choice to select among them.

The Vietoris–Rips (VR) complex is one of the most widely used constructions for PH. The VR complex at scale $ϵ$ is defined as:

VR_{ϵ}(S) = \{σ \subseteq S \mid d(x, y) \leq 2ϵ \;\; \forall x, y \in σ\} .

The VR complex is difficult to use for large data sets in higher dimensions due to its memory complexity. To counter these challenges, various optimizations to the VR complex have been proposed [31,32,33,34].
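To make the definition concrete, the following is a minimal brute-force sketch in Python; it is not the construction used in this paper. The function name `vietoris_rips` and the `max_dim` cutoff are assumptions introduced for this example, and the $2ϵ$ threshold follows the convention of the definition above.

```python
from itertools import combinations
from math import dist


def vietoris_rips(points, eps, max_dim=2):
    """Return the simplices of VR_eps as tuples of point indices, up to max_dim.

    A vertex set is kept when every pairwise distance is at most 2 * eps.
    """
    n = len(points)
    # Pairs of indices (i < j) that are close enough to span an edge.
    close = {(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= 2 * eps}
    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):  # k vertices form a (k - 1)-simplex
        for sigma in combinations(range(n), k):
            if all(pair in close for pair in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices


# Unit square: at eps = 0.5 the four sides are edges, the diagonals are not.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
vr = vietoris_rips(square, eps=0.5)
```

The exhaustive loop over all vertex subsets is exactly what makes the memory and time cost grow so quickly with the number of points and the homology dimension.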

There are several types of simplicial complexes that have sparse constructions and can handle larger point clouds. The witness complex provides the flexibility to control the size of the complex to approximate the topology within memory limits [35,36]. For a point cloud P, a witness complex is constructed using two subsets of P, namely a set of landmarks $L \subseteq P$ and a set of witnesses $W \subseteq P ∖ L$. The witness complex is built on a smaller subset of the points and is therefore smaller overall; its utility is PH approximation on large data sets. Other sparse complexes include the Voronoi complex, Delaunay complex, and Alpha complex, which share several interesting properties [5]. The Delaunay complex, $D(S)$, is constructed from the triangulation of the point cloud and is the dual of the Voronoi complex, $V(S)$. The Alpha complex, $A(S)$, is a sub-complex of the Delaunay complex for filtration values $α_{i} \geq 0$ and $α_{i} < α_{i + 1}$ such that $A_{α_{0}} \subset A_{α_{1}} \subset A_{α_{2}} \subset \dots \subset A_{α_{n}} \subset D(S)$.

The Alpha complex was first introduced by Edelsbrunner et al. [37], and was explored further with the Delaunay and Voronoi complexes [38,39]. The worst-case maximum number of d-simplices in an Alpha complex for n points in d dimensions is:

\binom{n - \lfloor \frac{d + 1}{2} \rfloor}{n - d} + \binom{n - \lfloor \frac{d + 2}{2} \rfloor}{n - d} = O(n^{\lceil \frac{d}{2} \rceil}) .

Due to their smaller size, these complexes support constructions on significantly larger point clouds than possible with a VR complex. However, as the ambient dimension increases, the construction complexity of an Alpha complex becomes problematic.

The remainder of this paper is organized as follows. Section 2 reviews several complex reduction techniques and space complexity of existing constructions of a simplicial complex. Section 3 discusses sparsification-based homology collapse and graph induction methods on the point cloud. Section 4 details the β-Sparsified d-uniform hypergraph-based generation of simplicial complexes. Section 5 presents an experimental evaluation of

β

-Sparsification. Section 6 contains a discussion on planning values for

β

and the possibility of combining a

β

-Sparsified complex with edge collapse. Finally, Section 7 contains some concluding remarks.

2. Background

The number of simplices in a complex grows significantly with the dimension (d). To efficiently scale the construction of a simplicial complex in higher dimensions, techniques to monitor and reduce the growth of simplices become necessary. One of the key techniques for reducing the size of a complex is to generate a sparse representation that also preserves the underlying topology of the space. Sparsification has been explored in the past with the generation of sparsified Alpha complexes, Sparsified Čech complexes, and Sparsified VR-complexes. Despite the ability of an Alpha complex to represent larger datasets than a VR-complex, the computational cost of constructing an Alpha complex becomes prohibitive in higher dimensions. The Sparsified Čech and Sparsified VR-complexes have a worst-case space complexity linear in the size of the data [40,41]. This paper presents a sparsification technique with a scalable sparsification factor $β \geq 0$. The $β$-sparsification factor permits the generation of a family of sparsified complexes with members that include the VR-complex ($β = 0$), Alpha complex ($β = 1$), and other complexes located between and beyond these two complex types.

Another technique that aims to reduce the complex size is called simplicial collapse [42,43]. Simplicial collapse evaluates if a complex is contractible; if so, it is collapsible. Unfortunately, determining the collapse of a simplicial complex is NP-complete [44,45,46]. Moreover, the method requires construction of the original complex, which can defeat the underlying purpose. Simplicial collapse does not offer any intermediate solution if the original complex is not computable or grows in size beyond the available memory. In contrast, $β$-sparsification generates a reduced complex governed by the sparsification factor $β$. In other words, $β$-sparsification provides a scalable representation of the complex that can fit in a constrained environment.

Simplicial collapse has two variations, namely strong collapse and edge collapse. For a given simplicial complex, both variations reduce the complex by performing repeated collapses to generate a minimum simplicial core that cannot be further reduced or collapsed. Collapse has a significant impact on the overall boundary matrix utilized for the PH reduction algorithm [47]. Strong collapse reduces the computation of PH by reducing the number of simplices in the reduced simplicial complex, also known as a nerve complex [48,49]. The main advantage of the approach lies in its repeated collapse of the complex and in the number of times this collapse can be performed. The major disadvantage of strong collapse is the requirement to have all of the highest-order simplices in the original complex, which may not be feasible to generate for large data. As discussed above, the number of such higher-order d-simplices increases exponentially with dimension and therefore complicates the use of strong collapse to compute the persistence of higher-order homologies.

To overcome the difficulties of strong collapse, an edge collapse approach was developed [50,51]. Edge collapse operates on the 1-skeleton of the simplicial complex rather than on the d-skeleton, thereby significantly reducing the size of the complex used to start the edge collapse process. The approach is a significant development, as the 1-skeleton for n vertices has $O(n^{2})$ edges independent of the dimension. The approach still requires a 1-skeleton as input to preserve the topology of the space. It is still unknown how to generate a topologically significant 1-skeleton that preserves the topology of the space, other than the combinatorial 1-skeleton of the VR-complex. Considering this limitation, the approach is studied for random simplicial complexes [52]. While strong collapse and edge collapse are intriguing and elegant, the selection of the maximal d-simplices and 1-skeleton for strong and edge collapse has a significant influence on the final outcome.

Another interesting approach to reduce the complex size is simplicial batch collapse (SimBa) [33]. Batch collapse generates a significantly smaller complex by combining sparse Rips with strong and batch collapse operations.

Edge contraction is another technique that aims to compute a significantly reduced homotopic complex [53]. Unfortunately, the computation of a reduced homotopic complex is NP-complete and has thus far only been extended to triangles in $R^{2}$ [54].

An Alpha complex is a compact complex with a significantly smaller total simplex count compared to the corresponding VR and Čech complexes. An Alpha complex utilizes a Delaunay triangulation [55] to generate a filtered complex based on the Alpha Filtration Value (AFV) used in the construction process [37]. However, as noted above, the Delaunay triangulation must be performed at the highest dimension of the data and its computational cost becomes prohibitive in high dimensions.

A Delaunay triangulation is a well-known computational geometry procedure that generates a mesh of d-simplices (each with $d + 1$ vertices) for a finite set of fixed points P in $R^{d}$ satisfying the empty d-sphere condition [55]. The empty d-sphere condition ensures that the interior of the circum-d-sphere of any simplex does not contain any point $p \in P$. This condition matches the $β$-sparsification condition for $β = 1$, thereby providing the possibility of an optimization for $β > 1$ discussed in Section 4.2.
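For intuition, the empty d-sphere condition can be sketched in the plane ($d = 2$), where the circum-d-sphere is the circum-circle of a triangle. The code below is an illustrative brute-force check, not the paper's implementation; the names `circumcircle` and `is_delaunay` and the tolerance `tol` are assumptions made for this example.

```python
from math import dist


def circumcircle(a, b, c):
    """Circum-center and circum-radius of a non-degenerate triangle in the plane."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    det = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if det == 0:
        raise ValueError("collinear vertices have no circum-circle")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / det
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / det
    center = (ux, uy)
    return center, dist(center, a)


def is_delaunay(triangle, points, tol=1e-12):
    """True when no non-vertex point lies strictly inside the circum-circle."""
    center, radius = circumcircle(*triangle)
    return all(dist(center, p) >= radius - tol
               for p in points if p not in triangle)


tri = ((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))
blocked = is_delaunay(tri, list(tri) + [(1.0, 1.0)])  # point at the circum-center
allowed = is_delaunay(tri, list(tri) + [(3.0, 3.0)])  # point far outside the circle
```

A production Delaunay triangulation would not test triangles one by one in this way; the sketch only shows what the emptiness condition asks of a single simplex.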

3. Related Work

This section reviews some of the previous work on the reduction of simplicial complexes. Graph-induced complexes (GIC) compute a homology on the $δ$-sample of data [56]. The $δ$-sparse sample is computed using the $δ$-cover, which is $δ$-sparse. The GIC approach does not report small features, as it ignores cliques within the $δ$-cover of sampled points. Topology preserving 1-skeletons for high-dimensional data have been studied by Kurlin et al. [57]. The Homological Persistent Skeleton ($HoPeS^{(d)}$) is a stable skeleton with theoretical guarantees in higher dimensions [58].

Given the computational and theoretical challenges to design an efficient and scalable algorithm, various methods have been developed to explore how the underlying topology of the point cloud changes under random perturbations [59]. Unfortunately, these randomized approaches cannot guarantee the preservation of the topology of the point cloud. The generation of skeletons with topological guarantees [60] is studied to overcome the problem of homologies vanishing during reduction. Various investigations have examined how the topology of the simplicial complex collapses [61,62,63,64]. Several studies to obtain the 1-skeleton based flag complexes with combinatorics, geometric, and topological guarantees have been proposed [65,66]. Laplacian preserving sparsification for higher-order spectral learning has been studied [24]. Attempts to generate a homology preserving skeleton have been examined in several contexts: topology-based skeletons [67,68,69,70], minimum spanning trees [58], and neighborhood-based skeletons [71]. The Reeb graph is known to preserve the topology of the underlying point cloud under projection [67,68]. The Mapper algorithm utilizes clustering to obtain results similar to the Reeb graph approximation of the point cloud at discrete overlapping intervals [72].

Neighborhood-based skeletons have resulted in an extremely rich family of proximity graphs. Relative neighborhood graphs (RNGs) [73], Gabriel graphs [74,75], and the $β$-Skeleton family of graphs [76] are a few techniques to induce neighborhood relationships. The $β$-Skeleton family generates a spectrum of graphs for different $β$ values, which reduces to the RNG for $β = 2$ and the Gabriel graph for $β = 1$. Efforts to understand the behavior of $β$-Skeleton graphs have been conducted [21]. As the value of $β$ grows, the graph quickly sheds edges and becomes disconnected. Various methods to control the dynamics of the $β$-Skeleton have been proposed to suit different applications [77,78,79,80]. Studies in the past have explored the relation of the $β$-Skeleton to NP-hard min-weight triangulation [81,82,83].
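As a small illustration of the $β = 1$ member of this family, the Gabriel graph keeps an edge when the ball having that edge as its diameter contains no other point. The following brute-force sketch assumes that points exactly on the boundary do not block an edge; the function name `gabriel_edges` is introduced here for illustration only.

```python
from itertools import combinations
from math import dist


def gabriel_edges(points):
    """Return index pairs (i, j) whose diametral ball contains no other point."""
    edges = []
    n = len(points)
    for i, j in combinations(range(n), 2):
        p, q = points[i], points[j]
        mid = tuple((a + b) / 2 for a, b in zip(p, q))  # center of the diametral ball
        r = dist(p, q) / 2                              # its radius
        if all(dist(mid, points[k]) >= r for k in range(n) if k not in (i, j)):
            edges.append((i, j))
    return edges


# The third point sits just above the middle of the long edge and blocks it.
pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1)]
edges = gabriel_edges(pts)
```

Here the long edge between the first two points is rejected because the third point lies inside its diametral disk, while the two short edges survive.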

This paper develops a $β$-sparsified d-uniform hypergraph-based associated simplicial complex in $R^{n}$. In higher dimensions, the edge-based $β$-Skeleton cannot be used because it may contain unwanted holes, voids, and higher-dimensional cavities everywhere, no matter how small the sparsification factor $β$. Instead of an edge-based pairwise neighborhood defined on the point cloud, we define a d-hyperedge (d-simplex)-based neighborhood in general dimensions; the skeleton obtained in this way is free from the anomalies of pairwise neighborhoods. We generate a simplicial complex from the $β$-sparsified d-uniform hypergraph. Another significant modification in our approach is to assign each simplex its maximum edge weight in the filtered sparsified complex generation process. The stability of persistence diagrams under this modification has been studied [84].

4. Overview of the Approach

In general, the computation of PH builds a filtered simplicial complex that captures the spatial relationship of the topological structure represented by the data. Simplices in this sense represent hyper-tetrahedra of the space: a 0-simplex is a point, a 1-simplex is an edge connecting two points, a 2-simplex consists of 3 edges together with the triangular face formed by them, a 3-simplex consists of 6 edges coming together to form 4 triangular faces, forming a tetrahedron, and so on for higher-dimensional simplices. The sparsification uses the d-simplices (d-hyperedges) of maximum dimension, where d is the ambient dimension of the point cloud. The $β$-Sparsification approach examines the complete d-uniform hypergraph of the highest-order simplices, $S_{d}$. These simplices can be quickly enumerated as the $(d + 1)$-combinations of all points in the data P. Thus, $S_{d}$ represents the ordered enumerated sequence of d-simplices as follows:

S_{d} = \{σ_{i} \subseteq P : | σ_{i} | = d + 1\}

(1)

The number of these simplices is predictable and is given by the binomial coefficient $| S_{d} | = \binom{n}{d + 1}$, where n and d are, respectively, the size and ambient dimension of the data. The sparsification step then reduces the simplex set $S_{d}$ using the $β$-Criterion test. Informally, the approach works to remove simplices that do not significantly impact the underlying topological structure; the reduction preserves the underlying topology of the space and provides a reduced 1-skeleton from topologically significant d-simplices. This approach is described below.
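The enumeration in Equation (1) and its size can be sketched with the Python standard library. This is a minimal illustration; the helper name `enumerate_top_simplices` is an assumption, and simplices are represented as tuples of point indices.

```python
from itertools import combinations
from math import comb


def enumerate_top_simplices(points, d):
    """Lazily yield every candidate d-simplex as a tuple of d + 1 point indices."""
    return combinations(range(len(points)), d + 1)


points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]  # n = 5 points in the plane
S_d = list(enumerate_top_simplices(points, d=2))
expected_size = comb(len(points), 2 + 1)           # binomial(n, d + 1)
```

Because `combinations` is a lazy iterator, each candidate can be tested and discarded in turn, so the full set $S_{d}$ need not be held in memory before sparsification.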

4.1. $β$ -Criterion

The $β$-Sparsification of this paper is inspired by (i) the Delaunay criterion of an empty circum-sphere region of a d-simplex and (ii) the work with $β$-Skeleton graphs and their construction [20,21]. While the Delaunay criterion defines (in part) that a valid d-simplex exists such that no other points lie in the fixed circum-sphere region of the d-simplex vertices, $β$-Sparsification defines an exclusion region (where points cannot lie) as a function of scaled circum-spheres based on the geometry of the d-simplex and a sparsification factor $β$. The aim is to geometrically define a scalable $β$-Coverage region that defines a valid simplex. Figure 1 is a depiction of the exclusion region for an example 2-simplex for some values of $β$. At $β = 0$ there is no exclusion region (equivalent to a VR-complex); as $β$ increases to $β = 1$, the exclusion region expands and becomes equivalent to the circum-sphere of the d-simplex (equivalent to the Delaunay exclusion region). Finally, when $β > 1$, the exclusion region expands asymmetrically outward from the facets of the simplex.

Building on the construction methods for $β$-Skeletons [21], the exclusion region of this work is organized into two cases: one for $β \leq 1$ ([21], Figure 1) and another for $β \geq 1$ ([21], Figure 2a). The two cases enjoy a smooth transition through the range of $β$ and they produce the same coverage region at $β = 1$. A function ($β$-Criterion) that implements the test to determine if a d-simplex satisfies the $β$-coverage test is shown in Algorithm 1. In general, the $β$-Criterion function establishes the geometric regions that must be empty of non-simplex vertices for the simplex to be considered valid.

The $β$-coverage region for $β \leq 1$ is evaluated in Algorithm 1 at lines 4–13 and for $β \geq 1$ in Algorithm 1 at lines 14–18. The principal difference between these constructions is the geometric structure of the d-simplex used to construct the hyper-spheres that define the exclusion region. The description of these constructions is aided by the two graphics of Figure 2: Figure 2a illustrates the $β$-coverage region for $0 \leq β \leq 1$ and Figure 2b illustrates the $β$-coverage region for $β \geq 1$.

Algorithm 1 The $β$-Criterion Check for d-Simplex

Input: $β$, $σ$ the d-simplex, and P
Output: boolean ▹ true if simplex passes validity test, false o/w

1: function $β$-Criterion($β, σ, P$)
2:     $pts_{\cap} \leftarrow P$
3:     $σ_{cc}$, $σ_{cr} \leftarrow$ circumCenter($σ$), circumRadius($σ$)
4:     if $0 \leq β \leq 1$ then
5:         for each facet $f \in σ$ do
6:             $f_{cc}$, $f_{cr} \leftarrow$ circumCenter(f), circumRadius(f)
7:             $β_{line} \leftarrow$ lineSegment($σ_{cc}, f_{cc}$)
8:             if $β_{line}$ lies outside the interior of $σ$ then
9:                 $β_{center} \leftarrow (\frac{f_{cr}}{β} - f_{cr} + 1) (σ_{cc} - f_{cc}) + f_{cc}$
10:            else
11:                $β_{center} \leftarrow (\frac{f_{cr}}{β} - f_{cr}) (f_{cc} - σ_{cc}) + σ_{cc}$
12:            $β_{radius} \leftarrow$ distance($v \in f.vertices, β_{center}$)
13:            $pts_{\cap} \leftarrow pts_{\cap} \cap$ P.neighbors($β_{center}, β_{radius}$)
14:    if $β \geq 1$ then
15:        for each vertex $v_{i} \in σ$ do
16:            $β_{center} \leftarrow (σ_{cr} β - σ_{cr} + 1) (σ_{cc} - v_{i}) + v_{i}$
17:            $β_{radius} \leftarrow$ distance($v_{i}, β_{center}$)
18:            $pts_{\cap} \leftarrow pts_{\cap} \cap$ P.neighbors($β_{center}, β_{radius}$)
19:    return $pts_{\cap} = \emptyset$ ? true : false

The $β$-Sparsification exclusion region for a d-simplex is defined by the intersection of the space formed by $d + 1$ hyper-spheres, one positioned about each facet of the simplex. These hyper-spheres (the red spheres in Figure 2) are positioned with centers at positions ($β_{center}$) on a vector ($β_{line}$) that originates from the d-simplex and passes outward from it (the green vectors of Figure 2). The $β_{center}$ location is scaled to a position on $β_{line}$ that is defined by geometric elements of the d-simplex and the sparsification factor $β$. The orientation of $β_{line}$, the position of $β_{center}$, and the $β$-Sparsification exclusion region are established when $β \leq 1$ and when $β \geq 1$ as follows:

$β \leq 1 :$

For each facet f of a d-simplex $σ$:

1. $β_{line}$ is defined as the line that passes through the facet at $f_{cc}$ and through the d-simplex center ($σ_{cc}$), directed away from the simplex;
2. $β_{center}$ is positioned on $β_{line}$ by Equation (3) for acute simplices and Equation (4) for obtuse simplices;
3. a hyper-sphere is defined, centered at $β_{center}$ with a radius equal to the distance from $β_{center}$ to a vertex of the facet f.

The $β$-Sparsification exclusion region for the d-simplex is defined as the intersection of the hyper-spheres from each facet of the d-simplex. Thus, when $β = 0$, the position of $β_{center}$ on $β_{line}$ begins infinitely far away from $σ_{cc}$; as $β$ increases, $β_{center}$ moves toward $σ_{cc}$ and reaches $σ_{cc}$ at $β = 1$.

$β \geq 1 :$

For each facet f of a d-simplex $σ$, let $v_{i}$ denote the simplex vertex that lies opposite the simplex facet f; then:

1. $β_{line}$ is defined to originate at vertex $v_{i}$, passing through the simplex center $σ_{cc}$ and through the facet f;
2. $β_{center}$ is positioned on $β_{line}$ by Equation (5);
3. a hyper-sphere is defined, centered at $β_{center}$ with a radius equal to the distance from $β_{center}$ to vertex $v_{i}$.

The $β$-Sparsification exclusion region for the d-simplex is defined as the intersection of all the hyper-spheres from each facet of the d-simplex. Thus, when $β = 1$, the position of $β_{center}$ is at $σ_{cc}$; as $β$ increases, $β_{center}$ moves along $β_{line}$ away from $σ_{cc}$.

A formal derivation of Equations (3)–(5) that compute the $β_{center}$ positions (and equivalently the Algorithm 1 assignments of the variable $β_{center}$) is contained in the next two sub-sections.

4.1.1. Positioning $β_{c e n t e r}$ for $0 \leq β \leq 1$

The parametric equation of a line from $\vec{a}$ to $\vec{b}$ can be defined with respect to a time parameter (t). The point $p(t)$ represents the location of a point on the line at time t: at $t = 0$ it is the starting point a, and at $t = 1$ it is the ending point b. The point $p(t)$ is well defined for any value of t, and it traces the line segment $\vec{ab}$ between a and b for $0 \leq t \leq 1$:

p(t) = t \vec{b} + (1 - t) \vec{a}

The expression in Algorithm 1 at line 11 corresponds to an acute d-simplex (circum-center lies inside the d-simplex), as shown in Figure 3a. The equation of the line segment joining

σ_{c c}

and

f_{c c}

is:

f (t) = σ_{c c} (1 - t) + f_{c c} (t) .

(2)

The $β_{center}$ for any value of $0 \leq β \leq 1$ is positioned on this line so that at $β = 1$, $β_{center}$ is positioned at $σ_{cc}$, and as $β$ approaches 0, $β_{center}$ moves infinitely far away from $σ_{cc}$. To incorporate the computation of $β_{center}$ into this equation, the equation must be made dependent on $β$ instead of t. The parametric equation allows any value of t, $-\infty < t < \infty$, so t can be re-expressed such that $β = 1$ corresponds to $t = 0$ (the point $σ_{cc}$) and $β \to 0$ corresponds to $t \to \infty$. Thus, the equation is revised to introduce a scaling parameter $f_{cr}$ with $β$ by substituting $t = f_{cr} (\frac{1}{β} - 1)$. The substitution provides the required continuous effect on the range of $β$-coverage with respect to the $β$, $σ_{cc}$, $f_{cc}$, and $f_{cr}$ values. Thus, Equation (2) becomes:

β_{c e n t e r} \leftarrow σ_{c c} (f_{c r} - \frac{f_{c r}}{β} + 1) + f_{c c} (\frac{f_{c r}}{β} - f_{c r})

Simplifying yields:

β_{c e n t e r} \leftarrow (\frac{f_{c r}}{β} - f_{c r}) (f_{c c} - σ_{c c}) + σ_{c c}

(3)

and verifying for

β = 1

:

σ_{c c} (f_{c r} - f_{c r} + 1) + f_{c c} (f_{c r} - f_{c r}) \equiv (f_{c r} - f_{c r}) * (f_{c c} - σ_{c c}) + σ_{c c} \equiv σ_{c c}

Thus, the boundary condition evaluates correctly to $β_{center} = σ_{cc}$. For $0 < β \leq 1$ the quantity $f_{cr} (\frac{1}{β} - 1)$ increases from zero toward infinity as $β$ decreases from 1 toward 0 and thus pushes $β_{center}$ infinitely far away from $σ_{cc}$.
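The boundary behaviour of Equation (3) can be checked numerically. The sketch below evaluates the equation coordinate by coordinate; the function name `beta_center_acute` and the sample values for $σ_{cc}$, $f_{cc}$, and $f_{cr}$ are assumptions chosen only for illustration.

```python
def beta_center_acute(beta, sigma_cc, f_cc, f_cr):
    """Evaluate Equation (3) for an acute simplex, with 0 < beta <= 1."""
    if not 0 < beta <= 1:
        raise ValueError("Equation (3) is evaluated here only for 0 < beta <= 1")
    t = f_cr / beta - f_cr  # zero at beta = 1, unbounded as beta -> 0
    return tuple(t * (f - s) + s for s, f in zip(sigma_cc, f_cc))


# Hypothetical geometry: simplex circum-center at the origin, facet
# circum-center one unit away along x, facet circum-radius 2.
sigma_cc, f_cc, f_cr = (0.0, 0.0), (1.0, 0.0), 2.0
at_one = beta_center_acute(1.0, sigma_cc, f_cc, f_cr)       # coincides with sigma_cc
at_half = beta_center_acute(0.5, sigma_cc, f_cc, f_cr)      # t = 2
at_quarter = beta_center_acute(0.25, sigma_cc, f_cc, f_cr)  # t = 6
```

Halving $β$ moves the center further out along the line through $σ_{cc}$ and $f_{cc}$, which matches the limiting behaviour described above; $β = 0$ itself is excluded because the expression divides by $β$.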

The expression in Algorithm 1 at line 9 corresponds to an obtuse d-simplex (circum-center lies outside the d-simplex), as shown in Figure 3b. The derivation is similar to that for an acute d-simplex with a few adjustments. In particular, the circum-center in this case should not move in the direction of $f_{cc}$ from $σ_{cc}$, but in the opposite direction. Thus, Equation (2) is rewritten in terms of $f_{cr}$ and $β$ by substituting $t = f_{cr} (1 - \frac{1}{β})$:

β_{c e n t e r} \leftarrow σ_{c c} (\frac{f_{c r}}{β} - f_{c r} + 1) + f_{c c} (f_{c r} - \frac{f_{c r}}{β})

Simplifying yields:

β_{c e n t e r} \leftarrow (\frac{f_{c r}}{β} - f_{c r} + 1) (σ_{c c} - f_{c c}) + f_{c c}

(4)

The conditional evaluation of $β_{center}$ is required by the shape of the d-simplex. However, the net effect of both pathways is similar: the position of $β_{center}$ is scaled by the value of $β$ and the facet circum-radius. The use of the facet circum-radius is a straightforward choice, as it controls the sparsification holistically for each d-simplex.

4.1.2. Positioning $β_{c e n t e r}$ for $β \geq 1$

For $β \geq 1$, the relevant lines of Algorithm 1 are 14–18, and the relevant graphic is Figure 2b. The $β$-Criterion for $β \geq 1$ is again determined by a scalable positioning of $β_{center}$ for each simplex facet. The boundary conditions for the parametric equation when $β \geq 1$ are determined by the d-simplex circum-center $σ_{cc}$ and the vertex $v_{i}$ opposite to the facet f. More precisely, rewriting Equation (2) with $v_{i}$ in place of $f_{cc}$ and with $t = σ_{cr} (1 - β)$ (where $σ_{cr}$ is the d-simplex circum-radius) yields:

β_{c e n t e r} \leftarrow σ_{c c} (β σ_{c r} - σ_{c r} + 1) + v_{i} (σ_{c r} - β σ_{c r})

Simplifying:

β_{center} \leftarrow (β σ_{cr} - σ_{cr} + 1) (σ_{cc} - v_{i}) + v_{i}

(5)

which is the expression at line 16 of Algorithm 1.

While not shown, the implementation of Algorithm 1 uses $kd$-trees [85] to efficiently identify points in the $β$-Coverage region (Lines 13 and 18). The $kd$-tree provides an index into a set of d-dimensional points that can be used to rapidly look up the nearest neighbors of any point. Substantial gains in efficiency can be achieved by approximation using r-approximate nearest neighbors for high-dimensional data. Utilizing the $kd$-tree reduces the average complexity of Algorithm 1 to $O(d \log n)$, with a worst-case complexity of $O(d n)$.
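The $β \geq 1$ branch (lines 14–18 of Algorithm 1) can be sketched for triangles in the plane as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names are hypothetical, a brute-force scan (`points_in_ball`) stands in for the $kd$-tree radius query, balls are treated as open, and the simplex's own vertices are excluded from the candidate set.

```python
from math import dist


def circumcircle(a, b, c):
    """Circum-center and circum-radius of a non-degenerate planar triangle."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    det = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / det
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / det
    return (ux, uy), dist((ux, uy), a)


def points_in_ball(points, center, radius, tol=1e-12):
    """Brute-force stand-in for a k-d tree radius query (open ball)."""
    return {p for p in points if dist(p, center) < radius - tol}


def beta_criterion_ge1(beta, sigma, points):
    """True when no non-vertex point lies in all d + 1 balls; assumes beta >= 1."""
    sigma_cc, sigma_cr = circumcircle(*sigma)
    candidates = set(points) - set(sigma)
    for v in sigma:
        scale = sigma_cr * beta - sigma_cr + 1  # coefficient from Equation (5)
        center = tuple(scale * (c - x) + x for c, x in zip(sigma_cc, v))
        radius = dist(v, center)
        candidates &= points_in_ball(points, center, radius)
    return not candidates


tri = ((0.0, 0.0), (2.0, 0.0), (0.0, 2.0))
cloud = list(tri) + [(2.1, 2.1)]  # just outside the circum-circle of tri
valid_at_1 = beta_criterion_ge1(1.0, tri, cloud)
valid_at_2 = beta_criterion_ge1(2.0, tri, cloud)
```

In this example the extra point lies outside the circum-circle, so the triangle passes at $β = 1$; at $β = 2$ the three enlarged balls all contain the point and the triangle is rejected, illustrating how larger $β$ invalidates additional simplices.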

4.1.3. Geometric Interpretation of $β$ -Criterion

The goal of the $β$-Criterion is to generate a good cover of the point cloud that captures the nerve of the point cloud. Initially, the VR-complex guarantees that its homology is the same as the nerve defined by the union of open spheres with an $ϵ_{max}$ maximum connectivity distance. At $β = 0$, all d-simplices satisfying the $ϵ_{max}$ condition are considered valid irrespective of the simplex shape (in terms of incident angles on vertices). As $β$ increases, some d-simplices are invalidated; as $β$ approaches 1, only d-simplices satisfying the empty circum-sphere condition remain; and as $β$ increases above 1, additional d-simplices are invalidated (Figure 1). The intuition behind the strategy is to retain the topologically significant simplices, thereby reducing the size of the complex while preserving, as much as possible, the topology of the space.

The requirement that the

β

-Coverage region be empty ensures that preference is given to smaller, well-shaped simplices rather than large, irregular simplices. If the

β

-Coverage region is not empty, it is possible to cover the space with smaller simplices and, therefore, the current simplex can be invalidated. The approach is simple but powerful enough to provide systematic, continuous, and predictable reductions of the simplicial complex. More importantly, the reduction can be computed independently for any sparsification factor

β

. To conquer large datasets, where a VR complex fails due to its exponential size, increasing

β

can provide an opportunity to study large datasets with a significantly smaller complex size. Finally, if needed, the value of the minimum

β

for which the complex is memory-feasible can be determined dynamically.
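One way such a dynamic determination could be organized is a bisection over β. The sketch below is an assumption-laden illustration, not the authors' procedure: `complex_size` is a hypothetical callback returning the size of the β-sparsified complex, and it is assumed to be non-increasing in β (complexes built at larger β are subsets of those built at smaller β).

```python
def min_feasible_beta(complex_size, budget, lo=0.0, hi=20.0, tol=1e-3):
    """Approximate the smallest beta in [lo, hi] with complex_size(beta) <= budget.

    Returns None when even the sparsest candidate (beta = hi) exceeds the budget.
    Assumes complex_size is non-increasing in beta.
    """
    if complex_size(hi) > budget:
        return None
    if complex_size(lo) <= budget:
        return lo
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if complex_size(mid) <= budget:
            hi = mid      # feasible: try a smaller beta
        else:
            lo = mid      # infeasible: more sparsification is needed
    return hi
```

In practice each call to `complex_size` would require a (possibly expensive) construction or estimate, so the tolerance and the search interval would need to be chosen with care.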

The following theorem guarantees the preservation of homology of the point cloud for

0 \leq β \leq 1

.

Theorem 1.

For all values of

0 \leq β \leq 1

, the set of Delaunay d-simplices is a subset of the β-sparsified d-uniform hypergraph, which therefore preserves the homology of the point cloud.

Proof.

For

β = 1

, the

β

-sparsified d-uniform hypergraph becomes equivalent to the Delaunay triangulation of the space. From the construction in Algorithm 1, for any

β_{i} < β_{j}

, the

β_{j}

-sparsified d-uniform hypergraph will be a subset of the

β_{i}

-sparsified d-uniform hypergraph. Therefore, for any

0 \leq β \leq 1

, the Delaunay simplices will be included, thereby ensuring the Delaunay homology is captured. The persistence barcodes transition smoothly from those obtained from the Delaunay complex to those obtained from the VR-complex as

β

moves toward 0 from 1. □

For

β > 1

, the paper proposes an identification of the

β_{c r i t i c a l}

value, up to which only insignificant features with smaller barcodes are lost, while preserving the significant topological features with larger barcodes. Identification of the

β_{c r i t i c a l}

value is sensitive to the geometry of the point cloud in

R^{d}

. The

β_{c r i t i c a l}

value is particularly significant for larger point clouds, where any computation for

β < β_{c r i t i c a l}

is a wasteful use of resources. Identification of the

β_{c r i t i c a l}

value is crucial and is an interesting research direction that is beyond the scope of this paper. Intuitively, the

β

-criterion eliminates simplices based on their proximity to the boundary points. In general, the significant features have boundary points that remain valid for larger

β

values. This is because the significant features in general have no points inside the boundary for larger proximity distances. The insignificant features get eliminated for smaller

1 \leq β \leq β_{c r i t i c a l}

values, while larger features are still preserved.

4.2. Enumeration of d-Simplices

This section discusses the enumeration of d-simplices implemented in Algorithm 2. As characterized by Equation (1), the exploration space of the possible d-simplices grows exponentially with the ambient dimension of the data. Classically, the size of a VR-complex can be reduced by bounding the maximum edge length for a d-simplex, a parameter generally referred to as

ϵ_{m a x}

. Thus, Equation (1) is modified to bound the maximum edge length of a d-simplex:

E d g e_{m a x} (σ_{i}) < ϵ_{m a x}

to form the complex:

S_{d} = {σ_{i} \subseteq P : | σ_{i} | = d + 1, \forall σ_{i} E d g e_{m a x} (σ_{i}) < ϵ_{m a x}} .

(6)

While this works and is widely used to optimize a VR-complex, its application is made without consideration for the topology of the data.

β

-Sparsification works to achieve a similar reduction while considering the topology of the data.

Algorithm 2 Filtered Set of d-Simplices

Input: $β$ , $ϵ_{m a x}$ , P
Output: $β (S_{d})$ ▹ the sparse d-simplex set

1: $β (S_{d}), \overset{ˇ}{P} \leftarrow \emptyset$
2: if $0 < β < 1$ then
3:     for $i \leftarrow 1, 2, 3, \dots, n$ do
4:         $\overset{ˇ}{P} \leftarrow \overset{ˇ}{P} \cup {p_{i}}$ ▹ remove $i^{th}$ point $p_{i}$ from further enumerations
5:         Ball( $p_{i}, ϵ_{max}$ ) ← P.neighbors( $p_{i}, ϵ_{max}$ )
6:         for $\forall {p_{i_{1}}, p_{i_{2}}, \dots, p_{i_{d}}} \subseteq Ball (p_{i}, ϵ_{max}) ∖ \overset{ˇ}{P}$ do
7:             $σ_{i} \leftarrow {p_{i}} \cup {p_{i_{1}}, p_{i_{2}}, \dots, p_{i_{d}}}$ ▹ d-simplex rooted at $p_{i}$
8:             if $β$ -Criterion( $β, σ_{i}, P, ϵ_{max}$ ) then
9:                 $β (S_{d}) .$ append( $σ_{i}$ )
10: if $β \geq 1$ then
11:     $M_{del} \leftarrow$ computeDelaunay(P)
12:     for $σ_{i} \in M_{del}$ do
13:         if $β$ -Criterion( $β, σ_{i}, P, ϵ_{max}$ ) then
14:             $β (S_{d}) .$ append( $σ_{i}$ )

Enumerating the d-simplices in a point cloud P and applying the

β

-Criterion test is given in Algorithm 2. For

0 < β \leq 1

, the enumeration is performed at lines 3–7. This enumeration step is accomplished by iterating through each point in P. For each point the neighboring points within a ball of radius

ϵ_{m a x}

(hereafter written

| ϵ_{b a l l_{i}} |

) are examined; points outside the ball are ignored. (This is a standard optimization for building VR complexes in TDA tools. What is unique in this work is that this limit is also applicable for constructions (by

β

-sparsification) of a Delaunay complex that occur at

β = 1

. The standard methods to construct a Delaunay complex do not permit optimization through the use of a bounding

ϵ_{m a x}

.) Figure 4 illustrates the simplex selection process. From each point

p_{i}

, we consider all simplices originating from point

p_{i}

that have

E d g e_{m a x} < ϵ_{m a x}

. To generate these simplices, Algorithm 2 searches for points within

ϵ_{m a x}

vicinity of

p_{i}

(line 5). All processed points

p_{i}

are recorded in

\overset{ˇ}{P}

at line 4 and are removed from

{ϵ_{b a l l_{i}}}

to avoid duplicate enumerations at line 6. For any d-simplex rooted at

p_{i}

, it must contain

p_{i}

along with d other points. These d points are generated using enumerations from all points in

{ϵ_{b a l l_{i}}} ∖ \overset{ˇ}{P}

. This process generates

(\binom{| {ϵ_{b a l l_{i}}} ∖ \overset{ˇ}{P} |}{d})

enumerated d-simplices rooted at

p_{i}

. This process repeats for each point

p_{i}

in P and each d-simplex is evaluated against the

β

-Criterion test. This step provides a significant reduction in the enumeration space and significantly accelerates the approach for smaller

ϵ_{m a x}

thresholds.
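The rooted enumeration described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the neighbor search is a brute-force scan rather than a kd-tree, the maximum-edge-length bound is checked explicitly on the non-root vertices, and `criterion` is a hypothetical placeholder for the β-Criterion test of Algorithm 1 (it accepts every simplex by default).

```python
from itertools import combinations
from math import dist


def enumerate_simplices(points, d, eps_max, criterion=lambda simplex: True):
    """Enumerate d-simplices (as index tuples) with all edges shorter than eps_max."""
    processed = set()   # indices already used as a root
    kept = []
    for i, p in enumerate(points):
        processed.add(i)
        # Unprocessed neighbors of p_i inside the eps_max ball.
        ball = [
            j for j, q in enumerate(points)
            if j not in processed and dist(p, q) < eps_max
        ]
        for others in combinations(ball, d):
            # Edges from the root are short by construction; check the rest.
            if any(
                dist(points[a], points[b]) >= eps_max
                for a, b in combinations(others, 2)
            ):
                continue
            simplex = (i,) + others   # the root plus d other points
            if criterion(simplex):
                kept.append(simplex)
    return kept
```

Because every processed root is excluded from later balls, each simplex is generated exactly once, from its lowest-indexed vertex.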

The

ϵ_{m a x}

parameter constrains the maximum connectivity distance for complex construction. The complex enables the computation of persistent homology up to the

ϵ_{m a x}

value and does not contain simplices with weight greater than

ϵ_{m a x}

. This is a very useful technique to limit the size of the complex to some

ϵ_{m a x}

threshold value, above which the complex is believed to have no topological features. Determining the optimal value of

ϵ_{m a x}

that computes the complete persistence barcodes requires further investigation and is another interesting research direction.

Enumerating the d-simplices for

β \geq 1

and applying the

β

-Criterion test is performed in Algorithm 2 at lines 10–14. In this case, the enumerated d-simplices at dimension d can be reduced to

O (n^{⌈ \frac{d}{2} ⌉})

d-simplices with a Delaunay optimization. Without this optimization the algorithm must examine every possible simplex in the point cloud data; thus, when

ϵ_{m a x} = \infty

, there are

(\binom{n}{d + 1})

combinations (Equation (1)). For

β = 1

, the

β

-Criterion region is equivalent to the simplex circum-sphere. This makes the

β

-Criterion test identical to the Delaunay triangulation criterion. Thus, the computeDelaunay method at line 11 of Algorithm 2 returns the d-simplex set from a Delaunay triangulation (

D e l (P)

) and only d-simplices generated from

D e l (P)

are evaluated against the

β

-Criterion test to enumerate the sparse d-simplex set. However, in cases where Delaunay triangulation is not feasible due to the sheer size or dimension of the point cloud, the criterion for

0 \leq β \leq 1

can be extended for

β > 1

without loss of generality.

The complexity of Algorithm 2 depends on the number of enumerated d-simplices. For

0 \leq β < 1

, a maximum of

O (n^{d})

possible d-simplices will be evaluated against the

β

-Criterion test. The average case complexity of Algorithm 1 is

O (d

log

(n))

with

O (d)

for the circum-center and circum-radius computation and

O (

log

(n))

for the neighborhood search using

k d

-trees; the worst-case complexity is

O (d n)

, since

k d

-tree search degrades to linear behavior in higher dimensions. With

O (n^{d})

enumerated simplices, the average and worst-case complexity of Algorithm 2 is

O (d

log

(n) n^{d})

and

O (d n^{d + 1})

. For

β \geq 1

, the complexity depends on the cost of the Delaunay Triangulation and

β

-Criterion test. The worst-case time complexity of the Quickhull algorithm is

O (n

log

(r))

for

d \leq 3

and

O (n \frac{f_{r}}{r})

for

d \geq 4

, where r is the number of processed points and

f_{r} = O (\frac{r^{⌊ \frac{d}{2} ⌋}}{⌊ \frac{d}{2} ⌋!})

[86]. The maximum possible d-simplices in this case are

O (n^{⌈ \frac{d}{2} ⌉}

); therefore the average complexity of Algorithm 2 is

O (d

log

(n) n^{⌈ \frac{d}{2} ⌉})

, and the worst-case complexity is

O (d n^{⌈ \frac{d}{2} ⌉ + 1})

.
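To give a sense of scale for the two enumeration regimes, the following sketch compares the exhaustive count from Equation (1) with the Delaunay growth term. The function name is hypothetical, and the second value is only the asymptotic term with its constant factor omitted, so it indicates the order of growth rather than an exact simplex count.

```python
from math import ceil, comb


def candidate_counts(n, d):
    """Return (binom(n, d+1), n**ceil(d/2)) for n points in dimension d."""
    exhaustive = comb(n, d + 1)          # every possible d-simplex, Equation (1)
    delaunay_term = n ** ceil(d / 2)     # growth term of the Delaunay bound
    return exhaustive, delaunay_term
```

For example, with n = 50 and d = 3 the exhaustive enumeration has 230,300 candidates, whereas the Delaunay growth term is 50² = 2,500.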

4.3. The Family of $β$ -Sparsified Complexes

Algorithm 3 processes the sparse d-simplex set from Algorithm 2 and creates a sparsified complex. The algorithm generates a 1-skeleton from the d-simplex set by preserving the associated edges of each d-simplex. From the 1-skeleton, it then expands the complex to the ambient dimension (

d i m_{m a x}

). This results in the

β

-sparsified complex of the point cloud. An overview of this family is discussed below.

Algorithm 3 Simplicial Complex Generation from d-Simplex Set

Input: $β (S_{d})$ ▹ the sparse d-simplex set
Output: $C_{β}$ ▹ the sparse complex

1: $C_{β} \leftarrow \emptyset$
2: for each simplex $σ_{i} \in β (S_{d})$ do
3:     $C_{β} [0] .$ insert( $vertices_{σ_{i}}$ )
4:     $C_{β} [1] .$ insert( $edges_{σ_{i}}$ )
5: for $2 \leq d \leq dim_{max}$ do
6:     for each simplex $σ_{i} \in C_{β} [d - 1]$ do
7:         for each vertex $v_{i} \in C_{β} [0] \land v_{i} \notin σ_{i}$ do
8:             if $\forall edges (vertices (σ_{i}), v_{i}) \in C_{β} [1]$ then
9:                 $σ_{d} \leftarrow σ_{i} \cup v_{i}$
10:                 $C_{β} [d] .$ insert( $σ_{d}$ )
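The expansion in Algorithm 3 can be sketched as follows. This is a minimal illustration that assumes simplices are given as tuples of integer vertex indices; the function name and the dictionary-of-sets representation are choices made here and are not taken from the authors' implementation.

```python
from itertools import combinations


def build_complex(simplex_set, dim_max):
    """Build the 1-skeleton of the given simplices and expand it up to dim_max."""
    cplx = {0: set(), 1: set()}
    for simplex in simplex_set:
        cplx[0].update((v,) for v in simplex)
        cplx[1].update(combinations(sorted(simplex), 2))
    vertices = sorted(v for (v,) in cplx[0])
    for dim in range(2, dim_max + 1):
        cplx[dim] = set()
        for sigma in cplx[dim - 1]:
            for v in vertices:
                if v in sigma:
                    continue
                # Add v only if it is joined by an edge to every vertex of sigma.
                if all(tuple(sorted((u, v))) in cplx[1] for u in sigma):
                    cplx[dim].add(tuple(sorted(sigma + (v,))))
    return cplx
```

Storing each simplex as a sorted tuple in a set removes the duplicates that arise when the same simplex is reached from several of its faces.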

For some values of

β

, the sparsified 1-skeleton is equivalent to other well-known graphs. For example, when

β = 0

, no simplices will be removed; thus all d-simplices satisfying the

ϵ_{m a x}

criterion are included as part of the sparse d-simplex set and are thus equivalent to the Vietoris–Rips Complex. For any

0 \leq β < 1

, the 1-skeleton generated is a sub-graph of a complete graph and the corresponding sparsified 1-skeletons form a subset relation such that:

\forall 0 \leq β < 1, S k l_{1}^{1} \subset S k l_{β}^{1} \subset S k l_{0}^{1} \equiv K

(7)

where

S k l_{β}^{1}

denotes the

β

-sparsified 1-skeleton computed at

β

and

K

denotes the complete graph. For

β = 1

, the 1-skeleton obtained represents the Delaunay complex.

β

-Sparsification for

β \geq 1

generates a family of sparsified Delaunay complexes; each has a 1-skeleton that is a subset of

D e l (P)

. This relation is stated formally as:

\forall 1 \leq β < \infty, S k l_{\infty}^{1} \subset S k l_{β}^{1} \subset S k l_{1}^{1} \equiv Del (P)

(8)

Figure 5 presents the 1-skeleton obtained for some

β

values for data sampled from the Lion TM test data [87]. The 1-skeleton at

β = 1

represents the edges corresponding to the Delaunay triangulation. The 1-skeleton is obtained from the sparsified d-simplex set and is used to generate a simplicial complex. The size of the simplicial complex thus obtained is proportional to the number of edges in the 1-skeleton.

4.4. Filtered Simplicial Complex

The simplicial complex generated by Algorithm 3 must be filtered based on the weights of the simplices. The 1-skeleton obtained in Section 4.3 is used to mark the 1-simplices (edges) in the d-simplex set at a particular

β

as mentioned in Equations (7) and (8). Based on the 1-skeleton a fully expanded flag complex is constructed to compute the persistent homology. The weight of each simplex is assigned as the maximum weight among its edges. The simplices in the complex are ordered by simplex weight to create a nested sequence of complexes known as a filtration

K_{F}

formed from that data, connected at increasing distances

ϵ = (ϵ_{0}, ϵ_{1}, \dots, ϵ_{\infty})

such that:

\emptyset \subseteq K_{ϵ_{0}} \subseteq K_{ϵ_{1}} \subseteq \dots \subseteq K_{ϵ_{\infty}} = K_{F} .

(9)

The persistent homology of the complex is computed on the filtered simplicial complex. The next section presents experimental results on the effect of sparsification on the generated representation.
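The weighting and ordering steps can be sketched as follows. The sketch assumes Euclidean edge lengths as edge weights and assigns weight 0 to vertices (a common convention that is an assumption here); the function names are hypothetical.

```python
from itertools import combinations
from math import dist


def simplex_weight(simplex, points):
    """Weight of a simplex: the maximum length among its edges (0 for a vertex)."""
    if len(simplex) < 2:
        return 0.0
    return max(dist(points[a], points[b]) for a, b in combinations(simplex, 2))


def filtration(simplices, points):
    """Order simplices by (weight, dimension) so that faces precede cofaces."""
    weighted = [
        (simplex_weight(s, points), len(s) - 1, tuple(s)) for s in simplices
    ]
    return sorted(weighted)
```

Breaking weight ties by dimension keeps every prefix of the ordering a valid simplicial complex, because a face never has a larger weight than a simplex that contains it.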

5. Experimental Results

The experimental study in this paper examines the topological impacts and memory usage of

β

-sparsified simplicial complexes when computing PH. VR and Delaunay complexes are considered the basis to compare accuracy. To ensure the accuracy is compared on a uniform scale, the maximum edge length is recorded as the weight of the simplex for both the VR and Delaunay complexes. This section demonstrates the impact of

β

-Sparsification through experiments designed to achieve three main objectives, namely: (a) to evaluate the impact that

β

-Sparsification has on complex size; (b) to study the impact on run-time of the algorithm; (c) to understand how well

β

-Sparsification preserves the underlying topology of the space. The evaluation uses both synthetic and real-world test data sets.

The synthetic test data is composed of a set of d-spheres in

R^{2}

–

R^{6}

generated using the tadasets Python library. The d-spheres have 50 points each with a noise parameter of

0.2

. While using only 50 points of data may seem small, it is necessary to maintain memory constraints, which enables the computation of PH at higher dimensions. Although not detailed here, additional experiments with 1-spheres and 2-spheres containing larger vertex populations confirm that the basic trends observed with 50 points hold for larger datasets.
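For readers without the tadasets library at hand, a comparable point cloud can be produced with the standard library alone. The sketch below is a stand-in, not the tadasets routine: it samples directions uniformly on the unit sphere and adds isotropic Gaussian noise whose standard deviation equals the noise parameter, which is an assumption about the noise model; the function name is hypothetical.

```python
import random
from math import sqrt


def noisy_sphere(n_points, ambient_dim, noise, seed=None):
    """Sample n_points near the unit sphere in R^ambient_dim."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        # A normalized Gaussian vector is uniformly distributed on the sphere.
        direction = [rng.gauss(0.0, 1.0) for _ in range(ambient_dim)]
        norm = sqrt(sum(x * x for x in direction))
        points.append(
            tuple(x / norm + rng.gauss(0.0, noise) for x in direction)
        )
    return points
```

A call such as `noisy_sphere(50, 3, 0.2, seed=1)` would mimic the size and noise level of one of the synthetic sets embedded in R³.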

The real-world test data is composed of two main sets. The first set is taken from the Triangulated Meshes (TM) database [87]. In particular, four models are used, namely Camel, Flamingo, Lion, and Elephant (Figure 6). The second set of real-world test data consists of 7 data sets sourced from the Mouse Lung Anatomy and Particle Deposition (LAPD) archive [88]. The lung data sets provide high-resolution airway geometries for the study of respiratory systems including toxicology risk assessments and tobacco smoke exposure. Table 1 provides a summary characterization of the lung data. The sex and strain attributes are descriptors of mouse type, while the outlet areas and branches represent geometric and topological descriptors of the lung. The outlet areas denote the terminal alveolar openings representing the

H_{1}

feature of the space. Understanding the

H_{1}

features or loops present on the lung surface enabled the study to analyze the effects sparsification has on loop structures as

β

varies. The branches denote various lung bronchioles and represent the voids in the lung structure. Figure 7 shows the site-specific particle deposition data for the M02 data set.

Some of the real-world test data is too large to be analyzed with current PH tools. As a result, some of the data has been sampled using k-means++. While several techniques exist for sampling point cloud data to compute PH [35,89,90], the use of k-means++ for data reduction has proven effective in preserving the significant topological features in point cloud data [91,92]. The number of points in the original and reduced (sampled) data sets for the real world test data is reported in Table 2. Due to memory constraints, the d-spheres and the sampled Lion₂₀₀ test data are used instead of the full Lion dataset when

β \geq 0

. When

β \geq 1

, both the sampled LAPD and the original TM data are used. All PH computations are performed using the LHF persistent homology tool chain [93] and testing was performed on an Intel(R) Xeon(R) CPU E5-1620 @ 3.70 GHz with 128 GB of RAM.

Experimental results were captured to compare the topological impact and runtime of

β

-Sparsification. The topological impact is measured using the Sliced-Wasserstein (SW) distance metric [94], which compares the results from a

β

-sparsified complex to either the VR complex or the Delaunay complex. The SW distance is sensitive to the scale of the data and must be interpreted relative to the original scale of the data. In general, the SW comparison will be computed and reported by homology dimension.
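To make the comparison concrete, the sketch below approximates the SW distance between two sets of finite persistence intervals of a single homology dimension by averaging one-dimensional transport costs over a fixed number of projection directions. It is a generic approximation scheme written for illustration; whether the experiments here use the same discretization is not stated, and the function name and the default number of directions are assumptions.

```python
from math import cos, pi, sin


def sliced_wasserstein(diagram_a, diagram_b, n_directions=50):
    """Approximate SW distance between two lists of (birth, death) pairs."""
    # Diagonal projections balance the number of points in the two diagrams.
    diag_a = [((b + d) / 2.0, (b + d) / 2.0) for b, d in diagram_a]
    diag_b = [((b + d) / 2.0, (b + d) / 2.0) for b, d in diagram_b]
    points_a = list(diagram_a) + diag_b
    points_b = list(diagram_b) + diag_a
    total = 0.0
    for k in range(n_directions):
        theta = -pi / 2.0 + k * pi / n_directions
        c, s = cos(theta), sin(theta)
        # Project onto the direction and match the sorted projections.
        proj_a = sorted(b * c + d * s for b, d in points_a)
        proj_b = sorted(b * c + d * s for b, d in points_b)
        total += sum(abs(x - y) for x, y in zip(proj_a, proj_b))
    return total / n_directions
```

The value scales linearly with the coordinates of the intervals, which is why it must be read relative to the scale of the data.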

5.1. $β$ -Sparsified Complex: Space Analysis

The advantage of

β

-Sparsification lies in its controllable parameter (

β

), which allows for selective adjustment of sparsification used to reduce the size of the complex (larger values of

β

lead to larger reductions). This subsection reports the experimental results with both the synthetic and real-world test data sets. In particular, the counts for the size of the

β

-Sparsified complexes for some values of

β

are plotted in Figure 8, Figure 9 and Figure 10. From these studies, the following observations can be made:

Observation 1 Reduction is sensitive to dimension: Figure 8 plots the reduction in the 1-skeleton size for d-spheres in

R^{2}

–

R^{6}

as

β

increases. The reduction in the 1-skeleton edge count is sensitive to the ambient dimension of the data; higher-dimensional data experiences a lower reduction curve compared to lower-dimensional data. This is to be expected, as increasing the dimension increases the possibility of multi-faceted interactions, thereby making the complex more dense. This phenomenon is analogous to the observation that the sizes of a VR complex and an Alpha complex become similar as the number of vertices in a data set approaches (from above) the dimension of the data (assuming the topology of the data is not confined to a subspace of the ambient dimension).

Observation 2 The largest reductions occur for

0 < β \leq 1

: For

β

values close to 0, larger simplices are immediately removed and the complex size drops rapidly (Figure 8). This is true for most datasets except pure hyper-spherical distributions. This is expected, as the

β

-Criterion region begins to grow around the simplex circum-center and a simplex is dropped only if there is a point in the

β

-Coverage region. For pure hyper-spherical datasets this

β

-Coverage region is empty for every enumerated simplex until

β = 1

. Once

β > 1

, all the simplices become invalidated, resulting in a steep vertical reduction curve. This phenomenon can be observed in reduction curves of Figure 8 where the reduction is much steeper when

β

approaches 1. A perfectly vertical reduction is not observed because of the noise added to the d-sphere data.

Observation 3 There exists a (data-dependent) value β where the rate of reduction becomes vanishingly small: The reduction curve eventually saturates, when minimal additional reduction is achieved with increasing

β

. Figure 5 depicts the 1-skeleton reductions for the Lion₂₀₀ data set as

0 \leq β \leq 5

. Furthermore, the overall complex size is proportional to the 1-skeleton size and reduces with increasing

β

(Figure 10). The reduction rate is high for

0 \leq β < 1

, where the 1-skeleton loses most of its edges. For

β \geq 1

, the 1-skeleton reduces at a lower rate and reaches a saturation point at which very little additional reduction is observed. The plots in Figure 9a,b show the reduction in 1-skeleton size for the larger LAPD and TM data sets for

1 \leq β \leq 20

(note that the x-axis and y-axis are both log scale). This

β

saturation value depends on the underlying geometry of the point cloud.

In general, saturation is achieved more quickly for datasets with perfect hyper-spherical boundaries. All of the TM data sets have abdominal and head voids as their most persistent feature. Alternatively, in the LAPD data, the various branches constitute a void and none of the branches significantly define a feature. This fundamental structural difference leads to a flatter curve for TM data (Figure 9b) compared to the LAPD data (Figure 9a). This effect is also visible among TM data sets where the Flamingo and Camel data have a sharper boundary for the

H_{2}

void feature. This results in a flatter curve for Flamingo and Camel compared to Lion and Elephant. Furthermore, when compared to the Camel data, the more prominent head

H_{2}

features of the Flamingo dataset allow it to experience an even flatter curve.

5.2. $β$ -Sparsified Complex: Time Analysis

As discussed in Section 4, the run-time complexity for building a

β

-Sparsified complex is

O (n^{d})

for

0 \leq β < 1

and

O (n^{⌈ \frac{d}{2} ⌉})

for

β \geq 1

, which is independent of

β

. Therefore, it is unnecessary to experimentally evaluate the run-time costs for multiple values of

β

. However, since the

β

-Sparsification algorithm discussed here has different computational solutions for

β < 1

and

β \geq 1

, showing performance results from each range of

β

may be somewhat useful.

The runtime and (approximate) maximum memory (the approximate maximum memory use is captured for the process id (PID) with the Linux command: ps -o rss= -p PID) to construct a VR, Delaunay, and

β

-Sparsified complex for the real-world test data is reported in Table 3. Unfortunately, except for the 200-point sampled Lion data, construction of the VR complex fails due to memory limitations. Despite the

β

-Sparsified complex taking longer than the Delaunay complex to construct, it uses significantly less memory. Unfortunately, the

β

-Sparsified complex for sparsification parameter

0.5

takes far too long to construct for larger datasets and did not finish within a reasonable run-time, despite using very little memory. A parallel approach will significantly reduce the time required and will make computations on larger point clouds feasible.

Finally, there are parallelization opportunities that can improve the run-time costs for

β

-sparsification. For example, enumerating the sparse d-simplex set (Algorithm 2) can be easily parallelized. More precisely, the order of evaluation of simplices for

β

evaluation does not matter. Since the number of simplices to be evaluated also increases exponentially, parallelism alone may be insufficient for bounding the memory and runtime complexities for constructing the complex. As a result it may be necessary to use

β

-Sparsification in conjunction with a limiting value for

ϵ_{m a x}

. As suggested in Section 4.2 and Algorithm 2, setting bounds on

ϵ_{m a x}

can significantly reduce the runtime. However, these additional optimizations are not studied in this paper.

5.3. The Topological Impact of $β$ -Sparsification

This section explores the impact that

β

-sparsification has on the topological features in the data. In particular, this section explores the impact that

β

-Sparsification has on the output of a PH computation, comparing results of PH computation on a VR complex and a Delaunay complex with those obtained using a

β

-Sparsified complex (for some values of

β

).

The output of a PH computation is a set of Persistence Intervals (PIs). Each PI characterizes a topological feature found in a filtration of the data as a 3-tuple of the form

〈 d i m, ϵ_{b i r t h}, ϵ_{d e a t h} 〉

, where

d i m

is the dimension of the feature,

ϵ_{b i r t h}

is the

ϵ

distance of the complex in the filtration where the feature first appears, and

ϵ_{d e a t h}

is the

ϵ

distance of the complex in the filtration where the feature last appears. The comparison is made by computing the Sliced-Wasserstein (SW) distance [95] for the dimensionally separated PIs, comparing VR

\to β

-Sparsified and Delaunay

\to β

-Sparsified. These comparisons are made for several values of

β

and are summarized by the plots of Figure 11, Figure 12, Figure 13 and Figure 14. From these plots, the following observations can be made:

Observation 1: Higher-dimensional features are most impacted: The features affected the most in the sparsification process are the highest-dimensional features; this impact diminishes with each lower dimension, with

H_{0}

being the least affected feature for the same value of

β

.

Observation 2: Lower-dimensional features are least affected: The

H_{0}

SW distance is relatively insensitive to dimension; d-spheres in high dimensions show little increase in the

H_{0}

SW measure, even for large

β

values. One can observe that the blue line (solid and dashed) corresponding to

H_{0}

feature of the d-sphere remains at the bottom and is the last curve to depart from the x-axis. The

H_{1}

feature of the d-sphere, represented by the red line, remains close to the x-axis and is the second curve to depart from the x-axis. Similarly, this trend remains true for the

H_{2}

,

H_{3}

and

H_{4}

features.

Observation 3: Over-sparsification with larger values of β results in a significant degradation of feature PIs: These curves follow an upward trend followed by a downward trend for the

β > 1

range. The upward trend occurs when the feature persistence interval undergoes a degradation due to sparsification. The sudden downward trend indicates that the degradation has finally led to the collapse of the feature, as it is no longer reported by PH computation. The increase and subsequent decrease in the SW distance can be observed in Figure 11, Figure 12a and Figure 13a.

Observation 4: Over-sparsification results in a decrease in Betti count: Figure 12a and Figure 13a show that the

H_{1}

features for the LAPD data set remain stable for

β

values close to 2. The graphs in Figure 12b and Figure 13b show that the loop count for each data set remains close to their actual numbers (shown by the outlet area attribute in Table 1). This suggests that the

H_{1}

features consistently remain unaffected by the sparsification process until

β

becomes greater than 2.

M 08

has the largest number of outlet areas, followed by

M 11

,

M 02

,

M 09

,

M 13

,

M 07

and

M 10

. This relative order is largely preserved in the identified counts of

H_{1}

features.

Observation 5: Similar geometries show similar SW and Betti count curves: Examining both the LAPD and TM results suggests that data sets with similar topological and geometric properties follow similar curves for

β > 1

. From the plots in Figure 12a,b for the lung dataset and Figure 13a,b for the TM dataset, the

H_{1}

features show an upward hump in the range of

1 < β \leq 3

for both the SW-distance and feature count. As

β

increases, the sparsification process continues to reduce edges from the complex; this results in an increase in overall

H_{1}

feature count in these cases. The

H_{1}

SW distance and feature count follow a decreasing trend for

β > 3

as the loops start disconnecting. This pattern continues for

H_{2}

,

H_{3}

and higher-dimensional features, as can be seen in Figure 11.

Figure 14 reports the SW measures shown by homology dimension for the Lion₂₀₀ data set. The comparisons are made for both the VR complex (solid lines) and the Delaunay complex (dashed lines). For

0 \leq β < 1

, the SW distance is 0 for

H_{0}

features. This is because the 1-skeleton always contains the minimum spanning tree for the underlying point cloud. For

H_{0}

features, the Delaunay and VR features are exactly the same, as both capture the minimum spanning tree. For

β \geq 1

, the

β

sparsified complex begins to lose edges from the minimum spanning tree; therefore, the SW distance compared to the Delaunay and VR complexes increases. The PIs for

H_{1}

features match the VR complex for

β = 0

and quickly shift close to the Delaunay Complex as

β

approaches 1. This effect is depicted by the green solid (VR) and dashed (Delaunay) lines. Compared to the Delaunay complex, the SW distance for

H_{1}

and

H_{2}

features is not 0 at

β = 1

(recall that

β

-Sparsification produces a Delaunay complex at

β = 1

). This variation happens because Delaunay triangulations of a point cloud are not unique and the sparsified complex at

β = 1

contains edges corresponding to all such triangulations, whereas the Delaunay complex only contains edges corresponding to one such triangulation. The Delaunay triangulation fails to be unique only if the point cloud has co-spherical points. However, the change in persistence intervals is not very significant, as the Delaunay complex also tends to maintain well-shaped simplices, in contrast to the VR complex, which includes every possible triangle.

β

-Sparsification reduces the complex size as the values of

β

increase. However, as the

β

value increases above 1, there is a value of

β

such that the sparsification begins to have non-trivial impacts on the output results from a PH computation. We will refer to the

β

value where the SW distance degrades significantly as the Critical β Value. Since the impact of

β

varies by dimension, the Critical

β

value will be presented by dimension as

β_{c r i t i c a l}^{H_{i}}

for

i \geq 0

. A summary of the approximate critical

β

values for

H_{0}

,

H_{1}

, and

H_{2}

homology features for the test data sets used in this study is shown in the rightmost columns of Table 2. These critical values are observed from the plots of the SW distances in this paper and should not be indiscriminately applied to other data sets. Unfortunately, there is no hard limit for defining the values for

β_{c r i t i c a l}^{H_{i}}

. Small

β

values with near-zero SW distances should always be preferred. However, given the computational complexity of PH, it may not be feasible to use small

β

-values with large data sets.

6. Discussion

The space complexity of VR complexes and Alpha complexes inhibits the computation of PH on large datasets. The principal motivation behind this work is the reduction of the simplicial complex with a scalable

β

-Sparsification step to support the computation of PH on larger point clouds than currently possible. A critical factor for this is determining a suitable value for

β

. Unfortunately the size of a simplicial complex is not completely defined by the size or dimension of a point cloud but also by the relative positions of the vertices in a point cloud. Consequently, defining a suitable value for

β

to scale a simplicial complex is highly dependent on the size, dimension, and structure of the point cloud.

For large datasets where computing PH from a VR complex or Alpha complex is infeasible, one can explore

β

-Sparsification with increasing values of

β > 1

where the computation of PH becomes feasible. Unfortunately this must be done experimentally with an understanding that larger values of

β

can cause the

β

-Sparsified complex to lose topological features in the data. Fortunately the first features to be lost are smaller features with short persistent intervals; the loss will grow to larger features as values of

β

continue to be increased. Furthermore, within the TDA/PH communities, the short persistent intervals are often considered insignificant [47,96]. That said, there are some applications for which the short persistent intervals are significant [97,98]. Thus, the application of

β

-Sparsification must be carefully considered by a domain expert with experience regarding the significance of the persistence intervals output from a PH computation.

Finally, a

β

-Sparsified complex can be used as input to an edge collapse algorithm [50,51] to determine the minimal core complex. Edge collapse has been studied on the 1-skeleton of the VR complex and may become infeasible for large datasets. Edge collapse on random simplicial complexes has also been studied [52]. It may be worth investigating how edge collapse interacts with the underlying skeleton generated by

β

-Sparsification. Unfortunately, the only public implementation of edge collapse is contained in GUDHI [99], and it cannot be combined with a computation of PH using the current GUDHI library resources.

7. Conclusions

This paper presents a sparsification approach based on d-uniform hypergraphs to reduce the size of a simplicial complex. The method, called

β

-Sparsification, provides a scalable parameter (

β \geq 0

) to produce a family of increasingly sparse simplicial complexes that shrink with increasing

β

.

β

-Sparsification produces a VR complex when

β = 0

and a Delaunay complex when

β = 1

. With careful selection of

β

values, the sparsification process can enable the computation of PH on moderately large data sets with significantly reduced memory usage.

The

β

-Sparsification technique provides a topology-sensitive removal of less significant simplices from a simplicial complex. The value of the sparsification factor,

β

, can be tuned according to the point cloud size and the available memory resources. For

0 \leq β \leq 1

the sparsification reduces the complex size without significantly affecting the underlying topology. As

β

increases above 1, there are values of

β

(called the

β_{critical}

value) where the sparsification tends to have a significant negative impact on the homology of the point cloud. Unfortunately, the

β_{critical}

value varies both by data set and by homology dimension. In general, the

β_{critical}

value decreases for higher-dimensional homology features. Identification of the

β_{critical}

value for a given point cloud is an interesting research direction that requires thorough mathematical and statistical investigation of the point cloud and is beyond the scope of this paper.
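
As a rough illustration of how such a value might be read off experimentally, the sketch below scans a measured distance curve for the first β at which the distance to a reference persistence diagram exceeds a tolerance. This is one plausible operational reading and an assumption made here, not the definition used in Section 5.3; the function name `first_exceeding_beta`, the tolerance, and the numbers in both curves are made up for illustration.

```python
def first_exceeding_beta(curve, tolerance):
    """Return the smallest beta whose distance exceeds tolerance, else None.

    curve: iterable of (beta, distance) pairs, for example a distance between
           the persistence diagram at beta and a reference diagram.
    """
    for beta, distance in sorted(curve):
        if distance > tolerance:
            return beta
    return None


# Hypothetical, made-up measurements for two homology dimensions.
curve_h0 = [(1.0, 0.00), (1.5, 0.01), (2.0, 0.02), (2.5, 0.40)]
curve_h1 = [(1.0, 0.00), (1.5, 0.35), (2.0, 0.60), (2.5, 0.90)]

beta_h0 = first_exceeding_beta(curve_h0, tolerance=0.1)
beta_h1 = first_exceeding_beta(curve_h1, tolerance=0.1)
```

The result depends on the chosen tolerance and on the grid of β values that was sampled, which is part of why a principled identification needs further study.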

Author Contributions

Conceptualization, R.P.S., N.O.M. and P.A.W.; methodology, R.P.S., N.O.M. and P.A.W.; software, R.P.S., N.O.M. and R.R.; validation, R.P.S.; formal analysis, R.P.S. and R.R.; investigation, R.P.S., N.O.M., R.R. and P.A.W.; data curation, R.P.S., N.O.M. and P.A.W.; writing—original draft preparation, R.P.S., R.R. and P.A.W.; writing—review and editing, R.P.S. and P.A.W.; visualization, R.P.S.; supervision, P.A.W.; project administration, P.A.W.; funding acquisition, P.A.W. All authors have read and agreed to the published version of the manuscript.

Funding

Support for this work was provided in part by the National Science Foundation under grant IIS-1909096.

Data Availability Statement

The Triangulated Meshes dataset is available from the Mesh Data from Deformation Transfer for Triangle Meshes [87]. The Mouse Lung dataset is available from the Mouse Lung Anatomy and Particle Deposition (LAPD) archive [88].

Conflicts of Interest

Author Dr. Nicholas O. Malott was employed by the company Convergent Analytics. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Carlsson, G. Topology and Data. Bull. Am. Math. Soc. 2009, 46, 255–308.
2. Chazal, F.; Michel, B. An Introduction to Topological Data Analysis: Fundamental and Practical Aspects for Data Scientists. Front. Artif. Intell. 2021, 4, 667963.
3. Ghrist, R. Barcodes: The Persistent Topology of Data. Bull. Am. Math. Soc. 2008, 45, 61–75.
4. Edelsbrunner, H.; Harer, J. Persistent Homology—A Survey. Surv. Discret. Comput. Geom. 2008, 453, 257–282.
5. Otter, N.; Porter, M.A.; Tillmann, U.; Grindrod, P.; Harrington, H.A. A Roadmap for the Computation of Persistent Homology. EPJ Data Sci. 2017, 6, 17.
6. de Silva, V.; Morozov, D.; Vejdemo-Johansson, M. Dualities in Persistent (Co)Homology. Inverse Probl. 2011, 27, 124003.
7. Riihimäki, H.; Chachólski, W.; Theorell, J.; Hillert, J.; Ramanujam, R. A topological data analysis based classification method for multiple measurements. BMC Bioinform. 2020, 21, 336.
8. Hensel, F.; Moor, M.; Rieck, B. A Survey of Topological Machine Learning Methods. Front. Artif. Intell. 2021, 4, 52.
9. Berry, E.; Chen, Y.C.; Cisewski-Kehe, J.; Fasy, B.T. Functional summaries of persistence diagrams. J. Appl. Comput. Topol. 2020, 4, 211–262.
10. Moon, C.; Lazar, N.A. Hypothesis Testing for Shapes using Vectorized Persistence Diagrams. arXiv 2019, arXiv:2006.05466.
11. Maroulas, V.; Nasrin, F.; Oballe, C. A bayesian framework for persistent homology. SIAM J. Math. Data Sci. 2020, 2, 48–74.
12. Curry, J.; Mukherjee, S.; Turner, K. How many directions determine a shape and other sufficiency results for two topological transforms. arXiv 2019, arXiv:1805.09782.
13. Turner, K.; Mukherjee, S.; Boyer, D.M. Persistent homology transform for modeling shapes and surfaces. Inf. Inference A J. IMA 2014, 3, 310–344.
14. Bukkuri, A.; Andor, N.; Darcy, I.K. Applications of Topological Data Analysis in Oncology. Front. Artif. Intell. 2021, 4, 38.
15. Cámara, P.G. Topological methods for genomics: Present and future directions. Curr. Opin. Syst. Biol. 2017, 1, 95–101.
16. Joshi, M.; Joshi, D. A Survey of Topological Data Analysis Methods for Big Data in Healthcare Intelligence. Int. J. Appl. Eng. Res. 2019, 14, 584–588.
17. Mandal, S.; Guzmán-Sáenz, A.; Haiminen, N.; Basu, S.; Parida, L. A topological data analysis approach on predicting phenotypes from gene expression data. In Proceedings of the Algorithms for Computational Biology; Martín-Vide, C., Vega-Rodríguez, M.A., Wheeler, T., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 12099, pp. 178–187.
18. Rizvi, A.H.; Camara, P.G.; Kandror, E.K.; Roberts, T.J.; Schieren, I.; Maniatis, T.; Rabadan, R. Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development. Nat. Biotechnol. 2017, 35, 551–560.
19. Sauerwald, N.; Shen, Y.; Kingsford, C. Topological Data Analysis Reveals Principles of Chromosome Structure in Cellular Differentiation. In Proceedings of the 19th International Workshop on Algorithms in Bioinformatics (WABI 2019); Huber, K.T., Gusfield, D., Eds.; Leibniz International Proceedings in Informatics (LIPIcs); Dagstuhl Publishing: Dagstuhl, Germany, 2019; Volume 143, pp. 23:1–23:16.
20. Amenta, N.; Bern, M.; Eppstein, D. The Crust and the β-Skeleton: Combinatorial Curve Reconstruction. Graph. Model. Image Process. 1998, 60, 125–135.
21. Alonso, L.; Méndez-Bermúdez, J.A.; Estrada, E. Geometrical and spectral study of β-skeleton graphs. Phys. Rev. E 2019, 100, 062309.
22. Bobrowski, O.; Kahle, M. Topology of Random Geometric Complexes: A Survey. J. Appl. Comput. Topol. 2018, 1, 331–364.
23. Kahle, M. Random Geometric Complexes. Discret. Comput. Geom. 2011, 45, 553–573.
24. Osting, B.; Palande, S.; Wang, B. Spectral Sparsification of Simplicial Complexes for Clustering and Label Propagation. J. Comput. Geom. 2020, 11, 176–211.
25. Chung, F.R.K.; Graham, R.L. Cohomological aspects of hypergraphs. Trans. Am. Math. Soc. 1992, 334, 365–388.
26. Ren, S.; Wu, J. Stability of persistent homology for hypergraphs. arXiv 2020, arXiv:2002.02237.
27. Ren, S.; Wang, C.; Wu, C.; Wu, J. On The Discrete Morse Functions for Hypergraphs. arXiv 2021, arXiv:2108.02384.
28. Liu, X.; Feng, H.; Wu, J.; Xia, K. Computing hypergraph homology. Found. Data Sci. 2024, 6, 172–194.
29. Diestel, R. Homological aspects of oriented hypergraphs. arXiv 2021, arXiv:2007.09125.
30. Dey, T.K.; Wang, Y. Computational Topology for Data Analysis; Cambridge University Press: Cambridge, UK, 2022.
31. Zomorodian, A. Fast construction of the Vietoris–Rips complex. Comput. Graph. 2010, 34, 263–271.
32. Dey, T.K.; Shi, D.; Wang, Y. SimBa: An Efficient Tool for Approximating Rips-filtration Persistence via Simplicial Batch-collapse. In Proceedings of the 24th Annual European Symposium on Algorithms (ESA 2016); Sankowski, P., Zaroliagis, C., Eds.; Leibniz International Proceedings in Informatics (LIPIcs); Dagstuhl Publishing: Dagstuhl, Germany, 2016; Volume 57, pp. 35:1–35:16.
33. Dey, T.K.; Shi, D.; Wang, Y. SimBa: An Efficient Tool for Approximating Rips-Filtration Persistence via Simplicial Batch Collapse. ACM J. Exp. Algorithmics 2019, 24, 1.5:1–1.5:16.
34. Bauer, U. Ripser: Efficient computation of Vietoris-Rips persistence barcodes. J. Appl. Comput. Topol. 2021, 5, 391–423.
35. de Silva, V.; Carlsson, G. Topological estimation using witness complexes. In Proceedings of the Eurographics Symposium on Point-Based Graphics; Gross, M., Pfister, H., Alexa, M., Rusinkiewicz, S., Eds.; SPBG ’04; DEU: Goslar, Germany, 2004; pp. 157–166.
36. Arafat, N.A.; Basu, D.; Bressan, S. ϵ-net Induced Lazy Witness Complexes on Graphs. arXiv 2020, arXiv:2009.13071.
37. Edelsbrunner, H.; Kirkpatrick, D.G.; Seidel, R. On the Shape of a Set of Points in the Plane. IEEE Trans. Inf. Theory 1983, 29, 551–559.
38. Edelsbrunner, H.; Mücke, E.P. Three-Dimensional Alpha Shapes. ACM Trans. Graph. 1994, 13, 43–72.
39. Edelsbrunner, H. Shape reconstruction with Delaunay complex. In Proceedings of the Latin American Symposium on Theoretical Informatics; Lucchesi, C.L., Moura, A.V., Eds.; Springer: Berlin/Heidelberg, Germany, 1998; pp. 119–132.
40. Kerber, M.; Sharathkumar, R. Approximate Čech complex in low and high dimensions. In Proceedings of the International Symposium on Algorithms and Computation; Cai, L., Cheng, S.W., Lam, T.W., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2013; Volume 8283, pp. 666–676.
41. Sheehy, D.R. Linear-Size Approximations to the Vietoris–Rips Filtration. Discret. Comput. Geom. 2013, 49, 778–796.
42. Chari, M.K. On discrete Morse functions and combinatorial decompositions. Discret. Math. 2000, 217, 101–113.
43. Kozlov, D.N. Discrete Morse theory for free chain complexes. Comptes Rendus Math. 2005, 340, 867–872.
44. Malgouyres, R.; Francés, A.R. Determining whether a simplicial 3-complex collapses to a 1-complex is NP-complete. In Proceedings of the International Conference on Discrete Geometry for Computer Imagery; Coeurjolly, D., Sivignon, I., Tougne, L., Dupont, F., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2008; Volume 4992, pp. 177–188.
45. Tancer, M. Recognition of collapsible complexes is NP-complete. Discret. Comput. Geom. 2016, 55, 21–38.
46. Paolini, G. Collapsibility to a Subcomplex of a Given Dimension is NP-Complete. Discret. Comput. Geom. 2018, 59, 246–251.
47. Zomorodian, A.; Carlsson, G. Computing Persistent Homology. Discret. Comput. Geom. 2005, 33, 249–274.
48. Boissonnat, J.D.; Pritam, S.; Pareek, D. Strong Collapse for Persistence. In Proceedings of the 26th Annual European Symposium on Algorithms (ESA 2018); Azar, Y., Bast, H., Herman, G., Eds.; Leibniz International Proceedings in Informatics (LIPIcs); Dagstuhl Publishing: Dagstuhl, Germany, 2018; Volume 112, pp. 67:1–67:13.
49. Boissonnat, J.D.; Pritam, S. Computing Persistent Homology of Flag Complexes via Strong Collapses. In Proceedings of the 35th International Symposium on Computational Geometry (SoCG 2019); Barequet, G., Wang, Y., Eds.; Leibniz International Proceedings in Informatics (LIPIcs); Dagstuhl Publishing: Dagstuhl, Germany, 2019; Volume 129, pp. 55:1–55:15.
50. Glisse, M.; Pritam, S. Swap, Shift and Trim to Edge Collapse a Filtration. In Proceedings of the 38th International Symposium on Computational Geometry (SoCG 2022); Goaoc, X., Kerber, M., Eds.; Leibniz International Proceedings in Informatics (LIPIcs); Dagstuhl Publishing: Dagstuhl, Germany, 2022; Volume 224, pp. 44:1–44:15.
51. Boissonnat, J.D.; Pritam, S. Edge Collapse and Persistence of Flag Complexes. In Proceedings of the 36th International Symposium on Computational Geometry (SoCG 2020); Cabello, S., Chen, D.Z., Eds.; Leibniz International Proceedings in Informatics (LIPIcs); Dagstuhl Publishing: Dagstuhl, Germany, 2020; Volume 164, pp. 19:1–19:15.
52. Boissonnat, J.D.; Dutta, K.; Dutta, S.; Pritam, S. Strong Collapse of Random Simplicial Complexes. arXiv 2023, arXiv:2301.03514.
53. Dey, T.K.; Edelsbrunner, H.; Guha, S.; Nekhayev, D.V. Topology Preserving Edge Contraction. Publ. L’Institut Mathématique 1998, 66, 23–45.
54. Allili, M.; Kaczynski, T.; Landi, C. Reducing complexes in multidimensional persistent homology theory. J. Symb. Comput. 2017, 78, 61–75.
55. Edelsbrunner, H.; Harer, J. Computational Topology, An Introduction; American Mathematical Society: Providence, RI, USA, 2010.
56. Dey, T.K.; Fan, F.; Wang, Y. Graph Induced Complex on Point Data. In Proceedings of the Twenty-Ninth Annual Symposium on Computational Geometry; SoCG ’13; ACM: New York, NY, USA, 2013; pp. 107–116.
57. Kurlin, V. A One-Dimensional Homologically Persistent Skeleton of an Unstructured Point Cloud in Any Metric Space. Comput. Graph. Forum 2015, 34, 253–262.
58. Kališnik, S.; Kurlin, V.; Lešnik, D. A higher-dimensional homologically persistent skeleton. Adv. Appl. Math. 2019, 102, 113–142.
59. Kahle, M. Topology of Random Simplicial Complexes: A Survey. In Proceedings of the Contemporary Mathematics; Tillmann, U., Galatius, S., Sinha, D., Eds.; American Mathematical Society: Providence, RI, USA, 2014; Volume 620, pp. 201–222.
60. Elkin, Y.; Liu, D.; Kurlin, V. A fast approximate skeleton with guarantees for any cloud of points in a Euclidean space. arXiv 2020, arXiv:2007.08900.
61. Aronshtam, L.; Linial, N.; Łuczak, T.; Meshulam, R. Collapsibility and Vanishing of Top Homology in Random Simplicial Complexes. Discret. Comput. Geom. 2013, 49, 317–334.
62. Aronshtam, L.; Linial, N. When does the top homology of a random simplicial complex vanish? Random Struct. Algorithms 2015, 46, 26–35.
63. Bobrowski, O.; Weinberger, S. On the vanishing of homology in random Čech complexes. Random Struct. Algorithms 2017, 51, 14–51.
64. Iyer, S.K.; Yogeshwaran, D. Thresholds for vanishing of ‘Isolated’ faces in random Čech and Vietoris–Rips complexes. Ann. l’Institut Henri Poincaré Probabilités Stat. 2020, 56, 1869–1897.
65. Delgado-Friedrichs, O.; Robins, V.; Sheppard, A. Skeletonization and partitioning of digital images using discrete morse theory. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 654–666.
66. Carrière, M.; Oudot, S. Structure and stability of the one-dimensional mapper. Found. Comput. Math. 2018, 18, 1333–1396.
67. Ge, X.; Safa, I.; Belkin, M.; Wang, Y. Data Skeletonization via Reeb Graphs. In Proceedings of the 24th International Conference on Neural Information Processing Systems; NIPS’11; Curran Associates Inc.: Red Hook, NY, USA, 2011; pp. 837–845. Available online: https://dl.acm.org/doi/10.5555/2986459.2986553 (accessed on 9 April 2026).
68. Dey, T.K.; Wang, Y. Reeb graphs: Approximation and persistence. Discret. Comput. Geom. 2013, 49, 46–73.
69. Beltramo, G.; Skraba, P. Persistent Homology in ℓ_∞ Metric. arXiv 2020, arXiv:2008.02071.
70. Singh, G.; Memoli, F.; Carlsson, G. Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition. In Proceedings of the Eurographics Symposium on Point-Based Graphics; Botsch, M., Pajarola, R., Chen, B., Zwicker, M., Eds.; The Eurographics Association: Eindhoven, The Netherlands, 2007; pp. 91–100.
71. Jaromczyk, J.W.; Toussaint, G.T. Relative neighborhood graphs and their relatives. Proc. IEEE 1992, 80, 1502–1517.
72. Munch, E.; Wang, B. Convergence between Categorical Representations of Reeb Space and Mapper. In Proceedings of the 32nd International Symposium on Computational Geometry; Fekete, S., Lubiw, A., Eds.; Leibniz International Proceedings in Informatics (LIPIcs); Dagstuhl Publishing: Dagstuhl, Germany, 2016; Volume 51, pp. 53.1–53.15.
73. Toussaint, G.T. The relative neighbourhood graph of a finite planar set. Pattern Recognit. 1980, 12, 261–268.
74. Matula, D.W.; Sokal, R.R. Properties of Gabriel graphs relevant to geographic variation research and the clustering of points in the plane. Geogr. Anal. 1980, 12, 205–222.
75. Bose, P.; Devroye, L.; Evans, W.; Kirkpatrick, D. On the spanning ratio of gabriel graphs and β-skeletons. In Proceedings of the Latin American Symposium on Theoretical Informatics; Rajsbaum, S., Ed.; Springer: Berlin/Heidelberg, Germany, 2002; pp. 479–493.
76. Wang, W.; Li, X.Y.; Moaveninejad, K.; Wang, Y.; Song, W.Z. The spanning ratio of β-skeletons. In Proceedings of the Canadian Conference on Computational Geometry (CCCG), Halifax, NS, Canada, 11–13 August 2003; pp. 35–38.
77. Hiyoshi, H. Greedy Beta-Skeleton in Three Dimensions. In Proceedings of the 4th International Symposium on Voronoi Diagrams in Science and Engineering, ISVD 2007, Glamorgan, UK, 9–11 July 2007; pp. 101–109.
78. Adamatzky, A. On excitable β-skeletons. J. Comput. Sci. 2010, 1, 175–186.
79. Adamatzky, A. On Growing Connected β-Skeletons. Comput. Geom. Theory Appl. 2013, 46, 805–816.
80. Adamatzky, A. How β-skeletons lose their edges. Inf. Sci. 2014, 254, 213–224.
81. Mulzer, W.; Rote, G. Minimum-Weight Triangulation is NP-Hard. J. ACM 2008, 55, 1–29.
82. Cheng, S.W.; Xu, Y.F. On β-skeleton as a subgraph of the minimum weight triangulation. Theor. Comput. Sci. 2001, 262, 459–471.
83. Cheng, S.W.; Xu, Y.F. Approaching the Largest β-Skeleton within a Minimum Weight Triangulation. In Proceedings of the Twelfth Annual Symposium on Computational Geometry; SCG ’96; ACM: New York, NY, USA, 1996; pp. 196–203.
84. Mishra, A.; Motta, F.C. Stability and machine learning applications of persistent homology using the Delaunay-Rips complex. Front. Appl. Math. Stat. 2023, 9, 1179301.
85. Brown, R.A. Building a Balanced k-d Tree in O(kn log n) Time. arXiv 2020, arXiv:1410.5420.
86. Klee, V. Convex Polytopes and Linear Programming; Technical Report; Boeing Scientific Research Labs: Seattle, WA, USA, 1964.
87. Sumner, R.W.; Popovic, J. Mesh Data from Deformation Transfer for Triangle Meshes. ACM Trans. Graph. (TOG) 2004, 23, 399–405.
88. Beichel, R.R.; Glenny, R.W.; Bauer, C.; Krueger, M.A. Lung Anatomy + Particle Deposition (lapd) Mouse Archive; University of Iowa: Iowa City, IA, USA, 2019.
89. Chazal, F.; Fasy, B.T.; Lecci, F.; Michel, B.; Rinaldo, A.; Wasserman, L. Subsampling Methods for Persistent Homology. In Proceedings of the International Conference on Machine Learning, ICML 2015, Lille, France, 6–11 July 2015; pp. 2143–2151.
90. Moitra, A.; Malott, N.; Wilsey, P.A. Cluster-based Data Reduction for Persistent Homology. In Proceedings of the 2018 IEEE International Conference on Big Data, Big Data 2018, Seattle, WA, USA, 10–13 December 2018; pp. 327–334.
91. Malott, N.O.; Sens, A.; Wilsey, P.A. Topology Preserving Data Reduction for Computing Persistent Homology. In Proceedings of the Int. Workshop on Big Data Reduction, Atlanta, GA, USA, 10–13 December 2020; pp. 2681–2690.
92. Sens, A. Topology Preserving Data Reductions for Computing Persistent Homology. Master’s Thesis, Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati, OH, USA, 2021. Available online: https://etd.ohiolink.edu/acprod/odb_etd/ws/send_file/send?accession=ucin1627658247850018&disposition=inline (accessed on 9 April 2026).
93. Researchers at The High Performance Computing Laboratory. LHF: Lightweight Homology Framework. 2020. Available online: https://github.com/wilseypa/lhf (accessed on 9 April 2026).
94. Carrière, M.; Cuturi, M.; Oudot, S. Sliced Wasserstein Kernel for Persistence Diagrams. arXiv 2017, arXiv:1706.03358.
95. Rabin, J.; Peyré, G.; Delon, J.; Bernot, M. Wasserstein Barycenter and Its Application to Texture Mixing. In Proceedings of the Scale Space and Variational Methods in Computer Vision; Bruckstein, A.M., ter Haar Romeny, B.M., Bronstein, A.M., Bronstein, M.M., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2012; Volume 6667, pp. 435–446.
96. Edelsbrunner, H.; Letscher, D.; Zomorodian, A. Topological Persistence and Simplification. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science, FOCS ’00, Redondo Beach, CA, USA, 12–14 November 2000; pp. 454–463.
97. Adams, H.; Emerson, T.; Kirby, M.; Neville, R.; Peterson, C.; Shipman, P.; Chepushtanova, S.; Hanson, E.; Motta, F.; Ziegelmeier, L. Persistence Images: A Stable Vector Representation of Persistent Homology. J. Mach. Learn. Res. 2017, 18, 218–252. Available online: https://jmlr.org/papers/v18/16-337.html (accessed on 9 April 2026).
98. Bendich, P.; Marron, J.S.; Miller, E.; Pieloch, A.; Skwerer, S. Persistent homology analysis of brain artery trees. Ann. Appl. Stat. 2016, 10, 198–218.
99. Carrière, M. Cover Complexes User Manual—Gudhi Documentation. Available online: https://gudhi.inria.fr/python/3.6.0/nerve_gic_complex_user.html (accessed on 9 April 2026).

Figure 1. Illustration of the

β

-Sparsification exclusion region for an acute 2-simplex at some values of

β

.

Figure 2. An example illustration of the

β_{line}

vectors,

β_{center}

points, and hyper-spheres used by the

β

-Criterion function to decide whether a d-simplex is valid. For a simplex

σ_{i}

, the dark shaded area illustrates the

β

-Sparsification exclusion region for

σ_{i}

. Thus, if any point

p \in P ∖ σ_{i}.vertices

lies in the shaded area, the simplex

σ_{i}

is discarded; otherwise,

σ_{i}

is added to the sparse d-simplex set.

Figure 3. A circum-center positioned outside of the d-simplex requires directional adjustment for the

β

-centers corresponding to the facet with the largest circum-d-sphere.

β_{line}

(red line) outside of the d-simplex indicates conditional evaluation at Line 7 of Algorithm 1.

Figure 4. Enumerating simplices with a vertex

p_{j} \in P

that are contained within the

ϵ_{ball}

of

p_{i}

; simplices enumerated are depicted in green; non-enumerated simplices are depicted in red.

Figure 5. 1-skeleton size for the sampled Lion₂₀₀ data set at different sparsification factors

β

.

Figure 6. Flamingo, Elephant, Camel, and Lion test data.

Figure 7. Reduced mouse lung point cloud for M02.

Figure 8.

β

-sparsified 1-Skeleton size for the set of d-spheres in

R^{2} - R^{6}

(N = 50, noise =

0.2

). (Note: a d-sphere is a sphere in

R^{d + 1}

).

Figure 9.

β

-sparsified 1-Skeleton size for LAPD and TM datasets.

Figure 10.

β

-sparsified complex size for the Lion₂₀₀ data set.

Figure 11. Dimension-wise Sliced-Wasserstein distance (compared to the VR and Delaunay Complex) for the set of d-spheres in

R^{2} - R^{5}

(50 points with noise of 0.2). Higher-dimensional data tends to preserve lower-order homological features as

β

increases.

Figure 12. Results for the LAPD dataset.

Figure 13. Results for the TM dataset.

Figure 14. Sliced Wasserstein distances for the Lion₂₀₀ data set comparing the

β

-sparsified complex with both the Delaunay complex and the VR complex.

Table 1. Summary of the mouse lung data attributes and features.

Data	Sex	Strain	Outlet Areas	Branches
M02	F	B6C3F1	1817	1556
M07	M	C57BL/6	1680	1564
M08	F	C57BL/6	1995	1718
M09	F	BALB/C	1810	1246
M10	F	BALB/C	1797	1339
M11	M	B6C3F1	1990	1449
M13	F	CD-1	1866	1613

Table 2. Data sets used for experimental results and their

β_{critical}^{H_{i}}

values. Some data sets are sampled using k-means++ to reduce their size (columns Original and Sampled show the respective sizes; non-sampled data sets do not show a sampled count). The

β_{critical}^{H_{i}}

values are discussed in Section 5.3.

Data	Original	Sampled	$β_{critical}^{H_{0}}$	$β_{critical}^{H_{1}}$	$β_{critical}^{H_{2}}$
Lung Data Sets
M02	603,768	20,126	∼2.33	∼1.52	∼1.175
M07	551,640	18,388	∼2.52	∼1.61	∼1.217
M08	622,041	20,735	∼2.36	∼1.44	∼1.175
M09	602,081	20,070	∼2.28	∼1.43	∼1.172
M10	555,937	18,532	∼2.47	∼1.49	∼1.213
M11	591,759	19,726	∼2.38	∼1.46	∼1.185
M13	514,427	17,148	∼2.73	∼1.83	∼1.235
Triangulated Mesh Data Sets
Lion	4999		∼4.21	∼1.82	∼1.35
Camel	21,886		∼2.23	∼1.49	∼1.29
Flamingo	26,906		∼2.92	∼1.42	∼1.31
Elephant	42,320		∼1.62	∼1.21	∼1.28
Scaled Triangulated Mesh Data Sets
Lion₂₀₀	4999	200	∼2.16	∼1.45	∼1.32

Table 3. Average run-time (seconds) and approximate memory use (MB) to build VR, Delaunay, and

β

-Sparsified complexes computed from 5 trials. OOM: Out Of Memory.

Data	VR Sec	VR MB	Delaunay Sec	Delaunay MB	$β = 1$ Sec	$β = 1$ MB	$β = 1.5$ Sec	$β = 1.5$ MB
Lung Data Sets
M02	—	OOM	822.86	4117.81	1168.67	3342.76	1534.56	3339.91
M07	—	OOM	662.97	3506.95	1127.43	2806.61	1590.82	2804.74
M08	—	OOM	858.57	4329.88	1340.99	3537.85	1957.30	3538.10
M09	—	OOM	799.36	4333.86	1320.82	3331.89	1880.12	3329.75
M10	—	OOM	653.55	3571.31	863.17	2849.99	1092.32	2850.10
M11	—	OOM	800.80	4147.63	1247.32	3211.46	1678.80	3209.79
M13	—	OOM	595.48	3089.32	1014.42	2449.84	1411.58	2447.86
Triangulated Mesh Data Sets
Lion	—	OOM	1.89	203.38	9.55	222.38	15.61	221.84
Camel	—	OOM	31.63	4108.71	101.65	3845.21	226.43	3835.74
Flamingo	—	OOM	46.77	130,576.27	202.00	5810.29	465.35	5802.56
Elephant	—	OOM	111.27	130,535.75	304.67	3843.82	710.80	14,238.83
Scaled Triangulated Mesh Data Sets
Lion₂₀₀	56.67	5.42	0.05	5.52	0.13	5.77	0.16	5.41

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
