1. Introduction
Data analysis is the process of inspecting, cleaning, transforming, and modeling data to extract useful information and support decision-making. However, handling data with many features, especially high-dimensional data, remains one of its central challenges. Topological data analysis (TDA) emerged to address this difficulty by using tools from topology to study the “geometry” of complex, high-dimensional, nonlinear, heterogeneous, and noisy data, including shape, structure, and connectivity. Algebraic topology is a core component of this approach, and its contribution to modern mathematics is widely recognized [1]. In many applications, data exhibit geometric and topological patterns that are difficult to capture with classical statistical models or purely geometric approaches. TDA addresses this gap by identifying structures that are robust to perturbations and invariant under continuous deformations. It does not replace classical data analysis; rather, the two approaches are complementary.
Table 1 presents a basic comparison that highlights the strengths and weaknesses of each field.
At the heart of TDA is persistent homology, which tracks the appearance and disappearance of topological features in a filtered simplicial complex and is supported by a substantial foundational literature [2,3,4]. It yields descriptors such as barcodes and persistence diagrams that summarize the geometry underlying the data in a concise way. Additional techniques, including Mapper (a graph-based summary built from a cover and a clustering step), zigzag persistence, multiparameter persistence, and graph and cubical variants, expand the TDA toolbox for analyzing dynamical systems, networks, images, and time series data. In particular, alpha complexes and cubical persistent homology are practical complements to Vietoris–Rips constructions, respectively offering geometry-faithful sparsification for Euclidean point clouds and native support for image or voxel data.
Table 2 presents the main axes of this toolbox, Table 3 provides a comparative taxonomy of representative methods, and Figure 1 provides a compact end-to-end pipeline view from raw data to downstream learning and interpretation, which is discussed in detail in Section 4.
The literature collected for this review reflects an especially active period in the development and application of TDA. Recent works employ TDA to model financial markets and detect structural changes in time series [5,6,7,8], to identify clinically meaningful patient subgroups in precision medicine [9,10], to characterize fluid flows, material roughness, and other physical systems [11,12,13], and to assess the resilience of infrastructure networks [14]. In parallel, TDA is increasingly integrated with advanced learning architectures, including graph neural networks, interpretable DL methods, and federated or incremental learning frameworks [15,16,17,18,19].
The objectives of this review are threefold. First, we discuss the mathematical and algorithmic foundations of TDA in an expository and pedagogical manner. Second, we synthesize recent methodological advances, including computational improvements, statistical inference, and links to ML. Third, we provide a guided survey of applications in science, engineering, economics, biomedicine, and AI, highlighting emerging themes and open problems. The material assumes familiarity with basic linear algebra and ML concepts but does not require prior background in algebraic topology. This review is intended to be useful to both newcomers and experts seeking an update on TDA in 2024 and 2025.
To ensure consistent terminology throughout, we use TDA after its first occurrence, write Vietoris–Rips uniformly for complexes and filtrations, and use time series as the default wording.
The contributions of this survey are as follows:
A curated and up-to-date synthesis of TDA developments, with primary emphasis on work published in 2024–2025.
A unified presentation of mathematical foundations and computational advances, linking formal concepts to practical implementation choices.
A cross-domain survey of applications (finance, biomedicine, engineering, dynamical systems, and AI), highlighting recurring design patterns.
A pipeline-oriented perspective that clarifies where modeling decisions are made and how they affect robustness, interpretability, and performance.
A critical discussion of open challenges—including scalability, statistical validity, and integration with modern AI—together with actionable future directions.
A Unifying Design Framework for TDA Systems
This survey introduces a unifying five-stage pipeline that clarifies how TDA-based systems are designed, implemented, and integrated with ML workflows.
Figure 1 provides a compact visual summary of this stage-wise design view, while Table 3 summarizes representative methodological trade-offs. The framework makes modeling decisions explicit at each stage, from data representation and filtration construction to topological summarization, feature vectorization, and downstream learning, so that methodological assumptions can be traced and justified. It also enables systematic comparison of alternative design choices (e.g., complex family, filtration function, representation map, and learner) in terms of robustness, interpretability, and computational cost across application domains.
To position this manuscript against representative prior syntheses, Table 4 compares scope, years covered, domains, methods, and key strengths.
Topological data analysis provides a complementary way to study complex data by focusing on shape, connectivity, and multiscale structure. In practical terms, it acts as a robust feature-extraction layer that can reveal patterns often missed by purely linear tools. The strongest results usually appear when TDA is combined with domain knowledge and modern learning models rather than used in isolation. The next sections explain the core mathematics, computational workflow, and evidence from applications.
2. Mathematical Background
The mathematical framework of TDA is inextricably linked with algebraic topology. Here, we present the main concepts commonly used in related studies. For more details, we refer to standard expositions and computational surveys [2,3,4]. This section introduces the core vocabulary needed to read TDA work: simplicial complexes, filtrations, homology, persistence, and Mapper. The practical idea is to observe data across multiple scales and retain features that remain stable as scale changes. Persistent summaries such as barcodes and diagrams convert complicated geometry into compact objects that can be compared across datasets. These concepts are flexible enough to support analysis of point clouds, images, networks, and time series.
2.1. Simplicial Complexes and Filtrations
A central objective in TDA is to approximate a dataset by a discrete combinatorial object that faithfully captures its underlying topology. This is achieved through simplicial complexes, which provide a flexible and computationally tractable representation of the geometric structure.
Definition 1 (Abstract simplicial complex). Let S be a finite set of vertices. An abstract simplicial complex K on S is a collection of nonempty subsets of S such that, whenever $\sigma \in K$ and $\emptyset \neq \tau \subseteq \sigma$, we also have $\tau \in K$. The elements of K are called simplices. A simplex $\sigma$ with $k+1$ vertices is called a k-simplex and has dimension k. The dimension of K, denoted by $\dim K$, is $\dim K = \max_{\sigma \in K} \dim \sigma$.
Definition 2 (Geometric simplicial complex). A geometric simplex is the convex hull of a finite affinely independent set of points, and a face of such a simplex is the convex hull of a nonempty subset of its vertices. A geometric simplicial complex is a family P of geometric simplices satisfying the following two conditions:
- (i)
if $\tau \in P$ and σ is a face of τ, then $\sigma \in P$;
- (ii)
if $\sigma, \tau \in P$, then $\sigma \cap \tau$ is either empty or a face of both simplices.
Throughout this section, geometric complexes are considered in an ambient Euclidean space $\mathbb{R}^N$, with subspaces endowed with the induced topology.
In applications, the vertex set S typically consists of sampled data points equipped with a chosen metric $d \colon S \times S \to [0, \infty)$. This metric fixes the pairwise distances used to build complexes and filtrations. Common constructions include the following.
Definition 3 (Nerve of a cover). Let X be a topological space and let $\mathcal{U} = \{U_i\}_{i \in I}$ be a finite cover of X, meaning $X = \bigcup_{i \in I} U_i$. The nerve $N(\mathcal{U})$ is the abstract simplicial complex with vertex set I in which $\{i_0, \dots, i_k\} \subseteq I$ spans a k-simplex if and only if
$$U_{i_0} \cap U_{i_1} \cap \cdots \cap U_{i_k} \neq \emptyset.$$
Thus, a simplex in the nerve records that the corresponding cover elements have a common intersection. The nerve theorem is a central result because it allows complex, high-dimensional shapes to be replaced by simpler combinatorial structures under appropriate cover conditions. Recent work has investigated several versions and extensions of this theorem [21].
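Theorem 1 (Nerve theorem). Let $\mathcal{U} = \{U_i\}_{i \in I}$ be a finite good open cover of a paracompact topological space X, that is, an open cover in which every nonempty finite intersection $U_{i_0} \cap \cdots \cap U_{i_k}$ is contractible. Then the geometric realization $|N(\mathcal{U})|$ of the nerve is homotopy equivalent to X.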
This is the standard nerve theorem; for a modern unified treatment and related variants, see Bauer et al. [21].
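A minimal Python sketch of Definition 3 makes the construction concrete. It assumes the cover is given as a finite dictionary that maps each index i to a finite set standing in for $U_i$ (for example, the labels of the sample points in that patch). The function name `nerve` and the dimension cap `max_dim` are illustrative choices, not part of any library.

```python
from itertools import combinations


def nerve(cover, max_dim=2):
    """Simplices of the nerve of a finite cover, up to dimension max_dim.

    `cover` maps each index i to a finite set standing in for U_i. A tuple of
    indices spans a simplex exactly when the corresponding sets share an element.
    """
    indices = sorted(cover)
    simplices = []
    for k in range(max_dim + 1):
        for combo in combinations(indices, k + 1):
            common = set.intersection(*(set(cover[i]) for i in combo))
            if common:  # nonempty common intersection
                simplices.append(combo)
    return simplices


# Three overlapping arcs covering a circle sampled at labels 0..5.
arcs = {0: {0, 1, 2}, 1: {2, 3, 4}, 2: {4, 5, 0}}
print(nerve(arcs))  # [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
```

In this example every pair of arcs overlaps, but the three arcs have no common point. The nerve is therefore a hollow triangle, which has the homotopy type of a circle, as the nerve theorem predicts for a cover of a circle by three arcs.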
Other common constructions include:
The Vietoris–Rips complex, where a set of points forms a simplex if all pairwise distances are at most a threshold $\varepsilon \ge 0$.
The Čech complex, where simplices correspond to nonempty intersections of closed balls of radius $\varepsilon$ centered at the data points.
Delaunay triangulations, in which a simplex with vertices in a finite point set P is included when its vertices lie on the boundary of a ball whose interior contains no point of P; the resulting triangulation covers the convex hull of P under standard general-position assumptions.
Alpha complexes, derived from Delaunay triangulations and sensitive to local geometry; for a scale $\varepsilon \ge 0$, they retain the Delaunay simplices that admit an empty circumscribing ball (one whose boundary contains the simplex's vertices and whose interior contains no point of P) of radius at most $\varepsilon$.
Cubical complexes, defined as collections of multidimensional cubes, such as vertices, edges, squares, and voxels, together with their faces.
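The sketch below illustrates the Vietoris–Rips rule. It enumerates all simplices up to a chosen dimension by brute force and assumes Euclidean distances between points. `vietoris_rips` is a hypothetical helper; practical software relies on far more efficient clique enumeration.

```python
from itertools import combinations

import numpy as np


def vietoris_rips(points, eps, max_dim=2):
    """Simplices (sorted vertex tuples) of the Vietoris–Rips complex at threshold eps.

    A vertex set spans a simplex iff all of its pairwise distances are <= eps.
    Brute force over all vertex subsets, so suitable only for tiny examples.
    """
    pts = np.asarray(points, dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    n = len(pts)
    simplices = [(i,) for i in range(n)]
    for k in range(1, max_dim + 1):
        for combo in combinations(range(n), k + 1):
            if all(dist[i, j] <= eps for i, j in combinations(combo, 2)):
                simplices.append(combo)
    return simplices


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(len(vietoris_rips(square, 1.0)))  # 8: four vertices and the four sides
print(len(vietoris_rips(square, 1.5)))  # 14: adds both diagonals and four triangles
```

Take the four corners of a unit square. At threshold 1.0 the complex contains only the four sides, so it forms a loop. At 1.5 the diagonals also enter, and all four triangles fill the loop in.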
For scales $\varepsilon_0 < \varepsilon_1 < \cdots < \varepsilon_m$, varying $\varepsilon$ produces a nested sequence of complexes,
$$K_{\varepsilon_0} \subseteq K_{\varepsilon_1} \subseteq \cdots \subseteq K_{\varepsilon_m},$$
called a filtration. In Vietoris–Rips filtrations, for example, $K_{\varepsilon_i} = \mathrm{VR}_{\varepsilon_i}(S)$ denotes the Vietoris–Rips complex formed at distance threshold $\varepsilon_i$. These filtrations are especially common because they depend only on pairwise distances, although their size may grow exponentially in the number of points (a complex on n points can contain up to $2^n - 1$ simplices) [2,3]. Formally, Vietoris–Rips filtrations are nested collections of Vietoris–Rips complexes indexed by an increasing scale parameter. Recent work has focused on refining these constructions to better reflect intrinsic geometric relationships, such as the topological overlap-based enrichments introduced in [10], which improve performance in high-dimensional genomic applications.
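The nested structure of a Vietoris–Rips filtration can also be computed directly. Each simplex enters at its diameter, the smallest threshold at which all of its pairwise distances are within range. The sketch below sorts simplices by this value, breaking ties by dimension so that faces precede their cofaces, and recovers each complex of the filtration as a subset. The helper names `rips_filtration` and `complex_at` are hypothetical, and Euclidean input points are assumed.

```python
from itertools import combinations

import numpy as np


def rips_filtration(points, max_dim=2):
    """(simplex, value) pairs of a Vietoris–Rips filtration in a valid order.

    A simplex's value is its diameter, i.e. the smallest threshold at which it
    enters the complex. Sorting by (value, dimension) places every face before
    its cofaces, so each prefix of the list is a subcomplex.
    """
    pts = np.asarray(points, dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    filt = []
    for k in range(max_dim + 1):
        for combo in combinations(range(len(pts)), k + 1):
            value = max((dist[i, j] for i, j in combinations(combo, 2)), default=0.0)
            filt.append((combo, float(value)))
    filt.sort(key=lambda item: (item[1], len(item[0])))
    return filt


def complex_at(filt, eps):
    """The complex at scale eps: every simplex whose value is <= eps."""
    return {simplex for simplex, value in filt if value <= eps}


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
filt = rips_filtration(square)
assert complex_at(filt, 1.0) <= complex_at(filt, 1.5)  # nested: smaller scale is a subcomplex
```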
Table 5 summarizes common simplicial and cubical constructions in TDA, describing how they are built, where they are used, and their typical size behavior.
2.2. Homology
Homology provides algebraic invariants that detect connected components, cycles, and holes. Given a simplicial complex K and a coefficient field $\mathbb{F}$, the chain group $C_k(K; \mathbb{F})$ is the vector space of formal finite linear combinations of k-simplices with coefficients in $\mathbb{F}$ [2,3]. For each k, the simplicial boundary map $\partial_k \colon C_k(K; \mathbb{F}) \to C_{k-1}(K; \mathbb{F})$ sends an oriented k-simplex to the alternating sum of its $(k-1)$-dimensional faces. The boundary maps satisfy the chain-complex condition
$$\partial_{k-1} \circ \partial_k = 0,$$
which permits the definition of the k-th homology group as
$$H_k(K; \mathbb{F}) = \ker \partial_k / \operatorname{im} \partial_{k+1}.$$
The ranks of these groups, the Betti numbers $\beta_k = \dim H_k(K; \mathbb{F})$, count k-dimensional topological features: connected components ($\beta_0$), loops or tunnels ($\beta_1$), voids or enclosed cavities ($\beta_2$), and so forth. Homology can also be defined for cubical complexes, which arise naturally in image analysis, often yielding computational benefits. For example, ref. [22] uses cubical persistent homology and statistical methods to classify plant images.
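As a small worked example of these definitions, Betti numbers can be read off from ranks of boundary matrices: by rank–nullity, $\beta_k = \dim C_k - \operatorname{rank}\partial_k - \operatorname{rank}\partial_{k+1}$. The sketch below uses real coefficients (so the field is $\mathbb{R}$) and NumPy's numerical rank, which is adequate only for tiny complexes. Simplices are assumed to be sorted vertex tuples with every face present, and the helper names are illustrative.

```python
import numpy as np


def boundary_matrix(simplices, k):
    """Real matrix of the boundary map from k-chains to (k-1)-chains.

    Simplices are sorted vertex tuples; deleting vertex i gives a face with sign (-1)**i.
    """
    rows = [s for s in simplices if len(s) == k]      # (k-1)-simplices
    cols = [s for s in simplices if len(s) == k + 1]  # k-simplices
    row_of = {s: r for r, s in enumerate(rows)}
    D = np.zeros((len(rows), len(cols)))
    for c, s in enumerate(cols):
        for i in range(len(s)):
            D[row_of[s[:i] + s[i + 1:]], c] = (-1) ** i
    return D


def betti_numbers(simplices, max_dim):
    """beta_k = dim C_k - rank(d_k) - rank(d_{k+1}), for k = 0..max_dim."""
    def rank(k):
        if k == 0:
            return 0  # d_0 is the zero map
        D = boundary_matrix(simplices, k)
        return int(np.linalg.matrix_rank(D)) if D.size else 0

    return [
        sum(1 for s in simplices if len(s) == k + 1) - rank(k) - rank(k + 1)
        for k in range(max_dim + 1)
    ]


hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
filled = hollow + [(0, 1, 2)]
print(betti_numbers(hollow, 1))  # [1, 1]: one component, one loop
print(betti_numbers(filled, 1))  # [1, 0]: the loop is filled in
```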
2.3. Persistent Homology
Persistent homology extends classical homology by tracking the evolution of topological features across a filtration [2,3].
Definition 4 (Persistent homology, barcode, and persistence diagram). Let $K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m$ be a filtration of simplicial complexes. For $i \le j$, the inclusion map $K_i \hookrightarrow K_j$ induces a linear map on k-homology groups for each dimension k:
$$f_k^{i,j} \colon H_k(K_i) \to H_k(K_j).$$
The k-th persistent homology group $H_k^{i,j} = \operatorname{im} f_k^{i,j}$ is the image of this induced map, and the corresponding k-th persistent Betti number $\beta_k^{i,j} = \operatorname{rank} f_k^{i,j}$ is its rank. A homology class that appears (is born) at scale α and disappears (dies) at scale β is represented by an interval $[\alpha, \beta)$. A barcode is the multiset of such intervals, and the interval length $\beta - \alpha$ records the lifetime of the corresponding topological feature. Equivalently, a persistence diagram is the multiset of points $(\alpha, \beta)$, with $\alpha < \beta \le \infty$, that record the same birth and death scales. Persistence diagrams summarize the multiscale topology of data and enjoy fundamental stability theorems, guaranteeing robustness under perturbations of the input. These diagrams can be vectorized into forms suitable for statistical learning, such as persistence landscapes, persistence images, Betti curves, and entropy measures. Algebraically, a one-parameter persistence module over a field $\mathbb{F}$ is a family of vector spaces $\{V_t\}_{t \in \mathbb{R}}$ together with linear maps $v_{s,t} \colon V_s \to V_t$ for $s \le t$ that are compatible with composition, that is, $v_{t,t} = \mathrm{id}_{V_t}$ and $v_{t,u} \circ v_{s,t} = v_{s,u}$ for $s \le t \le u$. It is of finite type when the vector spaces are finite-dimensional and only finitely many birth or death events occur. The structure theorem for such modules explains why persistent homology can be represented by barcodes or persistence diagrams; standard proofs and algorithmic treatments are given in [2].
Theorem 2 (Structure theorem for one-parameter persistence). Any finite-type $\mathbb{R}$-indexed persistence module over a field decomposes as a direct sum of interval modules, and this decomposition is unique up to isomorphism and reordering of the summands.
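For dimension 0 of a Vietoris–Rips filtration, the barcode can be computed without matrices. Every point is born at scale 0, and each edge that joins two previously separate components kills one of them at the edge's length. The following is a minimal union-find sketch of that special case, essentially Kruskal's minimum spanning tree computation. It assumes Euclidean input points and is not a general persistent homology implementation.

```python
import numpy as np


def h0_persistence(points):
    """0-dimensional barcode of the Vietoris–Rips filtration of a point cloud.

    All points are born at 0. Edges are processed by increasing length; an edge
    that joins two different components kills one of them at that length
    (union-find, as in Kruskal's minimum spanning tree). One bar never dies.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (float(np.linalg.norm(pts[i] - pts[j])), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))
    bars.append((0.0, float("inf")))
    return bars


print(h0_persistence([(0, 0), (0, 1), (5, 0), (5, 1)]))
# [(0.0, 1.0), (0.0, 1.0), (0.0, 5.0), (0.0, inf)]
```

Two well-separated pairs of points produce four bars. There are two short bars, one long bar that ends when the two clusters merge, and one infinite bar for the surviving component.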
Recent extensions include multiparameter persistence for data with several filtering functions [23], zigzag persistence for filtrations that grow and shrink over time, and Morse-theoretic approaches for structured signals such as corrosion-related ultrasonic waveforms [24]. From a purely algebraic perspective, Bjerkevik develops a stability theory for decompositions of multiparameter persistence modules that extends the one-parameter behavior described by the structure theorem [25]. Dey and Xin present an efficient algorithm for computing decompositions of finite multiparameter persistence modules [26]. Analytical work on random topology provides null models for interpreting topological signal strength in high dimensions [27].
2.4. Mapper and Related Constructions
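Reeb graphs are useful TDA tools and are often regarded as precursors of the widely used Mapper algorithm. They capture and help visualize the core structure of an image, object, or dataset. Let X be a topological space and let $f \colon X \to \mathbb{R}$ be a continuous function. Define an equivalence relation on X by
$$x \sim y \iff f(x) = f(y) \text{ and } x, y \text{ lie in the same connected component of } f^{-1}(f(x)).$$
The Reeb graph is the quotient space $R_f = X/{\sim}$ equipped with the induced function $\bar{f} \colon R_f \to \mathbb{R}$ defined by $\bar{f}([x]) = f(x)$, where $[x]$ is the equivalence class of x.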
Recent work uses Reeb graphs to study medical artifacts and quality criteria in cerebral vascular trees affected by noise and subtle topological changes. Lepaire et al. introduce a Local-to-Global Reeb Graph (LGRG) variant [28], while Rahman et al. introduce Gradient-Aware Shortest Path (GASP), an algorithm for producing Reeb graph visualizations that satisfy boundary constraints, compactness, and gradient alignment [29].
The Mapper algorithm can be viewed as an applied nerve construction built from the pullback cover of a topological space equipped with a filter function. Suppose that $f \colon X \to \mathbb{R}^d$ is continuous and that $\mathcal{U} = \{U_i\}_{i \in I}$ is a finite open cover of $f(X)$. The pullback cover of X is
$$f^{*}\mathcal{U} = \{ f^{-1}(U_i) \}_{i \in I}.$$
This cover is refined by splitting each set $f^{-1}(U_i)$ into its path components; denote the resulting cover by $\widehat{\mathcal{U}}$. The Mapper complex associated with $(X, f, \mathcal{U})$ is the nerve $N(\widehat{\mathcal{U}})$. Consequently, Mapper provides a graph-based representation of data by combining:
- 1.
a filter (or lens) function;
- 2.
a cover of the filter range;
- 3.
clustering within the preimage of each cover element.
The resulting graph captures connectivity between clusters and provides interpretable summaries of complex structures. Mapper has been especially successful in biomedical data analysis. For example, Loughrey et al. [9] introduce a hotspot detection framework for Mapper graphs that automates parameter selection and identifies clinically meaningful cancer subgroups. Kondapalli and Azarudeen [30] use Mapper to identify topological features of cancer cells that help distinguish healthy profiles from malignant ones. In addition, Bui et al. [31] use Mapper to visualize the evolution of neural network weights. A comprehensive review of Mapper and its applications is provided in [32].
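The three Mapper ingredients (filter, cover, clustering) can be combined in a short sketch. It assumes a one-dimensional filter and a cover of the filter range by equal-length intervals, each widened on both sides by a fraction `overlap` of its length. Clustering is single linkage at a fixed distance `delta`. These parameter names and defaults are hypothetical, and real Mapper implementations offer many other choices. Nodes are clusters, and edges join clusters that share a data point, so the output is the 1-skeleton of the nerve of the refined pullback cover.

```python
from itertools import combinations

import numpy as np


def single_linkage_clusters(idx, pts, delta):
    """Connected components of the graph joining points of idx at distance <= delta."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for a, b in combinations(idx, 2):
        if np.linalg.norm(pts[a] - pts[b]) <= delta:
            parent[find(a)] = find(b)
    clusters = {}
    for i in idx:
        clusters.setdefault(find(i), set()).add(int(i))
    return list(clusters.values())


def mapper_graph(pts, filter_values, n_intervals=4, overlap=0.25, delta=1.0):
    """1-D Mapper: nodes are clusters of interval preimages, edges join clusters sharing a point."""
    pts = np.asarray(pts, dtype=float)
    f = np.asarray(filter_values, dtype=float)
    lo, hi = f.min(), f.max()
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Interval k of the cover, widened by overlap * length on each side.
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        preimage = [int(i) for i in np.flatnonzero((f >= a) & (f <= b))]
        nodes.extend(single_linkage_clusters(preimage, pts, delta))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]]
    return nodes, edges


theta = 2 * np.pi * np.arange(40) / 40
circle = np.column_stack([np.cos(theta), np.sin(theta)])
nodes, edges = mapper_graph(circle, circle[:, 0], n_intervals=4, overlap=0.25, delta=0.5)
print(len(nodes), len(edges))  # 6 6: the nodes and edges form a single cycle
```

Applied to points sampled on a circle with the x-coordinate as filter, the two middle intervals each pull back to an upper and a lower arc. The resulting graph is therefore one cycle, which recovers the loop.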
Related constructions include:
Ball Mapper, useful for point clouds with heterogeneous density and applied in credit risk and supply chain finance [33].
Extensions such as V-Mapper for data with velocity fields [34] and Fuzzy-Mapper (F-Mapper), which handles cover intervals with an arbitrary overlap percentage [35].
Euler-characteristic-based descriptors that integrate combinatorial counts with persistent information [36]. For a k-dimensional simplicial complex K, let $c_n$ denote the number of n-simplices. The Euler characteristic is
$$\chi(K) = \sum_{n=0}^{k} (-1)^n c_n.$$
Equivalently, when homology is computed over a field, it can be written in terms of Betti numbers as
$$\chi(K) = \sum_{n=0}^{k} (-1)^n \beta_n$$
(see, for example, [37]).
These tools extend the reach of TDA to settings where filtrations alone may not capture the full shape of the data.
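Computing the Euler characteristic from simplex counts requires no homology computation, which is one reason Euler-characteristic descriptors are cheap. The minimal sketch below evaluates $\chi$ and the Euler characteristic curve $t \mapsto \chi(K_t)$. It assumes simplices are given as vertex tuples and, for the curve, as (simplex, filtration value) pairs. The helper names are illustrative.

```python
from itertools import combinations


def euler_characteristic(simplices):
    """chi(K) = sum_n (-1)^n c_n, where an n-simplex has n + 1 vertices."""
    return sum((-1) ** (len(s) - 1) for s in simplices)


def euler_curve(filtration, thresholds):
    """chi of the subcomplex {simplices with value <= t} for each threshold t."""
    return [
        euler_characteristic([s for s, value in filtration if value <= t])
        for t in thresholds
    ]


# Boundary of a tetrahedron: 4 vertices, 6 edges, 4 triangles (a 2-sphere).
sphere = [c for k in (1, 2, 3) for c in combinations(range(4), k)]
print(euler_characteristic(sphere))  # 2

# A triangle whose vertices, edges, and interior enter at values 0, 1, 2.
filt = [((0,), 0), ((1,), 0), ((2,), 0), ((0, 1), 1), ((0, 2), 1), ((1, 2), 1), ((0, 1, 2), 2)]
print(euler_curve(filt, [0, 1, 2]))  # [3, 0, 1]
```

For the triangulated 2-sphere, the count 4 - 6 + 4 = 2 matches $\beta_0 - \beta_1 + \beta_2 = 1 - 0 + 1$.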
3. Algorithmic and Computational Advances
Theoretical foundations are a major strength of TDA for understanding complex datasets through algebraic-topological principles. At the same time, TDA has developed into a practical, scalable, and noise-robust framework for data analysis. Algorithmic and computational advances are central to this applied perspective.
3.1. Matrix Reduction and Cohomological Algorithms
The standard algorithm for computing persistent homology reduces a boundary matrix via column operations. Although worst-case complexity is cubic in the number of simplices, practical performance is often much better, and many optimizations reduce both runtime and memory usage. Cohomology-based methods compute persistent cohomology, which yields the same barcode as persistent homology over a field, and are often faster in practice due to their dual formulation via cochain complexes. A concise review of persistent homology and persistent cohomology appears in [38]. Applications include the description of nongeometric information in molecular structures [39], while ref. [40] presents a new distributed algorithm for persistent cohomology.
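The standard column-reduction algorithm can be sketched in a few lines over $\mathbb{Z}/2$, where adding two columns amounts to taking the symmetric difference of their index sets. This minimal version assumes the input is a list of simplices (sorted vertex tuples) in a valid filtration order. It omits clearing, compression, and every other optimization, and the function name is hypothetical.

```python
def reduce_boundary(filtration):
    """Standard persistence algorithm over Z/2 (left-to-right column reduction).

    `filtration` lists simplices (sorted vertex tuples) so that every face comes
    before its cofaces. Returns {dim: [(birth_pos, death_pos or None), ...]}, where
    positions refer to the filtration order and None marks an essential class.
    """
    position = {s: i for i, s in enumerate(filtration)}
    # Column j: positions of the codimension-1 faces of simplex j (a set = Z/2 vector).
    columns = [
        {position[s[:i] + s[i + 1:]] for i in range(len(s))} if len(s) > 1 else set()
        for s in filtration
    ]
    owner = {}  # pivot row ("low") -> index of the reduced column that has it
    for j, col in enumerate(columns):
        while col and max(col) in owner:
            col ^= columns[owner[max(col)]]  # add an earlier column mod 2 (in place)
        if col:
            owner[max(col)] = j
    pairs = {}
    for i, s in enumerate(filtration):
        if columns[i]:
            continue  # nonzero reduced column: simplex i kills a class
        pairs.setdefault(len(s) - 1, []).append((i, owner.get(i)))
    return pairs


triangle = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]
print(reduce_boundary(triangle))
# {0: [(0, None), (1, 3), (2, 4)], 1: [(5, 6)]}
```

Pairs are reported as positions in the filtration order, and mapping those positions to filtration values gives the barcode. Consider a triangle whose edges appear before its interior. The output has one essential component, two finite 0-dimensional pairs, and one 1-dimensional class that is born with the last edge and killed by the triangle.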
Several strategies enhance large-scale computation:
Clearing and compression techniques accelerate matrix reduction by eliminating unnecessary operations.
Chunking partitions the boundary matrix to improve cache performance and parallelization.
Ordering heuristics select simplex orderings that reduce fill-in and improve algorithmic stability.
Beyond classical matrix reduction, gradient-based optimization for functions defined on persistence diagrams has advanced significantly. For instance, Leygonie et al. propose gradient sampling methods for stratifiable objective functions derived from extended persistent homology over lower-star filtrations [41]. These methods enable more direct integration of persistent homology into optimization pipelines.
Furthermore, theoretical work on the sensitivity and stability of persistence-based descriptors has enabled the development of differentially private TDA mechanisms with near-optimal accuracy [42]. Complementary approaches, such as Euler-characteristic-based transforms for compressed topological signals [36], demonstrate that topological information can be extracted efficiently in low-memory environments.
3.2. Data Reduction and Stability
In TDA, stability is fundamental because it ensures that small perturbations of the input data produce small changes in the resulting topological summaries. Thus, if a dataset is slightly affected by noise, its persistence diagram should remain close to the diagram of the original dataset. This principle motivates stability theorems for filtrations, tame functions, and Lipschitz functions. For example, Atienza et al. study conditions under which persistent homology and persistent entropy remain stable [43]. A representative statement is the following.
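Stability Theorem: Let K be a finite simplicial complex and let $f, g \colon K \to \mathbb{R}$ be monotone functions, that is, $f(\sigma) \le f(\tau)$ (and likewise for g) whenever σ is a face of τ. For each dimension p, the bottleneck distance between the persistence diagrams $\mathrm{Dgm}_p(f)$ and $\mathrm{Dgm}_p(g)$ is bounded above by the $L_\infty$-distance between the functions:
$$d_B\big(\mathrm{Dgm}_p(f), \mathrm{Dgm}_p(g)\big) \le \|f - g\|_\infty = \max_{\sigma \in K} |f(\sigma) - g(\sigma)|.$$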
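The brute-force sketch below makes the bottleneck distance in the theorem concrete for very small finite diagrams. It pads each diagram with diagonal slots and charges a point matched to the diagonal its $L_\infty$ distance to the diagonal, (death - birth)/2. It then minimizes the largest matching cost over all bijections. The exponential enumeration is for illustration only; practical implementations rely on bipartite matching algorithms instead. The function name is hypothetical.

```python
from itertools import permutations


def bottleneck_bruteforce(dgm_a, dgm_b):
    """Bottleneck distance between two tiny finite persistence diagrams.

    Diagrams are lists of (birth, death) pairs with finite deaths. Each side is
    padded with diagonal slots (None) so that any point may be matched to the
    diagonal at cost (death - birth) / 2. Exponential time: illustration only.
    """
    a = list(dgm_a) + [None] * len(dgm_b)
    b = list(dgm_b) + [None] * len(dgm_a)

    def cost(p, q):
        if p is None and q is None:
            return 0.0  # diagonal matched to diagonal
        if p is None or q is None:
            point = q if p is None else p
            return (point[1] - point[0]) / 2  # L-infinity distance to the diagonal
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    return min(
        max((cost(p, q) for p, q in zip(a, perm)), default=0.0)
        for perm in permutations(b)
    )


print(bottleneck_bruteforce([(0, 2)], [(0, 2.5)]))  # 0.5
```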
In addition, data reduction is essential for scaling TDA to modern datasets, which may contain millions of points or high-dimensional embeddings. Choi et al. introduce the Characteristic Lattice Algorithm (CLA), a principled reduction method that preserves both geometric and topological structure while providing stability bounds on the bottleneck distance between barcodes before and after reduction [44]. Such techniques significantly decrease computational burden without sacrificing essential topological information.
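The CLA itself is not reproduced here. As a generic illustration of topology-aware subsampling, the sketch below uses greedy farthest-point sampling, a different and widely used technique, and reports the covering radius of the subsample. This radius equals the Hausdorff distance between the subsample and the full cloud. Standard stability results for Vietoris–Rips filtrations therefore bound the change in the persistence diagrams by a constant multiple of it. The function name and the default starting point are assumptions.

```python
import numpy as np


def farthest_point_sample(points, m, seed_index=0):
    """Greedy farthest-point subsample of size m and its covering radius.

    Each new landmark is the point farthest from all landmarks chosen so far. The
    returned radius is the largest distance from any point to its nearest landmark,
    i.e. the Hausdorff distance between the subsample and the full cloud.
    """
    pts = np.asarray(points, dtype=float)
    chosen = [seed_index]
    nearest = np.linalg.norm(pts - pts[seed_index], axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(nearest))
        chosen.append(nxt)
        nearest = np.minimum(nearest, np.linalg.norm(pts - pts[nxt], axis=1))
    return np.array(chosen), float(nearest.max())


theta = 2 * np.pi * np.arange(100) / 100
circle = np.column_stack([np.cos(theta), np.sin(theta)])
landmarks, radius = farthest_point_sample(circle, 10)
print(len(landmarks), round(radius, 3))
```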
Data reduction also arises naturally in time-delay embeddings of time series. Sliding-window embeddings can produce large point clouds, making parameter selection (particularly the time delay) crucial. Myers et al. propose using persistent homology to select optimal delay parameters for permutation entropy calculations, replacing ad hoc choices with a topologically grounded criterion [45].
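The sliding-window construction itself is short. The sketch below, with a hypothetical helper name, maps a scalar series to the points $(x_t, x_{t+\tau}, \dots, x_{t+(m-1)\tau})$. The embedding dimension $m$ (`dim`) and the delay $\tau$ (`tau`) are exactly the parameters whose selection the work above addresses. The values used in the example are arbitrary.

```python
import numpy as np


def delay_embedding(x, dim, tau):
    """Sliding-window (time-delay) embedding of a scalar time series.

    Row t is (x[t], x[t + tau], ..., x[t + (dim - 1) * tau]), so the output has
    len(x) - (dim - 1) * tau rows, each a point in R^dim.
    """
    x = np.asarray(x, dtype=float)
    n_rows = len(x) - (dim - 1) * tau
    if n_rows <= 0:
        raise ValueError("series too short for this (dim, tau)")
    return np.column_stack([x[i * tau: i * tau + n_rows] for i in range(dim)])


t = np.arange(500)
x = np.sin(2 * np.pi * t / 50)           # period of 50 samples
emb = delay_embedding(x, dim=2, tau=12)  # tau close to a quarter period
print(emb.shape)  # (488, 2)
radii = np.linalg.norm(emb, axis=1)
print(radii.min(), radii.max())          # both close to 1: the points trace a loop
```

Because the embedded points of this periodic signal lie near a circle, a persistence diagram of the cloud would show one prominent 1-dimensional feature.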
3.3. Enhanced Complexes and Overlapping Measures
Recent work has focused on modifying simplicial complexes to better reflect underlying data geometry. Mashatola et al. [10] introduce topological overlapping measures that enrich Vietoris–Rips complexes by incorporating shared-neighborhood information. In high-dimensional genomic settings, such enhancements yield persistent features that are more discriminative and more stable under noise.
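The precise construction in [10] is not reproduced here. As an illustrative assumption, the sketch below computes the classical topological overlap matrix used in weighted gene co-expression network analysis. It scores two nodes by their direct connection plus their shared neighbors. A dissimilarity such as $1 - \mathrm{TOM}$ could then replace raw distances when building a Vietoris–Rips filtration. The helper name is hypothetical.

```python
import numpy as np


def topological_overlap(adj):
    """Classical topological overlap matrix of a weighted, undirected network.

    adj: symmetric array with entries in [0, 1] and zero diagonal. For i != j,
    TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), k_i = sum_u a_iu.
    """
    A = np.asarray(adj, dtype=float)
    k = A.sum(axis=1)                           # (weighted) node degrees
    shared = A @ A                              # sum_u a_iu a_uj: shared neighbours
    denom = np.minimum.outer(k, k) + 1.0 - A    # >= 1 when entries lie in [0, 1]
    tom = (shared + A) / denom
    np.fill_diagonal(tom, 1.0)
    return tom


path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
tom = topological_overlap(path)
print(tom[0, 2])                                # 0.5: endpoints share one neighbour
dissimilarity = 1.0 - tom                       # candidate input for a Rips filtration
```

In a three-node path, the two endpoints are not adjacent but share their only neighbor, so their overlap is 0.5 rather than 0.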
Other methodological developments include:
Expected-topology analysis, which studies the typical behavior of persistence diagrams for random point clouds and offers benchmarks for assessing significance [27].
Shape-aware complex constructions that embed additional geometric information, improving sensitivity in engineered systems and biological structures [46].
Cubical complexes for voxel and pixel data, which provide computational benefits for image-based TDA, as demonstrated in biomedical and material imaging contexts [22,47].
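For pixel data, 0-dimensional persistence of a sublevel-set filtration can be computed with the union-find idea used for clustering. This minimal sketch assumes 4-connectivity and a grayscale array. Each pixel enters at its intensity, and neighboring pixels become connected once both are present. When two components meet, the one born later dies (the elder rule). The helper name is hypothetical, and higher-dimensional classes would require a full cubical complex.

```python
import numpy as np


def sublevel_h0_image(img):
    """0-dimensional persistence of the sublevel-set filtration of a 2-D image.

    Each pixel enters at its intensity; two 4-neighbours become connected once both
    are present (i.e. at the larger intensity). Union-find with the elder rule: when
    two components meet, the younger one (later birth) dies at the current value.
    Returns sorted (birth, death) pairs, omitting zero-length bars.
    """
    img = np.asarray(img, dtype=float)
    n_cols = img.shape[1]
    parent, birth = {}, {}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    pairs = []
    for flat in np.argsort(img, axis=None, kind="stable"):
        r, c = divmod(int(flat), n_cols)
        value = float(img[r, c])
        parent[(r, c)], birth[(r, c)] = (r, c), value
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if q not in parent:
                continue  # neighbour outside the image or not yet in the filtration
            a, b = find((r, c)), find(q)
            if a == b:
                continue
            young, old = (a, b) if birth[a] > birth[b] else (b, a)
            if value > birth[young]:
                pairs.append((birth[young], value))
            parent[young] = old
    roots = {find(p) for p in parent}
    pairs.extend((birth[root], float("inf")) for root in roots)
    return sorted(pairs)


print(sublevel_h0_image([[0, 5, 1]]))  # [(0.0, inf), (1.0, 5.0)]
```

Consider an image with four dark corner pixels separated by a brighter cross, such as `[[0, 3, 0], [3, 3, 3], [0, 3, 0]]`. It yields three finite bars (0, 3) and one infinite bar.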
3.4. Statistical Limits and Inference
Despite its success, the statistical foundations of TDA are still under development. Vishwanath et al. demonstrate that persistent homology does not always provide sufficient statistics for inference, identifying conditions under which topological summaries are informative and models where they are fundamentally insufficient [48]. This work underscores the importance of understanding the theoretical limits of TDA-based inference.
Complementary contributions address inference and simulation, particularly in the analysis of dynamic dependence networks, such as those arising in neuroscience. El-Yaagoubi et al. introduce simulation frameworks for generating multivariate time series with known topological characteristics, enabling hypothesis testing and confidence-set construction for persistent homology applied to evolving networks [49,50].
3.5. Quantum Algorithms and Complexity
Quantum computation offers a potential pathway to accelerate persistent homology, particularly for datasets with large ambient dimensions. Berry et al. refine quantum TDA algorithms by improving Dicke-state preparation and projector-based eigenvalue estimation [51]. Their results show the possibility of superpolynomial speedups for structured instances with large Betti numbers.
However, complexity-theoretic work by Schmidhuber et al. reveals fundamental limitations: computing exact Betti numbers is #P-hard and even multiplicative approximation remains NP-hard in favorable regimes [52]. These results indicate that quantum speedups, while possible, are restricted to specific data regimes and cannot generally overcome worst-case combinatorial complexity. Hybrid quantum–classical TDA pipelines remain a promising but bounded open direction.
Algorithmic progress is what makes TDA usable on real datasets rather than only on toy examples. Improvements in reduction methods, cohomology-based computation, sparsification, and statistical tooling help reduce runtime and improve reliability. At the same time, complexity results show that some computational barriers are fundamental, even with quantum ideas. For practitioners, the key is to balance scalability and accuracy while documenting computational choices clearly.
4. A Unifying Pipeline Taxonomy for TDA-Based Systems
Across domains, most TDA systems follow a common computational pattern. To make the literature comparable, we organize TDA-based approaches into a five-stage pipeline, with optional feedback loops between stages. Figure 2 shows the TDA workflow of this five-stage procedure. The key differences between papers typically arise from (i) how the raw data are encoded prior to topology, (ii) how the filtration is designed, and (iii) how topological information is represented for learning and decision-making.
The first stage converts domain data into a geometric or combinatorial object on which topology can be computed. Common choices include point clouds (e.g., measurements or embeddings), graphs (e.g., interaction networks), and images or fields (e.g., grayscale images or scalar fields). For time series, state-space reconstructions such as time-delay embeddings or sliding-window point clouds are typical, while for networks, one may work directly with weighted graphs or simplicial complex constructions that encode higher-order interactions [53,54]. This stage fixes the ambient object and therefore determines which topological summaries are meaningful and computationally feasible.
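For multivariate time series such as asset returns, one common way to obtain a metric space for a Vietoris–Rips filtration is the correlation distance $d_{ij} = \sqrt{2(1 - \rho_{ij})}$. Here $\rho_{ij}$ is the Pearson correlation between variables $i$ and $j$. This choice is an assumption for illustration rather than the encoding used in the cited works, and the helper name is hypothetical.

```python
import numpy as np


def correlation_distance(series):
    """Correlation distance between the columns of a (T, N) multivariate series.

    d_ij = sqrt(2 * (1 - rho_ij)) with rho the Pearson correlation matrix.
    """
    rho = np.corrcoef(np.asarray(series, dtype=float), rowvar=False)
    dist = np.sqrt(np.clip(2.0 * (1.0 - rho), 0.0, None))  # clip rounding noise
    np.fill_diagonal(dist, 0.0)
    return dist


t = np.arange(200)
a = np.sin(t / 5.0)
returns = np.column_stack([a, 2 * a + 1, -a])   # synthetic columns for illustration
print(np.round(correlation_distance(returns), 3))
```

Perfectly correlated variables sit at distance 0 and perfectly anti-correlated ones at distance 2. Tightly co-moving groups therefore appear as clusters early in the filtration.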
A filtration specifies how structure is revealed across scales and is often the most consequential modeling choice. Metric filtrations (e.g., Vietoris–Rips) are natural for point clouds, whereas function-based filtrations are standard for images and scalar fields (e.g., sublevel or superlevel filtrations). Temporal and time-resolved constructions provide topology-aware views of evolving systems and are widely used in change detection and monitoring applications, including finance [55,56]. Filtration design implicitly defines the notion of “scale” or “importance” and can encode application-specific priors (density, intensity, correlation strength, velocity, etc.).
Given a filtration, one computes topological descriptors such as persistence barcodes and diagrams and derived invariants (Betti numbers across scale, representative cycles, etc.). This stage is mathematically uniform across application areas, but its computational burden varies substantially with the chosen complex and dataset size. For example, cubical complexes make image-based persistence tractable, whereas large Vietoris–Rips complexes may require sparsification or subsampling. This stage remains central for extracting meaningful topological evidence in applications. For instance, Wang et al. [57] propose a TDA-based framework for extracting structural features of materials, providing informative insights into structure–property relationships and predictive strategies.
Persistence diagrams are not directly compatible with most learning algorithms; hence, vectorization is required. Representative families include functional summaries such as persistence landscapes, information-theoretic summaries such as persistent entropy [43], and fixed-length embeddings such as persistence codebooks [58]. More algebraic constructions map persistence outputs to paths and apply signature features to obtain expressive, stable representations [59]. On the algorithmic side, improved diagram-comparison routines (e.g., efficient bottleneck-distance computation in special cases) support scalability when topology is used via distances [60]. Surveys and recent work further discuss how these representations can be combined with deep architectures and hybrid pipelines [61].
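Two of the representations named above, persistence landscapes and persistent entropy, are simple enough to sketch directly. The sketch assumes a diagram is a list of finite (birth, death) pairs with positive lifetimes. The landscape function $\lambda_k(t)$ is the $k$-th largest value of the tent functions $\max(0, \min(t - b, d - t))$. Persistent entropy is the Shannon entropy of the normalized bar lengths. The sampling grid and the number of landscape levels are arbitrary choices in this sketch.

```python
import numpy as np


def persistence_landscape(diagram, ts, k_max=3):
    """Landscape functions lambda_1..lambda_kmax sampled on the grid ts.

    Each finite bar (b, d) contributes the tent max(0, min(t - b, d - t));
    lambda_k(t) is the k-th largest tent value at t (0 if there are fewer bars).
    """
    ts = np.asarray(ts, dtype=float)
    out = np.zeros((k_max, len(ts)))
    if len(diagram) == 0:
        return out
    tents = np.array([np.maximum(0.0, np.minimum(ts - b, d - ts)) for b, d in diagram])
    tents = -np.sort(-tents, axis=0)  # sort each grid column in decreasing order
    k = min(k_max, len(tents))
    out[:k] = tents[:k]
    return out


def persistent_entropy(diagram):
    """Shannon entropy of normalised bar lengths (all lengths assumed positive)."""
    lengths = np.array([d - b for b, d in diagram], dtype=float)
    p = lengths / lengths.sum()
    return float(-(p * np.log(p)).sum())


dgm = [(0, 4), (1, 3)]
print(persistence_landscape(dgm, [0, 1, 2, 3, 4])[:2])  # rows [0,1,2,1,0] and [0,0,1,0,0]
print(persistent_entropy([(0, 1), (0, 1)]))  # log(2), about 0.693
```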
Finally, topological representations are consumed by a downstream task: classification, clustering, regression, change-point detection, optimization, or monitoring. In finance and other dynamical settings, topology is frequently used for regime and change detection [6,55,56]. In biomedical and precision-medicine contexts, topology contributes robust, interpretable features for patient stratification and longitudinal analysis [62,63,64]. In engineering, topology can also constrain decision-making by enforcing validity domains during optimization [65] or by supporting monitoring of complex trajectories [66]. Applications of TDA in smart manufacturing and industrial production are identified in [67].
Beyond the five stages, we find three cross-cutting axes that help classify the literature. First, the role of topology ranges from a descriptive summary (exploration and visualization) to a feature extractor for predictive models, a distance metric for comparing samples or trajectories, or an explicit constraint for optimization. Second, the coupling strength between topology and learning ranges from post hoc analysis (TDA computed independently) to hybrid pipelines (TDA features combined with standard machine learning) and, increasingly, to learned or end-to-end representations [61]. Third, temporal awareness distinguishes static pipelines from sliding-window approaches and explicitly time-indexed filtrations, which are central in monitoring and change-point problems [55,56].
As summarized in Table 6, differences between approaches primarily arise at the representation and filtration stages rather than in the topological computation itself. Accordingly, the table focuses on Stages I, II, IV, and V; Stage III, the computation of topological summaries, is largely uniform across applications and is therefore omitted.
Method-Selection Guide (Data Type to Complex or Filtration to Topological Summary to Machine Learning Model)
To make the pipeline actionable for practitioners, Table 7 provides a compact starting guide that maps common data modalities to typical topological design choices and downstream learning models. The guide is not prescriptive; it is intended as a first-pass design template to be refined with domain-specific validation and sensitivity analysis.
How to use the guide. In practice, the following decision rules can be applied:
- (i)
Select the simplest complex or filtration that matches the data geometry.
- (ii)
Start with stable summaries, such as landscapes, Betti curves, or persistence images, before using higher-capacity representations.
- (iii)
Benchmark topology-augmented models against strong non-topological baselines.
- (iv)
Report sensitivity analyses for filtration and embedding hyperparameters.
Common failure modes remain data-dependent: embedding-window sensitivity in time series, preprocessing dependence in imaging, combinatorial cost for large point clouds, graph-construction noise in network studies, cohort shift in clinical data, and drift in streaming environments.
The pipeline taxonomy turns the broad literature into a concrete checklist: encode data, build a filtration, compute topology, represent features, and perform learning or inference. This perspective clarifies where conclusions can change because of preprocessing, metric choice, or hyperparameters. It also explains why many successful papers report gains from thoughtful stage-by-stage integration rather than from a single algorithmic trick. Reproducible TDA therefore depends on transparent reporting of all pipeline stages, not only final accuracy scores.
5. TDA, ML, and DL
AI, including ML, remains one of the most intensively studied areas in data science. In parallel, TDA is increasingly integrated with ML and DL, including multilayer neural architectures for tasks such as classification and regression.
5.1. Vectorizations of Persistence and Topological Feature Engineering
Persistent homology produces barcodes and persistence diagrams that typically need to be translated into fixed-dimensional vectors before they can be used in machine learning. A wide range of vectorizations has been developed for persistence diagrams, such as persistence landscapes, persistence images, Betti curves, silhouette functions, kernel embeddings, and algebraic summary statistics. Although the relative merits of these representations in machine learning tasks remain debated, existing studies have found that simple barcode statistics perform competitively.
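Two of the representations named above can be sketched in a few lines. The example is a minimal illustration rather than any library's implementation; it assumes a diagram given as a list of finite (birth, death) pairs, samples the k-th persistence landscape λ_k on a grid (the k-th largest tent value max(0, min(t - b, d - t)) over all bars), and computes a handful of barcode statistics. The helper names are hypothetical.

```python
import statistics

# Illustrative sketch; assumes finite (birth, death) pairs, names hypothetical.

def tent(bar, t):
    """Tent function of one bar: rises from birth, peaks at the midpoint."""
    b, d = bar
    return max(0.0, min(t - b, d - t))

def landscape(diagram, k, grid):
    """k-th persistence landscape lambda_k (k >= 1) sampled on a grid."""
    values = []
    for t in grid:
        tents = sorted((tent(bar, t) for bar in diagram), reverse=True)
        values.append(tents[k - 1] if k <= len(tents) else 0.0)
    return values

def barcode_statistics(diagram):
    """A few simple barcode statistics built from bar lengths."""
    lifetimes = [d - b for b, d in diagram]
    return {
        "count": len(lifetimes),
        "total": sum(lifetimes),
        "max": max(lifetimes),
        "mean": statistics.mean(lifetimes),
        "std": statistics.pstdev(lifetimes),
    }
```

For the nested bars [(0, 4), (1, 3)], λ_1 on the grid [1, 2, 3] is [1, 2, 1] and λ_2 is [0, 1, 0], so the second landscape isolates the shorter, nested feature.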
Recent work demonstrates the strong performance of TDA-derived features in noisy and nonlinear classification tasks. For example, TDA features extracted from voltage–current trajectories improve appliance identification in non-intrusive load monitoring over Principal Component Analysis (PCA)-based approaches, especially in high-noise conditions [73]. Similar effects are observed in chemical sensing: TDA-based features substantially improve accuracy in low-cost electronic nose systems, as demonstrated in [74]. In customer analytics, barcode statistics and persistence images have led to enhanced churn prediction without extensive hyperparameter tuning [75]. In materials science, persistence images derived from roughness surfaces enable quantitative comparison of micro-crack patterns in bonded assemblies [11].
Other developments include Euler-characteristic-based transforms, efficient compressed-vector schemes, and differentially private embeddings of persistence diagrams [36, 42]. These innovations expand the range of ML-compatible topological descriptors and support applications that require both privacy and interpretability.
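Euler-characteristic-based transforms rest on a simple quantity: the Euler characteristic of each sublevel set of a filtration, χ(t) = Σ (-1)^dim(σ) over the simplices σ present at scale t. A minimal sketch follows, assuming the filtered complex is given as a list of (simplex, value) pairs. The hypothetical helper height_filtration assigns values by a lower-star height in one direction; repeating it over many directions would give an Euler-characteristic-transform-style profile, although the cited schemes may be built differently.

```python
# Illustrative sketch; the (simplex, value) input format is an assumption.

def euler_characteristic_curve(filtered_complex, grid):
    """chi(t) = sum of (-1)^dim over simplices whose filtration value is <= t.

    filtered_complex: list of (simplex, value); a simplex is a tuple of
    vertex ids, so its dimension is len(simplex) - 1.
    """
    return [sum((-1) ** (len(s) - 1) for s, v in filtered_complex if v <= t)
            for t in grid]

def height_filtration(coords, simplices, direction):
    """Lower-star filtration: each simplex gets the largest height
    <x, direction> among its vertices."""
    height = {v: sum(a * b for a, b in zip(x, direction)) for v, x in coords.items()}
    return [(s, max(height[v] for v in s)) for s in simplices]

# A filled triangle: vertices at 0, edges at 1, the 2-simplex at 2.
triangle = [((0,), 0), ((1,), 0), ((2,), 0),
            ((0, 1), 1), ((1, 2), 1), ((0, 2), 1), ((0, 1, 2), 2)]
print(euler_characteristic_curve(triangle, [0, 1, 2]))  # [3, 0, 1]
```

The curve reads as three separate points (χ = 3), then a hollow loop (χ = 0), then a filled disk (χ = 1).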
A central challenge in topological data analysis is the transformation of persistence diagrams into representations suitable for statistical learning and downstream inference. Several theoretically grounded vectorization strategies have been proposed. Persistence codebooks provide a dictionary-based embedding of diagrams that enables efficient comparison and learning while preserving discriminative topological information [58]. Signature-based approaches interpret persistence diagrams as paths and apply tools from rough-path theory, yielding stable and expressive representations with strong theoretical guarantees [59].
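The dictionary-based idea behind persistence codebooks can be illustrated as a persistence-weighted bag-of-words. In this sketch the codebook is simply given; in practice it would be learned, for example by k-means clustering of points pooled from training diagrams. The (birth, persistence) coordinates, the weighting, and the function name are illustrative assumptions, not the exact construction of [58].

```python
import math

# Illustrative sketch: the codebook is given here; in practice it would be
# learned (e.g. by k-means on training diagrams). Names are hypothetical.

def codebook_embedding(diagram, codebook):
    """Persistence-weighted bag-of-words embedding of a persistence diagram.

    Each (birth, death) pair is mapped to (birth, persistence), assigned to its
    nearest codeword, and contributes its persistence to that codeword's bin.
    The histogram is normalized to sum to 1 when it is non-empty.
    """
    hist = [0.0] * len(codebook)
    for birth, death in diagram:
        point = (birth, death - birth)
        nearest = min(range(len(codebook)), key=lambda k: math.dist(point, codebook[k]))
        hist[nearest] += death - birth
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

Because every diagram maps to a vector of length len(codebook), diagrams with different numbers of points become directly comparable inputs for a standard classifier.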
Information-theoretic summaries have also been studied extensively. In particular, the stability of persistent entropy with respect to perturbations of the input data has been formally established, supporting its use as a robust scalar descriptor in noisy settings [43]. From a computational perspective, advances in efficient bottleneck-distance computation have significantly reduced the cost of diagram comparison, making TDA more scalable in practice [60]. Finally, comprehensive surveys have reviewed the integration of persistence-based representations with deep learning architectures, highlighting both theoretical foundations and practical design patterns [61].
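Both quantities mentioned above have short definitions. Persistent entropy normalizes bar lengths ℓ_i into probabilities p_i = ℓ_i / Σ_j ℓ_j and returns -Σ p_i log p_i. The bottleneck distance is the smallest achievable worst-case L∞ displacement over matchings in which points may also be sent to the diagonal. The sketch below computes it by brute-force enumeration, which is usable only for a handful of points; the efficient algorithms cited above exist precisely because this enumeration grows factorially. Function names are hypothetical.

```python
import math
from itertools import permutations

# Illustrative sketch; names are hypothetical and the bottleneck routine is a
# brute-force reference only (factorial cost), not an efficient algorithm.

def persistent_entropy(diagram):
    """Shannon entropy of normalized bar lengths of finite (birth, death) pairs."""
    lifetimes = [d - b for b, d in diagram if d > b]
    total = sum(lifetimes)
    return -sum((l / total) * math.log(l / total) for l in lifetimes)

def bottleneck_bruteforce(d1, d2):
    """Exact bottleneck distance between two tiny diagrams."""
    # Give each diagram one diagonal slot (None) per point of the other one.
    a = list(d1) + [None] * len(d2)
    b = list(d2) + [None] * len(d1)

    def cost(p, q):
        if p is None and q is None:
            return 0.0  # diagonal matched to diagonal is free
        if p is None or q is None:
            x = p if q is None else q
            return (x[1] - x[0]) / 2  # L-infinity distance to the diagonal
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    return min(max((cost(p, q) for p, q in zip(a, perm)), default=0.0)
               for perm in permutations(b))
```

For instance, bottleneck_bruteforce([(0, 2)], [(0, 2.5)]) is 0.5 because the two points are matched to each other, whereas comparing [(0, 2)] with an empty diagram gives 1.0, half the bar's length.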
Critical limitations: Persistence vectorization studies can conflate topological information with choices made later in the ML pipeline. Landscapes, images, kernels, entropy summaries, and codebooks require bandwidths, grids, weights, thresholds, or dictionary sizes, and these choices may either suppress localized features or create high-dimensional descriptors that overfit small benchmarks. Reported gains are therefore most convincing when accompanied by ablations over vectorization parameters, comparisons with non-topological feature sets, calibration checks, and external or out-of-distribution validation.
5.2. Topology-Aware Deep Learning and Interpretability
Deep neural networks often lack interpretability, and TDA has emerged as a powerful lens for probing their internal structure. Zhang et al. provide a unifying survey of TDA for DL explainability, highlighting applications ranging from analyzing data manifolds through persistent homology to visualizing decision boundaries via Mapper-based constructions [15]. At the activation level, persistent homology applied to correlation graphs of neuronal activations can predict the generalization gap of deep networks without requiring a test set [19]. These results suggest that network complexity and overfitting tendencies are reflected in persistent topological patterns.
Beyond interpretability, TDA has been incorporated directly into neural architectures. Examples include:
Neural models augmented with topological features extracted from spatial cell layouts, achieving performance comparable to convolutional neural network (CNN) baselines in stem cell classification [22].
TDA-SegUNet, a topological variant of the U-Net convolutional encoder–decoder segmentation architecture, which integrates persistence images as multiscale descriptors for brain tumor segmentation and achieves state-of-the-art results on brain tumor segmentation (BraTS) datasets [76].
Hybrid geometric DL methods for protein pocket classification that combine global topological invariants with local graph neural network (GNN)-based representations [77].
These avenues reflect a broader trend: topology is increasingly viewed not just as an interpretability tool but as an architectural bias that enhances model robustness and representation quality.
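Persistence images are a common way to hand topology to a network as an extra input channel, as in TDA-SegUNet. The framework-free sketch below assumes a linear persistence weight, an isotropic Gaussian of width sigma, and sampling at pixel centres instead of integrating over pixels; all of these, and the function name, are illustrative choices rather than the settings of [76].

```python
import math

# Illustrative sketch: the weighting, kernel width, grid and ranges are
# assumptions, not the settings used by TDA-SegUNet [76].
def persistence_image(diagram, resolution=5, sigma=0.1,
                      birth_range=(0.0, 1.0), pers_range=(0.0, 1.0)):
    """Sample a persistence image at pixel centres.

    Each (birth, death) pair becomes a Gaussian bump centred at
    (birth, death - birth), weighted linearly by its persistence, so
    points near the diagonal contribute little. Rows index persistence,
    columns index birth.
    """
    (b0, b1), (p0, p1) = birth_range, pers_range
    db, dp = (b1 - b0) / resolution, (p1 - p0) / resolution
    norm = 1.0 / (2 * math.pi * sigma ** 2)
    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, death in diagram:
        pers = death - birth
        for row in range(resolution):
            y = p0 + (row + 0.5) * dp
            for col in range(resolution):
                x = b0 + (col + 0.5) * db
                bump = math.exp(-((x - birth) ** 2 + (y - pers) ** 2) / (2 * sigma ** 2))
                image[row][col] += pers * norm * bump
    return image
```

The fixed resolution × resolution grid is what lets the image be stacked with other image channels, and the linear weight keeps noisy, short-lived features from dominating.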
Critical limitations: Topology-aware DL results are often sensitive to architecture, preprocessing, loss-weight selection, and dataset-specific annotation quality. Topological losses or persistence-image channels may add substantial computational cost, and apparent interpretability can remain qualitative unless it is tested against saliency, shape, or uncertainty baselines. Claims of state-of-the-art performance should therefore report runtime, memory, ablations without the topological component, and validation on independent cohorts or tasks.
5.3. Graph Neural Networks and Topology-Driven Representation Learning
Graphs play a central role in many domains, yet graph neural networks (GNNs) often struggle to capture higher-order or multiscale topological structures. TDA offers complementary descriptors that can augment or constrain GNN message passing. In a separate line of work, Gavris et al. study a machine learning approach that uses message-passing GNNs to reduce non-physical transition zones in topology-optimization problems [78].
Pham introduces a fuzzy neural network coupled with topological graph learning for molecular property prediction, integrating uncertainty-aware modules with persistent homology-based descriptors [16]. Extending this idea, Pham et al. survey the rapidly growing literature on TDA-enhanced GNNs, including topological regularization, persistence-informed pooling, and homology-based graph kernels [17]. These methods often outperform classical GNN baselines, particularly on tasks that involve long-range dependencies or non-Euclidean structures.
TDA has also enriched learning paradigms beyond supervised settings. In federated and incremental learning, Gong et al. develop a TDA-based stability loss that mitigates catastrophic forgetting by preserving a global topological structure across client updates [18]. In multimodal recommendations, Bachiri et al. use persistent homology to construct robust cross-modality graph representations with improved ranking metrics and superior cold-start performance [79].
Beyond node- and edge-level descriptors, TDA provides tools for capturing higher-order connectivity patterns in complex networks. Simplicial and cell-complex representations enable the definition of topological centrality measures that generalize classical graph metrics to higher dimensions. Such constructions have been shown to reveal mesoscopic organization and functional roles in real-world networks, complementing standard GNN pipelines [53, 54].
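A very simple instance of such a higher-order centrality counts, for each node, how many 2-simplices (filled triangles) of the graph's clique complex contain it, which generalizes node degree by one dimension. The sketch below is only an illustration of that idea under this assumption; the measures in [53, 54] are more general, and the function name is hypothetical.

```python
from itertools import combinations

# Illustrative sketch of a higher-order degree; name and measure are assumptions.
def triangle_degree(edges):
    """Number of 2-simplices of the clique complex (triangles) containing
    each vertex of an undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degree = {v: 0 for v in adj}
    for u in adj:
        # Two neighbours of u that are adjacent close a triangle through u.
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                degree[u] += 1
    return degree
```

In a square with one diagonal, the two diagonal endpoints each lie in two triangles while the other two corners lie in one, a distinction that ordinary degree would also miss in other configurations where degrees coincide.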
Together, these contributions illustrate how topological insights can guide the design of graph-based learning systems, leading to models that are both more expressive and more stable.
In ML and DL, TDA is most effective as a source of structured features or as a regularization signal. It can improve robustness and interpretability, especially in noisy, nonlinear, or data-scarce settings. However, gains are not automatic and depend on representation design and model–task alignment. A practical strategy is to compare topology-augmented models against strong baselines and report ablation analyses.
Critical limitations: TDA-enhanced GNN studies depend strongly on graph construction, edge weighting, filtration design, homology dimension, and pooling strategy. Improvements can reflect richer preprocessing or larger feature budgets rather than genuinely topological information. For large, dynamic, or heterogeneous graphs, scalability and stability under missing edges, noisy labels, and distribution shift remain only partially tested. Strong evidence requires comparisons with modern GNN baselines, parameter-sensitivity analysis, and task-level ablations that isolate the topological contribution.
Before moving to domain-specific sections, Figure 3 summarizes the application landscape covered in this review and highlights where topological summaries most often interact with modern learning pipelines.
6. Time Series and Dynamical Systems
Time series and dynamical systems are two closely interacting fields in which TDA is widely applied, from financial markets to time series clustering pipelines. In this section, we review recent TDA applications in these areas.
6.1. Change-Point Detection and Financial Dynamics
Time series can be embedded into high-dimensional spaces using sliding windows or Takens’ delay embedding, after which persistent homology can capture geometric signatures of dynamics. Yao et al. apply this approach to change-point detection (CPD) in financial markets, defining TDA-based volatility indicators that align with known extreme events such as the European debt crisis, Brexit, the COVID-19 pandemic, and the Russia–Ukraine energy crisis [6]. Their TDA indicators outperform classical univariate and multivariate CPD methods in F1 score across different tolerance windows.
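A delay embedding is the step that turns a scalar series into a point cloud. A minimal sketch with illustrative parameter names dim (embedding dimension) and tau (delay in samples) is shown below. A pure sine wave embedded with a quarter-period delay traces a circle, which is the 1-dimensional loop that persistent homology would then report as a long H1 bar.

```python
import math

# Illustrative sketch; parameter and function names are hypothetical.
def delay_embedding(series, dim, tau):
    """Map x_t to (x_t, x_{t+tau}, ..., x_{t+(dim-1)tau}) for every valid t."""
    n = len(series) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this (dim, tau)")
    return [tuple(series[t + k * tau] for k in range(dim)) for t in range(n)]

# With a quarter-period delay, (sin x, sin(x + pi/2)) = (sin x, cos x) lies on
# the unit circle.
period = 20
signal = [math.sin(2 * math.pi * t / period) for t in range(3 * period)]
cloud = delay_embedding(signal, dim=2, tau=period // 4)
```

Sliding-window variants apply the same map inside each rolling window, so that a sequence of diagrams, and hence of indicators, is obtained over time.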
Nie studies nonlinear serial dependence in stock returns using topological cross-correlation measures and rolling-window TDA, revealing how geopolitical and policy events (e.g., Russia–Ukraine conflict, trade policies) impact the dependence structure of equity returns [5]. De Jesus et al. enhance univariate time series forecasting for financial instruments by integrating TDA-derived features (entropy, amplitude, counts from persistence diagrams) into N-BEATS models, achieving consistent improvements across cryptocurrencies and traditional assets [7]. Kulkarni et al. use persistent homology alongside geometry-inspired network measures (such as Ricci curvature) to assess fragility and systemic risk in Indian stock markets, finding persistent entropy to be a robust and informative topological measure [8]. Related work exploits topological properties of log-periodic power law singularity (LPPLS)-type trajectories to explain why TDA is particularly effective at detecting financial bubbles and early-warning signals [80] and extends persistence-based analyses to Chinese equity markets under major public events [81]. Md. Morshed Bin Shiraj et al. study the integration of the Mapper algorithm with DBSCAN clustering to detect anomalies in financial time series, showing that TDA provides a robust anomaly-detection framework, particularly in high-dimensional settings where classical clustering methods may fail to capture the global structure [82].
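The Mapper construction referred to above can be sketched for a one-dimensional lens: cover the lens range with overlapping intervals, cluster each preimage, and connect clusters that share points. In this minimal sketch an ε-single-linkage step stands in for DBSCAN, and only the 1-skeleton of the nerve is built; it is an illustration under those assumptions, not the pipeline of [82], and all names are hypothetical.

```python
import math

# Illustrative sketch: eps-single-linkage replaces DBSCAN; names hypothetical.

def single_linkage_clusters(indices, points, eps):
    """Connected components of the graph linking points at distance <= eps."""
    clusters, unseen = [], set(indices)
    while unseen:
        frontier = [unseen.pop()]
        component = set(frontier)
        while frontier:
            i = frontier.pop()
            near = {j for j in unseen if math.dist(points[i], points[j]) <= eps}
            unseen -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters

def mapper(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """1-D Mapper graph: nodes are clusters of lens preimages, edges join
    clusters that share at least one point (1-skeleton of the nerve)."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Interval k, widened on both sides by a fraction `overlap` of its length.
        a = lo + (k - overlap) * length
        b = lo + (k + 1 + overlap) * length
        members = [i for i, v in enumerate(values) if a <= v <= b]
        for cluster in single_linkage_clusters(members, points, eps):
            if cluster not in nodes:  # skip identical clusters from two intervals
                nodes.append(cluster)
    edges = {(u, v) for u in range(len(nodes)) for v in range(u + 1, len(nodes))
             if nodes[u] & nodes[v]}
    return nodes, edges
```

In an anomaly-detection setting, small or isolated nodes of the resulting graph are natural candidates for inspection; the lens, number of intervals, overlap, and clustering radius all change the graph and should be varied in a sensitivity analysis.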
TDA has been increasingly applied to financial time series to capture structural changes that are difficult to detect using purely linear models. Persistent homology-based indicators have been proposed as early-warning signals for market crashes and regime shifts, demonstrating sensitivity to precursory changes in correlation structure and volatility patterns [55, 56].
In related work, topological summaries have been incorporated into time series clustering and classification pipelines, where they improve discrimination between market states and asset behaviors [83, 84]. TDA has also been explored in portfolio construction and enhanced indexing strategies, where topological features provide complementary information to classical risk–return metrics [85, 86].
Beyond finance, Adami et al. study avalanche-size sequences in sandpile models via visibility graphs and persistent homology, uncovering scale-free behavior and power-law distributions for simplices and Betti numbers, with potential applications to earthquakes and other self-organized critical systems [87]. More generally, engineered topological features have been shown to outperform standard statistical and wavelet-based descriptors in classifying stochastic processes and time series with different noise properties [88, 89].
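The natural visibility graph used in such studies links two samples whenever the straight line between them passes strictly above every intermediate sample. A direct O(n^3) sketch is given below (function name hypothetical); the resulting graph, or its clique complex, is what persistent homology would subsequently be applied to.

```python
# Illustrative sketch of the natural visibility criterion; name hypothetical.
def natural_visibility_graph(series):
    """Edges (a, b), a < b, such that every sample strictly between them lies
    below the straight line joining (a, y_a) and (b, y_b)."""
    n = len(series)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            ya, yb = series[a], series[b]
            # Height of the connecting line at position c is
            # yb + (ya - yb) * (b - c) / (b - a).
            if all(series[c] < yb + (ya - yb) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                edges.add((a, b))
    return edges
```

Consecutive samples are always connected, and a tall intermediate peak blocks the view between its neighbours, so large events become hubs of the graph.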
Critical limitations: Consistent with the cautionary patterns summarized in Table 8, financial change-point and time-series studies are often retrospective and may tune embedding windows, distance choices, or persistence thresholds around historically known events. Several comparisons are made against classical CPD or forecasting baselines, but the evidence is less conclusive when stronger multivariate econometric models, modern sequence learners, transaction-cost-aware portfolio tests, or cross-market external validation are required. Reported improvements should therefore be interpreted as evidence that topology provides a useful structural signal, not as proof that TDA alone is a universally superior financial predictor.
6.2. Real-Time and Nonstationary Systems
Real-time state estimation in nonstationary mechanical systems poses difficult challenges. Razmarashooli et al. use TDA features derived from sliding-window embeddings to estimate moving boundary conditions in a testbed system, showing that maximum persistence in low-dimensional homology groups provides stable state estimates and can outperform the short-time Fourier transform in rapidly changing regimes [92]. Razmarashooli’s work also demonstrates the utility of TDA features for detecting impact-induced noise through higher-dimensional homology.
In building entropy-based measures of signal complexity, Myers et al. integrate TDA into permutation entropy frameworks to automatically select delay parameters, yielding parameter choices that align with expert recommendations and optimized settings across diverse dynamical systems [45]. Other pipelines exploit zigzag persistence and persistence-diagram-based change-point detection, such as PERsistence-diagram-based ChangE-PoinT detection (PERCEPT), an online change-point detection framework for monitoring high-dimensional streams [93], and multilayer zigzag architectures for spatio-temporal meteorological forecasting [71].
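For reference, the permutation entropy whose delay Myers et al. select can be computed as the Shannon entropy of ordinal patterns. The sketch below takes the delay as a given argument and does not implement their TDA-based selection procedure; the normalization by log(order!) is one common convention, and the function name is hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch; the delay is supplied by the caller, not selected here.
def permutation_entropy(series, order=3, delay=1, normalize=True):
    """Shannon entropy of the ordinal patterns of `order` samples spaced
    `delay` apart; normalized to [0, 1] by log(order!) if requested."""
    n = len(series) - (order - 1) * delay
    patterns = Counter(
        # The pattern is the argsort of the window: which sample is smallest, etc.
        tuple(sorted(range(order), key=lambda k: series[t + k * delay]))
        for t in range(n)
    )
    h = -sum((c / n) * math.log(c / n) for c in patterns.values())
    return h / math.log(math.factorial(order)) if normalize else h
```

A monotone series yields a single pattern and entropy 0, whereas an alternating series of length 6 with order 2 yields two patterns in a 3:2 ratio and a normalized entropy of about 0.971; the value shifts with the delay, which is why a principled delay choice matters.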
Critical limitations: For real-time and nonstationary systems, the main weakness is that many evaluations rely on controlled testbeds, simulated regimes, or a small number of benchmark systems, so robustness under sensor drift, missing data, changing sampling rates, and strict latency constraints remains only partially established. Comparisons with short-time Fourier, entropy-based, and GNN baselines are informative, but they do not always include full ablations over embedding dimension, window length, homology degree, and vectorization. As in Table 8, the method is most credible when online computational cost and sensitivity analyses are reported explicitly.
6.3. Trajectory Analysis and Monitoring
Trajectory data (e.g., from hurricanes, animals, or vehicles) naturally encode spatio-temporal behavior. TDA offers a way to characterize the shape of trajectories rather than just pointwise positions. Esteve and Falco demonstrate that TDA-based representations can significantly improve trajectory classification accuracy, especially in hurricane trajectories and simulated scenarios [94]. In related work, they introduce tramoTDA, a Python library for TDA-based trajectory monitoring that provides user-friendly tools for visual and topological analysis of trajectories [95].
Topological methods have also been applied to Lagrangian orbits in convection flows, where persistent homology of alpha complexes reveals toroidal structures and transitions in flow regimes [12], and to orbit structures of more general flows on spherical surfaces via discrete representations and graph encodings that support TDA-driven classification [13].
Critical limitations: Trajectory studies demonstrate intuitive shape-based advantages, but their conclusions can depend strongly on resampling, normalization, distance metrics, and the availability of representative trajectory classes. Some results are obtained on simulated or domain-specific datasets, and comparisons with dynamic time warping, hidden Markov models, recurrent neural networks, or transformer-based sequence models are not always equally strong. External validation on different sensors, geographic regions, or flow regimes is needed before these topological descriptors can be regarded as generally transferable.
For time series and dynamical systems, TDA helps identify regime changes, transitions, and recurrent structure that can be difficult to capture with standard statistics alone. Embedding and windowing choices are central because they define the geometry on which topology is computed. When tuned carefully, topological summaries support forecasting, monitoring, and anomaly detection across finance, sensing, and physical systems. In practice, TDA works best as a complement to classical signal-processing and statistical methods.
7. Biomedical, Biological, and Neuroscientific Applications
The connection between mathematics and the life sciences is evident in many applications, including algebraic and vector-based representations of biological data that helped shape modern bioinformatics. This motivates the study of TDA in biological and medical settings, as reviewed in this section.
7.1. Precision Medicine and Gene Expression
TDA has become an important tool in biomedical data analysis, especially where heterogeneous and high-dimensional data are involved. Loughrey et al. develop a method for subgroup discovery in precision medicine using Mapper, with a focus on breast cancer data [9]. Their hotspot detection algorithm identifies homogeneous and geometrically compact subsets of patients with distinct clinical or molecular profiles and incorporates hotspot existence into Mapper parameter selection. The method reveals subgroups of estrogen receptor-positive patients with poor prognosis and specific expression signatures, validated on an independent dataset.
Mashatola et al. apply enhanced Vietoris–Rips complexes with topological overlapping measures to cancer gene expression data, showing that the resulting persistent features improve cancer phenotype prediction by up to 20% across multiple cancer types [10]. Narender et al. combine TDA, graph convolutional networks, and support vector machines (SVMs) to predict phenotypes from genomic expression data, achieving improvements over traditional approaches and highlighting TDA’s role in high-dimensional feature extraction and network-based modeling [96]. Further applications include TDA-guided drug repurposing to tackle antibiotic resistance [97] and the analysis of antibody dynamics to stratify COVID-19 severity levels [98]. In [99], TDA delineates known breast cancer subtypes and identifies a new subtype within luminal B, together with its defining features. In [100], a TDA-radiomics investigation of ultrasound data suggests that a quantitative ultrasound risk-stratification score (US RSS) may improve the preoperative prediction of follicular carcinoma.
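Topological overlap measures of the kind used by Mashatola et al. score two genes as similar when they share many network neighbours, not only when they are directly linked. The sketch below implements the standard WGCNA-style definition, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij) with l_ij = Σ_u a_iu a_uj and k_i = Σ_u a_iu; whether [10] uses exactly this variant is an assumption, and the function name is hypothetical. A dissimilarity 1 - TOM could then serve as the metric of a Vietoris–Rips filtration.

```python
# Illustrative sketch of a WGCNA-style topological overlap matrix; the exact
# variant used in [10] is an assumption.
def topological_overlap(adj):
    """TOM for a symmetric adjacency matrix with zero diagonal and weights in
    [0, 1]; the diagonal of the result is set to 1."""
    n = len(adj)
    k = [sum(row) for row in adj]  # (weighted) degree of each node
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Weighted count of neighbours shared by i and j.
                shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom
```

On the path 0–1–2, the unlinked endpoints 0 and 2 still receive an overlap of 0.5 because they share neighbour 1, which is the indirect similarity that a plain adjacency-based distance would miss.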
In biomedical data analysis, TDA offers a natural framework for modeling heterogeneous and longitudinal data. Temporal filtrations and topological summaries have been used to study single-cell dynamics, enabling the identification of cell-state transitions beyond traditional clustering methods [63]. In clinical settings, TDA has been applied to electronic health records to construct pseudo-time representations that capture disease progression trajectories [62].
Persistent homology features have also been employed for outcome prediction, including relapse risk in acute lymphoblastic leukemia patients [101]. Hybrid approaches combining neural networks with TDA have been proposed for tasks such as induced pluripotent stem cell colony classification, improving robustness and interpretability [102]. Additionally, classifiers based on topological summaries of repeated-measurement data have demonstrated strong performance in longitudinal biomedical studies [64].
In population genetics, TDA has been explored for quantifying recombination and cross-population gene flow by combining graph constructions (e.g., minimal spanning networks) with topological filtering and cycle-based summaries [103].
TDA has also been proposed as a theoretical and computational framework for vascular disease characterization, where persistent homology-derived indices can complement descriptors of stenosis geometry and vessel morphology [104]. In cardiovascular research, TDA supports the analysis of signals, such as electrocardiography, photoplethysmography, and arterial stiffness, and may improve diagnosis and prognosis [105]. TDA has also been used to assess coronary atherosclerosis by providing new techniques for characterizing calcified and noncalcified plaques [106].
Critical limitations: Precision-medicine and gene-expression applications are especially vulnerable to small cohort sizes, batch effects, missing clinical covariates, class imbalance, and multiple-testing bias. Mapper-based subgroups may change with the lens, cover, overlap, and clustering algorithm, while persistence-based predictors may capture cohort-specific artifacts rather than biology. Clinical claims should therefore be supported by independent validation cohorts, survival or outcome analyses, biological plausibility checks, and transparent sensitivity studies for all topological hyperparameters.
7.2. Imaging and Segmentation
TDA also plays a role in biomedical imaging. De Benedictis et al. combine TDA and low-rank tensor decomposition to enhance magnetic resonance imaging (MRI)-based brain tumor detection and classification, using persistent homology to identify critical regions and improve interpretability of ML predictions, achieving high classification accuracy [107]. Rahman et al. propose TDA-SegUNet, a U-Net-based segmentation model that integrates persistence images derived from 0-dimensional and 1-dimensional homology to encode local and global shape information for brain tumor segmentation, outperforming state-of-the-art models on brain tumor segmentation (BraTS) datasets [76]. Hybrid pipelines that pair persistent homology with DL backbones have also achieved state-of-the-art performance in skin cancer diagnosis, as in basal cell carcinoma classification using telangiectasia and lesion topology [108].
More broadly, Paige and Patrangenaru demonstrate how cubical persistent homology and statistical methods on non-Euclidean spaces can distinguish and classify images of leaves with minimal preprocessing, with cubical homology yielding superior performance compared with alternative descriptors [22], while Percival et al. use TDA to reveal conserved heteroblastic and ontogenetic programs in vining plant leaves that were not visible to PCA or linear discriminant analysis [109]. Eremeev uses TDA-based barcodes and tree representations to detect repeated structures in satellite images, showing that persistent features can drive robust image analysis pipelines [110]. At finer scales, cubical persistent homology supports feature detection and hypothesis testing in extremely noisy Transmission Electron Microscopy (TEM) images of nanoparticles [47] and automatic recognition of morphological structures in 3D vertebra models [111].
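The core of cubical persistent homology in degree 0 can be sketched for a 2-D grayscale image. The example below assumes a lower-star (pixel-valued) sublevel filtration with 4-connectivity and applies the elder rule: when two components meet, the one born later dies at the current intensity. It is an illustrative, hypothetically named sketch, not the implementation used in the cited studies, and it omits higher-dimensional classes.

```python
import math

# Illustrative sketch: H0 only, 4-connectivity, lower-star filtration.
def cubical_h0(image):
    """0-dimensional sublevel-set persistence of a 2-D grayscale image."""
    rows, cols = len(image), len(image[0])
    order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    parent, birth = {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    diagram = []
    for value, r, c in order:
        parent[(r, c)] = (r, c)
        birth[(r, c)] = value
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb not in parent:
                continue  # outside the image or not yet in the sublevel set
            a, b = find((r, c)), find(nb)
            if a == b:
                continue
            # Elder rule: the component born later (higher value) dies here.
            young, old = (a, b) if birth[a] > birth[b] else (b, a)
            if value > birth[young]:
                diagram.append((birth[young], value))
            parent[young] = old
    diagram.append((order[0][0], math.inf))  # the global minimum never dies
    return sorted(diagram)
```

Each local minimum of intensity starts a bar that ends when its basin merges with an older one, so dark blobs separated by bright ridges produce long bars while pixel-level noise produces short ones.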
Topological features have been integrated into image analysis pipelines to enhance robustness to noise and geometric variability. In segmentation tasks, fractional calculus-based descriptors combined with persistent homology have been shown to improve boundary detection and region characterization [112]. In digital pathology, TDA has been applied to quantify immune-cell spatial organization, providing interpretable biomarkers that correlate with clinical outcomes [68]. Additionally, topological representations have been used as imaging biomarkers for ultrasound tumor diagnosis. Wei et al. propose wavelet-transform topological descriptors (WT-TD), an ultrasound topological representation method for distinguishing benign from malignant tumors and supporting clinical decision-making [113]. TDA has also been applied in radiological imaging, including tumor characterization, cardiovascular imaging, and COVID-19 detection [114].
Critical limitations: Imaging and segmentation results depend heavily on preprocessing, segmentation quality, image resolution, intensity normalization, scanner/site effects, and annotation protocols. Topological descriptors can improve shape sensitivity, but they do not by themselves guarantee clinical robustness or calibration. Benchmark improvements should be interpreted cautiously unless the same train–test protocol is used for all baselines, scanner or institution shifts are tested, and ablations show that topology adds information beyond standard CNN, radiomic, and morphology features.
7.3. Neuroscience and EEG
Electroencephalography (EEG) analysis has traditionally relied on statistical, spectral, and ML methods that may be sensitive to artifacts and noise. Ling et al. review TDA applications in EEG signal processing, summarizing TDA-based pipelines for disease diagnosis, brain state recognition, and perception evaluation, and highlighting strengths and limitations of TDA compared to conventional methods [115]. Zheng et al. [116] propose a TDA-based pipeline for multi-channel EEG, using Hilbert–Huang transforms to obtain instantaneous frequency and amplitude curves and then extracting TDA features for classification tasks. Their method achieves superior performance in Brain–Computer Interface (BCI) competitions and other EEG datasets, suggesting that topological features can capture informative structures in complex signals.
More specialized applications include emotion recognition from functional brain networks constructed via phase-locking values and analyzed with persistent homology to extract rich multiband topological descriptors [117], brain functional network analysis for image-quality assessment using Grey-TDA models [118], and TDA-based early prediction of ventricular fibrillation from ECG dynamics [119]. Catanzaro et al. show that TDA-based summaries of task-driven functional magnetic resonance imaging (fMRI) signals in the anterior cingulate cortex can outperform conventional vectorizations when classifying motor-task conditions [120]. Structural brain analyses have used TDA to quantify altered white-matter covariance networks in maltreated children [121]; persistence-based survival models have also been used outside the life sciences, for example, in political science for democracy-survival analysis with persistent homology-informed functional PCA [122]. Moreover, TDA has been applied to multi-channel EEG alterations in attention-deficit/hyperactivity disorder (ADHD) [123], EEG signals in children with sleep apnea [124], resting-state EEG data for Parkinson’s disease classification using entropy-based topological features [125], and schizophrenia classification in adolescents via EEG signal embeddings [126].
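Phase-locking networks such as those in the emotion-recognition study above start from pairwise phase synchrony between channels. The sketch below assumes the instantaneous phases of each channel have already been extracted (for example with a Hilbert transform), computes the phase-locking value PLV = |(1/n) Σ_t exp(i(φ_a(t) - φ_b(t)))|, and converts it into the dissimilarity 1 - PLV, one possible metric for a Vietoris–Rips filtration. The cited study [117] may build its networks differently, and the function names are hypothetical.

```python
import cmath

# Illustrative sketch: assumes instantaneous phases (radians) were already
# extracted per channel, e.g. with a Hilbert transform; names are hypothetical.
def plv(phase_a, phase_b):
    """Phase-locking value |(1/n) sum_t exp(i (phi_a(t) - phi_b(t)))|, in [0, 1]."""
    n = len(phase_a)
    return abs(sum(cmath.exp(1j * (a - b)) for a, b in zip(phase_a, phase_b)) / n)

def plv_distance_matrix(phases):
    """Channel-by-channel dissimilarity 1 - PLV, usable as the metric of a
    Vietoris-Rips filtration on the functional network."""
    m = len(phases)
    return [[0.0 if i == j else 1.0 - plv(phases[i], phases[j]) for j in range(m)]
            for i in range(m)]
```

Two channels with a constant phase lag have PLV 1 and hence distance 0, so tightly synchronized channels merge early in the filtration regardless of the size of the lag.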
In neuroscience and biosignal analysis, TDA has been used to extract robust features from high-dimensional and noisy recordings. Studies on EEG data have investigated principled parameter selection strategies for time-delay embeddings, improving the stability of topological features across subjects and sessions [127].
Persistent homology-derived descriptors have also been applied to cardiac electrophysiology, enabling the detection of ventricular fibrillation and tachycardia through topological signatures of ECG signals [128]. In affective computing, TDA-based representations have been proposed as interpretable alternatives to black-box deep models, supporting explainable emotion recognition from physiological signals [129].
Critical limitations: EEG, ECG, and fMRI studies are highly sensitive to filtering, artifact rejection, channel montage, referencing, time-delay parameters, window length, and subject-level leakage. Small cohorts and repeated measurements can inflate apparent accuracy if splits are not subject-independent. Credible evaluations should include artifact controls; cross-subject and cross-device validation; comparisons with spectral, entropy-based, and deep sequence baselines; and interpretation that is linked to neurophysiological or physiological mechanisms rather than only to classification scores.
7.4. Proteins, Molecular Structure, and Interaction Networks
At the molecular level, TDA has been used to analyze protein binding pockets, cryptic sites, and interaction networks. Jiang and Lugo-Martinez integrate TDA with geometric deep learning to characterize protein pockets, combining global topological invariants from TDA with local structural representations from GNNs to identify niches within pockets and improve classification tasks [77]. Koseki et al. introduce a TDA-based framework for quantifying structural and interaction changes due to amino acid mutations, capturing persistent changes in protein–protein interfaces [130], and leverage related ideas to detect cryptic binding sites via mixed-solvent simulations and TDA summaries [131].
Beyond individual proteins, Karthick et al. propose quantum graph-based differential models with fractional calculus and TDA to study dynamic protein–protein interaction networks, extracting persistent topological features and detecting critical transitions in network structures, with implications for systems biology and drug discovery [132]. At the cellular scale, TDA has been used to quantify collective motion patterns in mesenchymal cell populations via time-varying point clouds and Bayesian calibration of agent-based models [133], as well as to track emergent ring structures, filament organization, and remodeling in developmental and wound-healing contexts [134].
In computational biology, TDA has been used to compare and track evolving morphologies and emergent structures. Examples include distinguishing parameter regimes in angiogenesis simulations [135], tracking collective cell-motion interfaces over time [136], and detecting the onset/timing of ring-like structures in filament networks via time-resolved topological features [137]. TDA has also been applied to organismal behavior, where topological summaries of posture/locomotion enable quantitative comparisons across conditions and support interpretable behavioral motifs [138].
Critical limitations: Molecular and interaction network applications depend on atom selection, distance metrics, conformational sampling, solvent and protonation assumptions, graph construction, and threshold choices. A model may learn family-, assay- or database-specific biases rather than transferable molecular mechanisms. Strong validation should therefore use held-out protein families or interaction contexts, compare against established docking, affinity, or structural descriptors, test conformational ensembles, and connect persistent features to experimentally meaningful sites or functions.
Biomedical and biological datasets often contain heterogeneous subgroups, and this is where TDA is especially useful. The reviewed studies show that topological summaries can reveal clinically meaningful clusters, robust biosignal patterns, and interpretable molecular structure descriptors. These benefits are strongest when findings are validated on independent cohorts or supported by complementary biological evidence. For non-specialists, TDA can be viewed as a structured way to uncover hidden organization in complex life science data.
8. Engineering, Physical, and Infrastructural Systems
Engineering and physical systems, including fluid flows, mechanical structures, energy grids, and infrastructure networks, produce data whose structure is often nonlinear and multiscale. TDA is increasingly applied in these areas, as the studies reviewed in this section illustrate.
8.1. Fluid Dynamics and Pattern Formation
In fluid dynamics and pattern formation, TDA offers a way to characterize complex spatial structures and their evolution. One study analyzes convection-driven flows in cylindrical geometries via Poincaré maps and persistent homology, characterizing transitions from quasi-periodic to chaotic regimes in terms of torus knots and cycle statistics [
12]. Mototake et al. propose a TDA-based procedure, combined with machine learning, to interpret pattern formation in magnetic domain systems, linking TDA features to underlying physical mechanisms and suggesting reduced models that capture observed dynamics [
139]. Topology-aware analyses of flows on spherical surfaces similarly exploit discrete encodings of orbit structures to support TDA-based comparisons and classification [
13].
In fluid mechanics, TDA has been used to extract coherent structures from high-dimensional flow data. Persistent homology-based analyses have been applied to turbulent jet flows, where topological features capture the evolution and interaction of vortical structures across scales [
140].
Critical limitations: Fluid-dynamics and pattern-formation studies often rely on simulated, controlled, or narrowly parameterized regimes. Persistent features can change with sampling density, mesh resolution, Poincaré-section choice, vortex extraction, time discretization, and noise level. Topological conclusions are strongest when they are checked against physical conservation laws, classical diagnostics such as POD/DMD or vorticity measures, parameter sweeps, and experimental data rather than being treated as standalone evidence of a flow mechanism.
8.2. Mechanical Systems and Structural Roughness
Structural health monitoring and materials characterization are natural domains for TDA. Canot et al. analyze roughness surfaces of bonded assemblies using persistent homology and persistence images, quantifying voids, micro-cracks, and peak amplitudes and relating them to adhesion and fracture resistance of adhesives used in aeronautics [
11]. Pei et al. use TDA with Morse theory to extract topological features from ultrasonic-guided wave signals for corrosion characterization of steel strands, showing that topological features correlate linearly with cross-section loss and outperform traditional time, frequency, and time–frequency domain features in capturing corrosion development [
24]. Miller et al. apply persistent homology to pulmonary arterial trees in murine models of pulmonary hypertension, revealing pruning and remodeling signatures in vascular topology [
141].
Condition monitoring of rotating machinery has also benefited from TDA: Jeung and Kwon design a robust multivariate time series classification model that uses TDA to extract consistent features from multi-sensor vibration data, enabling fault-tolerant condition monitoring even when some sensor channels fail [
142]. TDA-based feature extraction on stator current signals has been applied to induction motor eccentricity fault detection, with persistent homology-derived features feeding machine learning models that generalize across unseen fault levels [
143]. At the instrumentation level, TDA combined with Takens embeddings has been used to visualize and classify instrument outputs and rigid-body dynamics based on orbit topologies [
144].
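Takens-style delay embeddings are the usual bridge from a scalar signal to a point cloud that persistent homology can process. The following Python sketch assumes a uniformly sampled series; the function name `delay_embed` and the choices of embedding dimension `dim` and delay `tau` are illustrative and are not the settings used in the cited studies.

```python
import math
from typing import List, Sequence, Tuple


def delay_embed(series: Sequence[float], dim: int, tau: int) -> List[Tuple[float, ...]]:
    """Delay embedding: x_t -> (x_t, x_{t+tau}, ..., x_{t+(dim-1)*tau})."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_points = len(series) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("series is too short for this (dim, tau)")
    return [tuple(series[t + k * tau] for k in range(dim)) for t in range(n_points)]


# Hypothetical example: a sine wave with a period of 40 samples. With dim=2 and a
# quarter-period delay, each point is (sin(theta), cos(theta)), i.e. on the unit circle.
period = 40
signal = [math.sin(2 * math.pi * t / period) for t in range(200)]
cloud = delay_embed(signal, dim=2, tau=period // 4)
print(len(cloud))  # 190 points
```

A periodic signal produces a loop in the embedded cloud, which persistent homology would report as a long-lived 1-dimensional bar. A different `tau` deforms or collapses that loop, which is why `dim` and `tau` are treated as tuning parameters.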
TDA has proven effective in analyzing complex dynamical systems encountered in engineering. Persistent homology features have been used to detect chaos and regime transitions in nonlinear mechanical systems [
145], as well as to characterize human balance dynamics from motion data [
146].
In robotics and autonomous systems, topological descriptors have been employed for trajectory monitoring and anomaly detection [
66], including driver-assistance and human–machine interaction scenarios [
147]. Related work has applied TDA to cluster parametric vibration modes and operational states [
148]. Moreover, topology-aware validity-domain constraints have been introduced to guide optimization and model-based design under uncertainty [
65].
Critical limitations: Mechanical, vibration, and roughness studies are frequently evaluated on laboratory faults, balanced datasets, or controlled operating conditions. In deployed systems, sensor drift, missing channels, changing loads, unobserved fault classes, and strict latency constraints can alter the topology of embedded trajectories. Practical claims should therefore report window length and sampling sensitivity, cross-machine or run-to-failure validation, robustness to sensor loss, computational cost, and comparisons with established time–frequency, physics-based, and condition-monitoring baselines.
8.3. Infrastructure Networks and Resilience
Complex infrastructures such as water distribution networks and power grids are critical and increasingly stressed by disturbances. Selicato et al. apply persistent homology to water distribution networks, proposing a new resilience metric based on topological features that complements existing graph-theoretic measures and provides a richer characterization of system robustness under failure scenarios [
14]. Wang et al. integrate TDA with deep belief networks and decentralized control strategies to build short-term voltage prediction models in wind-integrated power systems, improving resilience assessment and control of small disturbances in renewable-rich grids [
72]. Rail infrastructure has been analyzed with Mapper and Betti numbers to understand track geometry anomalies and maintenance needs [
149], while TDA-based multimodal change detection methods have been used to track ecosystem state transitions in large river systems such as the Upper Mississippi [
150].
Topological approaches have also been proposed for resilience assessment in infrastructure systems. By combining persistent homology with Wasserstein distances, TDA enables quantitative comparison of system states before and after disruptions, supporting the analysis of robustness and recovery dynamics in complex engineered networks [
151].
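To make the before/after comparison concrete, the sketch below computes the p-Wasserstein distance between two persistence diagrams exactly by brute force: each diagram is augmented with diagonal slots for the other diagram's points, and the cheapest bijection is found by enumerating permutations, using the L∞ distance as ground metric. The diagrams are hypothetical, only finite bars are handled, and the factorial search is practical only for a handful of points; library implementations solve the same matching problem with optimal-assignment algorithms.

```python
import itertools
from typing import List, Tuple

Pair = Tuple[float, float]  # (birth, death), finite values only


def _to_diagonal(p: Pair) -> float:
    # L-infinity distance from (birth, death) to the diagonal birth == death
    return (p[1] - p[0]) / 2.0


def wasserstein(d1: List[Pair], d2: List[Pair], p: float = 2.0) -> float:
    """Exact p-Wasserstein distance by brute force; only for very small diagrams."""
    n, m = len(d1), len(d2)
    size = n + m
    # Rows: points of d1, then m diagonal slots. Columns: points of d2, then n diagonal slots.
    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                a, b = d1[i], d2[j]
                c = max(abs(a[0] - b[0]), abs(a[1] - b[1]))
            elif i < n:
                c = _to_diagonal(d1[i])  # point of d1 matched to the diagonal
            elif j < m:
                c = _to_diagonal(d2[j])  # point of d2 matched to the diagonal
            else:
                c = 0.0  # diagonal-to-diagonal matches are free
            cost[i][j] = c ** p
    best = min(sum(cost[i][perm[i]] for i in range(size))
               for perm in itertools.permutations(range(size)))
    return best ** (1.0 / p)


before = [(0.0, 4.0), (1.0, 2.0)]  # hypothetical diagram of the intact system
after = [(0.0, 3.0)]               # hypothetical diagram after a disruption
print(wasserstein(before, after))  # sqrt(1.25), about 1.118
```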
In materials science and chemistry, persistent homology features have been used as compact structure descriptors for prediction and comparison tasks, including studies on high-temperature cuprate superconductors [
152], comparisons of different graphene forms [
153], and structure–property analysis of endohedral metallofullerenes [
154]. In [
155], Morse theory is used to automatically segment large porous structures into local geometric features, such as the shape and size of a pore or the curvature of a solid ligament, which affect the macroscopic properties of the material. In [
156], persistent homology is used as an unsupervised machine learning tool to uncover classification criteria in complex inorganic crystal chemistries, leading to a hierarchical classification scheme.
Critical limitations: Infrastructure and resilience analyses depend on how physical systems are abstracted into graphs, weighted networks, or point clouds. Persistence or Wasserstein distances may miss hydraulic constraints, capacities, control policies, cascading effects, maintenance costs, or regulatory requirements. Materials descriptors similarly need physical validation rather than only classification accuracy. Reliable studies should compare topological summaries with domain simulators and graph-theoretic baselines, propagate uncertainty in failure scenarios, and show that topological features correspond to actionable engineering mechanisms.
In engineering and physical sciences, TDA provides compact descriptors of evolving structure, from flow regimes to network resilience and material morphology. This is valuable when behavior is nonlinear and cannot be summarized well by single-point indicators. Topological features often improve monitoring and control when fused with domain-specific models. The practical message is that topology adds system-level context to conventional engineering analytics.
9. Finance, Economics, and Social Systems
Modern societies face a growing number of difficult challenges, many of them economic or arising from social phenomena such as public health and collective behavior. TDA has found applications and made contributions across all of these areas.
9.1. Financial Markets and Risk
As discussed earlier, TDA is increasingly used in financial time series analysis and market structure characterization. In addition to works on change-point detection and forecasting [
6,
7,
8,
80,
81], Mojdehi et al. use Ball Mapper and GNNs for credit-risk assessment in supply chain finance, demonstrating that topological features and network-based representations improve accuracy and F1 scores in bankruptcy prediction [
33]. Kheneifar and Amiri extend this line of work to maritime finance, using TDA over correlation-based networks of shipping firms to extract persistence features that capture nonlinear risk patterns and enable more accurate loan default prediction [
157]. Topological descriptors have also supported portfolio-level risk modeling, bubble detection, and systemic fragility analysis through their sensitivity to clustering and voids in correlation structures [
80,
81].
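A common way to turn a correlation network into a metric space is the distance d_ij = sqrt(2(1 - ρ_ij)), where ρ_ij is the Pearson correlation between series i and j. The sketch below assumes that construction (the cited studies may use different distances or filtrations) and computes 0-dimensional persistence with a union-find pass over edges sorted by length; the resulting death times coincide with single-linkage merge heights, i.e., the edge lengths of a minimum spanning tree. The four toy series are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def correlation_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """d = sqrt(2 * (1 - rho)) from the Pearson correlation rho."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    rho = sxy / math.sqrt(sxx * syy)
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))  # clamp guards tiny negative round-off


def h0_deaths(dist: Dict[Tuple[int, int], float], n: int) -> List[float]:
    """Finite H0 death times of a Vietoris-Rips filtration (all classes born at 0)."""
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    deaths = []
    for (i, j), d in sorted(dist.items(), key=lambda item: item[1]):
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components, so one H0 class dies at d
            parent[ri] = rj
            deaths.append(d)
    return deaths  # n - 1 values; one class never dies


# Hypothetical toy series for four firms (not real market data).
series = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1], [1, 3, 2, 5, 4]]
dist = {(i, j): correlation_distance(series[i], series[j])
        for i, j in combinations(range(len(series)), 2)}
print(h0_deaths(dist, n=len(series)))  # about [0.0, 0.632, 1.897]
```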
Beyond finance, TDA has been applied to behavioral data such as clickstreams, where session dynamics are modeled as Markov chains and persistent homology summaries help identify intervention points and discriminate buyer vs. non-buyer browsing patterns [
158].
Related work also uses TDA to refine risk stratification for firm distress and default. By mapping firms into the feature space of classical credit-risk indicators (e.g., Altman-style factors) and visualizing the resulting point cloud, TDA can reveal heterogeneous failure regions that are not easily separated by simple threshold rules [
159].
Critical limitations: Financial market and risk applications face nonstationarity, survivorship bias, look-ahead leakage, class imbalance, and changing regulatory or macroeconomic regimes. Topological features extracted from correlations, credit indicators, or transaction behavior can be unstable when the market universe, sampling window, or normalization changes. Evidence should include strictly temporal validation, stress-period tests, transaction-cost or default-cost-aware metrics, calibrated probabilities, and comparisons with strong econometric, credit-scoring, and modern sequence-learning baselines.
9.2. Elections, Public Health, and Social Behavior
Mancilla et al. integrate TDA with machine learning and geostatistics to predict voting preferences in a gubernatorial election, constructing geospatial and non-geospatial models that incorporate TDA-derived features; these models correctly predicted the election winner and enabled spatial exploration of voting patterns [
160]. Dey and Kundu analyze county-level COVID-19 vaccine acceptance in the United States using network-based models and TDA-derived clustering methods, uncovering macro-level communities with distinct vaccination patterns and linking them to sociodemographic factors such as education, income, and region [
161]. TDA-based analyses of COVID-19 incidence and antibody dynamics highlight non-binary severity structures and multiscale spatial patterns, supporting more nuanced epidemiological modeling [
98]. A related study applies TDA, specifically the Mapper algorithm, to COVID-19 data from China [
162].
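For readers unfamiliar with Mapper, the following minimal sketch shows its three steps: cover the range of a lens (filter) function with overlapping intervals, cluster the points that fall in each interval, and connect clusters that share points. The lens, the interval-widening rule, and the fixed-threshold single-linkage clustering are simplifying assumptions; the cited studies and library implementations use richer covers and clustering methods.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def single_linkage(idx: List[int], pts: Sequence[Point], eps: float) -> List[Set[int]]:
    """Group indices into clusters whose members are chained by distances <= eps."""
    clusters: List[Set[int]] = []
    unvisited = set(idx)
    while unvisited:
        seed = unvisited.pop()
        comp, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited if math.dist(pts[i], pts[j]) <= eps]
            for j in near:
                unvisited.remove(j)
                comp.add(j)
                stack.append(j)
        clusters.append(comp)
    return clusters


def mapper(pts: Sequence[Point], lens: Callable[[Point], float],
           n_intervals: int, overlap: float, eps: float):
    """Minimal Mapper: interval cover of the lens range, clustering, shared-point edges."""
    values = [lens(p) for p in pts]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    nodes: List[Set[int]] = []
    for k in range(n_intervals):
        # Each interval is widened by overlap * length on both sides.
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(members, pts, eps))
    # Two clusters are joined when they share at least one data point.
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# Hypothetical example: 32 points on the unit circle, with height as the lens.
circle = [(math.cos(2 * math.pi * k / 32), math.sin(2 * math.pi * k / 32)) for k in range(32)]
nodes, edges = mapper(circle, lens=lambda p: p[1], n_intervals=4, overlap=0.2, eps=0.3)
print(len(nodes), len(edges))  # 6 6: a single cycle, mirroring the circle
```

Changing `n_intervals`, `overlap`, or `eps` can merge or split nodes, which is one reason Mapper parameters need careful exploration.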
Mobility and migration have also been studied with TDA. Vittorietti et al. develop a topological measure for the attitude to mobility of Italian students and graduates, representing educational and occupational trajectories as graphs and ranking them via distances between persistence diagrams [
163]. At the decision-making level, circular intuitionistic fuzzy TDA and related multi-criteria models have been used for AI-assisted evaluation in healthcare supply chains and uncertain logistics [
164]. In political science, persistent homology-informed Bayesian survival models reveal topological heterogeneity in democracy survival data and support new regularization schemes for deep survival networks [
122].
Finally, TDA has been explored in diverse interpretability-oriented settings: as a tool for solving or structuring certain classes of visual reasoning tasks (e.g., Bongard problems) [
165], for uncovering organic relationships in legal corpora via case/precedent structure [
166], and in decision-analytic settings where topology-inspired constructions are combined with fuzzy multi-criteria methods [
167].
Critical limitations: Social, electoral, mobility, and public health studies are vulnerable to ecological fallacy, spatial aggregation effects, missing covariates, measurement bias, and privacy constraints. Topological clusters may reflect data collection, normalization, or geography rather than causal social mechanisms. Results should therefore be presented as exploratory or hypothesis-generating unless supported by causal designs, out-of-region validation, uncertainty quantification, demographic fairness checks, and careful communication of ethical limits.
In finance, economics, and social applications, TDA is mainly used to capture collective structures such as clustering, regime transitions, and systemic fragility. It has supported forecasting and risk analysis as well as community-level studies in public health and mobility. The evidence is promising, but interpretation depends on aligning topological patterns with domain mechanisms and context. For non-specialists, TDA is best seen as an early-warning and structure-discovery lens rather than a stand-alone predictor.
10. Security, Adversarial Machine Learning, and Anomaly Detection
Data poisoning and adversarial attacks pose substantial threats to ML-based systems. Monkam et al. propose a TDA-based approach to detect data poisoning in network intrusion detection systems, using topological features and clustering to isolate clusters of poisoned data before training, thereby improving security without relying solely on classifier robustness [
70]. Ferrara’s game-theoretic analysis, mentioned earlier, provides a complementary theoretical perspective [
91].
Topological features also appear in anomaly detection more broadly, such as in power system monitoring, meteorological forecasting, and sensor networks. For example, Ma et al. propose ZPDSN, a topological graph neural network architecture for spatio-temporal meteorological forecasting that uses zigzag persistence and supra-graph constructions to capture high-order structural information; it outperforms conventional GNN-based methods on multiple meteorological variables [
71]. Wang et al. incorporate TDA-derived information into voltage prediction and reactive power control strategies for wind farms, enhancing detection and mitigation of small disturbances [
72]. At the data-management level, TDA has been explored as a tool to automatically detect data quality faults and duplicate entities in large databases, providing an unsupervised complement to rule-based and record-matching approaches [
168].
Topological summaries have also been explored in security-related analytics. For example, TDA has been used to visualize and characterize malicious packet patterns in darknet monitoring data, providing an interpretable global picture of attack activity beyond conventional traffic statistics [
169]. In Natural Language Processing (NLP) security, TDA-derived features have been investigated as a complement to deep models for fake news detection, with particular gains reported in regimes where labeled training data are scarce [
170].
Critical limitations: Security and anomaly-detection studies often rely on fixed benchmark datasets, known attack types, or controlled simulations, whereas deployed adversaries adapt to detection rules and concept drift. Topological summaries may be costly to update online and may generate false positives when benign traffic changes shape. Operational evaluations should include latency, memory use, adaptive attacks, drift scenarios, false-alarm costs, and comparisons with production intrusion detection, graph analytics, and sequence model baselines.
In security workflows, TDA helps characterize global structural patterns that can be missed by local feature checks. This can improve detection of poisoning behavior and spatio-temporal anomalies in complex systems. The approach is particularly useful when labels are limited or attack patterns evolve over time. In practice, topology is most effective as a complementary signal inside a broader defense pipeline.
11. Software Ecosystem and Practical Considerations
The growing TDA ecosystem includes both general-purpose libraries and domain-specific tools. In addition to widely used packages such as the Geometry Understanding in Higher Dimensions (GUDHI) library, Ripser, and giotto-tda, recent developments include:
tramoTDA, a Python library for trajectory monitoring and classification that leverages persistent homology and TDA-based distances, designed for both technical and non-technical users [
95].
The multipers library and Core Delaunay constructions for multiparameter persistence, used in shape recognition experiments that distinguish synthetic shapes such as circles, spheres, tori, and coffee cups [
23].
Specialized pipelines that integrate TDA with wavelet transforms, Hilbert–Huang transforms, and domain-specific preprocessing steps in EEG, financial, and environmental applications [
7,
116,
171].
Practical deployment of TDA-based workflows raises several issues:
- (a) Parameter selection: Choosing scale parameters, filter functions, and embedding windows is nontrivial. Methods such as TDA-based delay selection [
45] and Mapper parameter exploration guided by hotspot detection [
9] represent important steps, but automated and principled parameter selection remains an open challenge.
- (b) Scalability: Data reduction methods such as CLA [
44] and enhanced complexes [
10] help manage computational load, but scaling to truly massive datasets (e.g., billions of points, large streaming graphs) is still difficult.
- (c) Interpretability: While persistence diagrams and Mapper graphs are conceptually interpretable, translating them into domain-specific insights often requires collaboration with subject-matter experts and careful experimental design.
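The scale-selection problem in item (a) can be seen with a minimal sketch that counts connected components (the 0th Betti number) of an ε-neighborhood graph at several thresholds. The points and thresholds are hypothetical toy values.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple


def betti0(points: Sequence[Tuple[float, ...]], eps: float) -> int:
    """Connected components of the graph joining points at distance <= eps."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)  # union the two components
    return len({find(i) for i in range(len(points))})


# Hypothetical 1-D toy data: a tight triple, a pair, and an isolated point.
points = [(0.0,), (0.1,), (0.2,), (2.0,), (2.1,), (5.0,)]
for eps in [0.05, 0.15, 1.0, 2.0, 3.5]:
    print(eps, betti0(points, eps))
# Component counts: 6, 3, 3, 2, 1
```

The count is stable between ε = 0.15 and ε = 1.0 but changes at the other thresholds, so any method that fixes a single scale (for example, a Mapper clustering threshold) inherits this sensitivity; persistence mitigates it by recording the counts across all scales at once.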
On the software side, giotto-tda provides a scikit-learn-compatible toolkit that makes common TDA pipelines (preprocessing, persistent homology, and vectorizations) easier to integrate into ML workflows [
172]. For interactive analysis at scale, topology-driven visualization and aggregation methods have been proposed to enable exploration of high-dimensional model behavior on very large scientific datasets [
173].
The software ecosystem is now mature enough for non-specialists to prototype TDA using standard data-science tools. However, libraries do not remove the need for careful parameter choice and sensitivity checks. Reproducible conclusions still depend on transparent preprocessing, filtration design, and evaluation. A practical starting point is to use simple documented pipelines before moving to specialized high-complexity variants.
12. Additional Recent Applications of TDA
To avoid a long catalog of loosely connected examples, this section presents only representative cases that clarify the five-stage design patterns in
Section 4; the broader reference list is given in
Appendix A. The first pattern is
topology as a task-specific classifier. Kindelan et al. provide a compact example because the topological component is part of the classification strategy itself rather than only an auxiliary visualization [
174]. Related imaging and manufacturing examples, such as Matrix-Assisted Laser Desorption/Ionization (MALDI) tumor typing and wafer-defect recognition, are treated as domain-specific variants of the same pattern [
69,
175].
The second pattern is
topology as a morphology-aware biomedical or signal descriptor. Representative fMRI, ECG/sleep, and functional network studies show the same workflow: construct a biologically meaningful image, signal, or network representation; choose a filtration compatible with that representation; vectorize persistence outputs; and test whether topology adds information beyond conventional descriptors [
117,
119,
176,
177]. These cases are more useful for the main argument than a longer biomedical list because they expose where design choices enter the pipeline.
The third and fourth patterns are
topology as a global transition detector and
topology as an exploratory lens for structured non-Euclidean data. Physical, ecological, geophysical, political, and mobility studies illustrate transition, roughness, connectivity, and regime heterogeneity questions [
11,
122,
139,
150,
163,
178]; spatial, cultural, and symbolic data studies show how the same pipeline can be transferred once a defensible representation and filtration are defined [
179,
180,
181,
182]. The important point is not the number of application domains, but whether each example clarifies a reusable design pattern and is supported by appropriate validation.
13. From Catalog to Critical Synthesis: What Works, What Fails, and Where
The domain-by-domain review above can be condensed into a smaller set of recurring empirical patterns. Across finance, biomedicine, engineering, and security applications, TDA is most reliable when it is used as a
structural prior that complements statistical or neural models, rather than as a stand-alone replacement for the full learning pipeline [
5,
9,
70,
71]. The purpose of this section is to synthesize consistent evidence and to separate robust effects from context-dependent claims.
Table 8 summarizes this critical synthesis.
13.1. What Consistently Works
Multiscale summarization of complex structure: persistent homology and related summaries repeatedly provide compact, noise-tolerant descriptors of nontrivial geometry in time series, graphs, and imaging data [
7,
10,
69].
Hybridization with classical or deep models: the most stable performance gains occur when topological summaries are fused with domain features and learned representations, especially in forecasting, diagnosis, and anomaly detection [
8,
16,
18,
70].
Interpretability at mesoscopic scale: Mapper/persistence-based analysis is consistently useful for identifying subgroups, transition regimes, and failure regions that are difficult to detect with purely local statistics [
9,
159,
161].
13.2. What Repeatedly Fails or Becomes Fragile
Hyperparameter sensitivity: filtration design, metric choice, and embedding parameters can change conclusions qualitatively; this is a major source of instability when parameter sweeps are weakly justified [
183,
184].
Scalability bottlenecks: large Vietoris–Rips constructions remain expensive, and aggressive approximations can remove informative fine-scale structures [
44,
185].
Signal dilution after vectorization: converting diagrams into fixed-length vectors is often necessary for ML pipelines, but can weaken geometric meaning and reduce portability across datasets [
90,
186].
Evaluation fragility: positive results are frequently reported on small or domain-specific benchmarks with limited ablation and sensitivity analysis, making cross-study comparisons difficult [
187].
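Persistence entropy, a scalar summary computed from normalized bar lengths, gives a concrete instance of signal dilution after vectorization. The sketch below uses the standard definition E = -Σ p_i log p_i with p_i = ℓ_i / Σ_j ℓ_j (ℓ_i the length of bar i) and two hypothetical diagrams: bars with identical lengths but different birth times receive exactly the same entropy, so the summary no longer distinguishes when features appear.

```python
import math
from typing import List, Tuple


def persistence_entropy(diagram: List[Tuple[float, float]]) -> float:
    """Shannon entropy of normalized bar lengths; finite, non-degenerate bars only."""
    lifetimes = [d - b for b, d in diagram if d > b]
    total = sum(lifetimes)
    return -sum((length / total) * math.log(length / total) for length in lifetimes)


# Two hypothetical diagrams with the same bar lengths (1 and 3) but different births.
early = [(0.0, 1.0), (0.0, 3.0)]
late = [(5.0, 6.0), (2.0, 5.0)]
print(persistence_entropy(early), persistence_entropy(late))  # both about 0.5623
```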
13.3. In Which Settings Each Pattern Is Most Reliable
Across domains, the strongest evidence supports using TDA as a complementary structural component, not as a universal stand-alone solution. Reliable gains appear when datasets are noisy, multiscale, or weakly labeled and when topological features are integrated with domain-aware models. Fragility appears when hyperparameters are weakly justified, evaluations are narrow, or scalability shortcuts remove informative structures. The practical rule is to use TDA to enrich existing pipelines and verify robustness with clear baselines and sensitivity analyses.
13.4. Expanded Discussion of Cross-Application Results
The applications reviewed in this survey point to a common interpretation: TDA is most useful when the research question concerns structure, transitions, or heterogeneity rather than only pointwise prediction accuracy. In financial and other nonstationary time series settings, the main result is not simply that persistence-based features can improve forecasting, but that sliding-window topology can expose regime changes, market instability, and evolving dependence patterns that are difficult to summarize with local statistics alone [
5,
6,
7,
8]. The same principle appears in dynamical systems and sensor applications, where topological summaries function as early-warning or monitoring descriptors rather than as isolated classifiers [
71,
92,
94].
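The sliding-window idea can be sketched as follows: each window of the series is delay-embedded, and one scalar topological summary is computed per window. Here the summary is the total 0-dimensional persistence (equal to the length of a minimum spanning tree of the embedded points), chosen because it is short to implement; the series, window length, step, and embedding parameters are hypothetical, and the cited studies may rely on different summaries or higher-dimensional features.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def delay_embed(x: Sequence[float], dim: int, tau: int) -> List[Tuple[float, ...]]:
    return [tuple(x[t + k * tau] for k in range(dim)) for t in range(len(x) - (dim - 1) * tau)]


def total_h0_persistence(points: Sequence[Tuple[float, ...]]) -> float:
    """Sum of finite H0 bar lengths in a Rips filtration (= minimum spanning tree length)."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    total = 0.0
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # an H0 class dies at length d
            parent[ri] = rj
            total += d
    return total


def sliding_topology(x: Sequence[float], window: int, step: int, dim: int, tau: int) -> List[float]:
    """One topological summary per window of the series."""
    return [total_h0_persistence(delay_embed(x[s:s + window], dim, tau))
            for s in range(0, len(x) - window + 1, step)]


# Hypothetical series with a regime change at t = 60: the amplitude triples.
period = 10
x = [(1.0 if t < 60 else 3.0) * math.sin(2 * math.pi * t / period) for t in range(120)]
print(sliding_topology(x, window=30, step=30, dim=2, tau=2))
```

Because all pairwise distances scale with the amplitude, the summary jumps by a factor of about three at the regime change, while windows within the same regime give nearly identical values.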
In biomedical and biological applications, the strongest contribution of TDA is its ability to reveal subgroup structure and shape-driven biomarkers under high dimensionality, class imbalance, and noisy measurement. Mapper-based precision-medicine studies, enhanced Vietoris–Rips constructions for gene-expression data, and imaging or segmentation pipelines all illustrate that topology can support stratification and morphology-aware learning [
9,
10,
69,
76]. However, these results should be interpreted as evidence for hypothesis generation and model enrichment unless they are supported by external validation cohorts, stable preprocessing, and clinically meaningful sensitivity analyses.
Engineering, physical science, infrastructure, and security studies show a second recurring pattern: topology is valuable when global connectivity or multiscale geometry carries operational meaning. In materials, fluid flows, surface roughness, and structural monitoring, persistent summaries can encode geometric organization that is not captured by scalar descriptors alone [
11,
12,
92,
155]. In infrastructure and cybersecurity, graph-topological features help characterize resilience, anomaly patterns, or poisoned clusters by preserving higher-order relationships among system components [
14,
70,
72,
91]. These applications indicate that TDA is most convincing when topological features can be linked back to a domain mechanism, such as connectivity, recurrence, roughness, or failure propagation.
Across ML and DL applications, the main result is that topology usually works best as an auxiliary representation, regularizer, or interpretability layer. Persistence images, landscapes, graph-topological features, and differentiable topological losses can improve robustness or reveal model behavior, but their benefit depends strongly on architecture, vectorization, and evaluation design [
15,
16,
17,
18,
19]. Consequently, the overall conclusion from the application survey is balanced: TDA offers a transferable language for complex structures, but its practical value is highest when it is paired with domain knowledge, strong non-topological baselines, repeated sensitivity checks, and transparent reporting of computational choices.
14. Design Choices, Limitations, and Common Pitfalls
Beyond the application-driven studies discussed above, several works address methodological, theoretical, and evaluation-oriented aspects of topological data analysis. These include analyses of robustness and uncertainty in topological summaries [
186,
188], as well as investigations into stability and convergence properties of persistence-based constructions under different sampling and noise regimes [
183,
184]. Related contributions examine comparative evaluation and benchmarking considerations for TDA pipelines in applied settings [
90,
187] and the integration of topological descriptors into multi-criteria decision-making frameworks [
189]. Together, these studies complement application-focused work by clarifying the limits, assumptions, and practical reliability of topology-based methods.
While topological data analysis has been applied in many fields, its success depends on a number of nontrivial design choices. This section highlights common difficulties identified in the literature, grouped according to the pipeline stages introduced in Section 4.
14.1. Sensitivity to Data Representation and Filtration Design
The choice of data encoding (Stage I) and filtration (Stage II) often controls downstream performance. Small changes in distance metrics, density estimation, or thresholding can lead to qualitatively different topological patterns. This is particularly evident in time series embeddings and graph filtrations, where scale parameters are implicitly tied to modeling choices. Filtration designs are rarely portable across datasets, which limits out-of-the-box use.
14.2. Stability–Expressivity Trade-Offs
Persistent homology is stable: small perturbations of the input (for example, of a filtration function in the sup norm) produce correspondingly small changes in the persistence diagram (in the bottleneck distance). However, this robustness comes at the cost of reduced expressivity, because topological summaries can discard geometric and metric details that may be useful for learning. In contrast, more expressive representations, such as learned or high-dimensional vector representations, might improve performance but generally do not inherit the stability guarantees of persistence. Therefore, balancing robustness, discriminative ability, and interpretability remains an open problem.
14.3. Scalability and Computational Constraints
Despite algorithmic progress, computational scalability remains a bottleneck, especially when Vietoris–Rips complexes are built from large point clouds or high-dimensional data. In practice, pipelines often rely on subsampling, sparsification, or approximate algorithms, which may introduce bias or lose fine-scale information. Although cubical complexes alleviate these issues for image data, scalable algorithms for general metric spaces remain an active research area.
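As one example of sparsification, greedy maxmin (farthest-point) landmark selection picks points that spread over the data and reports a covering radius r, the largest distance from any point to its nearest landmark. Under the standard stability bound for Vietoris–Rips filtrations indexed by edge length, the diagrams of the landmark set and of the full set differ by at most 2r in bottleneck distance, so r gives a simple check on how much fine-scale structure may have been lost. The function name and toy points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def maxmin_landmarks(points: Sequence[Tuple[float, ...]], k: int, seed: int = 0):
    """Greedy farthest-point sampling; returns landmark indices and covering radius."""
    chosen: List[int] = [seed]
    # nearest[i] = distance from point i to its closest landmark so far
    nearest = [math.dist(p, points[seed]) for p in points]
    while len(chosen) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[nxt] == 0.0:
            break  # every point already coincides with a landmark
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen, max(nearest)


line = [(float(i), 0.0) for i in range(11)]  # hypothetical evenly spaced points
landmarks, radius = maxmin_landmarks(line, k=3)
print(landmarks, radius)  # [0, 10, 5] 2.0
```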
14.4. Vectorization and Loss of Interpretability
Most learning applications rely on vectorizing persistence outputs (Stage IV). While summaries such as landscapes, entropy, and codebooks enable compatibility with conventional ML frameworks, they can weaken the geometric interpretability of topological features. This trade-off directly affects one of the original motivations of topological data analysis.
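Persistence landscapes illustrate how a diagram becomes a fixed-length vector. Each bar (b, d) defines a tent function max(0, min(t - b, d - t)), and the k-th landscape λ_k(t) is the k-th largest tent value at t. The sketch below samples λ_1, ..., λ_{k_max} on a grid; the diagram is hypothetical, and the grid and `k_max` are assumptions that a study would need to report, since they determine how much geometric detail survives vectorization.

```python
from typing import List, Sequence, Tuple


def landscape(diagram: Sequence[Tuple[float, float]], k_max: int,
              grid: Sequence[float]) -> List[List[float]]:
    """Sample persistence landscapes lambda_1..lambda_{k_max} on a grid."""
    rows = []
    for k in range(k_max):
        row = []
        for t in grid:
            # Tent function of each bar: max(0, min(t - birth, death - t)).
            tents = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram), reverse=True)
            row.append(tents[k] if k < len(tents) else 0.0)
        rows.append(row)
    return rows


diagram = [(0.0, 4.0), (1.0, 3.0)]  # hypothetical H1 diagram
for k, row in enumerate(landscape(diagram, k_max=2, grid=[0, 1, 2, 3, 4]), start=1):
    print(f"lambda_{k}:", row)
# lambda_1: [0.0, 1.0, 2.0, 1.0, 0.0]
# lambda_2: [0.0, 0.0, 1.0, 0.0, 0.0]
```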
14.5. Evaluation, Validation, and Reproducibility
A persistent concern in the literature is the lack of standardized evaluation protocols. Reported performance is often based on small or domain-specific datasets. Moreover, sensitivity analyses for filtration parameters and embedding choices are not always well documented. Progress in this area requires clearer reporting of design decisions and benchmark datasets tailored to topological approaches.
Table 9 summarizes a minimum reporting standard for practical reproducibility in TDA studies.
To make Table 9 operational rather than merely prescriptive, Table 10 applies it to three representative articles that are central to the application survey and correspond to references [9,10,56] in the current reference list. The audit is intentionally conservative: “not clearly reported” means that the item is not sufficiently explicit in the article-level reporting reviewed here to permit direct reproduction by an independent reader, not that the information was necessarily unavailable to the original authors.
In this small audit, all three representative studies describe the scientific dataset or domain setting, and all provide at least a conceptual description of the TDA construction. However, none clearly satisfies the full checklist in Table 9: open executable code, random seeds, complete parameter grids, hardware and runtime information, and scripts for regenerating the main figures or tables are not consistently reported. This reinforces the practical message of the survey: transparent TDA reporting should include not only diagrams and accuracy values but also the exact filtration design, stochastic choices, computational budget, and reproducible analysis scripts.
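One lightweight way to act on such a checklist is to emit a machine-readable record alongside every run. The Python sketch below is hypothetical: the `TDARunRecord` fields and the `run_with_record` helper are our illustration, not the schema of Table 9 or of any library. It routes all randomness through one seeded generator and captures wall-clock time and peak memory with the standard-library `time` and `tracemalloc` modules.

```python
import json
import platform
import random
import time
import tracemalloc
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class TDARunRecord:
    """Hypothetical reporting record; field names are illustrative, not a published schema."""
    dataset: str
    n_points: int
    filtration: str               # e.g. "vietoris-rips", "alpha", "cubical"
    params: Dict[str, Any]        # full parameter grid or the chosen setting
    seed: int                     # single seed driving every stochastic choice
    python_version: str = field(default_factory=platform.python_version)
    platform_info: str = field(default_factory=platform.platform)
    wall_clock_s: float = 0.0
    peak_traced_bytes: int = 0    # Python-level allocations only (see note below)


def run_with_record(step: Callable[[random.Random], Any],
                    record: TDARunRecord) -> Tuple[Any, str]:
    """Run one pipeline step with a seeded RNG and return (result, JSON record)."""
    rng = random.Random(record.seed)
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = step(rng)
    finally:
        record.wall_clock_s = time.perf_counter() - start
        record.peak_traced_bytes = tracemalloc.get_traced_memory()[1]
        tracemalloc.stop()
    return result, json.dumps(asdict(record), sort_keys=True)
```

Two runs with the same seed return the same subsample, which is the property an independent reader needs to regenerate a figure. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory used inside compiled backends may be under-reported and an operating-system-level measurement may be preferable; library versions can be added to the record with `importlib.metadata.version`.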
These limitations show that TDA is usually best treated as an inductive bias rather than a replacement for full statistical modeling. Small design choices in representation, filtration, and vectorization can materially change conclusions. Credible TDA studies therefore require transparent reporting, sensitivity analyses, and external validation whenever possible. For non-specialists, the central message is that methodological discipline matters as much as the topological tool itself.
15. Open Challenges and Future Directions
Despite rapid progress, several fundamental questions remain. In this section, we organize the main open challenges into thematic categories.
15.1. Open Problems and Research Directions
We group current research priorities into five interconnected themes.
15.1.1. Scalability
A central challenge is to make TDA reliable for large datasets through sparse or approximate constructions, GPU-accelerated computation, and streaming TDA pipelines that update summaries online.
15.1.2. Statistical Foundations
Key questions concern calibrated confidence sets for persistence diagrams and rigorous hypothesis testing under finite-sample and dependent-data regimes.
15.1.3. Integration with AI
An active direction is coupling TDA components with DL and graph neural networks so that topological signals improve generalization without destabilizing training.
15.1.4. Interpretability
Topological descriptors can support explainable machine learning by linking model predictions to robust geometric and structural patterns in data.
15.1.5. Multiparameter Persistence
Major open problems include representational choices, invariant design, and severe computational complexity in multifiltration settings.
15.2. Scalability and Streaming Data
Persistent homology computation for large Vietoris–Rips complexes is still a bottleneck, although recent algorithms extend the feasible range. For example, Aggarwal et al. [185] presented an efficient and scalable algorithm for computing the persistent homology of sparse Vietoris–Rips complexes on large datasets, including a high-resolution human-genome application based on a genome-wide Hi-C dataset of approximately three million points.
The central practical issue is the accuracy–efficiency trade-off. Exact Vietoris–Rips persistence is attractive because it is simple and reproducible, but the number of simplices grows combinatorially with the number of points, the filtration threshold, and the maximum homology dimension. Approximation strategies reduce this burden in different ways: sampling and data-reduction methods replace the full point cloud with a smaller representative set; skeletonization fixes a low maximum dimension and discards higher-dimensional simplices; sparse Vietoris–Rips constructions prune edges or cofaces; and witness complexes use a small set of landmarks to summarize a much larger set of witness points. These choices improve runtime and memory use, but they may suppress short-lived or spatially localized features. Approximations should therefore be reported together with stability checks, such as repeated landmark selections, bottleneck or Wasserstein distances between diagrams, and downstream ablation tests [3,44,185].
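The combinatorial growth can be made concrete. Computing H0 through Hd requires simplices up to dimension d + 1, so once the threshold exceeds the diameter, a Vietoris–Rips complex on n points contains the sum of C(n, k) for k = 1, …, d + 2 simplices. The Python sketch below evaluates that bound and, for toy inputs only, counts by brute force the simplices actually present at a given threshold; the function names are ours, and real software uses far more efficient enumeration.

```python
import itertools
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def rips_simplex_upper_bound(n_points: int, max_hom_dim: int) -> int:
    """Worst-case simplex count needed for H_0..H_d: all subsets of 1..d+2 vertices."""
    return sum(math.comb(n_points, k) for k in range(1, max_hom_dim + 3))


def rips_simplex_count(points: Sequence[Point], threshold: float, max_hom_dim: int) -> int:
    """Brute-force count of Vietoris-Rips simplices at `threshold` (toy inputs only).

    A vertex set spans a simplex when every pairwise distance is <= threshold.
    """
    n = len(points)
    close = [[math.dist(points[i], points[j]) <= threshold for j in range(n)]
             for i in range(n)]
    count = 0
    for k in range(1, max_hom_dim + 3):
        for simplex in itertools.combinations(range(n), k):
            if all(close[i][j] for i, j in itertools.combinations(simplex, 2)):
                count += 1
    return count
```

For n = 1000 points and homology up to H2, the bound is 41,583,792,250 simplices, roughly 4 × 10^10, which is why lowering the maximum dimension or the threshold is usually the first lever to pull. On the four corners of a unit square with homology up to H1, raising the threshold from 1.0 to 1.5 adds the two diagonals and all four triangles, taking the count from 8 to 14 simplices.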
Table 11 suggests a practical decision rule for end users. Start with an exact Ripser computation in low dimension on a controlled subset to establish a reproducible baseline. If memory or runtime becomes prohibitive, first lower the maximum dimension and filtration diameter, then compare at least two approximations—for example, CLA or farthest-point sampling versus sparse Rips or witness complexes—and keep the cheapest option whose persistence diagrams and downstream scores remain stable. Use GUDHI when the data naturally require alpha, cubical, witness, or custom filtrations, and use giotto-tda when the main requirement is integration with a machine learning pipeline. In all cases, the report should include the number of points, retained landmarks or edges, maximum homology dimension, filtration threshold, software version, hardware, wall-clock time, peak memory, and sensitivity analysis.
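For the 0-dimensional case, this decision rule can be prototyped without any TDA library, because the finite H0 deaths of a Vietoris–Rips filtration (with an edge entering at its length) are exactly the minimum-spanning-tree edge lengths. The Python sketch below pairs that exact baseline with seeded farthest-point landmark sampling and a deliberately crude stability proxy. The proxy is our own simplification and is not the bottleneck distance; in practice one would compare diagrams with bottleneck or Wasserstein distances from a library such as GUDHI.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def farthest_point_sample(points: Sequence[Point], k: int, seed: int = 0) -> List[Point]:
    """Greedy farthest-point (maxmin) landmark selection from a seeded random start."""
    k = min(k, len(points))
    rng = random.Random(seed)
    chosen = [rng.randrange(len(points))]
    # dist[i] = distance from point i to its nearest chosen landmark
    dist = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return [points[i] for i in chosen]


def h0_deaths(points: Sequence[Point]) -> List[float]:
    """Finite H0 deaths of the Vietoris-Rips filtration (an edge enters at its length).

    These are exactly the minimum-spanning-tree edge lengths; Prim's algorithm, O(n^2).
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    deaths: List[float] = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step > 0:  # every vertex after the root merges one component
            deaths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return sorted(deaths)


def top_death_gap(a: List[float], b: List[float], m: int) -> float:
    """Crude stability proxy (NOT the bottleneck distance): largest gap among the m longest bars."""
    ta = (sorted(a, reverse=True) + [0.0] * m)[:m]
    tb = (sorted(b, reverse=True) + [0.0] * m)[:m]
    return max(abs(x - y) for x, y in zip(ta, tb))
```

On two well-separated clusters, two landmarks already reproduce the dominant H0 bar (the inter-cluster gap) to within about 0.2 while discarding all short within-cluster bars, which is the trade-off described above in miniature.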
In view of these trade-offs, future research directions should include:
Randomized and sketch-based approximations of persistence diagrams.
Multi-resolution and hierarchical methods that focus computational effort on informative regions of parameter space.
Streaming algorithms that update topological summaries incrementally as data arrive.
Integration with hardware accelerators, such as graphics processing units (GPUs) and tensor processing units (TPUs), and with distributed frameworks, which is likely to be crucial for real-time applications such as online risk monitoring and cyber–physical system control.
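The simplest instance of an incremental topological summary is easy to state: when the edges of a graph arrive one at a time, β0 (the number of connected components) can be maintained with union-find, and for a graph β1 = |E| − |V| + β0 (the cycle rank). The Python sketch below illustrates that idea with a hypothetical class name. It covers only 1-dimensional complexes, assumes each edge arrives once, and supports insertions only; handling deletions and higher-dimensional simplices is where genuine streaming and zigzag-style methods become necessary.

```python
from typing import Dict, Hashable, Tuple


class StreamingGraphBetti:
    """Hypothetical helper: Betti numbers of a graph that grows one edge at a time."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}  # union-find forest
        self.n_vertices = 0
        self.n_edges = 0
        self.components = 0  # current beta_0

    def add_vertex(self, v: Hashable) -> None:
        if v not in self.parent:
            self.parent[v] = v
            self.n_vertices += 1
            self.components += 1

    def _find(self, v: Hashable) -> Hashable:
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def add_edge(self, u: Hashable, v: Hashable) -> None:
        """Insert one edge; assumes the same edge is never inserted twice."""
        self.add_vertex(u)
        self.add_vertex(v)
        self.n_edges += 1
        ru, rv = self._find(u), self._find(v)
        if ru != rv:  # the edge merges two components
            self.parent[ru] = rv
            self.components -= 1
        # otherwise the edge closes a cycle, raising beta_1 by one

    def betti(self) -> Tuple[int, int]:
        beta0 = self.components
        beta1 = self.n_edges - self.n_vertices + beta0  # cycle rank of a graph
        return beta0, beta1
```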
15.3. Statistical Foundations and Limits
Works such as [48] emphasize that TDA-based summaries are not universally informative for statistical inference. Key open problems include:
Developing confidence sets and hypothesis tests for persistence diagrams, Betti curves, and related summaries.
Understanding identifiability and sufficiency conditions for topological statistics.
Designing Bayesian and likelihood-based models that incorporate topological information in a principled way.
Further progress will likely draw on advances in random topology [27], dependence network simulation and inference [49,50], and the study of topology-aware loss functions and regularizers in deep and graphical models [17,18].
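To make the confidence-set problem concrete, the Python sketch below applies a naive percentile bootstrap to one scalar summary, the total H0 persistence of a Vietoris–Rips filtration (equal to the total minimum-spanning-tree length). It is an illustration under our own assumptions, not a recommended procedure: resampling with replacement duplicates points, which shortens the spanning tree, so the resampled statistic is biased downward and the interval need not achieve its nominal coverage. Failures of this kind are exactly why calibrated inference for topological summaries remains open.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def h0_total_persistence(points: Sequence[Point]) -> float:
    """Sum of finite H0 lifetimes of the Vietoris-Rips filtration = total MST length (Prim)."""
    n = len(points)
    if n < 2:
        return 0.0
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]  # best[0] == 0, so the root contributes nothing
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total


def bootstrap_interval(points: Sequence[Point],
                       summary: Callable[[List[Point]], float],
                       n_boot: int = 200, alpha: float = 0.1,
                       seed: int = 0) -> Tuple[float, float]:
    """Naive percentile bootstrap interval for a scalar topological summary."""
    rng = random.Random(seed)
    stats = sorted(summary(rng.choices(points, k=len(points))) for _ in range(n_boot))
    lo = stats[math.floor((alpha / 2) * (n_boot - 1))]
    hi = stats[math.ceil((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi
```

For ten evenly spaced points on a line, the observed total persistence is 9, and every bootstrap replicate is at most 9, so the whole interval sits at or below the observed value.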
15.4. Integration with Modern AI Architectures
The rapid development of TDA and AI, both separately and in combination, is creating new opportunities for methodology and applications. The literature surveyed here reports successful integrations of TDA with deep learning, graph neural networks, and hybrid neuro-symbolic architectures [15,16,17,18,19,193]. Future directions include:
Differentiable persistent homology layers with better gradient properties and scalability.
Topological regularizers that enforce global constraints (e.g., connectivity, number of holes) in generative and discriminative models.
Topology-aware transformers and foundation models that operate over graphs, manifolds, and multimodal data.
End-to-end pipelines where topological and neural components co-adapt during training.
Quantum-inspired and quantum-accelerated TDA methods [51,52] may also play a role in large-scale or resource-constrained scenarios, provided that data-loading and complexity constraints can be adequately addressed.
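The principle behind differentiable persistence layers and topological regularizers can be seen in the simplest case. For a Vietoris–Rips filtration, each finite H0 death equals the length of one minimum-spanning-tree edge, so the gradient of total H0 persistence with respect to the point coordinates flows only through those edges. The Python sketch below computes that loss and its gradient by hand. It assumes distinct points and a unique spanning tree (the loss is not differentiable where the tree changes), and it is not the implementation of any particular library.

```python
import math
from typing import List, Sequence, Tuple


def mst_edges(points: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Prim's algorithm; each returned edge is the one that kills an H0 class."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    link = [-1] * n  # tree neighbour through which each vertex was reached
    best[0] = 0.0
    edges: List[Tuple[int, int]] = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges


def total_h0_persistence_and_grad(
        points: Sequence[Sequence[float]]) -> Tuple[float, List[List[float]]]:
    """Total H0 persistence and its gradient w.r.t. every coordinate (distinct points assumed)."""
    value = 0.0
    grad = [[0.0] * len(p) for p in points]
    for u, v in mst_edges(points):
        d = math.dist(points[u], points[v])
        value += d
        for k in range(len(points[u])):
            g = (points[u][k] - points[v][k]) / d  # derivative of d w.r.t. x_u[k]
            grad[u][k] += g
            grad[v][k] -= g
    return value, grad
```

Adding λ times this quantity to a training loss pulls points into tighter clusters, while subtracting it pushes clusters apart. On a generic configuration, a small gradient step reduces the value, and a central finite-difference check agrees with the analytic gradient.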
Future progress in TDA depends on advances in scalability, stronger statistical foundations, and better integration with modern AI architectures. Faster algorithms alone are not enough unless uncertainty quantification and validation standards also improve. Likewise, topology-aware neural models are promising only if they remain interpretable and computationally practical. The field is moving from proof-of-concept successes toward robust and deployable methodology.
16. Conclusions
Topological data analysis has evolved from a specialized mathematical toolkit into a broader framework used across data science, ML, and scientific computing. Historically, the core notions of persistent homology, persistence diagrams, and barcodes emerged between the 1990s and the 2010s, and recent technological advances have made both the theoretical and the practical study of TDA increasingly important. The literature surveyed here demonstrates the versatility and relevance of TDA, including applications to change-point detection in financial markets, improved forecasting and recommendation systems, analysis of biological and neuronal networks, interpretation of DL systems, defense against poisoning attacks, and resilience analysis of infrastructure networks.
At the same time, a critical synthesis of recent evidence indicates that TDA works most consistently as an inductive bias and structural descriptor, not as a universal stand-alone predictor. Strong outcomes are most frequent in settings with multiscale geometry, heterogeneous subpopulations, and noisy or partially labeled data, especially when topological features are fused with domain-informed models. Conversely, unstable filtration design, limited benchmarking, and scalability bottlenecks remain recurrent failure modes. Addressing these issues requires coordinated progress in algorithms, statistical validation, and topology-aware AI architectures, together with closer collaboration between mathematicians, domain scientists, and ML practitioners.
To conclude with an actionable agenda, we replace broad claims with concrete research questions that can guide near-term work:
- (Q1) How can persistent homology pipelines for large Vietoris–Rips complexes be made truly scalable (e.g., in streaming and distributed settings) while retaining provable approximation guarantees and bounded memory?
- (Q2) Which statistical procedures provide calibrated uncertainty quantification for persistence-based summaries (diagrams, landscapes, Betti curves) under realistic assumptions such as dependence, heteroskedastic noise, and finite samples?
- (Q3) How should filtrations and hyperparameters be selected in a data-driven yet interpretable way, and what sensitivity analysis protocols should be reported as a minimum standard for reproducible TDA studies?
- (Q4) What benchmark design (datasets, tasks, metrics, ablations) best isolates the added value of topological features over strong non-topological baselines across domains?
- (Q5) How can differentiable topological modules be integrated into modern deep and graph architectures so that training remains stable, computationally feasible, and scientifically interpretable?
Answering these questions would move TDA from promising demonstrations toward robust, validated, and routinely deployable methodology across scientific and engineering applications. Practitioners should nevertheless treat TDA as a sensitivity-dependent component rather than a plug-and-play predictor: filtration and embedding hyperparameters can change the resulting diagrams, vectorization can dilute localized topological signals, and the absence of standardized validation protocols makes cross-study comparisons fragile.