Topology-Based Machine Learning and Regime Identification in Stochastic, Heavy-Tailed Financial Time Series
Abstract
1. Introduction
2. Theoretical Framework
2.1. Delay Embedding of Stochastically Forced Time Series
2.2. Temporal Topological Networks
2.3. Clustering as a Statistical Operator on Temporal Topology
2.4. Temporal Scales of Topological Organization
- Local scale: Short-range dependence appears as fine-scale fluctuations in , and low-lag autocorrelation of deviation signals as for small . Although computed on short windows, aggregates topological information across landscapes, providing mesoscopic resolution at a local time scale.
- Mesoscopic scale: Intermediate- and long-range dependence governing regime persistence appears as a sustained elevation in , persistent bursts in , and densely connected sub-graphs in linking windows with similar reconstructed geometry.
- Global scale: Captured by network-level statistics derived from , including cluster coherence, cross-regime separation, tail dependence, and graph-based metrics.
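The local-scale quantities above can be illustrated with a short sketch. Assuming the window-wise topological functional has already been computed as a 1-D series (e.g., landscape L2-norms), the deviation signal and its low-lag autocorrelation are a few lines of numpy; `deviation_signal` and the moving-average span are our illustrative choices, not the paper's exact construction.

```python
import numpy as np

def deviation_signal(norms, span):
    """Deviation of a window-wise functional (e.g. landscape L2-norms)
    from its moving average -- a hypothetical helper, not the paper's
    exact construction."""
    norms = np.asarray(norms, dtype=float)
    kernel = np.ones(span) / span
    trend = np.convolve(norms, kernel, mode="same")
    return norms - trend

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D series at a given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    if lag == 0:
        return 1.0
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)
```

A slowly varying functional then shows high low-lag autocorrelation in its deviation signal, which is the short-range dependence the local scale is meant to capture.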
2.5. Topological Null-Models
2.6. Temporal Topological Null-Models
- (1) Geometric structure of individual windows;
- (2) Temporal organization of evolving topological features;
- (3) Economic significance of clusters.
3. Materials and Methods
- (1) Geometric null: topological features arise from random point-cloud geometry within windows.
- (2) Temporal null: observed structure is indistinguishable from random temporal ordering of features.
- (3) Clustering label null: regime assignments arise from random labeling rather than persistent structure.
3.1. Synthetic Data Generation
3.2. Tests of Local Homeomorphic Delay-Embeddings
- Injectivity: Stability of persistence diagrams, landscapes, and Fill-factor criteria.
- Noise regularity: Smooth variation in delay vectors, autocorrelation, and Mutual Information.
- Finite-dimensional forcing: Inter-window autocorrelation differences.
- Lipschitz stability of topological features: Bottleneck and Hilbert-space distances.
3.3. Jacobian-Based Diagnostics of Local Diffeomorphic Embedding
- Local invertibility: Non-singularity of the embedding Jacobians.
- Bounded distortion: Local bi-Lipschitz behavior quantified via Jacobian condition-number bounds.
- Smooth variation: Temporal stability of differential quantities across windows, ensuring that the local linearization does not fluctuate erratically across the window.
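A minimal sketch of these Jacobian diagnostics, assuming the delay vectors and their temporal successors are available as arrays (function names, the neighborhood size k, and the thresholds are ours, not the paper's exact settings):

```python
import numpy as np

def knn_jacobian(points, successors, i, k=10):
    """Estimate the local Jacobian of the time-shift map at point i by
    least squares over its k nearest neighbors (a sketch of the
    kNN-linearization described in Appendix B)."""
    d2 = np.sum((points - points[i]) ** 2, axis=1)
    nbrs = np.argsort(d2)[1:k + 1]            # exclude the point itself
    X = points[nbrs] - points[i]              # local displacements
    Y = successors[nbrs] - successors[i]      # successor displacements
    J, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return J.T                                # so that Y ~ X @ J.T

def jacobian_ok(J, tol=1e-8, kappa_max=1e3):
    """Local invertibility (smallest singular value) and bounded
    distortion (condition number) checks on an estimated Jacobian."""
    s = np.linalg.svd(J, compute_uv=False)
    return s[-1] > tol and s[0] / s[-1] < kappa_max
```

On exactly linear dynamics the least-squares step recovers the true map, which is the sense in which the estimator is consistent for small neighborhoods.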
3.4. Computation of Topological Functionals
- (1) Vietoris–Rips (VR): yields and persistence; computed using the giotto-tda library (version 0.6.2) in Python 3.10.15;
- (2) One-Pass K-Clusters (1PKF) [49]: a graph-based filtration, which by construction yields persistence only.
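For readers without giotto-tda at hand, the H0 side of a VR filtration can be illustrated in a few lines: by the elder rule, the finite H0 death times of a point cloud equal the edge lengths of its Euclidean minimum spanning tree. This is an illustrative stand-in, not the pipeline used in the paper.

```python
import numpy as np

def h0_deaths(points):
    """Finite H0 death times of the Vietoris-Rips filtration on a point
    cloud, via Prim's minimum-spanning-tree algorithm (the MST edge
    lengths coincide with the H0 deaths)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()          # cheapest connection to the tree
    deaths = []
    for _ in range(n - 1):
        best[in_tree] = np.inf
        j = np.argmin(best)        # next vertex merged into the tree
        deaths.append(best[j])
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return sorted(deaths)
```

Two well-separated clusters produce one long bar (the inter-cluster merge) and many short ones, which is exactly the H0 signature the regime analysis exploits.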
3.5. Unsupervised Learning
3.6. Geometric Null Hypothesis
3.7. Temporal Topological Null Hypothesis Test
3.8. Topological Cluster-Label Null Hypothesis Tests
- (1) Feature-Space or Topological Separability: rejection of the null indicates that clusters arise from genuine changes in Hilbert-space-embedded topological functionals.
- (2) Economic Distinguishability: rejection of the null indicates that clusters correspond to distinct market regimes and tail-risk patterns, rather than arbitrary temporal partitions.
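A minimal sketch of such a cluster-label null test in permutation form, using a two-sample KS statistic on per-window values grouped by cluster label (the paper additionally uses Kruskal–Wallis, Mann–Whitney U, Anderson–Darling, and Levene statistics; function names and the permutation count are our choices):

```python
import numpy as np

def ks_stat(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (max ECDF gap)."""
    a, b = np.sort(a), np.sort(b)
    allv = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, allv, side="right") / len(a)
    cdf_b = np.searchsorted(b, allv, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def label_null_pvalue(values, labels, n_perm=200, seed=0):
    """Cluster-label null: permute regime labels and ask whether the
    observed between-cluster KS statistic is unusually large."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    obs = ks_stat(values[labels == 0], values[labels == 1])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)
        if ks_stat(values[perm == 0], values[perm == 1]) >= obs:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)
```

Rejecting this null says the labels pick out genuinely different distributions, not an arbitrary temporal partition.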
3.9. Left-Tail Power-Law Fitting and Distributional Analysis of Cluster Tails
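The fitting step of this section follows Clauset–Shalizi–Newman [57]: for a fixed lower cutoff, the continuous MLE of the tail exponent has a closed form. A sketch with the cutoff assumed given (its selection and the KS/AD goodness-of-fit checks are separate steps):

```python
import numpy as np

def left_tail_alpha(returns, xmin):
    """Continuous (Hill-type) MLE of the power-law exponent for losses
    exceeding xmin: alpha = 1 + n / sum(log(x / xmin)).  Sketch of the
    fitting step only; xmin selection is handled separately."""
    losses = -np.asarray(returns, dtype=float)   # left tail -> positive losses
    x = losses[losses >= xmin]
    return 1.0 + len(x) / np.sum(np.log(x / xmin))
```

On synthetic Pareto losses the estimator recovers the true exponent up to the usual O(alpha/sqrt(n)) sampling error.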
4. Results of the Numerical Studies
4.1. Ground-Truth Synthetic Price Trajectories
4.2. Local Homeomorphism Tests for Delay Embeddings
4.2.1. Local Injectivity
4.2.2. Noise Regularity and Smoothness
4.2.3. Finite-Dimensional Forcing, Low-Dimensional Flow
4.2.4. Lipschitz Stability of Topological Summaries
4.2.5. Summary of Local Validity
4.3. Jacobian-Based Test of Local Diffeomorphism
4.4. Computation of Topological Functionals
4.5. Geometric Null Hypothesis Test
4.6. Temporal Null: Global Autocorrelation Test
4.7. Unsupervised Learning: Heavy-Tailed K-Means Clustering
4.8. Temporal Null Hypothesis Tests
4.8.1. Ground-Truth Alignment
4.8.2. Unsupervised Clustering Metrics
4.8.3. Upper- and Lower-Tail Dependence
4.9. Cluster-Label Null Hypothesis Tests
4.9.1. Feature-Space Validation
4.9.2. Economic Meaning Validation
4.9.3. Empirical Calibration of Cluster-Label Tests
4.10. Left-Tail Power-Law Fitting and Distributional Analysis of Cluster Tails
5. Discussion
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Tests of Local Homeomorphic Delay Embeddings
- Injectivity across embedding dimensions: The embedding dimension is sufficient to avoid geometric self-intersections, so that reconstruction of topological features remains stable as additional delay coordinates are added.
- Noise regularity and smoothness of delay vectors: Small stochastic perturbations or sliding-window shifts do not produce large changes in the embedded point clouds, ensuring smooth temporal evolution along the observed trajectory.
- Finite-dimensional forcing: The stochastic input acts through an effectively low-dimensional structure over finite windows, preserving the local geometry so that topological summaries accurately reflect an underlying manifold rather than high-dimensional noise.
- Lipschitz stability of topological summaries: Persistence diagrams and landscapes evolve continuously along the trajectory, with small changes in delay vectors inducing small changes in topological features.
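The constructions referenced above (the delay-embedded point clouds of Appendix A.2 and the sliding windows they are grouped into) reduce to a few lines; parameter names are illustrative:

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Delay-coordinate map: z_t = (x_t, x_{t+tau}, ..., x_{t+(dim-1)tau}).
    A minimal sketch of the point-cloud construction in Appendix A.2."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i:i + n] for i in range(0, dim * tau, tau)], axis=1)

def sliding_windows(cloud, width, stride):
    """Partition the embedded trajectory into overlapping windows, each
    a local point cloud on which persistence is later computed."""
    return [cloud[s:s + width]
            for s in range(0, len(cloud) - width + 1, stride)]
```

Each window's point cloud is then the input to the filtrations of Section 3.4.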
Appendix A.1. Theoretical Conditions (Stark et al. [32])
- (i) the forcing family is finite-dimensional;
- (ii) the noise distribution is absolutely continuous;
- (iii) the observation function is generic (residual);
- (iv) the parameterization of maps is generic; and
- (v) the embedding dimension is sufficient (injectivity).
Appendix A.2. Construction of Delay-Embedded Point Clouds
Appendix A.3. Embedding Dimension Condition (Injectivity)
Appendix A.4. Noise Regularity and Smoothness: Choice of Delay
Appendix A.5. Finite-Dimensional Forcing and Embedding Fidelity
Appendix A.6. Lipschitz Stability of Persistence Diagrams, Landscapes and Lp-Norm Functionals
Appendix A.7. Algorithmic Implementation
| Algorithm A1. Tests of Local Homeomorphic Delay-Embeddings |
| Input: |
| Step 1: Select embedding dimension (Injectivity Test) and window length W. |
| Step 2: Select delay (Noise Regularity / Local Smoothness Test), ensuring that the delay-embedded trajectory varies smoothly in time and that the stochastic forcing does not decorrelate the delay window too rapidly. |
| Step 3: Validate local stationarity (local smoothness and finite-dimensional forcing assumption). The proportion of windows satisfying this inequality provides a quantitative measure of local stationarity. |
| Step 4: Lipschitz stability of persistence diagrams and landscapes (Global Smoothness Test). |
| Output: Stable, interpretable topological features, their -norm functionals, and window-wise state-space geometry. |
| The pseudocode in executable style is provided in Supplementary Materials S2. |
Appendix B. Jacobian-Based Window-Wise Diagnostics of Diffeomorphic Delay-Embeddings
Appendix B.1. Motivation and Theoretical Context
- (1) Theorem 5 (Embedding/Homeomorphism);
- (2) Theorem 6 (Diffeomorphic Embedding).
Appendix B.2. Empirical Interpretation and Diagnostic Approach
- Full-rank differential: The Jacobian of the delay map is non-singular, implying local invertibility via the Inverse Function Theorem.
- Local bi-Lipschitz behavior: The map exhibits bounded geometric distortion.
- Regularity of the Jacobian field: The Jacobian varies smoothly along trajectories, reflecting -regularity.
Appendix B.3. Window-Wise Local Diffeomorphism Diagnostics
- Local invertibility: By the Inverse Function Theorem, if a map has a non-singular Jacobian at a point , i.e., , or equivalently , then it is locally invertible in a neighborhood of and hence locally one-to-one. Empirically, we require , where is the smallest singular value of the estimated Jacobian and is a numerical tolerance accounting for estimation error.
- Bounded local distortion: Local bi-Lipschitz behavior is enforced by bounding the Jacobian condition number . Empirically, this is evaluated as which prevents excessive anisotropic stretching or folding and ensures geometrically non-degenerate local behavior.
- Smooth variation in the Jacobian field: Smoothness of the embedding concerns the temporal regularity of the Jacobian field along trajectories rather than pointwise properties. As a practical, robust proxy of -regularity, we evaluate the window-wise variance of the smallest singular value: . This provides a robust empirical measure of the temporal regularity of the differential structure.
Appendix B.4. Jacobian of the Delay-Coordinate Map
Appendix B.5. Local Jacobian Estimation via kNN Linearization
- 1. Identify the set of k-nearest neighbors of in Euclidean distance.
- 2. Construct local displacement matrices .
- 3. Jacobian estimation: Estimate the local Jacobian via least-squares regression. This procedure provides a consistent approximation to the Jacobian of the induced delay-coordinate dynamics when the embedding is and the neighborhood is sufficiently small. Boundary points lacking valid successors are excluded to avoid finite-sample artifacts.
Appendix B.6. Singular Value Diagnostics
- controls local invertibility;
- quantifies local geometric distortion (bi-Lipschitz behavior).
Appendix B.7. Diffeomorphism Criteria (Theorem 6 Tests)
- 1. Full Rank/Local Invertibility: ensuring that the delay map is locally one-to-one.
- 2. Lipschitz Bounded Distortion: preventing excessive local stretching or folding.
- 3. Smooth Family Condition: ensuring that the family of local Jacobians varies smoothly across time, as required for a family of diffeomorphisms.
Appendix B.8. Probabilistic vs. Mean-Based Window-Level Criterion
- 1. Probabilistic (Exact) Test;
- 2. Mean-based approximation (optional).
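A sketch of the two window-level criteria, assuming the per-point smallest singular values and condition numbers have already been computed (the threshold names tol, kappa_max, and delta are illustrative):

```python
import numpy as np

def window_admissible(sigma_min, kappa, tol=1e-8, kappa_max=1e3, delta=0.05):
    """Probabilistic (exact) criterion: the fraction of points whose
    Jacobian passes both the invertibility and distortion tests must be
    at least 1 - delta."""
    ok = (np.asarray(sigma_min) > tol) & (np.asarray(kappa) < kappa_max)
    return ok.mean() >= 1.0 - delta

def window_admissible_mean(sigma_min, kappa, tol=1e-8, kappa_max=1e3):
    """Optional mean-based approximation: test window-mean Jacobian
    summaries instead of the per-point admissible proportion."""
    return np.mean(sigma_min) > tol and np.mean(kappa) < kappa_max
```

The probabilistic version is robust to a small fraction of outlier points, while the mean-based version can be dominated by a single degenerate Jacobian, which is why it is only an optional approximation.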
| Algorithm A2. Jacobian-Based Window-Wise Diagnostics of Diffeomorphic Delay-Embeddings |
| Input: |
| Step 1: Delay-Coordinate Reconstruction. Construct the delay embedding using the selected parameters . For each time index , define the delay vector . Partition the reconstructed trajectory into overlapping sliding windows ; each window defines a local delay-embedded point cloud . |
| Step 2: Local Jacobian Estimation. For each delay vector with a valid temporal successor, estimate the local Jacobian via least squares over its k-nearest neighbors in . Points with insufficient neighbors or a rank-deficient are excluded. The Jacobian estimates the local linearization of the induced delay-coordinate time-shift map on reconstruction space. |
| Step 3: Singular Value Diagnostics and Local Invertibility. For each estimated Jacobian , evaluate the smallest singular value and the condition number. |
| Step 4: Temporal Regularity of the Jacobian Field. Within each window , assess smoothness of the Jacobian family via the empirical variability of singular values. Define . Temporal regularity is deemed satisfied if , indicating a smoothly varying family of local diffeomorphisms consistent with regularity. |
| Step 5: Window-Level Aggregation and Stability Criterion. For each sliding window , compute the admissible proportion; a window is declared locally diffeomorphic if . Optionally, compute the approximation based on window-mean Jacobian values, declaring a window locally diffeomorphic if . |
| Output: |
References
- Messina, E.; Toscani, D. Hidden Markov models for scenario generation. IMA J. Manag. Math. 2008, 19, 379–401. [Google Scholar] [CrossRef]
- Palma, G.R.; Skoczen, M.; Maguire, P. Asset price movement prediction using empirical mode decomposition and Gaussian mixture models. arXiv 2025, arXiv:2503.20678. [Google Scholar] [CrossRef]
- Horvath, B.; Zacharia, I.; Aitor, M. Clustering Market Regimes Using the Wasserstein Distance. arXiv 2021, arXiv:2110.11848. [Google Scholar] [CrossRef]
- Luan, Q.; Hamp, J. Automated Regime Detection in Multidimensional Time Series Data Using Sliced Wasserstein K-Means Clustering. arXiv 2023, arXiv:2310.01285. [Google Scholar] [CrossRef]
- Cao, Y.; Leung, P.; Monod, A. K-Means Clustering for Persistence Homology. Adv. Data Anal. Classif. 2025, 19, 95–119. [Google Scholar] [CrossRef]
- Sayde, M.; Fahs, J.; Abou-Faycal, I. Heavy-Tailed Linear Regression and K-Means. Information 2025, 16, 184. [Google Scholar] [CrossRef]
- Edelsbrunner, H.; Letscher, D.; Zomorodian, A. Topological persistence and simplification. Discret. Comput. Geom. 2002, 28, 511–533. [Google Scholar] [CrossRef]
- Zomorodian, A.; Carlsson, G. Computing Persistent Homology. Discret. Comput. Geom. 2005, 33, 249–274. [Google Scholar] [CrossRef]
- Leykam, D.; Angelakis, D. Topological data analysis and machine learning. Adv. Phys. X 2023, 8. [Google Scholar] [CrossRef]
- Fama, E.F. Mandelbrot and the Stable Paretian Hypothesis. J. Bus. 1963, 36, 420–429. [Google Scholar] [CrossRef]
- Longin, F. The Asymptotic Distribution of Extreme Stock Market Returns. J. Bus. 1996, 69, 383–408. [Google Scholar] [CrossRef]
- Cont, R. Empirical properties of asset returns: Stylized facts and statistical issues. Quant. Financ. 2001, 1, 223–236. [Google Scholar] [CrossRef]
- Gabaix, X.; Gopikrishnan, P.; Plerou, V.; Stanley, H.E. A Theory of Limited Liquidity and Large Investors Causing Spikes in Stock Market Volatility and Trading Volume. J. Eur. Econ. Assoc. 2007, 5, 564–573. [Google Scholar] [CrossRef]
- Gabaix, X. Power Laws in Economics: An Introduction. J. Econ. Perspect. 2016, 30, 185–206. [Google Scholar] [CrossRef]
- Takens, F. Detecting Strange Attractors in Turbulence. In Dynamical Systems and Turbulence, Warwick 1980; Rand, D.A., Young, L.S., Eds.; Lecture Notes in Mathematics; Springer: Berlin/Heidelberg, Germany, 2006; Volume 898, pp. 366–381. [Google Scholar] [CrossRef]
- Gidea, M.; Katz, Y. Topological Data Analysis of Financial Time Series: Landscapes of Crashes. Phys. A Stat. Mech. Its Appl. 2018, 491, 820–834. [Google Scholar] [CrossRef]
- Gidea, M. Topology Data Analysis of Critical Transitions in Financial Networks. arXiv 2017, arXiv:1701.06081. [Google Scholar] [CrossRef]
- Yen, P.T.W.; Cheong, S.A. Using topological data analysis (TDA) and persistent homology to analyze the stock markets in Singapore and Taiwan. Front. Phys. 2021, 9, 572216. [Google Scholar] [CrossRef]
- Yen, P.T.W.; Xia, K.; Cheong, S.A. Understanding Changes in the Topology and Geometry of Financial Market Correlations during a Market Crash. Entropy 2021, 23, 1211. [Google Scholar] [CrossRef]
- Nie, C.X. Unveiling complex nonlinear dynamics in stock markets through topological data analysis. Phys. A Stat. Mech. Its Appl. 2025, 680, 131025. [Google Scholar] [CrossRef]
- Katz, Y.A.; Biem, A. Time-resolved topological data analysis of market instabilities. Phys. A Stat. Mech. Its Appl. 2021, 571, 125816. [Google Scholar] [CrossRef]
- Akingbade, S.W.; Gidea, M.; Manzi, M.; Nateghi, V. Why topological analysis detects financial bubbles? arXiv 2023, arXiv:2304.06877. [Google Scholar] [CrossRef]
- Ismail, M.R.; Noorani, M.S.M.; Ismail, M.; Razak, F.A.; Alias, M.S. Early warning signals of financial crises using persistent homology. Phys. Stat. Mech. Appl. 2022, 586, 126459. [Google Scholar] [CrossRef]
- Rudkin, S.; Qiu, W.; Dlotko, P. Uncertainty, volatility and the persistence norms of financial time series. Expert Syst. Appl. 2023, 223, 119894. [Google Scholar] [CrossRef]
- Valdivia, A.D. Topological variability in financial markets. Quant. Financ. Econ. 2023, 7, 391–402. [Google Scholar] [CrossRef]
- Ruiz-Ortiz, M.A.; Gómez-Larrañaga, J.C.; Rodríguez-Viorato, J. A persistent-homology-based turbulence index & some applications of TDA on financial markets. arXiv 2022, arXiv:2203.05603. [Google Scholar] [CrossRef]
- Aguilar, A.; Ensor, K. Topology Data Analysis Using Mean Persistence Landscapes in Financial Crashes. J. Math. Financ. 2020, 10, 648–678. [Google Scholar] [CrossRef]
- Aromi, L.L.; Katz, Y.; Vives, J. Topological features of multivariate distributions: Dependency on the covariance matrix. Commun. Nonlinear Sci. Numer. Simul. 2021, 103, 105996. [Google Scholar] [CrossRef]
- Souto, H.G. Topological tail dependence: Evidence from forecasting realized volatility. J. Financ. Data Sci. 2023, 9, 100107. [Google Scholar] [CrossRef]
- Souto, H.G.; Moradi, A. A generalization of the Topological Tail Dependence theory: From indices to individual stocks. Decis. Anal. J. 2024, 12, 100512. [Google Scholar] [CrossRef]
- Gidea, M.; Goldsmith, D.; Katz, Y.; Roldan, P.; Shmalo, Y. Topological recognition of critical transitions in time series of cryptocurrencies. Phys. A Stat. Mech. Its Appl. 2020, 548, 123843. [Google Scholar] [CrossRef]
- Stark, J.; Broomhead, D.S.; Davies, M.E.; Huke, J. Takens embedding theorems for forced and stochastic systems. Nonlinear Anal. Theory Methods Appl. 1997, 30, 5303–5314. [Google Scholar] [CrossRef]
- Stark, J.; Broomhead, D.; Davies, M.; Huke, J. Delay Embeddings for Forced Systems. II. Stochastic Forcing. J. Nonlinear Sci. 2003, 13, 519–577. [Google Scholar] [CrossRef]
- Bubenik, P. Statistical Topological Data Analysis using Persistence Landscapes. J. Mach. Learn. Res. 2015, 16, 77–102. [Google Scholar] [CrossRef]
- Adams, H.; Emerson, T.; Kirby, M.; Neville, R.; Peterson, C.; Shipman, P.; Chepushtanova, S.; Hanson, E.; Motta, F.; Ziegelmeier, L. Persistence images: A stable vector representation of persistent homology. J. Mach. Learn. Res. 2017, 18, 1–35. [Google Scholar] [CrossRef]
- Bobrowski, O.; Skraba, P. A universal null-distribution for topological data analysis. Sci. Rep. 2023, 13, 12274. [Google Scholar] [CrossRef]
- Tan, E.; Algar, S.D.; Corrêa, D.; Stemler, T.; Small, M. Network representations of attractors for change point detection. Commun. Phys. 2023, 6, 340. [Google Scholar] [CrossRef]
- Myers, A.; Muñoz, D.; Khasawneh, F.A.; Munch, E. Temporal network analysis using zigzag persistence. EPJ Data Sci. 2023, 12, 6. [Google Scholar] [CrossRef]
- Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416. [Google Scholar] [CrossRef]
- Jia, H.; Ding, S.; Xu, X.; Nie, R. The latest research progress on spectral clustering. Neural Comput. Appl. 2014, 24, 1477–1486. [Google Scholar] [CrossRef]
- Li, H.; Sewell, D.K. Model-based edge clustering for weighted networks with a noise component. Comput. Stat. Data Anal. 2025, 209, 108172. [Google Scholar] [CrossRef]
- Kontak, M.; Vidal, J.; Tierny, J. Statistical Parameter Selection for Clustering Persistence Diagrams. arXiv 2019, arXiv:1910.08398. [Google Scholar] [CrossRef]
- Carriere, M.; Cuturi, M.; Oudot, S. Sliced Wasserstein Kernel for Persistence Diagrams. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, Australia, 6–11 August 2017; pp. 664–673. [Google Scholar] [CrossRef]
- Davies, T.; Aspinall, J.; Wilder, B.; Tran-Thanh, L. Fuzzy c-Means Clustering for Persistence Diagrams. arXiv 2020, arXiv:2006.02796. [Google Scholar] [CrossRef]
- Cont, R.; Potters, M.; Bouchaud, J.-P. Scaling in Stock Market Data: Stable Laws and Beyond. arXiv 1997, arXiv:cond-mat/9705087. [Google Scholar] [CrossRef]
- Bouchaud, J.P.; Potters, M. Apparent Multifractality in Financial Time Series. arXiv 1999, arXiv:cond-mat/9906347. [Google Scholar] [CrossRef]
- Ding, Z.; Granger, C.W.J.; Engle, R.F. A long memory property of stock market returns and a new model. J. Empir. Financ. 1993, 1, 83–106. [Google Scholar] [CrossRef]
- Robinson, A.; Turner, K. Hypothesis testing for topological data analysis. J. Appl. Comput. Topol. 2017, 1, 241–261. [Google Scholar] [CrossRef]
- Bobrowski, O.; Skraba, P. Cluster Persistence for Weighted Graphs. Entropy 2023, 25, 1587. [Google Scholar] [CrossRef]
- Rand, W.M. Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 1971, 66, 846–850. [Google Scholar] [CrossRef]
- Hubert, L.; Arabie, P. Comparing partitions. J. Classif. 1985, 2, 193–218. [Google Scholar] [CrossRef]
- Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef]
- Caliński, T.; Harabasz, J. A dendrite method for cluster analysis. Commun. Stat. 1974, 3, 1–27. [Google Scholar] [CrossRef]
- Davies, D.L.; Bouldin, D. A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, 1, 224–227. [Google Scholar] [CrossRef] [PubMed]
- Gretton, A.; Borgwardt, K.M.; Rasch, M.J.; Schölkopf, B.; Smola, A. A Kernel Two-Sample Test. J. Mach. Learn. Res. 2012, 13, 723–773. [Google Scholar]
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar] [CrossRef]
- Clauset, A.; Shalizi, C.R.; Newman, M.E.J. Power-law distributions in empirical data. SIAM Rev. 2009, 51, 661–703. [Google Scholar] [CrossRef]
- Cohen-Steiner, D.; Edelsbrunner, H.; Harer, J. Stability of Persistence Diagrams. Discret. Comput. Geom. 2007, 37, 103–120. [Google Scholar] [CrossRef]
- Buzug, T.; Pfister, G. Comparison of algorithms calculating optimal embedding parameters for delay time coordinates. Phys. D 1992, 58, 127–137. [Google Scholar] [CrossRef]
| Regime | Process | Parameter | Value 1 |
|---|---|---|---|
| Bull | Fat-tail | True or False | True = Student t |
| | | Drift | 0.07 |
| | | Volatility | 0.20 |
| | | Deg. Freedom | 20 |
| | | Jump Intensity | 0.5 |
| | | Jump Mean, std. | 0.01, 0.01 |
| Bear | Fat-tail | True or False | True = Student t |
| | | Drift | −0.15 |
| | | Volatility | 0.30 |
| | | Deg. Freedom | 5 |
| | | Jump Intensity | 2.5 |
| | | Jump Mean, std. | −0.02, 0.01 |
| Wasserstein Distances Across Embedding Dimensions | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Homology | d = 3 | d = 4 | d = 5 | d = 6 | d = 7 | d = 8 | d = 9 | d = 10 |
| | 0.002788 | 0.002493 | 0.002318 | 0.00228 | 0.002355 | 0.002528 | 0.002754 | 0.003023 |
| | 0.000411 | 0.000505 | 0.000563 | 0.000588 | 0.0006 | 0.000588 | 0.00058 | 0.000563 |
| Distances Across Embedding Dimensions | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Homology | d = 3 | d = 4 | d = 5 | d = 6 | d = 7 | d = 8 | d = 9 | d = 10 |
| | 0.002601 | 0.00282 | 0.002361 | 0.00174 | 0.002152 | 0.002265 | 0.001906 | 0.001577 |
| | 0.000248 | 0.000305 | 0.000328 | 0.000409 | 0.00031 | 0.000259 | 0.000334 | 0.000287 |
| Window-to-Window Variation | |||||
|---|---|---|---|---|---|
| | Mean | Max | 90th perc. | 95th perc. | 99th perc. |
| 1 | 0.003566 | 0.028987 | 0.006046 | 0.007162 | 0.010146 |
| 2 | 0.003651 | 0.021807 | 0.005936 | 0.006948 | 0.009584 |
| 3 | 0.003653 | 0.022596 | 0.00593 | 0.006932 | 0.009563 |
| 4 | 0.003653 | 0.024183 | 0.00594 | 0.006932 | 0.009482 |
| 5 | 0.003652 | 0.021622 | 0.005939 | 0.006912 | 0.0096 |
| 6 | 0.003647 | 0.022339 | 0.005927 | 0.006961 | 0.009628 |
| 7 | 0.003652 | 0.022091 | 0.005932 | 0.006928 | 0.009602 |
| 8 | 0.003653 | 0.024198 | 0.005936 | 0.006911 | 0.009646 |
| 9 | 0.003653 | 0.023394 | 0.005919 | 0.006961 | 0.009653 |
| 10 | 0.003651 | 0.022713 | 0.005934 | 0.006938 | 0.00968 |
| Percentage of Windows \|ΔACF\| ≤ 0.15 | | | | | |
|---|---|---|---|---|---|
| Criteria | Lag 1 | Lag 2 | Lag 3 | Lag 4 | Lag 5 |
| % Windows \|ΔACF\| ≤ 0.15 | 94.45 | 94.45 | 94.67 | 94.73 | 94.81 |
| Diagnostic | Windows | /Window | Window | Rate | Mean Per Window |
|---|---|---|---|---|---|
| | 1.000 | 0.9982 | 13 | 0.0003 | 0.3439 |
| | 0.997 | 0.9952 | 125 | 0.003 | 31.7410 |
| | 1.000 | - | - | - | 0.0496 |
| All conditions jointly | 0.997 | - | - | - | - |
| Ensemble | Full | Full | Sample | Sample |
|---|---|---|---|---|
| Number diagrams tested | 39,291 | 39,291 | 4000 | 4000 |
| Total ℓ-values computed | 667,947 | 114,785 | 68,000 | 11,772 |
| Mean p-value | 0.6704 | 0.5557 | 0.5704 | 0.5555 |
| Signal points α = 0.1 (uncorrected) | 0 | 133 | 0 | 9 |
| % Signal points (uncorrected) | 0% | 0.1% | 0% | 0.1% |
| Signal points α = 0.1 (Bonferroni) | 0 | 0 | 0 | 0 |
| % Signal points (Bonferroni) | 0% | 0% | 0% | 0% |
| Functional | Statistic | Observed | Surrogate Mean * | Surrogate Std. * | z-Score | p-Value | Significant Lags | Statistical Significance |
|---|---|---|---|---|---|---|---|---|
| | Mean | 1.36 × 10^−7 | −1.38 × 10^−9 | ±2.07 × 10^−8 | 6.64 | 0.0020 | | Yes |
| | | 0.7950 | - | - | - | 0.0020 | | Yes |
| | | 2.1823 | - | - | - | 0.0020 | | Yes |
| | Mean | 1.55 × 10^−4 | 1.69 × 10^−4 | ±1.64 × 10^−7 | −80.6 | 1.000 | | No |
| | | 0.7361 | - | - | - | 0.0020 | | Yes |
| | | 1.6291 | - | - | - | 0.0020 | | Yes |
| Functional | Statistic | Observed | Surrogate Mean * | Surrogate Std. * | z-Score | p-Value | Significant Lags | Statistical Significance |
|---|---|---|---|---|---|---|---|---|
| | Mean | 1.19 × 10^−7 | −3.090 × 10^−9 | 3.27 × 10^−8 | 3.74 | 0.002 | | Yes |
| | | 0.9035 | - | - | - | 0.002 | | Yes |
| | | 4.4984 | - | - | - | 0.002 | | Yes |
| | Mean | 1.53 × 10^−4 | 2.251 × 10^−4 | 2.41 × 10^−7 | −299.79 | 1.000 | | No |
| | | 0.9031 | - | - | - | 0.002 | | Yes |
| | | 6.0798 | - | - | - | 0.002 | | Yes |
| | Windows | Mean Return 5 min | Vol./Std. | Ann. Mean Return | Ann. Vol | Skew | Kurtosis |
|---|---|---|---|---|---|---|---|
| Cluster 0 | 26,543 | 5.093 × 10^−7 | 0.001445 | 0.01 | 0.201096 | −0.004068 | 0.341958 |
| Cluster 1 | 12,728 | −1.711 × 10^−5 | 0.00195 | −0.3362 | 0.274687 | −0.003642 | 4.538515 |
| | Windows | Mean Return 5 min | Vol./Std. | Ann. Mean Return | Ann. Vol | Skew | Kurtosis |
|---|---|---|---|---|---|---|---|
| Cluster 0 | 25,047 | −0.000004 | 0.001400 | −0.0786 | 0.196630 | 0.004704 | 2.416410 |
| Cluster 1 | 14,245 | −0.000008 | 0.001952 | −0.1573 | 0.273702 | −0.015169 | 4.981497 |
| | VR-Filtration | | | 1PFK-Filtration | | |
|---|---|---|---|---|---|---|
| Metrics | Observed | Mean Null | p-Value | Observed | Mean Null | p-Value |
| Total Accuracy (TA) | 0.8026 | 0.5708 | 0.0000 | 0.7545 | 0.5549 | 0.0000 |
| Bear-Recall (BR) | 0.7149 | 0.3231 | 0.0000 | 0.6951 | 0.3630 | 0.0000 |
| Adjusted Rand Index (ARI) | 0.3593 | 0.0001 | 0.0000 | 0.2501 | 0.0000 | 0.0000 |
| | VR-Filtration | | | 1PFK-Filtration | | |
|---|---|---|---|---|---|---|
| Metrics | Observed | Mean Null | p-Value | Observed | Mean Null | p-Value |
| Silhouette (S) | 0.5348 | −0.0003 | 0.000 | 0.5472 | −0.0005 | 0.000 |
| Calinski–Harabasz (CH) | 24,187.0 | 0.9391 | 0.000 | 28,823.3 | 0.9412 | 0.000 |
| Davies–Bouldin Index (DB) | 0.7553 | 570.32 | 0.000 | 0.7186 | 643.32 | 0.002 |
| MMD between cluster separation | 0.70709 | 0.0093 | 0.000 | 0.7596 | 0.0092 | 0.000 |
| MMD within-cluster 0 self-similarity | 0.00148 | 0.0093 | 0.5374 | 0.00121 | 0.0092 | 0.5391 |
| MMD within-cluster 1 self-similarity | 0.00160 | 0.0093 | 0.4609 | 0.00156 | 0.0092 | 0.4534 |
| | VR-Filtration | | | 1PFK-Filtration | | |
|---|---|---|---|---|---|---|
| Metrics | Observed | Mean Null | p-Value | Observed | Mean Null | p-Value |
| Upper Tail-Dependence | 0.034 | 0.0249 | 0.7850 | 0.0183 | 0.0249 | 0.4930 |
| Lower Tail-Dependence | 0.0127 | 0.025 | 0.1730 | 0.0230 | 0.0256 | 0.3300 |
| | | VR-Filtration | | | 1PFK-Filtration | | |
|---|---|---|---|---|---|---|---|
| Tests | Metric | Observed | Mean Null | p-Value | Observed | Mean Null | p-Value |
| Kruskal–Wallis | L2_H0 | 25,007 | 0.9950 | 0.0010 | 27,241 | 0.9847 | 0.0010 |
| Kruskal–Wallis | L2_H1 | 223.62 | 0.9636 | 0.0010 | - | - | - |
| Kruskal–Wallis | | 504.01 | 1.0003 | 0.0010 | - | - | - |
| Mann–Whitney U | L2_H0 | 1.69 × 10^8 | 8.39 × 10^5 | 0.0010 | 1.78 × 10^8 | 0.86 × 10^10 | 0.0010 |
| Mann–Whitney U | L2_H1 | 1.57 × 10^7 | 8.45 × 10^5 | 0.0010 | - | - | - |
| Mann–Whitney U | | 2.36 × 10^7 | 8.68 × 10^5 | 0.0010 | - | - | - |
| Kolmogorov–Smirnov | L2_H0 | 0.9990 | 0.0094 | 0.0010 | 1.000 | 0.0090 | 0.0010 |
| Kolmogorov–Smirnov | L2_H1 | 0.1046 | 0.0093 | 0.0010 | - | - | - |
| Kolmogorov–Smirnov | | 0.1196 | 0.0094 | 0.0010 | - | - | - |
| Anderson–Darling | L2_H0 | 18,421.4 | −0.0182 | 0.0010 | 19,029 | −0.0212 | 0.0010 |
| Anderson–Darling | L2_H1 | 313.89 | 0.0170 | 0.0010 | - | - | - |
| Anderson–Darling | | 408.87 | −0.0153 | 0.0010 | - | - | - |
| | | VR-Filtration | | | 1PFK-Filtration | | |
|---|---|---|---|---|---|---|---|
| Tests | Metric | Observed | Mean Null | p-Value | Observed | Mean Null | p-Value |
| Kruskal–Wallis | Return | 0.3691 | 1.0037 | 0.5774 | 0.3449 | 1.0173 | 0.8362 |
| Mann–Whitney U | Return | 638,142 | 837,865 | 0.5495 | 201,857 | 875,962 | 0.8691 |
| Kolmogorov–Smirnov | Return | 0.0490 | 0.0091 | 0.0010 | 0.0422 | 0.0090 | 0.0010 |
| Anderson–Darling | Return | 122.15 | 0.0163 | 0.0010 | 118.47 | −0.0099 | 0.0010 |
| Levene | Vol roll. | 5646.48 | 1.1666 | 0.0010 | 4,008.94 | 1.0378 | 0.0010 |
| Levene | Vol ann. | 5648.48 | 0.9749 | 0.0010 | 4,008.94 | 1.0041 | 0.0010 |
| Test Class | Metric Type | VR-Filtration | 1PFK-Filtration | Interpretation |
|---|---|---|---|---|
| Feature-Space | | Size = 0, Power = 1 | Size = 0, Power = 1 | Strong, filtration-robust geometric separation |
| Feature-Space | | Size = 0, Power = 1 | Size = 0, Power = 1 | Persistent topological discrimination |
| Economic (Returns) | Kruskal, Mann–Whitney | Size controlled, Power = 0 | Size controlled, Power = 0 | No systematic mean return differences |
| Economic (Returns) | KS, AD | Size = 0, Power = 1 | Size = 0, Power = 1 | Regime-dependent distributional and tail behavior |
| Economic (Volatility) | Levene | Size = 0, Power = 1 | Size = 0, Power = 1 | Distinct volatility regimes |
| Test Class | | | KS Statistic | KS p-Value | AD Statistic | AD p-Value | Power-Law vs. Exp (R2) | Power-Law vs. Logn (R2) |
|---|---|---|---|---|---|---|---|---|
| Cluster 0 | 7.4438 | 0.034 | 0.0430 | 0.577 | 1.146 | 0.290 | −4.1133 (p = 0.016) | −3.9712 (p = 0.060) |
| Cluster 1 | 4.5813 | 0.039 | 0.0269 | 0.953 | 0.441 | 0.803 | 5.9846 (p = 0.223) | −0.9089 (p = 0.406) |
| Test Class | | | KS Statistic | KS p-Value | AD Statistic | AD p-Value | Power-Law vs. Exp (R2) | Power-Law vs. Logn (R2) |
|---|---|---|---|---|---|---|---|---|
| Cluster 0 | 12.6486 | 0.035 | 0.0568 | 0.647 | 1.1284 | 0.257 | −1.738 (p = 0.018) | −2.741 (p = 0.183) |
| Cluster 1 | 4.9498 | 0.040 | 0.0234 | 0.963 | 0.3322 | 0.890 | 8.924 (p = 0.098) | −0.422 (p = 0.583) |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Lamothe-Fernández, P.; Rojas, E.; Bayuk, A. Topology-Based Machine Learning and Regime Identification in Stochastic, Heavy-Tailed Financial Time Series. Mathematics 2026, 14, 1098. https://doi.org/10.3390/math14071098