# A Topological Machine Learning Pipeline for Classification


## Abstract


## 1. Introduction

## 2. Mathematical Background

### 2.1. Algebraic Topology

### 2.2. Persistent Homology

## 3. Topological Pipeline

### 3.1. Data

#### 3.1.1. Point Cloud Data

#### 3.1.2. Images

#### 3.1.3. Graphs

### 3.2. Filtrations

#### 3.2.1. Filtration for Point Clouds

#### 3.2.2. Filtration for Images

#### 3.2.3. Filtration for Graphs

### 3.3. Persistence Diagrams

### 3.4. Vectorization Methods

#### 3.4.1. Persistence Image

#### 3.4.2. Persistence Landscape

#### 3.4.3. Persistence Silhouette

#### 3.4.4. Betti Curve

### 3.5. Machine Learning Classifiers

### 3.6. Further Improvements

## 4. Results

### 4.1. Dynamic Dataset

### 4.2. MNIST

#### 4.2.1. Height Filtration

#### 4.2.2. Radial Filtration

#### 4.2.3. Density Filtration

#### 4.2.4. Improved Pipeline

#### 4.2.5. Comparison with Other TDA Approaches

### 4.3. FMNIST

### 4.4. COLLAB

## 5. Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Process. Mag. **2017**, 34, 18–42.
2. Monti, F.; Boscaini, D.; Masci, J.; Rodola, E.; Svoboda, J.; Bronstein, M.M. Geometric deep learning on graphs and manifolds using mixture model CNNs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5115–5124.
3. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. **2012**, 25. Available online: https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf (accessed on 1 February 2022).
4. Bergomi, M.G.; Frosini, P.; Giorgi, D.; Quercioli, N. Towards a topological–geometrical theory of group equivariant non-expansive operators for data analysis and machine learning. Nat. Mach. Intell. **2019**, 1, 423–433.
5. Conti, F.; Frosini, P.; Quercioli, N. On the Construction of Group Equivariant Non-Expansive Operators via Permutants and Symmetric Functions. Front. Artif. Intell. **2022**, 5, 786091.
6. Carlsson, G. Topology and data. Bull. Am. Math. Soc. **2009**, 46, 255–308.
7. Lum, P.; Singh, G.; Lehman, A.; Ishkanov, T.; Vejdemo-Johansson, M.; Alagappan, M.; Carlsson, J.; Carlsson, G. Extracting insights from the shape of complex data using topology. Sci. Rep. **2013**, 3, 1236.
8. Tauzin, G.; Lupo, U.; Tunstall, L.; Pérez, J.B.; Caorsi, M.; Medina-Mardones, A.M.; Dassatti, A.; Hess, K. giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration. J. Mach. Learn. Res. **2021**, 22, 1–6.
9. Nielson, J.L.; Paquette, J.; Liu, A.W.; Guandique, C.F.; Tovar, C.A.; Inoue, T.; Irvine, K.A.; Gensel, J.C.; Kloke, J.; Petrossian, T.C.; et al. Topological data analysis for discovery in preclinical spinal cord injury and traumatic brain injury. Nat. Commun. **2015**, 6, 1–12.
10. Chazal, F.; Fasy, B.T.; Lecci, F.; Rinaldo, A.; Wasserman, L. Stochastic convergence of persistence landscapes and silhouettes. In Proceedings of the Thirtieth Annual Symposium on Computational Geometry, Kyoto, Japan, 8–11 June 2014; pp. 474–483.
11. Bubenik, P. Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. **2015**, 16, 77–102.
12. Umeda, Y. Time series classification via topological data analysis. Inf. Media Technol. **2017**, 12, 228–239.
13. Adams, H.; Emerson, T.; Kirby, M.; Neville, R.; Peterson, C.; Shipman, P.; Chepushtanova, S.; Hanson, E.; Motta, F.; Ziegelmeier, L. Persistence images: A stable vector representation of persistent homology. J. Mach. Learn. Res. **2017**, 18, 1–35.
14. Chen, C.; Ni, X.; Bai, Q.; Wang, Y. A topological regularizer for classifiers via persistent homology. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, Naha, Japan, 16–18 April 2019; pp. 2573–2582.
15. Pun, C.S.; Xia, K.; Lee, S.X. Persistent-Homology-based Machine Learning and its Applications—A Survey. arXiv **2018**, arXiv:1811.00252.
16. Corbet, R.; Fugacci, U.; Kerber, M.; Landi, C.; Wang, B. A kernel for multi-parameter persistent homology. Comput. Graph. X **2019**, 2, 100005.
17. Hatcher, A. Algebraic Topology; Cambridge University Press: Cambridge, UK, 2002; p. xii+544.
18. Verri, A.; Uras, C.; Frosini, P.; Ferri, M. On the use of size functions for shape analysis. Biol. Cybern. **1993**, 70, 99–107.
19. Epstein, C.L.; Carlsson, G.E.; Edelsbrunner, H. Topological data analysis. Inverse Probl. **2011**, 27, 120201.
20. Carlsson, G.; Zomorodian, A.; Collins, A.; Guibas, L. Persistence Barcodes for Shapes. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, Nice, France, 8–10 July 2004; Association for Computing Machinery: New York, NY, USA, 2004; pp. 124–135.
21. Frosini, P. A distance for similarity classes of submanifolds of a Euclidean space. Bull. Aust. Math. Soc. **1990**, 42, 407–415.
22. Biasotti, S.; Cerri, A.; Frosini, P.; Giorgi, D.; Landi, C. Multidimensional size functions for shape comparison. J. Math. Imaging Vis. **2008**, 32, 161–179.
23. Akkiraju, N.; Edelsbrunner, H.; Facello, M.; Fu, P.; Mucke, E.; Varela, C. Alpha shapes: Definition and software. In Proceedings of the 1st International Computational Geometry Software Workshop, Minneapolis, MN, USA, 1995; Volume 63, p. 66.
24. Kaczynski, T.; Mischaikow, K.M.; Mrozek, M. Computational Homology; Springer: Berlin/Heidelberg, Germany, 2004; Volume 3.
25. Biasotti, S.; De Floriani, L.; Falcidieno, B.; Frosini, P.; Giorgi, D.; Landi, C.; Papaleo, L.; Spagnuolo, M. Describing shapes by geometrical-topological properties of real functions. ACM Comput. Surv. **2008**, 40, 1–87.
26. Carlsson, G.; Zomorodian, A. The theory of multidimensional persistence. Discret. Comput. Geom. **2009**, 42, 71–93.
27. Edelsbrunner, H.; Harer, J. Persistent homology—A survey. Contemp. Math. **2008**, 453, 257–282.
28. Cohen-Steiner, D.; Edelsbrunner, H.; Harer, J. Stability of persistence diagrams. Discret. Comput. Geom. **2007**, 37, 103–120.
29. The GUDHI Project. GUDHI User and Reference Manual, 3.5.0 ed.; GUDHI Editorial Board, 2022. Available online: https://gudhi.inria.fr/doc/3.5.0/ (accessed on 1 May 2022).
30. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. **1995**, 20, 273–297.
31. Breiman, L. Random forests. Mach. Learn. **2001**, 45, 5–32.
32. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics **1970**, 12, 55–67.
33. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) **1996**, 58, 267–288.
34. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
35. Allen, D.M. The Relationship Between Variable Selection and Data Augmentation and a Method for Prediction. Technometrics **1974**, 16, 125–127.
36. Chung, Y.M.; Lawson, A. Persistence Curves: A Canonical Framework for Summarizing Persistence Diagrams. **2021**. Available online: https://arxiv.org/abs/1904.07768 (accessed on 1 February 2022).
37. Deng, L. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Process. Mag. **2012**, 29, 141–142.
38. Garin, A.; Tauzin, G. A Topological “Reading” Lesson: Classification of MNIST using TDA. In Proceedings of the 2019 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, 16–19 December 2019; pp. 1551–1556.
39. Turner, K.; Mukherjee, S.; Boyer, D.M. Persistent Homology Transform for Modeling Shapes and Surfaces. **2014**. Available online: http://arxiv.org/abs/1310.1030 (accessed on 1 February 2022).
40. Kanari, L.; Dłotko, P.; Scolamiero, M.; Levi, R.; Shillcock, J.; Hess, K.; Markram, H. A Topological Representation of Branching Neuronal Morphologies. Neuroinformatics **2018**, 16, 3–13.
41. Barnes, D.; Polanco, L.; Perea, J.A. A Comparative Study of Machine Learning Methods for Persistence Diagrams. Front. Artif. Intell. **2021**, 4, 681174.
42. Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. **2017**. Available online: http://arxiv.org/abs/1708.07747 (accessed on 1 February 2022).
43. Yanardag, P.; Vishwanathan, S. Deep Graph Kernels. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 1365–1374.
44. Carrière, M.; Chazal, F.; Ike, Y.; Lacombe, T.; Royer, M.; Umeda, Y. PersLay: A Neural Network Layer for Persistence Diagrams and New Graph Topological Signatures. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, Palermo, Italy, 26–28 August 2020; Volume 108, pp. 2786–2796.
45. Kim, T.K. T test as a parametric statistic. Korean J. Anesthesiol. **2015**, 68, 540–546.
46. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods **2020**, 17, 261–272.

**Figure 1.** The sphere $S^2$ (**a**) bounds a 2-dimensional void. The torus $T^2$ (**b**) bounds a 2-dimensional void and two 1-dimensional holes. Images: Geek3 and YassinMrabet via Wikipedia.

**Figure 2.** The pipeline for a topological study of digital data in a Machine Learning context. A filtration associates a persistence diagram to the digital data. The persistence diagram is then vectorized by means of various vectorization methods. Finally, the vector is fed to a Machine Learning classifier.
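The first stages of this pipeline (data → filtration → persistence diagram) can be illustrated with a minimal pure-Python sketch for the degree-0 case: for a Vietoris–Rips filtration of a point cloud, every $H_0$ class is born at scale 0 and dies at an edge length of a minimum spanning tree, plus one essential class that never dies. This is only an illustrative sketch with a hypothetical helper name `h0_diagram`; the paper itself relies on the giotto-tda and GUDHI libraries.

```python
import math

def h0_diagram(points):
    """H0 persistence diagram of the Vietoris-Rips filtration of a point cloud.

    Connected components are all born at scale 0; they die when the minimum
    spanning tree (built here with Prim's algorithm) links them together.
    """
    n = len(points)
    best = {i: math.dist(points[0], points[i]) for i in range(1, n)}
    deaths = []
    while best:
        j = min(best, key=best.get)       # closest point to the current tree
        deaths.append(best.pop(j))
        for i in best:                    # relax distances to the grown tree
            best[i] = min(best[i], math.dist(points[j], points[i]))
    # one essential class (the whole cloud) never dies
    return [(0.0, d) for d in sorted(deaths)] + [(0.0, math.inf)]

# example: three collinear points at x = 0, 1, 3
# h0_diagram([(0, 0), (1, 0), (3, 0)]) → [(0.0, 1.0), (0.0, 2.0), (0.0, inf)]
```

Higher-degree diagrams (loops, voids) require a full boundary-matrix reduction and are best left to the dedicated libraries.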

**Figure 3.** Pipeline application for point cloud data (**a**) and the associated persistence diagram (**b**). In the second and third rows, four different vectorization methods for the same PD: Persistence Images (**c**), Persistence Landscapes (**d**), Persistence Silhouette (**e**), and Betti Curves (**f**).

**Figure 5.** Persistence Diagram (**a**) and three Persistence Images for $(\sigma, n) = (0.1, 5), (0.1, 10), (0.05, 25)$, respectively, in (**b**–**d**).
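In this construction, $\sigma$ is the bandwidth of the Gaussian placed at each diagram point and $n$ the grid resolution. The idea can be sketched in pure Python as below; this is an illustrative approximation rather than the library implementation, and the linear weighting by persistence is one common choice among several.

```python
import math

def persistence_image(diagram, sigma, n):
    """n x n persistence image of a diagram [(birth, death), ...].

    Each point (b, d) is mapped to birth-persistence coordinates (b, d - b),
    weighted by its persistence, and smoothed by a Gaussian of bandwidth sigma.
    """
    pts = [(b, d - b) for (b, d) in diagram]
    x0, x1 = min(b for b, _ in pts), max(b for b, _ in pts)
    y0, y1 = 0.0, max(p for _, p in pts)
    img = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # centre of pixel (i, j) on the birth-persistence plane
            x = x0 + (j + 0.5) * (x1 - x0 + 1e-9) / n
            y = y0 + (i + 0.5) * (y1 - y0 + 1e-9) / n
            img[i][j] = sum(p * math.exp(-((x - b) ** 2 + (y - p) ** 2)
                                         / (2 * sigma ** 2))
                            for (b, p) in pts)
    return img
```

The resulting grid can be flattened into a fixed-length feature vector for any standard classifier.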

**Figure 6.** Persistence Diagram (**a**) and three Persistence Landscapes for $(n, r) = (1, 25), (3, 50), (5, 100)$, respectively, in (**b**–**d**).
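Here $n$ indexes the landscape layer and $r$ the sampling resolution. Each diagram point $(b, d)$ contributes a tent function $\Lambda_{(b,d)}(t) = \max(0, \min(t - b, d - t))$, and the $n$-th landscape takes the $n$-th largest tent value at each $t$. An illustrative pure-Python sketch:

```python
def landscape(diagram, n, t_min, t_max, r):
    """n-th persistence landscape layer, sampled at r evenly spaced points."""
    def tent(b, d, t):                    # triangle peaking at the midpoint
        return max(0.0, min(t - b, d - t))
    ts = [t_min + k * (t_max - t_min) / (r - 1) for k in range(r)]
    out = []
    for t in ts:
        vals = sorted((tent(b, d, t) for (b, d) in diagram), reverse=True)
        out.append(vals[n - 1] if n <= len(vals) else 0.0)
    return out

# example: a single interval (0, 2) gives one tent peaking at t = 1
# landscape([(0.0, 2.0)], 1, 0.0, 2.0, 5) → [0.0, 0.5, 1.0, 0.5, 0.0]
```

Concatenating the first few layers yields a vector whose length is independent of the number of diagram points.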

**Figure 7.** Persistence Diagram (**a**) and three Persistence Silhouettes for $r = 25, 50$, and $100$, respectively, in (**b**–**d**).
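The silhouette with resolution $r$ is a single weighted average of the same tent functions, with weights $w_i = (d_i - b_i)^p$; the power $p$ is a tunable parameter not fixed by the figure. An illustrative pure-Python sketch:

```python
def silhouette(diagram, p, t_min, t_max, r):
    """Power-p silhouette of a diagram, sampled at r evenly spaced points."""
    def tent(b, d, t):
        return max(0.0, min(t - b, d - t))
    weights = [(d - b) ** p for (b, d) in diagram]
    total = sum(weights)
    ts = [t_min + k * (t_max - t_min) / (r - 1) for k in range(r)]
    return [sum(w * tent(b, d, t)
                for w, (b, d) in zip(weights, diagram)) / total
            for t in ts]
```

Larger $p$ emphasizes the most persistent points, so the silhouette interpolates between an average tent ($p$ small) and the dominant feature ($p$ large).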

**Figure 8.** Persistence Diagram (**a**) and three Betti curves for $r = 25, 50$, and $100$, respectively, in (**b**–**d**).
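The Betti curve at resolution $r$ simply counts, at each sampled filtration value $t$, how many intervals of the diagram are alive: $\beta(t) = \#\{i : b_i \le t < d_i\}$. An illustrative pure-Python sketch:

```python
def betti_curve(diagram, t_min, t_max, r):
    """Betti curve: number of intervals alive at each of r sample points."""
    ts = [t_min + k * (t_max - t_min) / (r - 1) for k in range(r)]
    return [sum(1 for (b, d) in diagram if b <= t < d) for t in ts]

# example: intervals (0, 2) and (1, 3) overlap only on [1, 2)
# betti_curve([(0.0, 2.0), (1.0, 3.0)], 0.0, 3.0, 4) → [1, 2, 1, 0]
```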

**Figure 9.** Example of truncated orbits $\{(x_n, y_n),\ n = 0, \dots, 1000\}$ for the first 1000 iterations of the linked twisted map for different $r$. From (**a**) to (**e**), $r$ is, respectively, $2$, $3.5$, $4$, $4.1$, and $4.3$.
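Orbits like these can be generated with one common form of the linked twisted map (a sketch under the assumption that this is the variant used here; the parameter $r$ controls how chaotic the orbit becomes):

```python
def linked_twisted_map(r, x0, y0, n_steps=1000):
    """Truncated orbit {(x_k, y_k), k = 0..n_steps} of a linked twisted map."""
    x, y = x0, y0
    orbit = [(x, y)]
    for _ in range(n_steps):
        x = (x + r * y * (1.0 - y)) % 1.0   # twist in the x direction
        y = (y + r * x * (1.0 - x)) % 1.0   # then twist in the y direction
        orbit.append((x, y))
    return orbit
```

Each orbit is a point cloud on the unit square, so it can be fed directly to the pipeline's Vietoris–Rips stage.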

**Figure 10.** Example of truncated orbits $\{(x_n, y_n),\ n = 0, \dots, 1000\}$ for the first 1000 iterations of the linked twisted map for $r = 4.3$ and five different starting points (**a**–**e**).

**Figure 11.** Sample images from the MNIST dataset. It can be seen at a glance that the homology of different digits is almost always trivial, as in (**b**,**c**), or close to trivial, as in (**a**,**d**,**e**).

**Figure 12.** The eight directions used for the height filtration (**a**) and the resulting filtered images along four directions (**b**–**e**).
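The height filtration described above can be sketched in a few lines: pixels of the digit are assigned the scalar product of their position with the chosen direction $v$, and background pixels get the maximum value, so sublevel sets sweep the image along $v$. This is an illustrative sketch, not the paper's giotto-tda implementation.

```python
def height_filtration(img, v):
    """Height filtration of a binary image along direction v = (v_row, v_col)."""
    rows, cols = len(img), len(img[0])
    vals = [[i * v[0] + j * v[1] for j in range(cols)] for i in range(rows)]
    vmax = max(max(row) for row in vals)
    # active pixels keep their height; background is pushed to the top value
    return [[vals[i][j] if img[i][j] else vmax for j in range(cols)]
            for i in range(rows)]
```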

**Figure 13.** The nine centers used for the radial filtration (**a**) and the resulting filtered images with respect to four different centers (**b**–**e**).
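The radial filtration follows the same pattern, with Euclidean distance to the chosen center replacing the height. Again an illustrative sketch rather than the library implementation:

```python
import math

def radial_filtration(img, center):
    """Radial filtration of a binary image from a chosen center pixel."""
    rows, cols = len(img), len(img[0])
    dist = [[math.dist((i, j), center) for j in range(cols)]
            for i in range(rows)]
    dmax = max(max(row) for row in dist)
    # active pixels get their distance to the center; background the maximum
    return [[dist[i][j] if img[i][j] else dmax for j in range(cols)]
            for i in range(rows)]
```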

**Figure 14.** The original digit “8” (**a**) and the resulting filtered image with respect to the density filtration with radius $r = 6$ (**b**).
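The density filtration assigns to every pixel the number of active pixels within radius $r$, which makes thick strokes stand out in the sublevel sets. A brute-force illustrative sketch (quadratic in image size, fine for 28 × 28 digits):

```python
def density_filtration(img, r):
    """Density filtration: each pixel counts the active pixels within radius r."""
    rows, cols = len(img), len(img[0])
    return [[sum(img[a][b]
                 for a in range(rows) for b in range(cols)
                 if (a - i) ** 2 + (b - j) ** 2 <= r * r)
             for j in range(cols)] for i in range(rows)]
```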

**Figure 15.** (**a**–**e**) Sample images from the FMNIST dataset. Classifying these images is clearly more difficult than with the MNIST dataset.

**Figure 16.** The eight directions used for the height filtration (**a**) and a resulting filtered image (**b**). The nine centers used for the radial filtration and a resulting filtered image (**c**,**d**). A density-filtered image (**e**).

**Figure 17.** A graph of COLLAB (**a**) and the corresponding PD (**b**). For aesthetic reasons, only a small sample of 2-simplexes is shown and edge weights are not displayed. In (**b**), the PDs in dimensions 0–2 are visualized in the same plot: points are red for the PD in dimension 0, blue for dimension 1, and green for dimension 2.
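For graphs such as those in COLLAB, a weight filtration adds edges in order of increasing weight; the 0-dimensional information at threshold $t$ is just the number of connected components of the subgraph kept so far. A union-find sketch of that single filtration step (illustrative, not the paper's implementation):

```python
def components_at(n_vertices, weighted_edges, t):
    """Betti-0 at threshold t: components of the subgraph with edge weight <= t."""
    parent = list(range(n_vertices))

    def find(a):                          # union-find with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    components = n_vertices
    for u, v, w in weighted_edges:
        if w <= t:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
    return components

# example: a weighted triangle merges into one component as t grows
# components_at(3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)], 0.5) → 3
```

Sweeping $t$ over the sorted edge weights recovers the $H_0$ barcode; the dimension-1 and dimension-2 points in (**b**) require clique (flag) complexes on top of the graph.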

**Table 1.**Accuracy for the dynamical system dataset. (PI: persistent image; PL: persistent landscape; BC: Betti curve).

| Accuracy | $H_0$ | $H_1$ | $H_0+H_1$ (Fused) | $H_0+H_1$ (Concat) |
|---|---|---|---|---|
| Run 1 | 0.493 (PI) | 0.960 (PL) | 0.507 (PI) | 0.920 (PL) |
| Run 2 | 0.480 (PI) | 0.880 (PL) | 0.600 (PI) | 0.880 (PL) |
| Run 3 | 0.507 (PI) | 0.933 (PL) | 0.667 (PI) | 0.933 (PL) |
| Run 4 | 0.480 (PI) | 0.907 (BC) | 0.533 (PI) | 0.907 (BC) |
| Run 5 | 0.453 (PI) | 0.960 (PL) | 0.573 (PI) | 0.933 (PL) |
| Run 6 | 0.533 (PI) | 0.920 (PL) | 0.560 (PI) | 0.907 (PL) |
| Run 7 | 0.547 (PI) | 0.960 (PL) | 0.560 (PI) | 0.947 (PL) |
| Run 8 | 0.520 (PI) | 0.947 (PL) | 0.613 (PI) | 0.933 (PL) |
| Run 9 | 0.560 (PI) | 0.907 (PL) | 0.533 (PI) | 0.893 (BC) |
| Run 10 | 0.520 (PI) | 0.933 (PL) | 0.507 (PI) | 0.933 (PL) |
| Mean | 0.509 ± 0.031 | 0.931 ± 0.026 | 0.565 ± 0.048 | 0.919 ± 0.020 |

| Homology | Accuracy | Vectorization | Classifier |
|---|---|---|---|
| $H_0$ | 0.489 | Persistence Images | RandomForestClassifier |
| $H_1$ | 0.921 | Persistence Landscapes | SVC(kernel = ‘rbf’, C = 10) |
| $H_0+H_1$ (fused) | 0.553 | Persistence Images | RandomForestClassifier |
| $H_0+H_1$ (concat) | 0.905 | Persistence Landscapes | RandomForestClassifier |

**Table 3.**Accuracy for MNIST dataset. (PI: persistent image; PL: persistent landscape; PS: persistent silhouette; BC: Betti curve).

| Accuracy | $H_0$ | $H_1$ | $H_0+H_1$ (Fused) | $H_0+H_1$ (Concat) |
|---|---|---|---|---|
| Run 1 | 0.200 (PI) | 0.305 (PL) | 0.355 (PI) | 0.325 (PL) |
| Run 2 | 0.177 (PI) | 0.318 (PL) | 0.346 (PI) | 0.319 (PL) |
| Run 3 | 0.185 (PI) | 0.322 (PL) | 0.349 (PI) | 0.327 (PS) |
| Run 4 | 0.174 (PI) | 0.318 (PL) | 0.354 (PI) | 0.333 (PL) |
| Run 5 | 0.190 (PS) | 0.321 (PL) | 0.353 (PI) | 0.329 (PL) |
| Run 6 | 0.182 (PI) | 0.326 (PL) | 0.356 (PI) | 0.338 (PL) |
| Run 7 | 0.186 (PI) | 0.315 (PL) | 0.342 (PI) | 0.336 (PL) |
| Run 8 | 0.196 (PS) | 0.330 (PL) | 0.364 (PI) | 0.353 (PL) |
| Run 9 | 0.181 (PI) | 0.305 (PL) | 0.355 (PI) | 0.318 (BC) |
| Run 10 | 0.182 (PI) | 0.318 (PL) | 0.355 (PI) | 0.325 (PL) |
| Mean | 0.185 ± 0.008 | 0.318 ± 0.008 | 0.353 ± 0.006 | 0.330 ± 0.010 |

**Table 4.**Accuracy for MNIST dataset of the collapse approach. (PI: persistent image; PL: persistent landscape).

| Accuracy | $H_0$ | $H_1$ | $H_0+H_1$ (Fused) | $H_0+H_1$ (Concat) |
|---|---|---|---|---|
| Run 1 | 0.733 (PI) | 0.629 (PL) | 0.732 (PI) | 0.792 (PI) |
| Run 2 | 0.742 (PI) | 0.632 (PI) | 0.742 (PI) | 0.796 (PI) |
| Run 3 | 0.732 (PI) | 0.612 (PI) | 0.762 (PI) | 0.787 (PI) |
| Run 4 | 0.739 (PI) | 0.639 (PL) | 0.753 (PI) | 0.806 (PI) |
| Run 5 | 0.739 (PI) | 0.622 (PI) | 0.741 (PI) | 0.806 (PI) |
| Run 6 | 0.733 (PL) | 0.620 (PL) | 0.738 (PI) | 0.796 (PI) |
| Run 7 | 0.722 (PI) | 0.649 (PL) | 0.752 (PI) | 0.801 (PL) |
| Run 8 | 0.716 (PI) | 0.635 (PL) | 0.738 (PI) | 0.779 (PI) |
| Run 9 | 0.726 (PL) | 0.634 (PL) | 0.770 (PI) | 0.801 (PI) |
| Run 10 | 0.736 (PI) | 0.626 (PI) | 0.743 (PI) | 0.794 (PL) |
| Mean | 0.732 ± 0.008 | 0.630 ± 0.010 | 0.747 ± 0.011 | 0.796 ± 0.008 |

**Table 5.**Accuracy for MNIST dataset of the multivector approach. (PI: persistent image; PL: persistent landscape; PS: persistent silhouette; BC: Betti curve).

| Accuracy | $H_0$ | $H_1$ | $H_0+H_1$ (Fused) | $H_0+H_1$ (Concat) |
|---|---|---|---|---|
| Run 1 | 0.911 (PL) | 0.614 (PL) | 0.944 (PI) | 0.937 (PL) |
| Run 2 | 0.922 (PL) | 0.620 (PL) | 0.944 (BC) | 0.949 (PL) |
| Run 3 | 0.916 (PL) | 0.610 (PS) | 0.943 (BC) | 0.945 (PL) |
| Run 4 | 0.901 (PL) | 0.619 (PL) | 0.942 (BC) | 0.929 (PS) |
| Run 5 | 0.911 (PL) | 0.601 (PL) | 0.937 (BC) | 0.942 (PL) |
| Run 6 | 0.919 (PL) | 0.616 (PL) | 0.947 (BC) | 0.943 (PL) |
| Run 7 | 0.916 (PL) | 0.630 (PL) | 0.937 (PI) | 0.939 (PL) |
| Run 8 | 0.911 (PL) | 0.615 (PL) | 0.934 (BC) | 0.935 (PS) |
| Run 9 | 0.918 (PL) | 0.617 (PL) | 0.946 (PL) | 0.944 (PL) |
| Run 10 | 0.924 (PL) | 0.625 (PL) | 0.944 (BC) | 0.934 (PL) |
| Mean | 0.915 ± 0.006 | 0.617 ± 0.007 | 0.942 ± 0.004 | 0.940 ± 0.006 |

| Homology | Accuracy | Vectorization | Classifier | Approach |
|---|---|---|---|---|
| $H_0$ | 0.911 | PI | SVC(kernel = ‘rbf’, C = 10) | Multivector |
| $H_1$ | 0.624 | PL | SVC(kernel = ‘rbf’, C = 20) | Collapse |
| $H_0+H_1$ (fused) | 0.938 | PI | SVC(kernel = ‘rbf’, C = 20) | Multivector |
| $H_0+H_1$ (concat) | 0.936 | PI | SVC(kernel = ‘rbf’, C = 20) | Multivector |

| Accuracy | [38] Pipeline | [41] Pipeline |
|---|---|---|
| Run 1 | 0.945 | 0.916 (TF) |
| Run 2 | 0.929 | 0.924 (TF) |
| Run 3 | 0.934 | 0.926 (TF) |
| Run 4 | 0.946 | 0.923 (PI) |
| Run 5 | 0.945 | 0.931 (TF) |
| Run 6 | 0.934 | 0.925 (TF) |
| Run 7 | 0.956 | 0.926 (TF) |
| Run 8 | 0.943 | 0.926 (PI) |
| Run 9 | 0.948 | 0.927 (TF) |
| Run 10 | 0.933 | 0.926 (TF) |
| Mean | 0.941 ± 0.008 | 0.925 ± 0.004 |

| Accuracy | $H_0$ | $H_1$ | $H_0+H_1$ (Fused) | $H_0+H_1$ (Concat) |
|---|---|---|---|---|
| Run 1 | 0.519 (PI) | 0.390 (PI) | 0.538 (PI) | 0.530 (PI) |
| Run 2 | 0.499 (PI) | 0.398 (PI) | 0.544 (PI) | 0.524 (PI) |
| Run 3 | 0.524 (PI) | 0.416 (PI) | 0.553 (PI) | 0.548 (PI) |
| Run 4 | 0.485 (PI) | 0.385 (PI) | 0.498 (PI) | 0.512 (PI) |
| Run 5 | 0.474 (PI) | 0.370 (PI) | 0.508 (PI) | 0.526 (PI) |
| Run 6 | 0.491 (PI) | 0.381 (PI) | 0.511 (PI) | 0.516 (PI) |
| Run 7 | 0.536 (PI) | 0.379 (PI) | 0.547 (PI) | 0.527 (PI) |
| Run 8 | 0.533 (PI) | 0.401 (PI) | 0.560 (PI) | 0.542 (PI) |
| Run 9 | 0.513 (PI) | 0.388 (PI) | 0.542 (PI) | 0.532 (PI) |
| Run 10 | 0.495 (PI) | 0.373 (PI) | 0.523 (PI) | 0.528 (PI) |
| Mean | 0.507 ± 0.020 | 0.388 ± 0.013 | 0.532 ± 0.020 | 0.529 ± 0.010 |

**Table 9.**Accuracy for FMNIST dataset of the collapse approach. (PI: persistent image; PL: persistent landscape).

| Accuracy | $H_0$ | $H_1$ | $H_0+H_1$ (Fused) | $H_0+H_1$ (Concat) |
|---|---|---|---|---|
| Run 1 | 0.642 (PL) | 0.430 (PI) | 0.632 (PL) | 0.662 (PL) |
| Run 2 | 0.627 (PL) | 0.410 (PI) | 0.611 (PL) | 0.640 (PL) |
| Run 3 | 0.636 (PL) | 0.442 (PI) | 0.619 (PL) | 0.672 (PL) |
| Run 4 | 0.630 (PL) | 0.391 (PL) | 0.613 (PL) | 0.654 (PL) |
| Run 5 | 0.634 (PL) | 0.418 (PI) | 0.621 (PL) | 0.661 (PL) |
| Run 6 | 0.620 (PL) | 0.410 (PI) | 0.612 (PL) | 0.642 (PL) |
| Run 7 | 0.637 (PL) | 0.421 (PI) | 0.611 (PL) | 0.662 (PL) |
| Run 8 | 0.640 (PL) | 0.434 (PI) | 0.637 (PL) | 0.662 (PL) |
| Run 9 | 0.646 (PL) | 0.426 (PL) | 0.628 (PI) | 0.656 (PL) |
| Run 10 | 0.627 (PL) | 0.413 (PL) | 0.610 (PL) | 0.646 (PL) |
| Mean | 0.634 ± 0.008 | 0.419 ± 0.014 | 0.619 ± 0.009 | 0.656 ± 0.010 |

**Table 10.**Accuracy for FMNIST dataset of the multivector approach. (PL: persistent landscape; PS: persistent silhouette).

| Accuracy | $H_0$ | $H_1$ | $H_0+H_1$ (Fused) | $H_0+H_1$ (Concat) |
|---|---|---|---|---|
| Run 1 | 0.678 (PL) | 0.431 (PL) | 0.750 (PL) | 0.717 (PL) |
| Run 2 | 0.679 (PL) | 0.420 (PS) | 0.702 (PL) | 0.682 (PL) |
| Run 3 | 0.704 (PL) | 0.448 (PL) | 0.718 (PL) | 0.715 (PL) |
| Run 4 | 0.690 (PL) | 0.408 (PL) | 0.721 (PL) | 0.706 (PL) |
| Run 5 | 0.678 (PL) | 0.418 (PL) | 0.714 (PL) | 0.707 (PL) |
| Run 6 | 0.670 (PL) | 0.397 (PL) | 0.707 (PL) | 0.678 (PL) |
| Run 7 | 0.686 (PL) | 0.412 (PL) | 0.705 (PL) | 0.688 (PL) |
| Run 8 | 0.698 (PL) | 0.425 (PL) | 0.721 (PL) | 0.712 (PL) |
| Run 9 | 0.690 (PL) | 0.438 (PL) | 0.716 (PL) | 0.707 (PL) |
| Run 10 | 0.682 (PL) | 0.414 (PL) | 0.709 (PL) | 0.696 (PL) |
| Mean | 0.686 ± 0.010 | 0.421 ± 0.014 | 0.716 ± 0.013 | 0.701 ± 0.013 |

| Accuracy | [38] Pipeline | [41] Pipeline |
|---|---|---|
| Run 1 | 0.753 | 0.810 (PI) |
| Run 2 | 0.739 | 0.795 (PI) |
| Run 3 | 0.750 | 0.818 (TF) |
| Run 4 | 0.757 | 0.793 (PI) |
| Run 5 | 0.769 | 0.795 (PI) |
| Run 6 | 0.738 | 0.792 (PI) |
| Run 7 | 0.750 | 0.802 (PI) |
| Run 8 | 0.748 | 0.813 (PI) |
| Run 9 | 0.752 | 0.803 (PI) |
| Run 10 | 0.736 | 0.815 (PI) |
| Mean | 0.749 ± 0.009 | 0.804 ± 0.009 |

| Homology | Accuracy | Vectorization | Classifier | Approach |
|---|---|---|---|---|
| $H_0$ | 0.681 | PL | RFC | Multivector |
| $H_1$ | 0.417 | PI | RFC | Collapse |
| $H_0+H_1$ (fused) | 0.716 | PL | RFC | Multivector |
| $H_0+H_1$ (concat) | 0.701 | PL | RFC | Multivector |

**Table 13.**Accuracy for COLLAB dataset. (PI: persistent image; PL: persistent landscape; PS: persistent silhouette; BC: Betti curve).

| Accuracy | $H_0$ | $H_1$ | $H_2$ | $H_0+H_1+H_2$ (Fused) | $H_0+H_1+H_2$ (Concat) |
|---|---|---|---|---|---|
| Run 1 | 0.602 (PI) | 0.549 (PS) | 0.731 (PI) | 0.730 (PI) | 0.730 (PI) |
| Run 2 | 0.613 (PI) | 0.543 (PS) | 0.760 (PI) | 0.759 (PI) | 0.747 (PI) |
| Run 3 | 0.613 (PI) | 0.549 (PS) | 0.741 (PI) | 0.747 (PI) | 0.739 (PI) |
| Run 4 | 0.621 (BC) | 0.542 (PL) | 0.736 (PI) | 0.749 (PI) | 0.737 (PI) |
| Run 5 | 0.628 (BC) | 0.551 (PS) | 0.746 (PI) | 0.758 (PI) | 0.752 (PI) |
| Run 6 | 0.621 (PI) | 0.557 (PS) | 0.759 (PI) | 0.763 (PI) | 0.753 (PI) |
| Run 7 | 0.609 (PI) | 0.550 (PS) | 0.736 (PI) | 0.750 (PI) | 0.734 (PI) |
| Run 8 | 0.626 (BC) | 0.557 (PS) | 0.750 (PI) | 0.751 (PI) | 0.725 (PI) |
| Run 9 | 0.615 (PS) | 0.559 (PS) | 0.745 (PI) | 0.749 (PI) | 0.739 (PI) |
| Run 10 | 0.607 (PS) | 0.567 (PS) | 0.753 (PI) | 0.748 (PI) | 0.739 (PI) |
| Mean | 0.616 ± 0.008 | 0.552 ± 0.007 | 0.746 ± 0.009 | 0.750 ± 0.009 | 0.739 ± 0.009 |

| Homology | Accuracy | Vectorization | Classifier |
|---|---|---|---|
| $H_0$ | 0.613 | Betti Curves | RandomForestClassifier |
| $H_1$ | 0.550 | Persistence Silhouette | RandomForestClassifier |
| $H_2$ | 0.746 | Persistence Images | RandomForestClassifier |
| $H_0+H_1+H_2$ (fused) | 0.749 | Persistence Images | RandomForestClassifier |
| $H_0+H_1+H_2$ (concat) | 0.736 | Persistence Images | RandomForestClassifier |

| p-Value | $H_0$ | $H_1$ | $H_0+H_1$ Fused | $H_0+H_1$ Concat |
|---|---|---|---|---|
| PI vs. PL | $2.03\times 10^{-9}$ | $9.33\times 10^{-3}$ | $1.27\times 10^{-3}$ | $5.06\times 10^{-2}$ |
| PI vs. PS | $2.03\times 10^{-9}$ | $3.44\times 10^{-1}$ | $5.38\times 10^{-1}$ | $7.97\times 10^{-1}$ |
| PI vs. BC | $2.03\times 10^{-9}$ | $4.60\times 10^{-3}$ | $1.27\times 10^{-3}$ | $3.84\times 10^{-3}$ |
| PL vs. PS | Null | $1.28\times 10^{-3}$ | $2.26\times 10^{-4}$ | $1.34\times 10^{-4}$ |
| PL vs. BC | Null | $4.61\times 10^{-2}$ | Null | $2.55\times 10^{-3}$ |
| PS vs. BC | Null | $7.90\times 10^{-3}$ | $2.26\times 10^{-4}$ | $1.37\times 10^{-4}$ |

| p-Value | $H_0$ | $H_1$ | $H_0+H_1$ Fused | $H_0+H_1$ Concat |
|---|---|---|---|---|
| PI vs. PL | $4.55\times 10^{-9}$ | $1.02\times 10^{-8}$ | $1.82\times 10^{-2}$ | $3.39\times 10^{-7}$ |
| PI vs. PS | $8.20\times 10^{-7}$ | $6.32\times 10^{-7}$ | $1.58\times 10^{-2}$ | $3.98\times 10^{-6}$ |
| PI vs. BC | $1.21\times 10^{-3}$ | $8.97\times 10^{-9}$ | $2.73\times 10^{-1}$ | $4.18\times 10^{-5}$ |
| PL vs. PS | $1.06\times 10^{-4}$ | $2.27\times 10^{-2}$ | $6.23\times 10^{-2}$ | $9.62\times 10^{-5}$ |
| PL vs. BC | $1.14\times 10^{-6}$ | $3.62\times 10^{-2}$ | $4.27\times 10^{-2}$ | $6.76\times 10^{-5}$ |
| PS vs. BC | $7.92\times 10^{-5}$ | $4.70\times 10^{-3}$ | $4.91\times 10^{-3}$ | $1.46\times 10^{-3}$ |

| p-Value | $H_0$ | $H_1$ | $H_0+H_1$ Fused | $H_0+H_1$ Concat |
|---|---|---|---|---|
| PI vs. PL | $2.02\times 10^{-2}$ | $6.40\times 10^{-5}$ | $2.77\times 10^{-1}$ | $9.63\times 10^{-3}$ |
| PI vs. PS | $1.34\times 10^{-1}$ | $2.54\times 10^{-5}$ | $1.01\times 10^{-3}$ | $6.45\times 10^{-3}$ |
| PI vs. BC | $3.96\times 10^{-1}$ | $1.96\times 10^{-3}$ | $2.68\times 10^{-2}$ | $8.23\times 10^{-1}$ |
| PL vs. PS | $6.91\times 10^{-3}$ | $7.16\times 10^{-1}$ | $9.79\times 10^{-1}$ | $2.67\times 10^{-1}$ |
| PL vs. BC | $1.53\times 10^{-2}$ | $2.05\times 10^{-6}$ | $3.97\times 10^{-3}$ | $1.66\times 10^{-3}$ |
| PS vs. BC | $6.25\times 10^{-1}$ | $1.04\times 10^{-6}$ | $8.16\times 10^{-3}$ | $3.46\times 10^{-4}$ |

| p-Value | $H_0$ | $H_1$ | $H_2$ | $H_0+H_1$ Fused | $H_0+H_1$ Concat |
|---|---|---|---|---|---|
| PI vs. PL | $3.73\times 10^{-2}$ | $1.53\times 10^{-5}$ | $7.87\times 10^{-4}$ | $7.89\times 10^{-4}$ | $1.13\times 10^{-3}$ |
| PI vs. PS | $1.52\times 10^{-1}$ | $7.06\times 10^{-5}$ | $5.17\times 10^{-3}$ | $2.32\times 10^{-4}$ | $7.29\times 10^{-4}$ |
| PI vs. BC | $7.80\times 10^{-1}$ | $3.71\times 10^{-4}$ | $1.13\times 10^{-3}$ | $2.01\times 10^{-5}$ | $8.58\times 10^{-5}$ |
| PL vs. PS | $3.14\times 10^{-4}$ | $4.24\times 10^{-1}$ | $3.36\times 10^{-5}$ | $2.67\times 10^{-1}$ | $2.56\times 10^{-1}$ |
| PL vs. BC | $4.45\times 10^{-5}$ | $9.19\times 10^{-3}$ | $6.17\times 10^{-2}$ | $2.20\times 10^{-1}$ | $3.46\times 10^{-1}$ |
| PS vs. BC | $7.63\times 10^{-4}$ | $7.05\times 10^{-3}$ | $8.99\times 10^{-1}$ | $7.00\times 10^{-1}$ | $3.31\times 10^{-2}$ |

| p-Value | Dynamical | MNIST | FMNIST | COLLAB |
|---|---|---|---|---|
| $H_0$ vs. $H_1$ | $5.71\times 10^{-5}$ | $1.82\times 10^{-10}$ | $1.78\times 10^{-8}$ | $1.39\times 10^{-10}$ |
| $H_0$ vs. $H_2$ | - | - | - | $9.82\times 10^{-3}$ |
| $H_0$ vs. fused | $1.35\times 10^{-1}$ | $7.81\times 10^{-6}$ | $5.26\times 10^{-4}$ | $1.31\times 10^{-2}$ |
| $H_0$ vs. concat | $2.18\times 10^{-6}$ | $3.69\times 10^{-3}$ | $3.58\times 10^{-2}$ | $3.87\times 10^{-3}$ |
| $H_1$ vs. $H_2$ | - | - | - | $3.84\times 10^{-5}$ |
| $H_1$ vs. fused | $3.56\times 10^{-4}$ | $5.27\times 10^{-10}$ | $1.01\times 10^{-8}$ | $3.65\times 10^{-5}$ |
| $H_1$ vs. concat | $2.82\times 10^{-2}$ | $5.66\times 10^{-14}$ | $3.82\times 10^{-8}$ | $1.34\times 10^{-5}$ |
| $H_2$ vs. fused | - | - | - | $5.73\times 10^{-1}$ |
| $H_2$ vs. concat | - | - | - | $3.50\times 10^{-1}$ |
| fused vs. concat | $8.08\times 10^{-6}$ | $1.31\times 10^{-1}$ | $1.04\times 10^{-4}$ | $1.15\times 10^{-2}$ |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Conti, F.; Moroni, D.; Pascali, M.A.
A Topological Machine Learning Pipeline for Classification. *Mathematics* **2022**, *10*, 3086.
https://doi.org/10.3390/math10173086

**AMA Style**

Conti F, Moroni D, Pascali MA.
A Topological Machine Learning Pipeline for Classification. *Mathematics*. 2022; 10(17):3086.
https://doi.org/10.3390/math10173086

**Chicago/Turabian Style**

Conti, Francesco, Davide Moroni, and Maria Antonietta Pascali.
2022. "A Topological Machine Learning Pipeline for Classification" *Mathematics* 10, no. 17: 3086.
https://doi.org/10.3390/math10173086