Digital Commons @ Trinity

Abstract: In this paper, we explore how to use topological tools to compare dimension reduction methods. We first give a brief overview of some of the methods often used in dimension reduction, such as isometric feature mapping, Laplacian Eigenmaps, fast independent component analysis, kernel ridge regression, and t-distributed stochastic neighbor embedding. We then give a brief overview of some of the topological notions used in topological data analysis, such as barcodes, persistent homology, and Wasserstein distance. Theoretically, when these methods are applied to a data set, they can be interpreted differently. From EEG data embedded into a manifold of high dimension, we discuss these methods and compare them across persistent homologies of dimensions 0, 1, and 2, that is, across connected components, tunnels and holes, and shells around voids or cavities. We find that from three-dimensional clouds of points, it is not clear how distinct the methods are from each other, but Wasserstein and Bottleneck distances, topological hypothesis tests, and various methods show that the methods differ qualitatively and significantly across homologies. We can infer from this analysis that topological persistent homologies do change dramatically at seizure, a finding already obtained in previous analyses. This suggests that looking at changes in homology landscapes could be a


Introduction
In topological data analysis, one is interested in understanding high dimensional structures from low dimensional ones and how discrete structures can be aggregated to form a global structure. It can be a difficult task to even think or believe that high dimensional objects exist beyond three dimensions since we cannot visualize objects beyond a three-dimensional space. However, embedding theorems clearly show that these high-dimensional structures do, in fact, exist, for instance, the Whitney [1] and Takens [2] embedding theorems. From a practical point of view, to make inferences on structures embedded in high dimensional ambient spaces, some kind of dimension reduction needs to occur. From a data analysis point of view, dimension reduction amounts to data compression where a certain amount of information may be lost. This dimension reduction is part of manifold learning, which can be understood as a collection of algorithms for recovering low dimension manifolds embedded into high dimensional ambient spaces while preserving meaningful information, see Ma and Fu [3]. The algorithms for dimension reduction may be classified into linear and nonlinear methods or parametric and nonparametric methods, where the goal is to select or extract coarse features from high dimensional data. Among the pioneering linear methods is the principal component analysis (PCA) introduced by Hotelling in [4]. Its primary goal is to reduce the data to a set of orthogonal linear projections ordered by decreasing variances. Another linear method is multidimensional scaling (MDS), where the data are aggregated using a measure of proximity, which could be a distance or a measure of association such as correlation or any other method describing how close entities can be, see, for instance, Ramsey and Silverman [5].
Linear discriminant analysis (LDA) is a linear method similar to PCA consisting of writing a categorical dependent variable as a linear combination of continuous independent variables, see, for instance, Cohen et al. [6], Friedman [7], or Yu and Yang [8]. As such, it is the opposite of an analysis of variance (ANOVA), where the dependent variable is continuous and the independent variables are categorical. The focus of this paper will be on nonlinear techniques, which, similar to their linear counterparts, aim to extract or select low dimensional features while preserving important information. Since there are many such methods, our focus will be on isometric feature mapping (ISOMAP) [9], Laplacian Eigenmaps [10], fast independent component analysis (Fast-ICA) [11], kernel ridge regression [12], and t-distributed stochastic neighbor embedding (t-SNE) [13]. We will compare them using persistent homology (PH). PH is one of the many techniques of topological data analysis (TDA) that can be used to identify features in data that remain persistent over multiple and different scales. This tool can provide new insights into seemingly known or unknown data and has the potential to uncover interesting hidden information embedded within data. For instance, PH was used to provide new insights on the topology of deep neural networks, see [14]. PH was successfully used to provide new perspectives on viral evolution, see [15]. The following examples of successful applications can be found in [16], including but not limited to better understanding of sensor-network coverage, see [17]; proteins, see [18,19]; dimensional structure of DNA, see [20]; cell development, see [21]; robotics, see [22][23][24]; signal processing, see [25,26]; spread of contagions, see [27]; financial networks, see [28]; applications in neuroscience, see [29,30]; time-series output of dynamical systems, see [31]; and EEG epilepsy, see [32]. The approach in the last reference is of particular interest to us.
Indeed, in that paper, the authors considered the EEG measured in a healthy person during sleep. They used the method of false nearest neighbors to estimate the embedding dimension. From there, persistent barcode diagrams were obtained and revealed that topological noise persisted at certain dimensions and vanished at some others. This paper has a similar approach and is organized as follows: in Section 2, we review theories behind some dimension reduction methods; then, in Section 3, we give an overview of the essentials of persistent homology; in Section 4, we discuss how to apply persistent homology to the data and compare the methods on an EEG data set using persistent homology. Finally, in Section 5, we make some concluding remarks.

Materials and Methods
Let us note that some of the methods reviewed below are extensively described in [3]. To keep all of our ideas self-contained, we reintroduce a few concepts. In the sequel, ‖·‖ is the Euclidean norm in R d , for some d ≥ 3, and topological spaces M will be considered to be second-countable Hausdorff; that is, (a) every pair of distinct points has a corresponding pair of disjoint neighborhoods, and (b) the topology has a countable basis of open sets. This assumption is satisfied in most topological spaces of interest and seems reasonable.

Definition 1.
A topological space M is called a (topological) manifold if, locally, it resembles a real n-dimensional Euclidean space, that is, there exists n ∈ N such that for all x ∈ M , there exists a neighborhood U x of x and a homeomorphism f : U x → R n . The pair (U x , f ) is referred to as a chart on M and f is called a parametrization at x.

Definition 2.
Let M be a manifold. M is said to be smooth if, given x ∈ M , the parametrization f at x has smooth or continuous partial derivatives of any order and can be extended to a smooth function F : M → R n such that F| M ∩U x = f .

Definition 3.
Let M and N be differentiable manifolds. A function ψ : M → N is an embedding if ψ is an injective immersion.
Next, we introduce the notion of the boundary of a topological manifold, which will be important in the sequel.

Definition 4.
Consider a Hausdorff topological manifold M homeomorphic to an open subset of the half-Euclidean space R n + . Let the interior Int(M ) of M be the subspace of M formed by all points s that have a neighborhood homeomorphic to R n . Then, the boundary of M is defined as the complement of Int(M ) in M , that is, M \ Int(M ), which is an (n − 1)-dimensional topological manifold.

ISOMAP
Isometric feature mapping (ISOMAP) was introduced by Tenenbaum et al. in [9]. The data are considered to be a finite sample {v i } from a smooth manifold M . The two key assumptions are: (a) an isometric embedding ψ : M → X exists, where X = R d and the distance on M is the geodesic distance, or the length of the shortest curve connecting two points; (b) the smooth manifold M is a convex region of R m , where m ≪ d. The implementation phase has three main steps.

1.
For a fixed integer K and real number ε > 0, perform an ε–K-nearest neighbor search using the fact that the geodesic distance D M (v i , v j ) between two points on M is the same (by isometry) as their Euclidean distance ‖v i − v j ‖ in R d . K is the number of data points selected within a ball of radius ε.

2.
Having calculated the distance between points as above, the entire data set can be considered as a weighted graph with vertices v = {v i } and edges e = (e ij ), where e ij connects v i with v j with a distance w ij = D M (v i , v j ), considered an associated weight. The geodesic distance between two data points v i and v j is estimated as the graph distance between the two vertices, that is, the number of edges in the shortest path connecting them. We observe that this shortest path is found by minimizing the sum of the weights of its constituent edges.

3.
Having calculated the geodesic distances D G = (w ij ) as above, we observe that D G is a symmetric matrix, so we can apply the classical multidimensional scaling (MDS) algorithm (see [33]) to D G by mapping (embedding) the data into a feature space Y of dimension d while preserving the geodesic distance on M . Y is generated by a d × m matrix whose i-th column represents the coordinates of v i in Y .
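The three steps above can be sketched compactly. Below is a minimal Python illustration (the analysis in this paper was carried out in R; the function name `isomap` and the toy circle data are ours): it builds a K-nearest-neighbor graph, estimates geodesic distances with Floyd–Warshall, and applies classical MDS to the result.

```python
import numpy as np

def isomap(v, n_neighbors=6, out_dim=2):
    """Minimal ISOMAP sketch: kNN graph -> graph geodesics -> classical MDS."""
    n = v.shape[0]
    # Step 1: pairwise Euclidean distances; keep only the K nearest neighbors.
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    g = np.full((n, n), np.inf)
    np.fill_diagonal(g, 0.0)
    for i in range(n):
        nbrs = np.argsort(d[i])[1:n_neighbors + 1]
        g[i, nbrs] = d[i, nbrs]
        g[nbrs, i] = d[i, nbrs]  # symmetrize the neighborhood graph
    # Step 2: geodesic distances as shortest paths (Floyd-Warshall).
    for k in range(n):
        g = np.minimum(g, g[:, k:k + 1] + g[k:k + 1, :])
    # Step 3: classical MDS on the geodesic distance matrix.
    j = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * j @ (g ** 2) @ j                  # double-centered Gram matrix
    w, u = np.linalg.eigh(b)
    idx = np.argsort(w)[::-1][:out_dim]          # largest eigenvalues first
    return u[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Toy usage: points on a slightly noisy circle embedded in R^3.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
pts = np.c_[np.cos(t), np.sin(t), 0.05 * rng.standard_normal(40)]
emb = isomap(pts, n_neighbors=4, out_dim=2)
```

A production implementation would use a sparse shortest-path solver (e.g., Dijkstra) instead of the O(n³) Floyd–Warshall loop.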

Laplacian Eigenmaps
The Laplacian Eigenmaps (LEIM) algorithm was introduced by Belkin and Niyogi in [10]. As above, the data v = {v i } are supposed to be from a smooth manifold M . It also has three main steps:

1.
For a fixed integer K and real number ε > 0, perform a K-nearest neighbor search on symmetric neighborhoods. Note that given two points v i , v j , their respective K-neighborhoods N K i and N K j are symmetric if and only if v i ∈ N K j exactly when v j ∈ N K i .

2.
For a given real number σ > 0 and each pair of neighboring points (v i , v j ), calculate the weight w ij = exp(−‖v i − v j ‖ 2 /σ); set w ij = 0 for non-neighboring pairs. Obtain the adjacency matrix W = (w ij ). The data now form a weighted graph with vertices v, edges e = (e ij ), and weights W = (w ij ), where e ij connects v i with v j with weight w ij .

3.
Consider Λ = (λ ij ) to be a diagonal matrix with λ ii = ∑ j w ij and define the graph Laplacian as L = Λ − W. Then, L is positive semi-definite, so let Y be the d × n matrix that minimizes ∑ i,j w ij ‖y i − y j ‖ 2 = tr(Y LY T ). Then, Y can be used to embed M into a d-dimensional space Y , whose i-th column represents the coordinates of v i in Y .
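The three steps can be sketched as follows in Python (the paper's computations used R; the heat-kernel weight and the symmetric normalization used to solve the eigenproblem are standard choices, and the function name and toy data are ours):

```python
import numpy as np

def laplacian_eigenmaps(v, n_neighbors=5, sigma=2.0, out_dim=2):
    """Minimal Laplacian Eigenmaps sketch with heat-kernel weights."""
    n = v.shape[0]
    d2 = ((v[:, None, :] - v[None, :, :]) ** 2).sum(-1)
    w = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]   # skip self at index 0
        w[i, nbrs] = np.exp(-d2[i, nbrs] / sigma)     # heat-kernel weight
    w = np.maximum(w, w.T)                            # symmetrize the graph
    deg = w.sum(axis=1)                               # diagonal of Lambda
    L = np.diag(deg) - w                              # graph Laplacian L = Lambda - W
    # Solve via the symmetrically normalized Laplacian, an equivalent
    # reformulation of the generalized problem L y = mu * Lambda * y.
    dinv = np.diag(1.0 / np.sqrt(deg))
    vals, vecs = np.linalg.eigh(dinv @ L @ dinv)
    # Drop the trivial (constant) eigenvector; keep the next out_dim.
    return dinv @ vecs[:, 1:out_dim + 1]

# Toy usage.
rng = np.random.default_rng(0)
pts = rng.standard_normal((30, 3))
emb = laplacian_eigenmaps(pts)
```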

Fast ICA
The fast independent component analysis (Fast-ICA) algorithms were introduced by Hyvärinen in [11]. As above, the data v are considered to be from a smooth manifold M . It is assumed that the data v are represented as an n × m matrix (v ij ) that can be flattened into a vector of length n × m. As in principal component analysis (PCA), factor analysis, projection pursuit, or independent component analysis (ICA), by considering the data as an observed random variable, the goal is to determine a matrix W such that s = W T v, where s is a random variable having desirable properties such as optimal dimension reduction or other interesting statistical properties such as minimal variance. Optimally, the components of s should provide source separation (the original data source v is assumed to be corrupted with noise) and feature extraction and be independent of each other. In a regular ICA, the matrix W is found by minimizing the mutual information, a measure of dependence between given random variables. In fast-ICA algorithms, the matrix W is found by using a Newton fixed-point approach, with an objective function based on the differential entropy, given as J(W ) = (E[G(W T v)] − E[G(z)]) 2 , where it is assumed that W is such that E[(W T v) 2 ] = 1, and z is a standard normal random variable. G is a function referred to as the contrast function; common choices include but are not limited to G(u) = (1/σ) log cosh(σu), G(u) = −e −u 2 /2 , and G(u) = 0.25u 4 , with σ ≈ 1. From a dynamical system point of view, the fixed point is locally asymptotically stable, with the exception of G(u) = 0.25u 4 , where stability becomes global. For simplification purposes, let g(x) = G ′ (x). The key steps are: 1. Data preparation: it consists of centering the data v with respect to the columns to obtain v c . The centered data are then whitened; that is, v c is linearly transformed into v c w , a matrix of uncorrelated components.
This is accomplished through an eigenvalue decomposition of the covariance matrix C = v c (v c ) T to obtain two matrices V , E, respectively, of eigenvectors and eigenvalues, so that C = V EV T . The whitened data are found as v c w = E −1/2 V T v c and simply referred to again as v for simplicity.

2.
Component extraction: starting from a random unit-norm weight matrix W, apply the Newton fixed-point update W + = E[v g(W T v)] − E[g ′ (W T v)]W, followed by the normalization W ← W + /‖W + ‖. • Repeat until a suitable convergence level is reached. • From the last matrix W obtained, which is the optimal weight matrix, let s = W T v.
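A one-unit version of the Newton fixed-point update can be sketched as follows. This is a hedged Python illustration with the common contrast G(u) = log cosh(u), so g(u) = tanh(u); the toy mixing matrix and function names are ours, not the paper's own implementation:

```python
import numpy as np

def fastica_one_unit(v, max_iter=200, tol=1e-6, seed=0):
    """One-unit FastICA sketch on whitened data v (rows = signals, cols = samples).
    Contrast g(u) = tanh(u), g'(u) = 1 - tanh(u)^2, i.e. G(u) = log cosh(u)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(v.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ v
        g, gp = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        w_new = (v * g).mean(axis=1) - gp.mean() * w   # Newton fixed-point step
        w_new /= np.linalg.norm(w_new)                 # renormalize
        if abs(abs(w_new @ w) - 1.0) < tol:            # converged up to sign
            return w_new
        w = w_new
    return w

# Toy usage: two independent sources, mixed, centered, then whitened.
rng = np.random.default_rng(1)
s = np.vstack([np.sign(rng.standard_normal(2000)), rng.uniform(-1, 1, 2000)])
x = np.array([[2.0, 1.0], [1.0, 1.0]]) @ s
xc = x - x.mean(axis=1, keepdims=True)
e_vals, e_vecs = np.linalg.eigh(xc @ xc.T / xc.shape[1])
xw = np.diag(e_vals ** -0.5) @ e_vecs.T @ xc           # whitening step
w = fastica_one_unit(xw)
```

Extracting several components would repeat this with a decorrelation (deflation) step between units.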

Kernel Ridge Regression
The kernel ridge regression (KRR) is constructed as follows: as above, the data v are considered to be from a smooth manifold M of dimension d. It is assumed that the data v are represented as an n × m matrix (v ij ) that can be flattened into a vector. Suppose we are in possession of data u = (u 1 , u 2 , · · · , u n ) corresponding to a response variable and covariates given as v = (v 1 , v 2 , · · · , v n ). With the least squares method, we can find the best linear model between the covariates v = (v i ) and the response u = (u i ) by minimizing the objective function L(W ) = ∑ i (u i − W v i ) 2 , where W is a 1 × n vector. Similar approaches include maximum likelihood approaches, see, for instance, [34], or perpendicular offsets [35]. However, regression methods are notorious for overfitting. Overfitting occurs when a model closely fits a training data set but fails to do so on a test data set. In practice, this can lead to dire consequences, see, for instance, the book by Nate Silver [36] for illustrative examples in real life. Numerous solutions were proposed to overcome overfitting; these include but are not limited to training with more data, data augmentation, cross-validation, feature selection, and regularization or penalization (Lasso, Ridge, Elastic net). The ridge regression is a compromise that uses a penalized objective function such as L λ (W ) = ∑ i (u i − W v i ) 2 + λ‖W ‖ 2 , for some λ > 0. The solution can be written in closed form. In case the true nature of the relationship between the response and covariates is nonlinear, we can replace the covariates v i with transformed covariates ϕ(v i ). In particular, if the response is qualitative, that is, labels, then we have a classification problem, and ϕ is referred to as a feature map. Note that when using ϕ, the number of dimensions of the problem is considerably high. Put K(x, y) = ϕ(x) T ϕ(y) as the kernel, which is the only quantity needed to be calculated, thereby significantly reducing the computational time and dimensionality of the problem. In practice, we may use a linear kernel K(x, y) = x T y or a Gaussian kernel K(x, y) = e −σ‖x−y‖ 2 , for some σ > 0, where ‖·‖ is a norm in R m and σ is a given real constant.
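As a sketch of the closed-form solution in its kernelized (dual) form, the following Python fragment solves α = (K + λI)⁻¹u for a Gaussian kernel and predicts with K(x, ·)α; the function name and toy data are ours:

```python
import numpy as np

def krr_fit_predict(v_train, u_train, v_test, lam=1e-2, sigma=1.0):
    """Kernel ridge regression sketch with a Gaussian kernel.
    Dual solution: alpha = (K + lam*I)^{-1} u; prediction = k(x, v_train) @ alpha."""
    def gauss(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-sigma * d2)
    K = gauss(v_train, v_train)
    alpha = np.linalg.solve(K + lam * np.eye(len(v_train)), u_train)
    return gauss(v_test, v_train) @ alpha

# Toy usage: fit a noisy sine curve.
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, (80, 1))
y = np.sin(x[:, 0]) + 0.05 * rng.standard_normal(80)
pred = krr_fit_predict(x, y, x, lam=1e-3, sigma=2.0)
```

The penalty λ controls the compromise between fit and smoothness: λ → 0 approaches interpolation (overfitting), while large λ flattens the prediction.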

t-SNE
Stochastic neighbor embedding (SNE) was proposed by Hinton and Roweis in [37]. t-SNE followed later and was proposed by van der Maaten and Hinton in [13]. t-distributed stochastic neighbor embedding (t-SNE) is a dimension reduction method that amounts to assigning data to two or three dimensional maps. As above, we consider the data v = (v ij ) = (v k ) (k = 1, 2, · · · , N with N = n × m) to be from a smooth manifold M of high dimension, d. The main steps of the method are: • Calculate the asymmetrical probabilities p kl as p kl = exp(−d 2 kl /2σ 2 k ) / ∑ m≠k exp(−d 2 km /2σ 2 k ), where d kl = ‖v k − v l ‖ represents the dissimilarity between v k and v l , and σ k is a parameter selected by the experimenter or by a binary search. p kl represents the conditional probability that datapoint v l is in the neighborhood of datapoint v k if neighbors were selected proportionally to their probability density under a normal distribution centered at v k with variance σ 2 k . • Assuming that the low dimensional data are u = (u k ), k = 1, 2, · · · , N, the corresponding dissimilarity probabilities q kl are calculated under constant variance as q kl = exp(−‖u k − u l ‖ 2 ) / ∑ m≠k exp(−‖u k − u m ‖ 2 ). Then, we minimize the Kullback–Leibler divergence between p kl and q kl , given as L = ∑ k≠l p kl log (p kl /q kl ), using the gradient descent method with a momentum term, with the scheme w t = w t−1 + η ∂L/∂u + α(t)(w t−1 − w t−2 ) for t = 2, 3, · · · , T, for some given T. Note that w 0 = (u 1 , u 2 , · · · , u N ) ∼ N(0, 10 −4 I), where I is the N × N identity matrix, η is a constant representing a learning rate, and α(t) is the t-th momentum term. We note that ∂L/∂u has a simple closed form. • Then, we use u = w T as the low dimensional representation of v.
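The steps above can be sketched in miniature. The following Python fragment uses a fixed σ for all points (the real method tunes σ_k per point via a binary search over perplexity) and the Student-t low-dimensional affinities of t-SNE; the function name and toy data are ours:

```python
import numpy as np

def tiny_tsne(v, out_dim=2, sigma=1.0, eta=10.0, alpha=0.5, T=300, seed=0):
    """Miniature t-SNE sketch: fixed-sigma Gaussian affinities p, Student-t
    affinities q, and KL-divergence gradient descent with a momentum term."""
    n = v.shape[0]
    d2 = ((v[:, None, :] - v[None, :, :]) ** 2).sum(-1)
    p = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(p, 0.0)
    p = p / p.sum(axis=1, keepdims=True)      # conditional probabilities p_{l|k}
    p = (p + p.T) / (2 * n)                   # symmetrized joint probabilities
    rng = np.random.default_rng(seed)
    u = 1e-2 * rng.standard_normal((n, out_dim))
    u_prev = u.copy()
    for t in range(T):
        e2 = ((u[:, None, :] - u[None, :, :]) ** 2).sum(-1)
        num = 1.0 / (1.0 + e2)                # Student-t kernel
        np.fill_diagonal(num, 0.0)
        q = num / num.sum()
        # KL gradient: dL/du_k = 4 * sum_l (p_kl - q_kl)(u_k - u_l)(1 + ||.||^2)^-1
        grad = (4.0 * ((p - q) * num)[:, :, None]
                * (u[:, None, :] - u[None, :, :])).sum(axis=1)
        u, u_prev = u - eta * grad + alpha * (u - u_prev), u
    return u

# Toy usage: two well-separated clusters in R^5.
rng = np.random.default_rng(3)
a = rng.standard_normal((20, 5))
b = rng.standard_normal((20, 5)) + 8.0
emb = tiny_tsne(np.vstack([a, b]))
```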

Persistent Homology
In the sequel, we will introduce the essential ingredients needed to understand and compute persistent homology.

Simplex Complex
Definition 5. A real d-simplex S is a topological manifold of dimension d that represents the convex hull of d + 1 affinely independent points x 0 , x 1 , · · · , x d . In other words: S = { ∑ d i=0 λ i x i : λ i ≥ 0, ∑ d i=0 λ i = 1 }. (1)

Example 1.
A 0-simplex is a point, a 1-simplex is an edge, a 2-simplex is a triangle, a 3-simplex is a tetrahedron, a 4-simplex is a pentachoron, etc., see for instance Figure 1 below.

Remark 1.
We observe that a d-simplex S can also be denoted as S = [V 0 , V 1 , · · · , V d ], where V 0 , · · · , V d are its vertices; we also note that the face spanned by V 0 , · · · , V i has dimension i. Definition 6. Given a simplex S, a face of S is another simplex R such that R ⊆ S and such that the vertices of R are also vertices of S.

Example 2.
Given a 3-simplex (a tetrahedron), it has four different 2-simplices or 2-dimensional faces, each of them with three 1-simplices or 1-dimensional faces, each with two 0-simplices or 0-dimensional faces.

Definition 7.
A simplicial complex Σ is a topological space formed by different simplices, not necessarily of the same dimension, that have to satisfy the gluing condition, that is (see Figure 2): 1. Given S i ∈ Σ, each of its faces R i belongs to Σ.

2.
Given S i , S j ∈ Σ, either S i ∩ S j = ∅ or S i ∩ S j = R i = R j , a common face of S i and S j . It is important to observe that a simplicial complex can be defined very abstractly. Indeed, Definition 8. A simplicial complex Σ = {S : S ⊆ Ω} is a collection of non-empty subsets of a set Ω such that 1. For every x ∈ Ω, the singleton {x} belongs to Σ.

2.
For any non-empty set U such that U ⊂ S for some S ∈ Σ, then U ∈ Σ.

Homology and Persistent Homology
Definition 9. Let Σ be a simplicial complex. We define C j (Σ) as the Abelian group generated by the j-simplices of Σ. We define a boundary operator associated with C j (Σ) as a homomorphism ∂ j : C j (Σ) → C j−1 (Σ). We define the chain complex associated with Σ as the collection of pairs (C j (Σ), ∂ j ). Now, we can define a homology group associated with a simplicial complex.

Definition 10.
Given a simplicial complex Σ, put A j (Σ) := kern(∂ j ) and B j (Σ) := Im(∂ j+1 ). Then, the jth homology group H j (Σ) of Σ is defined as the quotient group between A j (Σ) and B j (Σ); that is, H j (Σ) = A j (Σ)/B j (Σ). What this reveals is the presence of "holes" in a given shape.

Remark 2.
It is important to observe that H j (Σ) = < j-dimensional cycles > / < j-dimensional boundaries >, where < U > stands for the span of U, and a cycle is simply a shape similar to a loop but without necessarily a starting point. Another important remark is that the boundary operator can indeed be defined as ∂ j [V 0 , V 1 , · · · , V j ] = ∑ j k=0 (−1) k [V 0 , · · · , V̂ k , · · · , V j ], where V̂ k means not counting the vertex V k . This shows that ∂ j (Σ) lies in a (j − 1)-simplex.
Another remark is that ∂ j ◦ ∂ j+1 = 0, which guarantees that B j (Σ) ⊆ A j (Σ). Now that we know that homology reveals the presence of "holes", we need to find a way of determining how to count these "holes".

Definition 11. Given a simplicial complex Σ, the jth Betti number b j (Σ) is the rank of H j (Σ).
In other words, it is the smallest cardinality of a generating set of the group H j (Σ). In fact, since the elements of A j (Σ) are j-dimensional cycles and those of B j (Σ) are j-dimensional boundaries, the Betti number counts the number of independent j-cycles not representing the boundary of any collection of simplices of Σ. In particular, b 0 is the number of connected components, b 1 is the number of tunnels and holes, and b 2 is the number of shells around cavities or voids.
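Betti numbers can be computed directly from the ranks of the boundary operators over F 2 , using b j = dim kern(∂ j ) − rank(∂ j+1 ). The following Python sketch (function names ours) verifies this on a hollow triangle, which has one connected component and one loop:

```python
import numpy as np

def rank_f2(m):
    """Rank of a 0/1 matrix over the field F2, by Gaussian elimination."""
    m = m.copy() % 2
    rank, rows, cols = 0, m.shape[0], m.shape[1]
    for c in range(cols):
        piv = next((r for r in range(rank, rows) if m[r, c]), None)
        if piv is None:
            continue
        m[[rank, piv]] = m[[piv, rank]]          # move pivot row up
        for r in range(rows):
            if r != rank and m[r, c]:
                m[r] = (m[r] + m[rank]) % 2      # eliminate column c
        rank += 1
    return rank

def betti(boundaries, n_simplices):
    """b_j = dim kern(d_j) - rank(d_{j+1}); boundaries[j] maps C_j -> C_{j-1}."""
    b = []
    for j in range(len(n_simplices)):
        rk_j = rank_f2(boundaries[j]) if boundaries[j] is not None else 0
        nxt = boundaries[j + 1] if j + 1 < len(boundaries) else None
        rk_next = rank_f2(nxt) if nxt is not None else 0
        b.append(n_simplices[j] - rk_j - rk_next)
    return b

# Hollow triangle: 3 vertices, 3 edges [01], [02], [12], no filled face.
d1 = np.array([[1, 1, 0],
               [1, 0, 1],
               [0, 1, 1]])       # columns = edges, rows = vertices
b = betti([None, d1], [3, 3])    # -> [1, 1]: one component, one loop
```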

Definition 12.
Let Σ be a simplicial complex, and let N be a positive integer. A filtration of Σ is a nested family Σ F N := {Σ p , 0 ≤ p ≤ N} of sub-complexes of Σ such that Σ 0 ⊆ Σ 1 ⊆ · · · ⊆ Σ N = Σ. Now, let F 2 be the field with two elements, and let 0 ≤ p ≤ q ≤ N be two integers. Since Σ p ⊆ Σ q , the inclusion map Incl pq : Σ p → Σ q induces an F 2 -linear map defined as g pq : H j (Σ p ) → H j (Σ q ). We can now define, for any 0 ≤ j ≤ d, the j-th persistent homology of a simplicial complex Σ as H p→q j (Σ, F 2 ) := Im(g pq ).
In a sense, the j-th persistent homology provides more refined information than the homology of the simplicial complex in that it informs us of the changes in features such as connected components, tunnels and holes, and shells around voids through the filtration process. It can be visualized using a "barcode" or a persistent diagram. The following definition is borrowed from [38]: Consider a simplicial complex Σ, a positive integer N, and two integers 0 ≤ p ≤ q ≤ N. The barcode of the j-th persistent homology H p→q j (Σ, F 2 ) of Σ is a graphical representation of H p→q j (Σ, F 2 ) as a collection of horizontal line segments in a plane whose horizontal axis corresponds to a parameter and whose vertical axis represents an arbitrary ordering of homology generators.
We finish this section with the introduction of the Wasserstein and Bottleneck distances, used for the comparison of persistent diagrams. Definition 15. Let X and Y be two diagrams. A matching η between X and Y is a collection of pairs η = {(x, y) ∈ X × Y} where x and y can occur in at most one pair. It is sometimes denoted as η : X → Y. x and y are referred to as intervals of X and Y respectively.
Example 3. Let X = {(0, 1), [0, 1), (0, 2)} and Y = {(−π, 0], (0, 1)}. Then, η = {((0, 1), (−π, 0])} is a matching between X and Y such that (0, 1) is matched to (−π, 0], and [0, 1), (0, 2), and (0, 1) ∈ Y are unmatched. Definition 16. Let p ≥ 1 be a real number. Given two persistent diagrams X and Y, the p-th Wasserstein distance W p (X, Y) between X and Y is defined as W p (X, Y) = inf η ( ∑ (x,y)∈η ‖x − y‖ p ∞ ) 1/p , where η is a perfect matching between the intervals of X and Y, unmatched intervals being matched to the diagonal. The Bottleneck distance is obtained when p = ∞; that is, it is given as W ∞ (X, Y) = inf η sup (x,y)∈η ‖x − y‖ ∞ .
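For very small diagrams, the Bottleneck distance can be computed by brute force: augment each diagram with the diagonal projections of the other's points (so unmatched intervals can be matched to the diagonal) and minimize the largest ∞-norm cost over all matchings. A hedged Python sketch (function name and toy diagrams ours):

```python
import numpy as np
from itertools import permutations

def bottleneck(X, Y):
    """Brute-force bottleneck distance between two small persistence diagrams.
    Each diagram is augmented with the diagonal projections of the other's
    points, so unmatched intervals pay their distance to the diagonal."""
    diag = lambda p: ((p[0] + p[1]) / 2, (p[0] + p[1]) / 2)
    Xa = list(X) + [diag(p) for p in Y]
    Ya = list(Y) + [diag(p) for p in X]
    def cost(p, q):
        if p[0] == p[1] and q[0] == q[1]:
            return 0.0                       # matching two diagonal points is free
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))   # sup norm
    best = float("inf")
    for perm in permutations(range(len(Ya))):
        m = max(cost(Xa[i], Ya[j]) for i, j in enumerate(perm))
        best = min(best, m)
    return best

# Toy usage: one diagram has an extra low-persistence feature.
X = [(0.0, 4.0), (1.0, 1.5)]
Y = [(0.0, 4.5)]
d = bottleneck(X, Y)   # -> 0.5: (0,4) matches (0,4.5); (1,1.5) dies on the diagonal
```

The factorial cost of this enumeration is why, as discussed later, Bottleneck and Wasserstein distances are expensive on real diagrams, where dedicated libraries use bipartite matching algorithms instead.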

Results
In the presence of data, simplicial complexes will be replaced by sets of data indexed by a parameter, therefore transforming these sets into parametrized topological entities. On these parametrized topological entities, the notions of persistent homology introduced above can be computed, especially the Betti numbers, in the form of a "barcode". To see how this could be calculated, let us consider the following definitions: Definition 17. For a given collection of points {s δ } in a manifold M of dimension n, its Čech complex C δ is a simplicial complex formed by d-simplices obtained from a sub-collection s δ,k , 0 ≤ k ≤ d, 0 ≤ d ≤ n of points such that, taken pairwise, their δ/2-ball neighborhoods have a point in common.

Definition 18.
For a given collection of points {s δ } in a manifold M of dimension n, its Rips complex R δ is a simplicial complex formed by d-simplices obtained from a sub-collection s δ,k , 0 ≤ k ≤ d, 0 ≤ d ≤ n of points that are pairwise within a distance of δ.
Remark 3. 1. It is worth noting that in practice, Rips complexes are easier to compute than Čech complexes, because the exact definition of the distance on M may not be known. 2. More importantly, from a data analysis point of view, Rips complexes are good approximations (estimators) of Čech complexes. Indeed, a result from [17] shows that, given δ > 0, there exists a chain of inclusions R δ ⊆ C 2δ ⊆ R 2δ . 3. Though Rips complexes and barcodes seem to be challenging objects to wrap one's head around, there is an ever-growing list of algorithms in various languages that can be used for their visualization. All the analysis below was performed using R, in particular, the TDA package in R version 4.3.0, 21 April 2023.
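Constructing a Rips complex from a point cloud is straightforward: a d-simplex is included as soon as its vertices are pairwise within δ. A small Python sketch (names ours; a real analysis would use a dedicated library such as the R TDA package mentioned above):

```python
import numpy as np
from itertools import combinations

def rips_complex(points, delta, max_dim=2):
    """Vietoris-Rips sketch: a d-simplex enters R_delta when all of its
    vertices are pairwise within distance delta."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    simplices = [[(i,) for i in range(n)]]          # 0-simplices: the points
    for dim in range(1, max_dim + 1):
        level = [s for s in combinations(range(n), dim + 1)
                 if all(d[i, j] <= delta for i, j in combinations(s, 2))]
        simplices.append(level)
    return simplices

# Toy usage: four points on a unit square.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
cx = rips_complex(square, delta=1.0)
# At delta = 1, the four sides appear as edges, but the diagonals (length
# sqrt(2)) do not, so no triangle is filled: the square's loop is alive.
```

Sweeping δ from 0 upward and recording when each homology class is born and dies is exactly the filtration that produces the barcode.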

Randomly Generated Data
We generated 100 data points sampled randomly in the square [−5, 5] × [−5, 5]. In Figures 3 and 4 below, we illustrate how the Rips complex and barcode change through a filtration. As we move from left to right, it shows how sample points (blue dots) first form 0-simplices, then 1-simplices, and so on. In particular, it shows how connected components progressively evolve to form different types of holes.

Data Description
The main purpose of the manuscript is to analyze EEG data. We consider a publicly available (at http://www.meb.unibonn.de/epileptologie/science/physik/eegdata.html, last accessed on 1 June 2023) epilepsy data set. For a thorough description of the data and their cleaning process, see, for instance, [39]. They consist of five sets: A, B, C, D, and E. Each contains 100 single-channel EEG segments of 23.6 s, where A and B represent healthy individuals. Set D represents the data obtained from the epileptogenic zone in patients, and set C represents data obtained from the hippocampus zone. Set E represents the data from seizure prone patients.

Data Analysis
The approach is to first embed the data into a manifold of high dimension. This was already performed in [39]. The dimension d = 12 was found using the method of false nearest neighbors. Depending on the set used, the size of the data can be very large (for example, 4097 × 100 × 5 = 2,048,500 points), making it very challenging to analyze holistically. In [39], we proposed to construct a complex structure (using all 100 channels for all 5 groups) whose volume changes per group. We would like to analyze the data further from a persistent homology point of view. This would mean analyzing 500 different persistent diagrams and making an inference. We note that the simplicial complexes of these data sets are very large (more than 2 million simplices). Fortunately, we can use the Wasserstein distance to compare persistent diagrams. To clarify, we use each of the dimension reduction methods introduced earlier, then proceed with the construction of persistent diagrams. We then compare them by method and by sets (A, B, C, D, and E).
Figure 4. Example of the evolution of barcodes through a filtration with parameter δ for the same data as above. As we move from left to right, from top to bottom, it shows the appearance and disappearance of lines (H 0 ) and holes (H 1 ) as the parameter δ changes. It shows that certain lines and holes persist through the filtration process.

Single-channel Analysis:
Suppose we select at random one channel among the 100 from set D. Figure 5 below represents a three dimensional representation of the embedded data using Takens embedding method (Tak), plotted using the first three delayed coordinates x = x(t), y = x(t − ρ), z = x(t − 2ρ), where ρ = ∆t, with ∆t = 1/ f s = 5.76 ms, in Figure 5a; then, the first three coordinates in the case of kernel ridge regression (KRR i ) in Figure 5b; ISOMAP (iso.i) in Figure 5c; Laplacian Eigenmaps (LEIM i ) in Figure 5d; fast independent component analysis (ICA i ) in Figure 5e; and t-distributed stochastic neighbor embedding (t-SNE i ) in Figure 5f. From these three dimensional scatter plots, we can visually observe that the t-SNE plot (Figure 5f) is relatively different from the other five since it seems to have larger voids. How different is difficult to tell with the naked eye. Figure 6 represents their corresponding barcodes. It is much clearer looking at the persistent diagram for t-SNE (Figure 6f) that it is very different from the other five when looking at H 0 , H 1 , and H 2 . Now, a visual comparison is not enough to really assert a significant difference. Using the Bottleneck distance, we calculate the distance between the respective persistent diagrams for H 0 and H 1 in Table 1a and for H 2 in Table 1b. The analysis above was performed using a single channel, selected at random from set D. It seems to suggest that the t-SNE method is different from the other five dimension reduction methods discussed above. Strictly speaking, nonzero Bottleneck distances are an indication of structural topological differences. What they do not say, however, is whether the differences observed are significant. To address the issue of significance, we perform a pairwise permutation test.
Practically, from set j and channel i, we obtain a persistent diagram D (j) i ∼ P (j) , where j ∈ {1, 2, 3, 4, 5}, i ∈ {1, 2, · · · , 15}, and P (j) is the true underlying distribution of persistent diagrams; see [40] for the existence of these distributions. We conduct a pairwise permutation test with null hypothesis H 0 : P (j) = P (j ′ ) and alternative hypothesis H 1 : P (j) ≠ P (j ′ ) . We use landscape functions (see [41]) to obtain test statistics. The p-values obtained were found to be very small, suggesting that the differences above are indeed all significant across H 0 , H 1 , and H 2 .
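The permutation test itself is method-agnostic. The following Python sketch (names and toy data ours) illustrates the scheme on scalar summaries, such as values of a landscape-based statistic per channel: pool the two samples, repeatedly permute the labels, and report the fraction of permutations whose statistic is at least the observed one:

```python
import numpy as np

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sample permutation test sketch on scalar summaries.
    Test statistic: absolute difference of sample means."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                   # random relabeling of the pool
        if abs(pooled[:len(a)].mean() - pooled[len(a):].mean()) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)         # add-one (permutation) correction

# Toy usage: clearly shifted samples should give a small p-value.
rng = np.random.default_rng(4)
p = permutation_test(rng.normal(0, 1, 15), rng.normal(3, 1, 15))
```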

Multiple-channel Analysis:
(a) Within set analysis. In each set, we make a random selection of 15 channels, and we compare the Bottleneck distances obtained. This means having 15 tables of distances, such as Table 1b above. There is consistency if the cell value k(i, j) in Table k, where k ∈ {1, 2, · · · , 15} and i, j ∈ {1, 2, 3, 4, 5}, is barely different from k ′ (i, j) of Table k ′ . Large differences are an indication of topological differences between the methods within the sets. In Figure 7 below, the y-axis represents Bottleneck distances, and the x-axis represents channel indices. The red color indicates the Bottleneck distance between persistent diagrams on H 1 and the blue color on H 2 from the data generated by each of the methods above. We see that overall, while there are small fluctuations from channel to channel on H 1 , the largest fluctuations actually occur on H 2 . A deeper analysis reveals that, in fact, the large fluctuations are due to a large distance between t-SNE and the other five methods. This confirms the earlier observations (refer to Figure 6 and Table 1 above) that persistent diagrams are really different on H 2 . Topologically, this means that the shells around cavities or voids that persist are not the same when using different dimension reduction methods. However, the small fluctuations on H 1 do not mean that the tunnels and holes that persist are the same. Rather, what they do indicate is that they may not be all very different.

(b) Between set analysis
To analyze the data of the Bottleneck distances between sets, we need summary statistics for each set from the data above. It is clear from Figure 7 that the mean would not be a great summary statistic for H 1 , as there seem to be too many outliers. We will use the median instead and perform a pairwise Wilcoxon-Mann-Whitney test. Table 2 below shows the p-values on H 1 and H 2 . The take-away is that the last row of the table suggests that set E is statistically topologically different from the others on H 1 , at a significance level of 0.05. In a way, this is a confirmation of the results obtained in [39], where set E (seizure) was already shown to be statistically different from the other sets.

Concluding Remarks
In this paper, we have revisited the mathematical descriptions of six dimension reduction methods. We have given a brief introduction to the very vast topic of persistent homology. We discussed how to apply persistent homology to the data. In the presence of data (say, in three dimensions) obtained either by projecting the data from a high dimension into a smaller dimension (as in Takens) or by performing some sort of dimension reduction, it is not always clear what we see or how different one method is compared to another. From their mathematical descriptions, they seem to represent different objects. Furthermore, obtaining theoretically a clear discrimination procedure between these methods seems a daunting, if not an outright impossible, task. Topology may offer a solution by looking at persistent artifacts through filtration. From Figure 5, it seems clear that the methods were different, but Figure 6 offers a different perspective. In the end, through the calculation of Bottleneck distances and hypothesis tests, we can safely conclude that the methods are different, topologically speaking, in that the connected components, the tunnels and holes, and the shells around cavities or voids do not match perfectly. Since these methods are used indiscriminately in many applications, the message is that the replication of results from one method to the next may not be guaranteed in the grand scheme of things. It does not, however, render them useless. In fact, our analysis is limited to one data set, meaning that another data set may yield different conclusions. Furthermore, due to the computational cost, we were limited to only a handful of samples. Additionally, the Wasserstein distances for p < ∞ are extremely costly to calculate on a regular computer. Even for p = ∞, the Bottleneck distance is also very costly to calculate, especially for H 0 . This explains why, at some point, we did not provide the comparison for H 0 .
Given that some EEG epilepsy data are known to contain some deterministic chaos, it might be worthwhile to study whether persistent homology can also be used for the better understanding of chaotic data in dynamical systems.