Topological Data Analysis for Multivariate Time Series Data

Over the last two decades, topological data analysis (TDA) has emerged as a powerful data-analytic approach that can handle data modalities of varying complexity. One of the most commonly used tools in TDA is persistent homology (PH), which extracts topological properties from data at multiple scales. The aim of this article is to introduce TDA concepts to a statistical audience and to provide an approach to analyzing multivariate time series data. The applications focus on multivariate brain signals and brain connectivity networks. Finally, the paper concludes with an overview of some open problems and potential applications of TDA: modeling directionality in a brain network, and casting TDA in the context of mixed-effects models to capture variation in the topological properties of data collected from multiple subjects.


INTRODUCTION
The field of topology can be traced back more than two centuries, starting with the work of Leonhard Euler on the famous Königsberg bridge problem, which asks for a walk through the city that crosses each of its bridges once and only once [44]. As mentioned in [30], over the following century the field was enriched by the contributions of numerous renowned mathematicians, such as Enrico Betti, Camille Jordan, Johann Benedict Listing, Bernhard Riemann, Felix Hausdorff (much later) and many others. By the turn of the twentieth century, Henri Poincaré had developed the concepts of homotopy and homology, thus founding the field of algebraic topology. Topology witnessed major advances and theoretical breakthroughs throughout the twentieth century, becoming one of the most important fields of mathematics, yet with few practical applications.
Despite the major theoretical developments of the twentieth century, the applied side did not take off until much later. Indeed, it was only at the beginning of this century that topology found its way into the applications arena under the name of topological data analysis. Topological data analysis (or TDA) has witnessed many important advances over the last twenty years. It aims to unravel and provide insights about the "shape" of the data, following the central TDA dogma: data has shape, shape has meaning, and meaning drives value. This is done by analyzing the persistent homology using a persistence diagram or barcode. The reader is referred to [18] for an introduction to the notion of persistence, to [17] and [20] for surveys of persistent homology and barcodes, and to [13] for a review of TDA in general.
[Author affiliations: KAUST (e-mail: anass.bourakna@kaust.edu.sa); University of Wisconsin-Madison (e-mail: mkchung@wisc.edu); KAUST (e-mail: hernando.ombao@kaust.edu.sa).]
Several tools have been developed under the TDA framework to analyze many kinds of data. They have been applied across scientific fields, including biology [53], finance [22] and brain signals [33], [6]. These tools aim to help the practitioner understand geometrical features of high-dimensional data that are not always directly accessible through classical techniques. Even when such features could be observed upon examination with classical graph-theoretic methods, they would not have been found automatically had the topological methods not first detected them. In summary, TDA provides a set of techniques for dimensionality reduction without losing the topological information contained in the data.
The main tools in TDA are the Morse and Vietoris-Rips filtrations. These techniques have been used extensively in applications. For instance, Morse filtrations have been used to study patterns in imaging data, as in [15] and [57], and the geometry of random fields in general, as in [1]. Vietoris-Rips filtrations, through persistent homology, have been used extensively to study all sorts of point cloud data sets, including time series data and its transformations under various embedding techniques. To better understand persistent homology, a few concepts such as simplicial complexes and their filtrations will be introduced.
The overall objective of this paper is to introduce key TDA concepts to a statistical audience and to give an overview of their potential applications to multivariate time series data. We briefly introduce TDA and persistent homology, and focus on approaches that provide a powerful way to analyze multivariate time series data. In Section 2 we review background material for TDA: the Morse filtration, persistent homology, and time delay embeddings of univariate time series. In Section 3, we present TDA for dependence networks of multivariate time series. In Section 4, we propose a testing framework, based on permutation tests, for identifying topological group differences. Finally, in Section 5 we present various open problems of interest in brain dependence studies.

BACKGROUND MATERIAL FOR TDA
Understanding topological data analysis requires familiarity with a few key concepts. When reading the Topological Data Analysis (TDA) literature, it is common to find terminology such as data points, distance between points, or whether a given TDA summary is stable or unstable. Such phrases use vocabulary familiar to the statistician; however, the meaning may differ greatly, which can make the literature difficult to follow. We therefore invite the reader to keep the following questions in mind:
• Meaning of data: What constitutes data?
• Meaning of distance: How can we define a meaningful distance between data points?
• Notion of stability: Is a given TDA summary stable? This is addressed through stability theorems.
These questions will be addressed (directly or indirectly) in this and the coming sections. First, we investigate the use of TDA with univariate time series data; then we consider multivariate time series data such as brain electroencephalogram signals recorded from many electrodes on the scalp. Furthermore, we examine dependence-based distance functions that measure the degree of association between time series components (e.g., between pairs of electrodes). Finally, we demonstrate TDA on real-world applications with dependence networks.
There is no doubt that data encompasses a wide range of concepts. In the mind of a geneticist, data means something specific (e.g., a sequence of nucleotides); for the neuroscientist who works with brain signals, data means something else (e.g., electroencephalogram (EEG), local field potential (LFP), functional magnetic resonance imaging (fMRI), etc.). In general, data can be represented in various forms: images, functions, time series, counts, random fields, bandpass-filtered signals, Fourier transforms, localized transforms (e.g., wavelets and SLEX), etc. In other words, every type of data may require a different statistical approach. For example, a cloud of points in Euclidean space might require one statistical approach, while a time series of counts might require another.
Similarly, the notion of distance can vary across data modalities; indeed, it is intrinsically linked to the nature of the data. Distance means some measure of proximity between data points in a space of reference. For instance, it is meaningless to use a Euclidean distance on categorical data (e.g., gender data or a count time series) in the way one would for, say, a cloud of points in R^2.
In general, the notion of distance aims to capture some notion of similarity. When the data naturally originate from a meaningful metric space, one can use the distance metric inherited from the space of reference. Otherwise, a more suitable metric should be used to capture the information of interest. For example, if we are interested in studying brain networks originating from brain imaging techniques, a meaningful distance metric could be based on the notion of dependence. Sometimes the goal of a study is to examine the extent of synchrony between regions in a brain network and how that synchrony may be disrupted by a stimulus or a shock. Often it is more informative to study the potential cross-interactions between oscillatory activities at different channels, and how an increased amplitude in one channel may excite (increase the amplitude of) or inhibit (decrease the amplitude of) another channel, see [41]. Simple correlation-based distance measures have become commonplace in many applications due to their simplicity in computation and interpretation. However, these approaches capture only linear associations and may not be appropriate for brain signals, where frequency-specific cross-interactions may be more informative.
Stability is a tricky concept, mainly because it has many facets, especially in the context of TDA. While TDA originated in mathematics, it is now often applied and further developed by statisticians, who may have different motivations and applications in mind. In the mind of a mathematician, the stability of some transformation g means robustness to small deformations in the input, usually captured by inequalities of the form

d_F(g(X), g(Y)) ≤ C · d_E(X, Y),   (1)

where Y is a smooth transformation of X and g is the transformation for which we want to prove the stability.
Here d_E could be the Euclidean distance or the Hausdorff distance between X and Y, and d_F could be the Frobenius norm ||·||_F if the function g returns a matrix, or the Wasserstein or bottleneck distance (as defined in Equations 2 and 3) if g returns a set of points:

W_p(A, B) = inf_γ ( Σ_{x∈A} ||x − γ(x)||_∞^p )^{1/p},   (2)

W_∞(A, B) = inf_γ sup_{x∈A} ||x − γ(x)||_∞,   (3)

where γ represents a bijection between the sets A and B.
For statisticians, stability has a completely different meaning. Indeed, statistical reasoning considers stability based on the robustness of a conclusion, e.g., the result of a statistical test or inference, against perturbations in the original data due to random noise or any departure from initial model assumptions. When the mathematician considers small/smooth perturbations in the input, the statistician however, considers adding random noise with small standard deviation or a set of outliers to the data. Furthermore, the statistician might also consider the perturbation in terms of a change in the distribution of the data, e.g., if the distribution of the data has a heavier tail (for example student's t-distribution instead of a normal distribution), this might result in the presence of many unexpected outliers.
Both perspectives can lead to the same result in some cases. For instance, if the data are sampled from a manifold (later, in Figure 24, we illustrate this abstract notion by sampling time series components from an underlying dependence manifold) and small sampling noise is considered, this is typically similar to smoothly deforming the manifold and resampling from the new manifold, as seen in Figure 1. However, in this context, adding an outlier to the data may alter the topology of the object completely. In that case, having a stability theorem is of little practical importance, since it is too restrictive: adding an outlier completely alters the input, so the inequality in Equation 1 still holds, but it is not tight enough to provide meaningful conclusions. More details regarding the persistence diagram can be found in [16].
A persistence diagram of a topological space is a multiset of points (birth-death pairs) representing the various features present in the topology of the data set. A birth time means that at this specific time or scale a new topological feature appeared in the filtration, and the corresponding death time is the time or scale at which that feature is no longer present in the filtration. When dealing with one-dimensional Morse functions we can only consider zero-dimensional features (the number of connected components), but in general, when analyzing an arbitrary set of points, we can consider higher-dimensional features (holes, voids, etc.) to better capture the shape of the underlying manifold at hand.

FIG 2. Persistence diagrams of point clouds. The first row shows the point cloud data: original data sampled from a circle (LEFT), the original data plus noise (MIDDLE) and the original data plus an outlier at the centre (RIGHT). The second row displays the corresponding persistence diagram for each point cloud. The figure demonstrates how sensitive the persistence diagram can be to the presence of outliers, pointing to the fact that we might need to transform the data before applying TDA.

For example, the persistence diagrams in Figure 2 show how even a little additive noise (with small σ) can affect the persistence diagram. Adding an outlier in the middle of the circle, however, completely changes the location of the dots (BLUE and ORANGE), which may lead to different conclusions about the underlying unknown process that generated the data. Persistence diagrams help visualize and summarize the topological information contained in data. A detailed explanation of this notion follows in the next section.
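To make this sensitivity concrete, the zero-dimensional persistence of a Vietoris-Rips filtration can be computed directly: every point is born at scale 0, and connected components merge exactly at the edge lengths of a minimum spanning tree of the point cloud. The sketch below is our own minimal illustration (not code from the paper), using evenly spaced points on a circle for simplicity:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def h0_deaths(points):
    """Death times of 0-dimensional Vietoris-Rips persistence: each point
    is born at scale 0, and components merge exactly at the edge lengths
    of a minimum spanning tree of the point cloud."""
    mst = minimum_spanning_tree(squareform(pdist(points)))
    return np.sort(mst.data)

theta = np.linspace(0, 2 * np.pi, 60, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])

rng = np.random.default_rng(0)
clean = h0_deaths(circle)
noisy = h0_deaths(circle + rng.normal(scale=0.02, size=circle.shape))
outlier = h0_deaths(np.vstack([circle, [0.0, 0.0]]))  # point at the centre

# Small additive noise perturbs the death times only slightly, whereas the
# outlier introduces a long-lived extra component whose death time is the
# circle's radius (1.0), dwarfing the short merge scales between neighbours.
print(clean.max(), noisy.max(), outlier.max())
```

The H0 part of the diagram for the clean and noisy clouds is essentially the same, while the outlier adds one far-away point, mirroring the BLUE dots in Figure 2.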

Persistent homology of Morse filtration
Very often in time series analysis, the observed time series is modeled as a mean structure plus random (and possibly correlated) noise,

Y(t) = µ(t) + ε(t),   (4)

where the mean structure µ(t) captures the deterministic trend, and the noise ε(t) accounts for the stochastic fluctuations around the mean, capturing the autocovariance (or, more generally, the within-series dependence) structure. Viewing µ(t) as a Morse function allows us to use Morse theory to build a sublevel set filtration that captures the topological information contained in µ, specifically the arrangement of its critical values; see Theorem 3.20 in [3], which says that the homotopy of the sublevel set only changes when the parameter value passes through a critical point. In a nutshell, the "homotopy" of a set retains only the essential information about its topology and disregards the effect of continuous deformations, for example, shrinking or twisting the set without tearing.
In general, EEG signals represent the superposition of numerous ongoing brain processes. Therefore, to be able to accurately characterize and estimate brain functional response to a stimulus, it is necessary to record many trials of the same repeated stimulus. In this case, a smoothing approach (i.e., average many time-locked single-trial EEGs recorded from the same stimulus) is meaningful as it allows random (nonrelated to the stimulus) brain activity to be canceled out and relevant signal to be enhanced. This new signal is referred to as event-related potential (ERP).
Indeed, applying the Morse filtration to ERP data is meaningful, as it captures information about the critical values of the ERP signal. On account of the additive nature of the noise ε(t), a first step is needed to smooth the time series (i.e., to recover the mean structure µ(t)). After smoothing, the Morse filtration can be built and then visualized as in Figure 3. As the parameter a increases, the corresponding sequence of preimages µ^{-1}((−∞, a]) forms a sublevel set filtration. The topology of the preimage only changes when a passes through a critical value with non-vanishing second derivative (or, in higher dimensions, a nonsingular Hessian matrix). At every critical value, a component is either born (at a local minimum) or dies (at a local maximum) by merging with another component.
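The birth-death pairing of the sublevel set filtration can be computed for a sampled mean function with a short union-find sweep. The sketch below is our own illustration (not the paper's code): levels are swept upward, each local minimum starts a component, and when two components merge at a local maximum the younger one dies (the elder rule).

```python
import math

def sublevel_persistence(y):
    """0-dimensional persistence of the sublevel-set (Morse) filtration of
    a sampled function y: components are born at local minima and die when
    two components merge at a local maximum (elder rule)."""
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])   # sweep the level a upward
    parent = [-1] * n                              # -1: not yet in the sublevel set
    birth = {}                                     # component root -> birth value
    pairs = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        birth[i] = y[i]                            # tentative new component
        for j in (i - 1, i + 1):                   # merge with active neighbours
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # elder rule: the younger component (larger birth) dies at y[i]
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                pairs.append((birth[young], y[i]))
                parent[young] = old
                del birth[young]
    pairs = [(b, d) for b, d in pairs if d > b]    # drop zero-persistence pairs
    pairs.append((min(y), math.inf))               # the essential component
    return sorted(pairs)

# Two local minima (values 0 and 1) separated by a local maximum (value 3):
print(sublevel_persistence([0.0, 3.0, 1.0, 2.0]))  # [(0.0, inf), (1.0, 3.0)]
```

The deepest minimum generates the essential component that never dies; the shallower minimum is born at level 1 and dies at the separating maximum, level 3.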
The Morse filtration summarizes the topological information contained in the mean of the time series, µ(t), by capturing the information in the arrangement of the local extrema. This implies that TDA ignores the information contained in the noise structure, e.g., the covariance and dependence structure. Therefore, the practitioner has to verify that the assumptions of Equation 4 are valid; otherwise, applying the Morse filtration will result in a substantial loss of information. In time series analysis, model misspecification can have serious consequences, as in the case of autoregressive processes, see Figure 4. In this example, the presence of a high-frequency AR(1) noise process does not modify the mean structure significantly, and so does not alter the conclusions of the Morse filtration: the smoothing step cancels the noise structure and unravels the true mean structure. The presence of a low-frequency AR(1) noise process, however, can be problematic, because it can be incorrectly absorbed into the mean structure µ(t). This could lead the Morse filtration to erroneous conclusions.

Persistent homology of Vietoris-Rips filtration
The original motivation behind homology theory was to distinguish between topological objects algebraically, using group theory based on Betti numbers (Figure 5). Homology is a tool from algebraic topology that analyzes the topological features of objects, such as connected components, holes, cavities, etc. For example, Figure 5 presents a few topological objects (e.g., a sphere, a torus, etc.) with topological features of different dimensions, such as holes and cavities. Often, the data are assumed to be a finite set sampled, with additive noise, from an underlying topological space.

FIG 6. Example of a Vietoris-Rips filtration. As the radius increases, the balls centered around the data points start intersecting, which leads to the appearance of more and more complex features in the increasing sequence of simplicial complexes.

In order to analyze the shape of the data, we usually build the homology of the data by looking at the networks of neighboring data points generated at varying scales/distances, as seen in Figure 6. We call this increasing sequence of networks the Vietoris-Rips filtration, see [26]: as we increase the radius, the balls surrounding the data points start intersecting and thus form a network of neighboring points (Figure 6). The goal of this approach is to detect the geometrical patterns when they appear (birth) and to record how long they persist (death time minus birth time) over a wide range of radius values. The Vietoris-Rips filtration is constructed from the notions of a simplex and a simplicial complex, which is a finite collection of sets closed under the subset relation, see Figures 7 and 8. Simplicial complexes can be thought of as higher-dimensional generalizations of graphs. Their use aims to summarize the shape of the data at every scale by a mathematical object that simplifies abstract manipulations. A simplicial complex can be as simple as a combination of singleton sets (disconnected nodes), or more complicated, such as a combination of pairs of connected nodes (edges), triangles (faces), tetrahedra, or any higher-dimensional simplex as in Figure 7, or a combination of different k-simplices in general, as in Figure 8. The notion of a simplicial complex may seem almost identical to that of a network, but there is a major difference: networks and graphs disregard surfaces, volumes, etc., and can be thought of as flexible structures, whereas simplicial complexes take a much richer approach by keeping track of many levels of complexity, represented by the various k-simplices.
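A Vietoris-Rips complex at a fixed scale can be built in a few lines: a k-simplex is included whenever all pairwise distances among its k + 1 vertices are at most ε. The following sketch is an illustrative implementation of that rule (not code from the paper):

```python
from itertools import combinations
import numpy as np

def vietoris_rips(dist, eps, max_dim=2):
    """Vietoris-Rips complex at scale eps: a k-simplex is included
    whenever all pairwise distances among its k+1 vertices are <= eps.
    Returns {dimension: list of simplices (vertex tuples)}."""
    n = dist.shape[0]
    complex_ = {0: [(i,) for i in range(n)]}
    for k in range(1, max_dim + 1):
        complex_[k] = [
            s for s in combinations(range(n), k + 1)
            if all(dist[i, j] <= eps for i, j in combinations(s, 2))
        ]
    return complex_

# Four corners of a unit square: at eps = 1 the four side edges appear
# (a 1-cycle is born), while the diagonals (length sqrt(2)) are absent,
# so no triangle fills the cycle in yet.
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
K = vietoris_rips(dist, eps=1.0)
print(len(K[1]), len(K[2]))  # 4 0
```

Increasing ε past sqrt(2) admits the diagonals and all four triangles, killing the cycle; this is exactly the birth-death mechanism the filtration records.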
The above definition of simplicial complexes gives a rigorous description of the Vietoris-Rips filtration as an increasing sequence of simplicial complexes. To construct this sequence, practitioners use the concept of a cover, with an open ball around each node. An important motivation behind this approach is the Nerve theorem. Historically proposed by Pavel Alexandrov (and sometimes attributed to Karol Borsuk), the Nerve theorem turns continuous topological spaces into abstract combinatorial structures (simplicial complexes) that preserve the underlying topological structure and can be examined by algorithms. Indeed, as described in [7], the Nerve theorem states that a set and the nerve of a covering of the set are homotopy equivalent as the resolution of the cover increases, i.e., they have identical topological properties, such as the number of connected components, holes, cavities, etc.

FIG 9. The open cover (LEFT) and the corresponding nerve (RIGHT) have identical Betti numbers, denoted by β_k (i.e., the numbers of connected components, holes, voids, etc.). As the resolution of the cover increases, the topological structure of the resulting nerve resembles that of the original space.

As a result of the Nerve theorem, see Figure 9, the topological properties (i.e., Betti numbers) of the simplicial complexes generated from the open cover emulate those of the underlying manifold as the resolution of the covering increases, and thus the open cover converges to the original manifold.
In practice, to construct the Vietoris-Rips filtration from a finite set of points, one looks at the increasing finite union of balls ∪_i B(x_i, r) (where B(x_i, r) is a ball centered around the i-th point/node with radius r) over a wide range of radius values, as seen in Figure 7. Another example is considered in Figure 10, where the threshold values at which topological features appear or disappear are denoted by ε_i. Once the persistent homology is constructed, it needs to be analyzed using some topological summary such as the barcode, the persistence diagram or the persistence landscape. In Figure 11 we see the corresponding persistence diagram and persistence landscape. The persistence diagram (PD) is constructed from the times of birth and death of the topological features in the filtration, as seen in Figure 11: for every birth-death pair, a point is placed in the diagram, e.g., (ε_1, ε_2) and (ε_2, ε_3). The points in the PD are color coded so that every color represents a specific dimension of the homology (dimension 0 for connected components, dimension 1 for cycles, etc.). It is hard to manipulate persistence diagrams, e.g., to take averages or to compute distances such as the bottleneck or Wasserstein distance, see Figure 12. In [2], the authors compare persistence diagrams; in particular, they show that computing the bottleneck or Wasserstein distances can be time consuming, since it is necessary to find a point correspondence. Additionally, defining the "mean" (or center) or the "variation" (or measure of spread) of a distribution of persistence diagrams is not straightforward, especially when the number of points varies from diagram to diagram. For these reasons, practitioners often prefer to analyze a transformation of the persistence diagram, the persistence landscape, which is a simpler object (a function), as defined in [9], see Figure 11.

FIG 10. When ε = 0, all points are disconnected; as ε grows, the open cover becomes larger and larger. When ε = ε_1, some of the balls start to intersect, and an edge (or a higher-dimensional simplex, if more than two balls intersect) is created between each such pair of points, which results in the creation of a cycle (one-dimensional hole). When ε = ε_2, more edges (ten 1-simplices) are added, as well as a tetrahedron (four 2-simplices and one 3-simplex), which results in the creation of a new cycle and the destruction of the first cycle. Finally, when ε = ε_3, more simplices are added and the second cycle disappears. This figure was inspired by the paper [37].

The persistence landscape (PL) can be constructed from the persistence diagram (PD) by drawing an isosceles triangle for every point of a given homology dimension in the PD, centered around the birth and death times, as shown in Figure 11. In case there are intersecting lines, the most persistent (highest) function is defined to be the PL (i.e., λ_1), see the example in Figure 13. For more details regarding the properties of the persistence landscape, refer to [9]. Since PLs are functions of a real variable (the scale), it becomes easy to compute group averages and to define confidence regions, and all sorts of statistical properties (such as the strong law of large numbers or the central limit theorem) can be derived for the PL, as shown in [9].
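As a quick illustration of the construction, the sketch below (our own minimal implementation, not code from [9]) evaluates the "tent" function min(t − b, d − t)_+ of each birth-death pair on a grid and takes the k-th largest value pointwise:

```python
import numpy as np

def landscape(pairs, grid, k=1):
    """k-th persistence landscape on a grid: for each birth-death pair
    (b, d) draw the tent max(0, min(t - b, d - t)); lambda_k(t) is the
    k-th largest tent value at each grid point t."""
    tents = np.array([np.clip(np.minimum(grid - b, d - grid), 0, None)
                      for b, d in pairs])
    tents = -np.sort(-tents, axis=0)   # sort descending at each grid point
    return tents[k - 1]

grid = np.linspace(0, 4, 401)
pairs = [(0.0, 4.0), (1.0, 2.0)]       # one persistent, one short-lived feature
lam1 = landscape(pairs, grid, k=1)
lam2 = landscape(pairs, grid, k=2)
print(lam1.max())  # 2.0: apex of the most persistent tent, at t = 2
print(lam2.max())  # 0.5: apex of the shorter tent, at t = 1.5
```

Because λ_k is an ordinary function, averaging landscapes across subjects or trials reduces to pointwise averaging of arrays, which is what makes the statistical theory in [9] tractable.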

Time delay embeddings of univariate time series
So far we have described the process of building the persistent homology from a cloud of points living in a metric space. However, in order to analyze time series data using this method, it is necessary to embed the univariate time series into a metric space. For instance, instead of studying the time series {Y(t), t = 1, . . . , T}, we study the behaviour of the cloud of points {Z_s = (Y(s), Y(s − 1)), s = 2, . . . , T}, as seen in Figure 15. This particular embedding is known as the time delay embedding, see [49], and it aims at reconstructing the dynamics of the time series by taking into account the information in lagged observations.
In a time delay embedding, the aim is to reconstruct the phase space (the space representing all possible states of the system) from a single observed time series component, borrowing information from the lagged observations to do so. Under the initial assumptions, this is indeed possible, since all the components are interdependent through the shared dynamical system. The phase space may contain valuable information regarding the behaviour of the time series. For example, the time series might display chaotic behaviour in time, see Figure 14, yet in the phase space it might converge to a strange attractor (a region of the phase space towards which the system converges); see [49] and [46] for more details on applying topological data analysis to time series data. The foundation of this method derives from the theory of dynamical systems. Indeed, Takens's theorem states conditions under which the attractor of a dynamical system can be reconstructed from a sequence of observations, as seen in Figure 14. Therefore, when practitioners use topological data analysis to analyse the shape of the point cloud embedding of a time series, they are in fact assessing the geometry of the attractor of the underlying dynamical system. This approach is perfectly meaningful when the assumptions of Takens's theorem are valid, that is, when there is a continuous and invertible dynamical system corresponding to the observed time series. However, in time series analysis such an approach often does not make sense, especially when the observed time series is noisy, has constant mean and has no apparent relation to any dynamical system, so that such patterns are unlikely to be observed. For example, in Figure 15, the time delay embedding does not show any interesting geometrical features when the noise level is high.
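The embedding itself is a one-liner over shifted slices. The sketch below is our own illustration (parameter choices are ours): a noiseless sinusoid embeds to an ellipse, i.e., a 1-cycle that persistent homology would detect, whereas heavy noise buries that geometry.

```python
import numpy as np

def delay_embed(y, dim=2, lag=1):
    """Time-delay embedding: map y(t) to the point cloud
    Z_s = (y(s), y(s - lag), ..., y(s - (dim - 1) * lag))."""
    T = len(y)
    s0 = (dim - 1) * lag
    return np.column_stack([y[s0 - k * lag: T - k * lag] for k in range(dim)])

t = np.arange(200)
clean = np.sin(2 * np.pi * t / 25)          # period-25 oscillation
cloud = delay_embed(clean, dim=2, lag=6)    # lag near a quarter period
print(cloud.shape)  # (194, 2) -- points tracing an ellipse (a 1-cycle)
```

Choosing the lag near a quarter period opens the ellipse up; a poor lag (near a full period) collapses the cloud onto a diagonal line, which is why lag selection matters in practice.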

TOPOLOGICAL METHODS FOR ANALYZING MULTIVARIATE TIME SERIES
Over the last three decades, the analysis of the human connectome using brain imaging techniques such as functional magnetic resonance imaging (fMRI) and electroencephalography (EEG) has witnessed numerous successes (see [32], [34], [55], [54], [25], [56], [40], [51], [52], [24]) in uncovering the mechanisms behind human cognition and neurological disorders. In this regard, analyzing the dependence network of a multivariate time series from a topological point of view has the potential to provide valuable insight.
A multivariate time series might not display any relevant geometrical features in the point cloud embedding. However, it might display topological patterns in its dependence network as seen in Figure 16. For this example, none of the previous methods can capture the interdependence between different time series components. However, an application of TDA to the time series' dependence network would directly reveal the cyclic pattern.
Traditionally, due to their stochastic nature, multichannel or multivariate brain signals have often been modeled through their underlying dependence network, i.e., the dependence between brain regions or nodes in a brain network, see [11], [29], [41], [54]. The brain network is often constructed from some connectivity measure derived from the observed brain signals. There are many possible characterizations of dependence and hence many possible connectivity matrices, including cross-correlation, coherence, partial coherence and partial directed coherence, see [41]. Because weighted networks are difficult to analyze and visualize, practitioners very often use thresholding techniques to create a binary network from the weighted brain network, where an edge exists when the continuous connectivity value exceeds the threshold, see [33], [31], [8] and [12]. This thresholding is usually done at an arbitrary level. A major problem with this common approach is the lack of principled criteria for choosing an appropriate threshold. Moreover, the binary network derived from a single threshold might not fully characterize the dependence structure. The thresholding approach thus induces a bias and a large loss of information, producing an oversimplified network.
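The arbitrariness of the threshold is easy to see on a toy connectivity matrix (the matrix below is hypothetical, purely for illustration): two equally defensible thresholds yield binary networks with different edge sets.

```python
import numpy as np

def binarize(C, tau):
    """Binary network from a weighted connectivity matrix C:
    keep an edge whenever the connectivity exceeds the threshold tau."""
    A = (C > tau).astype(int)
    np.fill_diagonal(A, 0)   # no self-loops
    return A

# Hypothetical coherence matrix on three channels:
C = np.array([[1.0, 0.8, 0.4],
              [0.8, 1.0, 0.6],
              [0.4, 0.6, 1.0]])
print(binarize(C, 0.5).sum() // 2)  # 2 edges survive at tau = 0.5
print(binarize(C, 0.7).sum() // 2)  # only 1 edge survives at tau = 0.7
```

Each fixed tau discards the magnitude information in C; scanning all thresholds, as the filtration in the next paragraph does, retains it.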
Consequently, applying topological data analysis directly to the original data (i.e., consider a filtration of connectivity graphs) appears as an appealing alternative to the arbitrary thresholding of weighted networks. First, the TDA approach detects all potential topological patterns present in the connectivity network. Second, by considering all possible thresholding values, TDA avoids the arbitrary thresholding problem. Refer to [33] for more details regarding this problem.
To formalize these ideas, let X_i(t) be a time series of brain activity at location i ∈ V and time t ∈ {1, . . . , T}, where V = {1, . . . , P} is the set of all P sensor locations (in EEG) or brain regions (in fMRI). Considering a set of P brain channels (e.g., electrodes/tetrodes) indexed by V, the object X = (V, D) is a metric space, where D_ij is the dependence-based distance between channel i and channel j (i.e., between X_i(t) and X_j(t)). We build the Vietoris-Rips filtration by connecting nodes of X whose distance is less than or equal to some given ε, which results in the following filtration:

X_{ε_1} ⊆ X_{ε_2} ⊆ · · · ⊆ X_{ε_{n−1}} ⊆ X_{ε_n},   (5)

where ε_1 < ε_2 < · · · < ε_{n−1} < ε_n are the distance thresholds. Nodes within a given distance ε_i are connected to form the successive simplicial complexes; X_{ε_1} is the first simplicial complex (isolated nodes) and X_{ε_n} is the last (all nodes connected, i.e., a clique on all P nodes). In general, X_ε for a given ε represents the simplicial complex thresholded at distance ε. However, X_ε only changes at a finite number of distance values, namely those present in the distance matrix, i.e., there are at most n = P(P − 1)/2 simplicial complexes in the filtration (this is the number of complexes appearing in the filtration; the number of possible simplicial complexes is of course much higher, on the order of 2^{P(P−1)/2}). For a detailed review of how to build the Vietoris-Rips filtration from a metric space, refer to [26].
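As a small illustration of this filtration (our own sketch, with a hypothetical distance matrix), the number of connected components β_0 of the thresholded network can be tracked at every distance actually present in D, the only scales at which the filtration changes:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def betti0_curve(D):
    """Connected components (beta_0) of the thresholded network X_eps,
    evaluated at every pairwise distance present in D."""
    P = D.shape[0]
    thresholds = np.unique(D[np.triu_indices(P, k=1)])
    curve = []
    for eps in thresholds:
        adj = csr_matrix((D <= eps) & (D > 0))   # edges at scale eps
        n_comp, _ = connected_components(adj, directed=False)
        curve.append((float(eps), int(n_comp)))
    return curve

# Hypothetical dependence-based distance matrix on P = 4 channels,
# with two tightly coupled pairs that only join at the largest scale:
D = np.array([[0.0, 0.2, 0.9, 0.9],
              [0.2, 0.0, 0.9, 0.9],
              [0.9, 0.9, 0.0, 0.3],
              [0.9, 0.9, 0.3, 0.0]])
print(betti0_curve(D))  # [(0.2, 3), (0.3, 2), (0.9, 1)]
```

The curve records how the two channel pairs merge first (at 0.2 and 0.3) and the whole network becomes connected only at 0.9, without ever committing to a single threshold.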
Given a topological object X with a filtration as defined in Equation 5, the corresponding homology analyzes X by examining its k-dimensional holes through the k-th homology groups H_k(X). The zero-dimensional holes represent the connected components or clustering information, the one-dimensional holes represent loops, the two-dimensional holes represent voids, etc. The rank β_k of H_k(X) is known as the k-th Betti number, see the illustration in Figure 5. Refer to [38] and [35] for more rigorous definitions of these topological objects.

Examples of time series models
In order to show how TDA can be used to analyze the dependence pattern of a real multivariate time series, we illustrate via simulations how topological patterns such as cycles and holes can arise in multivariate time series. We build on the idea, developed in [41] and [23], that recorded brain signals can be viewed as mixtures of latent frequency-specific sources, i.e., mixtures of frequency-specific neural oscillations. These oscillatory activities can be modelled by a linear mixture of second-order autoregressive (AR(2)) processes. A latent process with spectral peak at the alpha band (8-12 Hz) can be characterized by an AR(2) process of the form

Z_α(t) = φ_1^α Z_α(t − 1) + φ_2^α Z_α(t − 2) + W_α(t),   (6)

where W_α(t) is white noise with E[W_α(t)] = 0 and Var[W_α(t)] = σ_α^2, and the AR(2) coefficients φ_1^α and φ_2^α are derived as follows. Note that Equation 6 can be rewritten as Φ(B)Z_α(t) = W_α(t), where B is the backshift operator and the AR(2) characteristic polynomial is Φ(r) = 1 − φ_1^α r − φ_2^α r^2. Consider the case when the roots of Φ(r), denoted r_1 and r_2, are (non-real) complex-valued and hence can be expressed as r_1 = M exp(i2πψ) and r_2 = M exp(−i2πψ), where the phase ψ ∈ (0, 0.5) and the magnitude M > 1 to satisfy causality, see [47]. For the latent process Z_α(t), suppose that the sampling rate is SR and the peak frequency is f_α ∈ (8, 12) Hz. Then the roots of the AR(2) latent process are r_1^α = M_α exp(i2πψ_α) and r_2^α = M_α exp(−i2πψ_α), where the phase is ψ_α = f_α/SR. In practice, we can choose ψ_α = 10/100 for a given SR = 100 Hz, and take the root magnitude M_α to be some number greater than 1 but "close" to 1, so that the spectrum of Z_α(t) is mostly concentrated on the alpha band. The corresponding AR(2) coefficients are then φ_1^α = 2 cos(2πψ_α)/M_α and φ_2^α = −1/M_α^2. An example of such a stationary AR(2) process is visualized in Figure 17.
Examples. The goal here is to illustrate the previous idea by considering multivariate stationary time series data with a given cyclic dependence network (cyclic frequency-specific communities), see Figures 18 and 19.
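A minimal simulation of such an alpha-band latent process, under the parameterization above (ψ_α = f_α/SR, φ_1^α = 2 cos(2πψ_α)/M_α, φ_2^α = −1/M_α²), might look like the following sketch; the particular values M_α = 1.02 and T = 2000 are our own illustrative choices:

```python
import numpy as np

def ar2_oscillator(f_peak, sr, M=1.02, T=2000, sigma=1.0, seed=0):
    """Causal AR(2) process with spectral peak near f_peak (Hz): the
    characteristic roots are M * exp(+-i 2 pi psi) with psi = f_peak / sr,
    giving phi1 = 2 cos(2 pi psi) / M and phi2 = -1 / M**2 (M > 1)."""
    psi = f_peak / sr
    phi1 = 2.0 * np.cos(2.0 * np.pi * psi) / M
    phi2 = -1.0 / M**2
    rng = np.random.default_rng(seed)
    burn = 500
    z = np.zeros(T + burn)
    w = rng.normal(scale=sigma, size=T + burn)
    for t in range(2, T + burn):
        z[t] = phi1 * z[t - 1] + phi2 * z[t - 2] + w[t]
    return z[burn:]                     # drop the burn-in segment

# Alpha-band oscillation: f_alpha = 10 Hz at SR = 100 Hz, so psi = 0.1.
z = ar2_oscillator(f_peak=10, sr=100)
freqs = np.fft.rfftfreq(len(z))
pgram = np.abs(np.fft.rfft(z - z.mean()))**2 / len(z)
print(freqs[pgram.argmax()])  # periodogram peak close to 0.1 cycles/sample
```

Taking M_α closer to 1 sharpens the spectral peak around ψ_α, concentrating the power of Z_α(t) in the alpha band as described in the text.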
However, the advantage of applying TDA to the weighted network, as explained above, is its ability to detect the presence of such topological features. The simulated time series can be visualized in Figures 20 and 21.
It is very common for brain signals to exhibit communities or cyclic structures, as seen in [52] and [50], or in the simulations in Figure 22. There are various reasons that could explain the presence of such patterns in brain networks. For instance, the brain network could be organized in such a way as to increase the efficiency of information transfer or to minimize energy consumption. Also, the brain connectivity network could be altered by brain damage; e.g., Alzheimer's disease could impair some brain regions, which could result in the creation of cycles/holes or voids. In general, we can imagine the following scenarios:
• Groups of neurons firing together (presence of clusters)
• Groups of neurons sharing the same latent processes (potential cycles)
To generate the previous multivariate time series (in Figures 20 and 21) with both dependence patterns, we use the following approach.
Example 1: The goal of this example is to show how to generate a time series with the dependence pattern presented in Figure 18. Let $Y(t) = [Y_1(t), \ldots, Y_9(t)]'$ be the observed time series, $Z(t) = [Z_1(t), \ldots, Z_8(t)]'$ the latent AR(2) processes (as in Figure 17), and $\epsilon(t) = [\epsilon_1(t), \ldots, \epsilon_9(t)]'$ the iid Gaussian innovations. Then we can simulate $Y(t)$ as the linear mixture $Y(t) = A Z(t) + \epsilon(t)$, where the mixing matrix $A$ determines which latent processes each component loads on, following the pattern of Figure 18, and $\epsilon(t)$ are iid Gaussian white noise with covariance matrix $\Sigma_\epsilon = I_9$.
Example 2: The goal of this example is to show how to generate a time series with the dependence pattern presented in Figure 19. Let $Y(t) = [Y_1(t), \ldots, Y_9(t)]'$ be the observed time series and $Z(t) = [Z_1(t), \ldots, Z_5(t)]'$ be another set of independent latent AR(2) processes (as in Figure 17). Then, using a second mixing matrix encoding the pattern of Figure 19, we can generate the second dependence structure in $Y(t)$,
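A linear mixture of this kind can be sketched as follows. The $9 \times 5$ mixing matrix below is hypothetical (the paper's actual matrices correspond to Figures 18 and 19, which are not reproduced here); it simply makes consecutive channels share a latent source so that a dependence pattern emerges.

```python
import numpy as np

def simulate_ar2_bank(n_series, n, phi1, phi2, rng, burn_in=500):
    """Independent AR(2) latent processes, one per column."""
    z = np.zeros((n + burn_in, n_series))
    w = rng.normal(size=(n + burn_in, n_series))
    for t in range(2, n + burn_in):
        z[t] = phi1 * z[t - 1] + phi2 * z[t - 2] + w[t]
    return z[burn_in:]

rng = np.random.default_rng(0)
# Alpha-band AR(2) coefficients (psi = 0.1, M = 1.05), as derived in the text.
phi1, phi2 = 2 * np.cos(2 * np.pi * 0.1) / 1.05, -1 / 1.05**2

# Hypothetical mixing matrix: channel p loads on latent p mod 5 and its
# neighbour, so consecutive channels share a common source.
A = np.zeros((9, 5))
for p in range(9):
    A[p, p % 5] = A[p, (p + 1) % 5] = 1.0

Z = simulate_ar2_bank(5, 2000, phi1, phi2, rng)
Y = Z @ A.T + rng.normal(size=(2000, 9))   # Y(t) = A Z(t) + eps(t)
```

Channels that load on a common latent process (e.g., channels 0 and 1, which share one source) end up strongly coherent, while channels with disjoint sources do not.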
where $\epsilon(t)$ are iid Gaussian white noise with covariance matrix $\Sigma_\epsilon = I_9$. In order to build the Vietoris-Rips filtration, a dependence-based distance function needs to be defined between the various components of the time series. For instance, a decreasing function of any relevant dependence measure could be used. Therefore, based on the dependence network, a distance matrix can be used to build the persistent homology. First, define the Fourier coefficients and the smoothed periodogram as follows:
$$d_p(\omega_k) = \frac{1}{\sqrt{T}} \sum_{t=1}^{T} Y_p(t) \exp(-i2\pi\omega_k t), \qquad \widehat{f}_{pq}(\omega_k) = \sum_{j} K_h(\omega_j - \omega_k)\, d_p(\omega_j)\, \overline{d_q(\omega_j)},$$
where $K_h(\omega - \omega_k)$ is a smoothing kernel centered around $\omega_k$ and $h$ is the bandwidth parameter. Second, define the dependence-based distance function to be a decreasing function (e.g., $G(x) = 1 - x$) of the coherence:
$$\widehat{\rho}^{\,2}_{pq}(\omega_k) = \frac{|\widehat{f}_{pq}(\omega_k)|^2}{\widehat{f}_{pp}(\omega_k)\,\widehat{f}_{qq}(\omega_k)}, \qquad D_{pq} = 1 - \widehat{\rho}^{\,2}_{pq}.$$
Coherence at some pre-specified frequency band is the squared cross-correlation between a pair of filtered signals (whose power is each concentrated at that specific band), see [42]. Another way to estimate coherence is via the maximal squared cross-correlation of the bandpass-filtered signals, see [41].
Example 3: The goal of this example is to provide a simple model that can explain/distinguish the observed cyclic structure from the random one in some data sets, see Figure 22. For group 1, assume we observe six time series components that share common cyclic latent independent process copies $Z_i(t)$; for group 2, assume we observe five time series components that share random common latent independent process copies $Z_i(t)$. In both groups, each observed component is the sum of its shared latent part and a noise term scaled by a parameter $c$. When $c = 0$ (vanishing noise), the coherence between the observed time series is maximal: $\mathrm{Coh}(y_i, y_j, \omega) = 1$ for all $\omega \in \Omega_\alpha$; as $c$ increases, the coherence between the observed time series drops, reaching zero as $c \to \infty$. In this way, the signal-to-noise ratio ($\mathrm{SNR} = \mathrm{Var}(\mathrm{Signal})/\mathrm{Var}(\mathrm{Noise})$) controls the coherence between the components and therefore it controls the distance between the components.
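The coherence-based distance construction can be sketched as follows. Here scipy's Welch-based magnitude-squared coherence estimator stands in for the kernel-smoothed periodogram estimator described above; the band, segment length, and function name are illustrative choices.

```python
import numpy as np
from scipy.signal import coherence

def band_distance_matrix(Y, fs, band=(8.0, 12.0), nperseg=256):
    """Distance D_pq = 1 - band-averaged magnitude-squared coherence.

    Y : array of shape (T, P), a P-channel time series sampled at fs Hz.
    Returns a symmetric P x P matrix with zeros on the diagonal, suitable
    as input to a Vietoris-Rips filtration.
    """
    T, P = Y.shape
    D = np.zeros((P, P))
    for p in range(P):
        for q in range(p + 1, P):
            f, Cxy = coherence(Y[:, p], Y[:, q], fs=fs, nperseg=nperseg)
            in_band = (f >= band[0]) & (f <= band[1])
            D[p, q] = D[q, p] = 1.0 - Cxy[in_band].mean()
    return D
```

The resulting matrix can be passed directly to standard persistent homology software that accepts precomputed distance matrices.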
This simple example aims to demonstrate a potential explanation of the mechanism behind the appearance of topological features in the brain dependence network. After generating the time series data from both models as described previously, we estimated the coherence matrix based on the smoothed periodogram (rectangular window) for the 100-200Hz frequency band. Based on this, we construct the distance matrix and apply TDA to get the PDs in Figure 23. We can clearly see that the subplot on the left displays 1-dimensional features (orange dot far from the diagonal), whereas the subplot on the right does not.
To summarize, what we are trying to learn from the previous examples is the underlying topological pattern of the dependence space. We can try to visualize this idea using Figure 24.

TDA vs Graph theoretical modeling of brain connectivity
The human brain is organized, both structurally and functionally, into complex networks. This complex structure allows both the segregation and the integration of information processing. The classical approach to analyzing brain functional connectivity consists of using tools from network science, which often involves graph theoretical summaries such as modularity, efficiency, and betweenness, see [28]. Graph theoretical models have had important success in modeling complex brain networks, as described in [48] and [10]. The above-mentioned graph theoretical summaries aim to characterize the topological properties of the network being studied, and hence can distinguish between small-world, scale-free, and random networks. Indeed, such graph theoretical models display important features that are of particular interest in the study of brain activity. For example, random networks (networks whose edges are selected at random) usually have a low clustering coefficient (nodes show little tendency to cluster together) and a short characteristic path length (low average distance between pairs of nodes); regular networks, on the other hand, have a high clustering coefficient but a long characteristic path length. Small-world models have high clustering (higher than random graphs) and short characteristic paths (roughly the logarithm of the size of the graph). Furthermore, scale-free networks can have an even shorter characteristic path length, and potentially a smaller clustering coefficient, than small-world networks. Such topological properties may have a direct impact on brain activity, such as robustness to brain injury or the efficiency of information transfer between distant brain regions (the variable cost of brain integration).
See [58] for an overview of small-world networks and their properties and potential applications, and [4] for an overview of the emergence of scale-free networks from random networks. Such models and graph summaries have been extensively used to study the impact of diseases on the topology of brain connectivity, see [27] and [5].
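The contrast between these network classes can be verified numerically. The sketch below (assuming networkx is available; sizes and the rewiring probability are illustrative) compares a random graph with a Watts-Strogatz small-world graph of the same size and average degree.

```python
import networkx as nx

# Random vs small-world graphs with the same size and average degree.
n, k = 200, 6
random_g = nx.gnm_random_graph(n, n * k // 2, seed=1)
small_world = nx.watts_strogatz_graph(n, k, 0.1, seed=1)  # 10% rewiring

def summaries(g):
    """Clustering coefficient and characteristic path length
    (the latter computed on the largest connected component)."""
    cc = g.subgraph(max(nx.connected_components(g), key=len))
    return nx.average_clustering(g), nx.average_shortest_path_length(cc)

clust_rand, path_rand = summaries(random_g)
clust_sw, path_sw = summaries(small_world)
```

The small-world graph retains the high clustering of a ring lattice while the rewired shortcuts keep its characteristic path length short, close to that of the random graph.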
The goal of such an approach is to characterize the topological properties of the brain network. Although these summaries provide interesting and valuable insights into the topology of brain networks, they nevertheless suffer from some limitations. Indeed, such summaries cannot capture all the topological information contained in the network, such as the presence of holes and voids, see Figure 26. Furthermore, such summaries cannot be applied directly to a weighted connectivity network. Very often, a thresholding step is necessary, which can be a serious limitation because it can result in an important loss of information if the threshold is not selected properly, as seen in Figure 25. In this regard, applying TDA to brain connectivity could provide complementary information on the topology of the brain functional network, since TDA considers all potential threshold values.
FIG 25. The thresholding step consists in comparing the weights of the original network (CENTER) with some given threshold τ. If the weight of an edge is larger than the threshold, the edge is created in the new network; if the weight is smaller, no edge is added. If the selected threshold is low (τ = 0.5), the resulting network is dense (LEFT); if the threshold is too high (τ = 0.7), the resulting network is sparse (RIGHT). The problem becomes: how to select the threshold τ so as to balance the loss of information against sparsity?
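The thresholding step can be made concrete with a small sketch (the weight matrix and thresholds below are hypothetical, chosen to mirror the τ = 0.5 vs τ = 0.7 comparison):

```python
import numpy as np

# Hypothetical symmetric weight matrix for a 5-node connectivity network.
rng = np.random.default_rng(1)
W = np.triu(rng.uniform(0.3, 0.9, (5, 5)), 1)
W = W + W.T   # symmetric weights, zero diagonal

def threshold_network(W, tau):
    """Binary adjacency: keep an edge only if its weight exceeds tau."""
    return (W > tau).astype(int)

A_low = threshold_network(W, 0.5)    # low threshold: dense graph
A_high = threshold_network(W, 0.7)   # high threshold: sparse graph
```

A filtration sidesteps the choice entirely: instead of committing to one τ, persistent homology tracks how topological features appear and disappear as τ sweeps through all values.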
There are many advantages to using persistent homology techniques. Topological data analysis is designed to study the topological features (geometry and spatial organization) of networks. Classical approaches describe some topological properties of the network, but it remains difficult to detect and assess the topological patterns present in general, as seen in Figure 26. A graph theoretical algorithm will count the cycles around the same hole (green, blue, and red) as three different cycles, whereas TDA detects exactly one large hole/topological feature because it uses the concept of simplicial complexes. Furthermore, an algorithm that clusters the network nodes (modularity analysis) requires a parameter choice, whereas TDA techniques provide overall answers regarding the network topology without parameter tuning.

DATA APPLICATION AND PERMUTATION TESTING
The purpose of this section is to compare the differences in the topological features of the brain connectivity networks of young individuals with Attention Deficit Hyperactivity Disorder (ADHD) and healthy control groups, see [36]. Specifically, we examine the impact of ADHD on the connected components (0-dimensional homology) and on the cyclic structure of the network (1-dimensional homology).
The participants in this study were 61 children with ADHD and 60 healthy controls, aged between 7 and 12 years old. The ADHD children were diagnosed by an experienced psychiatrist and had taken Ritalin for up to 6 months. None of the children in the control group had a history of psychiatric disorders, epilepsy, or any report of high-risk behaviors. EEG signals were recorded from 19 channels placed according to the 10-20 standard, at a sampling frequency of 128 Hz, see Figure 27.
Since one of the deficits of ADHD children is visual attention, the EEG recording protocol was based on a visual attention task. The children were shown a series of cartoon characters and were asked to count them. To ensure a continuous stimulus throughout the signal recording, each image was displayed immediately and without interruption after the child's response. Hence, the duration of the EEG recording throughout this cognitive visual task depended on the child's response speed. Only 51 subjects with ADHD were kept, along with 53 healthy controls. The PREP pipeline was used to preprocess the data (removing the effect of the electrical line; removing artifacts due to eye movements, eye blinks, or muscular movements; detecting, removing, or repairing bad-quality channels; filtering non-relevant signal components; and finally re-referencing the signal to improve topographical localization).
In both Figures 28 and 29, we see the group-level differences in the persistence landscapes. For the zero-dimensional homology, the differences seem to be at the delta, theta, and alpha frequency bands. In addition, the one-dimensional homology displays group differences at all frequency bands except the gamma band. In contrast, the two-dimensional homology group displays differences between the two groups that are very small in magnitude (one order of magnitude smaller than the previous ones). Now, assume that we are interested in testing for differences in the topology of the brain dependence network between the two groups (ADHD and healthy control). Let $H_0$ be the null hypothesis: there is no impact of ADHD on the brain connectivity network. In order to test this hypothesis, we may carry out a permutation test based on the norm of the discrepancy between the group-average persistence landscapes at a given homology dimension and for a specific frequency band $\Omega$:
$$F_T = \left\| \bar{\lambda}^{\,\mathrm{ADHD}}_{\Omega} - \bar{\lambda}^{\,\mathrm{control}}_{\Omega} \right\|.$$
In order to make a decision to reject $H_0$, we need to compare the observed test statistic with a threshold obtained from the reference distribution of the test statistic under $H_0$. We use a permutation approach to derive this empirical distribution under the null hypothesis, as was done in [9], [43], and [57].
FIG 28. Population average persistence landscapes for the 0-dimensional homology group for ADHD (orange) and healthy control (blue) groups, at various frequency bands. High frequency bands do not seem to display any differences between the two groups. These plots suggest that both groups have a similar structure at the connected components level.
FIG 29. Population average persistence landscapes for the 1-dimensional homology group for ADHD (orange) and healthy control (blue) groups, at various frequency bands. Middle frequency bands seem to display differences between the two groups. This suggests that the ADHD group seems to have more cycles/holes in its dependence network.
A formal framework for testing between two groups in topological data analysis is presented in [45], with an extension to three groups in [14]. Practical examples of nonparametric permutation tests at a given significance level can be found in [39]. Refer to [43] and [19] for more examples regarding permutation and randomization tests in functional brain imaging and connectivity. Following the permutation approach, we propose the following procedure:
1. Compute the observed test statistic from the two groups' persistence landscapes.
2. Randomly permute the group labels of the subjects.
3. Compute the sample discrepancy from the permuted PLs.
4. Repeat steps 2 and 3 B times.
5. Compute the threshold τ as the (1 − α)-quantile of the empirical distribution of the test statistic $F_T$.
After applying the above to our data, we obtain the reference distribution for the zero- and one-dimensional homology persistence landscapes. Despite the differences between the PLs of the two populations in the zero- and one-dimensional homology groups, only the differences in the alpha and beta frequency bands are significant for the one-dimensional homology group, as shown in Figures 28 and 29.
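The permutation procedure above can be sketched as follows, assuming each subject's persistence landscape has been evaluated on a fixed grid and vectorized (the function name and inputs are illustrative):

```python
import numpy as np

def permutation_test(group1, group2, n_perm=10000, rng=None):
    """Two-sample permutation test on vectorized persistence landscapes.

    group1, group2 : arrays of shape (n_subjects, n_features), each row one
    subject's persistence landscape on a fixed grid. Returns the observed
    statistic (L2 norm of the difference of group means) and its p-value.
    """
    rng = np.random.default_rng(rng)
    pooled = np.vstack([group1, group2])
    n1 = len(group1)

    def statistic(idx1):
        mask = np.zeros(len(pooled), dtype=bool)
        mask[idx1] = True
        return np.linalg.norm(pooled[mask].mean(0) - pooled[~mask].mean(0))

    observed = statistic(np.arange(n1))                 # step 1
    null = np.empty(n_perm)
    for b in range(n_perm):                             # steps 2-4
        idx1 = rng.choice(len(pooled), size=n1, replace=False)
        null[b] = statistic(idx1)
    # p-value; equivalently, compare `observed` with the (1 - alpha)-quantile
    # of `null` (step 5).
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value
```

Rejecting when the observed statistic exceeds the (1 − α)-quantile of the permutation distribution is equivalent to rejecting when the p-value falls below α.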

OPEN PROBLEMS
As the field of topological data analysis keeps advancing and developing, new and challenging problems continue to emerge. We briefly discuss three open problems that may be of interest to readers interested in topological data analysis applied to brain networks.
FIG. Reference distribution for testing for group-level differences between the ADHD and healthy control persistence diagrams, based on B = 100,000 permutations (one-dimensional homology group).
The study of brain signals shows that brain dependence networks may display between-group discrepancies as well as within-group variability. Historically, linear mixed-effects models (LMEMs) have been proposed to analyze data with fixed effects (e.g., the average persistence landscape) and random effects (e.g., the subject-level variation of the persistence landscape), i.e., $y = X\beta + Uz + \epsilon$. Is it possible to develop such a model that detects group-level differences in the topological structure of the network (fixed effects, i.e., differences in $\beta$) via the estimated persistence landscapes, as well as within-group variations of the topological structure (random effects, i.e., differences in $z$)?
Brain dependence networks can be constructed based on various dependence measures. When correlation or coherence is used to measure dependence between brain channels, the resulting dependence network is a non-oriented graph. In contrast, when more complex models of dependence (e.g., flow of information) are used to model the dependencies between brain channels, such as partial directed coherence, the resulting network is an oriented one. This results in a non-symmetric distance function, which is a problem for the application of TDA. A potential approach to extend the use of TDA to oriented networks is to use the matrix decomposition $A = A_s + A_n$, where $A_s = \frac{1}{2}(A + A')$ is the symmetric part and $A_n = \frac{1}{2}(A - A')$ is the antisymmetric part. The classical application of TDA results in a global analysis of the network; it is therefore impossible to state where the topological features are located in the network. It is thus natural to wonder: can TDA be applied locally, to "local sub-networks"? Can we think of TDA in hierarchical terms? Similarly, a transient change in connectivity can sometimes be observed in brain networks (e.g., localized behavior in time that leads to task-specific functional brain connectivity). How can we use TDA to study such evolutionary or transient events?
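The symmetric/antisymmetric decomposition is a one-liner to implement; a minimal sketch (function name is illustrative):

```python
import numpy as np

def split_oriented(A):
    """Decompose a directed-dependence matrix as A = A_s + A_n,
    where A_s is symmetric and A_n is antisymmetric."""
    A_s = 0.5 * (A + A.T)   # symmetric part: usable as a TDA (dis)similarity
    A_n = 0.5 * (A - A.T)   # antisymmetric part: retains the directionality
    return A_s, A_n
```

The standard Vietoris-Rips machinery can then be applied to a distance derived from $A_s$, while $A_n$ carries the directional information that is lost in that step.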

CONCLUSION
Historically, brain network analysis relied on graph theoretical measures, such as clustering coefficients, betweenness centrality, and average shortest path length, to study network topology. Although this approach has revealed some interesting facts about the brain, it does not give us the full picture of the network's geometry. In contrast, TDA has begun to be used to analyze the topology of brain networks from a persistent homology perspective. This enables a summary across all scales without having to use arbitrary thresholds.
The purpose of this paper was to provide a pedagogical introduction to topological data analysis within the framework of multivariate spectral analysis of time series data. This approach has the advantage of combining the power of TDA with that of spectral analysis, allowing practitioners to characterize the commonalities and differences in the shape of brain connectivity networks across different groups for frequency-specific neural oscillations.
We demonstrated the advantages of using the Vietoris-Rips filtration over the Morse filtration. We gave a pedagogical review of persistent homology using the Vietoris-Rips filtration over a point cloud. We discussed how the time delay embedding can be pertinent, and showed its limits when the initial assumptions of Takens's theorem are not satisfied. Finally, we recommended applying TDA to the connectivity network, as this can capture the rich information contained in the dependence structure of the brain signals, as shown in the data application.
Indeed, the application of TDA to the connectivity networks of ADHD versus healthy control individuals shows a significant discrepancy between their respective PLs at the alpha and beta frequency bands for the one-dimensional homology group. This suggests that the ADHD condition affects the cyclic structure of the brain connectivity network more than its connected components.