A Brief Introduction to Nonlinear Time Series Analysis and Recurrence Plots

: Nonlinear time series analysis gained prominence from the late 1980s on, primarily because of its ability to characterize, analyze, and predict nontrivial features in data sets that stem from a wide range of ﬁelds such as ﬁnance, music, human physiology, cognitive science, astrophysics, climate, and engineering. More recently, recurrence plots, initially proposed as a visual tool for the analysis of complex systems, have proven to be a powerful framework to quantify and reveal nontrivial dynamical features in time series data. This tutorial review provides a brief introduction to the fundamentals of nonlinear time series analysis, before discussing in greater detail a few (out of the many existing) approaches of recurrence plot-based analysis of time series. In particular, it focusses on recurrence plot-based measures which characterize dynamical features such as determinism, synchronization, and regime changes. The concept of surrogate-based hypothesis testing, which is crucial to drawing any inference from data analyses, is also discussed. Finally, the presented recurrence plot approaches are applied to two climatic indices related to the equatorial and North Paciﬁc regions, and their dynamical behavior and their interrelations are investigated.


Historical Background
A seminal event in the history of time series analysis was the discovery of nonlinear behavior, such as deterministic chaos and self-similarity, in the 1960s. The subsequent development of such concepts in the next three decades fundamentally and irrevocably changed how we viewed complexity. First, many nonlinear dynamical systems showed 'exponential divergence' of close-by trajectories, which meant that determinism-as opposed to stochasticity-did not guarantee predictability. Second, and particularly relevant for time series analysis, nonlinear dynamical systems could produce 'irregular', aperiodic signals without the presence of any stochastic component. Linear time series analysis techniques typically considered the measured time-ordered signal as comprising of harmonics or periodicities, mixed with stochasticity. In the linear perspective, all irregularities of the signal are attributed to noise [1]. Deterministically chaotic systems demonstrated, however, that it was possible to generate irregular, non-repeating signals without noise. The paradigm of nonlinear dynamical systems provided an additional and fundamentally different route by which to approach real-world complex systems.
It was not until the early '80s, however, that the theoretical developments of nonlinear dynamical systems began to give rise to new time series analysis techniques. This was primarily because most real-world time series were one-dimensional scalar measurements which, in the nonlinear dynamical viewpoint, were the projection of a higher dimensional time evolution on to the set of real numbers R 1 by a 'measurement function.' It was not clear how to infer the details of the high dimensional dynamics in the true 'state-space' of the system from scalar-valued time-series measurements. Two papers, one by Packard et al. [2] in 1980, and the other by Takens in 1981 (cf. [3] for a more accessible version of Takens' original paper [4]), offered possible solutions to this problem and, in essence, gave form to the discipline of nonlinear time series analysis as it is formulated even today. The task of inferring the high dimensional dynamics from a scalar time series, called 'state-space reconstruction,' became the cornerstone of nonlinear time series analysis. From the 1980s on, along with the further development of theoretical phenomena such as self-similarity and fractal dimensions [5][6][7], strange attractors [8][9][10][11][12], bifurcations [13][14][15], and chaotic synchronization [16][17][18][19], there were an increasing number of studies that applied these concepts to time series obtained from laboratory experiments [20][21][22] and real-world data [23][24][25]. The boom in nonlinear dynamics-related studies is also attributable to the parallel development of increasingly faster, more compact, and more powerful computers (cf. Section 2.9-computers and chaos-of [26]). The 1990s witnessed a drastic increase in the number of studies related to "nonlinear time series analysis" (Figure 1a). Nonlinear concepts began to be increasingly applied to complex systems from different fields such as population ecology [27][28][29], human physiology [30][31][32], and climate [33][34][35].  Based on an online search in the Web of Science Core Collection database, the number of publications per year are shown which contain the terms "nonlinear time series" (a) and "recurrence plots" (b), either in the title (yellow bars) or anywhere in the text (green bars). Studies in nonlinear time series analysis begin to increase from the early 1990s on, coinciding with the increasing ubiquity of more powerful and more compact computers. The field of recurrence plots, however, receive increased attention only about a decade later, after the turn of the century. Moreover, studies that are explicitly based on nonlinear time series techniques peak ca. 2005-2010, whereas explicit recurrence plot studies are still on the rise. The difference between the "All fields" and "In title" counts indicate that both concepts are increasingly seen as methodological tools to be applied to other systems.
In 1987, Eckmann, Kamphorst, and Ruelle proposed a new graphical representation, the 'recurrence plot,' which visualized the 'recurrences' of the states of a dynamical system and captured essential features of its dynamics [36]. The recurrence plot was a simple, easily estimable, visual aid to characterize the dynamics of a system. It was based solely on the measured time series and was designed to complement new approaches of the time that estimated various nonlinear dynamical characteristics such as the Lyapunov exponent [37], information dimension [38], and correlation dimension [10]. Over the next decades, however, several methods were put forward to quantify the patterns seen in recurrence plots-e.g., diagonal line structures-to infer probable dynamical features such as a high degree of determinism [39][40][41]. The approaches based on recurrence plots are collectively referred to as 'recurrence quantification analysis' (RQA), and they form the core of recurrence plot-based techniques. A crucial advantage of recurrence plot-based approaches over other nonlinear approaches is that they perform reasonably well even when the length of the time series is short (ca. 50-100 data points). Moreover, in cases when the underlying system is not sufficiently deterministic or stationary, recurrence plots have proven useful in characterizing their behaviour.
The intuitive, visual appeal of recurrence plots, and their applicability to time-series from a wide range of systems contributed to a consistent rise in the number of studies involving "recurrence plot" approaches in the last two decades, following the rise in nonlinear time series analysis by roughly a decade (Figure 1b). In addition to studies that extend the theoretical boundaries of recurrence plots themselves, recurrence plot-based approaches have been applied to a wide range of real-world scientific problems from diverse disciplines. For instance, in the field of finance, Strozzi, Zaldívar and Zbilut [42] studied the time evolution of various RQA measures obtained from high-frequency currency exchange data. Bastos and Caiado [43] used RQA to characterize more than twenty stock market indices from around the world and identify changes in the recurrence quantifiers over time, especially during well-known financial crises. In information technological research, Palmieri and Fiore [44] used RQA measures to design a network traffic classification scheme based on the identification of nonlinear transitions in IP traffic flows. Yang, Pan, and Xu [45] used RQA measures to quantify the complexity of the sequences produced by their proposed quantum walk-based pseudorandom number generator for image encryption. Recurrence plots have also been used to study music. Serrà, Serra and Andrzejak [46] used cross-recurrence analysis to design an automated approach that successfully identified cover versions of a given song. Moore, Corrêa and Small [47] use RQA measures to design a surrogate-based hypothesis test which successfully distinguished ten compositions by famous Baroque composer J. S. Bach from a Markov process.
In cognitive science research, Richardson and Dale [48] used cross-recurrence analysis to study the coupling between the eye movements of a speaker, who was told to describe a scene, and that of a listener watching the same scene. Duran et al. [49] used RQA to monitor body movement during acts of deception and found that the upper face, and partly the arms, of a person show less determinism and high complexity while lying. In the field of physiology, Konvalinka et al. [50] analyzed recurrence plots constructed from heart rate data of different subjects present (as active participants and as spectators) during a fire-walking ritual in a tiny rural Spanish village and found that the collective ritual induced synchronized arousal between active participants and bystanders. Acharya et al. [51] used ten RQA measures estimated from electroencephalogram (EEG) data to construct a feature set that was fed into various classifying algorithms to determine the best predictor for epileptic seizures.
Recurrence plots have also been successfully applied to astrophysical and geophysical research. Zolotova and Ponyavin [52] used cross-recurrence plot analysis to investigate phase synchrony between northern and southern sunspot activities. Stangalini et al. [53] applied RQA measures to detect dynamical transitions in solar activity in the last 150 years. Li et al. [54] used joint recurrence plots to investigate the synchronization between a vegetation index and climatic observables and revealed that for most parts of China, vegetation was more synchronized to temperature than precipitation. Zhao et al. [55] conducted a similar study for the Qinghai-Tibetan plateau region using RQA measures and found that vegetation is anti-correlated to the determinism of temperature data and positively correlated to the entropy. A 2003 study by Marwan et al. [56] used cross-recurrence analysis to investigate the influence of the tropical Pacific region on northwestern Argentina using both modern-day and paleoclimatic proxy data sets. More recently, Eroglu et al. [57] applied RQA to speleothem-based paleoclimatic data from two caves in Asian monsoon region and revealed a see-saw relationship in the dynamical regimes between the two data locations. Zaitouny, Walker and Small [58] use recurrence plots in combination with the previously established 'quadrant scan' technique [59] to identify tipping points of dynamical systems and demonstrate their approach with real-world examples such as petrophysical data from a geological well in Australia, electrocardiogram (ECG) data recording cardiac arythmia, and EEG data of a person doing a multiplication task.

Recurrence Plots in Engineering Research
Recurrence plot-based analyses have also been used in several aspects of engineering research. An early study by Feeny and Liang in 1997 [60] used a recurrence-based generalized autocorrelation function (cf. Zbilut and Marwan [61]) to infer unstable periodic orbits in a stick-slip oscillator model with dry friction. However, the study does not explicitly mention the two-dimensional recurrence matrix that forms the basis of all RQA. In a 2004 study, Wendeker et al. [62] constructed recurrence plots explicitly using pressure time series from a spark ignition engine but they interpreted it only visually to distinguish the observed dynamics from stochasticity. Follow-up studies used RQA measures to show that (i) spark engine dynamics are multidimensional and they change with respect to the spark advance angle [63], and (ii) low (high) frequency crankshaft rotation results in periodic (intermittent) pressure dynamics [64].
Recurrence plot-based approaches have also been used in damage quantification and detection. Nichols. Trickey, and Seaver [65] used multivariate measurements from nine locations on a finite element model of a rectangular steel plate to estimate RQA measures that help to identify dynamical changes induced by a cut (of various lengths) on the plate. Iwaniec et al. [66] used four RQA measures to investigate the changes in dynamics between three different harmonic excitation modes of an aluminum plate and between cracked and uncracked versions of a metal plate. Qian, Yan and Hu [67] estimated a recurrence-based entropy to construct a data-driven model of ball bearing degradation, which they showed could predict failure up to 50 minutes in advance. More recently, Zhou and Zhang [68] constructed a feature vector from four RQA measures and applied it to a T 2 control chart [69] which was able to detect faulty ball bearings reliably.
Several studies that investigate the corrosion of metal surfaces with the help of electrochemical noise measurements have applied recurrence plots to distinguish different types of corrosion dynamics. Based on RQA, Garciá-Ochoa et al. [70] showed that an increase in sensitization intensity of S30400 stainless steel led to more deterministic dynamics. Yang et al. [71] showed that higher hydrostatic pressures applied to high strength Ni-Cr-Mo-V steel also induced more deterministic dynamics. Hou et al. [72] used four RQA measures to construct a PCA-based model for detecting corrosion and also as a feature vector for a multilayer perceptron to predict corrosion type. More recently, Barrera, Gómez and Garciá-Ochoa [73] used RQA to assess and compare the performance of protective coatings on cast iron that inhibit corrosion.
Another important application of recurrence-based approaches is in the study of friction-induced vibrations (FIV), in particular with respect to brake squealing and brake noise. Oberst and Lai [74] gave a first quantitative insight into brake squeal noise, using RQA measures to classify and rank noise performance, and showing that (a) linear measures are insufficient to characterize brake squeal dynamics and (b) nonlinear measures reveal that higher nonlinearity leads to a worse squeal. Wernitz and Hoffmann [75] provided a visual interpretation of recurrence plots obtained from friction brake vibration measurements and argued that FIV dynamics are deterministic on short time scales whereas random disruptions were seen on longer time scales. In a recent study, Stender et al. [76] used RQA to demonstrate that FIVs are multiscale in nature, that squealing is consistent with low dimensional attractors, and that the higher vibration levels correspond to higher determinism and periodicity of the dynamics. They also proposed an automated framework based on RQA to detect and monitor brake squealing. In a related study, Stender et al. [77] showed how RQA could help to identify transitions between different types of friction-induced vibration such as chaotic and weakly chaotic epochs, limit cycles and intermittent stick-slip dynamics.
Kabiraj and Sujith [78] used recurrence plots to verify intermittency in their investigation of thermoacoustic instability in combustion systems using a ducted conical premixed flame. Nair and Sujith [79] employed recurrence plots as a visual tool to infer the dynamics of flows in turbulent combustors and infer Type I or Type II intermittency dynamics. Additionally, Nair, Thampi and Sujith [80] showed that RQA measures in swirl-stabilized and buff-body-stabilized combustors could reliably quantify changes in dynamics as a function of Reynolds number and where a potential candidate for early-warning indication of impending combustion. Elias and Namboothiri [81] used cross-recurrence analysis to identify the transition from regular cutting to chatter cutting of metal plates irrespective of the choice of metal. Harris et al. [82] showed that RQA measures were able to characterize the dynamics of bistable cross-shaped laminated plate energy harvesters and proposed recurrence plots as an efficient approach to distinguish different classes of dynamics such as single-well periodic dynamics, chaotic snapthrough, and continuous snap-through. Oberst et al. [83] have recently shown how recurrence-based power spectra could be used to identify stable and unstable periodic orbits in model systems as well as mineralogical deposits in Western Australia. In their study, recurring episodes of low dimensional dynamics and period-doubling route to instability were revealed along the depth axis of the deposits.

About This Tutorial Review
The aim of this tutorial review is to provide a brief introduction to the basics of nonlinear time series approaches, with a special focus on recurrence plots. Since the concepts of high dimensional flows, state-space reconstruction, time delay embedding, and dynamical attractors are essential to understand how a recurrence plot is estimated and how its patterns are to be interpreted, this review covers the fundamentals of these topics. An additional crucial concept related to time series analysis is surrogate-based hypothesis testing. The fundamental ideas behind statistical hypothesis testing and the role of surrogate data sets in time series analysis methods are presented and a couple of surrogate methods are discussed. Applications from model systems and real-world systems are used as didactic tools where appropriate.
This review is not the first of its kind, neither in the broader domain of nonlinear time series analysis, nor in the more focussed topic of recurrence plots. I refer the reader to [84][85][86], and more recently [87], for excellent introductions to nonlinear time series analytical thought. For an in-depth overview of recurrence plots as a data analysis tool, the review in [88] is recommended, and for a historic coverage of recurrence plot-based approaches, [89] is helpful. This review is not exhaustive-it is intended to provide only a glimpse into nonlinear thinking. No attempt is made to provide a comprehensive literature survey. Rather, the focus is on illustrating key topics with the idea that the readers are made curious and literate enough to go out and explore on their own. Moreover, the review is biased by my own areas of expertise and relevant pointers to similar approaches developed in other studies are provided when appropriate and necessary.
The remainder of the tutorial review is organized as follows: Section 4 highlights several consequences of moving from a linear to a nonlinear paradigm of looking at the world. Section 5 touches upon the basics of the theory of nonlinear dynamical systems. Section 6 summarises the fundamentals of how nonlinear time series approaches represent the time series data and how we can reconstruct the state-space of the high dimensional dynamics from a scalar time series. Section 7 deals with recurrence-based analysis, outlining the concepts of recurrence plots and recurrence networks and showing how to use them to characterize dynamics, estimate synchronization, and detect transitions. Section 8 illustrates the idea behind surrogate-based hypothesis testing and its necessity in time series analysis of real-world complex systems where we do not have access to multiple realizations of the dynamics. Section 9 demonstrates the discussed approaches with real-world examples from climate, and Section 10 concludes the tutorial review with a brief summary and outlook.
Before we proceed, a brief note on measurement units: Without loss of generality, all measured variables presented in this tutorial review are considered to be scaled such that they are dimensionless. For instance, dividing each value of a signal by the sample standard deviation makes it dimensionless while retaining all dynamical features-the primary focus of the discussion here. In a similar sense, the unit of 'time' for all-time series presented here (other than the real-world examples) is arbitrary, i.e., we can choose it to be seconds, minutes, or hours, etc., without impacting the dynamical aspects being explained. Thus, rather than fix a particular unit for time in the model systems and numerical examples, which is a bit contrived in such cases, we drop the unit entirely with the understanding that it bears no direct relevance to the content of the review.

Consequences of Nonlinearity
Idiosyncratic nonlinear behavior such as deterministic chaos, bifurcations, phase synchronization, etc. lead to crucial differences in how we approach various aspects of time series analysis. In many cases, nonlinear behavior is not reflected in linear time series measures such as the power spectrum, correlation, mean, and variance (cf. Chapter 1-Introduction: why nonlinear methods?-of [86]). This does not, however, imply that linear measures are redundant. In practice, while working with a time series for which the underlying process is not known, nonlinear approaches are used as complementary tools to linear time series characterization so that we pin down the dynamics as reliably as possible. Four key aspects of time series analysis that are deeply impacted by nonlinearity are discussed below.

Predictability
Many nonlinear dynamical systems exhibit 'exponential divergence,' which means that the distance between trajectories starting from extremely close-by initial conditions increase exponentially as time progresses [90]. This is a necessary (but not sufficient) feature of chaos and has drastic implications for predictability. Imagine we have measured two signals, one that comes from a linear system and is essentially composed of three superimposed sinusoidal frequencies with some additive noise (Figure 2a), while the other comes from a chaotic system (Figure 2b). Now consider the following situation: We measure the time series for both systems up to time t = 250 and based on this data we determine all three component frequencies of the linear system and we also estimate the exact set of differential equations for the nonlinear chaotic system. We are now in a position to predict the future time evolution of both systems. However, due to the measurement process, our knowledge of the state at time t = 250 has a miniscule error of the order of 10 −6 times the standard deviation of both signals (which are approximately equal). Due to the exponential divergence of nearby trajectories in the chaotic system, the tiny uncertainty in fixing the initial condition for future predictions at t = 250 results in a prediction uncertainty equal to the variance of the signal after some time (yellow region in Figure 2b visible from t ≈ 325 on). In the linear system, the initial uncertainty stays constant throughout the entirety of the future projection. This is one fundamental reason why the weather cannot be predicted more than ten days in advance (on average), even if we were to have immense computational power and high measurement precision. , the signal is from a linear process where it is essentially a superposition of three sinusoids with frequencies, whereas in (b), the signal comes from a nonlinear system in deterministic chaos. We assume perfect knowledge of all three components of the harmonic signal, and also of the system of differential equations of the chaotic system. Using the knowledge of the governing equations, at t = 500, we predict the future trajectories for both systems with an initial uncertainty in determining the state of the system of the order of 10 −6 . For the chaotic system, the 95% confidence interval (i.e., the interval that contains 95% of all initial conditions chosen within initial error bounds) of future projections soon become as large as the entire spread of the signal (yellow shaded area at ca. t ≈ 325). Exponential divergence of chaotic systems thus imply that even in the situation when we have perfect knowledge about system dynamics, errors in measuring the state of the system result in prediction errors that become as large as the diameter of the system after some time. This is not the case for linear systems (a).

Transitions
Another important consequence of nonlinearity is that nonlinear processes exhibit non-intuitive transitions between qualitatively different dynamics when system parameters are changed by small amounts. At times, these transitions are not visible in the time series themselves and are also not captured by their statistical moments. To demonstrate this, consider the chaotic Rössler system, a canonical model system that exhibits paradigmatic nonlinear behavior and is often used as a benchmark for testing new techniques of nonlinear time series analysis. We integrate the three-dimensional system of ordinary differential equations for the Rössler system (cf. Equation  Figure 3a) fails to capture the transition. The moving variance of 25 time units shows different behavior for the screw-type chaos but it is still not possible to uniquely associate a distribution of variances to each dynamical behavior. To detect transitions in a time series generated from nonlinear processes is not a trivial task, as the dynamical changes can be subtle and non-trivial. Nevertheless, it is important to use as many indicators as feasible to reliably conclude whether or not the system might have gone through a transition.

Synchronization
The study of synchronization predates the discovery of chaos and it is a far more general phenomenon found in various systems, both linear and nonlinear. The synchronization of chaotic systems, however, opens newer ways of looking at interdependencies between complex systems. It turns out that there are subtler ways of connecting two systems than by a simple positive or negative correlation. One interesting case is the phenomenon of 'phase synchronization,' which is a weak form of synchrony between two systems, classically used to study the co-evolution of periodic self-sustained oscillators. That phase synchrony is possible for chaotic systems was first demonstrated in 1996 by Rosenblum, Pikovsky and Kurths [18]. We consider here the same example as used in [18], where two chaotic Rössler systems are coupled in the first variable x and which show phase synchrony for a coupling strength of 0.035 ( Figure 4a). Under this condition, the amplitudes of both systems are not correlated (Figure 4b) whereas the phases (as estimated using a Hilbert transform of the x-components) are approximately equal (Figure 4c). In the context of time series analysis, this implies that estimating only the cross-correlation (or even mutual information) of the measured signal (in this case, the x-components of both systems) is not enough to capture all possible forms of interrelations. Equations (21) and (22) and [18,88]). The amplitudes, i.e., the x-components of both systems, are almost uncorrelated (Pearson's cross-correlation of 0.013 in (b)) while the phases, obtained by a Hilbert transform of the x-components, are almost identical, evidenced by the diagonal line obtained when the phase φ 2 of the second Rössler oscillator is plotted against the phase of φ 1 of the first (c).

Characterization
In signal processing, it is common to characterize a time series by its power spectrum, which highlights the dominant harmonic frequencies in the signal by plotting the square of the amplitudes of the Fourier transform of the data against its frequency range. For a linear and time-invariant (i.e., stationary) system, the Fourier transform completely describes the response of the system to perturbations because the complex exponentials of the Fourier transform are also the eigenfunctions of such systems. In certain cases of nonlinearity, this might not be sufficient to describe the entirety of the system's properties. To illustrate this, we consider two signals (Figure 5a), where one is from the chaotic Rössler system, and the other is a randomized version of the Rössler time series. The randomisation is done using a specific method [91] which ensures that the power spectrum of the randomized signal is the same as the original signal (Figure5b). The randomized signal, however, possesses none of the complex characteristics in high dimensional time evolution as is seen in the chaotic time series (shown in Figure 5c). Thousands of such randomized versions of the original signal can be generated, each unique in their time evolution, and each with a power spectrum similar to the original chaotic time series. Still, they are, in essence, random processes-sharing none of the determinism of the Rössler system. . Power spectrum is insufficient to describe high dimensional nonlinear dynamics. The power spectrum (b) of two signals shown in (a) do not reflect the stark difference between the two as is evident from a three-dimensional representation of their time evolution in (c). Signal 1 (green) is the x-component of the chaotic Rössler system and signal 2 (yellow) is constructed using an algorithm than randomizes signal 1 such that the power spectrum is (approximately) unchanged. This highlights that linear characteristics are at times insufficient to describe high dimensional nonlinear complexity.

What Is a Dynamical System?
The time evolution of a dynamical system is formulated mathematically in terms of either (i) a system of differential equations that result in 'flows,' i.e., continuous time-evolving trajectories, or (ii) a system of discrete iterative rules often called 'maps.' The well known Rössler model already used in Section 4 is an example of a three-dimensional flow and the two-dimensional Hénon map is a well-known example of a discrete map ( Figure 6 shows the state space trajectories for the two systems). In both scenarios, the system's time evolution starts from a prescribed initial condition (crosses in Figure 6a,b) and all subsequent points are obtained according to the integration of the system of differential equations (for the Rösser system) or by using the discrete-time iteration rules (for the Hénon system).
The Rössler model is described by the following equation system: whereẋ denotes the time derivative dx dt , and a, b, c are the parameters of the system (for Figure 6a a = 0.432, b = 2, c = 4). For the Hénon map, the dynamic (i.e., the time evolution) is described by a set of time discrete functions: where x t denotes the value of x at time N t = 1, 2, 3, . . . and a, b are the parameters of the system (for the behaviour in Figure 6b, a = 1.4 and b = 0.3). A crucial difference between the two systems is that the system state in the Hénon map 'jumps' from one point in the XY-plane to another at each time step of iteration. However, in the Rössler system, the trajectory 'flows,' in a continuous sense, from one (x, y, z) point in the 3D space to another such that any point in the trajectory is connected to any other point itself by a continuous unbroken path.
The system of equations in Equation (1) can be rewritten more concisely using the following notation. Consider X := (x, y, z) ∈ X ⊆ R 3 as the vector (in the 3D space X ) that describes the state of the system at a given time t. Next, take ( f x , f y , f z ; θ) =: F : X → X as the 3D function which maps every point in X to another point in X itself, and θ denoting the set of parameters, which in this case is θ = {a, b, c }. The three individual functions f x , f y , f z that compose F are the relations given in the right-hand sides of the equations in Equation (1) . Thus the Rössler model in Equation (1) becomeṡ which, without loss of generality, can be used to denote the dynamic of any flow. Similarly, we can write down the generic form of the map in Equation (2) as which again is a generic form used to describe maps. In this particular case, for the Hénon map, In both cases, given the space X , which contains all possible states the system can attain, and the function F, which tells us how to move from the state at a time t to a next time t + δt (continuous) or t + 1 (discrete), we have complete knowledge about the time evolution or the dynamics of the system, i.e., the system is fully described. The space X is called the 'state space' of the system (at times also referred to as the 'phase space'), and F is referred to as the 'dynamic' which occurs within the state space. Together, the state space and the dynamic constitute the 'dynamical system.' Hénon y t b a Figure 6. Nonlinear dynamical systems: Flows and maps. The Rössler system (a) is an example of a continuous dynamical system whereas the Hénon map (b) is a paradigmatic example of a 2D discrete dynamical system. For the Rössler, trajectories in the 3D state-space are smooth, differentiable paths that settle on the attractor as t → ∞. For the Hénon map however, the trajectories jump from point to point in the state space and the attractor is therefore discontinuous. Here, the parameters for the Rössler model are fixed at a = 0.432, b = 2, c = 4 and those for the Hénon are fixed at a = 1.4, b = 0.3, for which both attractors are chaotic. Three different initial conditions (marked as "×"s) all converge to the attractors as time progresses.

Attractors
Given a particular configuration of the parameters θ of a dynamical system, it is found that the equilibrium behavior, i.e., the values of state X as time t → ∞, is either 'uni-stable' or 'multi-stable.' 'Uni-stability' implies that, all initial conditions for a given θ converge to a single type of equilibrium behavior such as fixed point, periodic, chaotic, hyperchaotic, etc. 'Multi-stability' would imply that the state-space can be partitioned into mutually exclusive sub-spaces of initial conditions, such that each sub-space converge to a particular equilibrium behavior. In this tutorial review, we consider only the uni-stable case. This is shown in Figure 6, where all three initial conditions chosen for the Rössler system and the Hénon map ultimately ended up in the same region of the state space, and actually even in the same 'shape.' The set of states in the phase space which determine the equilibrium dynamics irrespective of initial conditions is called the 'attractor' of the system for the chosen parameters θ. An additional condition for the set of states to be an attractor is that once the trajectory of the system reaches this set, it should stay within the set for the rest of the time. There are many different mathematical ways of defining attractors, as also there are different kinds of attractors for different dynamical systems and under different conditions. In general, a few desired properties of attractors are [9,92]: 1. Invariance: the attractor should map to itself under F. 2. Attractivity: any set of initial conditions in the state space should, for large t i.e., t → ∞, converge to the attractor. 3. Irreducibility: the attracting set of states should be connected by one trajectory and it should not be possible to decompose the attractor to subsets of states which have non-overlapping trajectories. In this case, each subset would be an attractor and not their union. 4. Persistence: the attractor should be stable under small perturbations, i.e., small deviations from the trajectory on the attractor should return back to the attractor. 5. Compactness: the attracting set of states for the dynamic should be compact.

Bifurcations
Dynamical systems typically exhibit different types of equilibrium behavior when one or more of the system parameters are changed. The equilibrium behavior changes because the change in the parameters destabilizes an existing attractor and new attractors emerge in its place. An added feature of such behavioral changes is that they often happen suddenly, i.e., there typically exists a critical parameter value above and below which the system has two different attractors, or in the case the system is multi-stable, the system might show different configurations of multi-stability before and after the critical parameter value. The parameter value at which the attractors of the system change qualitatively to a different type is called the 'bifurcating point' and the phenomenon itself in which the system changes its equilibrium behavior is called a 'bifurcation.' There are many classes of bifurcations that have been observed in dynamical systems, such as the saddle-node, pitchfork, Hopf, homoclinic, and heteroclinic bifurcations (cf. Chapters 3 and 8 of [90]). Although for algebraically simple dynamical systems, the points of bifurcation can be analytically obtained, in most cases, it is useful to numerically integrate or iterate the system for various values of the bifurcating parameter and plot the 'bifurcation diagram.' The bifurcation diagram is obtained by visualizing the equilibrium values of one dynamical variable against the parameter value. There are different ways in which a dynamical system approaches a chaotic attractor as the bifurcation parameter is varied. Both the Rössler model and the Hénon map considered here bifurcate when we change their respective parameter a (Figure 7) and approach a chaotic attractor via the 'period-doubling route' to chaos [93].
In analyzing time series where the system dynamics are not known, the notion of bifurcations plays an important role as it opens up the possibility of having transitions that do not show up in the statistical moments of the variable but nevertheless involve qualitatively different equilibrium behavior. We have encountered one such scenario in Figure 3 in which the Rössler system undergoes a bifurcation from spiral-type chaos to screw-type chaos. The mean and the variance of the observed time series are similar before and after the transition. For the continuous Rössler system, the bifurcation diagram is obtained by discretizing the dynamic by taking the local extrema of the time series y(t). Cross-sections such as the y = 0 plane (known as Poincaré sections) can also be considered, which leads to a different projection of the dynamic. The similar pattern seen in both systems-as they move from a single stable fixed point for low values of a to periodic behavior and finally to chaos-is a universal phenomenon seen in chaotic systems known as the 'period-doubling' route to chaos [93]. The vertical red dashed lines indicate the value of a used in Figure 6.

State Space Reconstruction
The diverse complex behavior exhibited by nonlinear dynamical systems opened up new perspectives of looking at real-world systems. At the same time, however, it gave rise to a critical problem in the analysis of time series that were obtained from real-world complex systems: how could high dimensional dynamics be analyzed using only scalar time series? One solution to this challenging problem was first proposed in the seminal paper by Packard et al. in 1980 [2], in which the authors proposed to reconstruct the state space from a scalar time series by using the derivatives of the time-series measurements. In the case of noisy measurements, however, derivatives of increasingly higher-order amplify the noise. In this tutorial review, we consider a second approach, first given by Takens in 1981 [4], in which a system of delay coordinates is used to reconstruct the state space dynamics of higher dimensionality. These two papers-by Packard et al. and by Takens-in conjunction with a third crucial addition by Sauer, Yorke, and Casdagli [3] a decade later, laid down the fundamental pillars of nonlinear time series analysis methods. We explain below the time-delay embedding approach to state-space reconstruction, and follow the review in [84] closely, due to its concise, sharp description.

The Measurement Paradigm and Time Delay Embedding
Consider the dynamical system {X , F} composed of the d-dimensional state space X ⊂ R d and the flow F : X → X which maps a point in the state space at a given time to the next point on the trajectory at the next time instant. We obtain a scalar time series s t ⊂ R 1 by using a measurement function h : X → R 1 which maps discrete locations x t ∈ X of the d-dimensional flow to discrete, time-ordered scalar values such that s t = h(x t ), where x t := F t (x 0 ) and x 0 is the initial state of the system at time t = 0. In this situation, one can define a delay coordinates map [84], Here, G maps a point x t on the flow to a point y of the reconstructed state space in R m with the help of the 'time delay' τ, and the 'embedding dimension' m. Takens proved that G has the generic property that it is an embedding of {X , F} in R d for the condition m ≥ 2d + 1 [4]. That it is an embedding implies that G : X → G(X ) ⊂ R m is a diffeomorphism, and that it is 'generic' implies that the subset of pairs (h, τ) which result in an embedding is an open and dense subset in the set of all pairs of (h, τ). Although in principle this was a path-breaking result, for actual high dimensional real-world systems (such as regional weather systems), it meant that the embedding dimension had to be greater than twice the dimension of the state space. This would mean that the scalar time series would have to be extremely long in order to facilitate an embedding in such a high dimension. The next advancement in this regard was made by Sauer, Yorke, and Casdagli [3] where the condition m ≥ 2d + 1 was further constrained so that m needed to be greater only than twice the box-counting dimension of the attractor A ⊂ X in order to ensure that it is an embedding of {X , F}. All such embeddings are "prevalent" rather than "generic" in the sense that 'almost all' pairs of (h, τ) will result in an embedding. This result had tremendous practical consequence because a large part of nonlinear dynamical systems were, in fact, dissipative systems in which the volume and dimensionality of the attractor A was drastically smaller than the state-space X . In other words, since all initial conditions would ultimately settle to the dynamics on A, which had a much smaller volume, and consequently much smaller dimension, than X , it was possible to study and model the equilibrium dynamics of a high dimensional system {X , F} using a much smaller dimensional embedding G as given in Equation (5).

Time Delay Embedding in Practice
In order to determine the embedding from a given time series, the time delay τ and the embedding dimension m must be fixed. The results of Takens and Sauer, Yorke, and Casdagli prove the existence of the embedding, but they do not tell us how to determine τ and m. Till date, there are no rigorous mathematical results that provide a unique route by which to determine these two parameters. In the absence of such proofs, numerous methods have been proposed that use various heuristic considerations to direct the time series analyst on how to optimally choose τ and m. We focus here on two popular approaches, one each for selecting τ and m, which allow us to reliably reconstruct the attractor of high dimensional dynamics from scalar time series (Figure 8c,d).
1. Determining the time delay. The choice of τ impacts the resulting embedding critically. When τ is smaller than the desired value, consecutive coordinates of y t are correlated and the attractor is not sufficiently unfolded. When τ is larger than the desired value, successive coordinates are almost independent, resulting largely uncorrelated cloud of points in R m without much structure. It is important that the fundamental idea in determining the time delay τ is that each coordinate of the reconstructed m-dimensional vector y t must be functionally independent. In order to achieve this, it is recommended to set τ to the first zero-crossing of the autocorrelation function. However, the autocorrelation function captures only linear self-interrelations, and it is more preferable to use the first minimum of the self-mutual information function (Figure 8a), as first shown in [94]. For the scalar time series {s t } N t=1 , the self-mutual information at lag τ is, p(s,s(τ)) log p(s,s(τ)) p(s) p(s(τ)) , whereS s is a random variable underlying the sample {s t } N−τ t=1 andS(τ) s(τ) is the random variable underlying the sample {s t } N t=τ . In this notation, the optimal value of τ is given by, 2. Determining the embedding dimension. The method of false nearest neighbours (FNN) put forward by Kennel, Brown and Abarbanel in 1992 [95] is typically used to determine the embedding dimension, once a time delay τ is chosen. This approach is based on the geometric reasoning that given an embedding y m t in dimension m, it is possible to differentiate between 'true' and 'false' neighbours of points on the reconstructed trajectory. In this method, we first choose a reasonable definition of 'neighbourhood.' Based on this definition, we identify the neighbours of all points on the trajectory in R m . Next, we look for the false neighbours, defined as those neighbours which cease to be neighbours in m + 1 dimensions, i.e., when we consider the trajectory y m+1 t ∈ R m+1 . The false neighbours were neighbours in the lower dimensional embedding solely because the attractor was not properly unfolded and we were actually looking at a projection of the attractor rather than the attractor itself. As an example, consider the 2D limit cycle trajectory on a circle, where opposite points that are almost on the same vertical line would be seen as neighbours if the same dynamic were projected on to the horizontal axis, i.e., the 1D real line R 1 . Once the attractor is properly unfolded, however, the number of false neighbours would go to zero. In practice, this notion is implemented by the following formula (after Equation (3.8) of [86]), where Θ(·) denotes the Heaviside function which is one for all positive arguments and zero otherwise, || · || denotes the maximum norm and k(i, m) is the index of the point closest to y m i in the m-dimensional embedding based on maximum norm. The first term in the numerator counts all those cases when the distance between a point and its closest neighbour increases by more than a factor of r in going from m dimensions to m + 1 dimensions. However, in order to not count those cases where the points are already far apart in m dimensions, the second term in the numerator is used as a weight and also as a normalisation factor in the denominator. The second term ensures that we count only those cases where the closest neighbour in m dimensions is closer than σ/r, where σ is the standard deviation of the data. The final embedding dimension m e is the smallest value of m for which the fraction FNN(m) is zero (Figure 8b).

Recurrence-Based Analysis
Recurrence-based analysis utilizes a fundamental characteristic of dynamical systems: that a system's dynamical trajectory eventually returns close to earlier states as time passes. Particularly in the case of time series obtained from real-world systems, we find that systems repeat earlier behavior, even if only in an approximate sense, and even though they might at times be interrupted by regime shifts and dynamical transitions. Poincaré was the first to mathematically describe the recurrence of dynamical systems 130 years ago, known today as the 'Poincaré Recurrence Theorem' [96]. As a principle of analyzing time series, however, the first breakthrough for recurrence-based analysis came with the pioneering paper of Eckmann, Kamphorst, and Ruelle [36]. Their work demonstrated, for the first time, how to construct the 'recurrence plot,' which encoded the pairwise recurrences of time series values, and which created a visual typology of the dynamics, solely from the measured time series. This forms the basis of all of the recurrence-plot-based analysis presented in the sections below. We refer the reader to [88] for a comprehensive review of recurrence plots.

Recurrence Plots
Consider the dynamical system {X , F}, X ⊂ R d , F : X → X , as before where {x t } N t=1 is a measured d-dimensional time series of length N. The system is said to recur when a state x i at time t = i is approximately 'close' to a different state x j at time t = j, i.e., x i ≈ x j . Here, the notion of x i being close to x j depends on (i) the choice of a norm such as the Euclidean norm or the maximum norm, and (ii) the choice of a distance threshold, which helps to unambiguously define all states farther apart than itself as 'not close,' and vice versa. We can thus encode all possible pairs of recurrences in the recurrence matrix R, where || · || is an appropriate norm, ε is the chosen distance threshold and Θ is the Heaviside function.
The resulting matrix R of size N × N thus a binary-valued matrix comprising solely of 1 s and 0 s, where the 1 s denote pairs of time points where the measured states recur and 0 s denote non-recurring pairs of time points. R is symmetric only if the chosen norm is symmetric. In their original paper, Eckmann, Kamphorst and Ruelle [36] used a k-nearest neighbor norm. When we have a single scalar time series {s t } N t=1 , we first embed the time series using time-delay embedding (Section 6) to obtain an embedded state-space trajectory {y t } N t=1 , where N = N − (m e − 1)τ e . The recurrence matrix is then estimated from the embedded trajectory y t . A 'recurrence plot' is obtained when we visualize the recurrence matrix by placing a black dot for every 1 and an empty dot for every 0. Based on the relatively simple estimation process given by Equation (9), we arrive at a powerful visual representation that captures the difference in dynamical behaviour ( Figure 9). The recurrence matrix contains all relevant dynamical information. Robinson and Thiel [97] proved that, given a recurrence matrix R, it is possible to reconstruct the time evolution of the dynamical system up to a homeomorphism, i.e., up to a change in the coordinate system. This result is practically observable in the successful reconstruction of time series and attractors from recurrence plots [98,99].
To construct a recurrence plot (RP), however, one has to carefully choose the various parameters related to the estimation of R in Equation (9). The embedding parameters m e and τ e have been extensively studied for their influence on recurrence plot-based estimates. Iwanski and Bradley [100] demonstrated that that for particular low dimensional systems, RQA measures remained unchanged on change of embedding parameters, and Thiel et al. [98] showed that the estimation of second-order Rényi entropy and the correlation dimension from recurrence plots was independent of the choice of embedding. However, this is not the case for all RP-based estimates. March, Chapman and Dendy [101] derive analytical formulae that explicitly express two recurrence-based measures (indicative of deterministic behavior) as a function of the embedding dimension. Thiel, Romano, and Kurths [102] also report how over-embedding-the choice of embedding dimension much larger than the minimum required to unfold the attractor-can introduce spurious correlations between the embedded vectors.
The choice of the parameter ε impacts further analyses based on R, and there is no mathematical result which prescribes how to choose ε uniquely. There are several studies that have proposed approaches to decide on an optimal ε [103][104][105][106][107]. As a general rule of thumb, it is best to choose ε such that the recurrence matrix does not have too few 1 s (empty recurrence plot) or too many (almost completely filled recurrence plot). The ratio of the number of 1 s to the size of R, known as the recurrence rate, is typically kept in the range of 5-10% for most analyses. The choice of ε is innately linked to the timescales that we wish to investigate. On increasing ε, we increase the tolerance of what constitutes a recurrence, and end up with more 1 s in R, which fills in finer, and shorter timescale, structures from the recurrence plot. If the focus of the study is to study longer timescale behavior, increasing the recurrence rate beyond 30% might be appropriate.
Irrespective of the final chosen value of the parameters m e , τ e , and ε used to construct a recurrence plot, sensitivity tests have to be carried out. A sensitivity test (with respect to one parameter) is typically done by changing the parameter value up to a few percent of the value chosen for the main analysis. The results would indicate whether or not the final conclusions are robust to small changes in parameter values.
Beside univariate time series, recurrence plot analyses can also be extended to bivariate data using cross-recurrence plots [108] and multivariate data using joint-recurrence plots [109]. Other than cross-and joint-recurrence plots, additional extensions of recurrence plots include windowed-and meta-recurrence plots [110] weighted recurrences plots [111], fuzzy recurrences plots [112], order pattern recurrence plots [113], and recurrence plots of recurrence plots [114]. Each of these extensions provide useful insights into the recurrence features of idiosyncratic dynamical systems. Pure white noise shows no structure, while harmonic and chaotic behavior show diagonal lines to different degrees, and Brownian motion with drift reveals modular periods interspersed with bottleneck periods. The embedding parameters and the recurrence plot thresholds are given in the title of each subplot. In (b), ε was used as a fixed recurrence rate, while for the rest it was the threshold for the Euclidean norm.

Recurrence Networks
In 2009, Marwan et al. [115] proposed an extension to recurrence plot-based analysis. Their approach was to exploit the inherent analogy between the adjacency matrix of a network and a recurrence matrix to define a new object, the 'recurrence network,' which is the network represented by the adjacency matrix obtained by removing the main diagonal of the recurrence matrix R, i.e., where δ ij is the Kronecker delta, which is 1 when i = j and 0 otherwise. Removing the main diagonal from R is equivalent to removing self-loops in the recurrence network-a necessary condition to analyze several 'path'-based measures in order to avoid getting stuck in an infinite loop. The recurrence network is thus the network obtained when we consider the states x i as the nodes of the network and place a link between x i and all other states which are closer to it than the chosen threshold . The network itself is embedded in the state space of the dynamics (or in the embedding space for reconstructed attractors), and is similar to the phase space networks described by Xu, Zhang, and Small in their 2008 article [116]. In the last decade, recurrence networks have been applied successfully to reveal nontrivial dynamical features in various complex systems [117][118][119].
In general, considering the recurrence patterns of a nonlinear dynamical trajectory as a complex network results in a paradigmatic shift in how we phrase the dynamical questions (synchronization, transitions, and so on), and enables us to apply the entire toolbox of complex network-based measures to the time series under investigation. For instance, consider the 2014 study by Eroglu et al. [111], in which the authors propose an intuitive criterion for choosing ε: that the recurrence network should remain connected. In this way, there exists a critical value of ε below which the network ceases to be connected, and a value just above this critical value is recommended for ε. We refer the reader to [120] for an overview of complex network techniques used in time series analysis.

Quantification Based on Recurrence Patterns
Different structures in recurrence plots are interpreted as characteristics of different kinds of dynamics. Bradley and Mantilla [121] show how unstable periodic orbits can be used as a basis set for the geometry seen in a recurrence plot and how a comparison of the block structure between two RPs can indicate the (dis)similarity between the underlying dynamics. Zou et al. [122] use RPs to distinguish quasi-periodic, fully chaotic, and sticky chaotic orbits in nonintegrable Hamiltonian systems using far less data than other methods. Facchini, Kantz, and Tiezzi [123] show that characteristic curved patterns in an RP can be caused by nonstationarities in the signal, in particular by a linearly increasing or a periodically modulated carrier frequency, and that such nonstationarities are not captured in a traditional spectrogram. More recently, Hutt, and beim Graben [124] propose a new RP-based approach that can identify metastable attractors and heteroclinic orbits, a task that is typically challenging using standard methods.
In general, there are various measures that attempt to quantify different aspects of the recurrence plot into a single number, and they are collectively referred to as recurrence quantification measures. The measures based on recurrence plots typically require the time-ordering information of the recurring points and can thus be considered to capture dynamical features. Measures derived from recurrence networks, however, do not require the time-ordering information, and capture various aspects of the geometry, i.e., the topology, of the attractor. In this section we consider two measures, viz. 'determinism' and 'average shortest path length,' one each from the recurrence plot and recurrence network frameworks.
1. Determinism. A prevalent feature found in most recurrence plots are diagonal lines, which show up when there are periods in which trajectories evolve in parallel to each other. A diagonal line of length l occurs when the following condition is satisfied: y i ≈ y j , y i+1 ≈ y j+1 , y i+2 ≈ y j+2 , . . . , y i+l−1 ≈ y j+l−1 . This condition can hold only when the two sections of the trajectory-one between y i and y i+l−1 and the other between y j and y j+l−1 are parallel to each other in the reconstructed state space, which occurs for periodically repeating portions of the trajectory. A higher number of such periodically repeating sections of the trajectory would imply that the state of the system can be predicted on timescales equal to the period of oscillation which, in this example, would be the time difference ∆t = |i − j|. Diagonal lines are thus typically used as an indicator of deterministic behavior, as is also seen in the recurrence plots given in Figure 9.
To quantify the extent of determinism contained in the recurrence plot, the recurrence plot-based measure DET is defined as, where the denominator is the total number of recurring points, K(l) is the number of lines of length l, and l 0 is the minimum number of points required to form a line. Although technically l 0 should be 2, higher values can be chosen in certain cases. For instance, in noisy systems, one can expect very short diagonal lines to occur purely by chance. Such lines do not encode determinism of the system and are better avoided in the estimation of DET. In such situations, we an set l 0 to a larger value so as to count only longer lines as they are less likely to occur due to randomness and more likely to indicate (any) deterministic component of the underlying process. DET gives a number between 0 and 1 such that a periodic signal (e.g., a sinusoid) will have a value of 1 and a purely stochastic signal will result in a value extremely close to 0. 2. Average shortest path length. A 'path' between two nodes i and j in a network is defined as a sequence of nodes that needs to be traversed in order to go to node j from node i. In general, there exist many possible paths between any pair of nodes in a network, and there can be even several possible shortest paths between a pair of nodes. However, it is possible to uniquely define a shortest path length d ij between two nodes i and j which is the smallest number of nodes that need to be traversed in order to reach j from i. Often the average shortest path length is a characteristic feature of networks that can help distinguish the topology of one network from another. In recurrence networks, shortest path length helps to characterize the topology of nearest-neighbor relationships. Each shortest path d ij is the distance between two states i and j of the system measured by laying out d ij straight line segments between them such that: (i) each line segment cannot be more than ε units long, and (ii) the ends of each line segment must lie on a measured state, the first and last of which are i and j respectively. Thus, d ij is bounded from below by the straight line between the states i and j, i.e., d ij is the upper bound for the Euclidean distance between two states on the attractor [115,117], and its average value is an upper bound for the mean separation of states of the attractor [117]. The average shortest path length, SPL, is estimated as- where N is the size of the recurrence network and the normalization factor N(N − 1) is the possible number of unique paths between N nodes. In order for Equation (12) to be estimable, we should not have self-loops in the network, which is ensured by Equation (10), and the network also has to be connected, in the sense that must exist at least one path between any pair of nodes in the network. Note that SPL quantifies the topology on the recurrence networks embedded in the state space and in several situations, such as multiple attractors existing for the same parameter set, SPL can help quantify differences between the different attractors based on their geometric layout. This might not be reflected in estimations of mean separation of states for the same attractors.
To illustrate the two recurrence-based quantifiers introduced above, we estimate DET and SPL for the Hénon map for different values of the parameter a between a = 1 and a = 1.4, and compare the results with the 'maximal Lyapunov exponent' (MLE) Λ in each case ( Figure 10). The MLE is typically used to quantify the exponential divergence of trajectories: which for the x-component of the Hénon map translates to, Values of Λ > 0 indicate chaotic attractors where nearby trajectories diverge exponentially as time progresses. Negative Λ values denote periodic attractors where all trajectories eventually converge to a fixed number of states.
We find that DET tends to decrease from a ≈ 1.12 as the map bifurcates towards the chaotic attractor at a = 1.4 (green shaded area in Figure 10), and it typically shows a trend opposite to that of the MLE (red shaded area in Figure 10), which increases with a. SPL has low values till around a ≈ 1.12, at which point it jumps to a higher value and then gradually decreases as a is increased. In the periodic windows (ca. 1.23 ≤ a ≤ 1.24 for instance), DET increases sharply and there is a corresponding drop in SPL. Together, the two measures help to infer the underlying dynamics meaningfully: While the decrease in DET from a ≈ 1.12 on tells us that the equilibrium dynamics turn more and more irregular, the decrease in SPL informs us that the attractor shrinks in volume in the same interval.  (2)) on changing the bifurcation parameter a (cf. Figure 7). For comparison, the MLE Λ (red) is shown alongside the recurrence quantification measures to indicate the chaotic and periodic windows. DET (SPL) is low (high) for chaotic dynamics (i.e., Λ > 0) and high (low) for periodic behaviour. Recurrence plots were estimated with τ = 1, m = 2, ε = 2 for the x variable of the Hénon map. The shaded areas denote the interquartile range obtained using 100 randomly chosen initial conditions for each value of a.

Inferring Dependencies Using Recurrences
Recurrence plots also provide powerful ways to determine interdependencies between two systems based solely on the measured time series. Particularly for chaotic oscillators, it can detect subtle and nontrivial modes of connectedness between two systems, such as phase synchronization (cf. Section 4.3) and generalized synchronisation [125]. To detect such forms of synchronization without knowledge of the system equations or system parameters is a difficult inverse problem that becomes potentially tractable with recurrence plots. Romano et al. [126] showed how conditional recurrences obtained from recurrence plots can be used to detect coupling directions whereas Feldhoff et al. [127] estimated coupling directions by a comparison of geometric motifs in the recurrence networks of coupled dynamical systems. Groth [128] showed that a combination of cross-recurrence plots and order recurrence plots can lead to a robust estimator of coupling between two systems. Tanio, Hirata and Suzuki [129] used recurrence plots and combinatorial optimization to reconstruct a slowly varying driving force by observing only the driven system. Hirata and Aihara [130] presented an approach to detect common driving forces from bivariate time series measurements by using counts of joint recurrence of the two systems that were in excess of what was expected from their individual recurrence rates. More recently, statistically motivated measures of interdependencies were proposed which attempt to formulate a mutual information-like quantity based on recurrences [131,132]. Here, we briefly present the measure to detect phase synchronization proposed by Romano et al. [125] and the statistical measure for interdependence proposed by Goswami et al. [131].
1. Correlation of probabilities of recurrence. The determination of the phase from the measured time series of a chaotic oscillation is a challenging task. Especially in the case when the attractor is in a non-phase-coherent dynamical regime, it is nontrivial to determine which particular combination of the state space variables would result in a reliable definition of the 'phase' of the motion, in the sense that with every time period, the phase should increase by 2π. In their study, Romano et al. [125] exploit this idea to note that for complex systems, we need to relax the condition || x i − x i+τ || = 0 (which is true for a purely periodic system with a single well defined period τ) to rather have || x i − x i+τ || ≈ 0, i.e., || x i − x i+τ || < ε, which allows us to define the function, where the normalisation N − τ ensures that the average number of recurring points for every possible period τ is considered. The function q(τ), also referred to as the τ-recurrence rate, can be considered as a kind of generalized autocorrelation function of the system which peaks at multiples of the dominant periods of the dynamics (cf. [61]). An important point here is that τ should be typically greater than the correlation timescales of the system. To estimate the extent of phase synchronisation, Romano et al. suggest to use the Pearson's cross-correlation coefficient of the q X (τ) and q Y (τ) curves from two dynamical systems {X , F X } and {Y, F Y } in combination with an appropriately selected so-called Theiler window [133] to take into account the autocorrelations of the dynamics. This was updated in a later study [134] to consider only those values τ ≥ τ 0 , where τ 0 = max τ {τ X c , τ Y c } is the maximum of the two decorrelation times for the two systems, where the decorrelation time was considered as the smallest value of τ for which the autocorrelation function is less than 1/e. Here, we suggest to consider the decorrelation time with respect to q(τ) instead of the autocorrelation function, i.e., τ c = min τ {q(τ) ≤ 1/e}. Thus, the phase synchronisation between X and Y can be obtained by, where µ and σ denote the sample mean and standard deviations, and the tilde symbol (˜) is used to denote estimates based on the τ ≥ τ 0 condition. Equation (16) is thus nothing but the sample cross-correlation coefficient of the q(τ ≥ τ 0 ) values obtained for X and Y. 2. Recurrence-based measure of dependence. Goswami et al. [131] recently proposed a statistically motivated measure of dependence based on recurrence plots. This idea was further developed by Ramos et al. [132] to include conditional dependences as well which helped to identify and remove 'common driver' effects in multivariate analyses. The so-called recurrence-based measure of dependence (RMD in Eq. 20 below) is the mutual information of the probabilities of recurrence of two dynamical systems X and Y. Consider the recurrence plot R X constructed from the measured/embedded series {x t } N t=1 : we can estimate the probability that the system X recurs to the state at time t = i as, and similarly, we get p Y i for system Y. Now, consider the joint recurrence plot [88], which encodes the joint recurrence patterns of systems X and Y by looking at those pairs of time points where a recurrence in X coincides with a recurrence Y and vice versa. The joint recurrence plot allows us to define the joint probability which encodes the joint probability that recurrences of the system X to its state at time t = i coincide with recurrences of system Y to its state at the same time t = i. The three quantities p X i , p Y i , and p X Y i allow us to define the mutual information of these probabilities of recurrence, i.e., which encodes the extent to which X and Y are non-independent. For the case where the two systems are completely synchronized, p X Y i = p X i = p Y i , which means, according to Equation (20), , the Shannon entropy of the recurrences of states of the system. If the systems are independent, RMD is zero as the joint probability p X Y i is simply the product of the two individual probabilities of recurrences. We can understand this by observing that the product in Equation (18) involves the element-wise product of two corresponding columns of the individual recurrence plots. For independent systems X and Y, this amounts to estimating the probability of getting overlapping 1 s from multiplying two binary series where the 1 s in each series have been distributed independently according to probabilities p X i and p Y i respectively, which is simply . Inferring interdependencies using recurrences. Estimates of CPR (blue), recurrence-based measure of dependence (RMD) (yellow), and PCC (red) for the coupled Rössler system (Equations (21) and (22)) for varying coupling strength µ. The shaded interval denote the interquartile ranges of the measure as obtained from 100 different initial condition choices for the coupled system. The onset of phase synchronization at µ ≈ 0.042 is recorded by CPR as it becomes almost one, but it is also recorded by RMD as it suddenly jumps from values close to zero to a plateau at around RMD ≈ 10. This plateau for RMD ends at µ ≈ 0.075; and after µ ≈ 0.102, RMD plateaus again, signaling the onset of lag synchronisation [88]. Between µ = 0.075 and µ = 0.102, RMD has a much larger spread indicating large fluctuations due to intermittent lag synchronisation. RMD shows similar results to that of JPR (cf. Figure 43C of [88]), but is easier to implement and computationally less expensive. Note that PCC dips just before the onset of phase synchronisation and has a continues increase afterwards, but is unable to detect either the precise onset of phase or lag synchronisation.
To illustrate how CPR and RMD work, we consider two mutually coupled Rössler oscillators (after [88], cf. Figure 43C and Equations (A7) and (A8) of that paper) connected bidirectionally in the first component by a coupling strength µ- where the parameters a = 0.15, b = 0.2, c = 8.5, and ν = 0.02. For the phase synchronous state shown in Figure 4, the coupling strength µ = 0.035, but in the current example, we vary the coupling strength in the interval µ ∈ [0., 0.12]. For each value µ considered, we integrate the coupled system until it reaches equilibrium behaviour, and consider the last part of the time series for the recurrence plot-based interdependence analysis, i.e. the estimation of CPR and RMD ( Figure 11). We take the first components, x 1 and y 1 , as the observed time series and embed them according to the steps in Section 6. We find that CPR begins to rise as the system approaches phase synchronisation and at µ ≈ 0.042, it hits 1 and remains at that value for all higher values of µ. RMD records a jump at the onset of phase synchrony, which is followed by a noisy plateau till µ ≈ 0.075. RMD further shows the onset of lag synchronisation at µ ≈ 0.102 after which its fluctuations plateau off again. In the interval µ ∈ (0.075, 0.102), the values of RMD show large fluctuations primarily due to intermittent lag synchronisation. In general, RMD behaves similar to the measure JPR put forth in [125] but is easier to implement and faster to compute. For comparison to CPR and RMD, we also estimate the Pearson's cross-correlation coefficient PCC, where µ and σ denote the sample mean and standard deviations of X and Y as before and N is the length of the time series. PCC dips sharply to moderate negative values before the onset of phase synchronisation but fails to detect the onset of phase and lag synchronisation (red area in Figure 11).

Detecting Dynamical Regimes Using Recurrences
Recurrence-based approaches can also be used to distinguish between various dynamical regimes. Casdagli [110] proposed a change point detection measure based on a conditional probability that estimates the likelihood of observing recurrences in two block diagonal squares whose upper-right and lower-left corners touch at the time point being tested for being a potential change point. Beim Graben and Hutt [135] propose to identify change points using a phase space partition constructed as a union of intersecting balls along the trajectory of the dynamics. Rapp, Darmon and Cellucci [59] use a similar notion and propose the 'quadrant scan' method to detect transitions, which has been recently also applied by Zaitouny et al. [58] to detect transitions in various kinds of real-world data. Iwayama et al. [136] proposed a change point detection approach based on the detection of 'communities' in recurrence networks using spectral clustering. More recently, Goswami et al. [137] use a similar notion and propose that communities can be used as an indicator of abrupt transitions in time series.
All of the above approaches have one thing in common: the idea that block structures in the recurrence matrix correspond to epochs where the trajectory is 'trapped' in a particular part of the state space. Identification of block structures, and in particular, the bottleneck time points between two consecutive blocks indicate the change point. Formally, the identification of block structures in a recurrence matrix is equivalent to identifying 'communities' in the corresponding recurrence network. 'Community' is used here in the sense of Newman's definition [138], which corresponds to groups of nodes in a network that are more densely linked to each other than the rest of the network. Below, we use the simplest form of identifying a network community, 'modularity,' to identify abrupt transitions and dynamical regimes in a given time series.
Newman and Girvan [139] put forth the notion of modularity to provide a quality criterion that allowed the inter-comparison of a set of given network partitions (i.e., a community structure) to determine the partition with the strongest community structure, i.e., the highest modularity. Given a network and a partition of the network, modularity estimates the deviation of average within-community link density of the given partition from that of a random network with the same degree sequence. For the adjacency network A, let M denote the total number of links in the network, then the modularity is where k i is the 'degree' of node i defined as the total number of links attached to node i, and c i is is an indicator function that gives the community index of node i such that δ(c i , c j ) equals 1 when c i and c j belong to the same community and zero otherwise. In order to evaluate the modularity MOD, we need to thus specify a partition of the network such that different nodes are grouped into different communities. This idea is used to construct an optimisation algorithm which, given all possible communities for a given network, returns the partition that maximizes MOD which serves as a utility function for the maximisation algorithm. There are several algorithms which carry out this optimisation in reasonable computation time. Besides modularity, which is known to have several drawbacks in detecting block structure in networks, many other methods have been proposed to identify communities in complex networks, and we refer the reader to [140] for a comprehensive overview of this topic. We demonstrate the approach with the example of a particle moving under a symmetric triple-well potential- with b = 0.1, c = 1, and a = 1 (Figure 12a). Based on the Langevin equation for a particle moving under the potential U, where σ denotes the magnitude of the stochastic Wiener process dW, we esimate the time series for the particle's location by integrating Equation (26) with the Euler-Marayama method: where dt = t n+1 − t n , and ξ ∼ N (0, 1) is a standard normal random variate. Using the fast greedy implementation in Python package for igraph [141] for detecting the communities in a given network while optimizing Newman's modularity, we obtain three distinct communities (Figure 12b) which almost always correspond to the three stable states around the three minima of the potential U.

Surrogate-Based Hypothesis Testing
All-time series analysis essentially involves the estimation of quantifiers from data and subsequent inference of dynamical characteristics. In practice, however, time series quantifiers often yield values that are intermediate and which make it infeasible to formulate an unambiguous inference from the value alone. For instance, if we obtain a value of DET = 0.68 we are unable to clearly conclude whether the underlying dynamics are deterministic (DET ≈ 1) or random (DET ≈ 0) (cf. [104]). The ambiguity of such time series quantifiers can be tackled in two different ways, which correspond to two different situations of obtaining time series.
First, consider the situation that we obtain our time series from an 'active experiment,' where we can change the parameters and initial conditions of the system and obtain further time series if necessary. In this case, we can try to identify a bifurcating parameter of the system, and change its value incrementally. We can estimate DET corresponding to each value of the bifurcating parameter, and compare them with each other (this is the situation in Figure 10). This way, we gain an understanding of which configurations of the system are more deterministic and which are less so. Active experiments allow us to explore time series characteristics on a relative scale simply by enabling us to investigate the system dynamics under different parameter configurations. The second situation is that of a 'passive experiment,' where we have only one realization of the system, and where we cannot generate a new time series for different system configurations. This is the case for most real-world time series. In the case of such passive experiments, we have to rely on the framework of statistical hypothesis testing, and we have to use 'surrogate time series' to help us carry out the tests.
Surrogate time series are randomized versions of the original time series that retain a few desired characteristics of the original dynamics. The characteristics which they retain encode a 'null hypothesis' against which we test the 'significance' of the time series characteristic. The simplest surrogate generation method is to shuffle the values of the original time series, which destroys all time-ordered dynamical information but retains the value distribution. We can use this method, for instance, to test whether the DET estimated from a time series is statistically significant under the null hypothesis: the observed determinism is explainable by a random process with a similar value distribution. A second surrogate generation technique was used in Section 4.4 ( Figure 5), where we used the Iterated Amplitude Adjusted Fourier Transformed (iAAFT) method [91] to create stochastic time series such that all dynamical structure of the input Rössler data was removed but nevertheless the power spectrum remained unchanged. In this case, the null hypothesis is: the observed dynamical characteristics (DET, SPL, and so on) are explainable by a random process with a similar power spectrum. To test the significance of measures of interdependence, such as RMD, CPR, or PCC, we typically use twin surrogates [142], which mimic trajectories from the same dynamical system albeit with random initial conditions. Twin surrogates retain both linear and nonlinear characteristics as the original time series, but by virtue of being a trajectory from a randomly chosen initial condition, the coupling is broken. The null hypothesis for twin surrogates is: the observed coupling is explainable by a randomly chosen trajectory, i.e., an independent realization, of the same dynamical system. If the observed value of CPR or RMD is not likely from a randomly chosen pair of twin surrogates, then we fail to accept the null hypothesis, indicating that the observed value of CPR or RMD are 'statistically significant', i.e., extremely less likely to have occurred simply by chance from systems that are dynamically similar. We refer the reader to [143] for a comprehensive overview of the multitude of surrogate time series generation methods that have been proposed till date.
In practice, in order to use surrogates for statistical hypothesis testing, we do the following: using a histogram function or a kernel density estimate. This distribution is known as the 'null distribution' as it is the distribution of values Q for the situation the null hypothesis is true, i.e., for whatever characteristic the surrogates preserve. 5. Using P null (Q), estimate the so-called 'p-value,' defined as the total probability of obtaining a value at least as extreme as the observed value Q orig , i.e., The p-value encodes how less likely is the observed value Q orig to be obtained from the null distribution P null . 6. Based on a chosen confidence level of the test α, determine whether Q orig is statistically significant at level α by checking whether p < α or not. When p < α, the observed value Q orig is statistically significant with respect to the chosen null hypothesis, and we fail to accept the null hypothesis, indicating that the observed Q orig is caused by characteristics other than what is retained in the surrogates. By convention, α is typically chosen at 5%, i.e., α = 0.05 or in some cases at 1%, i.e., α = 0.01. Values of α higher than 5%, such as 10%, is not recommended as the statistical evidence in such cases is rather weak. 7. In cases when there is more than one statistical test, we have to take into account the problem of multiple comparisons. This situation commonly arises in a sliding window analysis, where we divide a time series into smaller (often overlapping) sections and estimate the quantifier Q for each section. If α = 0.05, then 5% of the windows are possibly false positives. To reduce the effect of false positives, 'correction factors' such as the Bonferroni correction or the Dunn-Sidák correction are used [144]. In particular, Holm's method [145] is preferable as it does not require that the different tests be independent. The fundamental idea behind correction factors is to use a corrected value of α which is far lower than the actual reported α, thereby reducing the effective number of false positives at the reported level of confidence.
It is important to keep in mind that the final inferences are only as strong as the chosen surrogate method and the corresponding null hypothesis. If we estimate DET from a time series and test its significance against a P null (Q) generated from a simple random shuffle of the time series, then a statistically significant result can only let us conclude that the observed levels of determinism are not likely caused by an uncorrelated stochastic process with the same value distribution. However, if we observe a statistically significant value of DET against a P null (Q) generated from iAAFT surrogates, then we can make a stronger inference, viz., that the observed levels of determinism are not likely caused by a stochastic processes that possess not only the same value distribution but also the same autocorrelation structure. This would indicate that the determinism is likely due to factors other than the autocorrelation of the dynamics. To test whether an observed value of DET is significant or not, twin surrogates would not make sense because they give us dynamically similar trajectories which presumably, also have similar values of DET. To test the significance of DET against a null hypothesis which gives surrogates with similar DET values is a contradiction in itself. That said, we can, however, keep increasing the characteristics retained by the surrogates-value distribution, power spectrum/autocorrelation, mutual information, neighborhood structure, and so on-and make the inferences accordingly stronger, as long as the increase in the strength of inference is relevant to what we want to learn from the time series.

Application: Climatic Variability in the Equatorial and Northern Pacific
We apply the concepts discussed in the previous sections to the climatic variability of the Pacific Ocean, particular of the equatorial Pacific and the North Pacific basin. The sea surface temperatures (SSTs) in these two regions reportedly impact climatic conditions and weather patterns globally [146][147][148]. SSTs in the equatorial Pacific exhibit the well known El Niño southern oscillation (ENSO), which is an oscillatory behavior between anomalously warm and anomalously cold SSTs on typically quasi-biennial timescales. Anomalous warm (cold) periods in this region, are known as El Niño (La Niña) years. Here, we use the Niño 3.4 climatic index [149], constructed by averaging the monthly SST values in a pre-defined spatial box along the equatorial central Pacific roughly north of Tahiti, between 120W-170W and 5S-5N. The North Pacific basin, roughly the region north of Hawaii and contained between the Asian shoreline to the west and the northwest American shoreline to the east, exhibits a slower timescale oscillation known as the Pacific Decadal oscillation (PDO) [150]. An index for the PDO is estimated from the leading empirical orthogonal function (EOF) pattern of SST anomalies in the North Pacific basin. The positive (negative) phase of the PDO typically has a cold (warm) SST anomaly in the central part of the North Pacific and shows a corresponding warm (cold) El Niño (La Niña) like pattern in the equatorial Pacific. Although the positive (negative) phase of the PDO is similar to the positive (negative) phase of the ENSO, the two phenomenon operate on different time scales: While the ENSO phases typically last only for around 6 to 18 months, PDO SST patterns remain consistent for several decades before switching phase. The PDO is also suggested to modulate the likelihood of the positive and negative phases of the ENSO, pointing to a complex relationship between the two climatic modes [151]. We try to understand various aspects of the dynamical behavior of, and the interrelations between, the two climatic phenomena in this section. We use monthly resolved time series of Niño 3.4 index [152] and the PDO index [153] that cover a period of almost 150 years from January 1870 to November 2018.
We quantify the ENSO and PDO dynamics using DET and SPL (Figure 13a,b). We first estimate the recurrence plot of the entire time series with the following set of parameters: (i) for the ENSO index time series: τ e = 10, m e = 3, (ii) for the PDO index time series: τ e = 16, m e = 3. We then estimate DET and SPL in a sliding window of 10 years, moved ahead in steps of 1 month. The recurrence plot of each window is obtained by extracting the relevant entries for the window from the entire recurrence plot. The entire procedure is repeated for 1000 iAAFT surrogates generated from the full-time series, and the results are used to define anomalous periods of DET and SPL, identified here as those values of DET and SPL which lie outside the interquartile range of the null distributions.
We find that ENSO is typically more deterministic than the PDO (Figure 13a). Moreover, the DET time series of the two modes show a largely anti-correlated evolution in the first part of the range, ca. 1876-1910. This changes to a visibly non-stationary lagged relationship between the two with the PDO leading the ENSO ca. 1910-1950, and the ENSO leading the PDO for a brief period ca. 1950-1964. Thereafter, the lagged relationship vanishes and gives way to a largely correlated time evolution. The SPL series are also largely positively correlated, but show an anti-correlation in terms of when they are anomalously high or low with respect to the iAAFT null model (Figure 13b). This is particularly visible from ca. 1920 onwards, when most values of ENSO SPL are anomalously higher (lower) when PDO SPL values are lower (higher). This however does not hold true for the period surrounding the 1998 El Niño event, where both the ENSO and the PDO show anomalously high SPL values simultaneously.
We investigate the interrelations between the ENSO and the PDO using the recurrence-based measures of similarity CPR and RMD (Figure 13c). Using the same recurrence plot parameters as before, and the same window and step sizes, we obtain the time evolution of CPR and RMD and estimated the statistical significance (at 5% confidence) of each estimate based on a null model using 1000 twin surrogates. The statistical significance is obtained by taking into account multiple comparisons with Holm's method. We find that none of the CPR and RMD values are statistically significant after taking multiple comparisons into account. The situation does not change even if we do not take into multiple comparisons into account. We can infer from this result that -given the null hypothesis that the observed interdependence is due to the internal dynamics of the ENSO and the PDO, i.e., that a distribution of twin surrogates can explain the observed CPR and RMD valuesthere is no significant interrelation distinguishable from the time series data alone. It is likely then that most reported correlations between these two climatic systems are due to the frequent interlocking of a pair of otherwise uncoupled nonlinear oscillators.
We also look at the dynamical regimes of the two systems from the recurrence network perspective outlined earlier. After constructing the recurrence network, we use the fast greedy implementation of modularity optimization from the Python package for igraph to obtain the recurrence network communities that maximize the modularity. For both the ENSO and PDO indices, we obtain a predominantly two-community structure with a few likely stochastic variations (Figure 13d,e). The two communities effectively recover the positive and negative phases of the oscillations. Although this seems like a trivial result at first glance, it is a powerful proof of concept: we are able to recover the underlying states of the system (viz. the positive and negative phases of the ENSO or the PDO) without any knowledge of the system details, and solely from topological considerations inferred from time-series measurements. The results are striking for for both cases, in which the recurrence network community detection separates he positive and negative values of the two indices without any information about the actual values themselves and by solely utilizing the recurrence patterns encoded in the time evolution. For the PDO index, the largely warm and negative phases (marked as 'W' and 'C' respectively in Figure 13e) coincide with known phase shifts of the PDO and also explain the shifts in the determinism of the PDO in Figure 13a. The 1957The -1958 El Niños are also revealed in the midst of cold PDO phases in the PDO regime detection as belonging to the 'warm' PDO phase. This indicates that faster timescale variability of the PDO also plays an important role in modulating sea surface temperatures in the Pacific. 1850Pacific. 1860Pacific. 1870Pacific. 1880Pacific. 1890Pacific. 1900Pacific. 1910Pacific. 1920Pacific. 1930Pacific. 1940Pacific. 1950Pacific. 1960Pacific. 1970Pacific. 1980Pacific. 1990

Summary and Outlook
This review provided a brief glimpse into nonlinear time series analysis methods while focussing on recurrence plot-based approaches. The historical development of the discipline was outlined and the benefits of approaching a time series from a nonlinear perspective were motivated by showing how nonlinear processes impact our understanding of predictability, transitions, synchronization, and characterisation. A short primer on dynamical systems, attractors, bifurcations, and equilibrium dynamical behavior was given, on the basis of which the fundamental ideas of state-space reconstruction were discussed. Time delay embedding, a widely used approach of state-space reconstruction, was considered in detail and the methods to determine the time delay and dimension of embedding were described. Recurrence plots and various recurrence plot-based time series analysis methods were taken up in detail. This included measures that characterize the dynamics (DET) and the geometry of the attractor (SPL), those that quantify interdependence based on auto-correlation-like (CPR) and mutual information-like (RMD) estimates of recurrence patterns, and those that identify dynamical regimes based on geometric considerations (MOD). The concept of surrogate-based hypothesis testing was introduced and its critical role in time series analysis was motivated. Two surrogate generation techniques-the iterated amplitude adjusted Fourier transform (iAAFT) method and the twin surrogates method-were briefly discussed and the null hypothesis corresponding to each approach was stated.
We applied the presented approaches to the ENSO and PDO climatic indices that record the conditions of the sea surface temperatures in the equatorial and North Pacific regions respectively. We find that the ENSO is more deterministic than the PDO and that the two share a complex lead-lag relationship that changes with time. Around the turn of the 19-th century, from around 1876 to 1910, the determinism of the two systems shared an anti-correlated relation. The SPL of the two systems was mostly positively correlated but they were out-of-phase in terms of when they were higher or lower than the iAAFT null model. We found no statistical evidence for interrelations between the ENSO and the PDO using CPR and RMD and a twin surrogate-based null model at 5% confidence level. Finally, we used a fast greedy algorithm to find the partition of the recurrence network that maximizes MOD for both indices, and recovered the positive and negative phases of the two indices. This is a powerful proof of concept for the approach, which is able to uncover the underlying states of the system without prior knowledge about system details.
Nonlinear time series analysis evolved at a time when computational power was transforming the scientific and technological world. Since the 1980s, this field has expanded from a niche discipline to being widely used in several applications, such as finance, climate, neuroscience, ecology, and engineering. Almost forty years after Packard et al.'s paper on state-space reconstruction, we are now standing at the brink of another revolutionary moment in this discipline, where the concepts of Bayesian statistics and machine learning are being incorporated into nonlinear time series methods. Both these fields shared similar trajectories of growth in the second half of the 20-th century, parallel to the one outlined for nonlinear time series analysis in this review. The incorporation of ideas from these two disciplines will likely lead to new discoveries and deeper insights obtainable from time series. Studies such as the one by Brunton, Proctor, and Kutz [154] and by Lu, Hunt, and Ott [155] have taken first steps in this direction. The fundamental question, however, remains the same as the one in Packard et al.'s paper: How do we infer high dimensional dynamical invariants from a scalar time series?
Funding: The author is funded by the DFG project IUCliD (project no. DFG MA4759/8).
Code Availability. Codes and datasets necessary to explore the concepts discussed in this tutorial review and to generate all manuscript figures are available at: https://github.com/bedartha/nltsavibration.