On the Optimization of Machine Learning Techniques for Chaotic Time Series Prediction

Abstract: Interest in chaotic time series prediction has grown in recent years due to its multiple applications in fields such as climate and health. In this work, we summarize the contributions of multiple works that use different machine learning (ML) methods to predict chaotic time series. We highlight that the challenge is to predict a larger horizon with low error, and that for this task the majority of authors use datasets generated by chaotic systems such as Lorenz, Rössler and Mackey–Glass. Besides classifying and describing different ML methods, this work takes the Echo State Network (ESN) as a case study to show that its optimization can enhance the prediction horizon of chaotic time series. A survey of optimization methods applied to different ML methods shows that metaheuristics are a good option for optimizing an ESN. Accordingly, an ESN in closed-loop mode is optimized herein by applying Particle Swarm Optimization. The prediction results of the optimized ESN show roughly a twofold increase in the number of steps ahead, thus highlighting the usefulness of optimizing the hyperparameters of an ML method to increase the prediction horizon.


Introduction
There is a wide variety of natural phenomena in science and engineering applications that exhibit chaotic behavior, such as weather [1][2][3], turbulent flows [4], reacting flows [5,6], health-related pathologies [7], etc. All these phenomena are known for their complexity, since their models involve many interacting variables and the evolution of the time series is highly sensitive to the initial conditions. Due to this chaotic characteristic, predicting the future behavior of the time series becomes quite difficult, even when applying Machine Learning (ML) methods, which can predict future data from known data without the need for a mathematical model [8].
Among the ML methods that have been applied to predict the evolution of chaotic time series, those related to neural networks have shown good prediction capabilities. For instance, Recurrent Neural Networks (RNNs) were developed to perform tasks related to data prediction [9]; their introduction also improved on Feed-Forward Neural Networks (FFNNs), which are traditionally used more for classification and regression problems. Since their introduction, a lot of work has been performed using RNNs for chaotic time series prediction [10][11][12][13][14][15][16]. However, RNNs have well-known drawbacks, such as a complicated training process, a large computational load, and slow convergence [17,18]. In order to overcome these disadvantages and to enhance time series prediction, the Echo State Network (ESN) [19,20] and the Liquid State Machine (LSM) [21] appeared, but the challenge of predicting a large horizon with low error remains. ESNs are among the most used networks in the prediction of chaotic time series: they give good results in this field, they have a low computational cost and, in addition, their training is relatively simple compared to other RNNs. One way to improve the prediction horizon of an ML method is to perform an optimization process, in which the main question is: how should the hyperparameters, the number of neurons, the number of layers, etc., be selected to maximize the prediction horizon of a given ML method? As one can anticipate, there is no exact answer to this question, but there are different recommendations for determining these parameters, and the choice depends on the ML method, the type of time series to be predicted, and the limits on computational cost.
The optimization of ML methods is not a trivial task, but different works have shown the usefulness of applying Evolutionary Algorithms (EAs) [22], which are inspired by natural evolution to find the fittest individuals [23]. In general, EAs can be divided into two main categories, namely swarm intelligence optimization algorithms and genetic evolution algorithms. Some swarm intelligence algorithms are the Ant Colony Algorithm [24], the Firefly Algorithm [25], the Cuckoo Search Algorithm [26], and the Particle Swarm Optimization (PSO) algorithm [27], among others. These optimization techniques have been widely used to choose the hyperparameters of different ML methods in order to increase the time series prediction horizon. In this work, we list state-of-the-art ML methods for predicting chaotic time series and highlight how the optimization of the hyperparameters is vital to achieve a larger prediction horizon. The case study considered in this work is the application of PSO to optimize an ESN in closed-loop mode, with the main goal of increasing the prediction horizon, which becomes a little more than double.
In the following sections, one can find more details on the application of optimization algorithms to enhance ML methods for chaotic time series prediction. Section 2 summarizes the description of chaotic systems and lists works that have applied ML methods to the prediction of chaotic time series. Section 3 presents some works on the optimization of ML methods for the prediction of chaotic time series, as well as the main fitness functions used. Section 4 provides a case study and focuses on the optimization of an ESN by applying PSO to predict the time series of the chaotic Lorenz system. Section 5 contains a brief discussion of the time series prediction results and makes a comparison with related works. Finally, the conclusions are given in Section 6.

Chaotic Systems and Time Series Prediction by ML Methods
This section includes two subsections: the first one is devoted to describing the most common chaotic systems, providing their mathematical equations, parameters and attractors in the phase-space; the second subsection describes related works on machine learning methods that are used for the prediction of chaotic time series and their predicted steps-ahead.

Chaotic Systems
According to the definition given by Strogatz [28], a chaotic time series exhibits long-term aperiodic behavior, is deterministic, and is sensitive to initial conditions. Aperiodic in the long term means that the trajectory of the time series does not settle down to a fixed point, a periodic orbit, or a quasi-periodic orbit as time tends to infinity. Deterministic indicates that the system does not have random or noisy inputs or parameters, but that its irregular behavior arises from the non-linearity of the system. Sensitive to initial conditions means that even a change of one part in a million in the initial conditions will cause the trajectories to eventually diverge. The first chaotic system was described by Lorenz in 1963 [29], and it was derived from the simplified equations of the convection rolls that occur in the dynamic equations of the Earth's atmosphere. Since then, the Lorenz system has been one of the most studied chaotic systems. Other chaotic systems that have been developed and widely studied are the Rössler system [30], Lü system [31], Chen system [32], and Chua's circuit [33]. These chaotic systems are described by just three ordinary differential equations (ODEs), as shown in Table 1, where one can appreciate the parameter values and the attractors in the phase space. In recent years, neural models have also been developed to exhibit chaotic behavior, such as the Hindmarsh-Rose neuron [34], Huber-Braun [35], Cellular Neural Network [36], Hopfield Neuron [37], and so on. All these chaotic systems are continuous, and their solutions can be obtained by solving the ODEs with numerical methods.
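As an illustration of how such a time series can be generated numerically, the following minimal sketch integrates the Lorenz system with a fixed-step fourth-order Runge-Kutta scheme and compares two trajectories whose initial conditions differ by one part in a million. The classical parameter values (σ = 10, ρ = 28, β = 8/3), the step size, the initial state and the function names are assumptions made for this example; they are not taken from Table 1.

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz ODEs (classical parameters, assumed)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_trajectory(f, x0, dt, n_steps):
    """Integrate dx/dt = f(x) with the classical 4th-order Runge-Kutta method."""
    traj = np.empty((n_steps + 1, len(x0)))
    traj[0] = x0
    for i in range(n_steps):
        s = traj[i]
        k1 = f(s)
        k2 = f(s + 0.5 * dt * k1)
        k3 = f(s + 0.5 * dt * k2)
        k4 = f(s + dt * k3)
        traj[i + 1] = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj

# Two trajectories whose initial x-coordinates differ by 1e-6.
traj_a = rk4_trajectory(lorenz, np.array([1.0, 1.0, 1.0]), dt=0.01, n_steps=5000)
traj_b = rk4_trajectory(lorenz, np.array([1.0 + 1e-6, 1.0, 1.0]), dt=0.01, n_steps=5000)

# Euclidean distance between the two trajectories at every time step.
separation = np.linalg.norm(traj_a - traj_b, axis=1)
```

Plotting `separation` on a logarithmic axis shows the initially tiny difference growing until it reaches the size of the attractor, which is the sensitivity to initial conditions described above.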

Chaotic Time Series Prediction by ML Methods
The prediction of chaotic time series is a complex task due to characteristics such as aperiodic behavior and high sensitivity to initial conditions. It can be defined as a task in which temporal correlations must be learned, because the inputs and outputs are ordered sequentially, as shown in Figure 1. RNNs [38] were developed precisely to learn such temporal correlations. Figure 1 sketches the prediction of a chaotic time series, where the red dotted line indicates the forecast horizon reached and the point from which the predicted time series diverges from the target series.
The main goal in the prediction of chaotic time series is to increase the prediction window, measured either in Lyapunov times or in the number of steps ahead [39][40][41][42]. For instance, the prediction horizon described by (1) can be calculated as the time interval during which the normalized error is less than a threshold $k$ [43][44][45], where $y_{\text{target}}$ is the data to predict, $y(i)$ is the predicted data, and $HP$ is the prediction horizon. In the particular case of applying an ML method based on neural networks, such as an RNN or ESN, the challenges are how to reduce the computational cost and the number of neurons [46], how to propose different internal connections [47,48], and how to reduce the prediction error. Usually, the prediction is performed to estimate one step ahead or very few steps ahead, and the lowest possible error is sought [49]. In the majority of cases, the prediction error is evaluated as the root mean squared error (RMSE), and it can take different magnitudes to validate the predicted steps ahead. Table 2 summarizes some relevant works on the prediction of chaotic time series, where $\lambda_{\max}^{-1}$ represents the Lyapunov time, which corresponds to the inverse of the maximum Lyapunov exponent of the system. On this issue, it is well known that chaotic behavior can be generated from a mathematical model having at least three ODEs, so that one can evaluate three Lyapunov exponents (one negative, one close to zero and one positive). Systems with more than three ODEs can generate hyperchaos if they have more than one positive Lyapunov exponent. The maximum Lyapunov exponent is then a reference to indicate chaotic behavior and to calculate the Lyapunov time, which is sometimes used to measure the prediction window. However, as shown in Table 2, the majority of authors do not report which Lyapunov exponent they use.
In the same Table 2, it can be appreciated that the most used chaotic systems are: Lorenz, Rössler and Mackey-Glass, for which the different ML methods predict from 1 to 1000 steps ahead. With respect to the ML method, one can appreciate that ESN and some variants of it have been the most used. For this reason, this paper shows the optimization of an ESN by applying PSO to enhance the prediction horizon.
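The idea of a prediction horizon based on an error threshold can be sketched in a few lines. The example below counts how many consecutive steps the normalized error stays below a threshold k and converts a number of steps into Lyapunov times. The normalization used here (Euclidean error divided by the root-mean-square norm of the target) and the function names are assumptions of this sketch; individual works may normalize differently.

```python
import numpy as np

def prediction_horizon(y_target, y_pred, k=0.1):
    """Number of consecutive steps, from the start, with normalized error below k.

    Assumed normalization: Euclidean error at each step divided by the
    root-mean-square norm of the target series.
    """
    y_target = np.asarray(y_target, dtype=float).reshape(len(y_target), -1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(len(y_pred), -1)
    scale = np.sqrt(np.mean(np.sum(y_target ** 2, axis=1)))
    err = np.linalg.norm(y_target - y_pred, axis=1) / scale
    above = np.nonzero(err >= k)[0]          # indices where the threshold is crossed
    return int(above[0]) if above.size else len(err)

def steps_to_lyapunov_times(steps, dt, lambda_max):
    """Convert a horizon in steps to Lyapunov times: steps * dt / (1 / lambda_max)."""
    return steps * dt * lambda_max
```

Reporting the horizon in Lyapunov times makes results comparable across systems and sampling steps, provided that the sampling step and the maximum Lyapunov exponent used are both stated.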

Optimization of ML Methods for Predicting Chaotic Time Series
Generally, when optimizing ML methods to enhance the prediction of chaotic time series, the main focus is to reduce the prediction error, so that one can establish the number of steps ahead to be predicted. Let us see some examples. In [57], Lin and Chen optimized an FLNFN (Functional Link-based Neural Fuzzy Network) using a hybrid algorithm consisting of PSO and a cultural algorithm, where the functional expansion in the model can produce the consequent part of a non-linear combination of input variables. The authors used Mackey-Glass time series and forecasted the number of sunspots with the goal of reducing the prediction error by optimizing the fuzzy rules and their relationship with the inputs, but the prediction was only one step ahead. Another work focused on reducing the prediction error while predicting only a few data points is [49], in which a modified version of the Cuckoo Search algorithm was used to optimize a Wavelet Neural Network (WNN) to predict Lorenz time series, showing an improvement of at least 73% over a conventional randomly initialized WNN. Another clear example is given in [58], where Cooperative Coevolution was used to optimize an Elman RNN using time series from the Mackey-Glass and Lorenz systems. Other examples are summarized in Table 3, where one can see optimization algorithms applied to ML methods to predict chaotic time series. The optimization algorithms given in Table 3 use different error metrics, such as the MSE, RMSE, and MAE described in (2)-(4), respectively [65]. Other recent works, such as [66], prefer to introduce a fitness function, such as the one described in (5).
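For reference, the standard definitions of the three error metrics can be written as follows, where $y_{\text{target}}(i)$ is the target datum, $y(i)$ is the predicted datum, and $n$ is the number of test samples (the symbol $n$ is an assumption of this sketch).

```latex
\begin{align*}
\mathrm{MSE}  &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_{\text{target}}(i) - y(i)\bigr)^{2},\\
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(y_{\text{target}}(i) - y(i)\bigr)^{2}},\\
\mathrm{MAE}  &= \frac{1}{n}\sum_{i=1}^{n}\bigl|y_{\text{target}}(i) - y(i)\bigr|.
\end{align*}
```

The MSE and RMSE penalize large deviations more strongly than the MAE, which matters for chaotic series because a prediction that has diverged contributes very large errors.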

Optimizing an ESN by PSO to Enhance Time Series Prediction Horizon
As shown in Table 2, one of the most used ML methods for the prediction of chaotic time series is the Echo State Network (ESN). It consists of three layers: the input layer, the hidden layer and the output layer [67]. The hidden layer contains N interconnected neurons with randomly generated weights represented by a matrix W. ESNs have two prediction modes, closed-loop and teacher-forced. In closed-loop mode, the predicted data are fed back into the network to predict new data; therefore, the error accumulates, and the predicted data eventually diverge from the target data. In teacher-forced mode, the predicted data are not fed back into the ESN; instead, the next point of the test set is taken as input for the next prediction. This implies that only one datum is predicted at a time, and the prediction will not diverge from the target data. Figure 2 shows the closed-loop representation and Figure 3 shows the teacher-forced representation. Equations (6)-(8) describe the main parameters of the ESN [68], which are mentioned below.

In these equations, $x(t)$ represents the states of the neurons, $W$ is the matrix of internal connections, $W_{in}$ is the input matrix, $W_{out}$ is the output matrix, which is the trainable parameter, $u(t)$ is the input, $Y_{target}$ is the data to be learned, $y(t)$ is the predicted data, and the rest of the parameters are described below.
• Leaking Rate ($a$): This parameter is associated with leaky integrator ESNs (LI-ESNs) [69], whose reservoir neurons perform leaky integration of their activations from past time steps.
• Spectral Radius (SR): The maximum absolute eigenvalue of the reservoir weight matrix $W$. It is recommended that this parameter lie in the interval (0, 1) to ensure the echo state property [70].
• Reservoir Size ($N$): The number of neuron units within the reservoir. It is a crucial parameter, since it determines the maximum number of possible connections within the reservoir ($N^2$) [71]. Jaeger [71] has suggested that $N$ be in the range $T/10 \le N \le T/2$, with $T$ the length of the training data.
• Reservoir Activation Function: For the ESN, the reservoir activation is a non-linear function. In most works, the function of choice has been tanh(·) or the logistic sigmoid [72].
• Regularization parameter of ridge regression (RR): Regularization is aimed at reducing the noise sensitivity of the network and at preventing overfitting [73].
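To make the role of these parameters concrete, a common way to write a leaky-integrator ESN with a ridge-regression readout is sketched below. This is the widely used textbook form and is assumed here for illustration only; the bias term, the concatenation $[1; u(t); x(t)]$, the matrix $X$ of collected feature vectors and the identity matrix $I$ are notational assumptions and may differ from the exact form of Equations (6)-(8).

```latex
\begin{align*}
x(t) &= (1 - a)\,x(t-1) + a\,\tanh\!\bigl(W_{in}\,[1;\,u(t)] + W\,x(t-1)\bigr),\\
y(t) &= W_{out}\,[1;\,u(t);\,x(t)],\\
W_{out} &= Y_{target}\,X^{\mathsf T}\bigl(X X^{\mathsf T} + RR\cdot I\bigr)^{-1}.
\end{align*}
```

In this form, $a$ controls how quickly the state forgets its past, SR enters through the re-scaling of $W$, $N$ is the dimension of $x(t)$, and RR is the only regularization on the trainable matrix $W_{out}$.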
The case study herein is the prediction of chaotic data from the Lorenz system, whose mathematical model is given in Table 1. To improve the prediction, it is recommended to normalize or scale the amplitudes of the state variables to be within the range [−1, 1]. For the time series prediction, we use 5000 data points for training, an ESN with 500 neurons ($N = T/10$), SR = 1.25, $a$ = 0.5, and RR = $1 \times 10^{-8}$. The matrices $W_{in}$ and $W$ were randomly generated, and the matrix $W$ was re-scaled according to the spectral radius.
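The setup described above can be sketched as a small NumPy implementation of a leaky-integrator ESN with both prediction modes. This is a minimal sketch under stated assumptions, not the implementation used in the case study: the uniform weight range, the bias input, the washout of 100 steps, the random seed and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius, seed=0):
    """Random W_in and W; W is re-scaled to the requested spectral radius."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=(n_res, 1 + n_in))   # +1 column for a bias input
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def update(x, u, W_in, W, a):
    """Leaky-integrator state update with leaking rate a."""
    pre = W_in @ np.concatenate(([1.0], u)) + W @ x
    return (1.0 - a) * x + a * np.tanh(pre)

def train(data, W_in, W, a, rr, washout=100):
    """Ridge-regression readout mapping [1; u(t); x(t)] to u(t+1)."""
    x = np.zeros(W.shape[0])
    feats, targets = [], []
    for t in range(len(data) - 1):
        x = update(x, data[t], W_in, W, a)
        if t >= washout:                      # discard the initial transient (assumed)
            feats.append(np.concatenate(([1.0], data[t], x)))
            targets.append(data[t + 1])
    X, Y = np.array(feats).T, np.array(targets).T
    A = X @ X.T + rr * np.eye(X.shape[0])     # symmetric, so solve() gives W_out^T
    W_out = np.linalg.solve(A, X @ Y.T).T
    return W_out, x

def predict_closed_loop(u0, x, W_in, W, W_out, a, n_steps):
    """Each prediction is fed back as the next input, so errors accumulate."""
    out, u = [], u0
    for _ in range(n_steps):
        x = update(x, u, W_in, W, a)
        u = W_out @ np.concatenate(([1.0], u, x))
        out.append(u)
    return np.array(out)

def predict_teacher_forced(inputs, x, W_in, W, W_out, a):
    """Each input comes from the test set; only one step ahead is predicted."""
    out = []
    for u in inputs:
        x = update(x, u, W_in, W, a)
        out.append(W_out @ np.concatenate(([1.0], u, x)))
    return np.array(out)
```

With the values quoted above, the reservoir would be built as `make_reservoir(3, 500, 1.25)` and trained with `a=0.5` and `rr=1e-8` on 5000 normalized Lorenz samples; closed-loop prediction then starts from the last training sample and the state returned by `train`.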
To optimize the ESN with the parameters described above, we apply PSO, which is inspired by the swarming behavior of certain animals such as fish and birds. The initial population is generated in a specific search space. Each particle $i$ is characterized by a pair of position and velocity ($x_i$, $v_i$), which must be updated according to Equations (9) and (10). The swarm of particles then flies throughout the search space, and every particle $i$ moves according to its corresponding velocity vector $v_i$. At each time step, the quality of the solutions is evaluated according to a fitness function or objective function [74,75]. The general process to optimize an ESN in the chaotic time series prediction task is described below in Algorithm 1. In this case study, we have no constraints.
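For orientation, the velocity and position updates of PSO are commonly written in the following inertia-weight form. It is given here as an assumed standard form; the inertia weight $w$, the acceleration coefficients $c_1$ and $c_2$, and the uniform random numbers $r_1, r_2 \in [0, 1]$ are notational assumptions and may differ from the exact form of Equations (9) and (10).

```latex
\begin{align*}
v_i(t+1) &= w\,v_i(t) + c_1 r_1\bigl(p_i - x_i(t)\bigr) + c_2 r_2\bigl(g - x_i(t)\bigr),\\
x_i(t+1) &= x_i(t) + v_i(t+1),
\end{align*}
```

Here $p_i$ is the best position found so far by particle $i$ and $g$ is the best position found by the whole swarm, which matches the roles of $p$ and $g$ in Algorithm 1.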
Algorithm 1 Optimization of an ESN to predict the Lorenz system with PSO.
Initialize the first particle of the population with known parameters, and the rest randomly (x).
Initialize the velocity of the particles v.
for (counter = 1; counter ≤ G; counter++) do
    for (i = 1; i ≤ N_p; i++) do
        for (j = 1; j ≤ D; j++) do
            For each set of particles (p), train the ESN, re-scaling the W matrix according to the corresponding new spectral radius.
            Compute W_out for each set of (p).
            Predict the Lorenz system time series with each set of (p).
            Calculate the fitness function, MSE between the predicted and target data.
            Find the best value from p and save it in g.
            Evaluate the new velocity using (9).
            Evaluate the new position using (10).
        end for
        if f_x is better than score_i then
            score_i ← f_x
            if p_i is better than g then
                g ← p_i
            end if
        end if
    end for
end for
return x, p, g and score

Although there are general recommendations for selecting the hyperparameters of the ESN, it is still a design problem; therefore, these parameters must be adjusted for each problem. For this reason, we decided to optimize the hyperparameters SR, $a$ and RR, to find the values that give the lowest MSE between the predicted data and the target data. For PSO, we established a population of 30 particles, 10 generations, and 3 variables to optimize in the following ranges: SR ∈ [0.01, 1.5], $a$ ∈ [0.01, 1], RR ∈ [$1 \times 10^{-10}$, $1 \times 10^{-5}$]. Table 4 shows five solutions in which the MSE was reduced with respect to the original parameters. Figure 4 shows the time series prediction of the Lorenz system with the first set of optimized ESN parameters. Compared to the prediction results shown in Figure 1, where the ESN is not optimized, one can see that the prediction horizon more than doubles, increasing from 540 to 1240 steps ahead.
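The optimization loop described above can be sketched as a compact PSO routine over the three hyperparameters. This is a minimal sketch, not the code used to obtain the reported results: the inertia weight and acceleration coefficients, the zero initial velocities, the clipping of positions to the search ranges, and the placeholder fitness function `toy_fitness` are all assumptions. In the case study, the fitness would instead train the ESN with the candidate (SR, a, RR) and return the closed-loop MSE.

```python
import numpy as np

def pso(fitness, lower, upper, n_particles=30, n_generations=10,
        w=0.7, c1=1.5, c2=1.5, x_init=None, seed=0):
    """Minimize `fitness` over a box; returns the best position and its score."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = rng.uniform(lower, upper, size=(n_particles, len(lower)))
    if x_init is not None:
        x[0] = x_init                          # first particle: known parameters
    v = np.zeros_like(x)                       # assumed initial velocities
    p = x.copy()                               # personal best positions
    score = np.array([fitness(xi) for xi in x])
    g, g_score = p[np.argmin(score)].copy(), score.min()
    for _ in range(n_generations):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)       # keep particles inside the ranges (assumed)
        for i in range(n_particles):
            f_x = fitness(x[i])
            if f_x < score[i]:                 # better than this particle's best
                score[i], p[i] = f_x, x[i]
                if f_x < g_score:              # better than the swarm's best
                    g, g_score = x[i].copy(), f_x
    return g, g_score

def toy_fitness(h):
    """Hypothetical stand-in for the closed-loop MSE of an ESN with h = (SR, a, RR)."""
    return float(np.sum((h - np.array([0.9, 0.3, 1e-6])) ** 2))

lower = [0.01, 0.01, 1e-10]                    # search ranges for SR, a, RR
upper = [1.5, 1.0, 1e-5]
start = [1.25, 0.5, 1e-8]                      # non-optimized parameters as first particle
best, best_score = pso(toy_fitness, lower, upper, x_init=start)
```

Because the first particle carries the non-optimized parameters, the returned score can never be worse than the starting configuration. Since RR spans several orders of magnitude, searching over its logarithm may be worth considering, although that is a design option rather than something stated in the case study.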

Discussion
An ESN is a Machine Learning method that has been widely used in the prediction of chaotic time series due to its good results in this task, its low computational cost and its easy training compared to other recurrent neural networks such as LSTM. However, like many neural networks, ESNs have hyperparameters that must be set before training. The main hyperparameters of an ESN are the number of neurons, the leaking rate, the spectral radius and the ridge regression regularization parameter, and although the authors in [70] recommend some ranges of values, there is no exact way to find them. Starting from this premise, optimization techniques allow finding the set of hyperparameters with which a greater prediction horizon can be obtained. In this manner, we have shown that the PSO algorithm, which is one of the classics in the literature, allows one to optimize the hyperparameters of an ESN in closed-loop mode. This also shows that the values of the hyperparameters must be correctly selected to reach a large prediction horizon for chaotic time series. The time series prediction was performed by executing different experiments with different tests, showing that in all of them the MSE was reduced and the prediction was greatly improved with respect to an ESN without optimization. The different tests provided, on average, an increase in the prediction horizon of a little more than double. As one can infer, there are other optimization algorithms that can be used for the ESN; however, one must be aware that the use of other optimization algorithms does not guarantee that the prediction horizon will increase. The application of an optimization algorithm in this case is devoted only to the search for the best combination of hyperparameters, not to improving the ESN topology or its composition, which would imply a much more robust and complex design problem.
In the state of the art, one can find several optimization algorithms that have been applied to optimize an ESN to predict chaotic time series. For instance, Table 5 shows different evolutionary algorithms, and among them, one can see the application of PSO to optimize the ESN using different error values for the fitness functions and different lengths of the test data. In that table, one can see our results compared to published works; for chaotic time series of the Lorenz system, we reach the lowest error using 1000 data points. It may be difficult to compare the results obtained in our experiments with the results reported in the literature. This is because there are two types of prediction that can be made with an ESN: closed-loop and teacher-forced. In the first mode, the predicted data are fed back into the network, which causes the errors to accumulate and eventually makes the predicted data diverge from the target data. In the second, or teacher-forced, mode, the predicted data are not fed back into the ESN; the input comes directly from the test set, which implies that only one datum is predicted at a time. From Table 5, one can see the size of the test data for each case, but the authors do not mention what type of prediction they make. For example, in [66], Zhang et al. report a very small error in the time series prediction, but they do not specify what type of prediction is performed. The prediction mode is quite important, since in teacher-forced mode there is no cumulative error problem and the prediction generally will not diverge from the target data. In this manner, our contribution relies on the use of the closed-loop prediction mode, since the main goal of this work was to increase the prediction horizon for chaotic time series, and this has been achieved with the help of an optimization algorithm, i.e., PSO, to select the hyperparameters of an ESN appropriately.

Conclusions
Although a great variety of ML methods have been used for the prediction of chaotic time series, this work showed that the ESN is one of the most used. The optimization of ML methods, such as those based on neural networks like the ESN, focuses on finding the values of the hyperparameters that minimize the prediction error. However, it is necessary to determine which parameters should be optimized for each problem. It should also be considered that EAs are metaheuristics, and they have their own parameters that must be carefully selected, such as the size of the population and the number of generations, among others. Another consideration is the fitness function that is chosen. Generally, some measure of error between the predicted values and the target values is used, but a good alternative would be to maximize the prediction horizon directly. Finally, it is worthwhile to mention a very interesting aspect of optimizing an ESN for the prediction of chaotic time series: the type of prediction that is made, which can be in closed-loop or teacher-forced mode. The first mode should be used when seeking to increase the prediction horizon, and the second can be used when the goal is to reduce the error in the prediction; however, this is usually not specified in the published works. To verify the importance of the selection of hyperparameters in a machine learning method for forecasting chaotic time series, we used PSO to optimize a closed-loop ESN, which allowed us to roughly double the forecast horizon compared to a non-optimized ESN.