1. Introduction
In recent years, machine learning technologies have been widely applied to a variety of tasks, such as speech recognition, medical diagnosis, autonomous driving, image encryption, and recommendation systems [1,2,3,4]. Chaos control has long been a focus of nonlinear research, and using machine learning to address this problem has gradually become a trend [5,6,7,8]. In practice, usually only finite time-series data from the dynamical process of interest are available; an approach that learns from the data alone, without a mathematical model of the system, is therefore called "model-free" learning. The most commonly used method for model-free learning from dynamical time series is delayed coordinate embedding, which is well established [9,10,11,12,13].
However, delayed coordinate embedding is complex, and its results often fail to meet the accuracy required in practice. In 2004, the ESN proposed by Jaeger and Haas achieved impressive results in "model-free" chaotic learning tasks, published in Science [14]. Many researchers have since applied ESNs to various chaotic learning tasks. For example, Pathak et al. used reservoir computing to perform model-free estimates of the state evolution of chaotic systems and their Lyapunov exponents [15,16]. Moreover, an ESN can infer unmeasured state variables from a limited set of continuously measured variables [17]. An ESN differs fundamentally from a traditional neural network in that only the output weights are trained, which avoids the vanishing and exploding gradient problems that arise when a traditional neural network applies gradient descent to all weight matrices [18]. Consequently, many results based on ESNs have emerged in the following years [19]. For instance, adaptive reservoir computing can capture critical transitions in dynamical systems and has successfully predicted such transitions in various low-dimensional systems and in high-dimensional systems with simple parameter structures [20]. Moreover, data-informed reservoir computing, which relies solely on data to enhance prediction accuracy, not only reduces computational cost but also minimizes the cumbersome hyperparameter optimization process of reservoir computing [21].
The above results show that echo state networks can be effectively applied to chaos prediction tasks, and our goal is to achieve long-term, accurate predictions. However, chaotic systems are extremely sensitive to initial conditions, which makes long-term prediction particularly challenging. In the ESN structure, the nonlinear activation can model the nonlinear relationships of the chaotic system, capture the characteristics of the data, and solve complex problems [22,23], so it is crucial to the completion of the task.
The update process of the reservoir state largely depends on the activation function [24,25]. The activation function acts on the network input, the previous reservoir state, and the feedback output. According to the reservoir update equation, the network input plays a crucial role in determining how the reservoir state is updated. Different learning tasks involve distinct input characteristics and therefore call for different reservoir update behaviors. However, in traditional ESN models the activation function usually remains unchanged regardless of the characteristics of the input data, typically a fixed nonlinear function such as tanh or sigmoid [26]. Additionally, when noise or interference in the training set increases, the generalization ability of the ESN may decrease [27]. To overcome the shortcomings of a single fixed activation function, the double activation function echo state network (DAF-ESN) [28], the echo state network with an activation function based on bistable stochastic resonance (SR-ESN) [29], and the deep echo state network with multiple activation functions (MAF-DESN) [30] have been proposed in recent years. By linearly combining activation functions, the resulting activation varies as the coefficients change, providing greater flexibility and adaptability than a single activation function. This enhances the network's expressive power, allowing the model to better adapt to complex learning tasks.
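To illustrate the idea of combining activation functions, the following minimal sketch forms a weighted mixture of two fixed nonlinearities; the particular functions and coefficient values are illustrative assumptions rather than the exact formulations of DAF-ESN, SR-ESN, or MAF-DESN.

```python
import numpy as np

def combined_activation(x, a=0.6, b=0.4):
    """Weighted combination of two fixed nonlinearities.

    The coefficients a and b are illustrative; varying them changes the shape
    of the resulting activation, which is the source of the extra flexibility
    discussed above.
    """
    sigmoid = 1.0 / (1.0 + np.exp(-x))
    return a * np.tanh(x) + b * sigmoid

x = np.linspace(-3, 3, 7)
print(combined_activation(x))             # default mixture
print(combined_activation(x, 1.0, 0.0))   # reduces to plain tanh
```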
Recognizing this, in order to learn the key features of spatiotemporal chaotic systems, this paper introduces the homotopy transformation from topological theory and proposes a new chaotic prediction model, called the H-ESN. While preserving the basic topological properties, our model achieves an optimal balance between nonlinearity and linearity by continuously transforming between different activation functions and adjusting the homotopy parameter, thereby capturing the key features necessary for learning chaos (a minimal illustrative sketch of this construction is given after the list below). In the experimental part of this paper, our model is applied to the following classical prototype systems of chaotic dynamics: the Lorenz system, the MG system, and the KS system, and it obtains the following positive results compared with other models.
With appropriately chosen parameters, the H-ESN can provide longer prediction times for various high-dimensional chaotic systems.
Under the same parameter conditions, the H-ESN demonstrates smaller prediction errors compared to other models when predicting different dimensions of chaotic systems.
Compared to traditional methods, the H-ESN exhibits significant advantages in chaotic prediction tasks, particularly in the estimation of the maximal Lyapunov exponent.
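As a concrete, deliberately simplified reading of the construction described above, the sketch below assumes the homotopy is a convex combination of a tanh nonlinearity and the identity map, controlled by a parameter lam in [0, 1]; the exact update equation, notation, and choice of activation pair used by the H-ESN are given in Section 2.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in = 300, 3                        # reservoir size and input dimension (illustrative)
W_in = rng.uniform(-0.5, 0.5, (N, D_in))
W = rng.uniform(-1.0, 1.0, (N, N))
W *= 1.2 / np.max(np.abs(np.linalg.eigvals(W)))   # rescale to a target spectral radius

def homotopy_activation(x, lam):
    """Continuous deformation between tanh (lam = 0) and the identity map (lam = 1)."""
    return (1.0 - lam) * np.tanh(x) + lam * x

def reservoir_step(r, u, lam):
    """One reservoir update using the homotopy-combined activation."""
    return homotopy_activation(W @ r + W_in @ u, lam)

r = np.zeros(N)
u = rng.standard_normal(D_in)           # a single (dummy) input sample
r = reservoir_step(r, u, lam=0.7)       # lam near 0.7 is the value suggested later for the Lorenz system
```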
The remainder of this paper is organized as follows:
Section 2 introduces the principles and methods of the ESN and H-ESN and provides the sufficient conditions for the H-ESN to satisfy the echo state property.
Section 3 discusses the application of the H-ESN to three chaotic system examples and compares its performance with other models, achieving significant results.
Section 4 summarizes our research findings and outlines future research directions.
3. Results
We will provide three examples, the Lorenz, Mackey–Glass, and Kuramoto–Sivashinsky systems, to illustrate the advantages of using the H-ESN in predicting chaotic systems.
3.1. Lorenz System
The Lorenz system, proposed by Edward Lorenz in 1963 [41], is a three-dimensional nonlinear dynamical system originally designed to study atmospheric convection. As a fundamental model in chaos theory, it is known for its simplicity and complex dynamics. The system's differential equations are as follows:

\begin{aligned}
\dot{x} &= \sigma (y - x),\\
\dot{y} &= x(\rho - z) - y,\\
\dot{z} &= xy - \beta z,
\end{aligned}

where σ = 10, ρ = 28, and β = 8/3. The system variables x, y, and z are known, and the input is used to obtain the output weight matrix through training; afterward, the system enters the prediction phase. Taking into account the symmetry of the Lorenz equations, Equation (2) is modified so that the output weights act on an augmented vector of the same dimension as the reservoir state, in which half of the elements are the squares of the corresponding reservoir state components.
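A minimal sketch of the data generation and the symmetry-motivated readout augmentation described above is given below; it assumes the standard Lorenz parameters and the common convention of squaring every second reservoir component, and all variable names (u, r, W_out) are illustrative rather than the paper's exact notation.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

# Chaotic trajectory sampled at a fixed step (the step size is an assumption)
dt, n_steps = 0.02, 5000
t_eval = np.arange(n_steps) * dt
sol = solve_ivp(lorenz, (0, t_eval[-1]), [1.0, 1.0, 1.0],
                t_eval=t_eval, rtol=1e-9, atol=1e-9)
u = sol.y.T                               # shape (n_steps, 3): the known system variables

def augment(r):
    """Square every second reservoir component before the linear readout.

    This is the usual way of accounting for the (x, y) -> (-x, -y) symmetry of
    the Lorenz equations when the readout itself is purely linear.
    """
    r_tilde = r.copy()
    r_tilde[1::2] = r[1::2] ** 2
    return r_tilde

# With reservoir states collected during training, the output weights W_out
# would then be fit on the augmented states rather than on the raw ones.
```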
Based on this, we compare the H-ESN with other commonly used ESN models; the results are illustrated in Figure 3, with the parameters shown in Table 2. Additionally, the accurate prediction data lengths of the three models for the three variables of the Lorenz system are presented in Table 3.
According to Figure 3 and Table 3, in the initial stages all three models (Deep ESN, ESN, and H-ESN) achieve relatively accurate predictions for the three variables of the Lorenz system. However, as the number of data points increases, the prediction trajectory of the Deep ESN is the first to deviate from the true values, with the purple dotted line diverging from the blue solid line. This is because we selected a Deep ESN with three layers of 100 nodes each, which has weaker nonlinear modeling capability than an ESN with a single 300-node reservoir. The predicted trajectory of the ESN also eventually deviates from the true state as the number of data points increases, with the green dotted line moving away from the blue solid line. In contrast, the H-ESN demonstrates a significant advantage in prediction duration over the other two models, achieving accurate predictions for approximately 500 data points for the three variables of the Lorenz system. For comparison, Table 4 reports the mean squared error (MSE) of the three models for the three variables of the Lorenz system at different prediction lengths.
As shown in Table 4, the MSE between the predicted and true values of the three variables of the Lorenz system was calculated at 300, 350, 400, 450, and 500 data points. Additionally, Table 5 reports the average percentage improvement in MSE of the H-ESN over the ESN for the three variables of the Lorenz system, calculated according to Equation (9). The MSE of the H-ESN model is the smallest at every prediction stage, indicating that the model proposed in this paper achieves the best performance in this prediction task.
In chaotic prediction tasks, the focus is on the duration and accuracy of predictions. The effective prediction time (EPT) is an important metric for evaluating the performance of time-series prediction models: it is the limited period during which the model can make accurate predictions in a chaotic scenario. This period is finite because chaotic systems are extremely sensitive to initial conditions, which leads to significant uncertainty in long-term predictions. In this paper, the effective prediction time is defined as the time at which the prediction error first exceeds a set threshold; that is, the prediction is considered invalid at time t when the prediction error at t exceeds ε (ε is a given error).
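Read literally, this definition can be computed as the first time index at which the error exceeds the threshold; the normalized error measure in the sketch below is an assumption, since the precise error norm is fixed by the definition above.

```python
import numpy as np

def effective_prediction_time(true, pred, eps):
    """Index of the first step where the prediction error exceeds eps.

    true, pred : arrays of shape (T, d) with the reference and predicted states.
    The error at step t is taken here as ||pred[t] - true[t]|| / ||true[t]||,
    which is one common normalization choice (an assumption).
    """
    err = np.linalg.norm(pred - true, axis=1) / (np.linalg.norm(true, axis=1) + 1e-12)
    bad = np.flatnonzero(err > eps)
    return len(true) if bad.size == 0 else int(bad[0])
```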
The homotopy parameter is a very important hyperparameter, and its value significantly affects the system's predictive performance. Figure 4 shows the EPT of the three variables for different values of the homotopy parameter. Overall, when the parameter is small, the EPT tends to decrease because the nonlinearity is too strong, making it difficult for the network to train or generalize effectively. When the parameter is large, the EPT also tends to decrease because of excessive linearization, which prevents the network from capturing the key dynamics of the chaotic system. In the intermediate region, however, the EPT increases and reaches its maximum value, as the balance between nonlinearity and linearity is optimized. To ensure that all three variables have a long prediction duration, it is recommended to choose a homotopy parameter value of approximately 0.7.
The H-ESN introduces a linear component through the homotopy transformation, finding the optimal balance between nonlinearity and linearity. However, the optimal value of the homotopy parameter generally varies across chaotic systems. Currently, we determine its value primarily through grid search or empirical tuning. While effective, this approach can be computationally expensive for high-dimensional or complex systems. Finding the optimal homotopy parameter quickly and efficiently is therefore a major challenge for the H-ESN.
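The grid search mentioned here can be sketched as follows; train_and_predict and score are placeholders for the full H-ESN training/prediction pipeline and the chosen evaluation metric (for example, the effective prediction time defined above), so the code only illustrates the search loop, not the model itself.

```python
import numpy as np

def tune_homotopy_parameter(train_and_predict, score, grid=None):
    """Pick the homotopy parameter that maximizes a prediction score.

    train_and_predict(lam) -> predicted test trajectory for homotopy parameter lam.
    score(prediction)      -> e.g. the effective prediction time defined above.
    Both callables are placeholders for the full H-ESN pipeline.
    """
    grid = np.linspace(0.0, 1.0, 21) if grid is None else grid
    scores = [score(train_and_predict(lam)) for lam in grid]
    best = int(np.argmax(scores))
    return grid[best], scores
```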
3.2. Mackey–Glass Equation
The Mackey–Glass (MG) equation is a delay differential equation commonly used to model complex dynamic behaviors in biological and physical systems with time delays, especially in biology and ecology [42]. Its standard form is as follows:

\frac{dx(t)}{dt} = \frac{\beta\, x(t-\tau)}{1 + x(t-\tau)^{n}} - \gamma\, x(t),

where β = 0.2, γ = 0.1, τ = 17, and n = 10. The equation is numerically solved using the Euler method to obtain the chaotic time series of the MG system. The first 2000 data points are used as the training set, and the next 1000 data points as the testing set.
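A minimal Euler-scheme sketch for generating the MG series is given below; the integration step, sampling interval, and initial history are assumptions, since the text above only fixes the equation parameters and the 2000/1000 train/test split.

```python
import numpy as np

def mackey_glass(n_points, beta=0.2, gamma=0.1, tau=17.0, n=10, dt=0.1, x0=1.2):
    """Euler integration of dx/dt = beta*x(t-tau)/(1 + x(t-tau)^n) - gamma*x(t)."""
    stride = int(round(1.0 / dt))       # keep one sample per unit time (assumption)
    delay = int(round(tau / dt))
    total = n_points * stride + delay
    x = np.full(total, x0)              # constant initial history
    for t in range(delay, total - 1):
        x_tau = x[t - delay]
        x[t + 1] = x[t] + dt * (beta * x_tau / (1.0 + x_tau**n) - gamma * x[t])
    return x[delay::stride][:n_points]

series = mackey_glass(3000)
train, test = series[:2000], series[2000:]   # split used in this section
```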
Figure 5 shows a comparison of the prediction performance of the ESN and H-ESN on the MG time series, with the parameters listed in Table 6.
As shown in Figure 5, both models make good short-term predictions of the MG time series. The ESN accurately predicts 533 data points but fails to capture the peak information in the interval between 533 and 800 time steps. The H-ESN, on the other hand, can predict for 1000 time steps and captures the peak information effectively, indicating that the H-ESN has a clear advantage in predicting the MG time series.
The hyperparameters of the ESN have a significant impact on the prediction performance for chaotic systems. The following analysis examines how the MSE between the predicted and true values over the first 500 time steps changes with the spectral radius. Overall, as the spectral radius increases, the MSE decreases, and the H-ESN shows higher prediction accuracy than the other two models. Even for small values of the spectral radius, the H-ESN already achieves high prediction accuracy, and its MSE reaches its lowest value when the spectral radius is 1.3.
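The spectral radius is typically imposed by rescaling a randomly generated reservoir matrix, as in the following sketch; the weight range and connection density are assumptions, not values taken from Table 6.

```python
import numpy as np

def random_reservoir(n_nodes, spectral_radius, density=0.05, seed=0):
    """Sparse random reservoir matrix rescaled to a prescribed spectral radius."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (n_nodes, n_nodes))
    W *= rng.random((n_nodes, n_nodes)) < density        # sparsify the connections
    eig_max = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (spectral_radius / eig_max)

W = random_reservoir(500, spectral_radius=1.3)
print(np.max(np.abs(np.linalg.eigvals(W))))              # approximately 1.3
```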
In addition to the spectral radius, which affects the prediction accuracy of the model, the number of reservoir nodes also plays a crucial role in chaotic prediction. The reservoir size directly determines the complexity of the state space that the network can represent: generally speaking, the larger the number of reservoir nodes, the more dynamic and complex the patterns the network can capture. Figure 6 illustrates how the MSE between the predicted and true values changes with the number of reservoir nodes for two different spectral radii. It can be observed that the MSE is minimized, and the prediction accuracy is highest, when the spectral radius is 1.2 and the reservoir contains 950 nodes. Furthermore, Figure 6 shows that for a reservoir of 300 nodes, the MSE for a spectral radius of 1.2 is smaller than that for a spectral radius of 1.25, which is the opposite of the trend observed for the H-ESN in Figure 7. This indicates that the reservoir size has a significant impact on the prediction ability of the H-ESN.
3.3. Kuramoto–Sivashinsky Equations
Now consider a modified version of the Kuramoto–Sivashinsky (KS) system defined by the following partial differential equation [43]:

y_t = -y y_x - y_{xx} - y_{xxxx} + \mu \cos\!\left(\frac{2\pi x}{\lambda}\right).

If μ = 0, this equation reduces to the standard KS equation, and if μ ≠ 0, the cosine term makes the equation spatially inhomogeneous. We focus on the case μ = 0 below.
We take into account the fact that the KS system has periodic boundary conditions on a domain of length L, that is, y(x + L, t) = y(x, t), and the KS equation is numerically integrated on a uniformly spaced grid of size Q. The simulated data consist of Q time series sampled with a fixed time step, represented by the vector u(t) = [y(Δx, t), y(2Δx, t), …, y(QΔx, t)]^T, where Δx = L/Q.
Considering that the Kuramoto–Sivashinsky equation exhibits high-dimensional spatiotemporal chaos and a certain symmetry, we modify Equation (2) by analogy with the Lorenz system. After the training stage, the system uses Tikhonov-regularized regression to obtain the output weight matrix. Once the output parameters are determined, the system enters the prediction stage and evolves autonomously according to Figure 1b.
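The Tikhonov-regularized readout used here has the familiar closed form W_out = Y R^T (R R^T + βI)^{-1}; the sketch below shows this step with illustrative variable names and an assumed regularization strength.

```python
import numpy as np

def fit_readout(R, Y, beta=1e-6):
    """Tikhonov (ridge) regression for the output weights.

    R : reservoir states collected during training, shape (N, T)
    Y : target outputs, shape (D_out, T)
    Returns W_out of shape (D_out, N) such that Y is approximated by W_out @ R.
    """
    N = R.shape[0]
    # Solve (R R^T + beta I) W_out^T = R Y^T, which is the normal equation above
    return np.linalg.solve(R @ R.T + beta * np.eye(N), R @ Y.T).T

# During prediction the network evolves autonomously: the output W_out @ r(t)
# is fed back as the next input (cf. Figure 1b).
```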
As shown in Figure 8, the ESN model predicts the KS system accurately for about 7 Lyapunov times, while the H-ESN model can predict for up to 12 Lyapunov times, almost twice the duration achieved by the ESN model. In terms of prediction accuracy, the error panel of the H-ESN model is close to 0 during the early prediction stages. For comparison, Figure 9 plots the root mean square error (RMSE) of both models for different prediction dimensions Q. The RMSE values of the H-ESN are consistently below 0.15, with relatively small fluctuations. In contrast, the RMSE values of the ESN model are mostly above 0.15, with a large gap between the maximum and minimum values. The H-ESN model is therefore both more accurate and more stable in its predictions.
The most important characteristic of chaotic dynamics is extreme sensitivity to initial conditions. In chaotic systems, long-term prediction of the system's state is impossible, as even the smallest errors are amplified exponentially, quickly eroding predictive capability. When predicting chaotic systems, it is therefore necessary not only to optimize the hyperparameters to extend the effective prediction time of the various variables, but also to evaluate the prediction accuracy in terms of the system's inherent chaotic characteristics. The maximal Lyapunov exponent (Λ_max) is a key metric for measuring the chaotic nature of dynamical systems: it quantifies the rate of divergence of nearby trajectories in phase space and thus indicates whether the system exhibits chaotic behavior. Comparing the Λ_max values of the KS system in Table 7, the Λ_max estimated from the predicted data of the H-ESN model is closest to the true Λ_max of the KS system, with a difference of 0.0001. In contrast, the Λ_max obtained from the predicted data of the ESN model deviates from the true value by 0.0017. This difference indicates that the H-ESN model captures the chaotic characteristics and sensitivity of the system more faithfully. Particularly on longer time scales, the H-ESN more accurately reflects the system's dynamical behavior, highlighting its advantages in modeling complex dynamical systems.
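For reference, one common way to estimate the maximal Lyapunov exponent directly from (true or predicted) time-series data is to track the average divergence of initially nearby delay-embedded states, in the spirit of the Rosenstein algorithm; the sketch below is a simplified version of that idea and is not necessarily the estimator used to produce Table 7.

```python
import numpy as np
from scipy.spatial.distance import cdist

def max_lyapunov(series, emb_dim=7, lag=1, horizon=50, min_sep=20):
    """Rosenstein-style estimate of the maximal Lyapunov exponent (per time step).

    series : 1-D time series (e.g. one predicted KS grid point).
    Divide the returned slope by the sampling interval to obtain physical units.
    """
    series = np.asarray(series, dtype=float)
    T = len(series) - (emb_dim - 1) * lag
    # Delay embedding: row i is [x_i, x_{i+lag}, ..., x_{i+(emb_dim-1)*lag}]
    X = np.column_stack([series[i * lag: i * lag + T] for i in range(emb_dim)])
    usable = T - horizon
    d = cdist(X[:usable], X[:usable])
    idx = np.arange(usable)
    d[np.abs(idx[:, None] - idx[None, :]) < min_sep] = np.inf   # exclude temporal neighbours
    nn = np.argmin(d, axis=1)                                   # nearest neighbour of each point
    div = np.empty(horizon)
    for k in range(horizon):
        dist = np.linalg.norm(X[idx + k] - X[nn + k], axis=1)
        div[k] = np.mean(np.log(dist[dist > 0]))                # mean log-divergence after k steps
    return np.polyfit(np.arange(horizon), div, 1)[0]            # slope of the divergence curve
```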
Chaotic systems are often disturbed by noise in practical applications, which can significantly reduce the performance of prediction models. To verify the robustness of the H-ESN under noisy conditions, we added Gaussian noise with varying intensities (noise levels of 0.01, 0.02, and 0.03) to the KS system, simulating real-world measurement errors. The experiment was conducted with the parameters listed in Table 8, aiming to evaluate the performance of the H-ESN under noise conditions.
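The noise-injection step can be sketched as follows; whether the stated levels are absolute standard deviations or are scaled by the signal's standard deviation is not specified above, so the relative scaling used here is an assumption.

```python
import numpy as np

def add_gaussian_noise(data, level, seed=0):
    """Add zero-mean Gaussian noise whose std is `level` times the data std (an assumption)."""
    rng = np.random.default_rng(seed)
    return data + level * np.std(data) * rng.standard_normal(data.shape)

clean = np.sin(np.linspace(0, 20, 1000))                     # stand-in for the KS training data
noisy = {lvl: add_gaussian_noise(clean, lvl) for lvl in (0.01, 0.02, 0.03)}
```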
As shown in Figure 10, when the Gaussian noise intensity is 0.01, the H-ESN still maintains good predictive capability, with a prediction duration reaching about 6 Lyapunov times. However, as the noise intensity increases, the predictive capability of the H-ESN gradually declines. This indicates that in low-noise environments the H-ESN can effectively handle noise interference and maintain high prediction accuracy, whereas under high-noise conditions the impact of noise on model performance becomes more pronounced.