A TCN-Linear Hybrid Model for Chaotic Time Series Forecasting

The applications of deep learning and artificial intelligence have permeated daily life, with time series prediction emerging as a focal area of research due to its significance in data analysis. The evolution of deep learning methods for time series prediction has progressed from the Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN) to the recently popularized Transformer network. However, each of these methods has encountered specific issues. Recent studies have questioned the effectiveness of the self-attention mechanism in Transformers for time series prediction, prompting a reevaluation of approaches to long-term time series forecasting (LTSF) problems. To circumvent the limitations present in current models, this paper introduces a novel hybrid network, Temporal Convolutional Network-Linear (TCN-Linear), which leverages the temporal prediction capabilities of the Temporal Convolutional Network (TCN) to enhance the capacity of LSTF-Linear. Time series from three classical chaotic systems (Lorenz, Mackey-Glass, and Rössler) and real-world stock data serve as experimental datasets. Numerical simulation results indicate that, compared to classical networks and novel hybrid models, our model achieves the lowest RMSE, MAE, and MSE with the fewest training parameters, and its R² value is the closest to 1.


Introduction
Chaotic research constitutes an interdisciplinary field [1], encompassing theories of dynamical systems, nonlinear dynamics, and complex systems [2]. The significance of chaos theory lies in its elucidation of non-periodic behaviors and unpredictable characteristics within numerous systems [3]. Notably, chaotic systems demonstrate extreme sensitivity to initial conditions, where minor initial discrepancies can lead to significant deviations in system trajectories, resulting in long-term unpredictability while retaining short-term predictability [4]. This phenomenon marks chaos as a key feature of complex system behaviors, offering new perspectives for our understanding and analysis of these systems [5].
Time series forecasting involves predicting future outputs using historical information and future input signals [6], reflecting the dynamic changes of the system in the future, and holds broad research prospects [7]. In the classical forecasting domain, mathematical models with rigorous derivations offer good interpretability; however, unknown parameters in the system increase the difficulty of modeling [8]. Linear or single-degree-of-freedom dynamic systems are easier to predict, whereas chaotic time series, owing to the sensitivity to initial conditions that characterizes chaos, are more challenging to forecast [9].
The advent of data-driven modeling techniques has posed challenges for researchers, while the evolution of neural networks, particularly deep learning, bolstered by advancements in computer hardware, has opened new avenues for automated data analysis. Amaranto et al. [10], for instance, developed B-AMA (Basic dAta-driven Models for All), a flexible and easy-to-use tool for both non-expert users and more experienced developers. As for deep learning, methods have progressed from traditional Convolutional Neural Networks (CNNs) [11] and Recurrent Neural Networks (RNNs) to Transformer-based architectures, each encountering its own limitations.

Building on the analysis provided, we introduce a novel hybrid neural network architecture, the TCN-Linear model. This model harnesses the advantages of dilated and causal convolutions within the TCN network, thereby expanding the model capacity of LSTF-Linear and enhancing its specialized learning capabilities for diverse time series data. It circumvents the gradient issues associated with RNNs and the loss of temporal scale information attributed to the attention mechanisms in Transformers, achieving higher prediction accuracy with a reduced parameter count for training. The remainder of the paper is organized as follows. Section 2 introduces the theoretical underpinnings and the development of the model. Section 3 describes the experimental setup and results. Section 4 concludes the work and offers perspectives on future research.

Proposed Model

TCN
Bai et al. [37] introduced a novel Temporal Convolutional Network (TCN) that adapts convolutional networks for the processing of time series data. This model leverages a proposed causal convolution method to capture local dependencies within sequence data, ensuring temporality, and employs dilated convolutions to expand its receptive field for better learning of data correlations. Additionally, it utilizes convolutional operations for efficient parallel computation, making it suitable for large-scale data processing.

Causal Convolution
Causal convolution, as shown in Figure 1, is one of the core concepts of TCN. To ensure that convolution operations only utilize past information, causal convolution employs zero-padding at the beginning of the sequence. This technique ensures that the output at each time step is influenced solely by that point and its preceding inputs. Such an approach prevents forward leakage of information and maintains temporal alignment between the input and output sequences.

Dilated Convolution
Dilated convolution represents another crucial component within TCN, serving as an extension of traditional convolution aimed at enlarging the receptive field of the convolutional layers. This enlargement enables the network to capture temporal dependencies over longer ranges. In dilated convolution, zeros are inserted between the elements of the convolution kernel, with the spacing determined by the dilation rate, allowing the network to cover a larger input area without an increase in the number of parameters or the computational complexity; the structure of dilated convolution is shown in Figure 2. As network depth increases, the dilation rate can be progressively raised, giving deeper convolutional layers an extensive receptive field. Consequently, the network can effectively learn long-term temporal dependencies without sacrificing temporal resolution.
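To make the two ideas concrete, the following minimal PyTorch sketch (our illustration, not the authors' code; the class name, channel sizes, and the (batch, channels, time) layout are assumptions) implements a causal convolution whose left zero-padding grows with the dilation rate:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalDilatedConv1d(nn.Module):
    """Causal dilated 1D convolution: the output at time t depends only on
    inputs at times <= t, and the dilation rate widens the receptive field."""

    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        # Left padding of (kernel_size - 1) * dilation keeps the output
        # length equal to the input length without leaking future samples.
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                              dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))  # zero-pad on the left only

layer = CausalDilatedConv1d(1, 16, kernel_size=3, dilation=4)
out = layer(torch.randn(8, 1, 100))  # shape: (8, 16, 100)
```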

Residual Connection
TCN mitigates the effects of gradient vanishing and explosion in deep networks to some extent. The model introduces straightforward direct-connection channels that allow the network to learn an identity mapping, as shown in Figure 3. This ensures that the performance of a deep network does not degrade below that of its shallower counterpart, and prevents the initial data weight increase caused by dimension changes during the input processing phase.

LSTF-Linear
LSTF-Linear is a simple direct multi-step model that operates via a temporal linear layer. The fundamental approach of LSTF-Linear employs a weighted sum operation to directly predict future values by regressing on historical time series data (as illustrated in Figure 4). The mathematical expression is $\hat{X}_i = W X_i$, where $W \in \mathbb{R}^{T \times L}$ is a linear layer along the temporal axis, and $\hat{X}_i$ and $X_i$ are the prediction and the input for each variate $i$. In particular, D-Linear (shown in Figure 4) is a hybrid of a seasonal decomposition encoder-decoder and the Linear network, which decomposes the original data into seasonal and trend components through a moving average kernel. Each component is then processed by a single linear layer, and the outputs are summed to obtain the final prediction. This strategy enhances the model's performance when the data exhibit clear trends, which can be identified within the phase diagrams of chaotic systems.
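As a rough sketch of this decomposition-plus-linear scheme (the kernel size, padding behavior, and tensor layout are our assumptions; the original implementation pads the series ends before averaging), the idea can be expressed in PyTorch as:

```python
import torch
import torch.nn as nn

class DLinearSketch(nn.Module):
    """Decompose the input window into a trend (moving average) and a
    seasonal (remainder) component, regress each with one linear layer
    along the temporal axis, and sum the two predictions."""

    def __init__(self, input_len, pred_len, kernel_size=25):
        super().__init__()
        # Moving-average kernel that extracts the trend component.
        self.moving_avg = nn.AvgPool1d(kernel_size, stride=1,
                                       padding=kernel_size // 2,
                                       count_include_pad=False)
        self.linear_trend = nn.Linear(input_len, pred_len)
        self.linear_seasonal = nn.Linear(input_len, pred_len)

    def forward(self, x):  # x: (batch, channels, input_len)
        trend = self.moving_avg(x)   # smooth trend component
        seasonal = x - trend         # seasonal / remainder component
        return self.linear_trend(trend) + self.linear_seasonal(seasonal)

model = DLinearSketch(input_len=96, pred_len=24)
pred = model(torch.randn(8, 3, 96))  # shape: (8, 3, 24)
```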

TCN-Linear
To further improve model capacity and prediction accuracy, this paper proposes a new hybrid model for chaotic time series prediction, named TCN-Linear, which is shown in Figure 5. We improve the structure of D-Linear by fusing it with TCN. The model is constructed from several Residual Block modules and a D-Linear network, where each Residual Block contains two dilated causal convolutions, two WeightNorm layers, two ReLU layers, and two Dropout layers. In the dilated causal convolutions within these blocks, the dilation factor d takes values in the set {1, 2, 4}, and the output of each layer serves as the input of the next. Finally, the prediction is produced by the combination of the decomposition scheme and the linear layers.
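A minimal sketch of one such Residual Block, following the composition just described (two weight-normalized dilated causal convolutions, each followed by ReLU and Dropout, plus a skip connection; the kernel size, channel width, and dropout rate are our assumptions, not the paper's reported settings):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import weight_norm

class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1, dropout=0.2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation  # causal left padding
        self.conv1 = weight_norm(nn.Conv1d(in_ch, out_ch, kernel_size,
                                           dilation=dilation))
        self.conv2 = weight_norm(nn.Conv1d(out_ch, out_ch, kernel_size,
                                           dilation=dilation))
        self.dropout = nn.Dropout(dropout)
        # A 1x1 convolution aligns channel counts so the skip path can be added.
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):  # x: (batch, channels, time)
        out = self.dropout(F.relu(self.conv1(F.pad(x, (self.pad, 0)))))
        out = self.dropout(F.relu(self.conv2(F.pad(out, (self.pad, 0)))))
        return F.relu(out + self.skip(x))

# Stack blocks with dilation factors {1, 2, 4}, as stated above.
tcn = nn.Sequential(*[ResidualBlock(1 if d == 1 else 32, 32, dilation=d)
                      for d in (1, 2, 4)])
```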

This hybrid architecture is designed to address the complexities of time series data that exhibit both long-term dependencies and seasonal patterns, making the model versatile across different time series forecasting tasks. Furthermore, TCNs offer efficient parallel computation, significantly reducing both the number of training parameters and the training time compared to traditional RNN- and Transformer-based solutions. This efficiency, combined with the direct computational characteristics of LSTF-Linear networks, renders the TCN-Linear model particularly suitable for large-scale time series datasets.
In the model, we employed the mean squared error (MSE) as the loss function, a commonly used metric in regression problems that measures the average squared difference between the predicted and actual values. The Adaptive Moment Estimation (ADAM) optimizer [38] was utilized for network training due to its efficiency, robustness, and ease of configuration, making it one of the preferred optimizers in deep learning applications. Its role is to adjust the network parameters to minimize the loss function. Additionally, we adopted early stopping to prevent overfitting. This technique terminates the training process when the validation error stops decreasing for a certain number of epochs, thereby ensuring the model's generalization capability.
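A condensed sketch of this training setup (the epoch count, patience, and learning rate are illustrative choices, not the paper's reported values):

```python
import copy
import torch

def train(model, train_loader, val_loader, epochs=200, patience=10, lr=1e-3):
    criterion = torch.nn.MSELoss()                            # MSE loss
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)   # ADAM optimizer
    best_val, best_state, wait = float("inf"), None, 0
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val = sum(criterion(model(x), y).item() for x, y in val_loader)
        if val < best_val:   # validation error still decreasing
            best_val, best_state, wait = val, copy.deepcopy(model.state_dict()), 0
        else:                # early stopping after `patience` stagnant epochs
            wait += 1
            if wait >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)  # restore the best checkpoint
    return model
```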

Experimental Evaluation
In this section, we evaluate the proposed model's predictive capability, training cost, and applicability to real financial data using three classical chaotic systems (the Lorenz, Mackey-Glass, and Rössler systems) and real-life stock data.

Lorenz
The Lorenz equations were introduced in 1963 by Edward N. Lorenz [39] during his research on atmospheric convection, marking the inception of chaos research. The Lorenz model is a dynamical system comprising three ordinary differential equations, representing the three-dimensional state of convective rolls.
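In the standard form (consistent with the parameters σ, r, and b used below), the equations read

$$\dot{x} = \sigma (y - x), \qquad \dot{y} = r x - y - x z, \qquad \dot{z} = x y - b z.$$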
When the parameters are set to σ = 10, b = 8/3, and r = 28, the system is in a chaotic state. In this state, with initial values x(0) = 1, y(0) = 0, and z(0) = 1, we generated time series for the system's three variables using the ODE45 integration method at a sampling frequency of 200 Hz over the time interval t ∈ [0, 55], yielding 11,000 points. We removed the first 3000 transient values and divided the remaining 8000 data points into training, validation, and test sets in a 6:2:2 ratio.
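The paper uses MATLAB's ODE45 solver; an equivalent sketch in Python with SciPy's RK45 integrator (a substitution on our part) would be:

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    x, y, z = state
    return [sigma * (y - x), r * x - y - x * z, x * y - b * z]

# 200 Hz sampling on t in [0, 55] -> 11,000 points, as described above.
t_eval = np.arange(0, 55, 1 / 200)
sol = solve_ivp(lorenz, (0, 55), [1.0, 0.0, 1.0], t_eval=t_eval, method="RK45")

series = sol.y[:, 3000:]               # drop the first 3000 transient samples
n = series.shape[1]                    # 8000 points remain
train = series[:, : int(0.6 * n)]      # 6:2:2 train/validation/test split
val = series[:, int(0.6 * n): int(0.8 * n)]
test = series[:, int(0.8 * n):]
```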

Mackey-Glass
The Mackey-Glass system, introduced in 1977 by Michael C. Mackey and Leon Glass [40], is a delay differential equation frequently utilized as a benchmark in chaotic time series analysis.
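In its standard form (matching the parameters a, b, c, and τ given below), the equation is

$$\frac{dx}{dt} = \frac{a\, x(t-\tau)}{1 + x(t-\tau)^{c}} - b\, x(t).$$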
When the parameters are set to a = 0.2, b = 0.1, c = 10, and τ = 17, the system is in a chaotic state. In this state, we generated a time series for the system using the ODE45 integration method at a sampling frequency of 10 Hz over the time interval t ∈ [0, 1100], yielding 11,000 points. We removed the first 3000 transient values and divided the remaining 8000 data points into training, validation, and test sets in a 6:2:2 ratio.

Rössler
The Rössler model, introduced in 1976 by the German biophysicist Otto E. Rössler [41], is a chaotic system. Compared to the Lorenz model, the equations of the Rössler model are simpler, and its phase diagram exhibits a clear spiral structure. This demonstrates that complex chaotic behavior can arise even in exceedingly simple systems.
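In the standard form (consistent with the parameters a, b, and c used below), the equations read

$$\dot{x} = -y - z, \qquad \dot{y} = x + a y, \qquad \dot{z} = b + z (x - c).$$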
When the parameters are set to a = 0.2, b = 0.2, and c = 5.9, the system is in a chaotic state.
In this state, with initial values x(0) = 0, y(0) = 0, and z(0) = 0, we generated time series for the system's three variables using the ODE45 integration method at a sampling frequency of 50 Hz over the time interval t ∈ [0, 220], yielding 11,000 points. We removed the first 3000 transient values and divided the remaining 8000 data points into training, validation, and test sets in a 6:2:2 ratio.

Google Stock Price
We collected stock trading data of Google Inc. (San Francisco, CA, USA) from 2014 to 2024 from public databases. This dataset encompasses key financial indicators over a decade, including the opening price, closing price, daily highest and lowest prices, and trading volume of Google's stock. As illustrated in Figure 6, we used the pairplot method from the seaborn library to create a matrix of pairwise plots showing the relationships between multiple variables, and selected the variables strongly correlated with the closing price as features. We then divided the 4858 data points into training, validation, and test sets in a 6:2:2 ratio.
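A brief sketch of this feature-selection step (the file name and column names are hypothetical; the actual dataset schema may differ):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("google_stock_2014_2024.csv")  # hypothetical file name
cols = ["Open", "High", "Low", "Close", "Volume"]

sns.pairplot(df[cols])        # pairwise relationships between the variables
plt.show()

# Keep the variables most strongly correlated with the closing price.
print(df[cols].corr()["Close"].sort_values(ascending=False))
```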

Experimental Configuration
The main hardware environment of the experiments was as follows: AMD R5-5600X CPU, NVIDIA RTX 3080Ti 16 GB GPU, and 16 GB of RAM; the operating system was Windows 10. The software configuration used for the simulation experiments included CUDA 12.3, GPU driver 546.29, torch 1.9.0+cu111, and Python 3.9.18. The parameter settings are shown in Table 1.

Prediction Evaluation Index
To evaluate the performance of each model, we use the MAE, MSE, RMSE, MAPE, and R², defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},$$

where $n$ is the length of the predicted series, $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the true values of the sequence.

Lorenz
As shown in Figures 7 and 8, these models capture the dynamic changes of the Lorenz system effectively. Their blue prediction curves align well with the red actual-value curves, except for the Transformer network, which exhibits significant noise and fluctuation in its predictions. The results presented in Figure 9 and Table 2 further suggest that the TCN-Linear network, with the smallest number of training parameters, outperforms the other models to varying degrees across the various indexes.

Mackey-Glass
Because the Mackey-Glass system's time series contains only one dimension, the networks fit its curve more smoothly, as demonstrated in Figure 10. However, in the phase-space reconstruction shown in Figure 11, the Transformer network still exhibits more noise than the other models. According to the results in Figure 12 and Table 3, the error values of all models are quite low, with TCN-Linear continuing to exhibit the best performance among them.

Rössler
Similar to the previous two experiments, the networks demonstrated excellent performance in fitting the time series of the Rössler system, as can be seen from Figures 13-15. In Table 4, the RMSE, MAE, and MSE values are all very low, and the R² values are very close to 1. Once again, TCN-Linear emerged as the top performer.

Google Stock Price
To demonstrate the model's applicability to real financial sequences, we selected Google's stock data from the past decade for analysis and compared TCN-Linear's performance with that of the hybrid models CNN-GRU [42], Seq2Seq [43], and Bi-LSTM [44], which have shown promising results in this domain.
Due to the high frequency of fluctuations in real stock data, it is challenging for models to fully learn and fit the actual values without overfitting. As depicted in Figure 16, the prediction curves roughly outline the general trend of the stock price. The indexes in Table 5 show a noticeable deterioration compared to the previous chaotic systems; however, the TCN-Linear model still demonstrates excellent predictive capability.

Conclusions
In this paper, a novel hybrid TCN-Linear model for the prediction of chaotic time series is proposed. To enhance the capacity of the LSTF-Linear model, we integrated it with the Temporal Convolutional Network, which offers long-term memory and parallel computing capabilities, thereby circumventing the gradient issues associated with RNNs and the loss of temporal scale information due to the attention mechanisms in Transformers. Experiments conducted on time series generated by several classical chaotic systems and on real stock sequences demonstrate that our model captures the future trends of dynamical systems and makes accurate predictions. It achieved the lowest error metrics compared to the other models, with the R² value closest to 1. The novel structure of our model offers fresh insights into solving LTSF problems. However, there are still some limitations in our work. For instance, it is challenging to maintain low error rates in multi-step predictions of multi-dimensional or high-frequency variable data. Moreover, there is still a long way to go in accurately reproducing the dynamic behaviors and patterns of chaotic systems. We believe that Recurrent Neural Networks and Reservoir Computing hold promising potential for nonlinear dynamics analysis, so these are areas we aim to explore in future improvements. Our future research will focus on designing models that balance computational resources and prediction accuracy, and on applying them to more complex real-world engineering applications such as weather systems, turbulent flow data, and industrial fault diagnosis.
