El Niño Index Prediction Based on Deep Learning with STL Decomposition

Abstract: ENSO is an important climate phenomenon that often causes widespread climate anomalies and triggers various meteorological disasters. Accurately predicting the ENSO variation trend is of great significance for global ecosystems and socio-economic activity. In scientific practice, researchers predominantly employ associated indices, such as Niño 3.4, to quantitatively characterize the onset, intensity, duration, and type of ENSO events. In this study, we propose the STL-TCN model, which combines seasonal-trend decomposition using locally weighted scatterplot smoothing (STL) with temporal convolutional networks (TCN). The method uses STL to decompose the original time series into trend, seasonal, and residual components. Each subsequence is then predicted individually by a separate TCN model for multi-step forecasting, and the predictions of all models are combined to obtain the final result. During the verification period from 1992 to 2022, the STL-TCN model effectively captures index features and improves the accuracy of multi-step forecasting. In historical event simulation experiments, the model demonstrates advantages in capturing the trend and peak intensity of ENSO events.


Introduction
El Niño-Southern Oscillation (ENSO) is one of the most intense ocean-atmosphere coupling phenomena worldwide [1], characterized by a clear periodicity with a cycle of 2-7 years [2]. Although it originates in the tropical Pacific, ENSO affects various parts of the world through atmospheric teleconnections. It often causes widespread climate anomalies, triggers various meteorological disasters, and has impacts on ecosystems and socio-economics [3][4][5]. In the context of global warming, climate change may aggravate the impact of the disasters ENSO causes [6]. ENSO has therefore gained significant attention and widespread research interest worldwide [7], with efforts to monitor its current status and predict its future evolution so as to prepare for possible abnormal impacts and mitigate its negative effects.
To provide a more intuitive representation and effective monitoring of ENSO, previous researchers have developed a set of indices to characterize the occurrence, intensity, and type of ENSO events. Different indices have been proposed and adopted by research institutions in different countries. For example, the Japan Meteorological Agency uses the JMA index [8]; the Australian Bureau of Meteorology uses the BOM index; the US National Centers for Environmental Prediction (NCEP) uses the Niño 3.4 index for event monitoring; and the latest standard of the China Meteorological Administration also uses the Niño 3.4 index to define events [9].
Traditional ENSO prediction methods fall into two main categories: dynamical and statistical models [10]. Dynamical models represent and numerically simulate the phenomenon based on the physical laws governing the formation and development of ENSO. While dynamical models effectively capture the underlying physical laws [11], they face

Data
The data used in this study are from the Niño 3.4 index, provided by the Physical Sciences Laboratory (PSL) of the National Oceanic and Atmospheric Administration (NOAA).
This dataset is derived from the monthly mean sea-surface temperature dataset (HadISST1) provided by the Met Office Hadley Centre, using the 1981-2010 period as the climate baseline. The Niño 3.4 index data used in this study cover the period from January 1871 to December 2022, a total of 1824 monthly values. The visualization of the data is shown in Figure 1.
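For readers who want to work with the same index, the monthly values can be parsed from a downloaded PSL text file. The sketch below assumes the common PSL layout (one row per year: the year followed by twelve monthly values, with a sentinel such as -99.99 marking missing months); the sample values here are invented for illustration, so verify the actual file's header lines and missing-value flag before use.

```python
def parse_psl_index(text, missing=-99.99):
    """Return a list of (year, month, value) tuples from PSL-style rows."""
    records = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 13:          # skip header/footer lines
            continue
        year = int(parts[0])
        for month, tok in enumerate(parts[1:], start=1):
            value = float(tok)
            if value != missing:      # drop missing-data sentinels
                records.append((year, month, value))
    return records

# Invented sample rows in the assumed format (NOT real Niño 3.4 values).
sample = """1871 -0.25 -0.58 -0.43 -0.50 -0.70 -0.53 -0.60 -0.33 -0.24 -0.33 -0.31 -0.58
1872 -0.72 -0.62 -0.50 -0.77 -0.62 -0.52 -0.32 -0.85 -0.86 -0.80 -0.95 -0.88"""
series = parse_psl_index(sample)
print(len(series), series[0])   # 24 (1871, 1, -0.25)
```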

Seasonal and Trend Decomposition Using LOESS
STL is a general and robust method for decomposing time series, and LOESS is a method for estimating non-linear relationships [28]. The STL decomposition algorithm was originally proposed by Cleveland [29]; its basic idea is to decompose the original time series (X_v) into three components: trend (T_v), seasonal (S_v), and remainder (R_v). The trend component is the low-frequency component of the data, representing the direction of change; the seasonal component is the high-frequency component, representing the regular variation of the data over time, usually with a fixed period and amplitude; and the remainder component is what is left of the original series after subtracting the trend and seasonal components, containing the noise in the series. The decomposition is additive, so summing the components yields the original series:

X_v = T_v + S_v + R_v

LOESS in STL smooths the data while preserving its essential features. It assigns weights to the neighborhood of each data point based on distance and then performs a polynomial regression fit at each point, using the points closest to it as explanatory variables. Compared to traditional linear regression models, locally weighted regression better accommodates nonlinear data relationships and is more robust to outliers and noise.

STL consists of two procedures: an inner loop and an outer loop. The inner loop is nested in the outer loop, and the specific process of the kth iteration is as follows:

1. Detrending. Subtract the trend component of the previous iteration from the original series: X_v − T_v^(k).
2. Cycle-subseries smoothing. The detrended series from Step (1) is broken into cycle-subseries. Each subseries is smoothed using LOESS, and the smoothed subseries are recombined into a new series, denoted C_v^(k+1).
3. Low-pass filtering of the smoothed cycle-subseries. The series obtained in Step (2) is processed with a low-pass filter and then smoothed with LOESS, denoted L_v^(k+1).
4. Detrending of the smoothed cycle-subseries: S_v^(k+1) = C_v^(k+1) − L_v^(k+1).
5. Deseasonalizing: X_v − S_v^(k+1).
6. Trend smoothing. The series obtained in Step (5) is smoothed using LOESS to obtain the new trend component T_v^(k+1).

In the outer loop, the seasonal and trend components obtained in the inner loop are used to calculate the remainder component R_v^(k+1) = X_v − T_v^(k+1) − S_v^(k+1).
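As a concrete illustration, the following sketch implements a minimal tricube-weighted local linear smoother (a simplified stand-in for the LOESS used inside STL) and combines it with cycle-subseries means to produce an additive trend/seasonal/remainder split. This is not the full STL algorithm (no inner/outer iterations, low-pass filtering, or robustness weights); it only demonstrates that the three components sum back to the original series. The test series is invented.

```python
import numpy as np

def loess_smooth(y, frac=0.4):
    """Tricube-weighted local linear regression at each point --
    a minimal stand-in for the LOESS smoother used inside STL."""
    n = len(y)
    x = np.arange(n, dtype=float)
    k = max(int(frac * n), 3)                       # neighbourhood size
    out = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:k]                     # k nearest neighbours
        w = (1.0 - (d[idx] / d[idx].max()) ** 3) ** 3   # tricube weights
        sw = np.sqrt(w)                             # weighted least squares
        A = np.column_stack([np.ones(k), x[idx]])
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
        out[i] = beta[0] + beta[1] * x[i]
    return out

# Invented monthly-like series: linear trend + annual cycle + noise.
rng = np.random.default_rng(0)
t = np.arange(120)
x = 0.01 * t + np.sin(2 * np.pi * t / 12) + 0.1 * rng.standard_normal(120)

trend = loess_smooth(x)                             # low-frequency component
detrended = x - trend
seasonal = np.tile([detrended[i::12].mean() for i in range(12)], 10)
remainder = x - trend - seasonal                    # what neither part explains
assert np.allclose(trend + seasonal + remainder, x)  # additive by construction
```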

Temporal Convolutional Networks
Temporal Convolutional Networks (TCN) is a sequence modeling approach based on convolutional neural networks, specifically designed for modeling and predicting time series data. It was originally proposed by Bai et al. [30] and consists primarily of causal convolutions, dilated convolutions, and residual connections.
Causal convolution is a convolutional operation that preserves the temporal order of the input sequence. The convolution kernel only operates on the past portion of the input sequence and cannot access the future portion, thereby ensuring the causality of the convolution. This causality property makes the model more interpretable and stable, as it helps to avoid issues of information leakage or spurious correlations.
Dilated convolution, also known as atrous convolution, overcomes the limitation of standard causal convolution when dealing with long temporal sequences. Standard causal convolution requires more network layers or larger filters to capture longer dependencies. By introducing dilated convolution, the receptive field of the neurons can be enlarged without increasing the number of model parameters or the computational complexity, allowing the model to effectively capture features from more distant time steps. Refer to Figure 2 for an illustration of this concept.
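The padding arithmetic behind causal dilated convolution can be made concrete in a few lines. The sketch below is an illustrative single-channel implementation, not the paper's TCN code: left zero-padding of (k − 1)·d keeps the output causal and the same length as the input, and stacking dilations 1, 2, 4 with kernel size k gives a receptive field of 1 + (k − 1)(1 + 2 + 4).

```python
import numpy as np

def causal_dilated_conv(x, kernel, dilation=1):
    """1-D causal convolution: output[t] depends only on x[t], x[t-d], x[t-2d], ...
    Left-pads with zeros so the output has the same length as the input."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(kernel[j] * xp[pad + t - j * dilation] for j in range(k))
                     for t in range(len(x))])

x = np.arange(8, dtype=float)                        # 0, 1, ..., 7
y = causal_dilated_conv(x, [1.0, 1.0], dilation=2)   # y[t] = x[t] + x[t-2]
print(y)   # [0, 1, 2, 4, 6, 8, 10, 12] -- no future values leak in
```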

A residual block consists of two causal dilated convolutional layers, each followed by an optional batch normalization and activation function, as illustrated in Figure 3. By means of causal and dilated convolutions, these layers pass the output of the previous layer to the next layer for processing. Each residual block also includes a skip connection, which addresses the vanishing-gradient problem by allowing information to bypass one or more layers in the network, ensuring that the original input is preserved even in very deep networks.
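The skip connection can be sketched in the same spirit. The block below is a schematic single-channel residual block under simplifying assumptions (hand-picked weights, ReLU activations, no batch normalization or dropout); it only shows the out = F(x) + x structure, not the paper's trained layers.

```python
import numpy as np

def causal_conv(x, w, d):
    """Causal dilated convolution, zero-padded on the left (single channel)."""
    pad = (len(w) - 1) * d
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([sum(w[j] * xp[pad + t - j * d] for j in range(len(w)))
                     for t in range(len(x))])

def residual_block(x, w1, w2, d):
    """Two causal dilated conv layers with ReLU activations plus a skip
    connection: out = ReLU(conv2(ReLU(conv1(x)))) + x.
    Batch normalization and dropout from the paper's setup are omitted."""
    h = np.maximum(causal_conv(x, w1, d), 0.0)   # conv -> ReLU
    h = np.maximum(causal_conv(h, w2, d), 0.0)
    return h + x                                 # skip connection preserves input

x = np.ones(6)
out = residual_block(x, w1=[0.5, 0.5], w2=[0.5, 0.5], d=1)
assert out.shape == x.shape                      # same length as the input
```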

A Multi-Step El Niño Index Forecasting Strategy
Early time series forecasting used existing historical data to predict a single future data point, e.g., tomorrow's temperature; this is single-step forecasting, but a single predicted value provides limited information. In many cases, multi-step forecasting is needed to better anticipate future trends and changes: the historical time series [y_1, . . . , y_N] is used to predict H values [y_{N+1}, . . . , y_{N+H}], where H > 1 is the forecast horizon. At present, there are four main approaches to multi-step prediction: the direct strategy, the recursive strategy, the combined direct-recursive strategy, and multiple-output strategies.
In this paper, we use a direct multi-step forecasting strategy, also known as an independent strategy, training multiple models so that each model predicts one future value independently of the others:

ŷ_{N+h} = f_h(y_{N−w+1}, . . . , y_N),  h = 1, . . . , H

where f_h is the model trained for lead time h and w is the input window length.
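A minimal sketch of the direct strategy, with ordinary least squares standing in for the per-horizon TCN models (the window length and sine test series are illustrative choices, not the paper's setup):

```python
import numpy as np

def make_supervised(series, window, horizon):
    """Lagged inputs X[t] = series[t-window:t]; target is `horizon` steps ahead."""
    X, y = [], []
    for t in range(window, len(series) - horizon + 1):
        X.append(series[t - window:t])
        y.append(series[t + horizon - 1])
    return np.array(X), np.array(y)

def fit_direct_models(series, window, H):
    """Direct strategy: fit one independent model per lead time h = 1..H.
    Linear least squares stands in for the per-horizon TCNs."""
    models = []
    for h in range(1, H + 1):
        X, y = make_supervised(series, window, h)
        Xb = np.column_stack([X, np.ones(len(X))])     # add bias term
        coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        models.append(coef)
    return models

def predict_direct(models, last_window):
    xb = np.append(last_window, 1.0)
    return np.array([coef @ xb for coef in models])    # one value per lead time

t = np.arange(100, dtype=float)
series = np.sin(2 * np.pi * t / 12)                    # perfectly periodic toy series
models = fit_direct_models(series, window=12, H=6)
forecast = predict_direct(models, series[-12:])
assert forecast.shape == (6,)
```

Because each horizon gets its own model, errors do not compound recursively, at the cost of training H models.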

Proposed Model
The STL-TCN model combines STL and TCN. The Niño 3.4 index is decomposed into simpler and meaningful components via STL. This helps the TCN to capture the trend, cyclical and seasonal features in the series, which in turn improves the forecast accuracy. The specific steps are as follows:

1. Decompose the Niño 3.4 index into trend, seasonal, and remainder components using STL.
2. Normalize the three components.

3. Predict each of the three components with a separate TCN neural network.

4. The trend, seasonal, and remainder component forecasts are inverse-normalized and summed to obtain the final Niño 3.4 forecast. The detailed process can be seen in Figure 4.

In this experiment the whole data set is divided into training and test sets in the ratio 8:2; the specific division is shown in Table 1. The model consists of STL, TCN layers, and a fully connected layer. The main hyperparameters of the TCN are the convolutional kernel size, the dilation factors, and the number of convolutional kernels. In this study they are set as follows: the one-dimensional convolutional kernel size is 7; the dilation factors are 1, 2, and 4; the residual modules consist of three layers, with 128, 64, and 32 convolutional kernels, respectively; and the dropout rate is 0.2. The optimizer is Adam with a learning rate of 0.001. The maximum number of training epochs is 20, the batch size is 4, and the random seed is fixed.
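The four steps can be sketched end to end. Everything below is a simplified stand-in: the decomposition uses a moving-average trend and cycle-subseries means rather than full STL, and a persistence forecast replaces the trained per-component TCNs; only the shape of the pipeline (decompose → normalize → predict each component → inverse-normalize and sum) mirrors the proposed model.

```python
import numpy as np

def decompose(x, period=12):
    """Simplified additive decomposition standing in for STL."""
    kernel = np.ones(period) / period
    trend = np.convolve(x, kernel, mode="same")          # moving-average trend
    detr = x - trend
    seasonal = np.tile([detr[i::period].mean() for i in range(period)],
                       len(x) // period)                 # cycle-subseries means
    remainder = x - trend - seasonal
    return trend, seasonal, remainder

def zscore(c):
    mu, sd = c.mean(), c.std()
    return (c - mu) / sd, mu, sd

def naive_forecast(c_norm, H):
    """Placeholder per-component predictor (persistence); the paper
    trains a separate TCN for each component here."""
    return np.repeat(c_norm[-1], H)

def stl_tcn_style_forecast(x, H=3, period=12):
    parts = decompose(x, period)                 # step 1: decompose
    total = np.zeros(H)
    for comp in parts:
        c_norm, mu, sd = zscore(comp)            # step 2: normalize
        f = naive_forecast(c_norm, H)            # step 3: predict each component
        total += f * sd + mu                     # step 4: inverse-normalize, sum
    return total

t = np.arange(120, dtype=float)
x = 0.01 * t + np.sin(2 * np.pi * t / 12)        # invented toy series
fc = stl_tcn_style_forecast(x, H=3)
assert fc.shape == (3,)
```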


Evaluation Metrics
To evaluate the model's performance, three evaluation metrics were selected in this study: (1) root mean square error (RMSE); (2) mean absolute error (MAE); and (3) Pearson correlation coefficient (PCC). The calculation formulae are as follows:

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}$$

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|$$

$$\mathrm{PCC}=\frac{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)\left(\hat{y}_i-\bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2\sum_{i=1}^{n}\left(\hat{y}_i-\bar{\hat{y}}\right)^2}}$$

where y_i is the observed (true) value, ŷ_i is the predicted value, and ȳ and the mean of ŷ_i are the averages of y_i and ŷ_i over i = 1, 2, . . . , n, respectively.
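The three metrics are straightforward to implement; a sketch in NumPy (the sample arrays are invented for the check):

```python
import numpy as np

def rmse(y, yhat):
    """Root mean square error."""
    return np.sqrt(np.mean((y - yhat) ** 2))

def mae(y, yhat):
    """Mean absolute error."""
    return np.mean(np.abs(y - yhat))

def pcc(y, yhat):
    """Pearson correlation coefficient."""
    yc, yhc = y - y.mean(), yhat - yhat.mean()
    return (yc * yhc).sum() / np.sqrt((yc ** 2).sum() * (yhc ** 2).sum())

y    = np.array([1.0, 2.0, 3.0, 4.0])
yhat = np.array([1.5, 2.5, 3.5, 4.5])       # constant +0.5 offset
print(rmse(y, yhat), mae(y, yhat), pcc(y, yhat))   # 0.5 0.5 1.0
```

Note that a constant offset leaves the PCC at 1.0 while still showing up in RMSE and MAE, which is why all three metrics are reported together.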

Models Are Trained Using Different Time Windows
The time window refers to the duration of past observed values considered when predicting future data points. Selecting an appropriate time window is crucial for the performance of prediction models in many forecasting problems. If the time window is too short, the model might fail to capture long-term trends and patterns, thus limiting its predictive capabilities. On the other hand, if the time window is too long, the model may capture excessive historical information, including noise and irrelevant details, that do not hold representativeness for future predictions, thereby affecting the generalization of the model's performance. Therefore, it is essential to carefully choose the time window, considering the characteristics of the data and the forecasting objectives, to achieve optimal predictive performance.
Predicting the Niño 3.4 index is a typical time series forecasting task. The time window has a significant impact on model performance, as it directly affects the structure of the temporal data and consequently influences the model's training. To determine the optimal time window, this study varied the time window and evaluated performance. The time windows were increased in steps of 3 months, ranging from 3 to 36 months: {3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36}. These windows were used to train the model, and the PCC, RMSE, and MAE were computed. Table 2 presents the predictive performance of different time windows for a 12-month lead time. When the time window is 12 months, the PCC reaches its highest value of 0.62, while the RMSE and MAE reach their lowest values of 0.70 °C and 0.55 °C, respectively. In contrast, predictive performance is poorest when the time window is 3 months, indicating that shorter windows may not provide sufficient information for forecasting long-term sequences, making it difficult for the model to capture effective periodic and seasonal features. From Figure 5, it can be seen that as the time window exceeds 12 months, the errors increase and the correlation coefficients decrease, so a larger time window is not necessarily better. Therefore, a time window of 12 months is most suitable for predicting the Niño 3.4 index.
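The window sweep relies on re-slicing the series into supervised samples for each candidate window; a sketch (the helper name and the toy series are illustrative):

```python
import numpy as np

def sliding_windows(series, window, lead):
    """Inputs of length `window`; target is the value `lead` steps ahead."""
    X = np.array([series[t - window:t]
                  for t in range(window, len(series) - lead + 1)])
    y = series[window + lead - 1:]
    return X, y

series = np.arange(100, dtype=float)
for w in (3, 12, 36):                        # candidate windows from the sweep
    X, y = sliding_windows(series, w, lead=12)
    # A larger window gives each sample more context but leaves fewer samples.
    assert len(X) == len(y) == 100 - w - 12 + 1
```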

Comparison and Analysis of Different Models
To verify the accuracy and validity of the proposed prediction model, several commonly used models were constructed for comparison: the gated recurrent unit (GRU), the multilayer perceptron (MLP), LSTM, TCN, and STL-LSTM. Figure 6 illustrates the PCC, RMSE, and MAE for the different models when forecasting the Niño 3.4 index 1-24 months ahead, while Table 3 presents detailed results for selected months. As the lead time increases, the predictive performance of all models declines. At a lead time of one month, the correlation coefficients of all models are quite similar, with STL-TCN slightly higher. However, except for the STL-TCN and STL-LSTM models, predictive performance deteriorates rapidly, providing effective forecasts for only about six months. The STL-TCN model achieves the longest effective forecasting period, followed by STL-LSTM.

Overall, the STL-TCN model exhibits the best predictive performance, with the lowest RMSE and MAE. The MLP model performs relatively poorly, which demonstrates the weaker ability of traditional feed-forward neural networks, such as MLP, in handling temporal relationships. TCN, on the other hand, excels at capturing long-term dependencies in the data, leading to better performance. LSTM and GRU demonstrate similar predictive performance, although GRU has fewer parameters.
However, based on correlation coefficient indicators, the basic GRU model slightly underperforms compared to the basic LSTM model. The STL-TCN model shows a significant improvement in predictive performance compared to the TCN model, and, similarly, the STL-LSTM model exhibits notable enhancements over the LSTM model. These findings indicate that applying STL to extract components such as trend, seasonality, and residuals effectively reduces the inter-component interactions and contributes to improving the accuracy of multi-step predictions.
To further explore the impact of different decomposed sequences on predictive performance, we conducted predictions for each sequence and present the results in Figure 7d, illustrating the decomposed trend component (T), seasonal component (S) and residual component (R). By comparing the prediction results of each component, we can better understand their contributions to the final prediction of the Niño 3.4 index. Based on the results, we can observe that the seasonal component exhibits the best prediction performance. This may be attributed to its regular and periodic nature, which makes it easier to predict. The results of the trend component exhibit the closest similarity to the final forecasting results of the Niño 3.4. The remainder component is also known as the noise or the residuals of the decomposition. It contains the random fluctuations, measurement errors, or other factors that cannot be attributed to the underlying trend or seasonal effects. These unpredictable elements make it challenging to capture their future behavior accurately.
At a lead time of one month, the forecast curve closely tracks the observed values, and an RMSE of 0.20 °C and MAE of 0.16 °C indicate relatively small forecasting errors. However, as the prediction lead time increases, the fluctuation of the predicted values gradually decreases and a lag phenomenon appears. This may stem from the increased uncertainty associated with longer forecast lengths. The match between the forecast results and the actual value curve is lowest 12 months ahead, with the PCC dropping to 0.62 and the RMSE and MAE increasing to 0.70 °C and 0.55 °C, respectively, indicating an increase in forecast error.


ENSO Event Prediction and Analysis
To further validate the prediction effectiveness of the STL-TCN model for ENSO events, the model was used to predict the ENSO events of 1998/1999, 2009/2010, 2018/2019, and 2020/2021. As can be seen in Figure 8, the model reproduces the evolution of these events and their intensities, performing especially well in capturing event trends and peak intensities, and it provides an effective tool for ENSO event studies.


Conclusions
This paper presents a novel hybrid model that combines the STL decomposition algorithm and the TCN model for Niño 3.4 index prediction. Taking RMSE, MAE, and PCC as the evaluation indicators of prediction accuracy, the following conclusions are drawn from experimental verification and comparative analysis:

1. Combining the STL time-series decomposition method with the TCN model significantly reduces the cumulative error of long-horizon forecasting and greatly improves forecasting accuracy.

2. Compared with the popular LSTM model, the STL-TCN model performs better and yields more accurate predictions.

3. The STL-TCN model can effectively forecast the ENSO events of 1998/1999, 2009/2010, 2018/2019, and 2020/2021, and its predictions fit the fluctuations and trends of these events well.
In this study, the Niño 3.4 index was selected as the object of time series prediction, but ENSO is also characterized by other indices. Therefore, applying transfer learning to predict other relevant ENSO indices can be considered in the future.