A Novel Model for Spot Price Forecast of Natural Gas Based on Temporal Convolutional Network

Abstract: Natural gas is often said to be the most environmentally friendly fossil fuel. Its usage has increased significantly in recent years. Meanwhile, accurate forecasting of natural gas spot prices has become critical to energy management, economic growth, and environmental protection. This work offers a novel model based on the temporal convolutional network (TCN) and dynamic learning rate for predicting natural gas spot prices over the following two weekdays. The residual block structure of TCN provides good prediction accuracy


Introduction
According to the International Energy Agency's "Gas Market Report Q3 2021", natural gas consumption is likely to increase by 3.6% in 2021 as the global economic recovery continues, despite a 1.9% decline in global natural gas consumption in 2020 [1]. Unless significant policy changes are implemented to curb global natural gas consumption, demand will continue to grow in the coming years, reaching around 4.3 trillion cubic meters by 2024, higher than pre-pandemic levels. The consumption of natural gas is rapidly increasing in line with the goal of reducing worldwide greenhouse gas emissions. In 2020, its share reached a record 24.7% of total primary energy consumption [2].
With the proportion of fossil fuels such as coal and oil declining annually, the share of natural gas in global primary energy consumption may surpass those of coal and oil in succession after 2030 [3]. As a comparatively clean energy source with a high calorific value and a low environmental hazard, natural gas also serves as an excellent chemical raw material. The use of natural gas can reduce global carbon emissions and is crucial for the energy transition to mitigate the greenhouse effect. At the same time, it can compensate for the intermittent and unpredictable nature of wind and photovoltaic generation [4].
A recent study examined natural gas return predictions by using the return connectivity index [5]. It produced favorable results in predicting natural gas returns, which is helpful for future predictions of natural gas spot prices. As natural gas is expected to play a greater role in the economy, natural gas prices have become more critical. Due to lockdowns in most countries during the COVID-19 pandemic, decreased demand for natural gas in the first half of 2020 resulted in all-time low spot prices [6]. Furthermore, the conflict between Russia and Ukraine restricted the supply of natural gas in 2021, leading to significant price variations [7]. Natural gas spot prices are likely to remain affected by this conflict for some time. Accurate forecasting of natural gas prices is a valuable tool for corporate development and personal investment [8]. Moreover, it could enhance the energy market's stability, which is a necessary condition for the global economy's rapid growth.

Related Work
According to a survey of the current literature, the majority of research on energy price forecasting focuses on electricity and oil prices [9][10][11], using traditional statistical techniques such as multiple linear regression [12] and autoregressive integrated moving average (ARIMA) [13], as well as machine learning and deep learning-based techniques.
In a previous study on natural gas spot price forecasting, trader positions were used to forecast the natural gas market's spot price trend for the coming month and to conduct a seasonal analysis of that trend [14]. However, the study could only make qualitative predictions about the price trend and could not properly forecast the daily spot price of natural gas. The research object was monthly forward products (futures) rather than natural gas spot market prices, and a multilayer perceptron (MLP), a type of neural network, was employed to compute the price forecasts. The results of this work were marginally worse than those obtained with the generalized autoregressive conditional heteroskedasticity (GARCH) technique [15]. Another study of natural gas spot prices evaluated several nonlinear models, including local linear regression (LLR), dynamic local linear regression (DLLR), and artificial neural networks (ANN). Daily, weekly, and monthly Henry Hub natural gas spot prices in the United States from 1997 to 2012 were used to train and test the models' predictive ability. Although the MAPE of the DLLR model was lower than that of the ANN, the model's ability to predict prices was not very satisfactory [16]. The volatility of Henry Hub natural gas futures prices was also analyzed with the GARCH model. Since energy production and delivery require significant cost and time, the market is extremely sensitive to imbalances between demand and supply capacity, resulting in considerable price volatility. Additionally, the authors argued that the significant price volatility is influenced by weather conditions and capacity restrictions in the manufacturing sector [17].
A hybrid intelligent framework combining rule-based expert systems (RES) and the group method of data handling (GMDH) has been used to produce estimates [18]. In comparison to the GMDH alone and multilayer feedforward neural networks, the proposed hybrid model achieved superior prediction accuracy. In addition, a dynamic prediction method based on nonlinear autoregressive neural networks with external input (NARX) was used to forecast the spot price of natural gas in Germany's NetConnect market [19]. The authors took into account five variables that affect the model's training (temperature, exchange rate, and the locations of the three major gas hubs). However, the prediction performance did not improve considerably. Further, a weighted hybrid model was presented for forecasting natural gas prices. The hybrid model incorporates support vector regression (SVR) and a long short-term memory (LSTM) network, as well as an improved pattern sequence similarity search (IPSS). Data on the spot price of natural gas in the United States before June 2018 were used to train the model parameters, and data from June 2018 to May 2019 were used to assess the model's prediction abilities. The results indicate that the hybrid model is more competitive [20]. A feature selection method was also applied to screen input features for natural gas price prediction. Natural gas prices were predicted by combining the feature selection technique with the traditional time series model ARIMA and the machine learning model SVR. The findings demonstrate that, when predicting the spot price of natural gas, a machine learning model that preselects variables frequently obtains superior prediction results, confirming the effectiveness of the feature selection technique [21]. The least squares regression boosting (LSBoost) technique has also been applied to forecast the natural gas spot price. The model's performance was validated with daily Henry Hub natural gas spot prices from January 2001 to December 2017. Compared with linear regression and with linear, quadratic, and cubic support vector machines (SVM), the LSBoost approach showed the best prediction performance, as indicated by the highest R-square and the lowest MAE and RMSE [22].
In light of the growing popularity of machine learning methods for natural gas price prediction, a study examined four widely used machine learning models: artificial neural networks (ANN), support vector machines (SVM), gradient boosting machines (GBM), and Gaussian process regression (GPR) [23]. Monthly Henry Hub natural gas spot prices from January 2001 to October 2018 were used to validate the models' predictive ability. The findings indicate that these four machine learning approaches perform differently when it comes to forecasting natural gas prices. In general, ANN outperforms SVM, GBM, and GPR in terms of prediction performance. However, typical machine learning methods are subject to more restrictions and have worse predictive abilities than deep learning. A hybrid model based on deep learning has also been used to forecast natural gas prices [24]. This hybrid model combines a convolutional neural network (CNN) and LSTM, where the CNN extracts features from natural gas data and the LSTM learns the long- and short-term time dependence of the series. Although the model shows decent prediction performance, this is primarily due to the relatively steady time series of the test data. Any increase in data reduces the model's accuracy.
In summary, deep learning methods are increasingly being used in energy price forecasting since they outperform traditional statistical and machine learning methods in prediction accuracy. The research in this paper indicates that TCN performs well on natural gas price forecasting, making it an excellent candidate for this task. The dynamic learning rate is utilized to further enhance the model's predictive ability.
The highlights of this paper are summarized as follows.
(1) We propose a time series forecasting model for natural gas spot prices. The dilated causal convolutions in TCN efficiently extend the model's receptive field and reduce the amount of computation, allowing the model to improve prediction accuracy while also decreasing its running time. The residual block structure in TCN preserves the predictive ability of the deep network. As a result, the model can capture natural gas price changes more accurately. Furthermore, the proposed model is simple and efficient, avoiding the redundancy introduced by hybrid models while maintaining excellent prediction accuracy.
(2) The dynamic learning rate setting further improves the model's predictive ability. It addresses two problems: a learning rate that is too high can prevent the model from converging, while one that is too low makes the model prone to falling into a locally optimal solution. The dynamic learning rate thereby allows the model to find the optimal solution faster and with greater stability.
(3) Compared with three other deep learning models, LSTM, GRU, and 1D-CNN, the proposed model performs best at forecasting natural gas spot prices, demonstrating the usefulness and usability of TCN in natural gas spot price prediction.
Accurate natural gas price forecasts can serve as critical supplements for personal investment planning, business strategic deployment, and the formation of national policies. Accurately forecasting natural gas prices will also aid in ensuring national energy security and global economic stability, which gives the work practical significance.

Methods
TCN is the primary component of the natural gas price prediction model proposed in this paper, and it significantly enhances the model's prediction accuracy. The proposed model's overall structure is depicted in Figure 1. The TCN contains two residual blocks with dilation factors of 1 and 2, respectively. The TCN uses thirty-two filters and a kernel size of two. After the TCN, the flatten layer converts its output to one-dimensional data, and the dense layer outputs two values, namely, the natural gas price predictions for the first and second days in the future. To highlight the advantages of TCN, the principles of TCN, 1D-CNN, LSTM, and GRU are introduced in turn below.
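The configuration above can be checked with a little arithmetic. The following is a minimal sketch, not the authors' code: it assumes the common TCN layout in which every residual block holds a fixed number of dilated causal convolutions (`convs_per_block`, a hypothetical parameter), that the TCN returns its full output sequence to the flatten layer, and that the input window is the ten working days mentioned in the Settings section.

```python
def receptive_field(kernel_size, dilations, convs_per_block=2):
    """Past time steps visible to the last output of stacked dilated causal convolutions."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


def output_shapes(window, filters, horizon):
    """Per-sample shapes after the TCN, the flatten layer, and the dense layer."""
    tcn_out = (window, filters)   # causal padding keeps the sequence length
    flat = (window * filters,)    # flatten layer
    dense = (horizon,)            # one output per forecast day
    return tcn_out, flat, dense


# Kernel size 2 and dilations 1 and 2 are taken from the text;
# the number of convolutions per block is an assumption.
rf_two_convs = receptive_field(kernel_size=2, dilations=[1, 2], convs_per_block=2)
rf_one_conv = receptive_field(kernel_size=2, dilations=[1, 2], convs_per_block=1)
shapes = output_shapes(window=10, filters=32, horizon=2)
print(rf_two_convs, rf_one_conv, shapes)
```

Under these assumptions the receptive field depends on how many convolutions each residual block contains, which the text does not specify, so both variants are computed.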
Energies 2023, 16, x FOR PEER REVIEW

TCN
TCN is a relatively recent time series forecasting model that performs exceptionally well on time series [25]. Notably, TCN is not an upgrade of the traditional RNN but rather an improvement on CNN. TCN combines dilated causal convolutions [26] and residual blocks [27]. Figure 2 illustrates the structure of dilated causal convolutions. Causality means that the output at a given time is determined only by data up to that time, so information from future times cannot be retrieved. Dilation can be explained with the output $y_9$ as an example: as the number of convolutional layers rises, the dilation factor $d$ increases, so the gap between sampling points continues to grow. In other words, as the number of convolutional layers rises, $y_9$ is able to obtain more historical information about the input sequence through the dilated, skipping connections. The dilation factors provide $y_9$ with a larger receptive field, allowing it to draw on more input data while minimizing calculations [28]. The receptive field in this context refers to the amount of historical input information that an output can perceive. A larger receptive field increases the ability to extract historical information, hence enhancing the prediction performance.
The residual block has been shown to be an efficient method of connecting deep learning network layers [29]. As the number of layers in deep learning networks increases, the deep network may become saturated or degraded, so that it trains less effectively than a shallow network. That is, as the number of network layers increases, the predictive performance does not keep improving and may even worsen. The residual block connection preserves the prediction performance of the deep network. As illustrated in Figure 3 and Equation (1), the residual block is composed of an identity map and a residual function $\mathcal{F}(x)$.
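The two ideas above, dilated causal convolution and the residual connection, can be illustrated in a few lines of plain Python. This is a simplified sketch under stated assumptions rather than the model used in the paper: the kernel weights are made up, the residual function is reduced to a single convolution, and ReLU is assumed as the activation applied to the sum of the identity map and $\mathcal{F}(x)$.

```python
def dilated_causal_conv(x, weights, dilation):
    """y[t] = sum_i weights[i] * x[t - i * dilation], treating x as zero before t = 0."""
    y = []
    for t in range(len(x)):
        total = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:  # only current and past inputs are used
                total += w * x[idx]
        y.append(total)
    return y


def residual_block(x, weights, dilation):
    """Identity map plus a (simplified) residual function F(x), followed by ReLU."""
    fx = dilated_causal_conv(x, weights, dilation)
    return [max(0.0, xi + fi) for xi, fi in zip(x, fx)]


prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]           # illustrative input sequence
layer1 = dilated_causal_conv(prices, [0.5, 0.5], dilation=1)
layer2 = dilated_causal_conv(layer1, [0.5, 0.5], dilation=2)

# Causality check: changing the last input must leave all earlier outputs unchanged.
changed = prices[:-1] + [100.0]
layer2_changed = dilated_causal_conv(
    dilated_causal_conv(changed, [0.5, 0.5], dilation=1), [0.5, 0.5], dilation=2
)
print(layer2[:-1] == layer2_changed[:-1])

block_out = residual_block([1.0, -3.0], [1.0, 0.0], dilation=1)
```

With kernel size 2, the second layer (dilation 2) combines inputs that are two steps apart, so each of its outputs already depends on four consecutive input values.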

1D-CNN
Unless otherwise indicated, CNN denotes a 2D CNN and is a classical deep-learning architecture. CNN was inspired by the visual recognition mechanisms of animals and is frequently utilized in image recognition [30]. The contemporary architecture for CNN was devised in 1989 [31], and gradient learning was later used to enhance CNN's performance further [32]. A CNN's basic structure is composed of three layers: a convolutional layer, a pooling layer, and a fully connected layer. Since AlexNet won the ImageNet championship in 2012, CNN has been significantly enhanced, with its primary structure expanded to six components: a convolutional layer, a pooling layer, an activation function, a loss function, regularization, and optimization [33].
The distinction between 1D-CNN and 2D-CNN is mostly reflected in the convolution kernel's movement direction. The 1D-CNN convolution kernel moves in a single direction: as illustrated in Figure 4, the convolution operation is performed from left to right. The 2D-CNN convolution kernel moves in two directions, which makes it extremely well-suited for processing two-dimensional image input. Because its kernel moves along a single axis, the 1D-CNN is frequently used for time series predictions [34].

The 1D-CNN convolution kernel structure is illustrated in Figure 4. $y_k$ represents the output of the 1D-CNN convolution kernel; $x_j$, the original one-dimensional data; $w_{j-3(k-1)}$, the trainable weight associated with the convolution kernel, $k \in \{1, 2, 3, 4\}$.
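The weight index $w_{j-3(k-1)}$ suggests that the kernel advances three positions for each output $y_k$. A minimal sketch of such a left-to-right convolution follows; the kernel length of three, the twelve input values, and the weights themselves are illustrative assumptions, since Figure 4 is not reproduced here.

```python
def conv1d(x, w, stride):
    """Slide kernel w over x from left to right: y[k] = sum_m w[m] * x[k * stride + m]."""
    n_out = (len(x) - len(w)) // stride + 1
    return [
        sum(w[m] * x[k * stride + m] for m in range(len(w)))
        for k in range(n_out)
    ]


x = [float(v) for v in range(1, 13)]  # twelve illustrative inputs x_1..x_12
w = [0.25, 0.5, 0.25]                 # hypothetical kernel weights
y = conv1d(x, w, stride=3)            # four outputs y_1..y_4
print(y)
```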

LSTM
LSTM is a type of RNN that is an enhancement over traditional RNNs [35]. As RNN training involves iterated operations, the gradient may continually shrink or grow across iterations, resulting in vanishing or exploding gradients. The LSTM addresses this problem, which arises when conventional RNNs are trained on long time series, by incorporating gating (control) units [36]. As a result, LSTM outperforms traditional RNNs in forecasting longer time series.
The internal structure of LSTM is shown in Figure 5. The equations of LSTM, written in their standard form, are as follows:
$$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)$$
$$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$$
$$\tilde{c}_t = \tanh(W_c h_{t-1} + U_c x_t + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)$$
$$h_t = o_t \odot \tanh(c_t)$$
Here, $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and $W$, $U$, and $b$ are trainable weights and biases. $c_{t-1}$ and $c_t$ represent the cell state of the LSTM at the previous and current times, respectively; $x_t$ represents the input at the current time; $f_t$ determines whether to retain the prior cell state $c_{t-1}$; $h_{t-1}$ and $h_t$ are output variables that describe the state of the hidden layer at the previous and current times, respectively; $i_t$ determines how much of the current candidate state to retain; $\tilde{c}_t$ denotes the candidate cell state that should be added; $o_t$ is the output gate, which controls how much of the cell state is passed to $h_t$.
Figure 5. LSTM neuron structure.
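The role of the gates is easiest to see on a single scalar cell. The sketch below implements one step of the standard LSTM update in plain Python; the parameter dictionary `p` and its values are hypothetical and chosen only to make the gating behavior visible, not taken from the trained model.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step; p maps a gate name to (W, U, b) with W on h and U on x."""
    def pre(name):
        W, U, b = p[name]
        return W * h_prev + U * x_t + b

    f_t = sigmoid(pre("f"))          # forget gate: keep the prior cell state?
    i_t = sigmoid(pre("i"))          # input gate: admit the candidate state?
    o_t = sigmoid(pre("o"))          # output gate
    c_tilde = math.tanh(pre("c"))    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


# Hypothetical parameters: forget gate almost fully open, input gate almost closed.
params = {
    "f": (0.0, 0.0, 20.0),
    "i": (0.0, 0.0, -20.0),
    "o": (0.0, 0.0, 20.0),
    "c": (1.0, 1.0, 0.0),
}
h_t, c_t = lstm_step(x_t=0.5, h_prev=0.1, c_prev=0.8, p=params)
print(h_t, c_t)
```

With these settings the cell state is carried over almost unchanged, which is the mechanism that lets an LSTM preserve information across long sequences.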

GRU
GRU is a more recently developed and enhanced RNN. It not only resolves the vanishing and exploding gradient issues associated with traditional RNN training but also avoids the complex calculations associated with LSTM [37]. As illustrated in Figure 6, GRU is primarily composed of two gating units: an update gate and a reset gate [38]. The update gate in GRU takes the place of the LSTM's input and forget gates. It regulates the amount of prior state information that is transferred into the present state. The bigger the value of the update gate, the more past state information is incorporated.
Figure 6. GRU neuron structure.
The internal structure of GRU is shown in Figure 6. The equations of GRU, written in their standard form and in the same notation as for LSTM, are as follows:
$$z_t = \sigma(W_z h_{t-1} + U_z x_t + b_z)$$
$$r_t = \sigma(W_r h_{t-1} + U_r x_t + b_r)$$
$$\tilde{h}_t = \tanh(W_h (r_t \odot h_{t-1}) + U_h x_t + b_h)$$
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$
$z_t$ signifies the update gate, which regulates the amount of state information carried into the current state from the previous moment. The bigger the value of the update gate, the more previous state information is brought in. $h_{t-1}$ indicates the output vector at the previous time, while $x_t$ is the current input vector. $r_t$ denotes the reset gate, which regulates the amount of information from the previous moment that is stored in the candidate information at the current time.
$\tilde{h}_t$ represents the candidate (reserved) data at the moment of the update. $h_t$ is the current output vector, which is a combination of the candidate vector $\tilde{h}_t$ and the output vector $h_{t-1}$ from the previous moment, weighted by the update gate.
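The effect of the update gate described above can be reproduced on a scalar GRU cell. The following sketch uses the standard GRU update with the convention that a large update gate keeps more of the previous state; the parameter dictionaries are hypothetical and serve only to push the update gate close to 1 or close to 0.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x_t, h_prev, p):
    """One scalar GRU step; p maps a name to (W, U, b) with W on h and U on x."""
    Wz, Uz, bz = p["z"]
    Wr, Ur, br = p["r"]
    Wh, Uh, bh = p["h"]
    z_t = sigmoid(Wz * h_prev + Uz * x_t + bz)               # update gate
    r_t = sigmoid(Wr * h_prev + Ur * x_t + br)               # reset gate
    h_tilde = math.tanh(Wh * (r_t * h_prev) + Uh * x_t + bh)  # candidate state
    return z_t * h_prev + (1.0 - z_t) * h_tilde


# Hypothetical parameters: only the update-gate bias differs between the two cases.
keep = {"z": (0.0, 0.0, 20.0), "r": (0.0, 0.0, 0.0), "h": (1.0, 1.0, 0.0)}
overwrite = {"z": (0.0, 0.0, -20.0), "r": (0.0, 0.0, 0.0), "h": (1.0, 1.0, 0.0)}

h_keep = gru_step(x_t=0.5, h_prev=0.1, p=keep)       # stays close to h_prev
h_new = gru_step(x_t=0.5, h_prev=0.1, p=overwrite)   # close to the candidate state
print(h_keep, h_new)
```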

Settings
All models were created entirely in Python using the Keras framework. The experimental environment consisted of a 64-bit Windows 10 operating system and an Intel Core i5-10500H processor with a main frequency of 2.5 GHz. All experiments in this work were conducted in the same environment.
As shown in Figure 7, the experimental data used for this research are the Henry Hub natural gas spot prices published by the United States EIA from 7 January 1997 to 13 September 2022, quoted in USD per million Btu [39]. The experimental data were divided into three parts: 70% of the data were used as the training set, 20% as the validation set, and 10% as the test set. The spot prices of Henry Hub natural gas from the past ten working days were used to predict the prices for the next two weekdays. MAPE, MAE, and RMSE were used as performance indicators in this experiment to evaluate the prediction accuracy of the model. They are defined as follows:
$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \tilde{y}_i}{y_i} \right| \quad (12)$$
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \tilde{y}_i \right| \quad (13)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \tilde{y}_i \right)^2} \quad (14)$$
$\tilde{y}_i$ is the i-th predicted value and $y_i$ is the i-th true value. As shown in Formulas (12)-(14), the MAE value, for example, is calculated by subtracting the predicted value from the true value and averaging the absolute values of the differences. $n$ denotes the length of the time series.
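The data preparation and the three indicators described above can be sketched in plain Python. This is an illustration under assumptions, not the authors' pipeline: the function names are hypothetical, the split is assumed to be chronological and applied to the windowed samples, and a synthetic series stands in for the Henry Hub prices.

```python
import math


def make_windows(series, n_in=10, n_out=2):
    """Pair each run of n_in past values with the n_out values that follow it."""
    X, Y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])
        Y.append(series[start + n_in:start + n_in + n_out])
    return X, Y


def chronological_split(items, train_pct=70, val_pct=20):
    """Split in time order into training, validation, and test parts."""
    n = len(items)
    a = n * train_pct // 100
    b = n * (train_pct + val_pct) // 100
    return items[:a], items[a:b], items[b:]


def mape(y_true, y_pred):
    return 100.0 / len(y_true) * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


series = [float(v) for v in range(1, 112)]   # synthetic stand-in for daily prices
X, Y = make_windows(series)
train, val, test = chronological_split(X)

y_true, y_pred = [2.0, 4.0], [1.0, 5.0]      # toy values for the indicators
print(len(X), len(train), len(val), len(test))
print(mape(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))
```

Because MAPE divides by the true value, it is a relative error, whereas MAE and RMSE are expressed in the price unit itself.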

Comparison of Prediction Results of Several Deep Learning Models
To assess the influence of the TCN model on natural gas spot price prediction, LSTM, GRU, and 1D-CNN were employed as comparative models. The MAPE, MAE, and RMSE of the different deep learning models' forecasts of natural gas prices for the next weekday are given in Table 1. Because training results vary from run to run, each model was trained ten times, and the averages were used as the final MAPE, MAE, and RMSE. Each model was tuned repeatedly to obtain its best prediction performance. As demonstrated in Table 1, TCN shows the best predictive performance on all the test indicators, as indicated by the lowest MAPE, MAE, and RMSE.
Tracking curves were used to compare the predictions of TCN with those of the other models. Figure 8 illustrates the comparison. Figure 8a,b show how well the models track the natural gas spot price during a plateau and during a period of fluctuation, respectively. TCN's curve approximates the true natural gas price well and provides the best tracking in areas where natural gas prices fluctuate, followed by GRU, 1D-CNN, and LSTM. Interestingly, 1D-CNN underperformed LSTM when gas prices were low, and LSTM underperformed 1D-CNN when gas prices were high. In comparison, TCN and GRU demonstrated good performance at all times.

Multi-Step Prediction Performance and Elapsed Time in Several Models
To analyze the multi-step forecasting performance of different models, Table 2 counts the performance of several deep learning models, simultaneously predicting the price of natural gas in the next five days. The error in the table is the average absolute percentage error not affected by the price fluctuation of natural gas itself. As the time step increased, all model prediction errors became more significant to varying degrees. It is worth noting that the error growth slowed down as the time step increased, which may explain why the model could always learn some characteristics of natural gas price fluctuations, even in multi-step forecasts. However, the overall error of multi-step prediction gradually increased, and the reliability gradually deteriorated.

Multi-Step Prediction Performance and Elapsed Time in Several Models
To analyze the multi-step forecasting performance of the different models, Table 2 reports the performance of several deep learning models when simultaneously predicting the natural gas price for the next five days. The error reported in the table is the mean absolute percentage error (MAPE), which is not affected by the magnitude of the natural gas price itself. As the time step increased, the prediction errors of all models grew to varying degrees. It is worth noting that the error growth slowed as the time step increased, which may indicate that the models could still learn some characteristics of natural gas price fluctuations, even in multi-step forecasts. Nevertheless, the overall error of multi-step prediction gradually increased, and its reliability gradually deteriorated.
Figure 9 shows scatter diagrams of the natural gas prices predicted by the TCN model for the next four days, in which the distribution of predicted values can be seen more clearly. Figure 9a-d show the natural gas price forecasts for the next 1 day, 2 days, 3 days, and 4 days, respectively. As the forecast step increases, the points in the scatter plot gradually depart from the center line, indicating that the difference between the predicted and actual values grows. An increasing number of points fall beyond the purple dotted line that marks a MAPE of 10%. The coefficient of determination R² of TCN's predictions for the second day is greater than 0.8, so the predictions for the second day can generally be considered reliable.
Finally, Table 2 reports the elapsed time of the deep learning models. The 1D-CNN ran the fastest because it contains only convolution operations. TCN had a slightly longer elapsed time than 1D-CNN due to the addition of residual connections. LSTM ran the longest due to its recursive operation mode and complex gating units. GRU has one fewer gating unit than LSTM, so its elapsed time was shorter than that of LSTM but much longer than that of TCN.
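The per-step evaluation described above can be illustrated with a minimal Python sketch that computes MAPE separately for each forecast horizon, together with the coefficient of determination R². The function names and the toy price values are illustrative assumptions only; they are not the code or data used in this study.

```python
from typing import List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def per_horizon_mape(actual: List[List[float]],
                     predicted: List[List[float]]) -> List[float]:
    """MAPE for each forecast step.

    actual[i][h] is the true price h + 1 days after forecast origin i;
    predicted[i][h] is the corresponding multi-step forecast.
    """
    horizons = len(actual[0])
    return [
        mape([row[h] for row in actual], [row[h] for row in predicted])
        for h in range(horizons)
    ]


# Toy example (hypothetical values): two forecast origins, two horizons.
toy_actual = [[2.0, 4.0], [4.0, 5.0]]
toy_predicted = [[2.2, 3.0], [3.6, 6.0]]
toy_errors = per_horizon_mape(toy_actual, toy_predicted)
print(toy_errors)  # the second-step error is larger than the first-step error
```

Reporting one error value per horizon, rather than a single pooled value, is what makes the growth of the error with the forecast step visible.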

Ablation Experiment with Dynamic Learning Rate Setting
Each time the model relearned its parameters, it could converge to a locally optimal result rather than a globally optimal one. The dynamic learning rate ensured that the model converged rapidly during training and did not easily remain stuck at a local optimum. This section describes the ablation experiment designed to assess the effect of the dynamic learning rate. The number of epochs was set to 150, and after 100 epochs, the learning rate decayed to 0.4 times its initial value. Following a large number of experimental tests, the best initial learning rate of TCN was set to 0.00005, while that of 1D-CNN, GRU, and LSTM was set to 0.0001. The ablation experiment was carried out using TCN as an example, and its findings are reported in Table 3. Notably, the results in the table are averages over ten training runs. "Proposed" denotes the configuration used in this paper, which includes the dynamic learning rate setting, whereas "w/o DLR" denotes the configuration with the dynamic learning rate setting omitted. As shown in Table 3, MAE, MAPE, and RMSE all change to different degrees when the dynamic learning rate is removed. The nearly doubled MAPE indicates that the dynamic learning rate can indeed steadily enhance the model's robustness.
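The learning rate schedule described above can be sketched as a simple step function. This is a minimal illustration under stated assumptions: the text does not specify whether the decay is applied as a single step or gradually, nor how epochs are indexed, so the sketch assumes a single step at zero-based epoch 100. The function name and parameter names are hypothetical.

```python
def dynamic_learning_rate(epoch: int,
                          initial_lr: float = 5e-5,
                          decay_epoch: int = 100,
                          decay_factor: float = 0.4) -> float:
    """Step schedule: keep initial_lr for the first decay_epoch epochs,
    then use initial_lr * decay_factor for the remaining epochs.

    Assumptions (not confirmed by the paper): epochs are zero-based and
    the decay is applied once, as a single step.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    if epoch < decay_epoch:
        return initial_lr
    return initial_lr * decay_factor


TOTAL_EPOCHS = 150  # number of epochs reported in the text

# Learning rate for every epoch with the TCN starting value of 0.00005.
schedule = [dynamic_learning_rate(e) for e in range(TOTAL_EPOCHS)]

# The "w/o DLR" ablation corresponds to a constant learning rate.
constant_schedule = [5e-5 for _ in range(TOTAL_EPOCHS)]
```

In a deep learning framework, the same values would typically be passed to the optimizer through that framework's learning rate scheduler rather than computed by hand.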

Performance from Current Gas Price Forecast Studies
This study compares the forecast results of eight studies to illustrate the general predictive performance of existing natural gas price forecasting research. Although the studies use different data sets, the comparison allows for a qualitative analysis of gas price forecasts. Table 4 summarizes the comparison results. The model proposed in this paper achieves a lower error than half of the studies, while some studies report better performance metrics than those presented here. Specifically, the study by Livieris et al. may highlight the advantages of ensemble models. The input data for the study by Mouchtaris et al. consist of twenty-one explanatory variables, which may improve the precision of natural gas price forecasting. By merging ARIMA with artificial neural networks, Siddiqui et al. obtained good prediction results. On the basis of an integrated model, Naderi et al. added a meta-heuristic bat optimization algorithm to enhance the accuracy of natural gas price predictions. In summary, most of the related research improves prediction accuracy by increasing the number of input variables or the complexity of the model. In practice, it is difficult to guarantee the quality and reliability of all input variables. In contrast, the model proposed in this paper is not a complex integrated model and achieves excellent prediction performance using only the historical natural gas spot price as input. It is worth noting that the test data in this paper contain violent fluctuations caused by the La Niña phenomenon in 2021 and the conflict between Russia and Ukraine. The proposed model nevertheless maintains good prediction performance, which supports the accuracy and reliability of this study.

Discussion
The results of Section 4.2 demonstrate that TCN outperforms the other three deep learning models in natural gas spot price prediction, as indicated by the lowest MAPE, MAE, and RMSE. Even when the dynamic learning rate setting was used, the predictive performance of LSTM and GRU was relatively low. Although both LSTM and GRU are improvements over traditional RNNs, this does not mean that they are equally suitable for forecasting natural gas spot prices. Additionally, both LSTM and GRU learn time series recursively, which increases the time required for model training, with LSTM being the more time-consuming of the two.
The 1D-CNN extracts features from the input time series via convolution, which significantly reduces model training time compared with recursive time series prediction models. Although CNNs are not usually favored for time series prediction, numerous studies have found them to be effective. For instance, the improved CNN presented by the Google team for voice synthesis [26] and the improved CNN proposed by the Facebook team for translation [43] have both demonstrated promising results.
Natural gas spot prices are influenced by a variety of factors [44], and it is difficult to forecast anomalous conditions created by major political events, economic turmoil, or natural disasters. TCN is a relatively new non-recursive time series prediction model that uses residual blocks built from dilated causal convolutions to improve both training speed and prediction performance. The experimental comparisons on real-world data demonstrate TCN's advantage in predicting natural gas spot prices.
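To make the idea of dilated causal convolution and residual connections concrete, the following pure-Python sketch may help. It is a deliberately simplified illustration, not the architecture used in this study: it uses a single channel, one convolution per block, an identity skip connection, and hand-picked weights, and all function names are hypothetical.

```python
from typing import List, Sequence


def causal_dilated_conv(x: Sequence[float],
                        weights: Sequence[float],
                        dilation: int) -> List[float]:
    """y[t] = sum_k weights[k] * x[t - k * dilation].

    Positions before the start of the series are treated as zero, so the
    output at time t never depends on inputs later than t (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_block(x: Sequence[float],
                   weights: Sequence[float],
                   dilation: int) -> List[float]:
    """Simplified residual block: ReLU(conv(x)) + x (identity skip)."""
    conv = causal_dilated_conv(x, weights, dilation)
    return [max(0.0, c) + xi for c, xi in zip(conv, x)]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of a stack with one causal convolution per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


# With kernel size 3 and dilations doubling per layer, four layers
# already cover 31 past time steps.
print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

Because each output depends only on current and past inputs, all time steps can be computed in parallel, in contrast to the step-by-step recursion of LSTM and GRU.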
The necessity and effectiveness of the dynamic learning rate setting were demonstrated by comparing the MAPE obtained with and without it. The dynamic learning rate effectively prevented model training from becoming trapped in a local optimum, allowing the model to find a better solution more quickly and consistently, and thereby enhanced the model's predictive ability. A comparison with existing prediction models revealed that, apart from a few studies whose results were superior to ours, possibly owing to advantages in data set selection, the majority of existing studies produced results inferior to those of this work. This suggests that the choice of data set can affect a model's predictive ability, and it also indicates that this research produces accurate predictions.

Conclusions
Accurate natural gas spot price forecasts are critical for personal and corporate investment and for the energy market's stability. This work proposed a new natural gas spot price prediction model capable of forecasting the natural gas spot price for the next two weekdays with greater accuracy. The model is built on TCN, whose dilated causal convolutions significantly reduce the number of complex calculations required for model training, saving considerable time in comparison with LSTM and GRU. The dilated causal convolutions also expand the model's receptive field, allowing it to extract features over a longer time span and significantly improving the performance of natural gas spot price prediction. The residual block structure in TCN preserves the predictive capacity of the deep network and boosts the model's nonlinear representation capability. The dynamic learning rate mitigates the problem of an improperly set learning rate, which is detrimental to the accuracy and robustness of the model's predictions, and also improves the model's prediction performance. Experiments on daily natural gas spot prices in the United States from January 1997 to September 2022 confirmed the proposed model's forecasting ability. TCN showed the lowest error in natural gas spot price prediction when compared with the other three deep learning models (LSTM, GRU, and 1D-CNN) on real data sets. The novel TCN-based natural gas spot price forecasting model has demonstrated superior forecasting performance and good robustness, making it an excellent candidate for natural gas spot price forecasting. In future work, an automatic optimization approach for model parameters will be examined for incorporation into the model in order to reduce the considerable time spent on manual hyperparameter tuning.

Conflicts of Interest:
The authors declare no conflict of interest.