Application of the VMD-CNN-BiLSTM-Attention Model in Daily Price Forecasting of NYMEX Natural Gas Futures

Jiang, Qiuli; Lin, Zebei; Hu, Jiao; Liu, Xuhui

doi:10.3390/app152011169


by Qiuli Jiang *, Zebei Lin, Jiao Hu and Xuhui Liu

School of Economics, Management and Law, Jilin Normal University, Siping 136000, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(20), 11169; https://doi.org/10.3390/app152011169

Submission received: 23 September 2025 / Revised: 12 October 2025 / Accepted: 16 October 2025 / Published: 18 October 2025


Abstract

As a core clean energy source in the global energy transition, natural gas affects the supply–demand balance of energy markets and cost control along industrial chains through its price fluctuations. Accurate natural gas price prediction is therefore crucial for market participants' decision making and for policymakers' regulation. Traditional single models fail to capture the patterns in New York Mercantile Exchange (NYMEX) natural gas futures daily prices because of their nonlinearity, high volatility, and multi-scale features. To address this, this study proposes a hybrid VMD-CNN-BiLSTM-attention model that integrates Variational Mode Decomposition (VMD), a Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), and an attention mechanism. Using NYMEX natural gas futures daily closing prices, the one-step to four-step-ahead forecasts of the proposed model were compared with those of a CNN-BiLSTM-attention model and an Autoregressive Integrated Moving Average (ARIMA) model. The empirical results show that the VMD-CNN-BiLSTM-attention model outperforms the comparison models on Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and other metrics. In particular, its four-step-ahead forecast keeps MAPE ≤ 3.5% and R² ≥ 98%, demonstrating a stronger ability to capture complex price fluctuations, as well as higher accuracy and stability, than traditional single models and deep learning models without VMD. The model thus provides reliable technical support for short-to-medium-term natural gas price prediction.

Keywords:

VMD-CNN-BiLSTM-attention model; variational mode decomposition (VMD); natural gas; price forecasting

1. Introduction

Natural gas, a relatively clean fossil fuel with lower carbon emissions than coal and oil, has emerged as a cornerstone of the global energy transition toward low-carbon sustainability [1]. Its extensive applications include residential heating, industrial production, and power generation, making it indispensable for balancing energy security, economic development, and environmental protection. As global decarbonization accelerates, demand for natural gas has risen substantially. Globally, natural gas accounts for approximately 24% of primary energy consumption, while in China its share has risen from a negligible level to 8.4% in recent years, with projections indicating further growth [2,3,4,5,6]. Nevertheless, this growing dependence is accompanied by significant price volatility, which poses formidable challenges to energy market stability, industrial investment planning, and household energy expenditures [7,8,9,10].

Natural gas price fluctuations stem from complex interactions among multiscale, multidimensional factors, which makes them a focal point of research in energy economics. Supply–demand dynamics, such as changes in natural gas production, consumption, and inventory levels, act as endogenous factors that directly drive short-term price oscillations [11]. Geopolitical events and unforeseen incidents further amplify price uncertainty. For example, the Russia–Ukraine conflict disrupted Europe's conventional natural gas supply chains, triggering a surge in imports of U.S. liquefied natural gas and a restructuring of global maritime transportation networks, which in turn induced sharp fluctuations in regional market prices [12].

Accurate natural gas price forecasting is therefore of paramount importance to multiple stakeholders. For policymakers, reliable forecasts can underpin energy security strategies, such as adjusting import structures or optimizing reserve capacities. For market participants (e.g., producers and traders), they facilitate risk management and investment decision making, mitigating losses arising from price volatility. For households and industries, stable price expectations support cost planning and adjustments to energy consumption patterns [13].

Natural gas price forecasting has already been explored extensively. Well-established classical statistical methods in this field include the Autoregressive Moving Average (ARMA) model [14,15], the Autoregressive Integrated Moving Average (ARIMA) model [16,17,18], Principal Component Analysis (PCA) [19], and the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model [20]. For instance, Ruslan et al. analyzed prices using a series of univariate GARCH models, providing a reference for regulators and investors in predicting market volatility and formulating relevant strategies. However, natural gas prices are affected by multiple complex factors, such as geopolitics, supply–demand relationships, and weather changes, and their fluctuations often exhibit strong nonlinear and nonstationary characteristics. Classical statistical methods, which typically rely on linearity or stationarity assumptions, therefore struggle to capture the complex dynamics of price changes, which significantly limits their forecasting accuracy.

To overcome the constraints of traditional statistical techniques, scholars have introduced artificial intelligence-based methods, including the Structural Heterogeneous Autoregressive Vector Autoregression (SHVAR) model [21], Support Vector Machines (SVM) [22], and boosting algorithms [23]. For instance, Su et al. employed an advanced least-squares regression boosting algorithm to predict natural gas prices. By optimizing the regression model, the algorithm markedly increased the coefficient of determination (R²) of the predictions, indicating a very good fit to the data, while also reducing the Mean Absolute Error (MAE), indicating that the gap between predicted and actual values was effectively narrowed.

However, in practical forecasting, a single model can capture only specific data patterns. When dealing with complex datasets, especially natural gas price data, which are influenced by multiple factors and exhibit highly nonlinear and dynamically volatile characteristics, single models are highly prone to overfitting details in the training data (including random noise and outliers). By contrast, hybrid models that integrate multiple methods can effectively compensate for the limitations of any single approach. Consequently, model combinations have gained increasing popularity and are widely applied to address various challenges in natural gas price forecasting. Wang et al. forecast natural gas prices with a hybrid approach that fuses complete ensemble empirical mode decomposition with adaptive noise and sample entropy (CEEMDAN–SE) with a Gated Recurrent Unit (GRU) network optimized by a Particle Swarm Optimization algorithm incorporating an Adaptive Learning Strategy (PSO–ALS–GRU). This model performs well in analyzing long-term dependencies and addressing complex nonlinear problems [24], but it lacks a component to extract subtle local features (e.g., short-term correlations between prices and inventory data), as GRU networks focus primarily on global temporal trends rather than local fine-grained patterns [25]. Wang and Wang combined Long Short-Term Memory (LSTM), Wavelet Packet Decomposition (WPD), and stochastic time-effective weights (SW) to construct a new hybrid model (WPD-SW-LSTM). They compared it with single models such as SVM, a Back Propagation Neural Network (BPNN), and LSTM, as well as their hybrid variants, and proposed an improved error measure to evaluate the predictions, achieving high-precision forecasts of oil futures prices [26]. While the stochastic weights improved robustness, the unidirectional LSTM cannot capture backward temporal dependencies (e.g., how future price reversals affect the interpretation of current trends), which limits its understanding of sequential context [25]. Jiang et al. proposed a hybrid prediction model based on Fuzzy Entropy Variational Mode Decomposition (VMD), WPD, and LSTM to address the complexity and nonlinearity of natural gas production and consumption data, and showed that it significantly outperforms comparable models and has practical value [27]. Although the multi-step decomposition enhanced multiscale feature extraction, the model lacks a dynamic weighting mechanism for key features: it assigns equal importance to trivial signals (e.g., minor daily demand fluctuations) and critical signals (e.g., geopolitical event shocks), leading to suboptimal focus on the core influencing factors [28].

Beyond these specific gaps, two broader limitations persist in existing hybrid models: (1) many decomposition methods (e.g., EMD-family methods such as CEEMDAN) still suffer from mode mixing (i.e., overlapping frequency components across decomposed modes), which distorts feature extraction [29]; (2) few models integrate both local feature capture and bidirectional dependency mining, two capabilities critical for accurately modeling daily gas prices, which are influenced by both short-term local events and long-term global trends.

To fill the aforementioned gaps, this study proposes a novel hybrid model—VMD-CNN-BiLSTM-Attention—integrating Variational Mode Decomposition (VMD), Convolutional Neural Network (CNN), Bidirectional LSTM (BiLSTM), and an attention mechanism. Its key innovations (directly addressing the gaps in existing literature) are:

(1): Adaptive decomposition with VMD: Unlike EMD-family methods such as CEEMDAN, VMD decomposes price series into intrinsic mode functions (IMFs) with minimal mode mixing [29], enabling more accurate multiscale feature extraction and laying a solid foundation for subsequent modeling.
(2): Local feature capture with CNN: CNN is introduced to extract local correlations (e.g., short-term price-volume relationships, weekly demand cycles) that are overlooked by GRU/LSTM-based models—compensating for the local feature gap in Wang et al.’s [24] CEEMDAN–SE–PSO–GRU.
(3): Bidirectional dependency mining with BiLSTM: Replacing unidirectional LSTM/GRU with BiLSTM allows the model to mine both forward (past→present) and backward (future→present) temporal dependencies—addressing the sequential context limitation in Wang and Wang’s [26] WPD-SW-LSTM.
(4): Dynamic feature weighting with attention mechanism: The attention mechanism assigns higher weights to critical features (e.g., geopolitical shocks, inventory shortages) and lower weights to trivial signals—solving the equal-weighting problem in Jiang et al.’s [27] VMD-WPD-LSTM.

In addition to methodological innovations, this study aims to enhance the practical value of gas price forecasting: by improving accuracy and robustness, the model can provide more reliable decision support for policymakers (e.g., refining emergency reserve policies), market participants (e.g., optimizing hedging strategies), and households (e.g., rationalizing energy consumption plans)—ultimately contributing to the stability of the global natural gas market and the advancement of low-carbon energy transitions.

2. Materials and Methods

2.1. Variational Mode Decomposition (VMD)

Variational Mode Decomposition (VMD) is an adaptive, non-recursive signal decomposition algorithm. It decomposes a nonstationary, nonlinear signal into a set of band-limited intrinsic mode functions (IMFs), each concentrated around its own center frequency [30,31]. The core idea is to construct and solve a variational problem that minimizes the total bandwidth of all modes, subject to the constraint that the modes sum to the original signal.

Variational Problem Formulation

For a given signal f(t), VMD decomposes it into K modes u_k(t), each with a corresponding center frequency ω_k. The variational problem is defined in Equation (1).

\begin{cases} \min_{\{u_{k}\}, \{ω_{k}\}} \left\{ \sum_{k = 1}^{K} \left\| \partial_{t} \left[ \left( δ (t) + \frac{j}{π t} \right) * u_{k} (t) \right] e^{- j ω_{k} t} \right\|_{2}^{2} \right\} \\ \text{s.t.} \ \sum_{k = 1}^{K} u_{k} (t) = f (t) \end{cases}

(1)

where f(t) is the original signal, δ(t) is the Dirac delta function, ∗ denotes convolution, j is the imaginary unit, and ∂_t denotes the time derivative. Convolving u_k(t) with δ(t) + j/(πt) yields its analytic signal (via the Hilbert transform), multiplying by e^{−jω_k t} shifts its spectrum to baseband, and the squared L2 norm of the time derivative then estimates the bandwidth of each mode. The constraint ensures that the modes reconstruct the original signal.

Unconstrained Variational Formulation

To solve this constrained problem, a quadratic penalty factor α and a Lagrange multiplier λ(t) are introduced, transforming it into the unconstrained augmented Lagrangian shown in Equation (2).

\mathcal{L} (\{u_{k}\}, \{ω_{k}\}, λ) = α \sum_{k = 1}^{K} \left\| \partial_{t} \left[ \left( δ (t) + \frac{j}{π t} \right) * u_{k} (t) \right] e^{- j ω_{k} t} \right\|_{2}^{2} + \left\| f (t) - \sum_{k = 1}^{K} u_{k} (t) \right\|_{2}^{2} + \left\langle λ (t), f (t) - \sum_{k = 1}^{K} u_{k} (t) \right\rangle

(2)

Compared with empirical mode decomposition (EMD), VMD mitigates mode mixing and end effects, making it robust for nonstationary signals such as energy prices [32,33]. Its adaptive frequency segmentation and mathematical rigor enable precise extraction of transient and trend components, which facilitates subsequent prediction tasks.
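To make the optimization in Equations (1) and (2) concrete, the following is a minimal NumPy sketch of the standard alternating-direction update scheme used to solve the augmented Lagrangian. Each mode is updated as a Wiener filter centred on ω_k in the frequency domain, ω_k is re-estimated as the power-weighted mean frequency of that mode, and λ is updated by dual ascent with step τ. It is an illustrative simplification, not the authors' implementation. It works on the one-sided real FFT, omits the signal mirroring used in reference implementations, and reports frequencies in cycles per sample rather than the angular ω_k of Equation (1). The function name `vmd` and its defaults (α = 2000, τ = 0) are assumptions.

```python
import numpy as np

def vmd(signal, K=5, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Minimal VMD sketch. Returns (modes, centre_freqs); freqs in cycles/sample."""
    f = np.asarray(signal, dtype=float)
    N = f.size
    f_hat = np.fft.rfft(f)                       # one-sided spectrum of f(t)
    freqs = np.fft.rfftfreq(N)                   # 0 ... 0.5 cycles per sample
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = 0.5 / K * np.arange(K)               # spread the initial centre frequencies
    lam = np.zeros(freqs.size, dtype=complex)    # Lagrange multiplier (frequency domain)
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter update of mode k, centred on omega_k
            u_hat[k] = (f_hat - others + lam / 2) / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # centre frequency = power-weighted mean frequency of mode k
            power = np.abs(u_hat[k]) ** 2
            omega[k] = np.sum(freqs * power) / (np.sum(power) + 1e-12)
        # dual ascent on the reconstruction constraint (no-op when tau == 0,
        # which relaxes exact reconstruction, a common choice for noisy data)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (np.sum(np.abs(u_prev) ** 2) + 1e-12)
        if change < tol:
            break
    modes = np.fft.irfft(u_hat, n=N, axis=1)     # back to the time domain, shape (K, N)
    return modes, omega
```

On a synthetic two-tone signal, for example `np.cos(2*np.pi*5*t/1000) + 0.5*np.cos(2*np.pi*40*t/1000)` with `t = np.arange(1000)` and `K=2`, the two recovered centre frequencies settle near 0.005 and 0.04 cycles per sample, and the modes sum back to the input.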

2.2. Convolutional Neural Networks (CNN)

The Convolutional Neural Network (CNN) module is a key component of the hybrid model for natural gas price prediction and is primarily responsible for extracting local temporal features from the time-series data [34,35]. It slides convolutional kernels (filters) over one-dimensional input sequences (such as the IMF components of natural gas prices or the original price sequence), capturing correlation patterns among adjacent data points and providing more representative features for the subsequent time-series modeling.

1D Convolution Operation

For the input natural gas price series (an IMF component or the original sequence), let the input sequence be X ∈ ℝ^{T×C}, where T is the number of time steps and C is the number of input channels.

The convolution kernel is defined as K ∈ ℝ^{k×C×F}, where:

k is the size of the convolution kernel, which represents the local time window scanned each time.

F is the number of convolution kernels, where each kernel captures a specific local pattern.

The output feature map Y ∈ ℝ^{T′×F} of the convolution operation is calculated using Equation (3).

Y [p, f] = σ (\sum_{c = 1}^{C} \sum_{i = 0}^{k - 1} X [p + i, c] \cdot K [i, c, f] + b [f])

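(3)

where σ(·) is a nonlinear activation function, b[f] is the bias of the f-th kernel, and, for a stride of 1 without padding, T′ = T − k + 1.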
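A minimal NumPy sketch of Equation (3) for a single input sequence follows, assuming a stride of 1, no padding, and ReLU as σ (the activation actually used is specified by the model parameters, not here). Like deep learning frameworks, it computes a cross-correlation (the kernel is not flipped), which matches the indexing X[p + i, c] in Equation (3). The function names are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv1d_valid(X, K, b, activation=relu):
    """1D convolution of Eq. (3): X is (T, C), K is (k, C, F), b is (F,); returns (T - k + 1, F)."""
    T, C = X.shape
    k, C_k, F = K.shape
    if C != C_k:
        raise ValueError("kernel channel count must match input channels")
    Y = np.empty((T - k + 1, F))
    for p in range(T - k + 1):
        window = X[p:p + k, :]                                    # local time window, shape (k, C)
        # sum over kernel offsets i and channels c, for all F kernels at once
        Y[p] = np.tensordot(window, K, axes=([0, 1], [0, 1])) + b
    return activation(Y)
```

With a single-channel ramp input, a kernel of `[1, 1]` returns sums of adjacent values, while a kernel of `[1, -1]` produces negative responses that ReLU clips to zero, showing how different kernels respond to different local patterns.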

Max Pooling Operation

After convolution, a max pooling layer compresses the features, reduces redundancy, and enhances robustness. Let s be the size of the pooling window (with a stride of s). The output feature map P ∈ ℝ^{⌊T′/s⌋×F} is calculated using Equation (4).

P [q, f] = \max (Y [q \cdot s, f], Y [q \cdot s + 1, f], \dots, Y [q \cdot s + s - 1, f])

(4)
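Equation (4) describes non-overlapping windows whose stride equals the window size s, so in NumPy it reduces to a reshape followed by a maximum. This is a minimal sketch; dropping the trailing rows that do not fill a complete window is an assumption that mirrors the common "valid" pooling behaviour, and the function name is illustrative.

```python
import numpy as np

def max_pool1d(Y, s):
    """Non-overlapping max pooling of Eq. (4): Y is (T', F); returns (T' // s, F)."""
    T_prime, F = Y.shape
    n_windows = T_prime // s                      # trailing rows that do not fill a window are dropped
    windows = Y[: n_windows * s].reshape(n_windows, s, F)
    return windows.max(axis=1)                    # P[q, f] = max over Y[q*s : q*s + s, f]
```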

2.3. Bidirectional Long Short-Term Memory (BiLSTM)

Long Short-Term Memory (LSTM) is a specialized Recurrent Neural Network (RNN) whose gated memory units are designed to tackle the vanishing gradient problem that plagues traditional RNN architectures [36]. The working principle of LSTM is illustrated in Figure 1.

However, a standard LSTM processes a sequence in one direction only, so its hidden state at each time step summarizes past inputs alone. Bidirectional Long Short-Term Memory (BiLSTM) overcomes this limitation through bidirectional temporal modeling.

The BiLSTM model consists of two LSTM layers: a forward LSTM layer and a backward LSTM layer. Each LSTM unit contains a forget gate, an input gate, and an output gate, which use sigmoid functions to control the flow of information. The working principle of BiLSTM is illustrated in Figure 2.

The forget gate determines which information in the previous cell state c_{t−1} should be forgotten and which should be retained, as shown in Equation (5):

f_{t} = σ (W_{f} x_{t} + U_{f} h_{t - 1} + b_{f})

(5)

where x_t is the input vector at time t, representing features related to the natural gas price at the current moment (such as the previous day's price and market trading volume); h_{t−1} is the hidden state of the LSTM unit at the previous time step; W_f and U_f are weight matrices; b_f is the bias vector; and σ is the sigmoid activation function.

The input gate determines which new information flows into the cell state. It comprises two parts: an input control vector i_t, produced by a sigmoid activation, which regulates the inflow of new information, and a candidate cell state c̃_t, produced by a tanh activation, which supplies the new information. They are calculated as follows:

i_{t} = σ (W_{i} x_{t} + U_{i} h_{t - 1} + b_{i})

(6)

{\tilde{c}}_{t} = \tanh (W_{c} x_{t} + U_{c} h_{t - 1} + b_{c})

(7)

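where W_i, W_c, U_i, and U_c are weight matrices, and b_i and b_c are bias vectors. The cell state is then updated by combining the retained part of the previous state with the gated candidate, c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t, where ⊙ denotes element-wise multiplication.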

The output gate determines how much information from the cell state is output to the hidden state h_t. It generates an output control vector o_t via a sigmoid activation and then multiplies it element-wise with the tanh-transformed cell state to obtain h_t. The calculation formulas are as follows:

o_{t} = σ (W_{o} x_{t} + U_{o} h_{t - 1} + b_{o})

(8)

h_{t} = o_{t} ⊙ \tanh (c_{t})

(9)

where W_o and U_o are weight matrices, and b_o is a bias vector.
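A minimal NumPy sketch of one LSTM time step follows, implementing Equations (5)–(9) together with the standard cell-state update c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t on which Equation (9) relies. The parameter dictionary layout, the helper `init_params`, and its small random initialization are illustrative assumptions, not the configuration listed in Table 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p holds W_g, U_g, b_g for gates g in {f, i, c, o}."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])      # forget gate, Eq. (5)
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])      # input gate, Eq. (6)
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # candidate state, Eq. (7)
    c_t = f_t * c_prev + i_t * c_tilde                                 # cell-state update
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])      # output gate, Eq. (8)
    h_t = o_t * np.tanh(c_t)                                           # hidden state, Eq. (9)
    return h_t, c_t

def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: small random weights and zero biases for each gate."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in "fico":
        p[f"W_{g}"] = rng.normal(0.0, 0.1, (hidden_dim, input_dim))
        p[f"U_{g}"] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        p[f"b_{g}"] = np.zeros(hidden_dim)
    return p
```

Running a sequence through the cell amounts to calling `lstm_step` once per time step, feeding each returned `(h_t, c_t)` pair into the next call.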

The core mechanism of BiLSTM is bidirectional processing of the input sequence: the forward LSTM layer processes the data from the start to the end of the sequence, whereas the backward LSTM layer processes it in reverse order, from the end to the start. At each time step, the forward hidden state h_t and the backward hidden state h′_t are concatenated to form the final output y_t, as shown in Equation (10):

y_{t} = [h_{t}; h_{t}^{'}]

(10)
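The bidirectional pass and the concatenation in Equation (10) can be sketched as follows. For brevity, this hypothetical helper stacks the four gate weight matrices into one matrix (the gate order f, i, c̃, o is an assumption) and runs the standard LSTM recurrence. The backward outputs are reversed again so that both directions are aligned on the same time index t before concatenation.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(X, W, U, b):
    """Unidirectional LSTM over X (T, d). W: (4H, d), U: (4H, H), b: (4H,);
    rows are stacked in the (assumed) gate order f, i, c~, o. Returns (T, H)."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    states = []
    for x_t in X:
        z = W @ x_t + U @ h + b
        f, i, o = _sigmoid(z[:H]), _sigmoid(z[H:2 * H]), _sigmoid(z[3 * H:])
        c_tilde = np.tanh(z[2 * H:3 * H])
        c = f * c + i * c_tilde            # cell-state update
        h = o * np.tanh(c)                 # hidden state
        states.append(h)
    return np.stack(states)

def bilstm(X, fwd, bwd):
    """Concatenate forward and time-aligned backward hidden states, as in Eq. (10)."""
    h_fwd = lstm_pass(X, *fwd)                      # start -> end
    h_bwd = lstm_pass(X[::-1], *bwd)[::-1]          # end -> start, re-aligned to time t
    return np.concatenate([h_fwd, h_bwd], axis=1)   # shape (T, 2H)
```

The design choice to re-reverse the backward states matters: without it, row t of the output would pair the forward state at t with the backward state at T − 1 − t.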

2.4. Attention Mechanism Module

In the hybrid model, the attention mechanism bridges the time-series features and the prediction target, enhancing the model's ability to capture nonlinear patterns by dynamically focusing on critical information [37,38]. Its core idea is to express how much attention the model pays to different parts of the input through weights (or scores). These weights are learned so that they reflect the importance of each part of the input for the current task. The working principle of the attention mechanism is illustrated in Figure 3.

The mathematical model of the attention mechanism usually comprises the following steps:

Calculating weights

For each time step t, a relevance score e_t is computed from the hidden state h_t, as shown in Equation (11):

e_{t} = \tanh (W_{h} \cdot h_{t} + b_{h})

(11)

Normalization

The softmax function normalizes the scores to obtain the weight of each time step, as shown in Equation (12):

α_{t} = \frac{\exp (e_{t})}{\sum_{τ = 1}^{T} \exp (e_{τ})}

(12)

Weighted summation

The hidden states are then summed using these weights to obtain the final attention output, as shown in Equation (13):

c = \sum_{t = 1}^{T} α_{t} \cdot h_{t}

(13)

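where e_t denotes the attention score of time step t; h_t is the hidden representation produced by the BiLSTM layer at time step t; W_h and b_h are learnable parameters (W_h acts as a weight vector, so that e_t is a scalar); α_t denotes the normalized attention weight; T is the number of time steps; and c denotes the weighted context vector.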
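Equations (11)–(13) amount to attention pooling over the BiLSTM outputs. The following minimal NumPy sketch assumes that W_h is a learnable weight vector (so that each e_t is a scalar) and subtracts the maximum score before exponentiating, a standard trick that leaves the softmax unchanged but avoids overflow. The function name and shapes are illustrative.

```python
import numpy as np

def attention_pool(H, w, b=0.0):
    """Attention pooling over time: H is (T, d), w is (d,), b is a scalar bias."""
    e = np.tanh(H @ w + b)            # one score e_t per time step, Eq. (11)
    a = np.exp(e - e.max())           # softmax of Eq. (12), shifted for numerical stability
    a /= a.sum()
    c = a @ H                         # context vector c = sum_t a_t * h_t, Eq. (13)
    return c, a
```

When `w` is all zeros every score is equal, the weights are uniform, and the context vector reduces to the plain average of the hidden states; learned weights let the model move away from that average toward informative time steps.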

3. Natural Gas Price Prediction Model Based on VMD-CNN-BiLSTM-Attention

3.1. Data Preparation

This study focuses on the natural gas futures contract (trading symbol: NG) listed on the New York Mercantile Exchange (NYMEX), a cornerstone instrument of the global natural gas derivatives market that is highly significant for both price benchmarking and market liquidity [39,40,41,42]. As the most actively traded natural gas futures contract worldwide, it serves as a critical pricing anchor for market participants, including producers, utilities, traders, and financial institutions, providing transparent price signals that guide investment decisions, risk-management strategies, and long-term supply contracts. Its enduring influence stems from a combination of institutional recognition, deep market participation (with average daily trading volumes often in the hundreds of thousands of contracts), and well-established clearing and settlement mechanisms that ensure market integrity.

To ensure robustness and generalizability, two datasets were utilized in this research. The first dataset consists of the daily closing prices of NYMEX natural gas futures, which forms the main data source for model training and evaluation. The second dataset comprises Henry Hub natural gas spot prices, which were employed for robustness verification. The NYMEX futures price captures the expectations of market participants in derivative trading, while the Henry Hub spot price represents the actual physical delivery market, reflecting real supply–demand conditions. Although these two price series are correlated, they exhibit distinct volatility and market behaviors, enabling a comprehensive evaluation of model stability across different market environments.

The daily NYMEX price data were obtained from Wind Information (https://www.wind.com.cn/ (accessed on 16 August 2025)) and cover the period from 3 April 1990 to 17 June 2025, totaling 8816 valid observations. For consistency, a chronological, time-based partitioning strategy was applied to both datasets, allocating approximately 80% of the observations to the training set and the remaining 20% to the out-of-sample test set. The NYMEX training subset spans 3 April 1990 to 30 April 2018 (7023 samples), while the test subset covers 1 May 2018 to 17 June 2025 (1793 samples). For the Henry Hub dataset, a comparable split was used: 7 January 1997 to 7 January 2018 for training, and 8 January 2018 to 7 August 2024 for testing.

To avoid information leakage, all normalization parameters (e.g., minimum and maximum values for min–max scaling) were computed exclusively from the training subset and subsequently applied to the test subset. Similarly, the VMD decomposition, feature extraction, and network parameter tuning were conducted only within the training phase. The Henry Hub robustness experiment was implemented as a fully independent re-training process using the same model architecture and hyperparameter configuration as the main experiment.

This partitioning approach ensures that the model strictly follows the time-series forecasting principle of “predicting the unseen future based on historical information” and that the reported results accurately represent out-of-sample predictive performance.
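As a sketch of the chronological partition described above, assuming the observations are held in date-sorted lists, the cut point can be found by binary search on the first test date (1 May 2018 for the NYMEX series). The helper name is hypothetical; the point is that no shuffling occurs, so every training observation precedes every test observation.

```python
from bisect import bisect_left
from datetime import date

def split_by_date(dates, values, first_test_day):
    """Chronological split: observations dated before first_test_day form the
    training set, the rest form the test set. `dates` must be sorted ascending."""
    cut = bisect_left(dates, first_test_day)      # index of the first test observation
    return (dates[:cut], values[:cut]), (dates[cut:], values[cut:])
```

For the NYMEX series this would be called as `split_by_date(dates, prices, date(2018, 5, 1))`, where `dates` and `prices` are the loaded observation lists.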

3.2. Data Normalization

To stabilize neural network training and improve prediction accuracy, we normalized the NYMEX prices [43]. Several normalization methods can facilitate neural network training; this study uses min–max scaling, which maps the data to the range [0, 1] using the following equation, where min(X) and max(X) are computed from the training subset:

x_{scaled} = \frac{x - \min (X)}{\max (X) - \min (X)}

(14)

3.3. Inverse Normalized Value

Because the model is trained on, and outputs, normalized values, its predictions are dimensionless positions relative to the training range rather than actual prices (test values outside the training range can even fall below 0 or above 1). Inverse normalization must therefore be applied whenever the practical meaning and magnitude of the data are required [44], using the following formula:

x_{original} = x_{scaled} \times (\max (X) - \min (X)) + \min (X)

(15)
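A minimal sketch of Equations (14) and (15) follows, with the minimum and maximum taken from the training subset only, as required by the leakage-avoidance rule in Section 3.1. The class name is hypothetical (scikit-learn's `MinMaxScaler` follows the same fit, transform, and inverse_transform pattern). Test values outside the training range map outside [0, 1], and the inverse transform recovers them exactly.

```python
import numpy as np

class MinMaxScaler1D:
    """Min-max scaling of Eq. (14) and its inverse, Eq. (15); fit on training data only."""

    def fit(self, train):
        train = np.asarray(train, dtype=float)
        self.lo, self.hi = train.min(), train.max()   # min(X), max(X) from the training subset
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.lo) / (self.hi - self.lo)

    def inverse_transform(self, x_scaled):
        return np.asarray(x_scaled, dtype=float) * (self.hi - self.lo) + self.lo
```

Typical use is `scaler = MinMaxScaler1D().fit(train_prices)`, then `scaler.transform(...)` on both subsets and `scaler.inverse_transform(...)` on the model outputs before computing the error metrics.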

3.4. Model Construction

Figure 4 presents the core workflow of the model. Starting from the raw natural gas price data, VMD first decomposes the price sequence into multiple intrinsic mode function (IMF) components, completing multiscale feature extraction and data preprocessing. The process then branches into two paths: one normalizes each IMF component independently, while the other processes the original price sequence directly. Both paths feed the data into a deep model composed of a 1D convolutional layer, a max pooling layer, a BiLSTM layer, an attention mechanism, a flattening layer, and a fully connected layer, which mines the sequence features and outputs the natural gas price predictions. This completes the workflow of data preprocessing, feature extraction, and model prediction. The model parameters are listed in Table 1, and part of the code is provided in the Supplementary Materials.

3.5. Evaluation Metrics

To evaluate the natural gas price prediction models, this study uses a standard set of metrics to test model performance systematically and ensure that the prediction results are reliable and valid. The selected metrics are the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the coefficient of determination (R²). Analyzing these metrics jointly quantifies the prediction accuracy and goodness of fit of the model from different perspectives, giving a comprehensive evaluation of its predictive capability and stability.

Together, these metrics cover relative error (MAPE), absolute error (MSE, RMSE, and MAE), and goodness of fit (R²), so that the model can be assessed both on short-term fluctuations in natural gas prices and on longer-term trends. They are defined as follows, where n is the number of samples, y_i and ŷ_i are the actual and predicted values of the i-th sample, and ȳ is the mean of the actual values:

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}

(16)

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

(17)

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |

(18)

MAPE = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{y_{i} - {\hat{y}}_{i}}{y_{i}} \right| \times 100\%

(19)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(20)
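Equations (16)–(20) translate directly into a few lines of NumPy. This sketch (the function name is assumed) computes all five metrics for one forecast horizon; MAPE is undefined when an actual value is zero, which is not expected for a positive price series.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (%) and R^2 as defined in Eqs. (16)-(20)."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    err = y - y_hat
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "MAPE": np.mean(np.abs(err / y)) * 100,              # percent; undefined if any y == 0
        "R2": 1 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
    }
```

These metrics should be computed on inverse-normalized values (Equation (15)) so that MSE, RMSE, and MAE are in the original price units.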

4. Results and Discussion

The VMD algorithm cannot be applied directly to the daily natural gas price sequence, because in addition to the price data it requires the number of modes K to be specified beforehand. The existing literature has proposed various methods for determining K [45,46]. In VMD studies of energy prices (e.g., natural gas and crude oil) and financial time series, the choice of K must balance the effectiveness of the decomposition against model efficiency, and in academic and engineering practice K is commonly restricted to the range 3–7.

For daily data, K = 5 is a commonly adopted intermediate value that has been validated in numerous empirical studies [47,48]. To further justify this choice, a sensitivity analysis was conducted with K = 3, 5, and 7. As summarized in Table 2, increasing K improved forecasting performance, with the average RMSE decreasing from 0.1965 (K = 3) to 0.1288 (K = 7) and R² increasing from 98.50% to 99.26%. However, when K exceeds 5, the number of decomposed IMF components increases considerably, which substantially raises model training cost and complexity, while the incremental improvement in accuracy becomes marginal.

Therefore, K = 5 provides an appropriate balance between the integrity of feature extraction and computational feasibility. It allows the model to capture the multiscale characteristics of natural gas price dynamics, from short-term high-frequency fluctuations (e.g., day-to-day supply–demand disturbances) to medium- and long-term variations (e.g., seasonal cycles and macroeconomic influences), without introducing redundant components. Accordingly, this study adopts K = 5 for the VMD decomposition. The decomposition results for daily NYMEX prices are shown in Figure 5.

As shown in Table 3, this decomposition reduces redundant information between components while preserving all fluctuation scales of practical significance, supporting K = 5 as a suitable choice for decomposing the original price sequence. This preprocessing step mitigates non-stationarity by converting the complex raw data into a set of more stationary and interpretable IMF components, and it lays the foundation for targeted feature extraction in the subsequent VMD-CNN-BiLSTM-attention prediction model.

4.1. Evaluations of Various Models

As shown in Table 4, across the one-step to four-step-ahead forecasts, the VMD-CNN-BiLSTM-attention model consistently outperforms the CNN-BiLSTM-attention model on all evaluation metrics, confirming the effectiveness of VMD-based signal preprocessing in improving short-term forecasting accuracy. In the one-step-ahead task, the VMD-CNN-BiLSTM-attention model achieves an MSE of 0.0169, RMSE of 0.1301, MAE of 0.0823, MAPE of 2.45%, and R² of 99.35%. In contrast, the CNN-BiLSTM-attention model exhibits higher errors (MSE = 0.0325, RMSE = 0.1802, MAE = 0.1152, MAPE = 3.13%) and a lower goodness of fit (R² = 98.75%).

When the forecasting horizon is extended to two, three, and four steps ahead, both models show gradually increasing errors and a slight decrease in R², consistent with the inherent growth of uncertainty with the forecasting horizon in multistep time-series forecasting. Nevertheless, the VMD-CNN-BiLSTM-attention model maintains superior stability and accuracy. In the two-step-ahead task, it achieves MSE = 0.0213, RMSE = 0.1460, MAE = 0.0941, MAPE = 2.79%, and R² = 99.18%. In the three-step-ahead task, it records MSE = 0.0247, RMSE = 0.1572, MAE = 0.1013, MAPE = 2.99%, and R² = 99.05%. Even in the four-step-ahead task, it retains a low MAPE of 3.46% and a high R² of 98.81%, whereas the CNN-BiLSTM-attention model's MAPE increases to 6.21% and its R² drops to 95.74%.
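To put the gap described above in relative terms, the error reduction of the proposed model against the CNN-BiLSTM-attention model can be computed directly from the values quoted in this section. This is a small arithmetic sketch; because the inputs are rounded, the resulting percentages are approximate.

```python
# Values quoted above: (VMD-CNN-BiLSTM-attention, CNN-BiLSTM-attention)
one_step = {
    "MSE": (0.0169, 0.0325),
    "RMSE": (0.1301, 0.1802),
    "MAE": (0.0823, 0.1152),
    "MAPE (%)": (2.45, 3.13),
}
four_step_mape = (3.46, 6.21)

def relative_reduction(proposed, baseline):
    """Percentage by which the proposed model's error lies below the baseline's."""
    return (baseline - proposed) / baseline * 100

for name, (proposed, baseline) in one_step.items():
    print(f"1-step {name}: {relative_reduction(proposed, baseline):.1f}% lower")
print(f"4-step MAPE: {relative_reduction(*four_step_mape):.1f}% lower")
```

The relative reduction in MAPE is larger at four steps than at one step, which is another way of expressing the slower error growth of the VMD-based model.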

This performance gap between the proposed hybrid model and the benchmark suggests that VMD effectively decomposes the original price sequence into IMFs with distinct frequency characteristics. This enables the subsequent deep learning framework to capture frequency-specific fluctuation patterns (e.g., ultra-short-term volatility and short-term supply–demand disturbances) more accurately, while limiting the interference of high-frequency noise with long-term trend learning. This finding is consistent with Zhang et al. (2025) [49], who showed that integrating multiple feature representations enhances model robustness and forecasting stability. The proposed model therefore offers a more reliable solution for multistep short-term price forecasting.

To clarify the multistep forecasting protocol: we adopt a direct strategy with a separate model for each prediction horizon. For horizons from one to four steps ahead, we do not use a recursive approach (i.e., predicting the next step and then feeding that prediction back to predict subsequent steps). Instead, we train an independent model for each forecasting horizon. Specifically, with an input sequence length of 14 days, we construct training and test samples that target each future day n (n = 1, 2, 3, 4) and train a distinct CNN-BiLSTM-attention model (with or without VMD preprocessing) for each n. Each model takes a historical sequence of the same length as input but is optimized to output the prediction for its designated future day directly. This lets each model learn the patterns specific to its forecasting horizon rather than relying on error-prone recursive predictions.
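The direct strategy can be sketched as a windowing function that builds a separate (input, target) dataset for each horizon n, with one model then trained per dataset. The function name is hypothetical; it operates on a single series (for example, one normalized IMF or the normalized price sequence), and the 14-day lookback is taken from the text.

```python
import numpy as np

def make_direct_samples(series, lookback=14, horizon=1):
    """Direct n-step-ahead samples: X[j] = series[j : j + lookback],
    y[j] = series[j + lookback + horizon - 1] (the value `horizon` days after the window)."""
    s = np.asarray(series, dtype=float)
    n_samples = len(s) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this lookback/horizon")
    X = np.stack([s[j:j + lookback] for j in range(n_samples)])
    y = s[lookback + horizon - 1: lookback + horizon - 1 + n_samples]
    return X, y
```

One dataset per horizon is then obtained with `for n in range(1, 5): X_n, y_n = make_direct_samples(series, 14, n)`. Because every target is an observed value, no prediction is ever fed back as an input, which is exactly what distinguishes the direct strategy from recursive forecasting.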

4.2. Discussion

As shown in Figure 6, the prediction curve of the VMD-CNN-BiLSTM-attention model fits the actual price curve most closely. This visual result agrees with the earlier quantitative analysis of the error metrics, which identified this model as the best performer, and further confirms its reliability and superiority in the short-term prediction of NYMEX natural gas futures prices.

In the comparative experiment on 1–4-day short-term price prediction, the performance differences between the VMD-CNN-BiLSTM-attention and CNN-BiLSTM-attention models were analyzed in depth. In the one-day-ahead scenario, both models capture the temporal evolution of prices, but the hybrid model with VMD performs better. Its core advantage is that VMD performs a multiscale mode decomposition of the nonstationary futures price sequence, which separates noise and decouples fluctuations at different frequencies in the original data. This provides cleaner and more representative input features for the subsequent local feature extraction by the CNN module, long-term dependency capture by the BiLSTM module, and key-information focusing by the attention mechanism, ultimately yielding a closer fit between the predicted and actual prices.
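To make the "local feature extraction" step concrete, the sketch below applies one filter of the Conv1D stage (kernel size 3, "same" padding, ReLU) and the MaxPooling1D stage (pool size 2) listed in Table 1 to a single 14-day window. The window values and the hand-picked difference kernel are hypothetical; the trained model learns 64 such kernels rather than using a fixed one.

```python
from typing import List, Sequence


def conv1d_same_relu(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """Apply one Conv1D filter with zero 'same' padding, then ReLU.

    Like deep-learning libraries, this computes a cross-correlation
    (the kernel is not flipped). An odd kernel size keeps the output
    length equal to the input length.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    out: List[float] = []
    for i in range(len(x)):
        s = bias + sum(w * v for w, v in zip(kernel, padded[i:i + k]))
        out.append(max(0.0, s))  # ReLU activation
    return out


def max_pool1d(x: Sequence[float], pool: int = 2) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + pool]) for i in range(0, len(x) - pool + 1, pool)]


# Hypothetical 14-day window and a difference kernel (x[t+1] - x[t-1])
# that responds to local price rises.
window = [3.0, 3.1, 2.9, 3.4, 3.3, 3.2, 3.8, 3.6, 3.5, 3.5, 3.9, 4.2, 4.0, 4.1]
edge = conv1d_same_relu(window, kernel=[-1.0, 0.0, 1.0])
pooled = max_pool1d(edge, pool=2)
print(len(window), len(edge), len(pooled))  # 14 14 7
```

"Same" padding preserves the 14-step length, and pooling halves it to 7 before the sequence reaches the BiLSTM layer.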

When the prediction horizon is extended to 2–4 days, the prediction errors of both models increase, which is consistent with the general rule in time series prediction that error grows with the prediction horizon. However, the errors of the VMD-CNN-BiLSTM-attention model grow much more slowly: from the one-step to the four-step horizon its MAPE rises from 2.45% to 3.46%, whereas that of the CNN-BiLSTM-attention model rises from 3.13% to 6.21% (Table 4). Its advantage in predicting price inflection points is particularly prominent in the periodic fluctuation scenarios covered by the dataset (such as medium-term fluctuations driven by seasonal supply–demand imbalances and short-term pulse fluctuations caused by unexpected event shocks).

5. Robustness Test

5.1. The Various Forecasting Time Frames

To validate the stability of the proposed natural gas price prediction models across different time horizons, this study extends the prediction horizon to 5, 6, 7, and 8 steps ahead and conducts a comparative evaluation of the VMD-CNN-BiLSTM-attention model and the CNN-BiLSTM-attention model using five key metrics: MSE, RMSE, MAE, MAPE, and $R^{2}$.
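The five metrics can be computed as in the sketch below, assuming their standard textbook definitions (Section 3.5 of the article gives the exact formulas used), with MAPE expressed in percent and $R^{2}$ as a fraction; the tables report $R^{2}$ as a percentage. The function name `evaluate` and the sample values are hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluate(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return MSE, RMSE, MAE, MAPE (%) and R^2 (fraction) for one forecast series."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the actual value, so it assumes no zero prices.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_a = sum(actual) / n
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant actual series")
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


print(evaluate([2.0, 4.0, 5.0, 3.0], [2.5, 3.5, 5.0, 3.0]))
```

Metrics such as these are normally computed after inverse normalization, so that MSE, RMSE, and MAE are in price units.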

As presented in Table 5, the VMD-CNN-BiLSTM-attention model consistently outperformed the CNN-BiLSTM-attention model across all prediction horizons. Specifically, for the VMD-CNN-BiLSTM-attention model, MSE ranges from 0.0288 to 0.0428, RMSE from 0.1696 to 0.2068, MAE from 0.1103 to 0.1367, MAPE from 3.24% to 3.92%, and $R^{2}$ from 98.36% to 98.89%. In contrast, the CNN-BiLSTM-attention model exhibited much higher error metrics (MSE: 0.1388–0.2143, RMSE: 0.3726–0.4630, MAE: 0.2472–0.3097, MAPE: 7.03–8.89%) and a lower $R^{2}$ (91.78–94.67%). As the prediction horizon extends from 5 to 8 steps ahead, both models deteriorate, but the VMD-CNN-BiLSTM-attention model only marginally (MAPE from 3.24% to 3.92%), whereas the CNN-BiLSTM-attention model degrades more sharply (MAPE from 7.03% to 8.89%). This suggests that integrating VMD enhances the model's ability to capture long-term temporal dependencies in the natural gas price series, thereby improving its robustness across extended prediction horizons.

5.2. Alternative Proxy Variables for Natural Gas Prices

In this section, an alternative proxy variable for natural gas prices, the Henry Hub natural gas spot price, is examined. The data are partitioned following the prevailing convention of allocating four-fifths of the observations for training and one-fifth for testing. As the evaluation results in Table 6 show, within the 1-step to 4-step forecasting horizon the VMD-CNN-BiLSTM-attention model clearly outperforms the CNN-BiLSTM-attention model, with a MAPE of 3.83–4.43% versus 5.21–9.97% and an $R^{2}$ of 93.11–95.00% versus 78.54–87.72%.

5.3. A Comparative Analysis of the Prognostication Horizon Dimensions

Rossi and Inoue (2012) showed that out-of-sample forecast evaluations can differ markedly depending on how the estimation window is chosen, which makes the choice of window size a critical issue for effective evaluation [50]. When defining and partitioning the time range of the dataset for this analysis, the standard time series forecasting practice of using 4/5 of the data for training and 1/5 for testing was strictly followed. The training set consisted of 5101 samples from 4 January 2000 to 18 May 2020 and was used for parameter learning and model training. The test set included 1274 samples from 19 May 2020 to 27 June 2025 and was used to independently verify the generalization ability of the model. Because every test observation lies after every training observation, this partitioning satisfies the core requirement of time series analysis, namely predicting the future from historical data, and avoids the risk of leakage from future observations.
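A minimal sketch of a chronological 4/5 to 1/5 split, assuming the samples are already sorted by date. The function name is hypothetical, and a plain 80% cut is not guaranteed to reproduce the exact sample counts reported above.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chronological_split(samples: Sequence[T], train_frac: float = 0.8) -> Tuple[List[T], List[T]]:
    """Split time-ordered samples without shuffling.

    The first `train_frac` of the observations form the training set and
    the remainder the test set, so every test sample is later in time
    than every training sample.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must lie strictly between 0 and 1")
    cut = int(len(samples) * train_frac)
    return list(samples[:cut]), list(samples[cut:])


# Hypothetical trading-day indices standing in for dated observations.
train, test = chronological_split(list(range(10)))
print(train, test)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

In a setup like this, normalization statistics (such as the minimum and maximum used for [0, 1] scaling) would typically be computed from the training portion only, so that the test period does not inform the model.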

As shown in Table 7, in the 1-step to 4-step forecasting tasks, the VMD-CNN-BiLSTM-attention model clearly outperformed the CNN-BiLSTM-attention model. It achieves lower MSE, RMSE, MAE, and MAPE (MAPE of 3.28–4.19% versus 4.46–7.75%) and a higher $R^{2}$ (98.64–99.10% versus 94.71–98.42%), indicating a stronger ability to capture data patterns. Both models show a consistent trend: as the forecasting horizon increases, the prediction error increases and the goodness of fit decreases, more gradually for the VMD-based model. This is in line with the general rule in time series forecasting that short-term forecasts are more accurate than longer-term ones.

5.4. Comparative Analysis of Forecasting Performance: Benchmarked Against ARIMA

To comprehensively evaluate the predictive performance of the VMD-CNN-BiLSTM-attention and CNN-BiLSTM-attention models, this study selected the Autoregressive Integrated Moving Average (ARIMA) model as the benchmark model (Table 8). The ARIMA model is widely applied and well established in time series forecasting; through differencing and its autoregressive and moving-average terms, it captures trend and linear autocorrelation structure, thus providing a reliable reference for comparing the performance of other models [51,52,53].

Table 8 presents the prediction results of each model alongside the ARIMA benchmark, covering key evaluation metrics such as MSE and MAE. By comparing these metrics, this study assessed the applicability, advantages, and disadvantages of the VMD-CNN-BiLSTM-Attention, CNN-BiLSTM-Attention, and ARIMA models on this dataset. This provides empirical evidence for subsequent model selection and parameter tuning, and a reference for researchers exploring the practical value of different models. Through parameter optimization, the optimal order of the ARIMA model was determined to be (p, d, q) = (2, 1, 5).
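For reference, the selected order can be written out in the standard lag-operator form, where $L$ is the lag operator ($L y_{t} = y_{t-1}$), $y_{t}$ is the daily closing price, $\phi_{i}$ and $\theta_{j}$ are the autoregressive and moving-average coefficients, and $\varepsilon_{t}$ is white noise. This is the generic ARIMA(2, 1, 5) specification rather than the authors' fitted equation; whether their model included a constant (drift) term is not stated, so it is omitted here as an assumption.

$$
\left(1 - \phi_{1}L - \phi_{2}L^{2}\right)(1 - L)\,y_{t} = \left(1 + \theta_{1}L + \theta_{2}L^{2} + \theta_{3}L^{3} + \theta_{4}L^{4} + \theta_{5}L^{5}\right)\varepsilon_{t}
$$

Equivalently, the first difference $\Delta y_{t} = y_{t} - y_{t-1}$ follows an ARMA(2, 5) process:

$$
\Delta y_{t} = \phi_{1}\,\Delta y_{t-1} + \phi_{2}\,\Delta y_{t-2} + \varepsilon_{t} + \sum_{j=1}^{5} \theta_{j}\,\varepsilon_{t-j}
$$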

As shown in Table 8, all three models exhibit the same pattern: as the forecasting horizon lengthens, errors increase and goodness of fit decreases, in line with the general rule in time series forecasting that short-term forecasts are more accurate than longer-term ones. However, the VMD-CNN-BiLSTM-Attention model shows the smallest deterioration. It is the only model that keeps MAPE within 3.5% even in four-step forecasting, making it better suited to short-to-medium-term price prediction needs in practical business scenarios.
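The size of this deterioration can be checked directly from Table 8. The short script below copies the MAPE and $R^{2}$ values (both in percent) for horizons 1 to 4 and reports the change from the one-step to the four-step forecast in percentage points; the dictionary layout and function name are illustrative only.

```python
from typing import Dict, List

# MAPE (%) and R^2 (%) at horizons 1..4, copied from Table 8.
TABLE8: Dict[str, Dict[str, List[float]]] = {
    "VMD-CNN-BiLSTM-Attention": {"MAPE": [2.45, 2.79, 2.99, 3.46], "R2": [99.35, 99.18, 99.05, 98.81]},
    "CNN-BiLSTM-Attention": {"MAPE": [3.13, 4.40, 5.32, 6.21], "R2": [98.75, 97.79, 96.78, 95.74]},
    "ARIMA": {"MAPE": [13.02, 13.49, 14.03, 14.50], "R2": [83.80, 82.55, 81.22, 79.95]},
}


def degradation(metrics: Dict[str, List[float]]) -> Dict[str, float]:
    """Change from the one-step to the four-step forecast, in percentage points."""
    # Rounding to 2 decimals removes floating-point noise from the subtraction.
    return {name: round(vals[-1] - vals[0], 2) for name, vals in metrics.items()}


summary = {model: degradation(m) for model, m in TABLE8.items()}
for model, d in summary.items():
    print(f"{model}: MAPE {d['MAPE']:+.2f} pp, R2 {d['R2']:+.2f} pp")
```

The VMD-based model has the smallest change on both measures (MAPE +1.01 pp, $R^{2}$ -0.54 pp), compared with +3.08/-3.01 pp for CNN-BiLSTM-Attention and +1.48/-3.85 pp for ARIMA, whose errors also start from a much higher level.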

6. Conclusions

This study focuses on accurately predicting the daily prices of NYMEX natural gas futures. Addressing the limitation that traditional single models struggle to handle the nonlinearity, high volatility, and multi-scale characteristics of price series, this study proposes a hybrid prediction model, VMD-CNN-BiLSTM-attention, which integrates Variational Mode Decomposition (VMD), Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), and an attention mechanism. Using daily closing price data from 3 April 1990 to 27 June 2025, comparative experiments for one-step to four-step forecasting were conducted between the proposed model and two benchmark models (CNN-BiLSTM-Attention and ARIMA). The key conclusions are as follows.

1. The VMD-CNN-BiLSTM-attention model significantly improves prediction accuracy and stability

The empirical results demonstrate that the proposed VMD-CNN-BiLSTM-Attention model consistently outperforms the benchmark CNN-BiLSTM-Attention model across all key performance indicators, including MSE, RMSE, MAE, and MAPE. Over the one-step to four-step forecasting horizons, the proposed model maintains low prediction errors, with MAPE ranging from 2.45% to 3.46% and $R^{2}$ remaining at or above 98.81%. Moreover, the model exhibits the smallest performance degradation as the forecasting horizon increases: its MAPE increases by only 1.01 percentage points (from 2.45% in one-step forecasting to 3.46% in four-step forecasting), roughly one-third of the increase for the CNN-BiLSTM-Attention model (3.08 percentage points, from 3.13% to 6.21%). These findings confirm that the proposed hybrid model has a stronger capability to capture the complex nonlinear dynamics of natural gas prices and provides more stable and reliable short- to medium-term forecasting performance. The synergy of multi-technique integration is the core source of the model's advantages.

These results imply that incorporating signal decomposition and feature-fusion mechanisms can substantially enhance the adaptability of forecasting systems to nonstationary energy markets. This has important implications for energy trading strategies, risk management, and operational planning, where timely and precise short-term forecasts can reduce financial uncertainty and support data-driven decision-making.

The functional complementarity of the model's components significantly enhances the prediction effect. VMD decomposes the original price series into several band-limited Intrinsic Mode Functions (IMFs), separating fluctuations at different frequencies and reducing the impact of the data's nonlinearity and nonstationarity on the model; the CNN extracts local features of each IMF (e.g., short-term price jump signals); the BiLSTM captures long-term temporal dependencies in the series (e.g., medium-term price trends); and the attention mechanism assigns higher weights to key time steps, further strengthening the model's ability to focus on the core drivers of price fluctuations.
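The attention step can be illustrated with a small pooling sketch over per-time-step hidden vectors such as BiLSTM outputs. Table 1 only specifies "custom time-step attention", so the scoring function used here (a tanh of a dot product with a scoring vector `w`, one common additive-style choice) and all names and values are assumptions, not the authors' layer.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def time_step_attention(
    hidden: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Attention pooling over time steps.

    hidden: one vector per time step (e.g. BiLSTM outputs).
    w: scoring vector (hypothetical here; learned during training).
    Returns the attention weights and the weighted-sum context vector.
    """
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, h))) for h in hidden]
    alpha = softmax(scores)
    dim = len(hidden[0])
    context = [sum(a * h[j] for a, h in zip(alpha, hidden)) for j in range(dim)]
    return alpha, context


# Three time steps with 2-dimensional hidden states (hypothetical values).
states = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8]]
weights, context = time_step_attention(states, w=[1.0, 1.0])
print([round(a, 3) for a in weights], [round(c, 3) for c in context])
```

The weights sum to one, so the context vector is a convex combination of the hidden states that leans toward the time steps with the highest scores.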

This “decomposition–feature extraction–temporal modeling–key information enhancement” pipeline demonstrates a systematic framework that can be generalized to other financial or commodity markets exhibiting multi-frequency volatility. In particular, it provides methodological insights for integrating data-driven deep learning with signal-processing theory to handle multi-scale time series.

This integrated design addresses a shortcoming of traditional single models, which easily miss key patterns when handling complex series on their own, and is the fundamental reason for the model's leading performance.

2. Research limitations and future optimization directions

This study constructs the model from single-series price data only and does not incorporate external factors that may affect natural gas prices (e.g., U.S. EIA inventory data, weather indices, and spillovers from crude oil prices). In future work, multisource features could be introduced to further improve the generalization ability of the model. In addition, the number of VMD modes (set to five in this study) was determined empirically; future research could optimize the mode decomposition with adaptive algorithms, or use an attention mechanism to fuse the predictions of different IMFs with learned weights, thereby further exploring the value of multimodal data.

Expanding the model to include cross-market and exogenous indicators will also enhance its applicability to dynamic energy systems forecasting, contributing to more resilient and data-informed policy and investment decisions.

The relatively high $R^{2}$ values reported in Table 2 can be explained by the characteristics of the dataset and the model design. Specifically, the model uses the original price series as input rather than log returns or differenced values. Natural gas prices exhibit strong autocorrelation and trend persistence; therefore, the high $R^{2}$ primarily reflects the model's ability to track the persistent price level rather than to capture short-term volatility. The VMD component removes high-frequency noise and preserves the main trend, while the CNN layer extracts local features and filters out short-term fluctuations. As a result, the predicted series is a smoothed curve that closely overlaps with the actual price level. Because the total variation of a persistent price level over the test period is large relative to the forecast residuals, the ratio of residual to total variation in the $R^{2}$ computation approaches zero and $R^{2}$ approaches unity; the RMSE therefore remains a more reliable indicator of predictive accuracy. Normalizing the data to the [0, 1] range does not by itself raise $R^{2}$: min–max scaling is an affine transformation, and applying the same transformation to the actual and predicted values scales the residual and total sums of squares by the same factor, leaving $R^{2}$ unchanged.
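Whether normalization affects $R^{2}$ can be checked numerically. The sketch below computes $R^{2}$ on hypothetical prices and forecasts, then again after applying one min–max scaler (fitted on the actual series) to both; because the scaling is affine, the two values agree up to floating-point rounding. The function names and data are illustrative assumptions.

```python
from typing import List, Sequence


def r2(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def min_max_scale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Map values toward [0, 1] using bounds lo and hi (an affine transform)."""
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical prices and forecasts.
actual = [2.1, 2.4, 2.2, 2.9, 3.3, 3.1]
predicted = [2.0, 2.5, 2.3, 2.7, 3.2, 3.2]
lo, hi = min(actual), max(actual)  # one scaler, fitted on the actual series
raw = r2(actual, predicted)
scaled = r2(min_max_scale(actual, lo, hi), min_max_scale(predicted, lo, hi))
print(f"R2 in price units: {raw:.6f}, R2 after [0, 1] scaling: {scaled:.6f}")
```

Scaled forecasts may fall slightly outside [0, 1]; that does not affect the comparison.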

In conclusion, the VMD-CNN-BiLSTM-attention model provides an efficient and feasible technical solution for the short-to-medium-term forecasting of natural gas futures prices. The multi-technique integration design also offers a reference for forecasting research on other highly nonlinear time series (e.g., crude oil and electricity prices), suggesting broad applicability.

Supplementary Materials

The supplementary materials for this study are hosted in a Kaggle Notebook: https://www.kaggle.com/code/liuxuhui/notebookdb70dc3973 (accessed on 12 September 2025).

Author Contributions

Writing—original draft preparation, Z.L.; conception and writing—review and editing, Q.J.; methodology, J.H.; data collection, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by the Jilin Provincial Department of Science and Technology, with the project number YDZJ202501ZYTS600.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Acknowledgments

We are very grateful to the academic editors and reviewers for their valuable suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhou, C.; Lin, M.; Feng, M.; Liu, H.; Yang, Z.; Zhang, G.; Yang, C.; Guan, C.; Liang, Y.; Wang, Y.; et al. Development, challenges and strategies of natural gas industry under carbon neutral target in China. Pet. Explor. Dev. 2024, 51, 476–497.
2. Salehnia, N.; Falahi, M.A.; Seifi, A.; Adeli, M.H.M. Forecasting natural gas spot prices with nonlinear modeling using Gamma test analysis. J. Nat. Gas Sci. Eng. 2013, 14, 238–249.
3. Jing, H.; Zhang, S.; Zhao, D.; Wang, Z.; Liao, J.A.; Li, Z. Seismic Disaster Risk Assessment of Oil and Gas Pipelines. Appl. Sci. 2025, 15, 9135.
4. Tang, Y.; Chen, X.H.; Sarker, P.K.; Baroudi, S. Asymmetric effects of geopolitical risks and uncertainties on green bond markets. Technol. Forecast. Soc. Change 2023, 189, 122348.
5. Pavlík, M.; Kurimský, F.; Ševc, K. Renewable Energy and Price Stability: An Analysis of Volatility and Market Shifts in the European Electricity Sector (2015–2025). Appl. Sci. 2025, 15, 6397.
6. Wang, T.; Zhang, D.; Broadstock, D.C. Financialization, fundamentals, and the time-varying determinants of US natural gas prices. Energy Econ. 2019, 80, 707–719.
7. Karkowska, R.; Urjasz, S. How does the Russian-Ukrainian war change connectedness and hedging opportunities? Comparison between dirty and clean energy markets versus global stock indices. J. Intern. Financ. Markets Instit. Money 2023, 85, 101768.
8. Zhang, S.; Wang, L. The Russia-Ukraine war, energy poverty, and social conflict: An analysis based on global liquified natural gas maritime shipping. Appl. Geogr. 2024, 166, 103263.
9. Zhao, Q.; Li, H.; Zhang, Q.; Wang, Y. A Study on Metal Futures Price Prediction Based on Piecewise Cubic Bézier Filtering for TCN. Appl. Sci. 2025, 15, 9792.
10. Zhu, M.; Qi, H.; Qin, P. IGWO-MALSTM: An Improved Grey Wolf-Optimized Hybrid LSTM with Multi-Head Attention for Financial Time Series Forecasting. Appl. Sci. 2025, 15, 6619.
11. Kong, J.; Zhao, X.; He, W.; Yang, X.; Jin, X. EL-MTSA: Stock Prediction Model Based on Ensemble Learning and Multimodal Time Series Analysis. Appl. Sci. 2025, 15, 4669.
12. Yu, L.; Wang, Z.; Tang, L. A decomposition–ensemble model with data-characteristic-driven reconstruction for crude oil price forecasting. Appl. Energy 2015, 156, 251–267.
13. Latif, M.; Herawati, S. The application of EEMD and neural network based on Polak–Ribière conjugate gradient algorithm for crude oil prices forecasting. In Proceedings of the MATEC Web of Conferences, Online, 23 May 2016; p. 03013.
14. Wang, Q.; Li, S.; Li, R.; Ma, M. Forecasting US shale gas monthly production using a hybrid ARIMA and metabolic nonlinear grey model. Energy 2018, 160, 378–387.
15. Banaś, J.; Utnik-Banaś, K. Evaluating a seasonal autoregressive moving average model with an exogenous variable for short-term timber price forecasting. For. Policy Econ. 2021, 131, 102564.
16. Mulwa, D.; Kazuzuru, B.; Misinzo, G.; Bett, B. Forecasting and Intervention Time Series Analysis Using Autoregressive Integrated Moving Average (ARIMA) Models: Evaluating the impact of 2018 and 2021 Rift Valley Fever Outbreaks on Kenyan food Price Index. Soc. Sci. Humanit. Open 2024.
17. Sung, J.; Shi, X.; Teske, S.; Li, M. Chinese natural gas phase-out pathways: A novel hybrid scenario-specific projection approach to achieve Net Zero. Energy 2025, 328, 136387.
18. Lai, Y.; Dzombak, D.A. Use of the autoregressive integrated moving average (ARIMA) model to forecast near-term regional temperature and precipitation. Weather Forecast. 2020, 35, 959–976.
19. Wei, X.; Ouyang, H. Carbon price prediction based on a scaled PCA approach. PLoS ONE 2024, 19, e0296105.
20. Ruslan, S.M.M.; Mokhtar, K. Stock market volatility on shipping stock prices: GARCH models approach. J. Econ. Asymm. 2021, 24, e00232.
21. Hailemariam, A.; Smyth, R. What drives volatility in natural gas prices? Energy Econ. 2019, 80, 731–742.
22. Mouchtaris, D.; Sofianos, E.; Gogas, P.; Papadimitriou, T. Forecasting natural gas spot prices with machine learning. Energies 2021, 14, 5782.
23. Su, M.; Zhang, Z.; Zhu, Y.; Zha, D. Data-driven natural gas spot price forecasting with least squares regression boosting algorithm. Energies 2019, 12, 1094.
24. Wang, J.; Cao, J.; Yuan, S.; Cheng, M. Short-term forecasting of natural gas prices by using a novel hybrid method based on a combination of the CEEMDAN-SE-and the PSO-ALS-optimized GRU network. Energy 2021, 233, 121082.
25. Shiri, F.M.; Perumal, T.; Mustapha, N.; Mohamed, R. A comprehensive overview and comparative analysis on deep learning models. J. Artif. Intell. 2024, 6, 301–360.
26. Wang, J.; Wang, J. A New Hybrid Forecasting Model Based on SW-LSTM and Wavelet Packet Decomposition: A Case Study of Oil Futures Prices. Comput. Intell. Neurosci. 2021, 2021, 7653091.
27. Jiang, S.; Zhao, X.-T.; Li, N. Predicting the monthly consumption and production of natural gas in the USA by using a new hybrid forecasting model based on two-layer decomposition. Environ. Sci. Pollut. Res. 2023, 30, 40799–40824.
28. Wang, P.Y.; Chen, C.T.; Su, J.W.; Wang, T.Y.; Huang, S.H. Deep learning model for house price prediction using heterogeneous data analysis along with joint self-attention mechanism. IEEE Access 2021, 9, 55244–55259.
29. Zhang, Y.; Chen, Y. Application of hybrid model based on CEEMDAN, SVD, PSO to wind energy prediction. Environ. Sci. Pollut. Res. 2022, 29, 22661–22674.
30. Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2013, 62, 531–544.
31. Lu, Q.; Liao, J.; Chen, K.; Liang, Y.; Lin, Y. Predicting natural gas prices based on a novel hybrid model with variational mode decomposition. Comput. Econ. 2024, 63, 639–678.
32. Li, J.; Wu, Q.; Tian, Y.; Fan, L. Monthly Henry Hub natural gas spot prices forecasting using variational mode decomposition and deep belief network. Energy 2021, 227, 120478.
33. Huang, L.; Yang, X.; Lai, Y.; Zou, A.; Zhang, J. Crude Oil Futures Price Forecasting Based on Variational and Empirical Mode Decompositions and Transformer Model. Mathematics 2024, 12, 4034.
34. Ren, G.; Wang, Y.; Shi, Z.; Zhang, G.; Jin, F.; Wang, J. Aero-engine remaining useful life estimation based on CAE-TCN neural networks. Appl. Sci. 2022, 13, 17.
35. Jiangyan, Z.; Ma, J.; Wu, J. A regularized constrained two-stream convolution augmented transformer for aircraft engine remaining useful life prediction. Eng. Appl. Artif. Intell. 2024, 133, 108161.
36. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
37. Larochelle, H.; Hinton, G.E. Learning to combine foveal glimpses with a third-order Boltzmann machine. Adv. Neural Inf. Process. Syst. 2010, 23, 1243–1251.
38. Mnih, V.; Heess, N.; Graves, A.; Kavukcuoglu, K. Recurrent models of visual attention. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 2, pp. 2204–2212.
39. Fernandes, L.H.; de Araújo, F.H.; Silva, I.E. The (in)efficiency of NYMEX energy futures: A multifractal analysis. Phys. A 2020, 556, 124783.
40. Niu, H.; Wang, J. Return volatility duration analysis of NYMEX energy futures and spot. Energy 2017, 140, 837–849.
41. Delagrammatikas, G. Offshore wind farm in Southeast Aegean Sea. Master's Thesis, University of Piraeus, Piraeus, Greece, 2021.
42. Wu, M.-E.; Chen, T.-C.; Chung, C.-P.; Li, G.-R.; Chiang, D.-W.; Yang, D.-Y. PASS: Portfolio Analysis of Selecting Strategies on quantitative trading via NSGA-II. Eng. Optim. 2024, 1–28.
43. Fang, T.; Zheng, C.; Wang, D. Forecasting the crude oil prices with an EMD-ISBM-FNN model. Energy 2023, 263, 125407.
44. Zhu, H.; Hao, H.-K.; Lu, C. Enhanced support vector machine-based moving regression strategy for response prediction and reliability estimation of complex structure. Aerosp. Sci. Technol. 2024, 155, 109634.
45. Jianwei, E.; Bao, Y.; Ye, J. Crude oil price analysis and forecasting based on variational mode decomposition and independent component analysis. Phys. A 2017, 484, 412–427.
46. Zhang, Y.; Liu, K.; Qin, L.; An, X. Deterministic and probabilistic interval prediction for short-term wind power generation based on variational mode decomposition and machine learning methods. Energy Convers. Manag. 2016, 112, 208–219.
47. Cui, S.; Lyu, S.; Ma, Y.; Wang, K. Improved informer PV power short-term prediction model based on weather typing and AHA-VMD-MPE. Energy 2024, 307, 132766.
48. Zeng, W.; Zhou, P.; Wu, Y.; Wu, D.; Xu, M. Multicavitation States Diagnosis of the Vortex Pump Using a Combined DT-CWT-VMD and BO-LW-KNN Based on Motor Current Signals. IEEE Sens. J. 2024, 24, 30690–30705.
49. Zhang, Z.; Liu, Q.; Hu, Y.; Liu, H. Multi-feature stock price prediction by LSTM networks based on VMD and TMFG. J. Big Data 2025, 12, 74.
50. Rossi, B.; Inoue, A. Out-of-sample forecast tests robust to the choice of window size. J. Busin. Econ. Stat. 2012, 30, 432–453.
51. Kong, L.; Li, G.; Rafique, W.; Shen, S.; He, Q.; Khosravi, M.R.; Wang, R.; Qi, L. Time-aware missing healthcare data prediction based on ARIMA model. IEEE/ACM Trans. Comput. Biol. Bioinf. 2022, 21, 1042–1050.
52. Zhong, W.; Zhai, D.; Xu, W.; Gong, W.; Yan, C.; Zhang, Y.; Qi, L. Accurate and efficient daily carbon emission forecasting based on improved ARIMA. Appl. Energy 2024, 376, 124232.
53. Zeng, H.; Zhang, H.; Guo, J.; Ren, B.; Cui, L.; Wu, J. A novel hybrid STL-transformer-ARIMA architecture for aviation failure events prediction. Reli. Eng. Syst. Safety 2024, 246, 110089.

Figure 1. Working principle diagram of the LSTM.

Figure 2. Working principle diagram of BiLSTM.

Figure 3. Working principle diagram of the attention mechanism.

Figure 4. Workflow Diagram of the VMD-CNN-BiLSTM-Attention Model.

Figure 5. Daily NYMEX prices using the VMD method.

Figure 6. Partial predicted values of VMD-CNN-BiLSTM-Attention and CNN-BiLSTM-Attention at the one-step (a), two-step (b), three-step (c), and four-step (d) horizons.

Table 1. Detailed Configuration of the VMD-CNN-BiLSTM-Attention Model.

Layer/Module	Parameters/Configuration
Input	Shape = (n_in, 1)
Conv1D	Filters = 64, Kernel size = 3, Activation = ReLU, Padding = same
MaxPooling1D	Pool size = 2
BiLSTM	Units = 128, Return sequences = True
Dropout	Rate = 0.2
Attention	Custom time-step attention
Dense1	64 units, Activation = ReLU
Dense2 (Output)	1 unit

Table 2. VMD mode number (K) sensitivity analysis.

K	RMSE	MAPE	$R^{2}$
3	0.1965	3.66%	98.50%
5	0.1711	3.49%	98.85%
7	0.1288	2.66%	99.35%

Table 3. IMF Characteristics Obtained by VMD Decomposition.

Intrinsic Mode Function (IMF)	Frequency Level	Corresponding Significance for Price Fluctuation
IMF 1	Highest	Captures ultra-short-term noise, such as intraday trading frictions (e.g., short-term imbalance between buy and sell orders) and instantaneous news shocks (e.g., sudden market rumors)
IMF 2	Second Highest	Reflects short-term supply and demand disturbances, including weekly inventory reports (e.g., unexpected increase or decrease in natural gas inventories) and short-term weather events (e.g., sudden cold waves)
IMF 3	Medium	Embodies seasonal cycles (1–3 months), consistent with monthly production plan adjustments (e.g., fine-tuning of gas field production capacity) and seasonal demand changes (e.g., winter heating stockpiling)
IMF 4	Second Lowest	Represents medium- and long-term trends (6–12 months), driven by quarterly macroeconomic indicators (e.g., GDP growth rate) and policy adjustments (e.g., changes in natural gas export regulations)
IMF 5	Lowest	Covers long-term structural trends, such as cross-annual energy policy adjustments (e.g., revision of carbon neutrality goals) and global economic cycles (e.g., economic recession and recovery)

Table 4. Comparison of the prediction results obtained from different models.

Model	MSE	RMSE	MAE	MAPE	$R^{2}$
One-step forecasting
VMD-CNN-BiLSTM-Attention	0.0169	0.1301	0.0823	2.45%	99.35%
CNN-BiLSTM-Attention	0.0325	0.1802	0.1152	3.13%	98.75%
Two-step forecasting
VMD-CNN-BiLSTM-Attention	0.0213	0.1460	0.0941	2.79%	99.18%
CNN-BiLSTM-Attention	0.0573	0.2394	0.1543	4.40%	97.79%
Three-step forecasting
VMD-CNN-BiLSTM-Attention	0.0247	0.1572	0.1013	2.99%	99.05%
CNN-BiLSTM-Attention	0.0837	0.2892	0.1890	5.32%	96.78%
Four-step forecasting
VMD-CNN-BiLSTM-Attention	0.0309	0.1759	0.1165	3.46%	98.81%
CNN-BiLSTM-Attention	0.1107	0.3328	0.2201	6.21%	95.74%

Table 5. Evaluations of forecasts from various prediction models across different time frames.

Model	MSE	RMSE	MAE	MAPE	$R^{2}$
Five-step forecasting
VMD-CNN-BiLSTM-Attention	0.0288	0.1696	0.1103	3.24%	98.89%
CNN-BiLSTM-Attention	0.1388	0.3726	0.2472	7.03%	94.67%
Six-step forecasting
VMD-CNN-BiLSTM-Attention	0.0337	0.1835	0.1192	3.46%	98.71%
CNN-BiLSTM-Attention	0.1664	0.4079	0.2730	7.65%	93.61%
Seven-step forecasting
VMD-CNN-BiLSTM-Attention	0.0373	0.1930	0.1268	3.75%	98.57%
CNN-BiLSTM-Attention	0.1925	0.4387	0.2949	8.48%	92.61%
Eight-step forecasting
VMD-CNN-BiLSTM-Attention	0.0428	0.2068	0.1367	3.92%	98.36%
CNN-BiLSTM-Attention	0.2143	0.4630	0.3097	8.89%	91.78%

Table 6. Evaluation of forecasts generated by various prediction models for the Henry Hub natural gas spot price.

Model	MSE	RMSE	MAE	MAPE	$R^{2}$
One-step forecasting
VMD-CNN-BiLSTM-Attention	0.1695	0.4117	0.1332	3.84%	95.00%
CNN-BiLSTM-Attention	0.3968	0.6298	0.1812	5.21%	87.72%
Two-step forecasting
VMD-CNN-BiLSTM-Attention	0.1751	0.4185	0.1374	3.83%	94.42%
CNN-BiLSTM-Attention	0.5012	0.7040	0.2463	6.82%	84.38%
Three-step forecasting
VMD-CNN-BiLSTM-Attention	0.1945	0.4411	0.1491	4.21%	93.83%
CNN-BiLSTM-Attention	0.6530	0.8080	0.3210	9.97%	78.54%
Four-step forecasting
VMD-CNN-BiLSTM-Attention	0.2212	0.4705	0.1555	4.43%	93.11%
CNN-BiLSTM-Attention	0.6618	0.8135	0.3249	9.71%	79.46%

Table 7. Comparison of forecasts generated by various models using different time frames for predictions.

Model	MSE	RMSE	MAE	MAPE	$R^{2}$
One-step forecasting
VMD-CNN-BiLSTM-Attention	0.0286	0.1691	0.1161	3.28%	99.10%
CNN-BiLSTM-Attention	0.0504	0.2244	0.1569	4.46%	98.42%
Two-step forecasting
VMD-CNN-BiLSTM-Attention	0.0292	0.1708	0.1189	3.34%	99.08%
CNN-BiLSTM-Attention	0.0851	0.2917	0.1994	5.53%	97.33%
Three-step forecasting
VMD-CNN-BiLSTM-Attention	0.0338	0.1837	0.1266	3.60%	98.83%
CNN-BiLSTM-Attention	0.1388	0.3726	0.2587	7.12%	96.14%
Four-step forecasting
VMD-CNN-BiLSTM-Attention	0.0432	0.2079	0.1447	4.19%	98.64%
CNN-BiLSTM-Attention	0.1684	0.4104	0.2890	7.75%	94.71%

Table 8. Evaluation of various forecasting models against alternative benchmarks.

Model	MSE	RMSE	MAE	MAPE	$R^{2}$
One-step forecasting
VMD-CNN-BiLSTM-Attention	0.0169	0.1301	0.0823	2.45%	99.35%
CNN-BiLSTM-Attention	0.0325	0.1802	0.1152	3.13%	98.75%
ARIMA	0.5160	0.7184	0.4919	13.02%	83.80%
Two-step forecasting
VMD-CNN-BiLSTM-Attention	0.0213	0.1460	0.0941	2.79%	99.18%
CNN-BiLSTM-Attention	0.0573	0.2394	0.1543	4.40%	97.79%
ARIMA	0.5556	0.7454	0.5103	13.49%	82.55%
Three-step forecasting
VMD-CNN-BiLSTM-Attention	0.0247	0.1572	0.1013	2.99%	99.05%
CNN-BiLSTM-Attention	0.0837	0.2892	0.1890	5.32%	96.78%
ARIMA	0.5979	0.7733	0.5313	14.03%	81.22%
Four-step forecasting
VMD-CNN-BiLSTM-Attention	0.0309	0.1759	0.1165	3.46%	98.81%
CNN-BiLSTM-Attention	0.1107	0.3328	0.2201	6.21%	95.74%
ARIMA	0.6385	0.7991	0.5499	14.50%	79.95%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jiang, Q.; Lin, Z.; Hu, J.; Liu, X. Application of the VMD-CNN-BiLSTM-Attention Model in Daily Price Forecasting of NYMEX Natural Gas Futures. Appl. Sci. 2025, 15, 11169. https://doi.org/10.3390/app152011169

AMA Style

Jiang Q, Lin Z, Hu J, Liu X. Application of the VMD-CNN-BiLSTM-Attention Model in Daily Price Forecasting of NYMEX Natural Gas Futures. Applied Sciences. 2025; 15(20):11169. https://doi.org/10.3390/app152011169

Chicago/Turabian Style

Jiang, Qiuli, Zebei Lin, Jiao Hu, and Xuhui Liu. 2025. "Application of the VMD-CNN-BiLSTM-Attention Model in Daily Price Forecasting of NYMEX Natural Gas Futures" Applied Sciences 15, no. 20: 11169. https://doi.org/10.3390/app152011169

APA Style

Jiang, Q., Lin, Z., Hu, J., & Liu, X. (2025). Application of the VMD-CNN-BiLSTM-Attention Model in Daily Price Forecasting of NYMEX Natural Gas Futures. Applied Sciences, 15(20), 11169. https://doi.org/10.3390/app152011169

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
