Crude Oil Prices Forecast Based on Mixed-Frequency Deep Learning Approach and Intelligent Optimization Algorithm

Precisely forecasting the price of crude oil is challenging due to its fundamental properties of nonlinearity, volatility, and stochasticity. This paper introduces a novel hybrid model, namely, the KV-MFSCBA-G model, within the decomposition–integration paradigm. It combines the mixed-frequency convolutional neural network–bidirectional long short-term memory network-attention mechanism (MFCBA) and generalized autoregressive conditional heteroskedasticity (GARCH) models. The MFCBA and GARCH models are employed to respectively forecast the low-frequency and high-frequency components decomposed through variational mode decomposition optimized by Kullback–Leibler divergence (KL-VMD). The classification of these components is performed using the fuzzy entropy (FE) algorithm. Therefore, this model can fully exploit the advantages of deep learning networks in fitting nonlinearities and traditional econometric models in capturing volatilities. Furthermore, the intelligent optimization algorithm and the low-frequency economic variable are introduced to improve forecasting performance. Specifically, the sparrow search algorithm (SSA) is employed to determine the optimal parameter combination of the MFCBA model, which is incorporated with monthly global economic conditions (GECON) data. The empirical findings of West Texas Intermediate (WTI) and Brent crude oil indicate that the proposed approach outperforms other models in evaluation indicators and statistical tests and has good robustness. This model can assist investors and market regulators in making decisions.


Introduction
Energy is crucial for promoting national prosperity, improving well-being, and ensuring social stability.The use of energy has caused the transformation of human production technology and significantly promoted the development level of productivity.As global industrialization continues, crude oil, a critical nonrenewable energy product, has become an essential elemental energy, chemical feedstock, and strategic resource, affecting world stability, national economic development, and enterprise decisions.
Crude oil exhibits generic commodity qualities as well as financial and political features [1].Its price volatility can be attributed to various economic and noneconomic factors, further compounded by the collective impact of market and nonmarket forces.Previous research revealed that crude oil prices are influenced by economic variables, including production, consumption, settlement currency, alternative energy prices, and global economic conditions indexes [2][3][4]; financial market factors, such as financial market trading characteristics, international hot money speculation, and exchange rate changes; and other nonmarket factors, such as geopolitical conflicts, extreme climate change, and energy technology progress [5,6].International crude oil price forecasting is an essential issue of the economic research of energy.The diversity and complexity of the influencing factors significantly increase the difficulty of accurate forecasting.
International crude oil prices have experienced significant fluctuations over the past 20 years, characterized by rapid rises and sharp falls, shortened boom cycles, and fewer smooth transition periods [7].In 2001, the Internet bubble burst, and the global economy slowed down.From September to November, the price of Brent crude oil fell from USD 29.43/barrel to USD 17.68/barrel, with a maximum decline of 40%.The financial crisis of 2008 collapsed the prices from USD 140/barrel in June to approximately USD 33/barrel in December, a decline of more than 76% in 6 months.Crude oil prices declined and bottomed out in early 2016 owing to the shale oil revolution.Oil prices plunged by 42% in 2018 as the United States (US)-China trade war escalated, and oil producers continued to increase production in response to the supply gap caused by US sanctions on Iran.In 2020, with the global new coronavirus outbreak, the global energy demand fell by approximately 4.5%, and oil demand fell by an unprecedented 9.3%.As the market panic spread, the settlement price of West Texas Intermediate (WTI) crude oil futures in May was USD −37.63/barrel, closing negative for the first time in history.After 2021, the widespread use of COVID-19 vaccines and government economic stimulus measures drove up the energy demand and global energy prices.International oil prices increased by over 60% in 2021.
As economic globalization deepens, energy plays an increasingly pivotal role in supporting economic development and social stability.The accurate forecasting of crude oil prices is crucial for market analysts, investors, policymakers, and enterprises.This forecasting not only offers valuable insights into market dynamics but also informs the formulation of energy policies and guides investment decisions and strategies.Therefore, developing a scientific and more accurate method for predicting crude oil prices is imperative.Early methods employed for crude oil price forecasting primarily relied on econometric models.With the rapid development of artificial intelligence and big data methods, machine learning techniques have been widely used.In recent years, scholars have found that the decomposition-integration technique can further enhance the accuracy and robustness of prediction by decomposing complex nonlinear, high-volatility, and irregular time series data into multiple sub-series that are easier to process and predict, and then predicting these sub-series separately and finally integrating the results.For the components derived from the decomposition, existing studies have applied and refined various machine learning methods for forecasting.However, only a few studies have combined deep learning methods with traditional econometric models, but in fact, each of them has its own advantages and disadvantages [8].Traditional econometric models often possess strong explanatory power.For instance, the generalized autoregressive conditional heteroskedasticity (GARCH) model effectively captures heteroskedasticity in time-series data, aiding in understanding the mechanisms behind fluctuations in crude oil prices.However, these models are constrained by their reliance on assumptions such as residual normal distribution and linear relationships.In contrast, deep learning networks excel at handling large-scale data and complex nonlinear relationships.Nevertheless, their internal mechanisms are challenging to interpret, and they exhibit sensitivity to hyperparameters while being difficult to optimize.In addition, most studies have only used past price information for future prediction, ignoring the influence of low-frequency economic variables.Actually, the introduction of monthly economic variables will bring about the issue of mixed-frequency forecasting.
Therefore, under the decomposition-integration paradigm, following the principle of data-driven modeling [9,10], we construct a novel hybrid model to predict crude oil prices accurately, which combines mixed-frequency deep learning approaches, the traditional econometric model, and the intelligent optimization algorithm.This study seeks to leverage the combined methods to fully exploit their respective strengths, thereby effectively enhancing the predictive accuracy of the model.
The primary contributions of the paper are as follows: Entropy 2024, 26, 358 3 of 27 (1) In this paper, the deep learning approach and GARCH model are integrated to accurately predict the low-frequency and high-frequency mode components derived from decomposition.Thus, the proposed model effectively combines the strengths of deep learning and traditional econometric models, demonstrating superior predictive accuracy compared to other models.The convolutional neural network-bidirectional long short-term memory network-attention mechanism (CBA) model has a long-term memory capability, effectively illustrating the bidirectional characteristics and multilevel saliency factors, and is a good fit for nonlinear series [11].Furthermore, the GARCH model can well portray short-term volatility clustering [12].(2) In this paper, the idea of mixed-frequency (MF) prediction is incorporated into the deep learning method, and then the mixed-frequency long short-term memory (MFLSTM) and MFCBA models are constructed to predict each low-frequency component.Incorporating the monthly low-frequency global economic conditions (GECON) index into the deep learning model by the mixed-data sampling (MIDAS) technique can significantly enhance forecast accuracy.(3) In this paper, Kullback-Leibler (KL) divergence is used to determine the optimal combination of the variational mode decomposition (VMD) method's number of decomposition layers K and penalty factor α rather than relying on subjective judgment, resulting in a more efficient and robust decomposition and improved prediction performance.In addition, whether the LSTM and CBA parameters are reasonably chosen significantly impacts the prediction accuracy.Therefore, the sparrow search algorithm (SSA) is applied to determine the best parameter combination for the LSTM, MFLSTM, and MFCBA deep learning models.The intelligent optimization algorithm SSA, chosen for the proposed model, converges faster than the alternatives, and the prediction accuracy of the model optimized using SSA surpasses others.
The rest of the sections are arranged in the following way: Section 2 conducts a literature review of relevant studies.Section 3 introduces the VMD algorithm optimized by KL divergence (KL-VMD), the fuzzy entropy (FE) algorithm, the CBA deep learning model, the SSA, and the novel KL-VMD-MF-SSA-CBA-GARCH (KV-MFSCBA-G) model framework.In Section 4, we conduct empirical analyses, model comparisons, and statistical tests.Section 5 is the discussion where we compare the model with existing models in the literature, compare SSA with other optimization algorithms, and analyze the economic significance and future directions of this study.Section 6 concludes the paper.

Literature Review
Oil prices have complex properties, such as nonlinearity and dynamics, making accurate forecasting tricky.Over the years, numerous researchers have investigated various modeling methods to increase the precision of oil price forecasts.The proposed methods primarily include traditional econometric models, machine learning methods, hybrid models under the decomposition-integration paradigm, and econophysics approaches.
In general, linear and stationary time series are assumed in applying such models.However, crude oil prices typically do not satisfy these conditions.Therefore, traditional models might face challenges in effectively capturing oil prices' complex and nonlinear characteristics.
With the continuous progress of artificial intelligence, data mining, and other emerging technologies, machine learning models, including support vector machines (SVM), extreme learning machines (ELM), extreme gradient boosting (XGBoost), neural networks, and random forest, have become practical tools to cope with the characteristics of sequence randomness, nonlinearity, multi-noise, and dynamic changes, and have been popularly adopted in oil price forecasting.For example, Xie et al. (2006) [23] used SVM for oil price forecasting and found that its predictive accuracy outperformed that of ARIMA and back-propagation neural network models.Moshiri and Foroutan (2006) [24] compared the prediction ability of the artificial neural network (ANN) with the ARIMA and GARCH models and concluded that the ANN approach exhibits enhanced forecasting performance for crude oil prices.Mingming [25][26][27][28] constructed multi-wavelet recurrent neural network, XGBoost, random vector functional link network, and ELM, respectively.They all attained high prediction accuracy in oil price forecasting.Karasu and Altan (2022) [29] proposed a model incorporating LSTM, technical indicators, and the chaotic Henry gas solubility optimization technique, which can cope well with crude oil prices' chaotic and nonlinear characteristics.
Most machine learning methods using oil price data for iterative learning and training can accurately portray the nonlinearity of time series, leading to enhanced prediction accuracy.However, these methods also encounter challenges, such as reduced interpretability of the prediction results, a tendency to fall into local minima, parameter sensitivity, and overfitting [7,30].
Therefore, to further enhance forecasting accuracy, many studies have employed hybrid models within the decomposition-integration framework for oil price forecasting, with most achieving satisfactory results.The decomposition-integration technique decomposes crude oil prices into distinct subsequences, each characterized by a relatively simple structure.Subsequently, econometric or machine learning models are applied to predict each subsequence.Finally, the predicted value is derived by adding the forecast values of each component [31].For example, Fang et al. (2023) [30] forecasted Brent crude oil spot prices using a feedforward neural network (FNN) after decomposition using empirical mode decomposition (EMD) with the slope-based method.Wu et al. (2019) [32] combined the ensemble EMD (EEMD) method and LSTM to predict international oil prices, revealing that the model has a broad application.Jiang et al. (2022) [33] combined the EEMD and gated recurrent units (GRU) with the seagull optimization algorithm (SOA) for forecasting.Huang and Deng (2021) [34] used the improved signal-energy rule to optimize VMD, and the moving window (MW) method was applied to propose the VMD-LSTM-MW method, which is confirmed to be superior for the oil price empirical results.Zhao et al. (2021) [35] applied VMD optimized by particle swarm optimization (PSO) to decompose crude oil prices.Subsequently, they adopted the ARMA model to predict the smooth series and the SVM model to predict the unsmooth series.They empirically verified that the combined model surpassed others in accuracy and robustness.Jovanovic et al. (2022) and T. Li et al. (2021) [36,37] both used VMD and then used LSTM and random sparse Bayesian learning, respectively, to make crude oil price predictions.The research findings illustrate that their models outperform several other approaches for various evaluation indicators.
Additionally, some recent studies have adopted non-traditional approaches, such as the use of econophysics.For instance, Leng and Li (2020) [38] employed Bayesian and econophysical methods to analyze the dynamic prediction of crude oil prices.Gharib et al. (2021) [39] utilized econophysics techniques to detect bubbles and predict the collapse time of oil prices.Aslam et al. (2022) [40] employed robust econophysics-based multifractal detrended cross-correlation analysis to examine the nonlinear structure of the interconnection between geopolitical risks and the energy market.Li et al. (2022) [41] investigated the risk resonance effect between crude oil prices and the Chinese stock market using econophysics, asset pricing theory, and machine learning.Combining the literature, we find that although the hybrid forecasting models within the decomposition-integration framework have exhibited commendable forecasting capabilities, there remains ample scope for enhancement: (1) Many studies have proven the VMD algorithm's role in modeling crude oil prices [34][35][36][37].
Nevertheless, VMD requires the pre-specification of K and α before decomposition.Therefore, the objective selection of the parameters for VMD deserves attention.(2) For each component obtained after decomposition, studies typically explore and develop diverse machine learning methods for prediction.Recently, some studies have also explored quadratic decomposition for highly complex components or residual terms, followed by modeling using machine learning techniques to enhance prediction accuracy [5,7,42,43].However, deep learning approaches and econometric models each have their own strengths and weaknesses.Few studies have investigated the combination of them to predict components with different frequency characteristics.(3) A large proportion of these studies rely primarily on historical crude oil price data for future predictions, ignoring the potential impact of exogenous variables.However, monthly economic indicators might affect oil price dynamics, such as the growth of the global economy, which will increase the overall demand in the global commodity market, significantly increasing prices [3].
Thus, we construct the KL-VMD-MF-SSA-CBA-GARCH (KV-MFSCBA-G) model within the decomposition-integration paradigm.This model begins with decomposing crude oil prices into components with different frequency characteristics using VMD optimized by KL divergence.Subsequently, the FE algorithm is applied to identify and classify the component frequency characteristics into low-frequency and high-frequency terms.Then, MFSCBA is employed to forecast low-frequency trend terms combined with relatively low-frequency macroeconomic variable data, while the GARCH model is applied to forecast high-frequency disturbance terms.The SSA is used to optimize the parameter combination of deep learning models, which partially mitigates the challenge of parameter tuning difficulty.Lastly, the final forecasts for WTI and Brent are derived by aggregating the predictions of each component.

Methodology
This section mainly explains the KL-VMD method, FE algorithm, CBA deep learning model, SSA, and the procedures for building the KV-MFSCBA-G framework.

Variational Mode Decomposition Optimized by Kullback-Leibler Divergence
VMD is an enhancement based on the EMD introduced by Dragomiretskiy and Zosso in 2014 [44].The EMD can recognize complicated signal properties with no previous knowledge.However, end-point effects and mode component aliasing restrict its decomposition performance.Therefore, scholars have proposed the VMD algorithm to compensate for these limitations.Unlike the recursive decomposition of EMD, VMD uses a variational decomposition-essentially, multiple adaptive Wiener filters.This transformation facilitates adaptive segmentation of the signal components, improving noise robustness and attenuating the end-point effect.
The primary goal of VMD is to establish and determine the following variational problem: where f represents the signal, K denotes the number of modes, u k is the kth mode component, and ω k is the kth frequency center.δ(t) is the Dirac function, and * is the convolution operator.By imposing the quadratic penalty function and Lagrange multiplier λ, the restricted variational issue is converted into an unconstrained variational task, as seen below: where α is the quadratic penalty factor used to decrease the disturbance from Gaussian noise.The alternating direction method of multipliers, Parseval's theorem, and Fourier isometric transform can be applied to solve Equation (3).After alternate optimization iterations, the expressions for u k , w k , and λ are as follows: where γ is the noise tolerance, and n is the number of iterations.ûn+1 k (ω), ûi (ω), f (ω), and λ(ω) are the Fourier transforms of u n+1 k (t), u i (t), f (t), and λ(t), respectively.The procedure for the iterative of VMD is as follows: (1) Initialize û1 k , ω 1 k , λ 1 , and N, n = 0. (2) Use Equations ( 4) and ( 5) to update ûk and ω k .
(3) Update λ using Equation ( 6). ( 4) Assume that the accuracy convergence criterion is ε > 0; if it does not satisfy < ε and n < N, then revert to step (2).Otherwise, end the iteration and print the last ûk and ω k .
Although VMD overcomes the drawbacks of the traditional EMD and its improved methods, the values of K and α must be set before the decomposition.The selection of parameters can significantly influence the decomposition results.Therefore, determining the optimal parameter combination [K, α] is imperative.We apply the KL divergence (relative entropy) to optimize VMD's K value and penalty factor α [45,46].
The KL divergence is the degree of similarity between two probability distributions P and Q, which is thus calculated as: where P(x i ) represents the probability distribution of the actual data, and Q(x i ) denotes the distribution predicted by the model.A decrease in the KL divergence indicates a higher degree of alignment between the estimated probability distribution and the actual probability distribution.When determining the optimal parameters for VMD, the range for K is set between 3 and 10, and the range for α is set between 100 and 2500 with a step size of 100.The optimal combination is identified by selecting the values of K and α corresponding to the minimum relative entropy.

Fuzzy Entropy
Scholars have introduced diverse entropy measures to assess the disorder and complexity features of time series, including approximate and sample entropies.However, both methodologies define vector similarity using the unit step function.In reality, the boundary between modes is frequently ambiguous.Seeking to improve sample entropy, Chen et al. 2007 [47] proposed an FE method based on the fuzzy theory, which uses the affiliation function to compute the fuzzy similarity between various hidden modes.Specifically, for a given time series x(t), t = 1, 2, . . ., T, its FE is calculated as follows.
Based on time series x(t), construct the embedding vector X(i); the embedding dimension is m.
The distance of two vectors X(i) and X(j), d m ij , is the Chebyshev distance.
where x 0 (i) = ∑ m−1 j=0 x(i + j) /m is the baseline vector of X(i).
The similarity D m ij between X(i) and X(j) is estimated based on fuzzy affiliation functions with parameters n and r.
where u(d m ij , n, r) is the fuzzy affiliation function.(4) Define the function φ m (n, r).
Calculate the value of φ m (n, r) based on D m ij .
Equation ( 13) shows that the FE is affected by the parameters m, n, r, and T. The embedding dimension, denoted as m, is generally taken as m = 2. n determines the gradient of the similarity tolerance threshold, and a larger n corresponds to a steeper gradient.Chen et al. [47] recommend using a smaller integer value, such as 2 or 3, when calculating the gradient.r is the similarity tolerance threshold typically set within the range of 0.1σ sd ∼ 0.25σ sd .T is the sample length.

Convolutional Neural Network (CNN)
An FNN with convolutional, pooling, and fully connected layers constitutes the CNN.Convolutional layers are central and use kernels for convolutional computation and feature generation.Pooling layers perform secondary subsampling to prevent overfitting, and fully connected layers at the end of the model integrate features for the final output.This paper applies 1D-CNN to price data for efficient local feature extraction.

Bidirectional Long Short-Term Memory Network (BiLSTM)
LSTM is a unique RNN introduced by Hochreiter and Schmidhuber in 1997 [48].LSTM can learn long-term dependence information, thus solving the gradient vanishing or exploding problem, and is more applicable to predict time series with complex nonlinear and stochastic features.
At time t, the following formulas can be used to determine the forgetting gate ( f t ), input gate (i t ), output gate (o t ), candidate memory unit ( ∼ C t ), memory cell (C t ), and the hidden state (h t ) of LSTM: where x t represents the input value at time t, σ is the Sigmoid function with a range of (0, 1), tanh(•) is a hyperbolic tangent function with a range of (−1, 1), ⊙ represents the Hadamard product.
are the weight and bias parameters of the forgetting gate, input gate, output gate, and memory cell, respectively.
BiLSTM is an improvement of LSTM, stacking the forward and backward-propagating LSTM layers, creating a bidirectional recurrent structure.It can fully consider past and future information, effectively capture interaction characteristics within data, and has been applied in crude oil price prediction [49].Therefore, we employ BiLSTM to extract time series features.
BiLSTM's combination of hidden layer states is described as follows: where LSTM denotes the LSTM unit operation process.x t represents the input.h f , h b , h f −1 , and h b−1 are the hidden states for the forward and backward propagation of cells at the current and previous time, respectively.W f and W b are the weights of the forward and backpropagating units, respectively.c t is the bias optimization parameter.

Attention Mechanism (Attention)
Due to the extensive extraction of temporal and spatial features, this paper optimizes the model using the attention mechanism, which concentrates on crucial information by allocating distinct weights to input characteristics.The prioritization of vital information is evident in the weight calculation process, where higher importance causes greater assigned weights.The computation process is outlined below: (1) Calculate the correlation vector.
Suppose the output of the BiLSTM layer is h 1 , h 2 , ..., h t , ..., h T , t ∈ [1, T], and input them to the attention layer.The similarity score S t (h t ) for each moment is calculated by the tanh function as follows: where S t (h t ) represents the degree of correlation between the state h t and the output state.W h and b h are the weight and bias, respectively.
(2) Conduct attention scoring.The attention weight a t of the hidden layer unit h t is obtained from the so f tmax function as follows: where a t denotes the importance of the state, and so f tmax(•) is the activation function.
To obtain the output C optimized by the attention mechanism, each h t is multiplied by its corresponding a t and summed as seen below:

CNN-BiLSTM-Attention
A CNN layer, BiLSTM layer, and attention mechanism comprise the CBA model (Figure 1).The steps of the method are described below: (1) The CNN layer consists of convolutional, pooling, and dropout layers, and is designed to capture spatial features from the input data.The attention weight  of the hidden layer unit ℎ is obtained from the smax function as follows: where  denotes the importance of the state, and  (•) is the activation function.
To obtain the output  optimized by the attention mechanism, each ℎ is multiplied by its corresponding  and summed as seen below:

CNN-BiLSTM-Attention
A CNN layer, BiLSTM layer, and attention mechanism comprise the CBA model (Figure 1).The steps of the method are described below: (1) The CNN layer consists of convolutional, pooling, and dropout layers, and is designed to capture spatial features from the input data.

Sparrow Search Algorithm
The SSA is an intelligent optimization algorithm introduced by Xue J and Shen B in 2020 [50], which searches for the best solution by simulating the sparrow's foraging process.This algorithm is relatively innovative and has strengths in efficiently finding optimal solutions and demonstrating rapid convergence.The optimization process of the SSA is outlined below: (1) The discoverer position  , is updated using the following formula: where  , represents the position of the th sparrow in dimension . and  are the warning and safety values, respectively.

Sparrow Search Algorithm
The SSA is an intelligent optimization algorithm introduced by Xue J and Shen B in 2020 [50], which searches for the best solution by simulating the sparrow's foraging process.This algorithm is relatively innovative and has strengths in efficiently finding optimal solutions and demonstrating rapid convergence.The optimization process of the SSA is outlined below: (1) The discoverer position X t+1 i,j is updated using the following formula: where X i,j represents the position of the ith sparrow in dimension j.R 2 and S T are the warning and safety values, respectively.
(2) The joiner position X t+1 i,j is updated using the following formula: Entropy 2024, 26, 358 10 of 27 where X p and X worst are the optimal and worst positions, respectively.
(3) Suppose that 10-20% of the sparrows in the flock are alert to the threat.Those who are aware of the danger will promptly relocate to a safe zone.The position of their vigilantes X t+1 i,j can be expressed as seen below: where X best represents the current global optimal position.f i , f g , and f w are the fitness values of the current individual sparrow, the global optimal, and the global worst, respectively.
The following are the steps of SSA: (1) Set the initial value of the population, the ratio of predators and joiners, and the number of iterations.(2) After computing the fitness values, sort them in descending order.
(5) Apply Equation (21) to update the vigilante position.(6) Compute the fitness value and update the sparrow positions.(7) Evaluate if the stop criterion is met.If so, quit and print the result; otherwise, repeat steps ( 2)-(6).

KL-VMD-MF-SSA-CBA-GARCH Model
In this paper, we constructed a nonlinear mixed-frequency decomposition-integration approach to forecast crude oil prices, namely, the KV-MFSCBA-G model.This model combines the KL-VMD method, FE algorithm, mixed-frequency prediction idea, CNN, BiLSTM, attention mechanism, SSA, and GARCH model.Deep learning methods and traditional econometric models are selected for modeling in response to components with different frequency characteristics.Mixed-frequency data (low-frequency macroeconomic data) are effectively combined for prediction in deep learning networks.Furthermore, we employ SSA to optimize the parameter combination of CBA.Three steps and details of the model are presented in Figure 2.

Step 1: Price Decomposition and Characteristic Recognition
The primary objective is to decompose the raw data to derive several components.The KL-VMD method can effectively decompose international crude oil prices into K mode components with distinct frequency characteristics.These components are independent, have a straightforward structure, and demonstrate robust regularity.The mode components can be categorized into two groups according to the frequency characteristics. (

1) Low-frequency trend terms
The low-frequency trend terms, marked by a small amplitude, reflect external environmental influences on crude oil price changes.This contributes to long-term stable price trends and sometimes can be significantly impacted by major events, causing rapid value shifts.
(2) High-frequency disturbance terms The high-frequency disturbance terms have random short-period variations in crude oil prices caused by transient factors.Despite its frequent short-term changes, it lacks long-term impact.In addition, volatility clustering is observed, where significant price swings correspond to sharp fluctuations, and smaller price changes coincide with minor fluctuations.

Step 1: Price Decomposition and Characteristic Recognition
The primary objective is to decompose the raw data to derive several components.The KL-VMD method can effectively decompose international crude oil prices into  mode components with distinct frequency characteristics.These components are independent, have a straightforward structure, and demonstrate robust regularity.The mode components can be categorized into two groups according to the frequency characteristics. (

1) Low-frequency trend terms
The low-frequency trend terms, marked by a small amplitude, reflect external environmental influences on crude oil price changes.This contributes to long-term stable price trends and sometimes can be significantly impacted by major events, causing rapid value shifts. (

2) High-frequency disturbance terms
The high-frequency disturbance terms have random short-period variations in crude oil prices caused by transient factors.Despite its frequent short-term changes, it lacks long-term impact.In addition, volatility clustering is observed, where significant price swings correspond to sharp fluctuations, and smaller price changes coincide with minor fluctuations.
The analysis above indicates that different mode components have distinct frequency characteristics.Adopting particular prediction models for components with different frequency features and incorporating mixed-frequency exogenous explanatory variable data

Step 2: Mode Components Forecasting Combined with Mixed-Frequency Data
Different prediction methods are used for different mode component categories, using MFSCBA to forecast low-frequency trend terms and GARCH to forecast high-frequency disturbance terms.Furthermore, we introduce monthly macroeconomic variables for the low-frequency trend terms to enhance the prediction accuracy.Nevertheless, owing to the mismatch in frequency between the introduced exogenous explanatory variables (monthly) and the forecast target (daily), it is imperative to consider mixed-frequency data forecasting.The modeling process is outlined below. (

1) Forecasting low-frequency trend terms
For the low-frequency trend series, its daily historical data is combined with the monthly GECON index [51], the MIDAS approach is employed to accomplish frequency alignment of different frequency data [52], and then the mixed-frequency data are employed as the input vector.The MFCBA approach can effectively extract the interactive features between the data, fully capturing the intrinsic relationships within frequency-aligned mixed-frequency input vectors [11].The inclusion of the monthly low-frequency variable can predict the daily low-frequency trend components more accurately.Furthermore, SSA is used to find the optimal parameter combination for MFCBA to improve the efficiency of parameter selection. (

2) Forecasting high-frequency disturbance terms
The high-frequency disturbance terms exhibit significant stochasticity, time-varying, and clustering.The GARCH model is a powerful tool for handling stochastic processes that possess time-variant and volatility clustering attributes, making it well-suited for forecasting high-frequency components.This paper conducts an ARCH effect test on high-frequency series.If there is an ARCH effect, construct a GARCH prediction model; otherwise, build a deep learning prediction model [8].

Step 3: Ensemble of Mode Components Forecasting Results
In step 3, the forecasts of all mode components with different frequency characteristics are summed to derive the final crude oil prices forecast.

Forecast Evaluation Criteria and Statistical Tests
This paper employs three measures, root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), to assess the forecasting accuracy of different models.Their definitions are listed below: where x(t) and x(t) are the actual and forecasted values of the crude oil price at time t, respectively.This paper employs the Diebold-Mariano (DM) test and model confidence set (MCS) test to assess the statistical significance of the proposed model.The MCS test includes a group of tests within the set M 0 to eliminate models with low predictive power.The set M * α comprises the optimal predictive models at a confidence level of 1 − α.For a given model k (k ∈ M 0 ), the model belongs to the set M * α if the p-value of its MCS test is larger than α.A larger p-value indicates better predictions [53].Four indicators are selected as criteria: MSE, MAE, heteroskedasticity-adjusted MSE (HMSE), and heteroskedasticity-adjusted MAE (HMAE).Two statistics, T R and T max , are obtained by bootstrap with 5000 replications, and α is set to 0.25.Additionally, the null hypothesis H 0 of the DM test is that the target model A and the benchmark model B possess identical forecasting capacities.H 0 is rejected at p < 0.05, meaning they have different effects.Furthermore, if the statistic is positive, model B is better than model A [54].

Data Description
For the empirical research, we employ WTI and Brent crude oil spot prices obtained from the US Energy Information Administration.Figure 3 shows the price curves of WTI and Brent.For WTI, the total dataset spans from 2 January 1986, to 21 February 2023, excluding the closing spot price on weekend trading days, with 9357 samples.The total dataset for Brent covers the period from 20 May 1987, to 21 February 2023, with 9077 samples.The sample set is divided into the training and test sets, accounting for 90% and 10%, respectively.Table 1 presents the descriptive statistics for WTI and Brent.
ples.The sample set is divided into the training and test sets, accounting for 90% and 10%, respectively.Table 1 presents the descriptive statistics for WTI and Brent.When forecasting low-frequency trend terms, we incorporate the monthly GECON index developed by Baumeister et al. (2022) [51], which reflects the current state of the global economy.This index is derived from 16 indicators related to real economic activity, such as commodity prices and financial indicators.The data can be accessed at https: //sites.google.com/site/cjsbaumeister/datasets,accessed on 11 March 2023.
Figure 4 shows the time trend of GECON.Comparing the time series curve of GECON and crude oil prices, it is evident that there were significant declines around 2001, 2008, and 2020, indicating a high degree of correlation.Furthermore, previous studies have established a close relationship between global economic activity and crude oil prices through modeling, suggesting that GECON can serve as an important indicator for predicting crude oil prices [2][3][4].Therefore, we further enhance the model's predictive ability by performing deep learning network modeling based on monthly and daily mixed-frequency data.
Following the practice of Girardin and Joyeux (2013) [52], Xu et al. (2019) [55], and Cai et al. (2020) [11], the MIDAS method is used to deal with the mixed-frequency problem.The data sampled at different frequencies are frequency aligned, the lagged observations are treated as their variables (which can be considered as a feature engineering process from the perspective of machine learning), and then the mixed-frequency information set is input into the neural network for feature learning.In addition, to avoid the influence of data dimension inconsistency on model training, this study carried out the normalization of data when constructing the mixed-frequency deep learning model.lem.The data sampled at different frequencies are frequency aligned, the lagged observations are treated as their variables (which can be considered as a feature engineering process from the perspective of machine learning), and then the mixed-frequency information set is input into the neural network for feature learning.In addition, to avoid the influence of data dimension inconsistency on model training, this study carried out the normalization of data when constructing the mixed-frequency deep learning model.

Decomposition of Crude Oil Prices
The KL-VMD method is employed to decompose the WTI and Brent prices.The  value and penalty factor  of decomposition are 7 and 900 for WTI, and 7 and 2100 for Brent, respectively.Figures 5 and 6 show the KL-VMD results of WTI and Brent, with mode component frequencies from low to high.Noticeable cyclical variations occur in the low-frequency components, reflecting the long-term trend.However, the high-frequency components are characterized by stochasticity and volatility clustering.Compared with the original data, the decomposed mode components exhibit a simplified structure and high regularity, which improves the fitting and forecasting abilities.

Decomposition of Crude Oil Prices
The KL-VMD method is employed to decompose the WTI and Brent prices.The K value and penalty factor α of decomposition are 7 and 900 for WTI, and 7 and 2100 for Brent, respectively.Figures 5 and 6 show the KL-VMD results of WTI and Brent, with mode component frequencies from low to high.Noticeable cyclical variations occur in the low-frequency components, reflecting the long-term trend.However, the high-frequency components are characterized by stochasticity and volatility clustering.Compared with the original data, the decomposed mode components exhibit a simplified structure and high regularity, which improves the fitting and forecasting abilities.

Recognition of Mode Component Characteristics
We employ the FE algorithm to identify the frequency characteristics of the mode components, with parameter settings  = 2,  = 2,  = 0.15 .Based on the FE values to classify the mode components, Table 2 presents the FE values and classification results of all components.Specifically, the FE values of mode components 1-2 of WTI and Brent are less than  = 0.05; thus, they are identified as low-frequency trend terms.Conversely, the

Recognition of Mode Component Characteristics
We employ the FE algorithm to identify the frequency characteristics of the mode components, with parameter settings m = 2, n = 2, r = 0.15σ sd .Based on the FE values to classify the mode components, Table 2 presents the FE values and classification results of all components.Specifically, the FE values of mode components 1-2 of WTI and Brent are less than λ = 0.05; thus, they are identified as low-frequency trend terms.Conversely, the FE values of mode components 3-7 are greater than λ, and are recognized as high-frequency disturbance terms.

Model Selection and Parameter Description
We prove the superiority of the proposed model through a comparative analysis of different models.Table 3 presents the abbreviations used for the models.
(1) We evaluate the forecasting performance between decomposition-integration models and the single model by introducing Model 1 (LSTM) for comparative analysis.(2) Compared with Model 1, Models 2-4 all employ VMD for price decomposition before forecasting by LSTM.Specifically, Model 3 is optimized based on Model 2 using KL-VMD.Furthermore, Model 4 optimizes the parameters of LSTM using SSA, building on the enhancement in Model 3. Three questions can be evaluated by comparing Models 1-4: whether VMD can improve prediction accuracy, whether KL-VMD is better than VMD, and whether SSA-LSTM is better than LSTM.(3) The remaining models adopt the decomposition-integration framework, employing the FE algorithm to divide mode components, forecasting high-frequency disturbances by GARCH, and optimizing deep learning parameters by SSA.Their distinctions are as follows: KL-VMD or EMD can be used in the mode decomposition stage.Perform pairwise comparisons between Models 5 and 7, 6 and 9, and 7 and 10.These comparisons can evaluate whether the decomposition performance of KL-VMD is superior to EMD.Next, LSTM, MFLSTM, and MFCBA can be used in the low-frequency component prediction stage.Sequentially comparing Models 5-7 and 8-10 can evaluate whether including mixed-frequency data can improve the prediction accuracy of LSTM and whether CBA has an advantage over LSTM.
Table 3.The abbreviations of the models.

Number
Model Abbreviation By selecting the aforementioned 10 models for comparison, this study establishes a series of model comparison combinations for each component within the model architecture.Figure 7 illustrates the detailed comparison combinations, primarily comparing the following four components of the proposed model: (1) In the stage of decomposition, the decomposition effect of KL-VMD is compared with that of VMD and EMD.(2) In the stage of low-frequency trend prediction using mixed-frequency deep learning approaches, MF-SSA-LSTM is compared with SSA-LSTM to determine whether the introduction of mixed-frequency data could improve prediction performance.Then, MF-SSA-CBA is compared with MF-SSA-LSTM to determine whether the specific mixed-frequency deep learning method adopted in this study is superior to LSTM.(3) In the stage of high-frequency disturbance prediction using traditional econometric models, a comparison between KV-SL-G and KV-SL is conducted to verify whether the introduction of the GARCH model can enable the deep learning method and traditional econometric model to "perform their respective roles", thereby enhancing predictive accuracy.(4) Finally, as for the intelligent optimization algorithm SSA, KL-VMD-LSTM is compared with KL-VMD-SSA-LSTM to determine whether SSA can improve the prediction accuracy of deep learning models.In addition, in the discussion section, the prediction effect of models using SSA, SOA, and PSO for parameter optimization is compared to further verify the superiority of SSA.
Regarding the parameters of MFSCBA, we specify the number of sparrows and the maximum iterations in SSA as 10 and optimize the number of hidden units, iterations, and batch sizes in MFCBA with SSA, and the parameter search ranges are [50, 500], [50, 500], and [100, 1000], respectively.Additionally, when the ARCH effect of the residuals is observed, we employ ARIMA-GARCH to model the mean and variance.The maximum lag order of each term is 3, and the Bayesian information criterion is used to make the selection.All calculations are implemented by Matlab R2022a and Python3.11 on 64-bit Windows 10.
predictive accuracy.(4) Finally, as for the intelligent optimization algorithm SSA, KL-VMD-LSTM is compared with KL-VMD-SSA-LSTM to determine whether SSA can improve the prediction accuracy of deep learning models.In addition, in the discussion section, the prediction effect of models using SSA, SOA, and PSO for parameter optimization is compared to further verify the superiority of SSA.Regarding the parameters of MFSCBA, we specify the number of sparrows and the maximum iterations in SSA as 10 and optimize the number of hidden units, iterations, and batch sizes in MFCBA with SSA, and the parameter search ranges are [50,500], [50,500], and [100, 1000], respectively.Additionally, when the ARCH effect of the residuals is observed, we employ ARIMA-GARCH to model the mean and variance.The maximum lag order of each term is 3, and the Bayesian information criterion is used to make the        To further assess the prediction accuracy of different methods, the in-sample fitting and out-of-sample forecasting errors of WTI and Brent under the three evaluation criteria are given in Tables 4 and 5. Upon comparison, the following findings are made:

Prediction Evaluation and Test Results
(1) In general, LSTM exhibits larger errors than all decomposition-integration models, proving the effectiveness of the decomposition-integration paradigm for oil price forecasting.Decomposition transforms the complicated price series into a simplified, stable, and regular structure, significantly enhancing forecasting accuracy.(2) A comparison of the LSTM, V-L, KV-L, and KV-SL models reveals a progressive improvement in prediction accuracy, indicating that VMD contributes to improved accuracy, KL-VMD is better than VMD, and SSA-LSTM is better than LSTM.Thus, it proves the effectiveness of optimizing VMD and the deep learning approach by using KL divergence and the SSA intelligent optimization algorithm, respectively.(3) Comparing the combinations of E-SL-G and KV-SL-G, E-MFSL-G and KV-MFSL-G, and E-MFSCBA-G and KV-MFSCBA-G, respectively, demonstrates that the KL-VMD method has a superior decomposition effect compared to EMD. (4) A comparison of KV-SL-G, KV-MFSL-G, and KV-MFSCBA-G shows that considering mixed-frequency data enhances the prediction accuracy of LSTM.Furthermore, MFCBA exhibits superior performance compared to MFLSTM.Comparing the E-SL-G, E-MFSL-G, and E-MFSCBA-G models can confirm these findings.(5) It is worth pointing out that the prediction accuracy of KV-SL-G is higher than KV-SL, indicating that the FE algorithm is used to divide components into low and high frequencies, and then the GARCH model is introduced to model the high-frequency disturbance, which effectively combines the advantages of the traditional econometric model and deep learning approach, thus improving the prediction accuracy.In general, the KV-MFSCBA-G model nearly achieves the optimal forecasting performance in both WTI and Brent empirical studies, proving its robustness.It gives full play to KL-VMD's capacity for decomposition denoising, the MFCBA model's strong forecasting ability for nonlinear price series, the GARCH model's good portrayal ability for volatility clustering, and SSA's ability to find the optimal parameters efficiently.
Table 6 shows the MCS test results.Only the KV-MFSCBA-G method is always in M * 75% , and most of its p-values are 1.This finding validates the strength and robustness of the novel method.Table 7 presents the DM test results.When comparing LSTM as the target model to the benchmark models, all p-values are smaller than 0.1, and the statistics are greater than 0, indicating that decomposition-integration forecasting models are significantly superior to the single model.Furthermore, when the remaining models are the target models, and KV-MFSCBA-G is the benchmark model, all p-values are smaller than 0.1, and the statistics are positive, confirming that the KV-MFSCBA-G method has a more outstanding crude oil price forecasting performance than the others.
Overall, this section begins with the baseline LSTM model and conducts a longitudinal comparative analysis step-by-step.The results of error measures and statistical tests confirm the superiority of the proposed model and the rationality of the chosen model components and architecture.

Further Comparison against Existing Models in the Literature
Section 4 focuses on the longitudinal comparison of models.Following this, Section 5 will proceed to conduct a horizontal comparison of the performance of the proposed model with existing models in the literature.The selected comparative models include traditional econometric models, such as ARIMA [15,16] and ARIMA-GARCH [21,22]; machine learning methods, such as ANN [24], ELM [28], SVM [23], and XGBoost [26]; and hybrid models, including EEMD-GRU [33], EEMD-LSTM [32], and VMD-SVM-ARMA [35].
Tables 8 and 9 present the in-sample fitting and out-of-sample forecasting performance of these comparison models and the proposed KV-MFSCBA-G model, respectively.Additionally, to enhance the transparency of the model architecture, the prediction accuracy of MF-SSA-CBA (the overall prediction accuracy of the low-frequency part of KV-MFSCBA-G) has been included in the final row of the tables.Through comparison of the forecasting results of the models for WTI and Brent, it is evident that the proposed KV-MFSCBA-G model exhibits the smallest RMSE, MAE, and MAPE values, indicating the highest prediction accuracy.Additionally, the following findings are made: (1) The forecasting accuracy of ARIMA-GARCH exceeds that of ARIMA, indicating that due to the highly volatile and non-constant variance characteristics of crude oil prices, ARIMA-GARCH can more accurately capture its dynamic features, leading to improved forecasting accuracy.(2) The prediction error obtained from forecasting the low-frequency part using MF-SSA-CBA is much smaller compared to that of traditional econometric models and machine learning models.This highlights the ability of MF-SSA-CBA to better capture the underlying multi-scale complex features, long short-term dependencies, and non-linear trends of the low-frequency trend components.Furthermore, during the model training and prediction processes, MF-SSA-CBA can adaptively focus on crucial information, demonstrating greater flexibility and generalization ability, thus improving the predictive performance of the model.(3) Another finding is that SVM demonstrates the best predictive performance among the four machine learning models.Similarly, VMD-SVM-ARMA showed the highest predictive performance among the three decomposition ensemble models.However, the improvement in accuracy of VMD-SVM-ARMA compared to SVM was not significant.The analysis suggests that although VMD can extract multiscale features from data to help deep learning models better understand long-term dependencies, it may not provide SVM with additional useful information and may even introduce noise or redundant information.
Therefore, it is reasonable to explore appropriate deep learning models to enhance model prediction performance for the low-frequency components obtained by KL-VMD.Additionally, combining the mixed-frequency deep learning approach with the traditional econometric model can fully utilize their respective strengths.The MFSCBA method is effective in describing complex nonlinear features, and the GARCH model is capable of capturing the volatility clustering effect, thus resulting in improved prediction accuracy.

Comparison of SSA with Other Intelligent Optimization Algorithms
To demonstrate the superiority of selecting SSA to optimize the MFCBA model in this study, we compared it with intelligent optimization algorithms used in the literature for crude oil price forecasting, including SOA [33] and PSO [35].The parameter optimization search ranges of SOA and PSO are kept consistent with SSA, and the population size and maximum number of iterations of these three optimization algorithms are uniformly set to 10.To differentiate between the optimization algorithms used in each model, we denote our proposed KV-MFSCBA-G model as KV-MF-SSA-CBA-G.The models that use SOA and PSO are denoted as KV-MF-SOA-CBA-G and KV-MF-PSO-CBA-G, respectively.These three models differ only in their parameter optimization algorithms used for the MFCBA model.
From the convergence curves shown in Figure 12, taking WTI's first low-frequency trend term as an example, although all three models reach convergence before 10 iterations, SSA converges faster and more efficiently with a smaller fitness function (RMSE).To demonstrate the superiority of selecting SSA to optimize the MFCBA model in this study, we compared it with intelligent optimization algorithms used in the literature for crude oil price forecasting, including SOA [33] and PSO [35].The parameter optimization search ranges of SOA and PSO are kept consistent with SSA, and the population size and maximum number of iterations of these three optimization algorithms are uniformly set to 10.To differentiate between the optimization algorithms used in each model, we denote our proposed KV-MFSCBA-G model as KV-MF-SSA-CBA-G.The models that use SOA and PSO are denoted as KV-MF-SOA-CBA-G and KV-MF-PSO-CBA-G, respectively.These three models differ only in their parameter optimization algorithms used for the MFCBA model.
From the convergence curves shown in Figure 12, taking WTI's first low-frequency trend term as an example, although all three models reach convergence before 10 iterations, SSA converges faster and more efficiently with a smaller fitness function (RMSE).
Table 10 shows the in-sample and out-of-sample prediction performance of the three models for WTI and Brent.The model optimized by SSA can further improve prediction accuracy compared to SOA and PSO.
These results demonstrate that optimizing the deep learning model using SSA has certain benefits, including faster convergence speed, better global search capability, and less susceptibility to local optima, which effectively improves model prediction performance.Additionally, the program of the SSA had shorter running times in this study.This further validates the rationality of selecting SSA as the intelligent optimization algorithm component of the model architecture.Table 10 shows the in-sample and out-of-sample prediction performance of the three models for WTI and Brent.The model optimized by SSA can further improve prediction accuracy compared to SOA and PSO.These results demonstrate that optimizing the deep learning model using SSA has certain benefits, including faster convergence speed, better global search capability, and less susceptibility to local optima, which effectively improves model prediction performance.Additionally, the program of the SSA had shorter running times in this study.This further validates the rationality of selecting SSA as the intelligent optimization algorithm component of the model architecture.

Economic Significance and Practical Application
Accurately forecasting the trend of international crude oil prices is crucial for ensuring global energy supply security and economic stability.It is also an important research focus in the field of global energy research.This study provides a scientifically precise forecast of international crude oil prices, which holds great economic implications and wide practical application.The findings can serve as a reference and basis for management decisions and risk control by market investors, policymakers, and market analysts.
Firstly, this study can assist investors in accurately analyzing and predicting trends in international crude oil prices.By combining the mixed-frequency deep learning approach, intelligent optimization algorithm, and traditional econometric model, the proposed model can capture the complex, nonlinear dynamic characteristics of crude oil prices.Furthermore, by introducing GECON data, the proposed model not only takes into account the internal dynamic factors of the crude oil market but also situates it within the macro context of the global economic environment, which results in more precise and dependable forecast results.As a result, this study can offer investors better insight into market price forecasts, enabling them to make better investment decisions, avoid market risks, and enhance investment returns.
Secondly, this study provides a reference basis for policymakers to formulate energy and monetary policies.Government agencies and regulatory bodies can utilize the proposed model to monitor and adjust the crude oil market in real time, promptly responding to changes in the global economic environment and market fluctuations.
Finally, this study is of great importance to oil-related enterprises when formulating production and investment strategies.In the face of uncertain crude oil prices, enterprises can use the proposed model to develop more effective strategies, which can help mitigate operational risks caused by market fluctuations and enhance competitiveness and profitability.

Future Directions
This study enhances the prediction performance of the model by incorporating monthly GECON data to construct the mixed-frequency deep learning model, and future research could expand the data sources for predicting crude oil prices by integrating more mixed-frequency data and multi-source heterogeneous data.Additionally, it would be a valuable research direction to introduce explainable artificial intelligence technology to open the black box of deep learning models and improve the interpretability and credibility of forecasting models.
(2) BiLSTM is then trained based on the local features obtained from the CNN layer to learn the patterns of internal dynamic variation and obtain the forward and reverse time series temporal features.(3)The extracted temporal and spatial features are fed into the attention mechanism, enhancing the model's attention to crucial features during learning and improving prediction accuracy.Entropy 2024, 26, 358 9 of 28 (2) BiLSTM is then trained based on the local features obtained from the CNN layer to learn the patterns of internal dynamic variation and obtain the forward and reverse time series temporal features.(3)The extracted temporal and spatial features are fed into the attention mechanism, enhancing the model's attention to crucial features during learning and improving prediction accuracy.

Figure 2 .
Figure 2. The process of the KL-VMD-MF-SSA-CBA-GARCH model.The analysis above indicates that different mode components have distinct frequency characteristics.Adopting particular prediction models for components with different frequency features and incorporating mixed-frequency exogenous explanatory variable data can enhance prediction accuracy.We use the FE complexity algorithm to classify the frequency characteristics of the mode components into two categories.The FE algorithm quantifies the disorder within the dynamic state of the time series, with a higher computed FE value indicating increased complexity in the time series.Therefore, we classify the mode components based on the FE value in this paper.First, we calculate the complexity of the mode component k based on the FE algorithm, denoted as FE k , k = 1, 2, . . ., K. Second, we set the critical value λ (0.05).Finally, we classify each mode component IMF based on the FE value.I MF 1 , I MF 2 , ..., I MF m are recognized as low-frequency trend terms (FE < λ), and I MF m+1 , ..., I MF K are recognized as high-frequency disturbance terms (FE > λ).

Figure 3 .
Figure 3. Daily international crude oil price curve.Figure 3. Daily international crude oil price curve.

Figure 3 .
Figure 3. Daily international crude oil price curve.Figure 3. Daily international crude oil price curve.

Figure 4 .
Figure 4. Global economic conditions index curve.

Figure 4 .
Figure 4. Global economic conditions index curve.

Figure 5 .
Figure 5.The KL-VMD decomposition results of WTI crude oil prices.Figure 5.The KL-VMD decomposition results of WTI crude oil prices.

Figure 5 .
Figure 5.The KL-VMD decomposition results of WTI crude oil prices.Figure 5.The KL-VMD decomposition results of WTI crude oil prices.

Figure 5 .
Figure 5.The KL-VMD decomposition results of WTI crude oil prices.

Figure 6 .
Figure 6.The KL-VMD decomposition results of Brent crude oil prices.

Figure 6 .
Figure 6.The KL-VMD decomposition results of Brent crude oil prices.

Figure 7 .
Figure 7. Model comparison combinations of each model component.

Figure 7 .
Figure 7. Model comparison combinations of each model component.

Figures 8 -
Figures 8-11 present the in-sample fitting and out-of-sample forecasting results of WTI and Brent, respectively.For clarity, only the three months with significant fluctuations are shown in each graph.Compared to alternative models, the KV-MFSCBA-G model has higher accuracy in values and directions fitted or predicted and more coherent upward or downward movement with the actual data.

Figure 8 .
Figure 8. In-sample fitting results of each model for WTI crude oil prices.

Figure 8 .
Figure 8. In-sample fitting results of each model for WTI crude oil prices.

Figure 9 .
Figure 9. In-sample fitting results of each model for Brent crude oil prices.Figure 9. In-sample fitting results of each model for Brent crude oil prices.

Figure 9 .
Figure 9. In-sample fitting results of each model for Brent crude oil prices.Figure 9. In-sample fitting results of each model for Brent crude oil prices.

Figure 10 .
Figure 10.Out-of-sample forecasting results of each model for WTI crude oil prices.

Figure 11 .
Figure 11.Out-of-sample forecasting results of each model for Brent crude oil prices.

Figure 12 .
Figure 12.Convergence curves of different optimization algorithms.

Figure 12 .
Figure 12.Convergence curves of different optimization algorithms.

Table 1 .
The summary of the crude oil price data set.
(a) Price curve of WTI (b) Price curve of Brent

Table 1 .
The summary of the crude oil price data set.

Table 2 .
Fuzzy entropy of each component and classification result.

Table 4 .
Comparison of the in-sample fitting performance of the models.

Table 5 .
Comparison of the out-of-sample forecasting performance of the models.

Table 6 .
Model confidence set test results.Note: This table reports the p-value of the forecasting model under the MCS test.* indicates that the p-value is greater than 0.25, which means the model is in M * 75% .Bold numbers indicate the best model in that set of tests.

Table 8 .
Comparison of the in-sample fitting performance of the models.

Table 9 .
Comparison of the out-of-sample forecasting performance of the models.

Table 10 .
Forecasting performance of models based on different optimization algorithms.

Table 10 .
Forecasting performance of models based on different optimization algorithms.