Article

Price Forecasting of Crude Oil Using Hybrid Machine Learning Models

1
Department of Mathematics, Faculty of Applied and Basic Sciences, SGT University, Gurugram 122505, India
2
Department of Operations Management and Decision Sciences, Birla Institute of Management Technology, Plot No. 5, Knowledge Park-2, Greater Noida 201306, India
3
Department of CSE (Data Science), Dr. B. C. Roy Engineering College, Durgapur 713206, India
*
Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2025, 18(7), 346; https://doi.org/10.3390/jrfm18070346
Submission received: 5 May 2025 / Revised: 18 June 2025 / Accepted: 19 June 2025 / Published: 21 June 2025
(This article belongs to the Section Mathematics and Finance)

Abstract

Crude oil is a widely recognized, indispensable global and national economic resource. It is highly susceptible to fluctuations driven by a wide range of variables. Despite its capacity to sustain the global economic framework, the uncertainties embedded in the crude oil markets present formidable challenges that investors must diligently navigate. In this research, we propose a hybrid machine learning model based on random forest (RF), gated recurrent unit (GRU), convolutional neural network (CNN), extreme gradient boosting (XGBoost), functional partial least squares (FPLS), and stacking. This hybrid model facilitates the decision-making process related to the import and export of crude oil in India. The precision and reliability of the different machine learning models utilized in this study were validated through rigorous evaluation using various error metrics, ensuring a thorough assessment of their forecasting capabilities. The conclusive results revealed that the proposed hybrid ensemble model consistently delivered effective and robust predictions compared to the individual models.

1. Introduction

Crude oil is a crucial resource that is prone to immense fluctuations due to variables such as currency exchange rates, supply and demand, weather, policy uncertainty, the stock and money markets (Khan et al., 2025), futures markets, etc. It plays a vital role in the global economy and is a core resource of a country’s or region’s economy. Even though it underpins the global economy, the uncertainty of crude oil prices poses a significant challenge to investors (Oglend & Kleppe, 2025; Ma et al., 2024). The fragility of crude oil prices directly impacts major sectors such as energy (Jiang et al., 2025) and the automobile industry. To serve policymakers, decision-makers, investors, and financiers, and to avoid any threat to national security, a reliable forecasting technique is essential. Accurate predictions aid the decision-making process for energy reservoirs, crude oil enterprises (such as petroleum-related manufacturers), air-cargo transport operations, and shipping stock prices (Andrikopoulos et al., 2025). Such predictions are valuable to investors seeking profits from the market prices of crude oil (K. He et al., 2012; Deng et al., 2020).
The price of crude oil is susceptible to fluctuations caused by natural disasters, demand–supply shifts, or sharp transitions in government or global policies (Manickavasagam et al., 2020). Long-term predictions of crude oil prices strengthen strategies for diversifying import and export sources, investing in alternative energy resources, and increasing or decreasing domestic production. The factors crucial for the national budget, such as fuel taxes, subsidy expenses, and revenues, can be estimated conveniently by the government. Forecasting also improves global competitiveness and industrial planning, aiding the foresight of the steel, aluminum, and cement sectors. Rather than merely being a data exercise, forecasting crude oil import prices enables sustainable national growth by maintaining economic stability, market efficiency, and informed policy planning.
Crude oil significantly impacts the global economy, making it essential to analyze import prices for better strategy development. The volatile and complex data have driven the researchers to formulate an authentic model for the prediction of crude oil’s import price. Ample research has been conducted on predicting crude oil prices. Alvarez-Ramirez et al. (2002) studied the crude oil prices using the multifractal analysis method, and the result indicated that the crude oil market is consistent. C. W. Yang et al. (2002) analyzed the factors affecting the US oil market and used the error correction method to examine the related elasticity and demand relations. Movagharnejad et al. (2011) presented an artificial neural network framework (ANN) to predict the oil price ratios of the Persian Gulf Region. Satija and Caers (2015) investigated a prediction-focused approach (PFA) to build a statistical relationship between the data and forecast variables. They proposed a canonical functional component analysis (CFCA) for mapping and forecasting the data variables into a low-dimensional space, avoiding the inversion of model parameters. The results have indicated that CFCA is an effective PFA method for forecasting problems. Chai et al. (2018) constructed a Bayesian inference model using the product partition model and the K-means method for measuring and identifying the posterior probability of change points of crude oil. Ding (2018) proposed an ensemble empirical mode decomposition (EEMD) and artificial neural network (ANN) model for crude oil forecasting. Sharma et al. (2018) suggested a rough set theory approach based on a hybrid model to predict air passenger traffic for an Australian airline. This showed better forecasting accuracy than classical forecasting models. This hybrid model improved airline operational decision-making by managing imprecision and uncertainty in time series data. Y. He et al. 
(2019) used support vector quantile regression and fuzzy information granulation for the prediction of wind and solar power, and the results indicated that the proposed fuzzy information granulation–support vector quantile regression hybrid model showed better prediction performance. Kim et al. (2020) studied four industrial sectors of South Korea and proposed a novel customer damage function estimation method, Bayesian Tobit quantile regression, to evaluate the cost of a customer’s outage. Abdollahi (2020) presented a hybrid model using generalized autoregressive conditional heteroskedasticity (GARCH) and decomposed the time series to enhance the prediction of crude oil prices. Similarly, J. Wang et al. (2020) developed a multi-granularity heterogeneous combination framework based on the artificial bee colony (ABC) algorithm for crude oil price prediction. R. Li et al. (2021) presented a novel multi-scale forecasting model including the variational mode decomposition technique for the accurate prediction of complex prices. Huang et al. (2021) proposed a two-stage forecasting procedure, two-stage variational Bayesian model averaging (2SVBMA), to forecast interval-valued time series of the crude oil price, and the results indicated that, despite having some limitations, the presented approach outperformed the other models. Sun et al. (2021) investigated the heterogeneity of autocorrelation of crude oil futures in the framework of multiscale analysis and quantile regression analysis, and the results explained that the price of crude oil combines random short-term fluctuations with a deterministic long-term tendency. Y. Yang et al. (2021) proposed a hybrid approach including K-means, kernel principal component analysis (KPCA), and a kernel-based extreme learning machine (KELM) to forecast the monthly price of crude oil, and the results showed that the presented hybrid method provided the most accurate forecasts. Kumari et al.
(2022) proposed a hybrid forecasting model based on a single forecasting model and rough set theory (RST) to forecast foreign tourist arrivals in India. Qian et al. (2022) proposed a function-on-function regression prediction model, the functional kernel partial least squares (FKPLS) method, for the prediction of product quality during the manufacturing process. The method’s effectiveness was validated through the steelmaking process. Sun et al. (2022) adopted the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) method for the decomposition of original crude oil price data. Cheng et al. (2024) constructed a threshold autoregressive interval-valued model with interval sentiment indexes for climate change for the prediction of interval-valued crude oil prices. J. Li et al. (2024) proposed a novel hybrid method to forecast the price of crude oil by combining multivariate ensemble empirical mode decomposition (MEEMD) and a mixed-kernel extreme learning machine (Mix-KELM), and the results showed that the proposed models had the lowest errors. Liu et al. (2024) presented a novel hybrid method combining the long short-term memory (LSTM), extreme learning machine (ELM), autoregressive integrated moving average (ARIMA), and backpropagation neural network (BPNN) methods with Jaynes weight hybridization and Shannon information entropy. The results indicated that the presented model achieved higher prediction accuracy than the individual models. Shen et al. (2024) proposed the probabilistic prediction hybrid model EEMD-CNN-BiLSTM-QR, combining ensemble empirical mode decomposition (EEMD), a convolutional neural network (CNN), bidirectional long short-term memory (BiLSTM), and quantile regression (QR) for the prediction of the crude oil price. The results indicated that the application of ensemble empirical mode decomposition (EEMD) to crude oil prices improves the forecasting accuracy of the proposed hybrid model. J.
Zhang and Liu (2024) presented an improved hybrid model combining data feature extraction and quantification techniques, an improved echo state network, fuzzy information granulation, and an autoregressive integrated moving average model, and the results showed that the hybrid model portrays the various features of the volatility of the crude oil price. C. Zhang and Zhou (2024) proposed ARIMA-SVR-POT, a combination of the autoregressive integrated moving average (ARIMA), support vector regression (SVR), and peaks-over-threshold (POT) methods, and compared it with ARIMA-EGARCH (ARIMA combined with exponential generalized autoregressive conditional heteroskedasticity), ARIMA-SVR, and ARIMA-EGARCH-POT; the results indicated that the proposed model passed the Kupiec test and provided comprehensive loss probability. Chen et al. (2025) proposed a hybrid model combining a stacking-ensemble model, correlation analysis, and a supply and demand optimization model to improve power load forecasting accuracy. Fozap (2025) proposed a hybrid deep learning model, LSTM-CNN, for long-term stock market prediction. Harikrishnan and Sreedharan (2025) suggested a weighted average ensemble model for short-term load forecasting, and the results indicated that the proposed combination surpassed the single forecasting models. Liang et al. (2025) proposed a novel gated recurrent unit-based non-linear Granger causality (GRU-GC) approach for the precise prediction of oil prices and the determination of causality. Rao et al. (2025) leveraged machine learning algorithms, combining the generalized autoregressive conditional heteroskedasticity model of order (1,1) (GARCH(1,1)) with long short-term memory (LSTM) and gated recurrent unit (GRU) networks in Vanilla LSTM, GARCH(1,1)-LSTM, and GARCH(1,1)-GRU configurations, for the prediction of crude oil prices with different time frequencies and sample periods. The empirical analysis showed that the LSTM and GARCH(1,1)-GRU hybrid models performed well.
LSTM provided more accurate predictions, whereas GARCH(1,1)-GRU minimized squared errors. Sharma and Kumari (2025) applied a rough set-based hybrid forecasting model to forecast tourism analysis data. Tang et al. (2025) presented a variational graph autoencoder (G-VGAE) model with node features for the prediction of international crude oil trade relations, and the empirical analysis found that the node features improved the accuracy of the model. Ullah et al. (2025) proposed a hybrid forecasting model combining time-varying filter-based empirical mode decomposition (TVFEMD), discrete wavelet transform (DWT), multi-scale permutation entropy (MPE), quantum particle swarm optimization (QPSO), and a gated recurrent unit (GRU) to forecast wind speed. D. Wang et al. (2025) presented a functional mixture prediction (FMP) model for real-time forecasting of crude oil cumulative intraday returns (CIDR). An adaptive functional clustering algorithm was also developed for the identification of distinct patterns in CIDR curves. Wei and Liu (2025) examined the adverse impacts of rising commodity prices, such as those triggered by the Russia–Ukraine war, on Chinese production activities. Xu et al. (2025) developed a non-linear mechanism feature heterogeneous hybridization to improve the robustness and effectiveness of the prediction of West Texas Intermediate (WTI) and Brent oil prices. W. Zhang et al. (2025) presented an innovative spatio-temporal information recombination hypergraph neural network (STIR-HGNN) to improve the accuracy of agricultural price forecasting. Zhao et al. (2025) presented an adaptive multi-factor integrated model by integrating Pearson’s correlation coefficient (PCC), elastic net (EN) regression, and random forest (RF) for the prediction of the carbon price, and the results indicated that the proposed model is a reliable benchmark in carbon price prediction.
Banik and Biswas (2024) used RF-XGBoost stacking for renewable energy forecasting, but it lacked deep learning components. Cui et al. (2024) applied CNN-GRU combinations with XGBoost-RF feature selection for short-term load forecasting, but not for crude oil price prediction. Additionally, LSTM-XGBoost hybrid models have been explored for stock price forecasting in open-source implementations on GitHub, but they do not incorporate random forest or CNN together. While existing studies on crude oil price forecasting primarily focus on single-model approaches or basic ensemble methods, the specific combination of RF-GRU-CNN-XGBoost-FPLS-Stacking in our study has not previously been applied to crude oil price forecasting, offering balanced short-term and long-term trend detection, improved volatility adaptation, and stacking optimization for ensemble learning. None of the prior studies have integrated the base models of our proposed model into a unified ensemble framework. Most existing models struggle with extreme price fluctuations, failing to balance short-term volatility adaptation and long-term trend stability. The limited integration of macroeconomic indicators in previous models reduces forecasting accuracy in dynamic market conditions. Our approach uniquely balances short-term feature extraction, long-term dependency modeling, and non-linear trend adaptation, ensuring superior forecasting accuracy. The stacking meta-model further optimizes predictions by leveraging the strengths of each base model while mitigating their individual weaknesses. CNN captures local patterns; GRU handles sequential dependencies; RF ranks feature importance; FPLS captures smooth patterns, reduces dimensionality, and handles collinearity; and XGBoost refines non-linear relationships. The proposed model achieves lower error metrics (MAPE, RMSE, MAE) than the baseline models, demonstrating superior predictive performance.
Unlike prior models, our hybrid approach mitigates the effect of extreme fluctuations in crude oil prices. Prior works use basic stacking, whereas our study fine-tunes the meta-model for better generalization and reduced bias compared to traditional ensemble methods. The model is tested on real-world crude oil import price data, ensuring applicability for investors, policymakers, and energy enterprises.
Machine learning algorithms maintain a strong presence in crude oil price prediction due to their effectiveness. However, on temporal data, traditional or base models are not very reliable and need adjustments for robustness. Thus, hybrid ensemble learning methods are more effective than base models, as they utilize the strengths of the individual models, resulting in highly robust and effective predictions that outperform the base models, especially on temporal data. The goal of this study is to provide a hybrid ensemble model for the accurate prediction of the import price. Forecasted prices support effective policymaking, investment, and supply and demand management, not only reducing losses but also offering strong strategic insight. The study focused on the improvement of deep learning and machine learning models to provide a hybrid ensemble learning model that can adapt itself to any type of data and adjust to multiple scenarios. The empirical analysis of the proposed model further confirms its effectiveness.
The sections are arranged in the following manner: Section 1 introduces the literature review; Section 2 explains the individual models, including random forest (RF), gated recurrent unit (GRU), convolutional neural network (CNN), extreme gradient boosting (XGBoost), functional partial least squares (FPLS), and stacking, along with a description of the error metrics mean absolute error (MAE), mean absolute percentage error (MAPE), mean squared error (MSE), root mean squared error (RMSE), mean absolute scaled error (MASE), and root mean squared scaled error (RMSSE); and Section 3 discusses and analyses the results. The study is concluded in the last section, Section 4.

2. Materials and Methods

This section provides an overview of the classical machine learning models, such as random forest, GRU, CNN, XGBoost, FPLS, and stacking, implemented to generate the proposed framework RF-GRU-CNN-XGBoost-FPLS-Stacking, along with a general overview of the error metrics. The methods deployed in this study are compatible with the stationary time series data, as it does not exhibit any trend or seasonality. These methods are specifically designed to offer high comprehensibility and authenticity.

2.1. Random Forest

Random forest is an ensemble learning method commonly used for classification, regression, anomaly detection, and feature selection. The random forest method is composed of multiple decision trees, which are trained individually on subsets of the data drawn randomly with replacement. This approach combines diverse predictions, enhances model robustness, and reduces variance. The forest built by this algorithm is an ensemble of decision trees trained with the bagging process. During training, the decision trees are fit independently, and their predictions are integrated via averaging or voting to enhance predictive performance and reduce overfitting.
The primary difficulties in applying random forest to time series forecasting are the sequential nature of the historical data and its feature representation. The algorithm was not inherently designed for time series forecasting, as it has difficulty handling sequential and temporal dependencies. To make it compatible with time series forecasting, the data is transformed into a supervised learning format by creating features that capture the temporal patterns. To transform the sequential historical time series data into a structured format, lagged and other time-based features are assembled as follows:
  • Lagged features: Past observations are incorporated as input features y_{t−1}, y_{t−2}, …, y_{t−n} to predict y_t.
  • Rolling/aggregated features: Rolling statistics such as the moving mean, max, and min are computed over past windows.
  • Date/time features: Depending on the seasonality, the day of the week, month, holiday indicators, etc., are incorporated.
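As an illustration, the reframing above can be sketched with pandas; the function name, column names, and toy price series below are hypothetical and are not part of the study’s pipeline:

```python
import numpy as np
import pandas as pd

def make_supervised(series: pd.Series, n_lags: int = 3, roll: int = 3) -> pd.DataFrame:
    """Reframe a price series as a supervised-learning table with
    lagged, rolling, and calendar features."""
    df = pd.DataFrame({"y": series})
    # Lagged features: y_{t-1} ... y_{t-n}
    for k in range(1, n_lags + 1):
        df[f"lag_{k}"] = series.shift(k)
    # Rolling statistics over the past window (shifted by one to avoid leakage)
    past = series.shift(1)
    df["roll_mean"] = past.rolling(roll).mean()
    df["roll_max"] = past.rolling(roll).max()
    df["roll_min"] = past.rolling(roll).min()
    # Calendar features taken from the DatetimeIndex
    df["month"] = series.index.month
    df["dayofweek"] = series.index.dayofweek
    # Rows whose lag/rolling windows reach before the sample start are dropped
    return df.dropna()

idx = pd.date_range("2024-01-01", periods=10, freq="D")
prices = pd.Series(np.arange(10, dtype=float) + 70.0, index=idx)
table = make_supervised(prices)
```

Any tabular learner, including random forest, can then be fit on the feature columns with `y` as the target.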
Once the data is transformed into a structured format, the random forest uses random samples of the data for building multiple decision trees, and then each tree becomes unique after being trained on a different subset of the data. The subset of features and variables to split the data is chosen randomly by the algorithm to create each tree, instead of utilizing the complete set of features, concurrently adding diversity to the trees. The random forest model is trained in the following manner:
  • Prepare inputs and outputs: The time-based and lagged features are used as inputs X, whereas the target variable y_t is used as the output.
  • Split data: The dataset is divided into training and test sets such that the split preserves the temporal order and future data does not leak into the training set.
  • Train the model: The random forest model is trained on the training data. To predict the target variable y_t, each tree in the forest recognizes patterns in the input features.
For the implementation of the random forest algorithm, the tree depth is optimized using grid search, with a depth of 10 selected to prevent overfitting. The number of trees is set to 100, ensuring stability without excessive computational cost. We used Gini impurity to rank feature importance. Once trained, each decision tree in the forest generates its own prediction on the test set, and the final prediction is the average of the predictions generated by all the trees.
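A minimal sketch of this training recipe, using scikit-learn’s RandomForestRegressor with the hyperparameters reported above (100 trees, maximum depth 10); the synthetic random-walk series stands in for the actual import price data and the lag construction is illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic random-walk "price" series stands in for the crude oil data
y = 70 + np.cumsum(rng.normal(0, 0.5, 300))

# Lag-matrix reframing: predict y_t from y_{t-3}, y_{t-2}, y_{t-1}
n_lags = 3
X = np.column_stack([y[k:len(y) - n_lags + k] for k in range(n_lags)])
target = y[n_lags:]

# Temporal split: no shuffling, so future data never leaks into training
split = int(0.8 * len(target))
X_tr, X_te = X[:split], X[split:]
y_tr, y_te = target[:split], target[split:]

# Hyperparameters as reported in the text: 100 trees, maximum depth 10
rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=0)
rf.fit(X_tr, y_tr)

pred = rf.predict(X_te)                # average of all trees' predictions
importance = rf.feature_importances_   # impurity-based feature ranking
```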
The random forest model is flexible in capturing interactions between features and handling non-linear relationships. It also facilitates understanding of which lagged or temporal features are important for the predictions. The robustness of the random forest is such that it can reduce the overfitting that might occur across the decision trees. Its ability to handle large datasets and missing values without sacrificing prediction accuracy makes it a highly reliable machine learning algorithm. Additionally, it is highly resilient in scenarios with non-linear relationships, high-dimensional input features, and sparse or irregular time series data.

2.2. Gated Recurrent Unit (GRU)

The gated recurrent unit (GRU) is a type of recurrent neural network that captures long-term dependencies in sequential time series data. GRU simplifies components of the long short-term memory (LSTM) method to overcome the computational cost that the LSTM incurs due to its complex structure. Unlike the LSTM method, no separate memory cells are used in the GRU; instead, it relies on two gating mechanisms. The information flow is controlled efficiently using the two main gates of the GRU method, defined below:
  • Reset gate r_t: Determines the amount of past hidden state information h_{t−1} that needs to be forgotten.
  • Update gate z_t: Determines the amount of past hidden state information h_{t−1} to be preserved for the next step.
The internal mechanism of GRU can be explained by the following equations:
  • Reset gate: The reset gate assesses the extent to which the hidden state h_{t−1} must be forgotten:
    r_t = σ(W_r · [h_{t−1}, x_t])
  • Update gate: The update gate regulates the extent to which the new input x_t modifies the hidden state h_{t−1}:
    z_t = σ(W_z · [h_{t−1}, x_t])
  • Candidate hidden state h̃_t: The potential hidden state is generated from the reset preceding state and the current input:
    h̃_t = tanh(W_h · [r_t ⊙ h_{t−1}, x_t])
  • Hidden state h_t: The weighted average of the candidate hidden state h̃_t and the past hidden state h_{t−1}:
    h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t
where W_r, W_z, and W_h are the weight matrices corresponding to the reset gate, the update gate, and the candidate hidden state, respectively; ⊙ denotes element-wise multiplication; σ is the sigmoid activation function mapping values to the interval (0, 1); and tanh is the hyperbolic tangent activation function mapping values to the interval (−1, 1).
To implement the GRU algorithm, the number of hidden units is tuned using Bayesian optimization, with 128 units selected for optimal sequence learning. The dropout rate is set to 0.2 to prevent overfitting. The learning rate is optimized at 0.001 using the Adam optimizer. The GRU method excels at adapting to temporal dependencies, making it well suited to predicting future values influenced by historical data. It is trained with backpropagation through time (BPTT) and optimized with the Adam or SGD algorithms. The method supports both single-step and multi-step predictions. Due to its low computational cost, simplicity, and adaptability in handling the dynamic and varying dependencies of time series data, GRU enhances forecasting efficiency.
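The four gate equations can be traced with a single numpy GRU step; the weights here are small random stand-ins rather than trained parameters, and the toy price sequence is illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wr, Wz, Wh):
    """One GRU update implementing the four equations above.
    Each weight matrix acts on the concatenation [h_{t-1}, x_t]."""
    hx = np.concatenate([h_prev, x_t])
    r = sigmoid(Wr @ hx)                                       # reset gate
    z = sigmoid(Wz @ hx)                                       # update gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]))   # candidate state
    return (1 - z) * h_prev + z * h_cand                       # new hidden state

rng = np.random.default_rng(1)
n_in, n_hid = 1, 4
Wr = rng.normal(0, 0.1, (n_hid, n_hid + n_in))
Wz = rng.normal(0, 0.1, (n_hid, n_hid + n_in))
Wh = rng.normal(0, 0.1, (n_hid, n_hid + n_in))

h = np.zeros(n_hid)
for price in [70.2, 70.5, 69.9]:   # toy sequence of scaled prices
    h = gru_step(np.array([price]), h, Wr, Wz, Wh)
```

In practice, a framework layer such as `tf.keras.layers.GRU(128, dropout=0.2)` trained with the Adam optimizer at learning rate 0.001 would correspond to the settings reported above.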

2.3. Convolutional Neural Network (CNN)

A convolutional neural network is a deep learning algorithm that is equipped with the potential to capture the local patterns in the sequential time series data. It applies convolutional operations through specialized layers, systematically identifying and learning patterns or characteristics directly from the input data. This process enables effective handling of local dependencies and patterns within sequential datasets.
The key components of the CNN are as follows:
  • Convolutional layers: These layers act as the building blocks of the CNN algorithm. Convolutional operations are applied by convolutional layers to extract the features of the input data.
  • Activation function: Feature maps are computed by sliding the filter (or kernel) over the input matrix; non-linear activation functions are then applied to the feature maps to introduce non-linearity and enhance the learning capacity of the CNN model.
  • Pooling layers: To reduce the spatial dimensions of the feature maps, they are down-sampled, retaining most of the information they store. There are two common types of pooling operations—max pooling (extracts the maximum value within the pooling window) and average pooling (estimates the average within the pooling window).
  • Fully connected layers: All the neurons are connected between successive layers.
  • Dropout: A regularization approach for randomly deactivating the neurons to prevent overfitting.
For the model implementation of the CNN algorithm, the filter size is selected using grid search, testing 3 × 3, 5 × 5, and 7 × 7 filters. Max pooling (2 × 2) was chosen to reduce dimensionality while preserving key features. The rectified linear unit (ReLU) activation function was used to introduce non-linearity. CNN captures local patterns by identifying short-term dependencies in the time series data and computes more efficiently than recurrent neural networks since it processes the data in parallel. The CNN features an independent and robust process for feature extraction.
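The convolution–activation–pooling pipeline can be illustrated in one dimension on a toy series; the difference filter and values below are illustrative (the study’s grid search used the 2-D sizes reported above):

```python
import numpy as np

def conv1d_valid(x, kernel):
    """'Valid' 1-D convolution (cross-correlation) of a series with a filter."""
    n = len(x) - len(kernel) + 1
    return np.array([np.dot(x[i:i + len(kernel)], kernel) for i in range(n)])

def relu(x):
    """Rectified linear unit: zero out negative activations."""
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

series = np.array([70., 71., 73., 72., 74., 75., 74., 76.])
kernel = np.array([-1.0, 0.0, 1.0])   # difference filter detecting local trend
features = max_pool(relu(conv1d_valid(series, kernel)))
# features == [3.0, 3.0, 1.0]: pooled local upward movements
```

In a full CNN, these operations appear as stacked convolutional, activation, and pooling layers followed by fully connected layers and dropout.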

2.4. Extreme Gradient Boosting (XGBoost)

XGBoost is an advanced boosting algorithm that builds sequential decision trees. Each tree aims to correct residual errors from preceding trees, thus efficiently modeling intricate, non-linear interactions within the data, making it particularly effective in time series prediction tasks. Structured data is efficiently handled by the tree-based ensemble learning technique XGBoost, which requires less tuning compared to other deep learning models. Since it is not a recurrent model, the time series data needs to be transformed into a supervised learning format. The preprocessing of the data along with the model tuning is described below:
  • Data preprocessing: Restructure the time series data into a feature–target structure.
  • Training process: Map historical data to future predictions to convert the time series into a supervised learning formulation; the dataset is then segmented into training and testing datasets, maintaining the chronological order. The XGBoost model is trained taking future values as targets and historical data as features.
  • Model components: The components of the XGBoost model are as follows.
    • Gradient boosting framework: The decision trees are built sequentially, and predictions are refined using gradient-based optimization. Successive trees are trained to reduce the residual errors of the preceding trees.
    • Regularization: The complexity and overfitting of the model are prevented using L1 (Lasso) and L2 (Ridge) penalties.
    • Loss function: Generally, loss functions are calculated using the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) to evaluate the performance of the model.
    • Tree pruning: To counteract excessive tree growth, max depth and min child weight constraints are employed by the XGBoost model.
  • Making predictions: Post-training, the future values are forecasted utilizing the recent observations’ lag features, external variables, and recursive predictions.
The learning rate is tuned across the range 0.01 to 0.1, with 0.05 selected for stability. Max depth is set to 6, balancing complexity and generalization. XGBoost optimally handles data gaps and categorical features using boosted decision trees, leveraging regularization to prevent overfitting and supporting parallel computation to deliver exceptionally efficient and accurate predictions. Additionally, XGBoost can integrate non-time series features to enhance the accuracy of the model.
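The boosting principle XGBoost relies on, sequentially fitting weak learners to residuals and shrinking each step by the learning rate, can be sketched with depth-1 regression trees (stumps) in plain numpy; real experiments would call the xgboost library with the reported learning rate of 0.05 and maximum depth of 6, and the data here are synthetic:

```python
import numpy as np

def fit_stump(x, r):
    """Best depth-1 regression tree for residuals r: pick the split that
    minimizes squared error and predict the mean on each side."""
    best = (np.inf, None, 0.0, 0.0)
    for s in np.unique(x):
        left, right = r[x <= s], r[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    return best[1:]

def stump_predict(x, split, lval, rval):
    return np.where(x <= split, lval, rval)

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + rng.normal(0, 0.1, 200)   # non-linear target

learning_rate, n_rounds = 0.05, 100
pred = np.full_like(y, y.mean())          # start from the mean prediction
stumps = []
for _ in range(n_rounds):
    residual = y - pred                   # negative gradient of squared loss
    split, lval, rval = fit_stump(x, residual)
    pred += learning_rate * stump_predict(x, split, lval, rval)  # shrunken step
    stumps.append((split, lval, rval))

final_mse = np.mean((y - pred)**2)
baseline_mse = np.mean((y - y.mean())**2)
```

Because each step is a least-squares fit to the current residual, the training error decreases monotonically; XGBoost adds regularization, pruning, and second-order gradients on top of this scheme.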

2.5. Functional Partial Least Squares (FPLS)

Functional partial least squares (FPLS) extends partial least squares (PLS) to functional predictors X_i(t) and is designed for functional data analysis (FDA) (Ramsay & Silverman, 2005; Ferraty, 2006). It efficiently handles ill-posed inverse problems, such as functional regression models and high-dimensional functional predictors (Preda & Saporta, 2005).
PLS maximizes the covariance between the input (X_i) and output (Y_i) features, whereas FPLS extends this concept to functional data X_i(t), treating predictors as continuous functions rather than discrete variables and maintaining optimal dimension reduction to preserve the informative functional components for forecasting (Delaigle & Hall, 2012).
Consider a functional linear regression model:
Y = ∫_T X(t) β(t) dt + ε
where
  • Y is the scalar response,
  • X(t) is the functional predictor,
  • β(t) is the functional coefficient,
  • ε is the error term.
FPLS projects X(t) onto latent components that maximize the covariance with Y in order to estimate the functional coefficient β(t). The steps to estimate β(t) are as follows:
  • Compute the covariance operator between X(t) and Y.
  • Extract the functional components that explain the most covariance. Mathematically, the FPLS components φ_k(t) are obtained by solving max_{φ_k} Cov(Y, ⟨X, φ_k⟩), subject to orthogonality constraints.
  • Estimate β(t) using the extracted functional components.
Assume n observations {(X_i(t), y_i)}_{i=1}^{n},
where
  • X_i(t) ∈ L²(T), often centered;
  • y_i ∈ ℝ is a scalar response.
At each step k, obtain a weight function w_k(t) ∈ L²(T) that solves
w_k = argmax_{‖w‖=1} Cov²(⟨X_i, w⟩, y_i)
That is,
w_k = argmax_{‖w‖=1} ((1/n) ∑_{i=1}^{n} ⟨X_i, w⟩ y_i)²
where ⟨X_i, w⟩ = ∫_T X_i(t) w(t) dt.
Now, define the score variable
t_{ik} = ⟨X_i, w_k⟩
This is the projection of X_i onto the functional direction w_k.
Regress y_i on t_{ik} to obtain the coefficient c_k and update the residuals:
y_i^{(k+1)} = y_i^{(k)} − c_k t_{ik}
For deflation, update X_i by removing the part explained by w_k:
X_i^{(k+1)}(t) = X_i^{(k)}(t) − t_{ik} p_k(t)
where p_k(t) is a loading function given by
p_k(t) = (1/n) ∑_{i=1}^{n} t_{ik} X_i(t)
Each component k corresponds to a functional direction w_k(t) along which X_i(t) is projected. The number of components K controls the model's complexity: if K is too small, the model underfits; if it is too large, it overfits. K is chosen by cross-validation on the training data to minimize the prediction error. With K components, the final model is expressed as
ŷ_i = ȳ + Σ_{k=1}^{K} c_k ⟨X_i, w_k⟩
The estimated regression function is
β̂(t) = Σ_{k=1}^{K} c_k w_k(t)
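The iteration above can be sketched numerically on curves sampled over a uniform grid. This is a minimal illustration under our own assumptions (the helper name `fpls_fit`, and a standard PLS normalization of the loading p_k, which is proportional to the definition above), not the implementation used in the study:

```python
import numpy as np

def fpls_fit(X, y, K, t):
    """Minimal FPLS sketch. X: (n, m) array, row i is the curve X_i sampled
    on the uniform grid t; y: (n,) scalar responses; K: number of components."""
    dt = t[1] - t[0]                        # grid spacing for L2 inner products
    Xk = X - X.mean(axis=0)                 # centered predictors
    yk = y - y.mean()                       # centered response (current residual)
    yhat = np.full(len(y), y.mean())        # fitted values, start at y-bar
    beta = np.zeros(X.shape[1])             # accumulates beta_hat(t)
    for _ in range(K):
        w = Xk.T @ yk                       # direction of maximal covariance with y
        w = w / np.sqrt((w ** 2).sum() * dt)  # normalize ||w||_{L2} = 1
        s = Xk @ w * dt                     # scores t_ik = <X_i, w_k>
        c = (s @ yk) / (s @ s)              # regression coefficient c_k
        p = Xk.T @ s / (s @ s)              # loading p_k(t), standard PLS scaling
        Xk = Xk - np.outer(s, p)            # deflation of the predictors
        yk = yk - c * s                     # residual update of the response
        yhat = yhat + c * s                 # accumulate y-bar + sum_k c_k t_ik
        beta = beta + c * w                 # beta_hat(t) = sum_k c_k w_k(t)
    return beta, yhat

# Synthetic demonstration: curves with a smooth true coefficient function.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
curves = rng.standard_normal((40, 50))
y = curves @ np.sin(2 * np.pi * grid) * (grid[1] - grid[0]) \
    + 0.01 * rng.standard_normal(40)
beta_hat, fitted = fpls_fit(curves, y, K=3, t=grid)
```

Here the inner product ⟨·,·⟩ is approximated by a Riemann sum over the grid, and the fitted values accumulate ȳ + Σ_k c_k t_ik exactly as in the final model above.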
FPLS is useful in crude oil price forecasting, where functional predictors arise from historical price movements represented as continuous curves, from macroeconomic indicators modeled as functional time series, and from volatility estimation, where price fluctuations require smooth functional approximations.
FPLS efficiently handles high-dimensional functional predictors. It maximizes covariance for improved forecasting accuracy and performs well in ill-posed inverse problems, ensuring better model stability; it is also more robust than functional principal component regression (FPCR). Despite these strengths, certain limitations affect the generalizability of its results. FPLS is computationally intensive, especially for large datasets; the functional components must be selected carefully to avoid overfitting; and it is less interpretable than other regression models.

2.6. Stacking

Stacked generalization, commonly referred to as stacking, is an ensemble learning technique that consolidates the outputs of multiple models (random forest, GRU, CNN, XGBoost, and FPLS) to improve prediction efficiency. Stacking is structured into two stages: the base models are trained to produce their own predictions, and those predictions then serve as inputs for training a meta-model that generates the final prediction. Stacking consolidates the strengths of the models employed, each of which has already captured distinct features of the time series data, and it enhances generalization to unseen data by mitigating the bias and variance of the individual models.
The mechanism of the stacking algorithm is broken down in the following manner:
  • Base models: The independent predictions of the base models (random forest, CNN, GRU, XGBoost, and FPLS) are generated, encoding the diverse temporal patterns of the time series data.
  • Meta-model: A simple regression model, such as linear or ridge regression, XGBoost, or a neural network, is trained on the predictions generated by the base models and learns to aggregate them optimally.
  • Cross-validation in stacking: The meta-model is trained on cross-validated (out-of-fold) predictions generated by the base models on the training dataset, which makes its training robust and prevents leakage of future data.
  • Final predictions: Once the meta-model is trained and validated, the stacked ensemble generates predictions on unseen data, relying only on the predictions of the base models.
Stacking’s effectiveness comes from exploiting the capabilities of the base (classical) models while mitigating their weaknesses, yielding a robust model that captures the complex temporal patterns in the data with greater flexibility and forecasting precision.
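The two-stage scheme can be sketched with scikit-learn's `StackingRegressor` on lagged features. This is a hedged illustration, not the study's configuration: gradient boosting stands in for XGBoost, and a small MLP stands in for the GRU/CNN networks (which require a deep-learning framework); the lag construction and all settings are our assumptions.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import RidgeCV
from sklearn.neural_network import MLPRegressor

def lag_matrix(series, lags):
    """Row i holds the `lags` values preceding series[i]; target is series[i]."""
    X = np.column_stack([series[lags - k:len(series) - k]
                         for k in range(1, lags + 1)])
    return X, series[lags:]

# Stand-in base learners (illustrative, not the paper's exact models).
base_models = [
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingRegressor(random_state=0)),
    ("nn", MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=0)),
]
# Ridge meta-model trained on 5-fold cross-validated base predictions.
stack = StackingRegressor(estimators=base_models, final_estimator=RidgeCV(), cv=5)

series = np.sin(np.arange(150) / 6.0) \
    + 0.1 * np.random.default_rng(1).standard_normal(150)
X, y = lag_matrix(series, lags=6)
stack.fit(X[:-24], y[:-24])          # hold out the last 24 points as unseen data
preds = stack.predict(X[-24:])
```

`StackingRegressor` internally trains the meta-model on out-of-fold base predictions, matching the cross-validation step described above.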

2.7. RF-GRU-CNN-XGBoost-FPLS-Stacking

The meta-model, RF-GRU-CNN-XGBoost-FPLS-Stacking, is produced by the stacked generalization algorithm from the combination of the base models: random forest, gated recurrent unit, convolutional neural network, extreme gradient boosting, and functional partial least squares. The training process of the meta-model is as follows:
  • Train base models: Each base model is trained on the original training dataset.
  • Generate out-of-fold predictions: The dataset is split into K folds. Each base model is trained on K 1 folds and tested on the remaining fold, which ensures that predictions used for training the meta-model are not biased by the same training data.
  • Train the meta-model: The out-of-fold predictions from the base models are used as new features (input) for the meta-model. The meta-model is trained on these predictions to learn how to optimally combine them.
To prevent data leakage, the meta-model is trained only on predictions from unseen data, preventing it from learning directly from the training set. The dataset is divided into training and testing sets before stacking, ensuring that the test set remains completely independent. The stacking process uses cross-validation to ensure that predictions are generalizable and not overfitted.
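The out-of-fold step can be made explicit. In this sketch (the helper name `out_of_fold_predictions` and the toy models are our choices), each row's meta-features come only from clones that never saw that row during training:

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

def out_of_fold_predictions(models, X, y, n_splits=5):
    """Column j holds model j's predictions; each prediction is made by a clone
    trained on the other K-1 folds, so the meta-model is never trained on
    predictions leaked from a row's own training fold."""
    oof = np.zeros((len(y), len(models)))
    for train_idx, test_idx in KFold(n_splits=n_splits).split(X):
        for j, model in enumerate(models):
            fitted = clone(model).fit(X[train_idx], y[train_idx])
            oof[test_idx, j] = fitted.predict(X[test_idx])
    return oof

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(100)
oof = out_of_fold_predictions(
    [LinearRegression(), DecisionTreeRegressor(max_depth=3, random_state=0)], X, y)
```

For strictly chronological data, scikit-learn's `TimeSeriesSplit` could replace `KFold` so that no fold ever trains on observations from its own future.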
The proposed hybrid model RF-GRU-CNN-XGBoost-FPLS-Stacking incorporates multiple base models, increasing computational complexity but significantly improving forecasting accuracy. The stacking approach ensures a balanced trade-off between efficiency and precision, with inference times optimized for batch predictions. While GRU and CNN introduce higher computational costs, XGBoost and random forest help mitigate these concerns. The model can be deployed in real-time forecasting environments with cloud-based optimizations, feature selection, and distillation techniques.
Regarding the cost of the individual learners, random forest training scales as O(n log n) per tree in the number of training samples n, making it relatively efficient, although it slows down with large feature sets. GRU requires O(T · d²) operations, where T is the sequence length and d is the hidden layer size; it is faster than LSTM but still computationally intensive. CNN is efficient for local pattern recognition. XGBoost scales well but requires optimized feature selection. Since the stacking algorithm combines multiple models, it increases training time but improves prediction accuracy, and its meta-learning step introduces an extra layer of processing that requires additional computational power. Compared to single models, stacking adds complexity but mitigates biases, making it more effective for high-variance crude oil price fluctuations.
Each model in the RF-GRU-CNN-XGBoost-FPLS-Stacking ensemble requires careful tuning of hyperparameters to optimize forecasting accuracy. Grid and random search were used for hyperparameter tuning across models, with cross-validation ensuring robust parameter selection and preventing overfitting. Models were refined iteratively based on the performance metrics MAE, MAPE, MASE, RMSE, MSE, and RMSSE. Autocorrelation analysis identified optimal lag values based on crude oil price dependencies.
Grid search tested lag sizes from 1 to 12 months to find the best predictive performance. A lag size of 6 months was selected, balancing short-term trends and long-term dependencies.
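A lag-size search of this kind might look as follows; the validation scheme, random forest settings, and synthetic monthly series are illustrative assumptions, not the study's exact grid search:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

def lag_matrix(series, lags):
    """Supervised matrix: row i holds the `lags` values preceding series[i]."""
    X = np.column_stack([series[lags - k:len(series) - k]
                         for k in range(1, lags + 1)])
    return X, series[lags:]

def best_lag(series, candidates=range(1, 13), holdout=24):
    """Pick the lag size with the lowest MAE on the last `holdout` points
    (a stand-in for the paper's grid search; the model choice is ours)."""
    scores = {}
    for lag in candidates:
        X, y = lag_matrix(series, lag)
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X[:-holdout], y[:-holdout])
        scores[lag] = mean_absolute_error(y[-holdout:],
                                          model.predict(X[-holdout:]))
    return min(scores, key=scores.get)

# Synthetic monthly series with an annual cycle, roughly the dataset's length.
rng = np.random.default_rng(2)
monthly = 10 * np.sin(np.arange(168) * 2 * np.pi / 12) \
    + rng.standard_normal(168)
chosen = best_lag(monthly)
```

Holding out the most recent observations (rather than shuffling) respects the chronological order of the series during the search.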
Since stacking uses pre-trained models, its inference is fast and optimized for batch predictions rather than real-time updates. Deploying it on cloud systems (AWS, Azure) can parallelize computations for efficiency. On low-power devices, pruning techniques or model distillation could reduce complexity while maintaining accuracy.

2.8. Error Metrics

Error metrics are statistical measures used to evaluate the accuracy of the predictions computed by time series models. They enable effective comparison of forecasting techniques by quantifying the difference between the predicted and actual values of the time series data.
The overview of the error metrics used in the study for the evaluation of the classical and proposed models is as follows:

2.8.1. Mean Absolute Error (MAE)

MAE measures the average magnitude of the difference between the predicted and actual values, without considering their directions.
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|
where
  • y_i is the actual value at time i;
  • ŷ_i is the predicted value at time i;
  • n is the number of periods.
MAE is scale-dependent and easy to interpret. It is less sensitive to large errors. A lower MAE implies better accuracy, since the model with the smallest MAE makes the least average error in predictions. A higher MAE makes a model less reliable, since it has more deviation from the actual values.

2.8.2. Mean Absolute Percentage Error (MAPE)

MAPE indicates that the forecast errors are the percentage of the actual values:
MAPE = (100%/n) Σ_{i=1}^{n} |(y_i − ŷ_i)/y_i|
MAPE is scale-independent and unitless. It cannot be used if any y_i = 0, and very small values of y_i can skew it. A lower MAPE implies better accuracy, as a model with a lower MAPE makes more reliable predictions, while a higher MAPE indicates a larger deviation from the actual observations and a less reliable model. It can be misleading if the actual values are extremely close to zero.

2.8.3. Mean Squared Error (MSE)

MSE estimates the average of the squared differences between the predicted and actual observations:
MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
MSE is capable of penalizing larger errors more heavily because of squaring. It is scale-dependent with squared units and is sensitive to the outliers. A lower MSE indicates a lesser average squared error, making the model a better fit as compared to others with a high MSE, which indicates larger deviations in predictions and is less reliable.
If two models have a similar MAE but one has a much higher MSE, the latter's errors include extreme outliers. Wide fluctuations in MSE may indicate that the model struggles to generalize.

2.8.4. Root Mean Squared Error (RMSE)

RMSE takes the square root of the MSE, which makes it interpretable in the same units as the response variables:
RMSE = √( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )
RMSE is interpretable in the same units as its actual value y . Larger errors are heavily penalized by RMSE, as it checks the impact of large errors, and it is often used for comparison of models. A model having the smallest RMSE has fewer prediction errors, making it a better fit as compared to other models with higher RMSE, which indicates that the predictions are deviating more from the actual observations.
If two models have a similar MAE but one has a much higher RMSE, the latter's errors include extreme outliers. Wide variations in RMSE indicate that the model may not generalize well. RMSE is useful when large deviations matter more than small ones.

2.8.5. Mean Absolute Scaled Error (MASE)

MASE compares the MAE of a model to the MAE of a naïve forecast.
MASE = ( (1/n) Σ_{i=1}^{n} |y_i − ŷ_i| ) / ( (1/(n−1)) Σ_{i=2}^{n} |y_i − y_{i−1}| )
MASE is a scale-independent error metric that can be compared across different datasets. A MASE below 1 indicates that the model performs better than the naïve forecast, whereas a MASE above 1 indicates that it performs worse. MASE can handle intermittent demand and avoids issues with zero values in the data.

2.8.6. Root Mean Squared Scaled Error (RMSSE)

RMSSE is the square root of the MSE scaled by the MSE of the naïve forecast:
RMSSE = √( ( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² ) / ( (1/(n−1)) Σ_{i=2}^{n} (y_i − y_{i−1})² ) )
RMSSE is scale-independent and highly useful for hierarchical time series. It penalizes large errors in a manner similar to RMSE, but in a normalized way. A model with a lower RMSSE has smaller scaled errors, whereas a higher RMSSE indicates larger deviations from the actual values. That is, RMSSE < 1 indicates better performance than the naïve forecast, whereas RMSSE > 1 indicates worse performance.
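The six metrics can be computed together. This small helper follows the formulas above literally; in particular, the MASE and RMSSE denominators use the one-step naïve differences of the evaluated series itself, as written above:

```python
import numpy as np

def forecast_metrics(y, yhat):
    """Compute MAE, MAPE, MSE, RMSE, MASE, and RMSSE as defined above."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    e = y - yhat
    naive = np.diff(y)                       # y_i - y_{i-1}, i = 2..n
    mae = np.mean(np.abs(e))
    mape = 100 * np.mean(np.abs(e / y))      # undefined if any y_i == 0
    mse = np.mean(e ** 2)
    rmse = np.sqrt(mse)
    mase = mae / np.mean(np.abs(naive))      # scaled by naive MAE
    rmsse = rmse / np.sqrt(np.mean(naive ** 2))  # scaled by naive RMSE
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse,
            "MASE": mase, "RMSSE": rmsse}

m = forecast_metrics([100.0, 102.0, 101.0, 105.0],
                     [101.0, 101.0, 103.0, 104.0])
```

In practice the naïve denominators are often computed on the training series instead (Hyndman's convention); either choice should be stated and kept consistent across the compared models.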
To understand the complete picture of the accurate prediction of all the classical models and the proposed model, the error metrics MAE, MAPE, MASE, MSE, RMSE, and RMSSE are paired up to enhance the accurate interpretation of the predictions provided by the models.

3. Results and Discussion

The base models were fully trained and evaluated before the stacking algorithm, whose output is the hybrid RF-GRU-CNN-XGBoost-FPLS-Stacking model, was trained, ensuring consistent and accurate predictions. This preprocessing produces the predictive results of the base models (random forest, GRU, CNN, XGBoost, and FPLS), which serve as the inputs of the stacking algorithm. Each base model works on the temporal patterns of the dataset and grasps the tendencies of the time series data.

3.1. Data Description

The primary dataset of international import prices of crude oil (Indian Basket) was gathered manually from the official website of the Petroleum Planning and Analysis Cell of the Government of India (Petroleum Planning and Analysis Cell of the Government of India, n.d.) (https://ppac.gov.in/prices/international-prices-of-crude-oil, accessed on 26 December 2024). The dataset contains crude oil import prices from February 2010 to February 2024 and is partitioned into training and testing sets: the training set covers February 2010 to December 2020, and the testing set covers January 2021 to February 2024. The abscissa of Figure 1 represents the time frame of the dataset, and the ordinate represents the monthly crude oil import price (in dollars).
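The described partition corresponds to a simple date-based split; the column names and placeholder values below are illustrative, not the PPAC file layout:

```python
import pandas as pd

# Hypothetical frame with a monthly index matching the stated span
# (February 2010 - February 2024); "price" is a placeholder for the
# PPAC Indian Basket import price series.
df = pd.DataFrame({
    "month": pd.date_range("2010-02-01", "2024-02-01", freq="MS"),
    "price": float("nan"),
})
train = df[df["month"] <= "2020-12-31"]   # February 2010 - December 2020
test = df[df["month"] >= "2021-01-01"]    # January 2021 - February 2024
```

The stated date ranges imply 169 monthly observations in total, of which 131 fall in the training window and 38 in the testing window.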

3.2. Random Forest

Figure 2 illustrates the forecasted values of the random forest method. The blue line of the plot represents the actual price of crude oil, whereas the predicted price using the random forest method is shown by the dotted orange line. In the plot, the data points appear highly clustered around January 2022, with a few scattered points. The predicted line is much smoother than the actual line, a common occurrence when predictions are averaged over multiple trees, leading to underfitting on highly volatile data. Large gaps between the actual and predicted values, especially at higher prices, are visible around early 2022 and 2023, indicating the model's struggle to capture spikes or drops in the import prices. The actual price line is more erratic, whereas the predicted line shows a gentle upward trend; thus, the model likely misses short-term fluctuations while still capturing the increasing trend.
The prediction line partially overlaps the actual price line at some points; however, the peaks are underpredicted and the troughs are overpredicted. The random forest model picks up the long-term upward trend, providing reasonably stable predictions. However, it appears to handle seasonality poorly and fails to learn the temporal structure, resulting in over-smoothing and inaccurate peak predictions.

3.3. GRU

Compared to the random forest model, the GRU forecast in Figure 3 responds more sharply to recent trends. The gray line of Figure 3 represents the actual price of crude oil, whereas the predicted price using the GRU method is shown by the dotted red line. A noticeable upward spike is captured in the early forecasts, followed by drop-offs that reflect sensitivity to short-term dependencies. The forecast line shows visible fluctuations, unlike the smooth prediction of the random forest model, suggesting that GRU adapts better to non-linear changes in the time series data. It follows the directional trend of the actual data reasonably well during the forecast period. Nevertheless, the magnitude may be slightly misaligned depending on the volatility of the training dataset.
In conclusion, temporal dependencies are handled well because GRUs naturally capture time-based patterns. The model is highly responsive to trend shifts and appears more realistic than the random forest method. However, the lack of uncertainty bands makes it difficult to assess the confidence of the predictions.

3.4. CNN

The plot of actual data illustrated in Figure 4 is visibly volatile, having large jumps and dips. The blue line represents the actual price of the crude oil, whereas the predicted price using the CNN method is shown by the dotted orange line. The predicted line shows a more linear and smoother trend. A noticeable mismatch in direction and amplitude appears around early 2022, in that actual prices rise sharply, while CNN predictions increase gradually. Similarly, around early 2023 and 2024, a dip in the actual values is evident as compared to the rise or relatively flat CNN predictions. CNN predictions appear consistent in tracking the general trends moderately well, but sharp peaks and troughs are missed, resulting in overestimating or underestimating the turning points.
CNNs are not inherently sequential; they work with fixed windows and are prone to missing long-term dependencies. Extreme predictions are often avoided, which underfits sharp variations, and the model often struggles with seasonality or irregular spikes. Thus, the performance of the CNN model on the given dataset is acceptable for stable trends but poor under volatility.

3.5. XGBoost

Figure 5 demonstrates that the model was trained using a sliding window (lookback period) over prior years to predict recent values. The blue line represents the actual price of crude oil, whereas the predicted price using the XGBoost method is shown by the dotted orange line. The XGBoost model captured general trend shifts quite well, as shown by the upward trend from 2021 into 2022; similarly, the downward turn into late 2022 and 2023 is forecasted correctly. The XGBoost model avoids over-smoothing the data: the predicted values contain very few sharp fluctuations and little noise. This stability, however, comes at the cost of accuracy at the turning points.
Due to the predictions smoothing over short-term volatility, it has underpredicted the peak values and overpredicted the troughs. The forecast lags actual upward momentum slightly in late 2023.
XGBoost struggles with temporal data and with long-term forecasting or seasonality in the absence of explicit features. As with GRU and CNN, turning points are not captured with high fidelity, and it misses sharp transitions at inflection points due to its reliance on feature-driven splits.
Summing up, except at sharp turns, the XGBoost model is good at trend and noise modeling, making it useful for stable, medium-term series.

3.6. FPLS

The smooth, long-term trends of the crude oil price modeled by the FPLS method are demonstrated in Figure 6. The blue line shows the actual monthly crude oil import price, whereas the orange line represents the forecasted values generated by the FPLS model trained on the historical data of crude oil import prices. The prediction line resembles the actual price well and captures the general upward and downward movements, seasonal effects, and smoother long-term trends, which implies FPLS’s effectiveness in modeling the functional patterns and global trends in the time series. Despite providing smooth curves, FPLS misses the sharp spikes and dips around 2021 and 2023 by underestimating and smoothing the predictions. FPLS inherently projects the data into a lower-dimensional functional space that captures smooth latent structures but fails to reflect high-frequency variations. The model has tracked the central movement of the actual price with moderate accuracy, even though it lagged at turning points, from 2000 onwards.
FPLS has effectively captured the smooth long-term trends in the crude oil price, but it has struggled with high volatility. Even though it is suitable for medium- to long-term forecasting with dominant functional patterns, it lags for short-term precision.

3.7. Proposed Hybrid Model RF-GRU-CNN-XGBoost-FPLS-Stacking

The baseline models (random forest, GRU, CNN, XGBoost, and FPLS) were reimplemented using the same dataset and preprocessing steps to ensure a fair comparison with the proposed ensemble model from January 2021 to February 2024. This ensures reproducibility and allows direct comparison with the proposed hybrid model.
Figure 7 illustrates the predictions generated by the base models along with those of our hybrid model, represented by the pink stacking line. The blue line represents the actual price of crude oil, set as the benchmark for evaluating forecasting accuracy; the purple line represents the prediction of the GRU model, the green line that of the CNN model, the red line that of the XGBoost model, the orange line that of the random forest model, and the brown line that of the FPLS model.
Stacking, XGBoost, and GRU closely follow the actual price trends, whereas others have lagged or oversimplified the plot. FPLS produces smoother results; CNN overshoots in some areas, while random forest is slightly erratic. FPLS captured the general trend well, but it struggled with spiky behavior. CNN and random forest have shown sharp swings, but they tend to overshoot in high-variance regions such as mid-2022.
The base models vary significantly in detecting turning points. GRU and Stacking show good alignment in some regions, whereas CNN and random forest missed or delayed certain turning points. FPLS provides a smooth and stable approximation but underestimates high peaks around mid-2022 and flattens sharp transitions. FPLS is less responsive to abrupt market-driven shocks, which are common in crude oil, but it works well when functional structure dominates.
The price dynamics of crude oil are complex, which makes it difficult for single models to capture all characteristics of the price. Some models follow the trend in a better way, whereas others are better at smoothing volatility. Crude oil, being a volatile commodity, requires models that balance smoothness with adaptability. Hybrid ensemble models have often performed better in such scenarios. While FPLS is good for stable trend forecasts, it failed to capture short-term volatility. Thus, a hybrid stacked model, RF-GRU-CNN-XGBoost-FPLS-Stacking, is the most robust choice for crude oil time series forecasting, which can handle its volatility and structural complexity very well.
Table 1 demonstrates the pros (advantages) and cons (disadvantages) of the base models along with the proposed model.

3.8. Matrix Evaluation and Comparative Analysis

Table 2 demonstrates the comparison of the predictions on the individual models and the proposed hybrid models—random forest, GRU, CNN, XGBoost, FPLS, and the RF-GRU-CNN-XGBoost-FPLS-Stacking.
According to the predicted values, the prediction of the proposed model is closest to the actual price compared to the individual models. The proposed model effectively balances underestimation and overestimation, representing a superior bias–variance trade-off compared to the individual models. Random forest often predicts higher values than the actual price. CNN frequently overshoots around rapid increases. XGBoost often produces constant or repetitive values, indicating poor adaptation to recent trends. GRU performs better than CNN and XGBoost but still misses some peaks and valleys. FPLS produces values close to the actual ones at some points but smooths over the peaks. The proposed hybrid model, RF-GRU-CNN-XGBoost-FPLS-Stacking, captures trend direction and magnitude more reliably, providing more accurate predictions.
The predictions of the XGBoost model remain flat or fixed across many months, suggesting a lack of temporal generalization and indicating it is not individually suitable for dynamic forecasting.
Around late 2022 to early 2023, significant fluctuations are present in the actual price values. CNN and GRU over-predict, while XGBoost under-predicts; the stacking model tracks the shift better, though it still lags at sharp transitions. Therefore, while temporal volatility challenges all models, the ensemble method reduces the risk of extreme errors.
It is evident that the stacking model maintains the generalization even in the long-term forecast horizon. It is reliable for complex, volatile time series.
Random forest showed moderate relative error, high absolute errors, and an extremely high MASE, indicating poor scaling compared to the naïve forecast. Its RMSE and MSE are moderate, whereas its RMSSE inflates extremely, possibly reflecting poor adaptation to trend shifts. The random forest model captured the general trend better than FPLS, but it delivered poor scaled performance and is unstable on outliers and in volatile regions.
XGBoost has the highest relative error and the worst absolute deviation. It is highly unreliable, with large error variance, and its performance deteriorated as variance grew. XGBoost may perform well in certain segments with trend shifts, but it has the worst overall and scaled performance; overfitting leads to poor generalization.
The MAPE of CNN is similar to that of random forest; CNN performed better than XGBoost but worse than GRU. CNN has relatively good absolute errors but large scaled errors, indicating sensitivity to spikes; its RMSSE is slightly better than that of FPLS but worse than that of GRU. Overall, CNN is moderately scalable and adaptive, but it is highly sensitive to short-term spikes and transitions.
GRU has a significantly lower MAPE than CNN, XGBoost, and random forest, and it has the second-best absolute and scale-adjusted performance. It has a strong forecasting ability, with an MASE value of 1.4752. Conclusively, GRU is well suited to sequential data and has a strong tendency toward temporal pattern recognition, but it still shows moderate variance in large deviations.
FPLS has the second-highest relative error and a slightly worse absolute error than GRU. It shows extremely poor scaled performance, with the highest MSE, indicating poor scaling relative to the naïve baseline, and the worst RMSSE. In conclusion, FPLS smooths over noise and captures long-term trends, but it misses sharp transitions and performs poorly in volatile zones.
The lower the MAPE, the better the model. From Table 3, it is evident that the proposed hybrid model has the lowest values across all error metrics, indicating its superior performance over all base models. It combines the strengths of all base models and reduces their weaknesses via weighted learning. For better prediction accuracy, careful meta-model tuning is required, since the ensemble is slightly complex to implement and maintain. The stacking ensemble model RF-GRU-CNN-XGBoost-FPLS-Stacking provided the most accurate and reliable forecast, outperforming all individual base models in MAPE, MAE, MASE, RMSE, and RMSSE and demonstrating scale adaptability, robustness, and low forecast errors.

4. Conclusions

The analysis indicates that the proposed hybrid ensemble learning model, RF-GRU-CNN-XGBoost-FPLS-Stacking, delivers the most accurate and reliable forecasts of crude oil import prices among all evaluated models. It consistently achieves the lowest values across multiple error metrics, including MAPE, MAE, and RMSE, reflecting high precision in its predictions. The GRU model ranks closely behind, demonstrating strong performance in capturing temporal patterns. The CNN model performs moderately well but exhibits some fluctuations in accuracy. FPLS tends to smooth the predictions over noise, and the long-term trends are captured well, but the sharp transitions and volatile zones are completely missed. Conversely, the XGBoost model shows the highest error values, suggesting limited effectiveness in handling sequential data in this context. The random forest model performs better than XGBoost but remains less effective compared to the neural and ensemble-based approaches.
Overall, the results highlight that combining multiple models using RF-GRU-CNN-XGBoost-FPLS-Stacking improves forecast accuracy and overall performance. From the study, it is evident that for high-stakes crude oil forecasting requiring high accuracy and generalization, stacking or hybrid models should be paired with simpler models as benchmarks to enhance interpretability.
While the proposed RF-GRU-CNN-XGBoost-FPLS-Stacking model achieves robust forecasting performance, several limitations exist. One challenge is the computational complexity, which may restrict real-time implementation, particularly in resource-constrained environments. Additionally, although the ensemble model effectively balances short- and long-term dependencies, it does not explicitly integrate external macroeconomic indicators such as exchange rates, global demand trends, or policy shifts, which can significantly impact crude oil price movements. Furthermore, sharp price fluctuations remain difficult to predict accurately, as extreme volatility events may require more sophisticated models.
To further enhance forecasting accuracy and applicability, future studies could focus on the following improvements. Incorporating transformer-based deep learning models such as the temporal fusion transformer (TFT) and Informer could enhance trend detection, short-term adaptability, and feature relevance. Future models could integrate macroeconomic feature expansion, such as global oil supply–demand trends, financial market movements, currency exchange rates, and geopolitical risk factors, to improve predictive accuracy. Computational efficiency can be enhanced using pruning techniques, quantization, and parallel processing for faster inference. The study could be strengthened through comparisons with models like xLSTM, hybrid econometric techniques, and adaptive learning architectures, ensuring a more comprehensive evaluation. These future directions aim to refine the crude oil forecasting model, making it more versatile, accurate, and adaptable to dynamic economic conditions. A more holistic integration of macroeconomic variables and deep learning techniques will further strengthen prediction reliability, benefiting policymakers, investors, and energy analysts.
The analysis of crude oil import prices provides valuable insights for energy sector investors, importers, and industries planning purchases or hedging against price volatility using futures. Accurate forecasts help adjust production schedules and inventory management more efficiently. Crude oil prices influence stock values in the energy, airline, logistics, and petrochemical sectors, and such forecasts help governments and investors understand future supply–demand dynamics and formulate effective strategies.

Author Contributions

Software, J.C. and S.M.; Validation, J.C., H.K.S. and P.M.; Formal Analysis, J.C.; Data Curation, J.C.; Writing—Original Draft, J.C.; Writing—Review and Editing, H.K.S. and S.M.; Supervision, H.K.S. and P.M.; Methodology, J.C. and H.K.S.; Visualization, P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors have no conflicts of interest.

Figure 1. Crude oil import price in India from February 2010 to February 2024.
Figure 2. Predicted vs. actual crude oil import price by the random forest.
Figure 3. Predicted vs. actual crude oil import price by the GRU method.
Figure 4. Predicted vs. actual crude oil import price by the CNN method.
Figure 5. Predicted vs. actual crude oil import price by the XGBoost method.
Figure 6. Predicted vs. actual crude oil import price by the FPLS method.
Figure 7. Comparison of the base and proposed models' predictions with the actual crude oil import price.
Table 1. Forecasting models and theoretical comparison.

Model | Pros | Cons
GRU | Captures temporal dependencies well | Lags at sharp changes; may overfit with small data
CNN | Detects local patterns; captures spatio-temporal structure | Prone to overshooting; needs large amounts of training data
XGBoost | Non-linear and stable | Misses temporal nuance; may lag at turning points
Random Forest | Good at fitting non-linear trends | Sensitive to lag selection; unstable at transitions; prone to overfitting and noise
FPLS | Captures trend; robust to noise | Misses volatility; underestimates spikes
RF-GRU-CNN-XGBoost-FPLS-Stacking | Combines strengths; generalizes well | More complex; requires tuning and base-learner diversity
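The hybrid row in Table 1 combines the base learners through stacking: out-of-fold predictions from the base models become features for a meta-learner. A minimal sketch of this idea in scikit-learn, using a random forest and a gradient-boosting regressor as a stand-in for XGBoost; the toy price series, lag count, and estimator settings are illustrative assumptions, not the paper's configuration:

```python
import numpy as np
from sklearn.ensemble import (RandomForestRegressor,
                              GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import LinearRegression

# Toy monthly price series; lagged values serve as features.
rng = np.random.default_rng(0)
prices = 10000 + np.cumsum(rng.normal(0, 300, size=120))

def make_lagged(series, n_lags=6):
    """Turn a univariate series into (lag-matrix, target) pairs."""
    X = np.column_stack([series[i:len(series) - n_lags + i]
                         for i in range(n_lags)])
    y = series[n_lags:]
    return X, y

X, y = make_lagged(prices)
X_train, X_test = X[:-12], X[-12:]   # hold out the last 12 months
y_train, y_test = y[:-12], y[-12:]

# Stacking: base learners' out-of-fold predictions feed a meta-learner.
stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("gbm", GradientBoostingRegressor(random_state=0)),  # XGBoost stand-in
    ],
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)
pred = stack.predict(X_test)
print(pred.shape)  # one forecast per held-out month
```

In the paper's full pipeline the estimator list would also include the GRU, CNN, and FPLS forecasts; any base model whose predictions can be aligned with the training targets can be stacked this way.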
Table 2. Comparison of different time series forecasting models.

Month | Actual Price | Random Forest | CNN | XGBoost | GRU | FPLS | RF-GRU-CNN-XGBoost-FPLS-Stacking
Jan 2021 | 10,887 | 11,693 | 11,390 | 7754.9 | 11,689 | 4531 | 10,730
Feb 2021 | 10,589 | 12,093 | 10,860 | 8377.1 | 10,927 | 4448.2 | 10,795
Mar 2021 | 9637 | 11,281 | 12,477 | 8377.1 | 10,232 | 4154.5 | 10,364
Apr 2021 | 9340 | 9514.9 | 11,743 | 8377.1 | 9421.4 | 3694.3 | 9385
May 2021 | 9075 | 9148.2 | 9743.5 | 8922.2 | 8976.3 | 11,866 | 9072.3
Jun 2021 | 7976 | 8713.9 | 9491.5 | 8637.7 | 8921.2 | 11,897 | 8617.3
Jul 2021 | 8289 | 8543.6 | 9193.2 | 8637.7 | 8501.3 | 11,293 | 8606.4
Aug 2021 | 8262 | 8327.7 | 9208.2 | 8922.2 | 8517 | 11,153 | 8774.4
Sep 2021 | 8555 | 9319.6 | 10,036 | 7754.9 | 9117.4 | 9831.7 | 8938
Oct 2021 | 10,895 | 8906.2 | 10,514 | 7754.9 | 9838 | 7883 | 10,675
Nov 2021 | 10,421 | 11,371 | 12,461 | 7754.9 | 11,169 | 7212.5 | 10,581
Dec 2021 | 11,351 | 11,273 | 13,017 | 7754.9 | 11,727 | 8635.5 | 11,075
Jan 2022 | 11,445 | 11,694 | 13,926 | 7754.9 | 12,507 | 8275.3 | 11,476
Feb 2022 | 12,253 | 11,050 | 14,413 | 8377.1 | 12,836 | 6092.8 | 12,033
Mar 2022 | 12,087 | 11,787 | 15,384 | 8377.1 | 13,106 | 8502.5 | 12,236
Apr 2022 | 11,827 | 11,950 | 15,050 | 8377.1 | 12,957 | 9142.5 | 11,881
May 2022 | 13,107 | 11,917 | 13,561 | 8922.2 | 12,440 | 10,637 | 13,370
Jun 2022 | 16,265 | 11,776 | 13,623 | 8637.7 | 12,488 | 10,664 | 14,725
Jul 2022 | 15,597 | 12,492 | 13,614 | 8637.7 | 13,402 | 11,220 | 14,890
Aug 2022 | 15,469 | 13,800 | 12,483 | 8922.2 | 13,089 | 11,278 | 14,549
Sep 2022 | 16,815 | 14,729 | 12,033 | 7754.9 | 12,389 | 11,567 | 14,704
Oct 2022 | 10,852 | 15,677 | 12,193 | 7754.9 | 12,111 | 11,474 | 12,245
Nov 2022 | 10,628 | 12,395 | 10,923 | 7754.9 | 9497 | 10,958 | 10,981
Dec 2022 | 10,044 | 11,065 | 10,511 | 7754.9 | 8253.5 | 12,022 | 10,309
Jan 2023 | 10,362 | 10,006 | 9755.1 | 7754.9 | 7727.3 | 14,489 | 10,505
Feb 2023 | 10,936 | 10,156 | 9708.5 | 8377.1 | 7904.5 | 14,105 | 10,929
Mar 2023 | 10,894 | 11,303 | 9366.7 | 8377.1 | 8616.8 | 14,409 | 11,068
Apr 2023 | 11,864 | 11,160 | 9128.8 | 8377.1 | 9380 | 15,566 | 11,458
May 2023 | 11,455 | 11,683 | 10,900 | 8922.2 | 10,517 | 11,469 | 11,242
Jun 2023 | 11,383 | 12,290 | 11,813 | 8637.7 | 10,988 | 9714.9 | 11,322
Jul 2023 | 12,276 | 11,990 | 12,442 | 8637.7 | 11,058 | 9677.1 | 11,543
Aug 2023 | 10,154 | 11,936 | 12,836 | 8922.2 | 11,362 | 10,119 | 10,392
Sep 2023 | 10,894 | 9710 | 9914 | 11,173 | 10,683 | 10,536 | 10,961
Oct 2023 | 11,864 | 9277 | 10,621 | 7806 | 10,642 | 10,822 | 11,241
Nov 2023 | 11,455 | 11,692 | 12,601 | 11,203 | 11,105 | 12,680 | 11,498
Dec 2023 | 11,383 | 12,030 | 13,184 | 11,865 | 10,993 | 12,253 | 11,486
Jan 2024 | 12,276 | 11,649 | 12,519 | 11,388 | 10,935 | 11,196 | 11,617
Feb 2024 | 10,154 | 11,894 | 13,112 | 11,350 | 11,374 | 12,892 | 10,628
Table 3. Results of the error metrics of the individual and the proposed hybrid model.

Errors | Random Forest | XGBoost | CNN | GRU | FPLS | RF-GRU-CNN-XGBoost-FPLS-Stacking
MAPE | 13.82% | 16.26% | 14.08% | 10.86% | 15.39% | 3.45%
MAE | 1380.5 | 3433.1 | 1614.4 | 1302.4 | 1432.2 | 410.48
MASE | 1.394 | 4.088 | 1.8287 | 1.4752 | 1.3074 | 0.4663
RMSE | 1385.1 | 4033.5 | 1971.6 | 1680.4 | 2389.1 | 607.98
RMSSE | 1.379 | 4.0335 | 1.3456 | 1.147 | 1.274 | 0.4313
MSE | 1,918,500 | 16,269,000 | 3,887,100 | 2,823,800 | 5,707,900 | 369,640
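The scale-free metrics in Table 3 (MASE, RMSSE) divide the model's error by the error of a naive one-step forecast on the training series. A minimal sketch of how such a metric table can be computed; the numbers below are illustrative, not the paper's data:

```python
import numpy as np

def forecast_errors(actual, pred, train):
    """MAPE, MAE, MSE, RMSE, plus MASE/RMSSE scaled by the
    naive one-step forecast on the training series."""
    actual, pred, train = map(np.asarray, (actual, pred, train))
    err = actual - pred
    naive = np.abs(np.diff(train))          # |y_t - y_{t-1}| on training data
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "MAPE": 100 * np.mean(np.abs(err / actual)),
        "MAE": mae,
        "MSE": np.mean(err ** 2),
        "RMSE": rmse,
        "MASE": mae / np.mean(naive),
        "RMSSE": rmse / np.sqrt(np.mean(naive ** 2)),
    }

# Illustrative values only (three months of actual vs. hybrid forecasts).
train = [9000, 9200, 9100, 9400, 9600]
actual = [10887, 10589, 9637]
pred = [10730, 10795, 10364]
m = forecast_errors(actual, pred, train)
print(round(m["MAPE"], 2))  # → 3.64
```

Note that MASE and RMSSE are unit-free, so they stay comparable across series measured in different currencies or scales, whereas MAE, RMSE, and MSE inherit the units of the price series.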
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Choudhary, J.; Sharma, H.K.; Malik, P.; Majumder, S. Price Forecasting of Crude Oil Using Hybrid Machine Learning Models. J. Risk Financial Manag. 2025, 18, 346. https://doi.org/10.3390/jrfm18070346