Forecasting Crude Oil Prices Using Ensemble Empirical Mode Decomposition and Sparse Bayesian Learning

Crude oil is one of the most important types of energy, and its prices have a great impact on the global economy. Therefore, forecasting crude oil prices accurately is an essential task for investors, governments, enterprises and researchers. However, because crude oil prices are highly nonlinear and nonstationary, traditional time series forecasting methods struggle to handle them. To address this issue, in this paper, we propose a novel approach that incorporates ensemble empirical mode decomposition (EEMD), sparse Bayesian learning (SBL), and addition, namely EEMD-SBL-ADD, for forecasting crude oil prices, following the “decomposition and ensemble” framework that is widely used in time series analysis. Specifically, EEMD is first used to decompose the raw crude oil price data into components, including several intrinsic mode functions (IMFs) and one residue. Then, we apply SBL to build an individual forecasting model for each component. Finally, the individual forecasting results are aggregated into the final forecast price by simple addition. To validate the performance of the proposed EEMD-SBL-ADD, we use the publicly available West Texas Intermediate (WTI) and Brent crude oil spot prices as experimental data. The experimental results demonstrate that the EEMD-SBL-ADD outperforms some state-of-the-art forecasting methodologies in terms of several evaluation criteria, namely the mean absolute percent error (MAPE), the root mean squared error (RMSE), the directional statistic (Dstat), the Diebold–Mariano (DM) test, the model confidence set (MCS) test and running time, indicating that the proposed EEMD-SBL-ADD is promising for forecasting crude oil prices.


Introduction
Crude oil is one of the dominant sources of energy that powers the global economy. The demand for crude oil will continue to increase, although its pace of growth is expected to slow gradually, according to the British Petroleum (BP) Energy Outlook 2017 [1]. Because of this importance, investors, governments, enterprises and researchers pay close attention to crude oil prices. However, a variety of factors, such as speculation, supply and demand, technological development, geopolitical conflicts and wars, can strongly affect crude oil prices, making them highly nonlinear and nonstationary [2][3][4][5]. Therefore, forecasting crude oil prices accurately is a challenging task.
Various models have emerged in recent years to forecast crude oil prices as accurately as possible. Generally speaking, the models can be roughly categorized into two groups: (1) statistical and econometric models; and (2) artificial intelligence (AI) models [5]. Typical models in the first group include vector autoregressive (VAR) models [6], the random walk model (RWM) [7,8], the autoregressive integrated moving average (ARIMA) [9,10] and generalized autoregressive conditional heteroskedasticity (GARCH) family models [11,12]. For example, Baumeister and Kilian demonstrated that VAR models could achieve good results when forecasting crude oil prices at short horizons [6]. Wang and Wu forecasted the volatility of crude oil prices using multivariate and univariate GARCH-class models, and the results indicated that the multivariate models showed better performance than the univariate models [13]. Several ARIMA-GARCH models for forecasting the volatility of crude oil prices were studied in 11 markets, and the forecasting results indicated that the APARCH model outperformed the others in most cases [14]. Some other research focused on a multivariate analysis of crude oil prices. Kruse and Wegener used model averaging to analyze time-varying persistence in real oil prices with more than one hundred and fifty variables and found that the index of global economic activity by Kilian [15] was the only significant measure to explain time-varying oil price persistence [16]. However, all the statistical and econometric models share the limitation that they assume the time series of crude oil prices to be linear and stationary. Because crude oil prices are in fact highly nonlinear and nonstationary, it is usually hard for these models to achieve satisfactory results when applied directly to crude oil price forecasting.
Therefore, the AI models have attracted increasing interest for crude oil price forecasting because they can capture the intrinsic features of nonlinearity and nonstationarity that exist widely in crude oil prices.
As for AI models, the most popular ones include the artificial neural network (ANN) and support vector regression (SVR). Mirmirani and Li used VAR and ANN with the genetic algorithm (GA) to forecast crude oil prices, and the results showed that ANN with GA significantly outperformed VAR [17]. In another study, ANN and fuzzy regression (FR) were used to forecast crude oil prices in an environment of noise, uncertainty and complexity, and the results indicated that the ANN was superior to FR regarding the mean absolute percentage error (MAPE) [18]. An ANN-GARCH approach was also reported to improve the forecasting performance for both spot oil prices and future oil prices [19]. To further study the flexibility of ANN for forecasting crude oil prices, Jammazi and Aloui used the Haar a Trous Wavelet (HTW) decomposition and the multi-layer back propagation neural network (MBPNN) for decomposition and model building, respectively [20]. In the same research, the authors also analyzed the impacts of several activation functions and the numbers of input and hidden nodes in the ANN. The experimental results demonstrated that the proposed method performed better than the compared methods. Tang and Zhang built a multiple wavelet recurrent neural network model for crude oil price forecasting [21]. Unlike some AI models that only consider the trend and the random components of crude oil prices, their model also included gold prices in the forecasting. The experimental results indicated that the model can achieve high forecasting accuracy.
From the perspective of machine learning, forecasting time series is a typical problem of regression that predicts the real value output based on the corresponding input. Therefore, in theory, any regression method in AI or the signal recovery method in signal processing can be applied to forecasting time series. In 2001, a novel method called sparse Bayesian learning (SBL) that used kernel-tricks was initially proposed as a machine learning method for both classification and regression [22]. Such SBL with kernel-tricks is also known as the relevance vector machine (RVM), a Bayesian competitor of the famous support vector machine (SVM). Later, it was found that SBL without kernel-tricks is a powerful tool for signal recovery, sparse representation and compressive sensing [23][24][25][26]. Therefore, SBL without kernel-tricks has the potential for forecasting crude oil prices.
Owing to the nonlinearity and nonstationarity, it is hard to forecast the movement of crude oil prices accurately using the raw price series directly. It has been reported that a novel "decomposition and ensemble" framework can significantly improve the forecasting accuracy for complex time series. In such a framework, the raw time series is first decomposed into several relatively simple components, then, each component is predicted by a single forecasting model, and finally, the results by each component are ensembled as the final results [27,28].
The commonly-used decomposition methods include wavelet decomposition (WD), independent component analysis (ICA), empirical mode decomposition (EMD), its variant ensemble empirical mode decomposition (EEMD), and so on, among which, EEMD is a popular one that has many advantages [5]. For the ensemble, much research has shown that the addition operation can achieve satisfactory results for time series forecasting. To further improve the forecasting efficiency and effectiveness, in this paper, we aim at proposing a novel method integrating EEMD, SBL and simple addition, namely, EEMD-SBL-ADD, to predict crude oil prices, following the "decomposition and ensemble" framework. Specifically, the raw series of crude oil prices is first decomposed into several components via EEMD. Then, SBL without kernel-tricks is applied to forecasting each component. Finally, the predicted values of each component are added as the final results. The main contributions of this paper are as follows: (1) A novel forecasting approach for crude oil prices that combines EEMD, SBL and addition was proposed, following the well-known "decomposition and ensemble" framework. To the best of our knowledge, this is the first time that SBL without kernel-tricks has been applied to forecasting time series. (2) Extensive experiments were conducted on the publicly-accessible West Texas Intermediate (WTI) and Brent spot crude oil prices, and it was shown that the proposed approach outperformed several state-of-the-art methods for forecasting crude oil prices. (3) We further analyzed the characteristics of SBL when applied to forecasting crude oil prices, such as running time, the impact of the lag order and SBL's parameter settings (scalar trade-off parameter and iteration number) on the forecasting performance.
It is worth pointing out that the proposed EEMD-SBL-ADD is different from some previous work on forecasting crude oil prices [5], wind power [29] and electricity price [30], where kernel-tricks are adopted in the SBL framework. The proposed EEMD-SBL-ADD utilized SBL without kernel-tricks to forecast each component decomposed from raw crude oil prices.
The rest of this paper is organized as follows. Section 2 briefly gives the description of EEMD and SBL. Section 3 formulates the proposed EEMD-SBL-ADD method in detail. For the purpose of evaluating the proposed method, experimental results are reported and discussed in Section 4. Finally, Section 5 concludes this paper.

Ensemble Empirical Mode Decomposition
EEMD was proposed on the basis of empirical mode decomposition (EMD) [31,32]. EMD is an adaptive signal decomposition method that followed Fourier spectral analysis and wavelet analysis, and it has been widely studied and applied to nonlinear and nonstationary signals and time series. However, it was reported that the IMF components decomposed by EMD often suffer from the "mode mixing" problem [33]. That is to say, a single IMF component contains oscillations of widely different scales, or different IMF components contain oscillations of similar scales. The consequence of "mode mixing" is that an IMF component no longer has a single characteristic timescale, thereby losing its original physical meaning. To represent the signal better, EEMD was proposed to reduce "mode mixing" and extract the real modes effectively [32]. Its main idea is to exploit the uniform spectral distribution of white noise together with ensemble averaging: white noise is added to the original time series, EMD is performed on each noisy copy individually, and the results are averaged to obtain the real modes. The steps of EEMD are as follows:
Step 1: Specify the standard deviation $\sigma$ of the Gaussian white noise and the ensemble number $I$, and set the loop variable $i = 1$;
Step 2: Add a Gaussian white noise $n_i(t) \sim N(0, \sigma^2)$ to the raw crude oil price series $X(t)$ to obtain a new series as Equation (1):
$$X_i(t) = X(t) + n_i(t), \qquad (1)$$
Step 3: Perform EMD on $X_i(t)$ to obtain $m$ IMFs $c_{ij}(t)$ $(j = 1, 2, \cdots, m)$ and one residue $r_i(t)$ as Equation (2):
$$X_i(t) = \sum_{j=1}^{m} c_{ij}(t) + r_i(t), \qquad (2)$$
where $c_{ij}$ represents the $j$-th IMF of the $i$-th trial and $m$ is the number of IMFs, determined by the length $T$ of the crude oil price series with $m = \lfloor \log_2 T \rfloor - 1$ [32];
Step 4: Set $i = i + 1$; if $i > I$, go to Step 5; otherwise, go to Step 2;
Step 5: Compute the $j$-th final IMF $C_j(t)$ by averaging the corresponding IMFs over the $I$ trials as Equation (3):
$$C_j(t) = \frac{1}{I}\sum_{i=1}^{I} c_{ij}(t), \qquad (3)$$
Step 6: Compute the final residue $r(t)$ as Equation (4):
$$r(t) = \frac{1}{I}\sum_{i=1}^{I} r_i(t). \qquad (4)$$
It can be seen that with EEMD, the raw crude oil price series $X(t)$ can be expressed as the sum of $m$ IMFs $C_j(t)$ $(j = 1, 2, \cdots, m)$ and one residue $r(t)$. The forms of the IMFs and the residue are usually simpler than that of the raw crude oil price series. In this way, the hard task of forecasting the complex, nonlinear and nonstationary raw price data is divided into forecasting several relatively simple components ($m$ IMFs and one residue).
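The ensemble-averaging logic of Steps 1-6 can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiments: the function names `eemd` and `toy_decompose` are hypothetical, and `toy_decompose` is a deliberately crude moving-average stand-in for the EMD sifting procedure, which is not reproduced here. The sketch also assumes that every trial returns the same number of IMFs.

```python
import random

def toy_decompose(x, window=5):
    """Stand-in for EMD (NOT real sifting): one 'IMF' (detail) plus a
    residue given by a centered moving average."""
    n, half = len(x), window // 2
    residue = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        residue.append(sum(x[lo:hi]) / (hi - lo))
    detail = [x[t] - residue[t] for t in range(n)]
    return [detail], residue

def eemd(x, decompose, sigma=0.1, ensemble=100, seed=0):
    """Average the components of `ensemble` noisy trials (Steps 1-6)."""
    rng = random.Random(seed)
    n = len(x)
    imf_sum, res_sum = None, [0.0] * n
    for _ in range(ensemble):                                  # Steps 2-4
        noisy = [x[t] + rng.gauss(0.0, sigma) for t in range(n)]
        imfs, residue = decompose(noisy)                       # Step 3
        if imf_sum is None:
            imf_sum = [[0.0] * n for _ in imfs]
        for j, c in enumerate(imfs):
            for t in range(n):
                imf_sum[j][t] += c[t]
        for t in range(n):
            res_sum[t] += residue[t]
    final_imfs = [[v / ensemble for v in c] for c in imf_sum]  # Step 5
    final_residue = [v / ensemble for v in res_sum]            # Step 6
    return final_imfs, final_residue
```

Because the added noise has zero mean, the averaged components sum back to the original series up to a small error that shrinks as the ensemble number grows.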

Sparse Bayesian Learning
Sparse Bayesian learning (SBL) was first proposed as a machine learning method that uses kernel-tricks [22], and it has shown its power in various practical classification and regression applications, such as image classification [34], object detection [35], oil production prediction [36], automatic seizure detection [37], deception detection [38], fault diagnosis [39], and so on. Later, SBL without kernel-tricks was also demonstrated to be effective and efficient for sparse signal recovery, sparse representation and compressive sensing [23,26,40]. In many ways, signal recovery can be thought of as regression because both aim to minimize the generalization error. Therefore, in this paper, we use SBL without kernel-tricks to forecast crude oil prices. For convenience, in the rest of this paper, SBL refers to sparse Bayesian learning without kernel-tricks, unless otherwise stated.
Sparse signal recovery can be formulated by Equation (5):
$$y = \Phi w + \varepsilon, \qquad (5)$$
where $\Phi \in R^{N \times M}$ is a matrix of $N$ samples with $M$ features each, $y = [y_1, y_2, \cdots, y_N]^T$ is the vector of targets, $\varepsilon$ is noise and $w = [w_1, w_2, \cdots, w_M]^T$ is the weight vector to be learned, with one weight per column of $\Phi$. The objective of SBL is to seek a weight vector $w$ that has many zero entries while still approximating the targets $y$ accurately [23].
The SBL framework assumes the Gaussian likelihood model in Equation (6):
$$p(y \mid w) = (2\pi\sigma^2)^{-N/2} \exp\left(-\frac{1}{2\sigma^2}\lVert y - \Phi w \rVert_2^2\right), \qquad (6)$$
where $\sigma^2$ is the noise variance. Under such conditions, the task of obtaining maximum likelihood estimates for $w$ equals the task of finding the minimum $\ell_2$-norm solution to Equation (5). However, such solutions are usually non-sparse. To find sparse solutions, SBL estimates a parameterized prior over the weights from the data by Equation (7):
$$p(w; \gamma) = \prod_{i=1}^{M} (2\pi\gamma_i)^{-1/2} \exp\left(-\frac{w_i^2}{2\gamma_i}\right), \qquad (7)$$
where $\gamma = [\gamma_1, \gamma_2, \cdots, \gamma_M]^T$ is a vector of $M$ hyperparameters that control the prior variance of each weight. The procedure of estimating these hyperparameters from data consists of two steps: (1) marginalizing over the weights and (2) maximizing the resulting marginal likelihood with respect to $\gamma$.
Interested readers can refer to [23] for more details.
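To make the two-step procedure more concrete, the following is a minimal NumPy sketch of an EM-style SBL estimator for Equation (5). It is an illustration under stated assumptions rather than the implementation used in this paper: the function name `sbl_fit` and the parameters `noise_var`, `n_iter` and `prune_tol` are hypothetical, the noise variance is held fixed instead of being learned, and `noise_var` only plays a role loosely analogous to a sparsity/data-fit trade-off parameter.

```python
import numpy as np

def sbl_fit(Phi, y, noise_var=1e-2, n_iter=200, prune_tol=1e-8):
    """EM-style sparse Bayesian learning for y = Phi @ w + noise."""
    N, M = Phi.shape
    gamma = np.ones(M)                          # prior variance of each weight
    mu = np.zeros(M)
    for _ in range(n_iter):
        # Covariance of y after marginalizing over the weights.
        C = noise_var * np.eye(N) + (Phi * gamma) @ Phi.T
        Cinv_y = np.linalg.solve(C, y)
        Cinv_Phi = np.linalg.solve(C, Phi)
        # Posterior mean of w: Gamma Phi^T C^{-1} y.
        mu = gamma * (Phi.T @ Cinv_y)
        # Diagonal of the posterior covariance:
        # gamma_i - gamma_i^2 * phi_i^T C^{-1} phi_i.
        post_var = gamma - gamma**2 * np.einsum("nm,nm->m", Phi, Cinv_Phi)
        post_var = np.maximum(post_var, 0.0)    # guard against round-off
        # EM update of the hyperparameters.
        gamma = mu**2 + post_var
    mu = np.where(gamma < prune_tol, 0.0, mu)   # drop weights with tiny variance
    return mu, gamma
```

With the learned weights, a forecast for new inputs is simply `Phi_new @ mu`; columns whose hyperparameters shrink toward zero contribute little or nothing to the prediction.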
The advantages of SBL are as follows:
1. Unlike many other sparse learning algorithms that require $M \gg N$ in the matrix $\Phi$ in Equation (5), SBL can still achieve satisfactory results even when $M < N$.
2. There are few parameters to set in SBL, and hence, it is not necessary to optimize parameters, so that the training procedure is very fast and the results are more robust.
3. The weights found by SBL measure the importance of each feature for forecasting crude oil prices, which improves the interpretability of the model.
4. Since SBL is suitable for resolving the multiple measurement vector model, it is easy to handle forecasting crude oil prices at several horizons simultaneously.

The Proposed EEMD-SBL-ADD Approach
Inspired by the power of EEMD in signal preprocessing and the advantages of SBL in signal recovery and regression, this paper proposes a novel approach that integrates EEMD, SBL and addition, termed EEMD-SBL-ADD, for forecasting crude oil prices, following the "decomposition and ensemble" framework. The approach is shown in Figure 1 and consists of three stages:
Stage 1: Decomposition. EEMD is applied to decompose the raw series of crude oil prices $X(t)$ $(t = 1, 2, \cdots, T)$ into two parts: (1) $m = \lfloor \log_2 T \rfloor - 1$ IMF components $C_j(t)$ $(j = 1, 2, \cdots, m)$; (2) one residue component $r(t)$.
Stage 2: Individual forecasting. For each component from Stage 1, the data samples in the component are divided into a training set and a test set. The SBL predictor is built on the training set, and then, the predictor is applied to the test set.
Stage 3: Ensemble forecasting. The test results of all components from Stage 2 are aggregated by simple addition as the final forecasting results.
The proposed EEMD-SBL-ADD is a typical strategy of "divide and conquer", where the complicated issue of forecasting the original crude oil prices is transformed to several relatively simple issues of forecasting components independently. The EEMD-SBL-ADD first uses EEMD to decompose the original crude oil price series into several components, and each component reflects some characteristics of crude oil prices. For example, the first several IMFs reflect high-frequency parts, while the last several IMFs and the residue reflect the low-frequency parts of the crude oil prices. Secondly, the EEMD-SBL-ADD builds a forecasting model for each single component using SBL, and then, the forecasting model can predict the test data of each single component. Finally, the predicted values from each component can be aggregated as the final forecasting results of crude oil prices. All these characteristics make it possible for the EEMD-SBL-ADD to improve the performance for forecasting crude oil prices.
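The three stages can be summarized as a short Python sketch with pluggable parts. All names below are hypothetical, and the two helper functions are toy stand-ins chosen only so that the example runs on its own: `split_in_halves` replaces EEMD and `persistence` (a one-step-ahead "no change" forecast) replaces the SBL predictor.

```python
def decomposition_ensemble_forecast(series, n_train, decompose, fit_predict):
    """Decompose, forecast each component, then add the forecasts."""
    imfs, residue = decompose(series)                  # Stage 1
    components = imfs + [residue]
    forecasts = []
    for comp in components:                            # Stage 2
        train, test = comp[:n_train], comp[n_train:]
        forecasts.append(fit_predict(train, test))
    n_test = len(series) - n_train
    # Stage 3: simple addition of the component forecasts.
    return [sum(f[t] for f in forecasts) for t in range(n_test)]

def split_in_halves(series):
    """Toy stand-in for EEMD: one 'IMF' and one residue, each half the series."""
    half = [0.5 * v for v in series]
    return [list(half)], list(half)

def persistence(train, test):
    """Toy stand-in for SBL: predict each test value by the previous actual value."""
    return train[-1:] + test[:-1]
```

In the proposed approach, `decompose` would be EEMD and `fit_predict` would train an SBL model on lagged samples of each component; the addition in Stage 3 stays the same.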

Data Description
To validate the performance of the proposed EEMD-SBL-ADD, this paper used two publicly accessible crude oil price series, i.e., West Texas Intermediate (WTI) crude oil spot prices and Brent crude oil spot prices, because of their representative significance in international crude oil markets. For the WTI data series, we used the daily close prices from 2 January 1986 to 2 April 2018, a total of 8132 samples, as experimental data. Among the samples, the first 6506 (80%), from 2 January 1986 to 13 October 2011, were used as training samples, and the remaining 1626 (20%), from 14 October 2011 to 2 April 2018, were used as testing samples. For the Brent data series, 7836 samples of daily close prices from 20 May 1987 to 2 April 2018 were used. This dataset was split in the same way as the WTI dataset: the first 80% of samples for training and the remaining 20% for testing. The original crude oil prices and the corresponding components decomposed by EEMD for WTI and Brent are shown in Figures 2 and 3, respectively.

Evaluation Criteria
To evaluate the models from multiple views, we selected four evaluation indexes: the mean absolute percent error (MAPE), the root mean squared error (RMSE), the directional statistic ($D_{stat}$) and the Diebold-Mariano (DM) statistic. First, we used MAPE and RMSE to evaluate the prediction accuracy. They are defined as Equations (8) and (9), respectively:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|, \qquad (8)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_t - \hat{y}_t)^2}, \qquad (9)$$
where $y_t$ is the actual value, $\hat{y}_t$ is the predicted value at time $t$ and $n$ is the number of test samples. Then, we selected the directional statistic ($D_{stat}$) to evaluate the ability of direction prediction. $D_{stat}$ is defined as Equation (10):
$$D_{stat} = \frac{1}{n}\sum_{t=1}^{n} a_t, \qquad (10)$$
where $a_t = 1$ if $(y_{t+1} - y_t)(\hat{y}_{t+1} - y_t) \ge 0$ and $a_t = 0$ otherwise. The lower the MAPE and RMSE values, the higher the prediction accuracy. In contrast, the higher the $D_{stat}$ value, the more accurate the direction prediction of the model.
Next, we utilized the Diebold-Mariano (DM) statistic to compare the prediction errors of two models, which is defined as Equation (11):
$$\mathrm{DM} = \frac{\bar{g}}{\sqrt{\hat{V}_{\bar{g}}/n}}, \qquad (11)$$
where $\bar{g} = \frac{1}{n}\sum_{t=1}^{n} g_t$, $g_t = (y_t - \hat{y}_{1,t})^2 - (y_t - \hat{y}_{2,t})^2$, $\hat{V}_{\bar{g}} = \gamma_0 + 2\sum_{l=1}^{\infty}\gamma_l$ and $\gamma_l = \mathrm{cov}(g_t, g_{t-l})$. Here, $\hat{y}_{1,t}$ represents the predicted value obtained by the first model at time $t$, while $\hat{y}_{2,t}$ is obtained by the second model. The null hypothesis of this test is that the two models have equal prediction accuracy. When the statistic is significantly negative, the first model is statistically advantageous compared to the second one. Finally, since applying the DM test repeatedly to a set of pairwise comparisons provides biased p-values, the model confidence set (MCS) was applied to further demonstrate SBL's superiority to other models on crude oil price forecasting. The MCS can be used to select the optimal model from several models for prediction, and the results are robust [41].
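As an illustration of how these criteria can be computed, the following Python sketch implements MAPE, RMSE, $D_{stat}$ and a simplified DM statistic. The function names are hypothetical. Two simplifications are assumptions of this sketch rather than of the paper: $D_{stat}$ is averaged over the $n - 1$ consecutive pairs that are available in a test set of length $n$, and the DM statistic uses squared-error loss with only the lag-0 variance term, which corresponds to a one-step-ahead comparison without autocovariance correction.

```python
import math

def mape(y, yhat):
    """Mean absolute percent error (as a fraction, not multiplied by 100)."""
    return sum(abs((a - p) / a) for a, p in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y))

def dstat(y, yhat):
    """Share of steps where the predicted move from y[t] agrees in sign
    with the actual move from y[t] to y[t+1]."""
    hits = sum(1 for t in range(len(y) - 1)
               if (y[t + 1] - y[t]) * (yhat[t + 1] - y[t]) >= 0)
    return hits / (len(y) - 1)

def dm_statistic(y, yhat1, yhat2):
    """Simplified DM statistic (squared-error loss, lag-0 variance only).
    Returns the statistic and the one-sided p-value P(Z <= DM)."""
    g = [(a - p1) ** 2 - (a - p2) ** 2 for a, p1, p2 in zip(y, yhat1, yhat2)]
    n = len(g)
    g_bar = sum(g) / n
    var = sum((v - g_bar) ** 2 for v in g) / n
    dm = g_bar / math.sqrt(var / n)
    p_value = 0.5 * (1.0 + math.erf(dm / math.sqrt(2.0)))
    return dm, p_value
```

A negative statistic with a small p-value indicates that the first model has the smaller squared errors, in line with the interpretation given above.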
To obtain more data to calculate the p-value of the statistics, the MCS performs a bootstrap on the prediction sequence. For the $j$-th model, given the size of bootstrapped sample $T$, the $t$-th bootstrapped sample has a loss $L_{j,t}$, defined as Equation (12). Given a set $M_0 = \{m_i, i = 1, 2, \cdots, n\}$ that contains $n$ models, for any two models $j$ and $k$, the relative loss between these two models can be defined as Equation (13):
$$d_{jk,t} = L_{j,t} - L_{k,t}. \qquad (13)$$
With the above definitions, the set of superior models can be defined as Equation (14):
$$M^* = \{ j \in M_0 : E(d_{jk,t}) \le 0 \ \text{for all} \ k \in M_0 \}, \qquad (14)$$
where $E(\cdot)$ represents the average value. The MCS performs a series of significance tests in $M_0$. Each time, the worst prediction model in the set is eliminated. In each test, the null hypothesis is that of equal predictive ability (EPA), defined as Equation (15):
$$H_{0,M} : E(d_{jk,t}) = 0 \ \text{for all} \ j, k \in M, \ M \subseteq M_0. \qquad (15)$$
The MCS mainly relies on the equivalence test and the elimination criteria. The specific process is as follows:
Step 1: Set $M = M_0$ and, at the level of significance $\alpha$, use the equivalence test to test the null hypothesis $H_{0,M}$;
Step 2: If the null hypothesis is not rejected, define $M^*_{1-\alpha} = M$; otherwise, use the elimination criteria to remove from $M$ the model responsible for the rejection. The elimination does not stop until no model in the set $M$ leads to a rejection of the null hypothesis. Finally, the models in $M^*_{1-\alpha}$ are regarded as the surviving models. The MCS has two kinds of statistics, which can be defined as Equations (16)-(18).
The p-value can be computed with these statistics, and then, it can be used to determine the elimination or survival of the models. When the p-value of one model is less than the significance level α, this model will be eliminated; otherwise, it will survive. The larger the p-value, the more accurate the prediction of the model. When the p-value is equal to one, it indicates that the model is the optimal forecasting model.

Experimental Settings
To demonstrate the power of SBL on crude oil price forecasting, we firstly conducted experiments on raw series, and this is the so-called single model. We compared SBL with three very popular models of time series forecasting, i.e., one statistical model, ARIMA, and two AI models, ANN and LSSVR. Then, we ran ensemble models on the decomposed signals. To set up the stage for a fair comparison, we used addition (ADD) to aggregate the individual forecasting result of each component as the final result. We compared the proposed EEMD-SBL-ADD with the other seven ensemble models (EEMD-ARIMA-ADD, EEMD-ANN-ADD, EEMD-LSSVR-ADD, EMD-SBL-ADD, EMD-ARIMA-ADD, EMD-ANN-ADD and EMD-LSSVR-ADD) to further validate the performance of the proposed approach. Among the ensemble models, the ones with SBL, ANN or LSSVR are called AI-based ensemble models in this paper.
For SBL, we set the scalar trade-off parameter that balances sparsity and data fit to 0.0004 and the maximum number of iterations to 600. For ARIMA, we needed to determine the values of the three parameters $(p, d, q)$. Here, $d$ is the number of differencing operations needed to turn the nonstationary series into a stationary one, while $p$ and $q$ are the orders of the autoregressive and moving average terms, respectively. We selected the optimal values of $p$ and $q$ through the AIC (Akaike information criterion) rule. For ANN, we selected the BP (back propagation) neural network and used a four-layer network with two hidden layers. The number of hidden layer nodes was set to seven, and the maximum number of training steps was set to 10,000. For LSSVR, we chose the very popular RBF as the kernel of the model. Then, we needed to select two important parameters, namely the regularization parameter $\gamma$ and the kernel parameter $\sigma^2$. We used grid search to find the optimal values for both parameters, following [28,42].
Data normalization is an important step in AI-based time series forecasting. On the one hand, normalization can accelerate the search for the optimal solution by gradient descent; on the other hand, it makes features of different dimensions numerically comparable and may improve accuracy. To set up a fair comparison, in this paper, we used a frequently-used normalization method (the Min-Max normalization) for the AI-based predictors. Note that ARIMA does not need normalization. The normalization formula is as Equation (19):
$$y_{norm} = \frac{y - y_{min}}{y_{max} - y_{min}}, \qquad (19)$$
where $y_{max}$ and $y_{min}$ are the maximum and minimum values of the original dataset, respectively, and $y$ and $y_{norm}$ are the raw and the normalized values. The normalization maps the raw values into the range [0,1]. Finally, after predicting on the normalized data, we needed to apply the inverse normalization to the predicted values. The formula is as Equation (20):
$$\hat{y} = \hat{y}_{norm}\,(y_{max} - y_{min}) + y_{min}, \qquad (20)$$
where $\hat{y}_{norm}$ is the predicted value before inverse normalization and $\hat{y}$ is the final predicted value we need. In addition, we performed multi-step-ahead prediction and selected horizons of 1, 2, 3, 4, 5 and 6 as experimental objects. If we want to predict the value of $y_{t+h}$ with a time series $y_t$ $(t = 1, 2, 3, \cdots, T)$, it can be formulated as Equation (21):
$$\hat{y}_{t+h} = f(y_t, y_{t-1}, y_{t-2}, \cdots, y_{t-(l-1)}), \qquad (21)$$
where $\hat{y}_{t+h}$ represents the predicted value of $y_{t+h}$, $y_t$ represents the actual value at time $t$ and $l$ is the lag order. As suggested by [43], we set six as the lag order in these experiments. For the MCS, we set 0.2 as the level of significance $\alpha$ and 5000 as the number of bootstrapped samples. Meanwhile, we used "stationary" as the boot type.
All the experiments were conducted in MATLAB R2016b on 64-bit Windows 10 Professional with an i7-7820X CPU @3.6 GHz and 32 GB RAM.
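The Min-Max normalization, its inverse, and the construction of lagged input-target pairs for a given horizon can be sketched as follows. The function names are hypothetical, and the sketch makes no claim about which samples were used to obtain the minimum and maximum in the experiments; it simply takes them as arguments.

```python
def normalize(values, y_min, y_max):
    """Min-Max normalization: map raw values into [0, 1]."""
    return [(v - y_min) / (y_max - y_min) for v in values]

def denormalize(values, y_min, y_max):
    """Inverse Min-Max normalization: map predictions back to price units."""
    return [v * (y_max - y_min) + y_min for v in values]

def make_lagged_samples(series, lag=6, horizon=1):
    """Build rows [y_t, y_{t-1}, ..., y_{t-(lag-1)}] with targets y_{t+horizon}."""
    X, y = [], []
    for t in range(lag - 1, len(series) - horizon):
        X.append([series[t - k] for k in range(lag)])
        y.append(series[t + horizon])
    return X, y

# Example: lag order 3 and horizon 2 on a short toy series.
toy = [float(v) for v in range(10)]
X_toy, y_toy = make_lagged_samples(toy, lag=3, horizon=2)
```

Each row of `X_toy` plays the role of one row of the input matrix of a regression model, and the matching entry of `y_toy` is the value to be predicted `horizon` steps ahead.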

Single Models
The MAPE values for single models on WTI and Brent crude oil prices are shown in Table 1. Among the models, SBL achieved the lowest (the best) values in 11 out of 12 cases on both markets, while ARIMA, ANN and LSSVR achieved the highest (the worst) values in 7/12, 3/12 and 4/12 cases (note that some models achieved the same highest values at some horizons). The results achieved by the statistical model ARIMA were slightly higher than, but very close to, those by ANN or LSSVR in most cases. For each model, the MAPE values increased with the horizon. As another measure of forecasting performance, the RMSE values of all the models at different horizons on WTI and Brent crude oil prices are shown in Table 2. It can be seen that SBL outperformed all other models in all cases, while ARIMA obtained the worst values in 9 out of 12 cases, showing that ARIMA had poor power for forecasting raw crude oil prices. ANN and LSSVR achieved very close results, and neither of them was always superior to the other. For the directional prediction, it can be seen from Table 3 that no model consistently outperformed the others. Amongst the models, SBL, ARIMA, ANN and LSSVR obtained the highest values in 4/12, 2/12, 4/12 and 3/12 cases, respectively (ARIMA and ANN achieved the highest values simultaneously at Horizon 4 on Brent crude oil prices). The results also demonstrated that the performance of single models was not stable when forecasting the direction of crude oil prices. Another interesting finding was that the highest value in all cases was 0.5341, while the lowest value was 0.4803, both of which are very close to the result of a random guess. Therefore, it was hard for single models to achieve a satisfactory directional prediction, and the underlying reason was probably the nonlinearity and nonstationarity of the complex crude oil prices.
This paper also used the DM test to validate the superiority of SBL from the viewpoint of statistics; the statistics and p-values (in brackets) are shown in Tables 4 and 5. The results further statistically confirmed the above conclusions. First, SBL statistically outperformed ARIMA, ANN and LSSVR, and the corresponding p-values were far below 0.1 in 21 out of 36 cases. Second, when we chose the statistical model ARIMA as the benchmark model, ARIMA was statistically shown to underperform compared to the AI models, showing that the latter were more powerful for forecasting nonlinear and nonstationary crude oil prices. Finally, regarding ANN and LSSVR, the results showed that neither of them could significantly outperform the other.
We further report the results of the MCS test by single models on both WTI and Brent crude oil prices, as shown in Table 6. It can be seen from this table that the values of SBL were always 1.0000, showing that as a single model, SBL outperformed any other single models in all cases. This confirmed the results by the DM test.
From the above analyses, it has been shown that the SBL was probably the most powerful model among all models, in terms of MAPE and RMSE. Therefore, in this paper, we chose SBL as the individual forecasting method in the proposed approach. Furthermore, the performance of directional forecasting of single models was so poor that the accuracy was very close to a random guess, and hence, this paper used the "decomposition and ensemble" to improve the forecasting performance.

Ensemble Models
Regarding the ensemble models (i.e., EEMD-SBL-ADD, EEMD-ARIMA-ADD, EEMD-ANN-ADD, EEMD-LSSVR-ADD, EMD-SBL-ADD, EMD-ARIMA-ADD, EMD-ANN-ADD, and EMD-LSSVR-ADD), Tables 7-9 show the experimental results in terms of MAPE, RMSE and Dstat, respectively, on the crude oil prices from WTI and Brent. Note that since all the models use addition (ADD) in Stage 3 (ensemble forecasting), we removed the term "ADD" in the tables to save space. Therefore, the model SBL under EEMD shown in the table heading refers to the proposed EEMD-SBL-ADD, and the other headings can be read in the same manner.
From the MAPE and RMSE results shown in Tables 7 and 8, some interesting conclusions can be drawn. First, the proposed EEMD-SBL-ADD outperformed all other models except for the MAPE and RMSE values of three-step-ahead forecasting on WTI crude oil prices and the MAPE value of one-step-ahead forecasting on Brent crude oil prices, where EEMD-LSSVR-ADD and EMD-SBL-ADD were slightly better than EEMD-SBL-ADD, respectively. Second, regarding the decomposition methods, the models with EEMD were better than the corresponding models with EMD in most cases, showing that EEMD had advantages over EMD for forecasting prices. Third, the models with AI or machine learning (SBL, ANN and LSSVR) were clearly superior to those with the statistical model (ARIMA). The possible reason is that, as a linear model, ARIMA cannot capture the nonlinearity and nonstationarity of the crude oil prices. Fourth, among the AI models, SBL outperformed the other models (ANN and LSSVR) in most cases, indicating its power for forecasting crude oil prices. Finally, the ensemble models with AI greatly improved the results of MAPE and RMSE when compared with their corresponding single models, which showed the superior capability of AI models with the "decomposition and ensemble" framework for forecasting crude oil prices. In contrast, the EEMD-ARIMA-ADD could not improve the forecasting performance in terms of MAPE and RMSE when compared with the single ARIMA model.
Regarding directional forecasting, the Dstat values of ensemble models were in the range of 0.5016-0.8149 (shown in Table 9), which were all higher than a random guess and showed that ensemble models had a higher capability of directional forecasting compared with single models. Specifically, the proposed EEMD-SBL-ADD outperformed other models in 11 out of 12 cases, except that EEMD-ANN-ADD achieved slightly better results at Horizon 6 on WTI crude oil prices. In contrast, the performance of directional prediction by ARIMA had been shown to be the poorest in most cases, which further confirmed the weakness of ARIMA on forecasting the nonlinear and nonstationary crude oil prices. The results also demonstrated that, at a larger horizon, the performance of the forecasting models tended to decrease. This is probably attributable to the information loss with the increasing horizon.
As for the statistical tests, the results of the DM test for the ensemble methods on WTI and Brent are shown in Tables 10 and 11, respectively, and they statistically confirm the above results. First, the superiority of the proposed EEMD-SBL-ADD in both markets was validated from the perspective of statistics. Specifically, the p-values of the EEMD-SBL-ADD were far below 0.01 in 80 out of 84 cases. This demonstrates that, when it was treated as the tested model, the proposed EEMD-SBL-ADD performed statistically better than the benchmark models at a confidence level of 99% in most cases. Furthermore, regarding the predictor in the ensemble approaches, when the decomposition tool was fixed, the p-values of SBL were far below 0.05 in most cases, showing again that SBL significantly outperformed the other forecasting models. Second, the models with EEMD outperformed their respective counterparts with EMD, revealing statistically that, as a decomposition tool, EEMD had advantages over EMD for crude oil price forecasting. Finally, with a fixed decomposition tool (EEMD or EMD), the ARIMA-based models were still statistically shown to underperform the AI-based models. This confirmed that ARIMA had a poor ability to capture the nonlinearity and nonstationarity of crude oil prices.
As for the results of the MCS test on the ensemble models, some interesting findings can be seen from Table 12. First, in both markets, the EEMD-SBL-ADD survived in 22 out of 24 cases, showing that the EEMD-SBL-ADD had a good generalization ability. Second, the EEMD-SBL-ADD achieved the highest value of 1.0000 in 18 out of 24 cases, indicating that the EEMD-SBL-ADD was the best model in most cases in terms of the MCS test. Finally, for the WTI market, the EEMD-LSSVR-ADD became the optimal model at Horizons 3 and 6. The EEMD-SBL-ADD was even eliminated at Horizon 3, although it was the best model at Horizons 1, 2, 4 and 5.
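To make the DM comparison concrete, the following minimal sketch computes a one-sided Diebold-Mariano statistic. It rests on several assumptions that may differ from the configuration behind Tables 10 and 11: a squared-error loss, a long-run variance built from autocovariances up to lag h-1, and a standard normal approximation for the p-value. The name `dm_test` and the toy errors are hypothetical.

```python
import math


def dm_test(errors_a, errors_b, horizon=1):
    """One-sided DM test with squared-error loss (an assumption of this sketch).

    A small p-value suggests that model A is more accurate than model B.
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n

    def autocov(k):
        return sum((d[t] - mean_d) * (d[t - k] - mean_d) for t in range(k, n)) / n

    # Long-run variance of the loss differential; for horizon > 1 this
    # estimate can be non-positive in small samples, which is not handled here.
    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, horizon))
    stat = mean_d / math.sqrt(long_run_var / n)
    p_value = 0.5 * (1 + math.erf(stat / math.sqrt(2)))  # standard normal CDF
    return stat, p_value


# Toy forecast errors for illustration only (not data from the paper).
errors_a = [0.1, -0.2, 0.1, 0.3, -0.1, 0.2]
errors_b = [0.5, -0.6, 0.4, 0.9, -0.7, 0.8]

stat, p_value = dm_test(errors_a, errors_b)
print(stat, p_value)
```

A strongly negative statistic means that the losses of model A are systematically smaller than those of model B, which is the pattern reported for the tested model in most cases above.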
From the above results and analyses, we can draw the following conclusions: (1) The single AI models and statistical models usually could not achieve satisfactory results on raw crude oil prices because of the nonlinearity and nonstationarity, although the AI models outperformed the statistical models; (2) The AI-based ensemble models significantly improved the forecasting accuracy when compared with single models, owing to the strategy of decomposing the complex raw price into several relatively simple components; (3) The proposed EEMD-SBL-ADD outperformed the compared models in terms of MAPE, RMSE, Dstat, DM test and MCS test; (4) Regarding the decomposition tools of EEMD and EMD, EEMD was advantageous compared to EMD for crude oil price forecasting; (5) The experimental results demonstrated that SBL was very powerful for forecasting crude oil prices in both single models and ensemble models.

Discussion
In this subsection, we further discuss some characteristics of SBL for forecasting crude oil prices, including the running time of the proposed approach and the impact of the lag order and of SBL's parameter settings (the scalar trade-off parameter and the iteration number) on the forecasting accuracy.

Running Time
Running time is an important practical criterion for a forecasting model: other things being equal, the less time a model takes, the better. Since the input samples for different horizons are the same, and the number of components decomposed by EEMD and EMD is also the same, we only report the running time of one-step-ahead forecasting with the EEMD-associated ensemble models. To show the stability of the models, we repeated the experiments 10 times and report the mean and standard errors of the running time for both training and testing. Since all the models used the same components produced by EEMD, we excluded the time of decomposition. We also excluded the running time of the grid search for LSSVR and only report the running time of building the training model with the optimal parameters. The experiments were conducted on WTI crude oil prices with lag = 6. The results are shown in Table 13.
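The repeated-timing protocol described above can be sketched as follows. The helper name `time_repeated` and the stand-in workload are hypothetical, and the spread is computed here as the sample standard deviation divided by the square root of the number of repeats, which may differ from the exact statistic reported in Table 13.

```python
import math
import statistics
import time


def time_repeated(fn, repeats=10):
    """Return (mean, standard error) of wall-clock seconds over repeated calls."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    mean = statistics.mean(durations)
    std_error = statistics.stdev(durations) / math.sqrt(repeats)
    return mean, std_error


def stand_in_training():
    """Placeholder workload; a real run would fit one model per component."""
    return sum(i * i for i in range(10000))


mean_s, se_s = time_repeated(stand_in_training, repeats=10)
print(mean_s, se_s)
```

Decomposition and hyperparameter search would be performed outside the timed function, so that only model building or prediction is measured.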
Table 13 shows that the EEMD-SBL-ADD takes only 2.0729 seconds to train the model and 0.0045 seconds to test the samples, far less than the other models. Since the computation of kernels is very time consuming, the EEMD-LSSVR-ADD takes the most time in both the training and testing phases. The standard errors of the EEMD-SBL-ADD, the EEMD-ARIMA-ADD and the EEMD-LSSVR-ADD are less than 5% of the average time in both training and testing, showing that their running time is stable. In addition, since SBL is able to solve the multiple measurement vector model, the running time can be further reduced by forecasting crude oil prices at several horizons simultaneously.

The Impact of the Lag Order
In the above experiments, we use a fixed lag order of six to compare the models, as suggested by [43]. To further analyze the impact of the lag order on the proposed EEMD-SBL-ADD model, we conducted one-step-ahead forecasting on WTI crude oil prices with the lag order in the range of 1-40. The results are shown in Figure 4. When the lag order is not larger than six, the performance of the EEMD-SBL-ADD improves dramatically as the lag increases, in terms of MAPE, RMSE and Dstat. MAPE and RMSE then improve slightly as the lag order increases from 7 to 11, while Dstat remains stable over the same lag orders. None of the metrics clearly improves any further when the lag order is larger than 11. The experimental results confirm that a lag order of six is good enough for forecasting crude oil prices. However, for the proposed EEMD-SBL-ADD model, the performance can be further improved by selecting 11 as the lag order. These results suggest that the proposed EEMD-SBL-ADD is capable of selecting a suitable lag order for forecasting crude oil prices.
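To illustrate what the lag order controls, the sketch below builds the training samples for one component. It assumes a direct forecasting strategy in which each input holds the `lag` most recent values and the target lies `horizon` steps after the last input. The function name and the toy series are illustrative assumptions.

```python
def make_lagged_samples(series, lag, horizon=1):
    """Build (inputs, targets) for direct forecasting.

    Each input is `lag` consecutive values; its target is the value
    `horizon` steps after the last value of that input.
    """
    inputs, targets = [], []
    for end in range(lag, len(series) - horizon + 1):
        inputs.append(series[end - lag:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Toy series for illustration only.
series = [1, 2, 3, 4, 5, 6, 7, 8]

inputs, targets = make_lagged_samples(series, lag=3, horizon=1)
print(inputs[0], targets[0])    # [1, 2, 3] 4
print(inputs[-1], targets[-1])  # [5, 6, 7] 8
```

A larger lag gives the predictor more history per sample but leaves fewer samples and adds input dimensions, which is one way to read the flattening of the curves once the lag order exceeds 11.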

The Impact of SBL's Parameter Settings
The scalar trade-off parameter λ, which balances sparsity and data fit, and the maximum number of iterations n in SBL have only a slight impact on the forecasting performance. In most cases, SBL with default values for these two parameters (e.g., λ = 0.0004 and n = 600) can achieve satisfactory results. To further demonstrate this, we study the impact of these parameters on forecasting crude oil prices. We conduct the experiments on WTI with one-step-ahead forecasting. First, we fix λ = 0.0004 and run EEMD-SBL-ADD with a variable maximum number of iterations n in the set {50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000}. The results are shown in Figure 5. Second, we fix n = 600 and run the experiments with variable λ in the set {0.0001, 0.0002, 0.0004, 0.001, 0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128}. The results are shown in Figure 6.
With fixed λ = 0.0004, when the maximum number of iterations n varies from 50 to 1000, the results in terms of MAPE, RMSE and Dstat are all very stable, showing that the proposed EEMD-SBL-ADD is not sensitive to the maximum number of iterations. Similarly, when n is fixed at 600 and λ varies from 0.0001 to 0.002, the results remain almost the same, whereas when λ varies from 0.004 to 0.128, the results gradually become worse. Overall, the results demonstrate that the proposed EEMD-SBL-ADD is largely insensitive to the parameter settings as long as λ is kept small. Even with the default settings (e.g., λ = 0.0004 and n = 600), it can achieve satisfactory results.

Conclusions
Accurate forecasting of crude oil prices is a challenging task because of their nonlinearity and nonstationarity. Traditional methods, including statistical methods and AI-based models, usually cannot achieve satisfactory results when the forecasting is performed on raw crude oil prices. To cope with this issue, and motivated by the strategy of "divide and conquer", this paper proposes a novel model integrating EEMD, SBL and addition (namely EEMD-SBL-ADD) for forecasting crude oil prices, following the "decomposition and ensemble" framework. In the decomposition phase, EEMD is used to decompose the raw price into several components, each of which shows simpler characteristics than the raw prices. Then, SBL is used to forecast each component individually. Finally, the predicted results of all components are aggregated by the simple operation of addition. To demonstrate the performance of the proposed model, we compare it with some state-of-the-art methods on crude oil prices from the popular markets of WTI and Brent with different horizons. This is the first time that SBL without the kernel trick has been applied to forecasting crude oil prices. From the experimental results, some conclusions can be drawn: (1) as an AI model, SBL outperforms the other models in forecasting crude oil prices with raw data, demonstrating its power for time series forecasting; (2) the AI-based ensemble approaches are superior to their single-model counterparts, indicating that the decomposed components are capable of better representing the characteristics of crude oil prices; in addition, EEMD outperforms EMD for decomposition; (3) the EEMD-SBL-ADD outperforms the other competing models in terms of the evaluation criteria, showing its promise for forecasting crude oil prices.
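The data flow of the "decomposition and ensemble" framework summarized above can be sketched in a few lines. This sketch deliberately uses simple stand-ins and is not the proposed method: a trailing moving average replaces EEMD, and a mean of recent values replaces SBL. All function names and the toy prices are hypothetical.

```python
def decompose(series, window=3):
    """Stand-in for EEMD: split the series into a remainder and a trend.

    The trend is a trailing moving average; the two components add up
    exactly to the input series.
    """
    trend = []
    for t in range(len(series)):
        chunk = series[max(0, t - window + 1):t + 1]
        trend.append(sum(chunk) / len(chunk))
    remainder = [x - m for x, m in zip(series, trend)]
    return [remainder, trend]


def forecast_component(component, lag=2):
    """Stand-in for SBL: predict the next value as the mean of the last `lag` values."""
    recent = component[-lag:]
    return sum(recent) / len(recent)


def forecast_next(series):
    """Decompose, forecast each component individually, and aggregate by addition."""
    components = decompose(series)
    return sum(forecast_component(c) for c in components)


# Toy prices for illustration only (not data from the paper).
prices = [50.0, 52.0, 51.0, 53.0, 54.0]
print(forecast_next(prices))
```

Because both stand-ins are linear, this toy gains nothing from decomposition; it only illustrates the three steps of decomposing, forecasting each component, and adding the component forecasts.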
One limitation of the proposed EEMD-SBL-ADD model is that all components are predicted using SBL, although the components show different characteristics. For example, some components are of high frequency, while others are of low frequency. An ideal model should use different predictors for the high- and low-frequency components. This is one direction of our future work. Another interesting direction is to apply the proposed approach to forecasting other energy-related time series, such as wind speed and electricity load. Taking advantage of the sparsity of SBL, we will also use it to perform multivariate forecasting on crude oil prices and to find significant measures for analyzing time-varying persistence in crude oil prices.