Forecasting Crude Oil Price Using EEMD and RVM with Adaptive PSO-Based Kernels

Crude oil, as one of the most important energy sources in the world, plays a crucial role in global economic events. Accurately predicting the crude oil price is an interesting and challenging task for enterprises, governments, investors, and researchers. To cope with this issue, in this paper we propose a method integrating ensemble empirical mode decomposition (EEMD), adaptive particle swarm optimization (APSO), and relevance vector machine (RVM), namely EEMD-APSO-RVM, to predict crude oil price based on the “decomposition and ensemble” framework. Specifically, the raw time series of crude oil price is first decomposed into several intrinsic mode functions (IMFs) and one residue by EEMD. Then, RVM with a combined kernel is applied to predict the target value for the residue and each IMF individually. To improve the prediction performance of each component, an extended particle swarm optimization (PSO) is utilized to simultaneously optimize the weights and parameters of the single kernels in the combined kernel of RVM. Finally, simple addition is used to aggregate the predicted results of all components into an ensemble result as the final result. Extensive experiments were conducted on the crude oil spot price of the West Texas Intermediate (WTI) to illustrate and evaluate the proposed method. The experimental results are superior to those of several state-of-the-art benchmark methods in terms of root mean squared error (RMSE), mean absolute percent error (MAPE), and directional statistic (Dstat), showing that the proposed EEMD-APSO-RVM is promising for forecasting crude oil price.


Introduction
British Petroleum (BP) reported that fossil fuels accounted for 86% of primary energy demand in 2014 and will remain the dominant source of energy powering the global economy, with almost 80% of total energy supply in 2035. Among fossil fuels, crude oil is and will be the most important energy source, accounting for almost 29% of total energy supply in 2035 [1], and plays a vital role in all economies. In light of the importance of crude oil for the global economy, many enterprises, governments, investors, and researchers have devoted great efforts to building models to predict its price and volatility. However, the price of oil is easily affected by many factors, such as supply and demand, speculation activities, competition among providers, technical development, geopolitical conflicts, and wars [2-4]. All of these factors make the crude oil price nonlinear, nonstationary, and highly volatile. For example, the West Texas Intermediate (WTI) crude oil price reached a peak of 145.31 USD per barrel in July 2008. However, because of the financial crisis, the price dropped drastically to 30.28 USD per barrel at the end of 2008, a decrease of about 80% from the peak. With economic recovery, the price rose above 113 USD per barrel in April 2011, and then declined sharply below 27 USD per barrel in February 2016 owing to changes in supply and demand and to some political reasons.
A wide variety of models have emerged to predict crude oil price over the past decades, which can be roughly classified into two categories: (1) statistical and econometric models; and (2) artificial intelligence (AI) models. Typically, statistical and econometric models include the random walk model (RWM), error correction models (ECM), the grey model (GM), vector autoregressive (VAR) models, autoregressive integrated moving average (ARIMA), and the generalized autoregressive conditional heteroskedasticity (GARCH) family of models. For instance, Hooper et al. [5] and Murat et al. [6] studied the performance of RWM in predicting crude oil price. The study of Baumeister and Kilian showed that a VAR model outperformed the compared methods in terms of accuracy when applied to forecasting crude oil price [7]. Xiang and Zhang analyzed and predicted the monthly Brent crude oil price by ARIMA, and showed that the ARIMA(1,1,1) model achieved good results [8]. As one of the most popular time series methods, ARIMA has been widely used as a benchmark in forecasting crude oil price by many scholars [4,9-11]. GARCH is another widely used method for forecasting crude oil price. Morana exploited the GARCH properties of the Brent crude oil price volatility and developed a semiparametric model based on the bootstrap approach to predict crude oil price [12]. Arouri applied an extended GARCH model to forecast the conditional volatility of crude oil price with structural breaks [13]. Mohammadi and Su applied ARIMA and GARCH to forecast the conditional mean and volatility of weekly crude oil price in several markets [14]. Since these statistical and econometric models are built on the assumption that the crude oil price is linear and stationary, it is hard for them to predict the nonlinear and nonstationary crude oil price with high accuracy.
As for AI methods, artificial neural network (ANN) and support vector machine (SVM) have been widely used for predicting crude oil price. Shambora and Rossiter used an ANN model with moving average crossover inputs to forecast the future price of crude oil, and the results showed the superiority of ANN when compared with RWMs [15]. Mirmirani and Li compared VAR and ANN with genetic algorithm (GA) in forecasting crude oil price; the experimental results indicated that ANN with GA noticeably outperformed VAR [16]. Azadeh et al. compared ANN with fuzzy regression (FR) in forecasting long-term oil price in noisy, uncertain, and complex environments, and they concluded that ANN considerably outperformed FR in terms of mean absolute percentage error (MAPE) [17]. Tang and Zhang put forward a multiple wavelet recurrent neural network (MW-RNN) model for forecasting crude oil price, where wavelet and ANN were applied to capture multiscale data characteristics and to predict crude oil price at different scales, respectively; the proposed model achieved high prediction accuracy [18]. Haidar et al. utilized a three-layer feedforward neural network to forecast short-term crude oil price [19]. SVM, first proposed by Vapnik [20], is a very popular supervised learning algorithm that can be applied to both classification and regression. The SVM for regression is also known as support vector regression (SVR). Xie et al. proposed an SVM-based method for crude oil price forecasting, and the results indicated that SVM outperformed ARIMA and back propagation neural network (BPNN) [21]. Li and Ge presented a novel model integrating ε-SVR and a dynamic correction factor for forecasting crude oil price [11]. Some scholars studied the optimization of kernel types and/or kernel parameters in SVM for oil price forecasting [22,23]. Least squares support vector machine (LSSVM) [24], an extension of SVM with less training time, has also been used in crude oil price forecasting [25]. Generally speaking, since the above-mentioned AI models can capture the nonlinear and nonstationary characteristics of crude oil price, these models are superior to the statistical and econometric models.
Owing to its highly complex characteristics of nonlinearity and nonstationarity, achieving satisfactory predictive accuracy on the raw crude oil price series is still a challenging task, although many attempts have been made. In recent years, a novel "decomposition and ensemble" framework has demonstrated its superiority in forecasting time series. It decomposes a complex time series into a few simple components, predicts each component individually, and finally ensembles all predicted values as the final result [4,9,26-29]. The simple components can effectively preserve some features of the complex raw data from different perspectives, and each of them can be independently handled with relatively simple methods. The challenging task of forecasting crude oil price from the complex raw data is thus divided into several relatively easy subtasks of forecasting each component. Therefore, this framework is effective for forecasting crude oil price. For example, Yu et al. proposed a model based on empirical mode decomposition (EMD) and ANN to predict WTI and Brent crude oil price, and the results demonstrated the attractiveness of the proposed model [9]. Yu et al. also proposed a novel model based on ensemble EMD (EEMD) and extended extreme learning machine (EELM) to predict the crude oil price of WTI [4,30]. Zhang et al. put forward a novel hybrid model with EEMD, LSSVM, particle swarm optimization (PSO), and GARCH to predict crude oil price, where LSSVM with parameters optimized by PSO and GARCH were used to forecast the nonlinear and time-varying components produced by EEMD, respectively [26]. Tang et al. integrated complementary EEMD (CEEMD) and EELM to forecast crude oil price [27]. In addition, Fan et al. used independent component analysis (ICA) to decompose the crude oil price time series into three independent components, then constructed three SVR models to predict the components respectively, and finally used SVR again to integrate the results of the former three SVRs as the final price [31].
Relevance vector machine (RVM) [32], a kernel-based machine learning method that uses Bayesian inference, has attracted much attention from researchers in both classification and regression in recent years [33-39]. The main advantages of RVM over SVM are the absence of a regularization parameter, the ability to use non-Mercer kernels, probabilistic output, and a sparse formulation. The kernel types and kernel parameters are still crucial in RVM. For example, Fei et al. and Wang et al. studied the performance of the wavelet kernel in RVM [40,41]. Composite kernels have also been used to identify nonlinear systems [42]. Psorakis et al. investigated the sparsity and accuracy of multi-class multi-kernel RVMs [43]. To improve the performance of RVM, some evolutionary algorithms were applied to optimize the weights of single kernels or the kernel parameters. Fei and He used an extended PSO to optimize the weight and parameters in a kernel combining a radial basis function (RBF) kernel and a polynomial kernel for state prediction of bearings [44], and Zhang et al. used a similar method to predict the capacity of lithium-ion batteries [45]. GA, the artificial bee colony algorithm (ABC), and ant colony optimization algorithms (ACO) were also applied to optimize kernel parameters in RVM [46-48]. Regarding time series analysis, RVM has been successful in detecting seizures in electroencephalogram (EEG) signals [49] and in forecasting stock index [50], exchange rate [51], nonlinear hydrological time series [52], wind speed [47,53], and the price of electricity [54]. These applications show the superiority of RVM in time series forecasting. According to the existing literature, however, there has been little research on crude oil price forecasting by RVM.
As a popular decomposition method, EEMD has several advantages over other methods: (1) it can decompose nonlinear and nonstationary signals into several IMFs and one residue; (2) the IMFs produced by EEMD are obtained adaptively and represent local features of the signal; (3) unlike Fourier and wavelet transforms, EEMD does not need a basis function for decomposition; and (4) it needs only two parameters (the number of ensembles and the standard deviation of the Gaussian white noise). Therefore, combining EEMD as the decomposition method, RVM as the prediction method, and addition as the ensemble method might achieve good accuracy in crude oil price forecasting, following the "decomposition and ensemble" framework. Based on the framework, the original difficult task of forecasting crude oil price is divided into several relatively easy subtasks of forecasting each component individually. Since EEMD decomposes the raw crude oil price into a set of components and the raw price equals the sum of all the components, simple addition might be a good choice to ensemble the predicted results of all components as the final result. Although EEMD and kernel methods have succeeded in forecasting time series, most of the existing studies used a fixed type of kernel to predict every component produced by EEMD, ignoring the characteristics of the data. In fact, each component has its own characteristics. For example, the residue reflects the trend of the original signal, while the first intrinsic mode function (IMF) reflects the highest frequency [30]. It is more appropriate to adaptively select kernel types and kernel parameters for each component according to its own characteristics [55]. To cope with this issue, this research proposes a novel method integrating EEMD, adaptive PSO (APSO), and RVM, namely EEMD-APSO-RVM, to predict crude oil price following the "decomposition and ensemble" framework. Specifically, the raw price is decomposed into several components. Then, for each component, RVM with a combined kernel, in which the weights and parameters of the single kernels are optimized by an extended PSO, is applied to predict its target value. Finally, the predicted values of all components are aggregated as the final predicted crude oil price. Compared to the basic "decomposition and ensemble" framework, the proposed EEMD-APSO-RVM improves the accuracy of crude oil price forecasting in three aspects: (1) it uses EEMD instead of other decomposition methods to decompose the raw time series into several components that can better represent the characteristics of the data; (2) it applies RVM to forecast each component because of its good predictive capabilities; and (3) it proposes APSO to adaptively optimize the weights and parameters of the single kernels in the combined kernel of RVM. The main contributions of this work are three-fold: (1) we propose EEMD-APSO-RVM to predict crude oil price; to the best of our knowledge, it is the first time that RVM has been applied to forecasting crude oil price; (2) an extended PSO is employed to simultaneously optimize kernel types and kernel parameters for RVM, resulting in an optimal kernel for each component produced by EEMD; and (3) extensive experiments were conducted on the WTI crude oil price, and the results demonstrate that the proposed EEMD-APSO-RVM method is promising for forecasting crude oil price. Accordingly, the novelty of this paper can be described as follows: (1) it introduces RVM to forecasting crude oil price for the first time; and (2) an adaptive PSO is proposed to optimize the weights and parameters of kernels in RVM to improve the accuracy of crude oil price forecasting.
The remainder of this paper is organized as follows. Section 2 describes the formulation process of the proposed EEMD-APSO-RVM method in detail. Experimental results are reported and analyzed in Section 3. Finally, Section 4 concludes this paper.

Methodology
The decomposition and ensemble framework has three steps: decomposition, individual prediction, and ensemble prediction. In this section, the overall formulation process of EEMD-APSO-RVM is presented. Firstly, the related EEMD, PSO, and RVM are briefly introduced individually in Sections 2.1-2.3. Secondly, the adaptive PSO for parameter optimization in RVM is described in Section 2.4. Finally, the EEMD-APSO-RVM algorithm is formulated, and the corresponding steps are described in detail in Section 2.5.

Ensemble Empirical Mode Decomposition
Ensemble empirical mode decomposition (EEMD) is an extended version of empirical mode decomposition (EMD) developed to overcome the so-called "mode mixing" problem of the latter [30,56]. In contrast to traditional decomposition methodologies, EEMD is an empirical, direct, intuitive, and self-adaptive methodology that can decompose nonlinear and nonstationary time series into components (several IMFs and one residue), with each component having a length equal to that of the original signal. Since it was proposed, it has been widely applied to complex system analysis, showing its superiority in forecasting time series.
The main idea of EEMD is to perform EMD many times on the time series, each time with a different realization of Gaussian white noise added, and to treat the ensemble average of the corresponding IMFs as the final decomposition result. The main steps of EEMD are as follows:
Step 1: Specify the number of ensembles $M$ and the standard deviation $\sigma$ of the Gaussian white noise, and set $i = 0$;
Step 2: Set $i = i + 1$ and add a Gaussian white noise $n_i(t) \sim N(0, \sigma^2)$ to the crude oil price series $X(t)$ to construct a new series $X_i(t)$, as follows:
$$X_i(t) = X(t) + n_i(t); \tag{1}$$
Step 3: Decompose $X_i(t)$ into $J$ IMFs $c_{ij}(t)$ $(j = 1, \ldots, J)$ and a residue $r_i(t)$, as follows:
$$X_i(t) = \sum_{j=1}^{J} c_{ij}(t) + r_i(t), \tag{2}$$
where $c_{ij}$ is the $j$-th IMF in the $i$-th trial, and $J$ is the number of IMFs, determined by the size $N$ of the crude oil price series with $J = \lfloor \log_2 N \rfloor - 1$ [30];
Step 4: If $i < M$, go to Step 2 to perform EMD again; otherwise, go to Step 5;
Step 5: Calculate the average of the corresponding IMFs over the $M$ trials as the final IMFs:
$$c_{j,t} = \frac{1}{M} \sum_{i=1}^{M} c_{ij}(t), \quad j = 1, \ldots, J. \tag{3}$$
Once EEMD completes, the original time series can be expressed as the sum of $J$ IMFs and a residue, as follows:
$$X(t) = \sum_{j=1}^{J} c_{j,t} + r_{J,t}, \tag{4}$$
where $r_{J,t}$ is the final residue. The problem of forecasting the original time series thus becomes the problem of forecasting each component decomposed by EEMD.
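The noise-and-average loop of Steps 1 to 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_decompose` is a hypothetical stand-in that splits a series into one detail component and a moving-average residue, and it is not the EMD sifting procedure. The sketch also assumes that every trial returns the same number of components. All function and parameter names are illustrative.

```python
import random

def toy_decompose(series, window=5):
    """Placeholder for EMD (NOT real sifting): one detail component plus a
    centred moving-average residue; the two parts sum to the input."""
    n = len(series)
    half = window // 2
    residue = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        residue.append(sum(series[lo:hi]) / (hi - lo))
    detail = [x - r for x, r in zip(series, residue)]
    return [detail], residue

def eemd(series, decompose, n_trials=100, noise_std=0.1, seed=0):
    """Add white noise, decompose, repeat n_trials times, then average."""
    rng = random.Random(seed)
    n = len(series)
    imf_sums, residue_sum = None, [0.0] * n
    for _ in range(n_trials):
        # Step 2: perturb the series with Gaussian white noise
        noisy = [x + rng.gauss(0.0, noise_std) for x in series]
        # Step 3: decompose the perturbed series
        imfs, residue = decompose(noisy)
        if imf_sums is None:
            imf_sums = [[0.0] * n for _ in imfs]
        for acc, imf in zip(imf_sums, imfs):
            for t in range(n):
                acc[t] += imf[t]
        for t in range(n):
            residue_sum[t] += residue[t]
    # Step 5: ensemble average of corresponding components
    imfs_avg = [[v / n_trials for v in acc] for acc in imf_sums]
    residue_avg = [v / n_trials for v in residue_sum]
    return imfs_avg, residue_avg

# Toy price-like series: linear trend plus a fast oscillation
prices = [50 + 0.1 * t + (2 if t % 2 else -2) for t in range(60)]
imfs, residue = eemd(prices, toy_decompose)
recon = [sum(c[t] for c in imfs) + residue[t] for t in range(len(prices))]
```

Because each trial's components sum to the noisy series, the averaged components sum to the original series plus the mean of the added noise, which shrinks as the number of trials grows.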

Particle Swarm Optimization
Particle swarm optimization (PSO), first proposed by Eberhart and Kennedy, is an evolutionary computation algorithm that uses a velocity-displacement model through iteration to simulate swarm intelligence [57]. The algorithm initializes with a group of random particles in a space of D dimensions, and each particle, representing a potential solution, is assigned a randomized velocity to change its position while searching for the optimal solution. In each iteration, the particles keep track of the local best solution $p_l$ and the global best solution $p_g$ to decide the flight speed and distance accordingly.
The $i$-th particle has a position vector and a velocity vector in D-dimensional space, described as $p_i = (p_{i1}, p_{i2}, \ldots, p_{iD})$ and $v_i = (v_{i1}, v_{i2}, \ldots, v_{iD})$, and the optimum locations achieved by the $i$-th particle and by the population are described as $p_{li} = (p_{li1}, p_{li2}, \ldots, p_{liD})$ and $p_g = (p_{g1}, p_{g2}, \ldots, p_{gD})$, respectively. The formulas to update the speed and position of the $d$-th dimension of the $i$-th particle are as follows, respectively:
$$v_{id}(t+1) = w\,v_{id}(t) + c_1 r_1 \left(p_{lid} - p_{id}(t)\right) + c_2 r_2 \left(p_{gd} - p_{id}(t)\right), \tag{5}$$
$$p_{id}(t+1) = p_{id}(t) + v_{id}(t+1), \tag{6}$$
where $t$ is the current iteration number, $w$ is the inertia weight, $c_1$ and $c_2$ are nonnegative acceleration constants, and $r_1$ and $r_2$ are random numbers in the range [0,1].
PSO is well suited to real-valued optimization. Therefore, in this research, we use PSO to optimize the weight and parameters of each single kernel in the combined kernel of RVM.
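As an illustration of the velocity and position updates, the following is a minimal sketch of a standard PSO with a fixed inertia weight, applied to a toy sphere function rather than to RVM kernel parameters. The parameter values (`w=0.7`, `c1=c2=1.5`), the clamping of positions to the search bounds, and the immediate update of the global best inside the particle loop are assumptions made for this example and are not taken from the paper.

```python
import random

def pso(fitness, dim, bounds, n_particles=20, n_iter=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `fitness` with a basic fixed-inertia PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    span = hi - lo
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-0.1 * span, 0.1 * span) for _ in range(dim)]
           for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # per-particle best positions
    best_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_fit[i])
    g_pos, g_fit = best_pos[g][:], best_fit[g]  # global best
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive term + social term
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Position update, clamped to the search range (assumption)
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f <= best_fit[i]:
                best_pos[i], best_fit[i] = pos[i][:], f
            if f <= g_fit:
                g_pos, g_fit = pos[i][:], f
    return g_pos, g_fit

def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(x * x for x in p)

best, best_val = pso(sphere, dim=3, bounds=(-5.0, 5.0))
```

In the setting of this paper, the fitness function would instead train an RVM with the kernel encoded by the particle and return its training error.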

Relevance Vector Machine
Relevance vector machine (RVM), put forward by Tipping [32], can be applied to both regression and classification. Since forecasting crude oil price is a regression problem, here we give a brief review of RVM for regression only. Readers can refer to [32] for more details on RVM.
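Given a set of samples $\{x_i, t_i\}_{i=1}^{N}$, where $x_i \in R^d$ are d-dimensional input vectors and $t_i \in R$ are real-valued targets, and assuming that $t_i = y(x_i; w) + \epsilon_i$ with $\epsilon_i \sim N(0, \sigma^2)$, the RVM model for regression can be formulated as:
$$y(x; w) = \sum_{i=1}^{N} w_i K(x, x_i) + w_0, \tag{7}$$
where $K(x, x_i)$ is a kernel function on $x$ and $x_i$, and $w_i$ is the weight of the kernel. Then, for a sample $i$, the conditional probability of the target is as follows:
$$p(t_i \mid x_i) = N\left(t_i \mid y(x_i; w), \sigma^2\right). \tag{8}$$
Assuming that the samples $\{x_i, t_i\}_{i=1}^{N}$ are independently generated, the likelihood of all the samples can be defined as follows:
$$p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{\|t - \Phi w\|^2}{2\sigma^2}\right\}, \tag{9}$$
where $t = (t_1, \ldots, t_N)^T$, $w = (w_0, w_1, \ldots, w_N)^T$, and $\Phi$ is a design matrix of size $N \times (N + 1)$ with $\Phi = [\phi(x_1), \phi(x_2), \ldots, \phi(x_N)]^T$, wherein each component is the vector of kernel responses associated with the sample $x_i$, i.e., $\phi(x_i) = [1, K(x_i, x_1), \ldots, K(x_i, x_N)]^T$.
Implementing maximum-likelihood estimation for $w$ and $\sigma^2$ directly may cause over-fitting, because the number of parameters is almost the same as the number of training samples. To overcome this, Tipping imposed a constraint on the weights $w$ from a Bayesian perspective, as follows [32]:
$$p(w \mid \alpha) = \prod_{i=0}^{N} N\left(w_i \mid 0, \alpha_i^{-1}\right), \tag{10}$$
where $\alpha$ is a vector of $N + 1$ hyperparameters. With the prior on the weights, the posterior over all unknowns can be computed by Bayesian inference as:
$$p(w, \alpha, \sigma^2 \mid t) = \frac{p(t \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2)}{p(t)}. \tag{11}$$
For a given input point $x_*$, the predictive distribution of the corresponding target $t_*$ can be written as:
$$p(t_* \mid t) = \int p(t_* \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2 \mid t)\, dw\, d\alpha\, d\sigma^2. \tag{12}$$
It is difficult to directly compute the posterior $p(w, \alpha, \sigma^2 \mid t)$ in Equation (11). Instead, Tipping further decomposes it as follows:
$$p(w, \alpha, \sigma^2 \mid t) = p(w \mid t, \alpha, \sigma^2)\, p(\alpha, \sigma^2 \mid t). \tag{13}$$
The computation of $p(w, \alpha, \sigma^2 \mid t)$ thus becomes the computation of two terms: $p(w \mid t, \alpha, \sigma^2)$ and $p(\alpha, \sigma^2 \mid t)$. The posterior distribution over the weights can be written from Bayes's rule:
$$p(w \mid t, \alpha, \sigma^2) = \frac{p(t \mid w, \sigma^2)\, p(w \mid \alpha)}{p(t \mid \alpha, \sigma^2)} = N(w \mid \mu, \Sigma), \tag{14}$$
where the posterior covariance and mean are as follows, respectively:
$$\Sigma = \left(\beta \Phi^T \Phi + A\right)^{-1}, \tag{15}$$
$$\mu = \beta \Sigma \Phi^T t, \tag{16}$$
with $\beta = \sigma^{-2}$ and $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$. As for the second term on the right-hand side of Equation (13), it can be decomposed as:
$$p(\alpha, \sigma^2 \mid t) \propto p(t \mid \alpha, \sigma^2)\, p(\alpha)\, p(\sigma^2). \tag{17}$$
Therefore, the learning process of RVM is transformed into maximizing Equation (18) with respect to the hyperparameters $\alpha$ and $\sigma^2$:
$$p(t \mid \alpha, \sigma^2) = \int p(t \mid w, \sigma^2)\, p(w \mid \alpha)\, dw = (2\pi)^{-N/2} \left|\sigma^2 I + \Phi A^{-1} \Phi^T\right|^{-1/2} \exp\left\{-\frac{1}{2} t^T \left(\sigma^2 I + \Phi A^{-1} \Phi^T\right)^{-1} t\right\}, \tag{18}$$
where $I$ is an identity matrix. By simply setting the derivatives of Equation (18) to zero, we can obtain the re-estimation equations for $\alpha$ and $\sigma^2$ as follows, respectively:
$$\alpha_i^{new} = \frac{\gamma_i}{\mu_i^2}, \tag{19}$$
$$(\sigma^2)^{new} = \frac{\|t - \Phi \mu\|^2}{N - \sum_i \gamma_i}, \tag{20}$$
where $\gamma_i = 1 - \alpha_i \Sigma_{ii}$, $\mu_i$ is the $i$-th posterior mean weight, and $\Sigma_{ii}$ is the $i$-th diagonal element of $\Sigma$. Through iteration, the optimal values of $\alpha$ and $\sigma^2$, termed $\alpha_{MP}$ and $\sigma^2_{MP}$ respectively, can be obtained by maximizing Equation (18).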
Finally, for the given input point $x_*$, the predictive result can be computed as follows:
$$p(t_* \mid t, \alpha_{MP}, \sigma^2_{MP}) = N\left(t_* \mid y_*, \sigma_*^2\right), \tag{21}$$
where $y_* = \mu^T \phi(x_*)$ and $\sigma_*^2 = \sigma^2_{MP} + \phi(x_*)^T \Sigma\, \phi(x_*)$.
The kernel function plays a crucial role in RVM and significantly influences its performance. Therefore, it is important to select appropriate kernels according to the characteristics of the data instead of using a single fixed kernel. Some widely used single kernels include the linear kernel $K_{lin}(x_i, y_i) = x_i^T y_i$, the polynomial kernel $K_{poly}$ (with parameters $a$, $b$, and $c$), the RBF kernel $K_{rbf}$ (with parameter $d$; here we use $d$ to represent $\sigma^2$ for short), and the sigmoid kernel $K_{sig}(x_i, y_i) = \tanh(e(x_i^T y_i) + f)$. Among the kernels, the parameters $a$ to $f$ usually need to be specified by users. In this paper, we integrate the above-mentioned four kernels into a combined kernel for RVM, which can be represented as:
$$K_{comb}(x_i, y_i) = \lambda_1 K_{lin}(x_i, y_i) + \lambda_2 K_{poly}(x_i, y_i) + \lambda_3 K_{rbf}(x_i, y_i) + \lambda_4 K_{sig}(x_i, y_i), \tag{22}$$
where $\lambda_1$ to $\lambda_4$ are the weights for the four kernels that satisfy $\sum_{i=1}^{4} \lambda_i = 1$. In this way, each of the four single kernels is a special case of the combined kernel. For example, when $\lambda_1 = \lambda_2 = \lambda_4 = 0$ and $\lambda_3 = 1$, the combined kernel degenerates to the RBF kernel. In the combined kernel, ten parameters ($\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$, $a$, $b$, $c$, $d$, $e$, and $f$) need to be optimized.
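To make the combined kernel concrete, the sketch below evaluates a weighted sum of the four single kernels on plain Python lists. The linear and sigmoid kernels follow the forms given in the text. The polynomial form `(scale * x.y + offset) ** degree` and the RBF form `exp(-||x - y||^2 / (2 * sigma2))` are common textbook choices assumed here, since their exact parameterization is not recoverable from the text. The argument names `scale`, `offset`, `degree`, and `sigma2` are hypothetical, and their mapping onto the symbols a to d is deliberately left open.

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def k_lin(x, y):
    return dot(x, y)

def k_poly(x, y, scale, offset, degree):
    # Assumed textbook form, not confirmed by the source
    return (scale * dot(x, y) + offset) ** degree

def k_rbf(x, y, sigma2):
    # Assumed textbook form with sigma2 playing the role of sigma^2
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma2))

def k_sig(x, y, e, f):
    # Form given in the text: tanh(e * x.y + f)
    return math.tanh(e * dot(x, y) + f)

def k_comb(x, y, lambdas, scale, offset, degree, sigma2, e, f):
    """Weighted sum of the four kernels; lambdas should sum to 1."""
    l1, l2, l3, l4 = lambdas
    return (l1 * k_lin(x, y)
            + l2 * k_poly(x, y, scale, offset, degree)
            + l3 * k_rbf(x, y, sigma2)
            + l4 * k_sig(x, y, e, f))

x, y = [1.0, 2.0], [3.0, 4.0]
params = dict(scale=1.0, offset=1.0, degree=2, sigma2=2.0, e=0.1, f=0.0)
only_rbf = k_comb(x, y, (0.0, 0.0, 1.0, 0.0), **params)
mixed = k_comb(x, y, (0.25, 0.25, 0.25, 0.25), **params)
```

Setting the weights to `(0, 0, 1, 0)` recovers the RBF kernel alone, which mirrors the degenerate case mentioned in the text.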

Adaptive PSO for Parameter Optimization in RVM
For a specific problem, it is hard to set appropriate values for the parameters of the combined kernel in Equation (22) from prior knowledge. PSO is a widely used real-valued optimization algorithm that can be used in this case. However, in traditional PSO, the inertia weight is the same for every particle within one generation and varies only with the iteration, ignoring the differences among particles. Some variants of PSO adaptively adjust the inertia weight of each particle based on one or more feedback parameters [58]. Ideally, the particles far from the global best particle should have a larger inertia weight and hence more exploration ability, while the ones close to the global best particle should have a smaller inertia weight and hence more exploitation ability. To cope with this issue, in this paper, an adaptive PSO (APSO) is proposed to optimize the parameters in RVM, which adaptively adjusts the inertia weight of each particle in an iteration according to the distance between the current particle and the global best particle.
Definition 1. Distance between two particles. The distance between two particles $p_i$ and $p_j$ can be defined as: where $d$ is the dimension of the particle, and $f$ is the fitness function. It is worth noting that each dimension in Equation (23) needs to be mapped into the same scale (e.g., [0,1]) in order for the computation to make sense. According to this definition, the distance between two particles has three properties: (1) dist($p_i$,
Definition 2. Average distance of population. The average distance of the population can be defined as: where $N$ is the total number of particles in a swarm.
In this paper, we propose an adaptive strategy to adjust the inertia weight for one particle $p_i$ in the $t$-th iteration by Equation (25): where $T$ is the total number of iterations, $p_g$ is the global best particle, and $w_{max}$ and $w_{min}$ are the maximal and minimal inertia weights specified by users, respectively. The main idea of Equation (25) is to adjust the inertia weight of each particle adaptively according to its distance from the global best particle. If the current particle is far from the global best particle, it uses the traditional inertia weight. Otherwise, it adaptively adjusts its inertia weight according to its distance to the global best particle. The model using APSO to optimize the parameters of the combined kernel in RVM, called APSO-RVM, can be presented as:
Step 1: Setting parameters. Set the following parameters for running APSO: the population size $P$, the maximal number of iterations $T$, the maximal and minimal inertia weights $w_{max}$ and $w_{min}$, and the ranges of the ten parameters to be optimized;
Step 2: Encoding. Encode the ten parameters into a particle (vector) $p_i = (p_{i1}, p_{i2}, \ldots, p_{i10})$ to represent $\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$, $a$, $b$, $c$, $d$, $e$, and $f$ accordingly;
Step 3: Defining the fitness function. The fitness function is defined by the root mean square error (RMSE):
$$f(p_i) = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left(y_j - \varphi(x_j, p_i)\right)^2}, \tag{26}$$
where $N$ is the number of training samples, $y_j$ is the true target of the input $x_j$, and $\varphi(x_j, p_i)$ is the predicted target associated with $x_j$ and the parameter vector $p_i$;
Step 4: Initializing. Set $t = 0$; randomly generate the initial speed and position of each particle; use the value of particle $p_i$ to compose the kernel for RVM in Equation (22), and then evaluate each particle; $p_i$ is selected as $p_{li}$, while the particle with the optimal fitness is selected as $p_g$;
Step 5: Updating speed and position. Set $t = t + 1$; calculate the inertia weight using Equation (25), and update the speed and position according to Equations (5) and (6), respectively;
Step 6: Evaluating particles. Evaluate each particle by the fitness function;
Step 7: Updating the historical best particle, if necessary. If $f(p_i) \le f(p_{li})$, then $p_{li} = p_i$;
Step 8: Updating the global best particle, if necessary. If $f(p_i) \le f(p_g)$, then $p_g = p_i$;
Step 9: Judging whether the iteration terminates or not. If $t < T$, go to Step 5. Otherwise, stop the iteration and output $p_g$ as the optimized parameters for the combined kernel in RVM.
The optimal RVM predictor is obtained at this point.
The APSO is based on the framework of PSO, and the main improvement lies in that each particle has its own inertia weight according to its distance from the global best particle. In this paper, the APSO is applied to adaptively search for the optimal weights and parameters of the single kernels in the combined kernel of RVM to predict crude oil price.

The Proposed EEMD-APSO-RVM Model
Following the framework of "decomposition and ensemble", a three-stage methodology that integrates ensemble empirical mode decomposition (EEMD), adaptive particle swarm optimization (APSO), and relevance vector machine (RVM), termed EEMD-APSO-RVM, can be formulated for forecasting crude oil price. As shown in Figure 1, the proposed EEMD-APSO-RVM generally consists of three main stages:
Stage 1: Decomposition. The original crude oil price series $x_t$ $(t = 1, 2, \ldots, T)$ is decomposed by EEMD into $J = \lfloor \log_2 T \rfloor - 1$ intrinsic mode function (IMF) components $c_{j,t}$ $(j = 1, 2, \ldots, J)$ and one residue $r_{J,t}$.
Stage 2: Individual forecasting. APSO-RVM is applied to forecast each IMF and the residue individually.
Stage 3: Ensemble forecasting. The predicted results of all components are aggregated by simple addition as the final result.
The proposed EEMD-APSO-RVM is a typical "divide and conquer" strategy. The complicated problem of forecasting the original crude oil price is transformed into several problems of forecasting relatively simple components independently. The EEMD-APSO-RVM adopts a combined kernel that integrates four commonly used kernels. Furthermore, the weights and parameters in the combined kernel are adaptively optimized by an extension of PSO. The EEMD-APSO-RVM decomposes the crude oil price into several IMFs and one residue that are forecast individually, instead of using the nonlinear and nonstationary raw data as the input to a single forecasting method; this can improve the forecasting accuracy, because forecasting each component is a relatively easy task. The kernel-based RVM has been able to accurately predict time series such as wind speed and electricity price, which should benefit crude oil price forecasting. The APSO adaptively optimizes the parameters in the kernel, trying to find the optimal kernel to improve the forecasting results. All these attributes make it possible for the EEMD-APSO-RVM to improve the accuracy of crude oil price forecasting.
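The "forecast each component, then add" idea can be illustrated with a deliberately small sketch. The components below are hand-made toy lists that add up to a raw series, and `persistence` is a hypothetical stand-in predictor (the next value equals the last observed value), not APSO-RVM.

```python
def ensemble_forecast(components, predict):
    """Forecast every component independently and add the forecasts."""
    return sum(predict(component) for component in components)

def persistence(series):
    """Stand-in predictor (NOT RVM): repeat the last observed value."""
    return series[-1]

# Toy components: two "IMFs" and a residue that sum to the raw series
imf1 = [0.5, -0.5, 0.5, -0.5]
imf2 = [1.0, 0.0, -1.0, 0.0]
residue = [40.0, 41.0, 42.0, 43.0]
raw = [a + b + r for a, b, r in zip(imf1, imf2, residue)]

forecast = ensemble_forecast([imf1, imf2, residue], persistence)
```

With a persistence predictor the ensemble forecast simply equals the last raw value; the benefit of the framework comes from replacing `persistence` with a predictor tuned to each component.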

Numerical Example
To demonstrate the performance of the proposed EEMD-APSO-RVM, in this paper, we select the crude oil price of West Texas Intermediate (WTI) as experimental data, as described in Section 3.1. The evaluation criteria are introduced in Section 3.2. Section 3.3 gives the parameter settings and data preprocessing for the experiments, and in Section 3.4, the experimental results are reported. We further analyse the robustness and running time of the proposed method in Section 3.5. Finally, some interesting findings can be obtained from the experimental study.

Data Description
The crude oil price of WTI can be accessed from the US Energy Information Administration (EIA) [59]. We use the daily close price covering the period from 2 January 1986 to 12 September 2016, with 7743 observations in total for the experiments. Among the observations, the first 6194 (from 2 January 1986 to 21 July 2010) are treated as training samples, while the remaining 1549 (from 22 July 2010 to 12 September 2016) are used for testing, accounting for 80% and 20% of the total observations, respectively.
We conduct h-step-ahead predictions with horizon $h = 1, 3, 6$ in this study. Given a time series $x_t$ $(t = 1, 2, \ldots, T)$, the h-step-ahead prediction for $x_{t+h}$ can be formulated as:
$$\hat{x}_{t+h} = f\left(x_t, x_{t-1}, \ldots, x_{t-(l-1)}\right), \tag{27}$$
where $\hat{x}_{t+h}$ is the h-step-ahead predicted value at time $t$, $x_t$ is the true value at time $t$, and $l$ is the lag order.
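The h-step-ahead setup amounts to turning the series into pairs of lagged inputs and future targets. A minimal sketch follows; the function name `make_samples` is hypothetical, and each input window is stored oldest value first, which is an arbitrary ordering choice for this example.

```python
def make_samples(series, lag, horizon):
    """Build (inputs, targets): each input holds the `lag` most recent
    values up to time t, and the target is the value `horizon` steps later."""
    inputs, targets = [], []
    for t in range(lag - 1, len(series) - horizon):
        inputs.append(series[t - lag + 1:t + 1])
        targets.append(series[t + horizon])
    return inputs, targets

series = list(range(1, 11))          # toy series 1, 2, ..., 10
X, y = make_samples(series, lag=3, horizon=2)
# First sample: inputs [1, 2, 3] with target 5 (two steps after the 3)
```

With a lag order of six and horizons of 1, 3, and 6, the same routine would produce the three training sets used for the different horizons.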

Evaluation Criteria
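The root mean squared error (RMSE), the mean absolute percent error (MAPE), and the directional statistic ($D_{stat}$) are selected to evaluate the performance of the proposed method. With the true value $x_t$ and the predicted value $\hat{x}_t$ at time $t$, RMSE is defined as:
$$RMSE = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left(x_t - \hat{x}_t\right)^2}, \tag{28}$$
where $N$ is the number of testing observations. Note that the RMSE here has the same meaning as Equation (26), where the predicted value is represented by $\varphi(\cdot, p_i)$.
As another evaluation criterion for prediction accuracy, MAPE is defined as:
$$MAPE = \frac{1}{N} \sum_{t=1}^{N} \left|\frac{x_t - \hat{x}_t}{x_t}\right|. \tag{29}$$
In addition, $D_{stat}$ measures the ability to forecast the direction of price movement, which is defined as:
$$D_{stat} = \frac{1}{N} \sum_{t=1}^{N} a_t \times 100\%, \tag{30}$$
where $a_t = 1$ if $(x_{t+1} - x_t)(\hat{x}_{t+1} - x_t) \ge 0$, and $a_t = 0$ otherwise. An ideal forecasting method should achieve low RMSE, low MAPE, and high $D_{stat}$.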
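The three criteria are straightforward to compute. The sketch below assumes the usual definitions: RMSE as the square root of the mean squared error, MAPE as the mean absolute relative error (returned as a fraction rather than a percentage), and Dstat as the share of steps in which the predicted move away from the current true value has the same sign as the actual move. Dividing Dstat by the number of comparable steps, rather than by the number of observations, is a choice made for this example.

```python
import math

def rmse(actual, predicted):
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute relative error, as a fraction (0.01 means 1%)."""
    n = len(actual)
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def dstat(actual, predicted):
    """Fraction of steps whose predicted direction matches the actual one."""
    steps = len(actual) - 1
    hits = 0
    for t in range(steps):
        actual_move = actual[t + 1] - actual[t]
        predicted_move = predicted[t + 1] - actual[t]
        if actual_move * predicted_move >= 0:
            hits += 1
    return hits / steps

actual = [100.0, 102.0, 101.0, 105.0]
predicted = [100.0, 101.0, 103.0, 104.0]
scores = (rmse(actual, predicted), mape(actual, predicted), dstat(actual, predicted))
```

In this toy case the second step is a directional miss: the price fell from 102 to 101 while the prediction of 103 pointed upward, so two of the three steps count as hits.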
The parameters for APSO are listed in Table 2. The standard PSO uses the same parameters as APSO. Note that to guarantee $\sum_{i=1}^{4} \lambda_i = 1$, we simply map the values in the particles to new values to be applied to the combined kernel with $\lambda_j = \lambda_j / \sum_{i=1}^{4} \lambda_i$. For $b$, we use $b = \mathrm{round}(b)$ to get an integer as the exponent. Following some previous work [4,27], we apply the RBF kernel in LSSVR and use grid search to find the optimal $\gamma$ and $\sigma^2$ in the ranges $\{2^k, k = -4, -3, \ldots, 12\}$ and $\{2^k, k = -4, -3, \ldots, 12\}$, respectively. For ANN, we use a back propagation neural network with ten hidden nodes. The number of ANN training iterations was set to 10,000. For the parameters in the single RVM-related predictors (i.e., $a$ to $f$), we search for the best parameters in the same ranges as those in APSO (listed in Table 2) with an interval of 0.2, except that $c$ varies with an interval of 1 and $d$ varies in $\{2^k, k = -4, -3, \ldots, 12\}$. We use the Akaike information criterion (AIC) [60] to determine the ARIMA parameters (p-d-q). We also set the lag order in Equation (27) to six, as analysed in [61].
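The mapping from a raw particle to usable kernel settings can be sketched as follows. It assumes that the weight mapping divides each of the first four entries by their sum (which enforces the sum-to-one constraint when the raw weights are nonnegative and not all zero) and that the second of the six remaining entries is rounded to an integer, as stated in the text. The function name `decode_particle` is hypothetical.

```python
def decode_particle(particle):
    """Split a 10-dimensional particle into normalised kernel weights
    and the six kernel parameters (a, b, c, d, e, f)."""
    raw_weights = particle[:4]
    total = sum(raw_weights)            # assumed to be positive
    weights = [w / total for w in raw_weights]
    a, b, c, d, e, f = particle[4:]
    b = round(b)                        # integer exponent, as in the text
    return weights, (a, b, c, d, e, f)

particle = [1.0, 1.0, 2.0, 0.0, 0.5, 2.6, 1.0, 4.0, 0.1, 0.0]
weights, kernel_params = decode_particle(particle)
```

Normalising inside the decoder lets the optimizer move freely in the raw weight space while the kernel always receives weights that sum to one.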
Regarding ensemble models, we first add white noise with a standard deviation of 0.15 to the original crude oil price, and set 100 as the number of ensembles in EEMD. The decomposition results of the original crude oil price by EEMD are shown in Figure 2, with 11 IMFs and one residue.
To set the stage for a fair comparison, we applied Min-Max Normalization (as shown in Equation (31)) to all of the data:
$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}},$$
where $x_{min}$ and $x_{max}$ are the minimal and maximal values for one dimension in the data, respectively, and $x_{norm}$ and $x$ are the normalized and the original values, respectively. The normalization maps the original values to the range [0, 1]. Conversely, after obtaining the predicted value $\hat{x}_{norm}$ from the normalized data, the corresponding predicted value $\hat{x}$ in the original scale can be computed as:
$$\hat{x} = \hat{x}_{norm}\,(x_{max} - x_{min}) + x_{min}.$$
All of the experiments were conducted in Matlab 8.6 (Mathworks, Natick, MA, USA) on 64-bit Windows 7 (Microsoft, Redmond, WA, USA) with 32 GB memory and a 3.4 GHz I7 CPU.
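A minimal sketch of the normalization and its inverse is given below. The function names and sample prices are hypothetical, and the sketch assumes a one-dimensional series whose maximum differs from its minimum.

```python
def fit_min_max(values):
    """Return the (min, max) pair that defines the scaling for one dimension."""
    return min(values), max(values)

def normalize(x, x_min, x_max):
    """Map an original value into [0, 1]."""
    return (x - x_min) / (x_max - x_min)

def denormalize(x_norm, x_min, x_max):
    """Map a normalized (e.g., predicted) value back to the original scale."""
    return x_norm * (x_max - x_min) + x_min

# Hypothetical prices, for illustration only.
prices = [40.0, 100.0, 70.0]
x_min, x_max = fit_min_max(prices)
scaled = [normalize(p, x_min, x_max) for p in prices]
restored = [denormalize(s, x_min, x_max) for s in scaled]
print(scaled, restored)
```

In a forecasting setting, the minimum and maximum would normally be taken from the data available for training, so that predictions made on the normalized scale can be mapped back with the same two constants.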

Results of Single Models
We first evaluate the single models (i.e., APSO-RVM, PSO-RVM, RVMlin, RVMpoly, RVMrbf, RVMsig, ANN, LSSVR, and ARIMA) in terms of MAPE, RMSE, and Dstat, as shown in Figures 3-5. These results suggest that the proposed APSO-RVM is the most powerful of the single models in forecasting crude oil price. The MAPE value of APSO-RVM is the lowest among the nine single models at all horizons, followed by PSO-RVM, RVMpoly, RVMrbf, and RVMsig. The performances of the latter four models are quite alike, except that the MAPE value of RVMrbf at horizon one is slightly higher. RVMlin yields the highest MAPE values at horizon one and horizon three, and the third highest value at horizon six, showing its poor performance in forecasting crude oil price. A possible reason is that the crude oil price data are not linearly separable. The results of the state-of-the-art AI benchmark models (ANN and LSSVR) are very close at horizon one and horizon three. However, LSSVR outperforms ANN at horizon six. The statistical model, ARIMA, ranks sixth in all cases. This is probably because, as a typical linear model, ARIMA has difficulty forecasting crude oil price accurately, given the nonlinearity and nonstationarity of the price.
In terms of RMSE, the prediction accuracy of APSO-RVM still ranks first among all of the compared benchmark models in all cases, although it is very close to the corresponding result of PSO-RVM. Among the RVM models with a single kernel, RVMpoly, RVMrbf, and RVMsig achieve very close results, which are slightly higher than that of APSO-RVM, followed by RVMlin with the poorest results at horizon one and horizon three, and the second poorest result at horizon six among all methods. ANN, LSSVR, and ARIMA achieve very similar RMSE values at each horizon, except that ANN underperforms LSSVR and ARIMA at horizon six.
From the perspective of directional accuracy, all of the models produce quite similar results, ranging from 0.48 to 0.52. None of the models can be shown to be better than the others. In spite of its leading performance in terms of MAPE and RMSE, APSO-RVM does not significantly outperform the other models at all horizons regarding Dstat. APSO-RVM ranks first at horizon one, fifth at horizon three, and first with a slight advantage at horizon six. It is interesting that LSSVR ranks first at horizon three and second at horizon six, but last at horizon one. Another interesting finding is that the values of seven out of nine models at horizon six are higher than those at horizon three. Therefore, the performance of single models is not stable when forecasting the direction of crude oil price.
From the results of the single models, it can be seen that none of the methods consistently outperforms the others in all cases in terms of MAPE, RMSE, and Dstat. Another interesting finding is that many methods achieve very close results in most cases, although APSO-RVM is better than the others in eight out of nine cases. In addition, even the best results are unsatisfactory. For example, the Dstat values of all methods lie between 0.48 and 0.52, which is close to random guessing and therefore impractical. All of these findings show that it is difficult to accurately forecast crude oil price using the nonlinear and nonstationary raw price. The main reason might be that single models are limited in the accuracy they can achieve because of the complexity of crude oil price. Hence, in this work, we develop a novel "decomposition and ensemble" method to improve the performance of single models in forecasting crude oil price.

Results of Ensemble Models
Regarding the ensemble models (i.e., EEMD-APSO-RVM, EEMD-PSO-RVM, EEMD-RVMlin, EEMD-RVMpoly, EEMD-RVMrbf, EEMD-RVMsig, EEMD-ANN, EEMD-LSSVR, EMD-APSO-RVM, EMD-PSO-RVM, EMD-RVMlin, EMD-RVMpoly, EMD-RVMrbf, EMD-RVMsig, EMD-ANN, and EMD-LSSVR), Figures 6-8 show the corresponding results in terms of MAPE, RMSE, and Dstat. These results show that the proposed EEMD-APSO-RVM is the best model, achieving the lowest MAPE value, the lowest RMSE value, and the highest Dstat value at each horizon. At each horizon, the MAPE value of EEMD-APSO-RVM ranks first among all models, being far lower than that of many other ensemble models. At the same time, the MAPE value of EMD-APSO-RVM also ranks first among all of the EMD-related methods. This shows the superiority of APSO-RVM in forecasting crude oil price. Accordingly, EEMD-PSO-RVM and EMD-PSO-RVM rank second among the EEMD-related and EMD-related methods, respectively, at each horizon, with slightly worse results than those of their counterparts EEMD-APSO-RVM and EMD-APSO-RVM. EEMD-RVMpoly ranks third in terms of MAPE at each horizon, and EEMD-RVMsig and EEMD-RVMrbf are slightly worse than EEMD-RVMpoly, but are still better than many other methods. Among the EEMD-RVM family of methods, EEMD-RVMlin is the poorest model, ranking last at all three horizons when compared with the other EEMD-RVM-related models. It is clear that RVM with a combined kernel outperforms RVMs with a single kernel. Regarding ANN and LSSVR, it is interesting that ANN underperforms LSSVR twice with EEMD, while the former always outperforms the latter with EMD. For these two AI models, it is difficult to judge which is superior, since both are parameter-sensitive and it is difficult for traditional methods to find their optimal parameters. From the perspective of decomposition algorithms, the ensemble methods with EEMD as the decomposition method are much better than their counterparts with EMD, except for ANN at horizon one and horizon six, showing that EEMD is a more effective decomposition method in time series analysis. Furthermore, EEMD-APSO-RVM significantly decreases the MAPE values when compared with the single APSO-RVM method, demonstrating the benefit of decomposition for forecasting performance.
Focusing on the RMSE values (shown in Figure 7), findings similar to those for MAPE can be obtained. EEMD-APSO-RVM still ranks first among all benchmark models, with 0.59, 0.83, and 1.18 at horizon one, horizon three, and horizon six, respectively. The RMSE values of EEMD-APSO-RVM are far lower than those of any other model except EEMD-PSO-RVM, whose results are only slightly worse at the corresponding horizons. This further confirms that the proposed EEMD-APSO-RVM is effective for forecasting crude oil price. Most ensemble methods clearly outperform their corresponding single method. This is mainly attributed to the fact that EMD or EEMD can remarkably improve the prediction power of the models. Generally speaking, the ensemble methods with EEMD have better results than their corresponding methods with EMD, due to the good performance of EEMD in data analysis.
In terms of Dstat (shown in Figure 8), all the values of the ensemble models are higher than 0.525, and are quite different from the results of the single models (as shown in Figure 5), where the highest value is less than 0.520. This demonstrates that the "decomposition and ensemble" framework can notably improve the performance of directional prediction. At each horizon, the proposed EEMD-APSO-RVM achieves the highest Dstat value (0.86, 0.81, and 0.74 at horizon one, horizon three, and horizon six, respectively), showing its superiority over all other methods. Similarly, EMD-APSO-RVM also outperforms all other EMD-related models at each horizon. The poorest results were usually produced by RVM models with a linear kernel, except that EEMD-RVMlin obtains the second poorest value at horizon one, further demonstrating that the components from crude oil price are not linearly separable.

Analysis of Robustness and Running Time
Although EEMD-APSO-RVM succeeds in forecasting crude oil price, it has disadvantages. First, since PSO uses many random values in the evolutionary process, it is hard to reproduce the experiments with exactly the same solutions. Second, it is time-consuming for EEMD-APSO-RVM to find the optimal parameters and to compute the combined kernel.
To evaluate the robustness and stability of the proposed EEMD-APSO-RVM, we repeated the experiments 10 times and report the results in terms of means and standard deviations (std.) of MAPE, RMSE, and Dstat in Table 3. It can be seen that the standard deviations of MAPE and Dstat are far less than 0.01, and at the same time, the standard deviation in each case is lower than 5% of the corresponding mean. For RMSE, the standard deviations are slightly higher than those of MAPE and Dstat. However, even the poorest standard deviation in terms of RMSE is still less than 6% of the corresponding mean. The results show that EEMD-APSO-RVM is quite stable and robust for forecasting crude oil price.

Table 3. Statistical results of running the experiment ten times by the EEMD-APSO-RVM (mean ± std.).

Horizon   MAPE               RMSE               Dstat
One       0.0065 ± 0.0001    0.5905 ± 0.0110    0.8643 ± 0.0032
Three     0.0091 ± 0.0001    0.8324 ± 0.0340    0.8062 ± 0.0037
Six       0.0126 ± 0.0003    1.1843 ± 0.0702    0.7028 ± 0.0028

In the training phase of EEMD-APSO-RVM, many particles need to be evaluated by the fitness function to find the optimal parameters for the combined kernel, which is time-consuming. It takes about 10 h to train a model at one horizon in our experimental environment (Matlab 8.6 on 64-bit Windows 7 with 32 GB memory and a 3.4 GHz I7 CPU), while it takes only about 3 s to test the 1549 samples with the optimized parameters. In practice, the testing time plays a more important role than the training time, because the training phase is usually completed with off-line data and runs only once. Therefore, the time consumed by EEMD-APSO-RVM is acceptable.
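The stability statements about Table 3 (standard deviations below 5% of the mean for MAPE and Dstat, and below 6% for RMSE) can be checked directly from the reported values. The short script below is only an illustrative check. The dictionary layout and variable names are hypothetical, while the numbers are copied from Table 3.

```python
# (mean, std) pairs copied from Table 3, keyed by horizon and metric.
table3 = {
    "one":   {"MAPE": (0.0065, 0.0001), "RMSE": (0.5905, 0.0110), "Dstat": (0.8643, 0.0032)},
    "three": {"MAPE": (0.0091, 0.0001), "RMSE": (0.8324, 0.0340), "Dstat": (0.8062, 0.0037)},
    "six":   {"MAPE": (0.0126, 0.0003), "RMSE": (1.1843, 0.0702), "Dstat": (0.7028, 0.0028)},
}

def relative_std(mean, std):
    """Standard deviation expressed as a fraction of the mean."""
    return std / mean

ratios = {
    (horizon, metric): relative_std(mean, std)
    for horizon, metrics in table3.items()
    for metric, (mean, std) in metrics.items()
}

# Largest relative spread for RMSE, and for the other two metrics.
worst_rmse = max(r for (_, metric), r in ratios.items() if metric == "RMSE")
worst_other = max(r for (_, metric), r in ratios.items() if metric != "RMSE")
print(f"worst RMSE ratio: {worst_rmse:.4f}, worst MAPE/Dstat ratio: {worst_other:.4f}")
```

The largest relative spread occurs for RMSE at horizon six (0.0702 / 1.1843, roughly 5.9%), which is the case the text refers to as still being below 6%.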

Summary
From the above discussions, some interesting findings can be obtained, as follows: (1) Due to nonlinearity and nonstationarity, it is difficult for single models to accurately forecast crude oil price. (2) RVM has a good ability to forecast crude oil price. Even with a single kernel, RVM may outperform LSSVR, ANN, and ARIMA in many cases. (3) The combined kernel can further improve the accuracy of RVM. PSO can be applied to optimize the weights and parameters of the single kernels for the combined kernel in RVM. In this case, the proposed APSO outperforms the traditional PSO. (4) The EEMD-related methods achieve better results than the counterpart EMD-related methods, showing that EEMD is more suitable for decomposing crude oil price. (5) With the benefits of EEMD, APSO, and RVM, the proposed ensemble EEMD-APSO-RVM significantly outperforms all other compared models listed in this paper in terms of MAPE, RMSE, and Dstat. At the same time, it is stable across repeated runs and its running time is acceptable. Together, these findings show that EEMD-APSO-RVM is promising for crude oil price forecasting.

Conclusions
This paper proposes a novel model integrating EEMD, adaptive PSO, and RVM (namely EEMD-APSO-RVM) for forecasting crude oil price based on the "decomposition and ensemble" framework. In the decomposition phase, we used EEMD to decompose the raw crude oil price into several IMFs and one residue. In the single forecasting phase, we utilized RVM with a combined kernel optimized by an adaptive PSO to forecast each component individually. Finally, the predicted results of all components were aggregated by simple addition. To validate EEMD-APSO-RVM, eight other single benchmark models and fifteen ensemble models were employed to compare the forecasting results of the crude oil spot price of WTI at three different horizons in terms of MAPE, RMSE, and Dstat. To the best of our knowledge, this is the first time that RVM with combined kernels has been applied to forecasting crude oil price. It can be concluded from the extensive experimental results that: (1) APSO-RVM outperforms the other single models in most cases; (2) the components obtained by decomposition can better represent the characteristics of crude oil price than the raw data, and EEMD is superior to EMD for decomposition; and (3) EEMD-APSO-RVM achieves satisfactory results in all cases, showing that it is promising for forecasting crude oil price.
In the future, the work could be extended in two directions: (1) studying multiple kernel RVM to improve the performance of crude oil price forecasting; and (2) applying EEMD-APSO-RVM to forecasting other energy time series, such as wind speed and electricity price.

Figure. The IMF and residue components by EEMD.
Panel titles: one-step-ahead prediction, three-step-ahead prediction, six-step-ahead prediction.
Ensemble forecasting. The final predicted results $\hat{x}_t$ can be obtained by simply adding the predicted results of all IMF components and the residue; i.e., $\hat{x}_t = \sum_{j=1}^{J} \hat{c}_{j,t} + \hat{r}_{N,t}$.
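The aggregation step is a plain element-wise sum, as the following sketch shows. The function name and the component forecasts are hypothetical. In the actual method, each component forecast would come from the RVM trained on that IMF or on the residue.

```python
def aggregate(imf_predictions, residue_prediction):
    """Sum the per-component forecasts at each time step.

    imf_predictions: list of J lists, one forecast series per IMF.
    residue_prediction: forecast series for the residue, same length.
    """
    per_step_imfs = zip(*imf_predictions)  # regroup the forecasts by time step
    return [sum(step) + r for step, r in zip(per_step_imfs, residue_prediction)]

# Hypothetical forecasts for two IMFs and the residue over two time steps.
imf_predictions = [[0.5, -0.2], [1.0, 0.3]]
residue_prediction = [50.0, 50.5]
final = aggregate(imf_predictions, residue_prediction)
print(final)
```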

Table 1. Descriptions of all the methods in the experiments. ANN: artificial neural network; PSO: particle swarm optimization.
Mean absolute percentage error (MAPE) by different single methods.
Root mean square error (RMSE) by different single methods.