Forecasting Daily Crude Oil Prices Using Improved CEEMDAN and Ridge Regression-Based Predictors

As one of the leading types of energy, crude oil plays a crucial role in the global economy. Understanding the movement of crude oil prices is of great interest to producers, consumers and researchers alike. However, due to its complex features of nonlinearity and nonstationarity, accurately forecasting crude oil prices is a very challenging task. Inspired by the well-known framework of "decomposition and ensemble" in signal processing and time series forecasting, in this paper we propose a new approach that integrates the improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN), differential evolution (DE) and several types of ridge regression (RR), namely ICEEMDAN-DE-RR, for more accurate crude oil price forecasting. The proposed approach consists of three steps. First, we use ICEEMDAN to decompose the complex daily crude oil price series into several relatively simple components. Second, ridge regression or kernel ridge regression is employed to forecast each decomposed component; to enhance its accuracy, DE is used to jointly optimize the regularization term and the weights and parameters of each single kernel for each component. Finally, the predicted results of all components are aggregated as the final prediction. The publicly available West Texas Intermediate (WTI) daily crude oil spot prices are used to validate the performance of the proposed approach. The experimental results indicate that the proposed approach outperforms some state-of-the-art approaches in terms of several evaluation criteria, demonstrating that ICEEMDAN-DE-RR is very promising for daily crude oil price forecasting.


Introduction
Crude oil is one of the leading types of energy and has a great impact on the global economy. Accurately anticipating changes in crude oil prices benefits both producers and consumers of crude oil. However, the prices are affected by many factors, such as climate, exchange rates, supply and demand, speculation and geopolitics, and they have fluctuated drastically over the last decades [1,2]. For example, the price of West Texas Intermediate (WTI) crude reached over 145 USD/barrel in July 2008 and then quickly fell to about 30 USD/barrel within the next five months. Crude oil prices have shown significant nonlinearity and nonstationarity over the last three decades, which makes accurate forecasting very challenging.
The main contributions of this paper are the following: (1) We propose a new framework of multiple kernel learning, which simultaneously optimizes the weights and parameters of kernels using nature-inspired optimization. (2) We forecast crude oil prices by integrating ICEEMDAN, DE and RR, following the "decomposition and ensemble" framework. To the best of our knowledge, this is the first time this combination has been used for forecasting tasks. (3) The experimental results indicate that the proposed approach is effective for crude oil price forecasting.
It is worth pointing out that although many models following the "decomposition and ensemble" framework already exist for energy forecasting, the proposed models differ from them in several aspects. First, to the best of our knowledge, this is the first attempt to use RR to forecast crude oil prices; existing research focuses on using RR to forecast wind speed, hydrologic time series, real estate appraisal and so on [60,72,73], but not crude oil prices. Second, a new multiple kernel learning framework is proposed that uses DE to simultaneously optimize the parameters and/or weight of each base kernel as well as the regularization term. The experimental results demonstrate the effectiveness of the proposed approach. Third, the integration of ICEEMDAN, DE and RR is applied to time series forecasting for the first time.
The novelty of this paper is three-fold: (1) Based on the power of ICEEMDAN, DE and RR in signal decomposition, numerical optimization and regression, respectively, a novel combination of these three methods is proposed for time series forecasting; (2) To improve the representability of kernels, a novel multiple kernel learning framework that uses DE to simultaneously optimize the weights and parameters of every single kernel is proposed, which can be applied to both classification and regression; (3) The proposed ICEEMDAN-DE-RR approaches are applied to forecasting crude oil prices for the first time, and the results demonstrate their effectiveness.
The remainder of this paper is structured as follows. In Section 2, we briefly introduce ICEEMDAN, DE and RR. Section 3 formulates the proposed ICEEMDAN-DE-RR method in detail. To evaluate the proposed ICEEMDAN-DE-RR, we report and analyze the experimental results in Section 4. Finally, we conclude the paper in Section 5.

Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN)
ICEEMDAN originated from EMD, proposed by Huang et al. [55][56][57][58]. EMD is an adaptive method for time-space analysis that decomposes a raw non-linear and non-stationary sequence into several IMFs and one residue. The main steps of EMD are as follows:
Step 1: Find all local extrema of the raw data x(t), t = 1, 2, 3, · · · , T;
Step 2: Link all local minima and all local maxima to construct the lower envelope x_low(t) and the upper envelope x_up(t), respectively;
Step 3: Compute the local mean, i.e., m(t) = (x_up(t) + x_low(t))/2;
Step 4: Extract the first IMF and residue by IMF_1(t) = x(t) − m(t) and R_1(t) = m(t), respectively;
Step 5: For i = 1, 2, 3, · · · , n, if R_i(t) has more than two local extrema, treat it as the new input, go back to Step 2, and obtain IMF_{i+1}(t) and R_{i+1}(t).
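The sifting steps above can be sketched in a few lines of numpy. This is a simplified illustration, not a full EMD implementation: it uses linear interpolation for the envelopes (classical EMD uses cubic splines) and performs a single sifting pass.

```python
import numpy as np

def local_mean(x):
    """Steps 1-3: locate interior extrema, build upper/lower envelopes
    (linear interpolation here; EMD proper uses cubic splines) and
    average them into the local mean m(t)."""
    t = np.arange(len(x))
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    x_up = np.interp(t, maxima, x[maxima])    # upper envelope
    x_low = np.interp(t, minima, x[minima])   # lower envelope
    return (x_up + x_low) / 2.0

def first_imf(x):
    """Step 4: extract the first IMF candidate and the residue."""
    m = local_mean(x)
    return x - m, m
```

A real implementation would iterate Steps 2-4 (sifting) until IMF_1 satisfies the IMF conditions, and then repeat the process on the residue as in Step 5.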
In EMD, it was found that similar parts of the signal can appear at the same corresponding position in different IMFs, a phenomenon called mode mixing. Because of it, the IMFs lose their physical meanings [55]. To cope with this issue, Wu and Huang proposed ensemble EMD (EEMD), which performs EMD many times on the time series with added white noise [56]. The i-th noise-added series x_i(t) can be formulated as Equation (1):

x_i(t) = x(t) + w_i(t),    (1)

where x(t) represents the raw data series and w_i(t) is the i-th white noise, i = 1, 2, 3, · · · , n. Then, after every x_i(t) is decomposed, we obtain the corresponding IMF_k^i(t). To compute the real k-th IMF, IMF_k, EEMD calculates the average of the IMF_k^i(t), which removes the effect of the white noise. However, in practice, one limitation of EEMD is that the recovered signal includes residual noise. To solve this problem, Torres et al. proposed a new extension of EEMD, termed CEEMDAN, for signal decomposition [57]. For k = 2, 3, · · · , K, the k-th IMF and residue can be computed as Equations (2) and (3), where E_1(.) represents the first IMF decomposed from a series, and ε_i is used to set the signal-to-noise ratio (SNR) at each stage. In 2014, Colominas et al. found that the IMFs of CEEMDAN still contained some residual noise and some "spurious" modes. Thus, they further proposed an improved CEEMDAN (ICEEMDAN) [57,58], whose main steps can be described as follows:
Step 1: Add the first IMF of the given white noises to the original series x(t), as shown in the following, where β_0 is the level of noise:

x_i(t) = x(t) + β_0 E_1(w_i(t)).    (4)
Step 2: Compute the local means M(.) of the x_i(t) and average them to obtain the first residue:

R_1(t) = (1/n) ∑_{i=1}^{n} M(x_i(t));    (5)

Step 3: Then the first IMF is obtained as shown in Equation (6):

IMF_1(t) = x(t) − R_1(t);    (6)

Step 4: For k = 2, 3, · · · , K, the residue and the k-th IMF can be computed by Equations (7) and (8).
In this paper, ICEEMDAN is used to decompose the original data series into several IMFs and one residue, which represent local physical features of the original signal. The difficult task of forecasting the original series thus becomes several relatively simple sub-tasks.

Kernel Ridge Regression (KRR)
Ridge regression (RR) is a typical linear regression that uses a sum-of-squares error function together with a regularization technique to control the bias-variance trade-off; its purpose is to discover the linear structures hidden in the data [74]. Kernel ridge regression (KRR) extends RR by introducing a kernel function that maps the input data from a low-dimensional space to a high-dimensional one. A kernel function k is defined on an input space X ⊆ R^d as k : X × X → R; it implicitly corresponds to a feature map from the d-dimensional input space into a high-dimensional Hilbert space H_k [75][76][77]. The most popular kernel functions include:
•	Linear kernel: k_L(x, y) = x · y;
•	Polynomial kernel: k_P(x, y) = (a x · y + b)^c, where a, b and c are the coefficient, constant and degree of k_P, respectively;
•	Sigmoid kernel: k_S(x, y) = tanh(d x · y + e), where d and e are the coefficient and constant, respectively;
•	Radial basis function (RBF) kernel: k_R(x, y) = exp(−f ||x − y||^2), where f is related to the width of the kernel.
With the kernel functions and n data samples (x_1, y_1), (x_2, y_2), · · · , (x_n, y_n) ∈ X × Y (y_i is the target value of the corresponding x_i, i = 1, 2, · · · , n), we can construct the kernel matrix K as Equation (9). Then the KRR problem can be formulated as Equation (10), where Y is the target vector of all n data samples, w is the unknown vector to be found, and λ ≥ 0 is a regularization term that prevents w from taking too large a range. The solution in terms of w can be given in closed form as Equation (11):

w = (K + λ I_n)^{−1} Y,    (11)

where I_n is the n × n identity matrix. Kernel types and parameters are two important factors for KRR. Some existing research used only a single kernel with specified parameters, or simple combinations of several kernels with a fixed weight for every kernel, limiting the forecasting performance of KRR. To improve the performance, an ideal solution is to adaptively optimize the weight and/or the parameters of each single kernel using a nature-inspired optimization algorithm. In this paper, we use differential evolution (DE) to optimize such weights and/or parameters for the kernels.
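A minimal numpy sketch of KRR with a single RBF kernel may clarify Equations (9)-(11); the data, the width f and the regularization λ below are illustrative values, not the paper's settings.

```python
import numpy as np

def rbf_kernel(X1, X2, f=1.0):
    """k_R(x, y) = exp(-f * ||x - y||^2), evaluated pairwise (Equation (9))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-f * d2)

def krr_fit(X, y, lam=1e-2, f=1.0):
    """Closed-form solution of Equation (11): w = (K + lam * I_n)^(-1) y."""
    K = rbf_kernel(X, X, f)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, w, X_new, f=1.0):
    """Prediction is a kernel-weighted sum over the training samples."""
    return rbf_kernel(X_new, X_train, f) @ w
```

With a small λ the training data are fitted almost exactly; larger λ trades training accuracy for smoothness.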

Differential Evolution (DE)
Differential evolution (DE) is a member of the family of nature-inspired algorithms, and it has been demonstrated that DE is very powerful in solving various science and engineering problems [68]. The main idea of DE is to optimize a problem by using a few operations to iteratively improve a set of candidate solutions with evaluation criteria. Basically, the evolutionary process of DE consists of four stages/operations: initialization, mutation, crossover, and selection.

Initialization
For a D-dimensional optimization problem, given the population size P, the number of generations G, and the lower and upper bounds of the decision variables X_min = [x_{1,min}, x_{2,min}, · · · , x_{D,min}] and X_max = [x_{1,max}, x_{2,max}, · · · , x_{D,max}], respectively, the d-th decision variable of the i-th individual (i = 1, 2, · · · , P) can be initialized as Equation (12):

I_{i,d}^1 = x_{d,min} + rand(0, 1) · (x_{d,max} − x_{d,min}),    (12)

where the superscript "1" of I represents the current (first) evolutionary generation, and rand(0, 1) generates a random real number between 0 and 1.
The initialization produces a population of P individuals (also called solutions or vectors), and the d-th decision variable of each individual lies in the range [x_{d,min}, x_{d,max}].

Mutation
The purpose of mutation is to generate a new vector V_i^g, the so-called mutant vector, from several existing vectors with respect to each vector I_i^g, the so-called target vector, in the population of the g-th generation. Some popular mutation strategies are shown as follows, where r1–r4 are mutually different random indices of individuals, F is a preset parameter for scaling the difference vector, and I_b^g is the best individual of the g-th generation.

Crossover
The purpose of crossover is to generate a trial vector U_i^g = {U_{i,1}^g, U_{i,2}^g, · · · , U_{i,D}^g} from the target vector I_i^g and its corresponding mutant vector V_i^g with the following strategy:

U_{i,j}^g = V_{i,j}^g, if rand(0, 1) ≤ Cr or j = j_rand; otherwise U_{i,j}^g = I_{i,j}^g,

where Cr ∈ [0, 1] is a user-defined crossover rate, and j_rand is a random integer in [1, D] that ensures at least one decision variable of V_i^g is passed directly to U_i^g.

Selection
The selection operation chooses the better of the target vector I_i^g and its corresponding trial vector U_i^g for the next generation, as evaluated by the fitness function f. The selection strategy is shown mathematically in Equation (18):

I_i^{g+1} = U_i^g, if f(U_i^g) ≤ f(I_i^g); otherwise I_i^{g+1} = I_i^g.    (18)
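Putting the four stages together, a minimal DE (the DE/rand/1/bin variant) can be sketched as follows; the population size, F, Cr and the test function are illustrative choices rather than the settings used in the paper.

```python
import numpy as np

def de_minimize(f, lb, ub, P=20, G=100, F=0.5, Cr=0.9, seed=0):
    """Minimize f over the box [lb, ub] with DE/rand/1/bin."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    D = len(lb)
    # Initialization (Equation (12)): uniform within the bounds.
    pop = lb + rng.random((P, D)) * (ub - lb)
    fit = np.array([f(x) for x in pop])
    for _ in range(G):
        for i in range(P):
            # Mutation (DE/rand/1): V = I_r1 + F * (I_r2 - I_r3).
            r1, r2, r3 = rng.choice([j for j in range(P) if j != i],
                                    size=3, replace=False)
            v = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lb, ub)
            # Crossover (binomial): j_rand guarantees that at least one
            # component is inherited from the mutant vector.
            j_rand = rng.integers(D)
            mask = rng.random(D) < Cr
            mask[j_rand] = True
            u = np.where(mask, v, pop[i])
            # Selection: keep the better of target and trial vectors.
            fu = f(u)
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], fit[best]
```

In the proposed approach the decision variables would encode the regularization term and kernel weights/parameters, and the fitness function would be the validation RMSE of the resulting RR predictor.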

Ridge Regression by DE
For any type of RR, the regularization term is an important parameter to be optimized. The kernel is a very powerful trick in machine learning that maps data which are linearly inseparable in the input space into a higher-dimensional space where the mapped data become linearly separable. Regarding kernel ridge regression (KRR), besides the regularization term in Equation (10), the parameters a–f of the single kernels k_P, k_S and k_R need to be optimized. To further explore the ability of multiple kernel RR (MKRR) for crude oil price forecasting, we build a multiple kernel consisting of one k_L, one k_P, one k_S and n k_Rs, as shown in Equation (19), where w_1, w_2, · · · , w_{n+3} are the weights of the single kernels, and a–e and f_1–f_n are the parameters of each single kernel.
In our approach, we use DE to optimize the regularization term λ for RR; λ and a–f for each single kernel; and λ, the weights, and a–e, f_1–f_n for k_M, respectively. The weights of k_M, w_i (i = 1, 2, · · · , n + 3), are generated from n + 3 real values x_j (j = 1, 2, · · · , n + 3) optimized by DE so that ∑_{i=1}^{n+3} w_i = 1, as formulated by Equation (20):
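Since Equation (20) is not reproduced here, the sketch below uses one common choice, normalizing absolute values so the weights sum to one (an assumption, not necessarily the paper's exact mapping), and shows how the weighted multiple kernel k_M of Equation (19) can be assembled; the kernel parameters a–e and the RBF widths fs are illustrative.

```python
import numpy as np

def normalize_weights(raw):
    """Map DE-optimized real values x_j to kernel weights w_i with
    sum(w) = 1. Absolute-value normalization is an assumed stand-in
    for the paper's Equation (20)."""
    a = np.abs(raw)
    return a / a.sum()

def multiple_kernel(X, weights, fs, a=0.5, b=1.0, c=2, d=0.01, e=0.0):
    """k_M = w1*k_L + w2*k_P + w3*k_S + sum_i w_{3+i}*k_R(f_i), per
    Equation (19); a-e and fs are illustrative kernel parameters."""
    G = X @ X.T                                  # pairwise inner products
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * G, 0.0)
    Ks = [G,                                     # linear kernel k_L
          (a * G + b) ** c,                      # polynomial kernel k_P
          np.tanh(d * G + e)]                    # sigmoid kernel k_S
    Ks += [np.exp(-f * d2) for f in fs]          # RBF kernels k_R
    return sum(w * K for w, K in zip(weights, Ks))
```

The combined matrix can then be plugged into the KRR closed-form solution in place of a single-kernel matrix.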

The Proposed ICEEMDAN-DE-RR Approach
The proposed ICEEMDAN-DE-RR approach is a typical instance of "decomposition and ensemble", which consists of three stages, i.e., decomposition, individual forecasting, and ensemble forecasting, as shown in Figure 1. The proposed ICEEMDAN-DE-RR adopts the typical "divide-and-conquer" strategy: it divides the complex, non-linear and non-stationary crude oil price series into several relatively simple components and then handles each component with a relatively simple DE-based RR predictor. With this strategy, the tough task of forecasting the raw crude oil price series becomes several relatively simple sub-tasks of forecasting each component.
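The divide-and-conquer idea can be illustrated with a toy sketch: here the "components" are given directly (a trend and a seasonal series standing in for ICEEMDAN's IMFs and residue), and a plain least-squares predictor on L lags stands in for the DE-tuned RR predictors.

```python
import numpy as np

def lag_matrix(x, L):
    """Rows of L consecutive values; the target is the next value."""
    X = np.stack([x[i:i + L] for i in range(len(x) - L)])
    return X, x[L:]

def forecast_by_components(components, L=6):
    """Fit one linear lag-L predictor per component (a stand-in for the
    paper's DE-tuned RR predictors), forecast each component one step
    ahead, and sum the component forecasts (the ensemble stage)."""
    total = 0.0
    for c in components:
        cc = np.asarray(c, float)
        X, y = lag_matrix(cc, L)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        total += cc[-L:] @ beta   # one-step-ahead forecast of this component
    return total
```

Because each component is far simpler than their sum, the per-component predictors can be nearly exact even when the aggregate series looks hard to forecast.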
It is worth pointing out that although there is a lot of work on energy forecasting following the framework of "decomposition and ensemble", our proposed work differs from existing work in several aspects: (1) RR and KRR are applied to crude oil forecasting for the first time; (2) a novel multiple kernel RR (MKRR) optimized by DE is proposed, which can also be applied in other fields; and (3) ICEEMDAN, DE and RR are integrated to forecast daily crude oil prices for the first time.

Data Description
To validate the effectiveness of the proposed approach, we select the West Texas Intermediate (WTI) daily crude oil spot closing prices from 2 January 1986 to 4 February 2019 as an experimental dataset. There are 8342 samples in total, and the daily crude oil prices from 2 January 1986 to 14 June 2012, with 6673 samples accounting for 80% of the total samples, are chosen as the training set, while the rest are for testing. Within the training set, 5338 (80%) and 1335 (20%) samples are used for training and validation, respectively.
The WTI daily crude oil spot prices and corresponding decomposed components by ICEEMDAN are shown in Figure 2. We can see that the complex raw crude oil prices are decomposed into two groups: high-frequency group (IMF1-IMF5) and low-frequency group (IMF6-IMF11 and residue). The high-frequency components fluctuate within a narrow range of amplitude while the low-frequency ones fluctuate within a wide range of amplitude, making forecasting crude oil prices with the components easier than with the raw crude oil prices.
We conduct several types of multi-step-ahead forecasting with a lag of L in the experiments. An m-step-ahead prediction means forecasting the crude oil price on the (t + m)-th day using the L price samples up to and including the t-th day.
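The construction of the supervised samples for m-step-ahead forecasting with lag L can be sketched as:

```python
import numpy as np

def make_samples(x, L, m):
    """Each input row holds the L prices up to and including day t;
    the target is the price on day t + m."""
    X, y = [], []
    for t in range(L - 1, len(x) - m):
        X.append(x[t - L + 1:t + 1])
        y.append(x[t + m])
    return np.array(X), np.array(y)
```

The same routine covers all horizons: m = 1, 3 or 6 simply shifts the target further ahead while the input window stays the same.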
For a fair comparison, each decomposed component is scaled to [0, 1] using min-max normalization, as formulated by Equation (21):

x̃_t = (x_t − x_min) / (x_max − x_min),    (21)

where x_t and x̃_t are the crude oil price series before and after normalization, respectively, and x_max and x_min are the maximum and minimum values of the time series, respectively.
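The min-max normalization and its inverse (needed to map forecasts back to the price scale) can be written as:

```python
import numpy as np

def minmax_scale(x):
    """Equation (21): scale a decomposed component to [0, 1]."""
    x_min, x_max = float(x.min()), float(x.max())
    return (x - x_min) / (x_max - x_min), x_min, x_max

def minmax_inverse(x_scaled, x_min, x_max):
    """Invert Equation (21) to map forecasts back to the price scale."""
    return x_scaled * (x_max - x_min) + x_min
```

In practice x_min and x_max are computed on the training portion only and reused for the test data, so that no test information leaks into the scaling.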

Evaluation Criteria
We use a set of criteria to evaluate the proposed approach as well as the compared models. Specifically, the selected criteria include the mean absolute percentage error (MAPE), the root-mean-square error (RMSE), and the directional statistic (Dstat), defined as Equations (22)-(24), where y_t and ŷ_t are the actual and predicted values at time t, respectively, and N is the size of the prediction set. The smaller the values of MAPE and RMSE, the better the model; in contrast, a higher Dstat means a better forecasting model. In addition, we use the Diebold-Mariano (DM) test to compare the statistical difference in predictive accuracy between pairs of models. First, we compute the difference between the predictions of a pair of models at time t as Equation (25), where ŷ_{t,a} is the prediction of model a at time t and ŷ_{t,b} is the prediction of model b at time t. Then, the DM statistic can be defined as Equation (26), where cov denotes the covariance of the loss differentials.
If the DM statistic is negative and statistically significant (e.g., p-value ≤ 0.05), there is a significant difference between the predictive accuracies of the pair of models [78].
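The three accuracy criteria, together with a simplified DM statistic on squared-error loss (without the autocorrelation correction of the full test, so only a sketch), can be implemented as follows; the Dstat form below is one common formulation of the directional statistic.

```python
import numpy as np

def mape(y, yhat):
    """Mean absolute percentage error (Equation (22))."""
    return float(np.mean(np.abs((y - yhat) / y)))

def rmse(y, yhat):
    """Root-mean-square error (Equation (23))."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def dstat(y, yhat):
    """Directional statistic: fraction of steps whose predicted
    direction of change matches the actual direction."""
    actual = np.diff(y)
    predicted = yhat[1:] - y[:-1]
    return float(np.mean(actual * predicted > 0))

def dm_test(y, yhat_a, yhat_b):
    """Simplified DM statistic on the squared-error loss differential;
    the full test additionally corrects for autocorrelation in d_t."""
    d = (y - yhat_a) ** 2 - (y - yhat_b) ** 2
    return float(np.mean(d) / np.sqrt(d.var(ddof=1) / len(d)))
```

A clearly negative statistic indicates that model a has the smaller loss, matching the interpretation used in the results tables.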

Experimental Settings
We forecast daily crude oil prices from the raw price series and decomposed components, so-called single models and ensemble models, respectively. As for single models, we compare the RR-based methods (RR, LinRR, PolyRR, SigRR, RbfRR and MKRR in Table 1) with two state-of-the-art AI models: LSSVR and BPNN, as well as two classical statistical methods: ARIMA and RW. Regarding ensemble models, we compare ICEEMDAN with EEMD to show the power of decomposition, and all the forecasting methods with single models except for ARIMA are applied to ensemble models. The parameters in the experiments are shown in detail in Table 1. Note that for the MKRR, we use 23 single kernels, i.e., one linear kernel, one polynomial kernel, one Sigmoid kernel and 20 RBF kernels, to build the multiple kernel, and both the weight and parameters of each single kernel are optimized by DE. The values or the ranges of some parameters are from existing literature [8,31]. In addition, we use RMSE as the fitness function to evaluate the individuals in DE.
All the experiments were performed by Matlab R2016b on a PC with 64-bit Windows 10, a 3.6 GHz i7 CPU and 32 GB RAM.

Single Models
The single models are applied directly to the raw daily crude oil prices. We compare the RR-based methods with LSSVR, BPNN, ARIMA and RW. The experimental results are reported in Table 2, with the best and worst results shown in bold and underlined, respectively, for each criterion and each horizon. From the table, we find that the AI models outperform the statistical models in 6 out of 9 cases. Among the AI models, BPNN obtains two worst results, a MAPE of 0.0161 and an RMSE of 1.3050 with Horizon 1, while LSSVR and PolyRR each obtain one worst value: Dstat values of 0.4592 and 0.4862 with Horizons 3 and 6, respectively. Overall, the RR-based single models outperform the other models in most cases. In particular, RR achieves the best results in 6 out of 9 cases, showing that it is superior to the other single models. Regarding the statistical models, RW is very close to but slightly better than ARIMA. In terms of MAPE and RMSE, the results of each model deteriorate as the horizon increases.
Regarding the directional statistic, RbfRR, MKRR and BPNN achieve the highest values of 0.5186, 0.5090 and 0.4976 with Horizons 1, 3 and 6, respectively. In contrast, ARIMA and PolyRR obtain the worst values of 0.4868 and 0.4862 with Horizons 1 and 6, respectively, and for Horizon 3 both LSSVR and RW obtain the worst Dstat value of 0.4952. The gaps between the best and the corresponding worst values are so narrow that all results lie around 0.5, much like random guessing, showing that forecasting the direction of price movements with single models is a tough task.
To further compare the single models, we report the DM test results in Table 3, with the statistics and the corresponding p-values (in brackets). From this table, we have several findings. First, compared with the statistical models ARIMA and RW, the statistics of all the AI models in all cases except BPNN with Horizon 1 are far below −2.000 and the corresponding p-values are much less than 0.01, indicating that the AI models significantly outperform ARIMA and RW; ARIMA and RW themselves have very similar results. Second, LSSVR and BPNN underperform the RR-based approaches in most cases, showing that the RR-based predictors are, to some extent, more effective than the other AI models (LSSVR and BPNN) for forecasting raw crude oil prices. Third, among the RR-based models, RR, PolyRR and MKRR have very close performance and are superior to LinRR, SigRR and RbfRR. All these findings confirm the analysis of MAPE, RMSE and Dstat. Due to the nonlinearity and nonstationarity of the series, the performance of forecasting directly on the raw crude oil prices needs to be improved. To cope with this issue, we use ICEEMDAN to decompose the raw series into several components, each of which shows relatively simple characteristics compared with the raw series, and then each component is predicted individually by an AI model. Finally, the predicted results of all components are aggregated as the final result.

Ensemble Models
To demonstrate the effectiveness of ICEEMDAN, we also use EEMD as a decomposition method for comparison. For the forecasting methods, we compare the RR-based predictors with two state-of-the-art AI methods, LSSVR and BPNN. The values of MAPE, RMSE and Dstat for the ensemble models are shown in Table 4. As far as MAPE and RMSE with EEMD are concerned, the results of the forecasting models, except for BPNN and RW, are significantly superior to those of the corresponding single models. For example, the best (lowest) MAPE and RMSE with Horizon 1 are improved from 0.0154 to 0.0084 and from 1.2454 to 0.6399, respectively. Regarding Dstat, the best/worst value, excluding those of RW, is 0.8231/0.7344, far greater than the values of the single models. Among the forecasting methods, the RR-based predictors achieve all the best values while BPNN and RW obtain all the worst results. Specifically, RR, SigRR and PolyRR achieve the best values in 4, 4 and 3 out of 9 cases, respectively. Except for the MAPE and RMSE values with Horizon 1, the ensemble models with BPNN are advantageous over the single BPNN. Another finding is that RW obtains the worst values in 8 out of 9 cases. In particular, the Dstat values of RW are always around 0.5; a possible reason is that RW performs poorly in forecasting the high-frequency components. The experimental results indicate that the ensemble models, except for RW, can significantly improve the forecasting effectiveness compared with the single models.
Looking at the results with ICEEMDAN and the AI models in Table 4, we can see a significant improvement in forecasting ability. As for MAPE, the value of each model with ICEEMDAN is superior to that of its competitor with EEMD. Specifically, RR achieves the best (lowest) MAPE for all horizons, while BPNN obtains the worst values for the same horizons. For Horizon 1, PolyRR, SigRR and MKRR also achieve the best MAPE (0.0043), as RR does. The RMSE results show similar characteristics: all the models with ICEEMDAN exceed those with EEMD. The best RMSE values with Horizons 1, 3 and 6 are achieved by SigRR, RR and RR, respectively; in contrast, LinRR, LinRR and BPNN obtain the worst RMSEs with Horizons 1, 3 and 6, respectively. It is worth pointing out that the directional statistic is significantly improved by ICEEMDAN. The best Dstat (0.9113) is achieved by SigRR with Horizon 1, and the other models achieve very close Dstat values, indicating that this metric is very stable with ICEEMDAN. The worst (lowest) Dstat is 0.6661, which is much higher than the best one of the single models (0.5186). Therefore, the models combining ICEEMDAN and AI predictors effectively improve the directional statistics. Regarding the results with ICEEMDAN and RW, they still perform poorly and obtain the nine worst values, although RW with ICEEMDAN performs slightly better than RW with EEMD for Horizons 1 and 2.
For the models with both EEMD and ICEEMDAN, most results of MAPE, RMSE and Dstat become worse as the horizon increases, showing that it is more difficult to forecast crude oil prices over a long horizon than over a short one.
We again apply the DM test to compare the ensemble models, with the results shown in Table 5. From this table, we can see that when the forecasting methods with ICEEMDAN are compared with those with EEMD, the statistics are far below zero and the p-values are very close to zero (usually less than 0.0001) for Horizons 1 and 3, indicating that the former significantly outperform the latter at these two horizons. Regarding Horizon 6, the forecasting methods, except for BPNN with ICEEMDAN, still outperform the corresponding methods with EEMD. For each decomposition, the RR-based methods are usually superior to LSSVR and BPNN. Among the RR-based predictors, RR and SigRR have the best forecasting ability, followed by MKRR and PolyRR, while RbfRR and LinRR have slightly worse predictive power. In addition, the models with AI predictors are all superior to the corresponding models with RW, showing that the forecasting effectiveness does not stem from luck but from the superiority of the proposed approaches. All the DM test results confirm that ICEEMDAN and RR-based predictors are very effective for forecasting daily crude oil prices; the proposed approach that integrates ICEEMDAN and RR is capable of improving crude oil price forecasting.

Discussion
In this subsection, we discuss the impact of the parameter settings of ICEEMDAN, the impact of the lag orders, and the results of the individual components. Since the above results and analysis have shown that RR and SigRR usually outperform the other models, we take RR and SigRR as examples in the following discussion.

The Impact of the Parameter Settings of the ICEEMDAN
When we use ICEEMDAN to decompose the daily crude oil price series, we need to add noise to the series and decompose the series many times. Therefore, the noise standard deviation nsd and the number of realizations nr are two important parameters. To study the impact of nsd on forecasting, we run the proposed approach with nsd varying over {0.01, 0.03, 0.05, 0.08, 0.1, 0.15, 0.2, 0.3, 0.4} while fixing the other parameters; the experimental results are shown in Figure 3. Likewise, we vary nr over {20, 50, 100, 200, 300, 500, 800, 1000, 1500, 2000} with the other parameters fixed to study the impact of the number of realizations, as shown in Figure 4. It can be seen from Figure 3 that the results in terms of RMSE, MAPE and Dstat gradually improve as nsd increases from 0.01 to 0.08. In contrast, all the evaluation indicators deteriorate with increasing noise strength once nsd exceeds 0.1. Both RR and SigRR show similar trends, with the two models alternately outperforming each other. The results show that the white noise strength has a great impact on the forecasting performance, and an ideal strength lies between 0.05 and 0.1.
Looking at Figure 4, we find that when the number of realizations is 20, the results of RMSE, MAPE and Dstat are all rather poor. As the number of realizations increases from 20 to 500, all the results improve steadily. Specifically, Dstat reaches its best values for both RR and SigRR when the number of realizations equals 500, while the RMSE and MAPE results are very close to their best values. After that, the three indicators remain very stable as the number of realizations varies from 500 to 2000. The results indicate that 500 is an ideal number of realizations.

The Impact of the Lag Orders
Lag order refers to the number of recent data points treated as explanatory variables when building time series models. We further investigate the impact of lag orders from 1 to 20 with Horizon 1, and the results are shown in Figure 5. When the lag order equals 1, the evaluation indicators are at their worst; as it varies from 1 to 6, the corresponding results improve steadily. After that, the results remain almost unchanged for lag orders from 6 to 20. Therefore, the best lag order is 6, because it provides satisfactory results with less input, which confirms previous studies [12,31].

The Result of Each Individual Component
Each component decomposed by ICEEMDAN shows either high-frequency or low-frequency characteristics. In general, it is harder to forecast a high-frequency component than a low-frequency one. We plot the predicted values and the raw data of each component, as well as the raw crude oil prices, by RR and SigRR in Figures 6 and 7, respectively. It can be seen from these figures that both RR and SigRR are able to forecast the low-frequency components (IMF6-IMF11 as well as the residue) very well, and the prediction errors mainly come from the high-frequency components (IMF1-IMF5), especially from IMF1. Since the volatility of the high-frequency components is relatively narrow, the forecasting errors from such components may be limited. This is one possible reason why the framework of "decomposition and ensemble" is effective for time series forecasting.

Summary and Conclusions
Forecasting daily crude oil prices is an important but challenging task. To improve the forecasting performance, a series of approaches using ICEEMDAN, DE and RR, termed ICEEMDAN-DE-RR, is proposed in this paper. The proposed approaches first use ICEEMDAN to decompose the complex original crude oil price series into several components, and then each component is forecasted individually by a DE-based RR predictor. In the end, the sum of the predicted results of all components is taken as the final result. Extensive experiments demonstrate that the proposed approaches can outperform some state-of-the-art methods.
From the experimental results, we have the following interesting findings: (1) It is very difficult to accurately forecast daily crude oil prices from the raw price series because of its nonlinearity and nonstationarity; (2) AI models usually outperform statistical methods when forecasting crude oil prices; (3) RR-based predictors with DE-optimized parameters have good forecasting ability; (4) The framework of "decomposition and ensemble" can significantly improve the performance of forecasting daily crude oil prices, and ICEEMDAN is advantageous over EEMD for these forecasting tasks; (5) The proposed ICEEMDAN-DE-RR approach outperforms the competing methods in terms of several evaluation metrics, indicating that it is promising for daily crude oil price forecasting; (6) Among the RR-based predictors, RR and SigRR with DE-optimized parameters achieve very promising forecasting results in terms of several criteria.
In the future, we will apply the proposed approaches to forecasting other types of energy time series, such as natural gas prices, wind speed, wind power and electricity load.

Conflicts of Interest:
The authors declare no conflict of interest.