Short-Term Forecasting for Energy Consumption through Stacking Heterogeneous Ensemble Learning Model

In real-life applications, time-series data exhibit complicated patterns, so it can be challenging to improve prediction accuracy with machine learning or conventional statistical methods used as single learners. This research outlines and investigates the Stacking Multi-Learning Ensemble (SMLE) model for time series prediction over various horizons, with a focus on forecast accuracy, directional hit rate, and the average growth rate of total oil demand. The study presents a flexible ensemble framework that blends heterogeneous models for modeling and forecasting nonlinear time series. The proposed SMLE model combines support vector regression (SVR), backpropagation neural network (BPNN), and linear regression (LR) learners. The ensemble architecture consists of four phases: generation, pruning, integration, and ensemble prediction. We conducted an empirical study to evaluate and compare the performance of SMLE using Global Oil Consumption (GOC) data. The proposed model was assessed at single-step and multistep prediction horizons against benchmark techniques. The final results reveal that the proposed SMLE model outperforms all the other benchmark methods listed in this study in terms of error rate, similarity, and directional accuracy, with values of 0.74%, 0.020%, and 91.24%, respectively. This study therefore demonstrates that the ensemble model is a very promising methodology for complex time series forecasting.


Introduction
In Machine Learning (ML), ensemble methods combine several learners and compute a prediction from the constituent learning algorithms [1]. The standard Ensemble Learning (EL) methods include bootstrap aggregating (or bagging) and boosting. Random Forest (RF) [2], for instance, uses bagging to combine random decision trees and can be used for classification, regression, and other tasks. The effectiveness of RF for regression has been investigated and analyzed in [3]. The boosting method, which builds an ensemble by adding new instances to emphasize misclassified cases, yields competitive performance for time series forecasting [4]. As the most widely used implementation of boosting, AdaBoost [5] has been compared with other ML algorithms such as support vector machines (SVM) [6] and has also been combined with SVM to further enhance forecasting performance [7]. Stacking [8] is another instance of EL with multiple algorithms. It combines the outputs produced by various base learners in the first level and then uses a meta-learner to combine the outcomes of these base learners in an optimal way to improve generalization ability [9].
Although multistep predictions are desired in various applications, they are more difficult than one-step predictions because of the lack of information and the accumulation of errors. In several recent international forecasting competitions, different forecasting methods were proposed to solve real problems. Numerous studies compared the performance of hybrid models on long-term forecasting. For instance, the comparison results in [10] demonstrated that an ensemble of neural networks, such as multilayer perceptrons (MLP), performed well in these competitions. Also, Ardakani et al. [11] proposed optimal artificial neural network (ANN) models based on improved particle swarm optimization for long-term electrical energy consumption. In the same vein, [12] introduced a model named the hybrid-connected complex neural network (HCNN), which is able to capture the dynamics embedded in chaotic time series and to predict long horizons of such series. In [13], researchers combined models with self-organizing maps for long-term forecasting of chaotic time series.
On the other hand, short-term forecasting models such as ANN and SVM provide excellent performance for one-step forecasting tasks [14,15]. However, these models perform poorly or suffer severe degradation when applied to general multistep problems. Long-term forecasting models, in turn, are designed for long-horizon prediction tasks (for instance, monthly or weekly time series prediction). This means they may perform better than other methods in multistep forecasting but worse in one-step-ahead forecasting. In general, combined forecasting models (e.g., mixing short-term and long-term approaches) perform better than single models [16]. A forecast combination can therefore benefit from the performance advantages of short-term and long-term models while avoiding their disadvantages. Furthermore, the major static combination approaches [17][18][19] assign a fixed weight to each model (e.g., average or inverse mean), while dynamic combination methods such as bagging and boosting have been investigated to combine the results of complementary and diverse models generated by actively perturbing, reweighting, and resampling the training data [20,21]. Horizon-dependent weights have therefore been used to avoid the shortcomings of static and dynamic combinations for short- and long-term forecasts [14].
Oil Consumption (OC) is a significant factor for economic development, and the accuracy of demand forecasts is essential for efficient planning. For this reason, energy analysts are concerned with how to pick the most suitable forecasting methods to provide accurate forecasts of OC trends [22]. Numerous techniques have been used to estimate future oil demand. Forecasting of energy production, consumption, and prices has been gaining significance as a research theme across the energy sector. For instance, numerous studies have investigated electricity price forecasting, such as the review article by Rafał [23], which explains and categorizes the primary methods of electricity price forecasting. Furthermore, Silvano et al. [24] analyzed electricity spot prices of the Italian power exchange by comparing traditional methods with computational intelligence techniques (NN and SVM models). Also, Nima and Farshid [25] proposed a hybrid method for short-term price forecasting composed of NN and evolutionary algorithms.
Several studies discussed the issue of time series prediction using different methodologies including statistical methods, single machine learning models, soft computing on ensemble, and hybrid modeling.
Statistical methods have been investigated for time series prediction in the energy consumption area, such as moving average [26], exponential smoothing [27,28], autoregressive moving average (ARMA) [29], and autoregressive integrated moving average (ARIMA) models [30]. For instance, the ARIMA model has been introduced for natural gas price forecasting [31]. However, these statistical techniques do not yield convincing results for complicated data patterns [32,33]. In this context, the forecast accuracy of the Gray Model (GM) was enhanced by using a Markov-chain model. The outcome of that study demonstrated that the hybrid GM-Markov-chain model had higher forecast accuracy than GM (1, 1) [34].
In fact, neural networks offer a promising tool for single machine learning models in time series analysis due to their unique features [35]. To further improve generalization performance, ANN models were investigated for forecasting future OC [36]. Another study experimented with ANN models to predict long-term energy consumption [37]. For the same purpose, an ANN model was applied to forecast future load demands [38].
Energies 2018, 11, 1605
However, ANNs yield mixed results when dealing with linear patterns [39], and it is difficult to obtain high prediction accuracy by using a single method, whether statistical or ML. To avoid the limitations associated with individual models, researchers have suggested hybrid models that combine linear and nonlinear methods to yield high prediction accuracy [32,39]. Several studies investigated hybrid modeling to optimize the parameters of the ANN [40]. The improved performance of the artificial bee colony approach (ABC-LM) over other alternatives has been demonstrated on both benchmark data and OC time series.
Similarly, an NN combined with three algorithms in a hybrid model and then optimized with a genetic algorithm was used to estimate OC; the outcome demonstrated the efficiency of the hybrid model over all benchmark models [41]. Moreover, the researchers in [42] proposed a genetic algorithm-gray neural network (GA-GNNM) hybrid model to avoid the problem of over-fitting and examined the hybrid against a total of 26 combination models. The authors concluded that the hybrid models provided desirable forecasting results compared to the conventional models. Also, the GA offers more flexibility in adapting NN parameters to overcome the performance instability of neural networks [22].
In the same context, hybrid models have been investigated to solve prediction interval and density problems, and they have become more common. As shown by Hansen [43], combining a fuzzy model with neural models increased the computation speed and extended the coverage, thereby resolving the problem of narrow prediction intervals. Similarly, [44] addressed prediction intervals with a blend of neural networks and fuzzy models to determine the optimal order of the fuzzy prediction model and estimate its parameters with greater accuracy. As prediction intervals and forecast densities have become more popular, much research has been done on how to determine the appropriate input lag. For this purpose, a fuzzy time series model was suggested to increase accuracy by solving the problems of data size (sampling) and normality [45]. In the same vein, Efendi and Deris proposed a new adjustment of the interval length and the partition number of the data set; their study discussed the impact of the proposed interval length in significantly reducing the forecasting error, as well as the main differences between the fuzzy and probabilistic models [46].
Finally, the studies above suggest that hybrid methods appear to be an excellent way to combine the predictions of several learning algorithms. Hybrid regression models give better predictive accuracy than any single learner. Nonetheless, there is no single established way to merge the forecasts of the individual models.
In this paper, the goal is to introduce a novel EL framework that can reduce model uncertainty, enhance model robustness, and improve forecasting accuracy on oil datasets, where improved accuracy is defined as a lower measure of forecasting error. The most important motivation for combining different learning algorithms is the assumption that diverse algorithms using different data representations, dissimilar perceptions, and different modeling methods are expected to arrive at outcomes with different patterns of generalization [47]. In addition, comparatively few studies to date have addressed ensembles of different regression algorithms [48].
We demonstrate that the proposed OC forecasting framework can significantly outperform the current practice of using single and classic ensemble forecasting models, in both single-step and multistep performance. Although the idea is straightforward, it is a robust approach: because one does not know a priori which model will perform best, the ensemble can outperform the average model. The merits of the proposed methodology are analyzed empirically by first describing the exact study design and then assessing the performance of various ensembles of different OC models on the GOC data. These outcomes are then compared to the classical approach in the literature, which takes the calibrated model with the lowest forecasting error on the calibration dataset at the 1-ahead horizon and applies it to OC of the same dataset at the horizon t = n (10-ahead).
In summary, the developed ensemble model takes full advantage of each component and achieves success in energy consumption forecasting. The major contributions of this paper are as follows:
1. We develop a new ensemble forecasting model that integrates the merits of single forecasting models to achieve higher forecasting accuracy and stability. We introduce a novel theoretical framework for predicting OC. Although the ensemble concept is more demanding in terms of computational requirements, it can significantly outperform the best-performing individual model (SVR). While the idea is straightforward, it is a robust approach that can outperform linear combination methods, as one does not know a priori which model will perform best.
2. The proposed ensemble forecasting model aims to achieve effective performance in multi-step oil consumption forecasting. Multi-step forecasting can effectively capture the dynamic behavior of future oil consumption, which is more beneficial to energy systems than one-step forecasting. Thus, this study builds a combined forecasting model to achieve accurate results for multi-step oil consumption forecasting, which will provide a better basis for energy planning, production, and marketing.
3. The superiority of the proposed ensemble forecasting model is validated on real energy consumption data. The novel ensemble displays its superiority in oil consumption forecasting compared to the classical ensemble models (AR, Bagging) and the benchmark single models (SVR, BPNN, and LR). Therefore, the newly developed forecasting model can be widely applied to prediction tasks on temporal data.
4. An insightful discussion is provided to further verify the forecasting efficiency of the proposed model. Four discussion aspects are covered: the significance of the proposed forecasting model, the comparison with single models, the comparison with classical ensemble methods, and the superiority of the developed model's stability. These bridge the knowledge gap for the relevant studies and provide more valuable analysis and information for oil consumption forecasting.
The paper is organized into five sections: Section 2 describes the design of the proposed methods. Section 3 presents the experimental results. Section 4 offers the consumption prediction analysis and discussion. Section 5 presents the conclusion and suggestions for future work.

Proposed Framework
Section 1 reviewed the literature in three different areas (i.e., single, hybrid, and soft computing on ensemble). While the hybrid modeling literature has advanced significantly over the last 20 years, research on minimizing forecast error, model uncertainty, and hybrid methods is still relatively limited. To the best of our knowledge, no attempt has yet been made to combine these different areas by using EL methods to address the issues of OC forecasting tasks (see Table 1 for a summary).
In particular, we will outline a very general theoretical framework to calibrate and combine heterogeneous ML models using ensemble methods.Its modularity is displayed in Figure 1 and allows for flexible implementation regarding base models, forecasting techniques, and ensemble architecture.For a practical application of this method, we have split the Stacking Multi-Learning Ensemble (SMLE) framework into four main phases and will describe them including their sub-steps in further detail as follows.

Ensemble Generation
In the original data set, the initial training data, denoted D, had m observations and n features, so that its size is m × n. The modeling procedure can be varied by setting different parameters of the base learners. At this level, several heterogeneous models were trained on D using one training method (i.e., cross-validation). Each model produced prediction results p_i (i = 1, 2, ..., n), which were then cast into second-level data; this outcome became the training input for the second level.
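The generation step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two base learners (`fit_mean`, `fit_line`) are simple hypothetical stand-ins for SVR and BPNN, the fold assignment is an assumption, and all function names are introduced here for illustration only. The sketch shows how held-out predictions from each learner form the second-level data.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Stand-in learner (hypothetical): always predicts the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Stand-in learner (hypothetical): one-variable ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def out_of_fold_predictions(fitters, xs, ys, k=5):
    """Return one row per observation with each learner's held-out prediction."""
    n = len(xs)
    level1 = [[None] * len(fitters) for _ in range(n)]
    for fold in range(k):
        # Assumption: interleaved folds; a time-ordered split is another option.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        for j, fit in enumerate(fitters):
            model = fit(train_x, train_y)
            for i in test_idx:
                level1[i][j] = model(xs[i])
    return level1


# Toy data: an exact line, so the least-squares stand-in predicts it perfectly.
xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
level1 = out_of_fold_predictions([fit_mean, fit_line], xs, ys, k=5)
```

Each row of `level1` holds one column per base learner, and every entry comes from a model that never saw that observation during training.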

Ensemble Pruning
The ranking-based subset selection method ranks the candidate models according to criteria such as the mean absolute percentage error (MAPE), directional accuracy (DA), and Euclidean Distance (ED), and includes only the top n models from all candidate models.
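A ranking-based pruning step of this kind might look like the sketch below. The function name `prune_by_rank`, the model names, and the metric values are hypothetical and are not taken from Table 3; the only assumption about the criteria is that lower is better for MAPE and ED and higher is better for DA.

```python
def prune_by_rank(scores, top_n, criterion="mape", higher_is_better=False):
    """Rank candidate models by one criterion and keep the top_n names."""
    ranked = sorted(
        scores,
        key=lambda name: scores[name][criterion],
        reverse=higher_is_better,
    )
    return ranked[:top_n]


# Hypothetical evaluation results for three candidate models.
scores = {
    "model_a": {"mape": 1.9, "da": 80.0},
    "model_b": {"mape": 0.8, "da": 85.0},
    "model_c": {"mape": 1.2, "da": 90.0},
}

kept_by_mape = prune_by_rank(scores, top_n=2)
kept_by_da = prune_by_rank(scores, top_n=2, criterion="da", higher_is_better=True)
```

Ranking on a single criterion keeps the step transparent; different criteria can select different subsets, as the two calls above illustrate.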

Ensemble Integration
This step describes how the selected models are combined into an ensemble forecast. Here, the stacking method is used to build the second-level data. Stacking uses an idea similar to K-fold cross-validation to solve two significant issues: first, to create out-of-sample predictions; second, to capture distinct regions where each model performs best. The stacking process works by inferring the biases of the generalizers with respect to the provided base learning set. Then, stacked regression using cross-validation is used to construct the 'good' combination. Consider linear stacking for the prediction task. The basic idea of stacking is to 'stack' the predictions $f_1, \ldots, f_m$ by a linear combination with weights $a_i$ ($i = 1, \ldots, m$):

$$f_{\text{stacking}}(x) = \sum_{i=1}^{m} a_i f_i(x) \qquad (1)$$

where the weight vector $a$ is learned by a meta-learner.
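The text states only that the weight vector is learned by a meta-learner. One common choice, assumed here because the meta-learner in this study is LR, is to fit the weights by least squares on the cross-validated predictions. The notation $f_i^{(-j)}$ (the prediction of model $i$ for observation $j$ when $j$ was held out of training) and $N$ (the number of observations) is introduced for illustration:

```latex
\hat{a} = \arg\min_{a_1, \ldots, a_m}
  \sum_{j=1}^{N} \Bigl( y_j - \sum_{i=1}^{m} a_i \, f_i^{(-j)}(x_j) \Bigr)^2
```

Fitting on held-out predictions rather than in-sample fits keeps the meta-learner from favoring base models that merely memorize the training data.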

Ensemble Prediction
The second-level learner model(s) can be trained on the second-level data D′ to produce the outcomes used for the final predictions. In addition to selecting multiple sub-learners, stacking allows the specification of an alternative model that learns how to best combine the predictions from the sub-models. Because a meta-model is used to best combine the predictions of the sub-models, this method is sometimes termed blending, as in blending the final predictions.
In brief, Figure 1 shows the general structure of the SMLE framework, which consists of several learning steps. Applying this scheme generated three SMLE models. The SMLE models do not differ in structure but in the type of base model at level #0, as follows:

• 1st SMLE: the base layer uses the SVR learner, and the meta layer uses LR as the meta-learner.
• 2nd SMLE: the base layer uses the BPNN learner, and the meta layer uses LR as the meta-learner.
• 3rd SMLE: the base layer uses the SVR and BPNN learners, and the meta layer uses LR as the meta-learner.

Data
The GOC data were used as benchmark data; the dataset was downloaded from the website https://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-worldenergy.html. The data represent total world OC; they are yearly and span 1965 to 2016. The data consist of two variables: oil consumption (in million tonnes), the dependent variable observed over time, and the date (in years), the independent variable in this case study. The OC time series for this experiment therefore has 52 data points. For illustration, the whole actual time series is plotted in Figure 2 as the curve marked with blue circles.

Models
As mentioned above, we applied the SMLE ensemble model to predict the GOC data set after combining the heterogeneous models. Table 2 lists the learners' parameters investigated in this paper. The related methods are presented briefly as follows:

1. The BPNN algorithm consists of multiple layers of nodes with nonlinear activation functions and can be considered a generalization of the single-layer perceptron. It has been demonstrated to be an effective alternative to traditional statistical techniques in pattern recognition and can be used for approximating any smooth and measurable function [49]. This method has some superior abilities, such as its nonlinear mapping capability, self-learning and adaptive capabilities, and generalization ability. Besides these features, the ability to learn from experience through training makes MLP an essential type of neural network, and it is widely applied to time series analysis [50].
2. The SVM algorithm is widely considered a useful tool for classification and regression problems due to its ability to approximate a function. Furthermore, the kernel function is utilized in the SVR to avoid calculations in high-dimensional space. As a result, it can perform well when the input features have high dimensionality. It separates the positive and negative examples as much as possible by constructing a hyperplane as the decision surface. Support vector regression (SVR) is the regression extension of SVM, which provides an alternative and promising method for time series modeling and forecasting [51,52].
3. LR is a popular statistical method for regression and prediction. It utilizes the ordinary least-squares method or generalized least-squares to minimize the sum of squares of errors (SSE) for obtaining the optimal regression function [53].

This subsection describes several aspects of the evaluation of the different models; the evaluation aspects include the estimation of error rates and pairwise comparisons of classifiers/ensembles.

Performance Evaluation
For performance error estimation, the mean absolute percentage error (MAPE) was adopted as the accuracy indicator for all forecasting methods. It is expressed as a percentage and is defined by Formula (2):

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \qquad (2)$$

where $y_i$ is the actual value, $\hat{y}_i$ is the forecast value, and $n$ is the number of observations.
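As a small worked illustration, the MAPE described above can be computed as in the sketch below. The function name `mape` and the sample values are hypothetical; the code assumes that no actual value is zero.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes equal-length, non-empty inputs and no zero actual values.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((y - f) / y) for y, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical values: errors of 10% and 5% average to about 7.5%.
example_mape = mape([100.0, 200.0], [110.0, 190.0])
```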

Time Series Similarity
The distance between time series can be measured by calculating the difference between each point of the series. The Euclidean Distance (ED) between two time series $Q = \{q_1, q_2, \ldots, q_n\}$ and $S = \{s_1, s_2, \ldots, s_n\}$ is defined as:

$$ED(Q, S) = \sqrt{\sum_{i=1}^{n} (q_i - s_i)^2} \qquad (3)$$

This measure is easy to calculate and has a complexity of O(n) [54].
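The point-by-point Euclidean Distance can be sketched in a few lines. The function name `euclidean_distance` and the sample series are hypothetical, and the sketch applies no normalization or scaling to the series before measuring the distance.

```python
import math


def euclidean_distance(q, s):
    """Euclidean distance between two equal-length series, in one O(n) pass."""
    if len(q) != len(s):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


# Hypothetical series differing by 3 and 4 at the two points.
example_ed = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```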

Continuous Growth Rates (CGR)
Calculating the continuous growth rate (CGR) is useful for average annual growth rates that change steadily. It is popular because it relates the final value of a series to the initial value of the same series rather than just providing the two values separately; it gives the final value in context [30]. The CGR is calculated according to Formula (4):

$$k = \frac{1}{t} \ln\left(\frac{y_t}{y_0}\right) \qquad (4)$$

where $y_0$ represents the initial value, $y_t$ represents the value at the future time $t$ (in years), and $k$ is the CGR, i.e., the annual growth rate.
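A minimal sketch of a growth-rate calculation is given below. It assumes the standard continuous-growth model, in which a series grows from an initial value y_0 to y_t = y_0 · e^(k·t) after t years, so that k = ln(y_t / y_0) / t. The function name `continuous_growth_rate` and the sample figures are hypothetical.

```python
import math


def continuous_growth_rate(initial, final, years):
    """Annual continuous growth rate k, assuming final = initial * exp(k * years)."""
    if initial <= 0 or final <= 0 or years <= 0:
        raise ValueError("values and time span must be positive")
    return math.log(final / initial) / years


# Hypothetical series that grows at exactly 2% per year for 10 years.
initial_value = 100.0
final_value = initial_value * math.exp(0.02 * 10)
example_k = continuous_growth_rate(initial_value, final_value, 10)
```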

The Algorithm for Stacking Multi-Learning Ensemble (SMLE)
In this study, SMLE offers a dynamic EL method. The SMLE method depends on the sequential character of the OC data. For accurate OC prediction, we express the algorithm of SMLE for predicting the OC at the next m-th moment from time t. The general design of the proposed model considers both diversity management and accuracy enhancement for the base models. The SMLE algorithm is given as pseudocode in Algorithm 1.

Algorithm 1: Stacking Multi-Learning Ensemble (SMLE).
% Train the second-level learner h′ by applying the second-level learning algorithm L to the new data set D′
h′ = L(D′)
Output: H(x) = h′(h_1(x_1), . . ., h_T(x_T))
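Only the final step of the algorithm is visible here, so the sketch below fills in the earlier steps with the usual stacking procedure as an assumption: fit each first-level learner on D, collect their predictions as the new data set, fit the second-level learner on it, and compose the two levels. The base learners (`fit_line`, `fit_last_value`) are hypothetical stand-ins for SVR and BPNN, and the meta-learner is a least-squares linear combination without an intercept; all names are introduced for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Stand-in first-level learner (hypothetical): simple least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_last_value(xs, ys):
    """Stand-in first-level learner (hypothetical): repeats the last observation."""
    last = ys[-1]
    return lambda x: last


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_linear_meta(z, ys):
    """Second-level learner: least-squares weights over first-level predictions."""
    t = len(z[0])
    gram = [[sum(row[j] * row[k] for row in z) for k in range(t)] for j in range(t)]
    rhs = [sum(row[j] * y for row, y in zip(z, ys)) for j in range(t)]
    weights = solve(gram, rhs)
    return lambda preds: sum(w * p for w, p in zip(weights, preds))


def smle_stack(base_fitters, meta_fitter, xs, ys):
    """Fit first-level learners, build the new data set, fit the second level."""
    base = [fit(xs, ys) for fit in base_fitters]      # first-level learners h_1..h_T
    new_data = [[h(x) for h in base] for x in xs]     # second-level inputs
    meta = meta_fitter(new_data, ys)                  # second-level learner
    return lambda x: meta([h(x) for h in base])       # composed predictor H(x)


xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
ensemble = smle_stack([fit_line, fit_last_value], fit_linear_meta, xs, ys)
forecast = ensemble(12)
```

On this toy series the line learner is exact, so the meta-learner places essentially all weight on it. In practice the second-level inputs would come from held-out predictions rather than in-sample fits.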

Results
In this section, we evaluate various models on the 52-year GOC data set, using BPNN, SVR, and LR as the base models, to demonstrate the predictive ability of both single and EL forecasting. In the first experiment, the single models served as benchmarks for comparison with the ensemble predictors. In the second experiment, we tested two classic ensemble models, bagging and additive regression (AR). In the third experiment, we tested three ensemble models based on the SMLE scheme. To establish the validity of the evaluated method, the results of the single models were compared with the outcomes of the ensemble models. The predictions were compared and analyzed using the evaluation criteria T-Time, DA, MAPE, and ED, which are well suited to assessing GOC predictions. We also compare the evaluation criteria for multistep (10-ahead) and single-step (1-ahead) forecasting to find the better SMLE model for predicting GOC at both short-term and long-term horizons. Finally, the consumption growth rate was evaluated for all prediction outcomes.
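The DA criterion is not defined in this part of the text. The sketch below assumes one common definition of directional accuracy: the percentage of time steps at which the forecast moves in the same direction as the actual series, both measured against the previous actual value. The function name `directional_accuracy` and the sample values are hypothetical.

```python
def directional_accuracy(actual, forecast):
    """Percentage of steps where forecast and actual move in the same direction.

    Assumed definition: both moves are measured from the previous actual value.
    """
    if len(actual) != len(forecast) or len(actual) < 2:
        raise ValueError("need two equal-length series with at least two points")
    hits = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        forecast_move = forecast[t] - actual[t - 1]
        if actual_move * forecast_move > 0:
            hits += 1
    return 100.0 * hits / (len(actual) - 1)


# Hypothetical series: the forecast misses the direction at one of four steps.
example_da = directional_accuracy(
    [10.0, 12.0, 11.0, 13.0, 15.0],
    [10.0, 11.5, 12.4, 12.5, 14.0],
)
```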

Single Models Results
Following the experiment design and the overall steps described in Section 2.2, the first test compared the performance of all base models separately. The output of 10-fold cross-validation runs on the initial training data was used to determine whether each model was adequate for the OC data and to make the forecasting results more stable. Figure 2a compares the best results obtained from all base models with the real OC data. It is evident that the results obtained with the SVR method for the 52 known years (1965 to 2016) were close to the actual values and comparable to those produced by the BPNN and LR models.
Similarly, Figure 2b shows the residual errors of the predictions. To compare the performance of the base models quantitatively and reliably, we considered the MAPE indices for accuracy, which are listed in Table 3.
In brief, as seen in Table 3, the MAPE between predicted and actual values for the SVR model is 1.24%, with a directional accuracy (DA) of 89.9%, which clearly indicates that the SVR model works well and has acceptable accuracy. In the same vein, we can observe that SVR was superior in both run time and similarity (0.01 and 0.034, respectively). However, it is worth mentioning that the LR model performed poorly compared to the other single models. The similarity between actual and predicted data was measured using the Euclidean Distance (ED), as shown in Table 3; BPNN scored 0.034, a small value that indicates the best predictive performance, while LR scored 0.074, the worst similarity across the models.

Classic Ensemble Models Results
In the second experiment, we empirically tested two classical ensemble models, bagging and additive regression (AR). To illustrate the fitting behavior of the classical ensembles, they are compared with the actual data in Figure 3a,b, which also allows a visual comparison of the residual error of each model. The evaluation metrics of the single learning methods, the classical ensemble methods, and the proposed SMLE models are summarized in Table 3. As observed from Table 3 and Figure 3, the bagging model performed better than the AR model in all evaluation measures except DA. Similarly, the bagging model performed better than the best single model (SVR) in accuracy and similarity, while SVR performed best in DA and had the shortest training time. For this dataset, we accordingly developed homogeneous and heterogeneous ensembles of individual models rather than using their hybrid versions.

The SMLE Results
In the third experiment, we empirically tested three heterogeneous stacking models, each composed of a combination of base models and a meta-model. To illustrate the fitting behavior of the SMLE models, their predictions are compared with the actual data in Figure 4a,b, which also allows a visual comparison of the residual error of each model. The evaluation metrics of the single learning methods and the proposed framework are summarized in Table 3.
The first ensemble model consists of SVR as the base learner and LR as the meta-learner. As presented in Table 3, this model improved the forecasting accuracy by 34% compared to the best base learner, SVR. The second ensemble model combined BPNN as the base learner and LR as the meta-learner; it improved the forecasting accuracy by decreasing the error by 46% compared to the best single model. Finally, the third ensemble model combined SVR and BPNN as base learners with LR as the meta-learner. Its forecasting results show that it decreased the error of the best base model by 50%, which demonstrates the superiority of the third model over both the single and the other combined models. The similarity between actual and predicted data is also shown in Table 3: the third SMLE model (SVR-BPNN) scored 0.020, while the first SMLE model (SVR) scored 0.028, the worst similarity among the SMLE models. Also, it can be observed that all ensemble models had a smaller distance than the single models.
In the same respect, the third SMLE model performed better than the best classic ensemble model (bagging) in all measures except training time (T-Time); this is due to the ensemble learning level, which consumes more time and computation.
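The two-level structure of the SMLE models described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a least-squares line and a 1-nearest-neighbour regressor stand in for the SVR and BPNN base learners, the linear meta-learner omits the intercept for brevity, and the interleaved k-fold split, the function names, and the data are all hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares line y = a + b*x (a linear stand-in base learner)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest(xs, ys):
    """1-nearest-neighbour regressor (a nonlinear stand-in base learner)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def fit_meta(z1, z2, ys):
    """Linear meta-learner y ~ w1*z1 + w2*z2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in z1)
    s22 = sum(b * b for b in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))
    b1 = sum(a * y for a, y in zip(z1, ys))
    b2 = sum(b * y for b, y in zip(z2, ys))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:  # collinear base outputs: fall back to averaging
        return 0.5, 0.5
    return (s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det

def stack_fit(xs, ys, k=4):
    """Two-level stacking with k-fold out-of-fold first-level predictions."""
    fitters = (fit_line, fit_nearest)
    m = len(xs)
    z = [[0.0, 0.0] for _ in range(m)]  # second-level training inputs
    for fold in range(k):
        train = [i for i in range(m) if i % k != fold]
        for t, fitter in enumerate(fitters):
            h = fitter([xs[i] for i in train], [ys[i] for i in train])
            for i in range(fold, m, k):  # held-out examples of this fold
                z[i][t] = h(xs[i])
    w1, w2 = fit_meta([r[0] for r in z], [r[1] for r in z], ys)
    h1, h2 = (fitter(xs, ys) for fitter in fitters)  # refit on all data
    return lambda x: w1 * h1(x) + w2 * h2(x)

# Illustrative series only, not GOC data.
xs = [float(i) for i in range(1, 13)]
ys = [x + 0.5 * math.sin(x) for x in xs]
model = stack_fit(xs, ys)
print(round(model(6.5), 3))
```

Training the meta-learner on out-of-fold predictions rather than in-sample fits keeps it from simply rewarding the base learner that overfits most; the extra fitting rounds are also one plausible source of the longer training time noted above.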

Discussion
In this subsection, we applied all single and ensemble forecasters to the problem of estimating future OC. To further evaluate the stability of the SMLE scheme, all models were examined in 1-ahead and 10-ahead horizon predictions.
From Figure 5 and Table 4, it is easy to see that the proposed SMLE method was the best for OC forecasting at all prediction horizons (i.e., 1-, 3-, 5-, 7-, and 10-step-ahead) relative to the other models considered in this study. Among all the models, the SMLE-based BPNN-SVR model not only achieved the highest accuracy in level estimation, as measured by the MAPE criterion, but also obtained the highest hit rate in direction prediction, as measured by the DA criterion. On the other hand, among the models used in this study, the single LR model performed the poorest in all step-ahead forecasts. The LR model not only had the lowest level accuracy, as measured by MAPE, but also obtained the worst score in direction accuracy, as measured by DA. The main reason might be that LR is a typical linear model and cannot capture the nonlinear patterns and seasonal characteristics existing in the data series. Apart from the SMLE-based BPNN-SVR and LR models, which performed the best and the poorest, respectively, the models listed in this study produced some interesting mixed results; these outcomes were analyzed using four evaluation criteria (i.e., MAPE, DA, t-test, and CGR).
First, as shown in Table 4, the MAPE values of the SMLE-based BPNN-SVR model were 0.61 for the 1-ahead prediction and 0.74 as an average over the 10-step-ahead predictions, which were lower than those of the other methods. Also, in the short-term prediction steps, the ensemble methods performed better than the single models; the results indicate that the proposed ensemble methods outperformed the single and classic ensemble methods in all cases. The principal reason could be that the cross-validation decomposition methodology efficiently enhanced the forecasting performance. Notably, both the 1-step-ahead and the multi-step-ahead forecasts of the single models were inferior to those of the ensemble models. Focusing on the single methods and classic ensembles, all the ML models outperformed the LR model; the reason may be that LR is a typical linear model, which is not suitable for capturing the nonlinear and seasonal characteristics of the OC series. Among the ML models (i.e., SVR and BPNN), SVR performed slightly better than BPNN in all 10-step-ahead predictions, with BPNN the poorer of the two at every prediction step. The main reason for this may be the parameter selection. The MAPE values of LR ranged from 2.91 to 2.40, which were slightly inferior to those of the SVR and BPNN models. A possible reason is that the prediction results of LR, which relies on the linear hypothesis, were more volatile than those of the ML models.
Second, high level accuracy does not necessarily imply a high hit rate in forecasting the direction of OC. Correct direction forecasting is essential for policy managers to make investment plans in oil-related operations (production, price, and demand). Therefore, the DA comparison is necessary. In Figure 6a-c, some similar conclusions can be drawn regarding the DA criterion. (i) The proposed third SMLE model performed significantly better than all other models in all cases, followed by the other two SMLE models and then the two single ML models (i.e., SVR and BPNN); LR and AR had equal values, and the bagging model had the worst values. Specifically, the DA values of all SMLE-based ensembles were similar (92.31%) for the 1-step-ahead predictions, and the third SMLE model showed superiority with 91.24% for the average of the 10-step-ahead forecasts. (ii) The three SMLE methods mostly outperformed the single prediction models. Furthermore, among the ensemble methods, the SMLE-based BPNN-SVR model performed the best, and the SMLE-based BPNN model outperformed the SMLE-based SVR model, except for the 2-ahead forecast. (iii) Among the single models, SVR outperformed the other methods; BPNN had similar performance to SVR in the 2-, 3-, and 5-step-ahead forecasts, while SVR exceeded BPNN in both the 1-ahead and the average-ahead predictions. A possible reason for this phenomenon may be the choice of optimal parameters for the models. We also found that the bagging model had the lowest directional accuracy, 66.17%.
Also, comparing different prediction horizons, the short-term prediction horizon showed better performance for all the models (see Table 4). Taking the 1-step-ahead forecast and the average of the 10-step-ahead predictions as an example, for all the SMLE-based ensemble, BPNN, SVR, bagging, and AR models, the 1-step-ahead forecast outperformed the average of the 10-step-ahead forecasts in both level accuracy and directional accuracy. In addition, the SMLE-based ensembles, ML models, and classic ensembles performed better in 1-step-ahead prediction in terms of directional accuracy. However, in terms of level accuracy, these approaches had only a slight superiority in the 6-step-ahead prediction. The exception was LR, which performed somewhat worse in the 1-step-ahead prediction than in the average of the 10-step-ahead predictions, as shown in Figure 7a-c.
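The gap between 1-step-ahead and multi-step-ahead performance can be illustrated with a recursive forecasting loop, in which each prediction is fed back as an input for the next step. Whether the study used a recursive or a direct multi-step strategy is not stated in this section, so the recursive scheme is an assumption; `drift_model` is a hypothetical one-step forecaster used only to make the sketch runnable.

```python
def recursive_forecast(one_step, history, horizon):
    """Produce `horizon` forecasts by feeding each prediction back as input."""
    window = list(history)
    forecasts = []
    for _ in range(horizon):
        next_value = one_step(window)
        forecasts.append(next_value)
        window.append(next_value)  # later steps rely on earlier predictions
    return forecasts

def drift_model(window):
    """Hypothetical one-step model: last value plus the mean historical change."""
    changes = [b - a for a, b in zip(window, window[1:])]
    return window[-1] + sum(changes) / len(changes)

# Illustrative values only.
print(recursive_forecast(drift_model, [1.0, 2.0, 3.0, 4.0], 3))
```

Because every step after the first is computed from predicted rather than observed values, any one-step error propagates forward, which is consistent with the better short-horizon results reported above.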
Third, to further validate the SMLE forecasts, the t-test was used to test the statistical significance of the prediction performance. The t-test results presented in Table 5 for all ensemble models in this study were not significant (df = 51, p-value > 0.05). Based on this statistical test, no significant differences were observed between the actual OC and the OC predicted by the SMLE models. The mean differences in the last column of Table 5 indicate that, in the population from which the samples were drawn, the actual and predicted OC were statistically nearly equal. Therefore, the results support that the SMLE model is useful in predicting OC based on heterogeneous models with excellent levels of accuracy (see Table 5). We can thus conclude that the developed model structure is sufficient, given further parameter settings (i.e., kernels, neurons), for OC prediction.
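The test statistic behind such a comparison can be sketched as follows. A paired t-test on the actual and predicted series is assumed here; it would be consistent with the reported df = 51, since 52 annual observations (1965 to 2016) give 52 - 1 = 51 degrees of freedom, but the exact variant used is not stated in this section. The function name and sample values are illustrative.

```python
import math

def paired_t_statistic(actual, predicted):
    """Return (t, degrees of freedom, mean difference) for paired samples."""
    diffs = [a - p for a, p in zip(actual, predicted)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    if variance == 0.0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / math.sqrt(variance / n)
    return t, n - 1, mean_diff

# Illustrative values only, not GOC data.
t, df, mean_diff = paired_t_statistic([10.0, 12.0, 14.0, 16.0], [9.0, 13.0, 13.0, 17.0])
print(t, df, mean_diff)
```

With df = 51, a two-sided test at the 5% level fails to reject equality of means when |t| is below roughly 2.01, which is the sense in which a non-significant result supports the forecasts.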
The forecasted values for each model and the total OC growth rate from 2017 to 2026 are summarized in Table 6. As seen from the table, the OC forecasts of all models keep increasing over the period from 2017 to 2026. However, the average annual rates will decrease in all. For the period between 1965 and 2016, the rate of increase was 2.2% for BPNN, 1.3% for LR, 1.8% for SVR, 1.5% for bagging, 1.6% for AR, 2.0% for SMLE-based SVR, 2.0% for SMLE-based BPNN, and 2.1% for SMLE-based BPNN-SVR. For the forecasted period between 2017 and 2026, the rates were expected to be 0.74%, 1.39%, 1.38%, 1.42%, 1.39%, 0.13%, 0.38%, and 0.44%, respectively. Overall, the average annual rate of total oil demand decreased from 1.8% between 1965 and 2016 to 0.91% between 2017 and 2026.
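For readers who want to reproduce growth figures of this kind, the sketch below shows two usual ways of computing them. Reading CGR as a compound growth rate is an assumption, as the abbreviation is not expanded in this section, and the input values are illustrative rather than taken from Table 6.

```python
def compound_growth_rate(first, last, periods):
    """Average per-period growth, in percent, that turns `first` into `last`."""
    return ((last / first) ** (1.0 / periods) - 1.0) * 100.0

def annual_growth_rates(values):
    """Year-over-year growth in percent for consecutive values."""
    return [100.0 * (b - a) / a for a, b in zip(values, values[1:])]

# Illustrative values only.
print(compound_growth_rate(100.0, 121.0, 2))
print(annual_growth_rates([100.0, 110.0, 121.0]))
```

The compound rate summarizes a whole period in one number, while the year-over-year series shows where within the period the growth changes.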
Lastly, the results summarized in Table 6 demonstrate that the annual growth rates of the 1-ahead OC were larger than the total average over the 10-ahead years. Figure 8 shows an apparent rise at the 1-ahead step in both the single and classic ensemble models, whereas for the SMLE models there was a sudden drop from the 1- to the 2-ahead year. Note also the stability of the growth from 2-ahead to 10-ahead, with close values in all models, except for SMLE-based BPNN, where there was a slight decrease at the 9- and 10-ahead steps. The decrease in the growth rate of oil demand may be interpreted as the effect of alternative energies on oil demand, which is expected to materialize in the coming decades, as compared with the consumption of all other energy types. The rates of change and reserves in the OC of all the models indicate that the SMLE scheme was the best for determining the actual global energy demand, which facilitates the planning process associated with OC prediction. Based on these findings, we suggest some recommendations.
We summarized all of the above results in Table 7 and Figure 9. In general, combining the forecasters using SMLE significantly improves the final prediction. From the analysis of the experiments presented in this study, we can draw several important conclusions as follows. Firstly, the SMLE-based BPNN-SVR model was significantly superior to all other models in this study regarding similarity, level accuracy, and direction accuracy. Through performance enhancement, the SMLE-based BPNN-SVR outperformed other models at the 1.17 statistical significance level, compared to the best benchmark models SVR and bagging, respectively. Secondly, the prediction performance of the SMLE-based BPNN-SVR, SMLE-based SVR, and SMLE-based BPNN models was better than that of the single and classic ensemble methods. These results indicate that the hybrid based on the stacking method can efficiently improve the prediction performance in the case of OC. Thirdly, nonlinear models with seasonal adjustment were more suitable than linear methods as base learners for an ensemble predicting time series with annual volatility, due to the above properties of OC (i.e., nonlinear and non-stationary). However, computationally, the new method consumed more time because of its way of segmenting inputs and its use of the ensemble. Fourthly, the average annual rate of total oil demand decreased from 1.8% between 1965 and 2016 to 0.91% between 2017 and 2026.
Finally, short-term forecasting models, such as BPNN and SVM, provided excellent performance for the one-step forecasting task. However, these models performed poorly or suffered severe degradation when applied to general multistep problems. In general, the performance of ensemble forecasting models (e.g., combining short-term and long-term approaches) was better than that of single models. Therefore, a forecasting combination can benefit from the performance advantages of short-term and long-term models while avoiding their disadvantages. Furthermore, to overcome the shortcoming of a static combination approach, a dynamic combination of short- and long-term forecasts can be employed by using horizon-dependent weights.
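One way to picture a dynamic combination with horizon-dependent weights is sketched below. The linear weighting schedule is purely hypothetical, chosen only to show the mechanism; the study does not specify how such weights would be set, and in practice they would more plausibly be estimated from validation errors at each horizon.

```python
def horizon_weights(horizon):
    """Hypothetical schedule: the short-term model's weight falls linearly
    from 1 at the first step to 0 at the last step."""
    if horizon == 1:
        return [1.0]
    return [1.0 - h / (horizon - 1) for h in range(horizon)]

def combine(short_term, long_term):
    """Blend two forecast paths of equal length, step by step."""
    weights = horizon_weights(len(short_term))
    return [w * s + (1.0 - w) * l
            for w, s, l in zip(weights, short_term, long_term)]

# Illustrative forecast paths only.
print(combine([10.0, 10.0, 10.0], [20.0, 20.0, 20.0]))
```

Under this schedule the combined forecast follows the short-term model at the first step and hands over gradually to the long-term model as the horizon grows.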

Conclusions
Forecasting time series data is considered one of the most critical applications and has attracted the interest of researchers. In this study, we discussed the problem of combining heterogeneous forecasters and showed that ensemble learning methods can be readily adapted for this purpose. We have introduced a novel theoretical ensemble framework integrating BPNN, SVR, and LR, based on the principle of stacking, which was proposed for GOC forecasting. This framework has been able to reduce uncertainty, improve forecasting performance, and manage the diversity of learning models in empirical analysis.
According to the experimental results and analyses, the proposed ensemble models outperformed the classical ensemble and single models on the OC data. Furthermore, all ensemble models exceeded the best-performing individual models at the single-step-ahead as well as the multi-step-ahead horizons.
The contributions of the proposed model fall along three aspects. Firstly, regarding methodology, we have introduced a novel theoretical framework based on ensemble learning for OC forecasting. Although the ensemble concept is more demanding regarding computational requirements, it can significantly outperform single models and classical hybrid models. While the idea is straightforward, it is a robust approach, as it can outperform linear combination methods when one does not know a priori which model will perform best.
Secondly, theoretically, we have demonstrated that ensemble methods can be successfully used in the context of OC forecasting due to the ambiguity decomposition.
Thirdly, we have conducted a very extensive empirical analysis of advanced machine learning models as well as ensemble methods. The calibration alone of such a wide range of ensemble models is very rare in the literature, considering that the ranking of some evaluation measures per model to run, which was not only due to the limited.
This study has two limitations: the integration of heterogeneous algorithms (SVR, BPNN, and LR) was considered without using ensemble pruning for internal hyper-parameters, and the evaluation was carried out on a single dataset, so the model remains to be verified on different datasets. Both limitations could be interesting directions for future research.
In future work, a homogeneous ensemble model based on SVR with different kernels can be developed and evaluated. In addition, investigating ensemble pruning by using evolutionary algorithms, which provide an automatic optimization approach to SVR hyper-parameters, could be an interesting direction in the hybrid-based energy forecasting field. Another direction of future work is to apply ensemble models to other energy prediction problems, such as pricing, production, and load forecasting.



Figure 2. Comparison of (a) the actual and predicted consumption with the use of the SVR, BPNN, and LR single learners and (b) the errors of all single models.

Figure 3. Illustration of (a) the actual and predicted consumption using classic ensemble learners and (b) the errors of the classic ensemble models.

Figure 4. Illustration of (a) the actual and predicted consumption using SMLE learners and (b) the errors of the SMLE models.

Figure 8. Illustration of the annual CGR for 10-ahead consumption prediction using (a) single models, (b) classic ensemble models, and (c) SMLE models.

Table 1. Summary of related studies on forecasting OC between 2009 and 2017.
1 Fuzzy Time Series; 2 Regression Time Series; 3 Artificial Bee Colony Algorithm; 4 Artificial Bee Colony Neural Network; 5 Cuckoo Search Neural Network; 6 Genetic Algorithm Neural Network; 7 Grey Markov; 8 Adaptive Neuro-Fuzzy Inference Systems; 9 Genetic Algorithm-Gray Neural Network; 10 Organization of the Petroleum Exporting Countries; 11 Global Oil Consumption; * Proposed Method.

Table 2. Summary of parameter settings for all learners.
Input: Data set D = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)};
       First-level learning algorithms L_1, L_2, ..., L_T;
       Second-level learning algorithm L;
Process:
  for t = 1, ..., T:
    h_t = L_t(D)   % Train a first-level individual learner h_t by applying the first-level learning algorithm L_t to the original dataset D
  D′ = ∅;
  for i = 1, ..., m:
    for t = 1, ..., T:
      z_it = h_t(x_i)   % Use h_t to predict training example x_i
    end;
    D′ ...

Table 3. Summary of different evaluation measures among all models on GOC data.
Bold numbers indicate the best value for each measure.

Table 4. 10-ahead forecasting performance among all models on GOC data.

Table 5. The t-test results of actual and predicted oil consumption using SMLE models.

Table 6. Summary of forecasted values and CGR for OC using all models from 2017 to 2026.

Table 7. Summary of evaluation measures among all models on GOC data.
1 Score: sum of the rank values (1-8) of each model, depending on its performance in the related measure. 2 Order: value for each model depending on its total score; for example, rank no. 1 means the first model.
