Electricity Price Forecasting Based on Self-Adaptive Decomposition and Heterogeneous Ensemble Learning

: Electricity price forecasting plays a vital role in the financial markets. This paper proposes a self-adaptive, decomposed, heterogeneous, and ensemble learning model for short-term electricity price forecasting one, two, and three-months-ahead in the Brazilian market. Exogenous variables, such as supply, lagged prices and demand are considered as inputs signals of the forecasting model. Firstly, the coyote optimization algorithm is adopted to tune the hyperparameters of complementary ensemble empirical mode decomposition in the pre-processing phase. Next, three machine learning models, including extreme learning machine, gradient boosting machine, and support vector regression models, as well as Gaussian process, are designed with the intent of handling the components obtained through the signal decomposition approach with focus on time series forecasting. The individual forecasting models are directly integrated in order to obtain the final forecasting prices one to three-months-ahead. In this case, a grid of forecasting models is obtained. The best forecasting model is the one that has better generalization out-of-sample. The empirical results show the efficiency of the proposed model. Additionally, it can achieve forecasting errors lower than 4.2% in terms of symmetric mean absolute percentage error. The ranking of importance of the variables, from the smallest to the largest is, lagged prices, demand, and supply. This paper provided useful insights for multi-step-ahead forecasting in the electrical market, once the proposed model can enhance forecasting accuracy and stability.


Introduction
The electricity power market is an important research topic, which has been receiving much attention from research over the last years [1][2][3]. The high interest in this field is due to the forecasting Zhou et al. [23] coupled LSTM and ensemble empirical mode decomposition (EEMD) to forecasting electricity markets of Pennsylvania, New Jersey, and Maryland. Khalid et al. [24] proposed an optimized deep neural network framework to conduct electricity price forecasting based on the Jaya optimizer and LSTM approach. The main drawback of this proposed approach is related to the lack of the use of exogenous variables and use of decomposition approaches to cap the data variability. The use of artificial intelligence models, as well as non-linear models are attractive tools to make the forecasting system more robust, as observed in Ribeiro and Coelho [12] and Ribeiro et al. [13].
Many papers related to electricity forecasting are based on energy consumption. As exposed by Alipour et al. [25] renewable energy resources have an uncertainty of electric power generation, which can lead to problems in the electrical power systems. As presented by Kazemzadeh et al. [26] load forecast, it is one of the main base studies for the planning and operation of the expansion of the electric power system. In [27], an evaluation of the forecast is made for a project up to 2030, many models are covered in this work in order to show that there is a better performance depending on the algorithm.
Heydari et al. [28] proposed a composed model based on VMD, feature selection (selection of features related to hours of the day) in the Pennsylvania-New Jersey-Maryland and Spanish electricity markets. This paper lacks the discussion about the feasibility of the use of other features related to the electricity market. The authors argued that the obtained results outperform the results of some benchmarks, such as generalized regression and radial basis function neural networks.
In addition to electricity price and load forecasting, the use of artificial intelligence is powerful to assess the development of possible failures in the electrical system [29]. As presented by Stefenon et al. [30], the use of the wavelet transform reduces signal noise, improving the analysis of chaotic time signals. The results using the wavelet group method of data handling proved to be superior to well-consolidated algorithms as LSTM and adaptive neuro fuzzy inference system. Additionally, pre-processing techniques of time series are robust approaches that are widely used to de-noise the raw signal, and then enhance the forecasting accuracy, as approached in da Silva et al. [31].
Focused on the analysis of the electrical system about electricity generation in Brazil, many works evaluated the generation capacity concerning a hydroelectric source, as presented in Brito et al. [32] and Fredo et al. [33], or as a hydrothermal problem as discussed by Finardi et al. [34] and van Ackooij et al. [35]. Moreover, according to Silveira Gontijo and Azevedo Costa [36] in Brazil, there is a predominance of hydroelectric generation (73%), which makes the analysis of hydroelectric energy price forecasting an important field of study.
According to the above-related papers, there is no consensus about which decomposition method to employ in price forecasting analysis. Based on this, a different kind of decomposition, named complementary ensemble empirical mode decomposition (CEEMD), can be considered. In this context, the CEEMD, an extension of the EEMD method, has been applied in several fields of knowledge, such as crude oil price forecasting [37], short-term photovoltaic power generation forecasting [38], and detecting epileptic seizures in electroencephalogram [39].
Yeh et al. [40] proposed the CEEMD, in which the paired noises are perfectly anti-correlated and have an exact cancellation of the residue noise in the reconstruction of the signal. The CEEMD split the original signal in a set of components named intrinsic mode functions (IMFs) and one residue component, which are designed to represent a trend, seasonality, high and lower frequency. When considering the aforementioned, due to the use of the CEEMD approach, two questions emerge. The first one lies in which algorithm should be employed to train and forecast each component. The second refers to the selection of CEEMD's hyperparameters, namely the number of components, ensemble, and amplitude of standard deviation.
Alongside this, especially in Brazil, the evaluation of the energy price forecast is very important, when considering that the price of electricity has great influence due to the level of the large water reservoirs of hydroelectric plants. As the system is hydrothermal, when the reservoir level is low, thermoelectric plants are activated, depending on energy planning. The cost of generating thermoelectric plants is higher than that of hydroelectric plants, as there is a variable cost due to the inputs that are needed to produce energy, so there can be a wide range of higher prices in dry periods. For this reason, evaluating the forecast of the future price, based on seasonal market and climate factors, generate greater reliability for the energy market.

Objective and Contribution
Therefore, this paper proposes a self-adaptive decomposed heterogeneous ensemble learning model. The methods CEEMD, coyote optimization algorithm (COA) [41] and machine learning are combined to develop a heterogeneous ensemble learning model, to forecasting commercial and industrial electricity prices in Brazil for multi-step-ahead (one, two, and three months-ahead) horizons. Exogenous variables, including energy generation (supply), energy prices lagged, and consumption (demand) are considered to be inputs by the forecasting model. Firstly, the COA optimizer is applied to define the CEEMD's hyperparameters and, subsequently, CEEMD decomposes the series of electricity energy prices (commercial and industrial). Thereafter, the components obtained in the previous step (IMFs and one residue component) are trained using extreme learning machines (ELM) [42], SVR [43], Gaussian process (GP) [44], and gradient boosting machines (GBM) [45]. These individual models are chosen due to the effects already observed for regression and time series forecasting tasks, as described in [46][47][48].
The hyperparameters of each model are obtained by grid-search during leave-one-out cross-validation time slice (LOOCV-TS). Finally, the prediction results of different components are directly integrated to generate the final electricity price. Afterwards, by the grid of models, the most adequate model is the one with the best generalization out-of-sample capacity in terms of coefficient of determination (R 2 ), symmetrical mean absolute percentage error (sMAPE), root mean squared error (RMSE), and overall weight average (OWA). The proposed framework is compared with COA-CEEMD homogeneous based methods, i.e., approaches which consider the same model to handle all components, as well as with homogeneous ensemble learning models that adopted maximal overlap discrete wavelet transform (MODWT) [49] for data pre-processing. Moreover, the autoregressive integrated moving average (ARIMA) [50], naïve [51], and theta models [52,53] are used as additional benchmarks.
The contributions of this paper are described, as follows: • Firstly, this paper contributes to the field of time series pre-processing by coupling the CEEMD with metaheuristic approach named COA to tune its hyperparameters; • Second, based on the literature review gap, exogenous variables related to supply and demand are used as inputs for each evaluated model, and their importance is assessed. The inputs associated to supply are the generation of hydraulic, nuclear, and thermal energy. Thus, the variables related to demand are the monthly consumption for each area (commercial and industrial). Through the use of these variables is intended to giving additional information for the models to learn the data behavior, so that they achieve high forecasting accuracy; • Third, with the combination of the different non-linear models (ELM, SVR, GP, and GBM) to train and predict each component of the decomposed stage, the developed model can learn the data patterns and reflect the high-frequency of electricity price data; and, • Also, this paper contributes for the literature of models used to forecasting electricity prices by investigating the performance of decomposed homogeneous and heterogeneous ensemble learning models.
The organization of the remainder of this paper is as follows: Section 2.1 presents the datasets that were adopted in this paper. Section 2.2 brings a brief description of the adopted methods in the forecasting framework. Section 3 details the procedures of the research methodology. Afterwards, Section 4 describes the results and discussions. Finally, Section 5 concludes the paper with final considerations, limitations of the study, and proposals of research directions.

Material & Methods
This Section presents the description of the data (Section 2.1) and methods applied in this paper (Section 2.2).

Material
The datasets analyzed in this paper refer to Brazil's commercial and industrial electricity prices (Brazilian currency-Real-R$) by megawatt-hour (MWh). Additionally, exogenous variables, such as energy generation (supply) and consumption (demand) (MWh), are considered. The datasets consist of 289 monthly observations from April 1996 to December 2019. These data were obtained from the website of the Institute of Applied Economics Research (IPEA) (Instituto de Pesquisa Econômica Aplicada, in Portuguese) available in http://www.ipeadata.gov.br/Default.aspx.
The datasets were split into training and testing sets in the proportion of 70% and 30%, respectively. The first 70% of them were used to train the adopted models and the remaining 30% of them were used to test the effectiveness of evaluated forecasting models. This range is commonly used in the literature, such as observed in Ribeiro and Coelho [12], and used in this paper. This proportion adopted to split the datasets into training and test sets allows for us to give the models more information to the adopted models learn the prices' dynamic, as well as to have a sufficient amount of the data to evaluate the effectiveness of the proposed model. Figure 1 illustrates the electricity energy prices series, and the time series for the exogenous variables are shown in Figures 2 and 3. According to the Shapiro-Wilk normality test, the output variables for commercial and residential cases do not present a normal distribution (W = 0.9130 − 0.9215, p-value < 0.05). In Table 1, a summary of the statistical indicators of commercial and industrial electricity prices is shown. Moreover, previous prices are used as inputs signals by the forecasting system, and more details are described in the methodology section. However, to avoid repetition, the statistical indicators are not addressed in Table 1 for this variable.

Methods
This subsection describes the methods employed in this paper.

Coyote Optimization Algorithm
The COA optimizer is a swarm intelligence algorithm that considers the social relations of the Canis latrans species and its adaptation to the environment proposed by [41] devoted to solving optimization problems. Therefore, the COA mechanism has been designed based on the social conditions of the coyotes, which means the decision variables x of a global optimization problem [54].
The COA's performance was evaluated under 40 benchmark functions in the optimization field with different features as multimodality, nonlinearity, separability, and the number of optimized variables. In most of the benchmarks evaluated, set of 40 benchmark functions, the COA optimizer outperformed some classical metaheuristics, such as particle swarm optimization, artificial bee colony, symbiotic organisms search, grey wolf optimizer, bat-inspired algorithm, and firefly algorithm. Through these results, it is possible to state that COA could be applied in new applications and problems, such as that proposed in this paper.
This algorithm has recently been applied in several applications, especially to feature selection [54], tune heavy-duty gas turbine hyperparameters [55], optimal power flow for transmission power networks [56] define networks reconfiguration [57], and for optimal parameter estimation of a proton exchange membrane fuel cell [58]. Due to the promising potentials results, a search of the literature reveals that the COA has not yet been applied for the CEEMD's hyperparameters definition, then it is adopted. In the COA approach, there are only two control parameters, the number of packs (N p ) and the number of coyotes per pack (N c ). The population size is defined by multiplying (N p ) and (N c ), both natural numbers, and then the population is divided into packs with coyotes each.

Complementary Ensemble Empirical Mode Decomposition
The empirical mode decomposition (EMD) [59] and its improvements, such as EEMD and CEEMD [40], were proposed to deal with the non-linearity and non-stationarity of time series. The EMD separates the original signal into IMFs and one residue component. The main drawback of this decomposition is named mode mixed problem (MMP). The MMP is characterized by the fact that disparate scales could appear in one IMF. Next, to overcome this disadvantage, EEMD was proposed, and in the sequence CEEMD.
Despite the fact that EEMD has effectively resolved the MMP, the residue noise in the signal reconstruction has been raised, and the noise is independent and identically distributed [60]. To improve EEMD, [40] proposed the CEEMD, in which the paired noises are perfectly anti-correlated and have an exact cancellation of the residue noise in the reconstruction of the signal. Because of the effectiveness of CEEMD to de-noise time series, it has been applied for crude oil price forecasting [37], wind speed forecasting [61], and this paper employs this decomposition approach to pre-process the Brazilian electricity energy prices.
The CEEMD has three main hyperparameters, named as the number of trials or number of ensembles, the number of components, and noise amplitude. Especially, the noise amplitude is designed to be some percentage of the data standard deviation. In most of the cases, these hyperparameters are defined by trial and error procedure [37,60,62]. However, this paper proposes the use of the COA approach to minimize the inverse of the orthogonal index (OI) [16]. The OI is used to measure the orthogonality of the EMD numerically, and a value close to zero is desirable. A smaller OI indicates the best decomposition result [59].
The OI can be computed, as follows: in which T is the number of time-series observations, IMF i and IMF j are the i-th and j-th components, k is the number of components, and x(t) is the original signal at time t = 0, . . . , T.

Extreme Learning Machine
The ELM is a learning algorithm proposed by [42] designed for single-hidden layer feedforward neural networks. In this approach, hidden nodes are randomly chosen and outputs are obtained analytically. Good generalization and fast learning speed are the main advantages of ELM [48]. The input weights and hidden biases are specified arbitrarily and then are fixed. The output weights are obtained by solving the multiplication of the Moore-Penrose Generalized inverse matrix of the output variable matrix [63].

Gradient Boosting Machine
The GBM is an ensemble learning approach that employs a sequential learning process to build an efficient classification or regression model [45]. A regression tree is initially fitted to the data and, on this basis, predictions and the initial residue are computed. A new model is fitted to the previous residue, a new prediction, to which the initial forecast is added, and then a new residue is obtained. This process is iteratively repeated until a convergence criterion is obtained. In each iteration, a new model is fitted to the data, aiming to compensate for the weaknesses of the previous model [12].

Gaussian Process
A GP is a stochastic process, in which every set of the random variable is multivariate normally distributed. In this respect, a GP is entirely specified by its statistical orders mean and covariance or kernel function. Through kernel function, it is possible to maps the similarity between observations of the training set with the purpose of predicting new observations [44].

Support Vector Regression
The SVR consists in determining support vectors close to a hyperplane that maximizes the margin between two-point classes obtained from the difference between the target value and a threshold. To deal with non-linear problems, SVR takes into account kernel functions, which calculates the similarity between two observations. In this paper, the linear kernel is adopted. The main advantages of the use of the SVR lie in its capacity to capture the predictor non-linearity and then use it to improve the forecasting cases. In the same direction, it is advantageous to employ this perspective in this adopted case study, once the samples are small [64].

Performance Indicators
To check the forecasting models' performance, the sMAPE (2), R 2 (4), and RMSE (3) criteria are used. Additionally, a criterion which combines the sMAPE and RMSE, named OWA (5) is introduced to evaluating the accuracy of the proposed model against benchmark compared models. According to Makridakis et al. [65], the use of OWA would help to achieve a higher level of objectivity. These measures are described in the following.
where n is the number of observations, y i andŷ i are the i-th observed and predicted values, respectively. additionally, the Criteria c and Criteria p represent the performance measure of compared and proposed model, respectively.
By considering the criteria sMAPE, and RMSE, lower values are desired, while, for R 2 , a value closest to one indicates better performance. Additionally, the Diebold-Mariano (DM) test [66] is applied in this paper to compare the forecasting errors of proposed versus compared forecasting models.

The Proposed Self-Adaptive Decomposed Heterogeneous Ensemble Learning Model
This section presents the steps adopted to develop the self-adaptive decomposed heterogeneous ensemble learning model.
Step 1: The COA is coupled with CEEMD to tune the CEEMD's hyperparameters. For COA optimizer, the number of coyotes and packs are defined as 5 and 10, respectively. These values are selected by the trial and error, once that there is no guideline for the definition of COA's hyperparameters [41]. Moreover, if the increase in the number of packs, and/or coyotes is considered, the optimization time will also increase due to the greater number of evaluations to be carried out. However, for this problem, it was observed that the accuracy does not improve significantly. In this way, the initial values adopted for these parameters are fixed, for both problems. Table 2 shows the CEEMD's hyperparameters defined by COA, where 50 generations and population of size equals to 50 is adopted. Finally, the original electricity price is decomposed. Step 2: Each component obtained in step 1 (three IMFs and one residue) is trained using ELM, GBM, GP, and SVR. In the training stage, LOOCV-TS IS adopted. The inputs are defined by auto-correlation and partial auto-correlation analysis. The data are centered by its mean value and divided by its standard deviation. The training structure is stated, as follows: and forecast electricity energy prices one-month-ahead (7), two-months-ahead (8), and three-months-ahead (9) according to: in which f is a function that is related to the adopted model in the training stage,ŷ (t+h,k) is the forecast value for k-th component obtained in the decomposition stage (k = 1,. . . ,4) on time t and forecast horizon h (h = 1, 2, 3), y (t+h−n y ,k) are the previously prices lagged in n y = 1, . . . , 3 and is the random error which follows a normal distribution N with zero mean and variance σ 2 . Also, X (t+h−n x ) is the exogenous inputs vector at the maximum lag of inputs (n x = 1 if h = 1, and n x = 3 if h = 3). The behavior observed in the residue component of both time series is due to the growing trend in prices. Additionally, for the GP approach, there are no hyperparameters for tuning and the linear kernel function is used. Moreover, for GBM the shrinkage and minimum number of terminal node size are held constant equal 0.1 and 10, respectively. Finally, for SVR, the Radial Basis kernel function is adopted. The kernels of GP and SVR were defined by grid-search by RMSE minimization during LOOCV-TS.
Step 3: The forecasts of different models used for each component are directly integrated (simple sum) to generate final electricity price values. Afterwards, by the grid of models, the most adequate model is the one with the best generalization out-of-sample capacity in terms of RMSE and sMAPE. Table 3 describes the models that are used for each component.   Step 4: Obtaining forecasts out-of-sample (test set), and the performance indicators defined in Section 2.3 are computed and two kinds of comparisons are conducted. The first is the comparison of self-adaptive decomposed heterogeneous and homogeneous ensemble learning models. Second, a comparison of the proposed model and models without decomposition are developed. Table 4 presents the models' hyperparameters obtained by grid-search.  Figure 6 summarize the main steps used in data analysis. The R software [67] is adopted to perform the modeling. The packages caret [68] and forecTheta [69]. The ARIMA modeling is performed through the use of forecast package [70,71] with use of auto.arima function. To define the ARIMA order, grid-search is adopted, and the most suitable order is that reach lower Akaike and Bayesian Akaike criteria information.

Results
This section describes the results of the developed experiments in three ways in forecasts out-of-sample (test set). First, Section 4.1 is designed to compare the results of the proposed model and self-adaptive decomposed homogeneous ensemble learning models. In the sequence, Section 4.2 is used to compare the performance of developed approaches and non-decomposed models. To finish, Section 4.3 presents the DM test to statistically evaluate the errors of the proposed approach versus other models. Additionally, Figure 7 presents the variables importance, Figures 8 and 9 illustrate the relation between the observed and predicted values. Additionally, Figure 10 shows the magnitude of the sum of standardized squared errors. In Tables 5 and 6, the best results are presented in bold.      Table 5 illustrates the performance of developed and self-adaptive decomposed homogeneous ensemble learning models named COA-CEEMD-GP, COA-CEEMD-ELM, COA-CEEMD-SVR, and COA-CEEMD-GBM, as well compared MODWT-based homogeneous models.
By investigating the improvement on the errors of the proposed model regarding compared COA-CEEMD homogeneous ensemble learning models, it is possible to infer that the OWA criterion is ranged between 2.51-45.28%, 2.23-89.05%, and 0.87-86.48%, for commercial electricity price on one, two and three-months-ahead forecasting, respectively. In the comparison of COA-CEEMD heterogeneous ensemble learning model with MODWT ensemble learning models, the OWA ranges between 20.44-45.35%, 44.69-89.28%, and 42.02-88.51%, for commercial electricity price on one, two, and three-months-ahead forecasting, respectively.
The reason for the high forecasting error of COA-CEEMD-GBM and COA-CEEMD-ELM is attributed to forecasting errors and its high variability for each component. In general, the proposed model employed ELM or GBM for the first two components in the context of adopted datasets, as mentioned in Table 3. For these components, the GBM and ELM models achieved lower forecasting errors.
When considering the COA-CEEMD-GBM and COA-CEEMD-ELM models for the commercial dataset, the GBM and ELM models have achieved a high forecasting error values for the components 3 and 4. A similar analysis can be conducted in the context of industrial dataset. Therefore, this results lead the models to achieve a high forecasting error in the general model.
In the industrial context, the same pattern is observed, once the regarding the MODWT based models is ranged between 62.95-92.26%, 63.65-90.15%, and 55.06-87.29% in horizons of one up to three-months-ahead, respectively. Therefore, it should be noted that the use of different models to compose an ensemble of components improves the final accuracy of the forecasting model. This is verified for the case of commercial as well as industrial electricity prices. The observed superiority of the proposed approach regarding these compared models should be attributed to the fact that diversity of heterogeneous ensembles is higher than homogeneous ensembles, which plays an important role in ensemble learning model [72]. Table 6 shows the performance of the developed and non-decomposed models, named GP, ELM, SVR, and GBM.

Comparison of Proposed and Non-Decomposed Models
Concerning the enhancement of hybrid model regarding non-decomposed models, the OWA in one-month-ahead forecasting ranged between 9.02-45.02% and 27.36-92.62% in commercial and industrial electricity prices, respectively. Similar behavior is observed for the other two time windows. In fact, the OWA ranges between 42.02-88.51% and 22.91-90.28% for two-months-ahead horizon, as well as 8.25-86.38% and 12.71-87.59%, for three-months-ahead horizon. The performance of the proposed ensemble learning model is excellent, once that the forecasting errors are lower than 5%.
In respect of the non-decomposed approaches, the GBM and ELM models have difficulty in training and predicting the electricity prices due to high level of uncertain of the prices, which justify the lower performance. In this way, results lead to high forecasting errors and lower stability as point out in Tables 5 and 6, as well as in Figure 10.
In respect of the results thatare presented in Table 6, the outcomes of this paper reinforce findings presented by Hao et al. [73], which point out the benefits of using decomposition techniques as a way of pre-processing time series. In particular, the use of the COA-CEEMD approach is important for the development of an effective model for forecasting electricity prices. Second, to Yeh et al. [40], the use of signal decomposition as the pre-processing step is useful in the analysis of time series field because through the use of this technique it is possible to deal with non-stationarity and non-linearity behaviors of the data. Additionally, the results that are described in this section corroborate the findings of Agrawal et al. [74], once the ensemble learning models achieved better accuracy than its members.
In comparison to the proposed model with the ARIMA model, the naïve, and theta models, there are the following results. When considering the average (standard deviation) of RMSE for the proposed model over the three forecasting horizons, the average accuracy equals 17.05 (2.96) and 15.38 (3.33) in the commercial and industrial datasets, respectively. When considering the ARIMA model, the average accuracy equals to 22.24 (5.02) and 24.06 (4.28), in the commercial and industrial datasets, respectively. In respect to the naïve method, the average accuracy equals 23.84 (5.14) and 24.30 (4.54), in the commercial and industrial datasets, respectively. Finally, for the theta model, the average accuracy equals to 23.50 (5.06) and 24.25 (4.35), in the commercial and industrial datasets, respectively. In the context of described accuracy, the proposed model outperforms the results of these three benchmarks approaches. The naïve model forecasting the next h steps-ahead equal to the previous h time steps, which do not address satisfactory information to the decision making process. The ARIMA, naïve, and theta models are less complex than the proposed model, and they also achieved competitive results. However, these approaches are not adequate for forecasting the time series adopted in this paper (single input and multiple-output). Moreover, they do not allow for incorporating the information of exogenous variables such as supply and demand, as well as to access the importance of these features.

Statistical Tests to Compare Proposed and Benchmark Models
In order to demonstrate the statistical comparisons between errors of the proposed and compared models described in Sections 4.1 and 4.2, in Table 7 can be seen the statistic of DM test, as well as when the comparisons are statistically significant. Through the DM test, it can be stated that, in 95.83% of the cases, the proposed approach reached statistically lower errors than the other models. The GP model reaches similar errors regarding the proposed forecasting framework, but greater than proposed modeling. Figure 7 illustrates the importance of each variable to the forecasting proposed models. It is possible to observe that the past electricity prices are the most important. In the next, the demand variables (commercial and industrial consumption) and supply hydraulic energies have similar importance. Finally, thermal and nuclear supply are less important, but they should be considered by the forecasting model. Figures 8 and 9 expose that the self-adaptive decomposed ensemble learning model learn the data behavior, being able to obtain forecast prices that are similar to the observed values. For commercial and industrial datasets, the good performance (regarding RMSE and sMAPE) in the training set is maintained in the test set.
Finally, Figure 10a,b present the standard error of each model in the out-of-sample forecasting for the adopted forecasting horizons, i.e., red (one-month-ahead), blue (two-months-ahead), and green The best models are represented by smaller bars. The models with better accuracy presented in Tables 5 and 6 reached better stability (errors with lower standard deviation), especially the proposed COA-CEEMD heterogeneous ensemble learning model. Therefore, the results that are exposed in previous sections are confirmed.
Based on Figure 11 it is possible to evaluate the residue (partial auto-correlation function-PACF) of final models adopted in each case study, from training set results. In this respect, once most of the lags are inside the boundaries, we can see that the models are well fitted.

Conclusions
In this paper, a self-adaptive decomposed heterogeneous ensemble learning model was proposed in order to forecast multi-step-ahead (one, two, and three-months-ahead) Brazilian commercial and industrial electric energy prices. Exogenous variables, such as demand (commercial and industrial consumption) and supply (hydraulic, thermal, and nuclear) (MWh), were adopted. In the first stage, the COA optimizer was adopted to define the hyperparameters of pre-processing CEEMD. In the sequence, the four obtained components (three IMFs and one residue component) by COA-CEEMD were trained and predict the time series by different forecasting models (ELM, GBM, GP, and SVR). The grid-search approach was conducted in order to choose the most suitable model for each component. The final forecasts were obtained through a heterogeneous ensemble learning of components directly integrated. Finally, the average importance of each variable used as model inputs was computed for the proposed forecasting model.
Our findings suggest that: (i) the COA-CEEMD ensemble learning models achieve better forecasting accuracy than single forecasting models; (ii) the pre-processing of the energy prices through COA-CEEMD is better than MODWT; (iii) the use of different models for components allow for improving the final accuracy concerning the use of a homogeneous ensemble model of components; (iv) the proposed approach reaches better accuracy than the compared models, and the good performance is constant when the forecast horizon is expanded; and, (v) the variables importance ranking, from the smallest to the larger is, energy prices lagged, demand, and supply. Moreover, competitive results are achieved by ARIMA, naïve, and theta models regarding the proposed framework. However, these models do not allow us to evaluate the importance of exogenous inputs, which plays a key role in the decision-making process.
The evaluation of the electricity price forecast can be used to assess feasibility for future expansions of the electric system, through the incentive of investment in the electric sector using incentive policies. Defining how much energy may cost in the future can facilitate the justification for investing in new electricity generation ventures. In Brazil, there is a potential for the use of wind farms, due to the geographic characteristics of the country. The evaluation of the investment potential in this segment can be analyzed based on the forecast of future water inflow, when considering that the cost of energy is calculated based on the level of the water reservoirs.
For future works, it is desirable (i) to decompose the raw data using an ensemble of pre-processing and to evaluate the effect of different ranges of splitting training and test sets in the forecasting systems; (ii) reconstruct the decomposed signal through the weighted integration considering the no negative constraint theory; (iii) selection of the models for components through optimization techniques and considering the metaheuristics in order to obtain the best hyperparameters, such as the new approaches like [75][76][77][78][79]; (iv) developing an adaptive version of COA to define the number of coyotes and packs; and, (v) to perform cross-country comparisons using different forecasting models linking between models accuracy and economic aspects.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: