The Hybridization of Ensemble Empirical Mode Decomposition with Forecasting Models: Application of Short-Term Wind Speed and Power Modeling

In this research, two hybrid intelligent models are proposed to enhance prediction accuracy in wind speed and power modeling. The models are based on the hybridisation of Ensemble Empirical Mode Decomposition (EEMD) with a Pattern Sequence-based Forecasting (PSF) model, and on the integration of EEMD-PSF with the Autoregressive Integrated Moving Average (ARIMA) model. In both models (i.e., EEMD-PSF and EEMD-PSF-ARIMA), the EEMD method decomposes the time-series into a set of sub-series, and each sub-series is forecasted by its respective prediction model. In the EEMD-PSF model, all sub-series are predicted using the PSF model, whereas in the EEMD-PSF-ARIMA model, the sub-series with high and low frequencies are predicted using PSF and ARIMA, respectively. The choice between the PSF and ARIMA models depends on the time-series characteristics of the decomposed series obtained with the EEMD method. The proposed models are examined for predicting wind speed and wind power time-series at Maharashtra state, India. For short-term wind power prediction, both proposed methods show at least 18.03% and 14.78% improvement in forecast accuracy in terms of root mean square error (RMSE), compared with the contemporary methods considered in this study, for the direct and iterated strategies, respectively. Similarly, for wind speed data, the improvements are 20.00% and 23.80%, respectively. These results demonstrate the potential of the proposed models for wind speed and wind power forecasting. The proposed methodology is implemented in the R package ‘decomposedPSF’, which is discussed in the Appendix.


Introduction
Wind energy is a clean and renewable source that can be depended on for the very long-term future [1,2]. The use of wind energy saves fossil fuels; it is non-polluting in nature, and its generation does not produce greenhouse gases or radioactivity. This encourages the use of wind as a free, clean and sustainable source of energy across the world [3,4].
The process of energy generation is usually affected by the uncertain nature of wind energy [5]. To minimize the uncertainty due to intermittent winds, an accurate forecast of wind energy is one of the most important tasks for energy managers and electricity operators. An accurate forecast of wind speed can be used in the evaluation of wind energy potential, in the design of wind farms, in the scheduling and distribution of power, and in other situations [6]. Hence, precise wind energy forecasting has become a very important task with large benefits and a considerable impact.
The major wind energy forecasting approaches can be classified into three groups: (1) model-driven, (2) data-driven and (3) hybrid intelligent approaches [7]. The model-driven approach requires abundant meteorological information on the distinct physical factors affecting wind energy [8]. In data-driven approaches, by contrast, statistical models built on data are used for the simulation. Because of recent advances in artificial intelligence (AI) research and higher computational capabilities, higher prediction accuracy can be achieved [9]. Such models require only historical data for implementation. Many studies shed light on the performance of various data-driven approaches, from the basic persistence model [10] to complex models including support vector machine (SVM) [11,12], neural network (NN) [13,14], ARIMA [15], etc.; however, because of the highly stochastic nature of wind power time-series, it is difficult to predict them within an acceptable error range. Commonly, such problems are overcome by hybridizing two or more approaches, where multiple models are integrated to forecast the targeted wind data. Several studies have indicated the potential of hybrid modeling [16,17], and many others have explained how hybrid intelligent models are better than individual ones.
The ARIMA model, its various modified versions and its hybrid models have been reported to reduce prediction error in wind energy modeling [18,19]. In the hybrid model of ARIMA and artificial neural networks (ANN) [20], the ARIMA method predicted the wind speed and the non-linear nature of the time-series was handled by the ANN model. The hybridization of ARMA with the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) method is exercised in [21], which accurately captured the trend behavior in wind speed. Hybrid ARIMA-Kalman and ARIMA-ANN models [22] were applied for hourly wind energy forecasting. A nonlinear autoregressive exogenous (NARX) artificial neural network based multivariate model [23] and the fractional autoregressive integrated moving average (f-ARIMA) model [18] were used to forecast wind speed and power. All the foregoing studies showed the potential of hybrid versions of ARIMA models in simulating wind speed time-series.
Due to the chaotic and very complex nature of wind speed and power data, understanding the characteristics in order to model a prediction method becomes difficult. To propose a stable forecasting model, it becomes very important to study the data characteristics. Hence, the decomposition analysis of such time-series can be a better step to analyze the time-series characteristics in more detail [24,25].
Hybridization with decomposition methods is a well-known and effective approach [26]. In a decomposition-based method, the wind power time-series is decomposed into various sub-series, and the sum of the predictions of all sub-series is treated as the forecast result. The wavelet transform (WT) [27] and empirical mode decomposition (EMD) [28] are the most commonly used decomposition methods for wind power time-series prediction. Decomposition with the wavelet transform needs prior knowledge of the data, whereas the EMD method works on a predefined methodology irrespective of the nature of the data. Many EMD based hybrid models show improvements in forecasting accuracy. In [29], the EMD-ANN model was proposed, where wind speed data was decomposed into a set of sub-series and the ANN method then forecasted all sub-series. On the basis of a similar principle, EMD-ARIMA [30], EMD-SVM [31] and many other methods were proposed.
However, the mode mixing problem in the EMD method adversely affects the accuracy of the results. Wu and Huang [32] proposed the EEMD method to reduce the effects of the mode mixing problem. Some hybrid EEMD models are discussed in [33-35], which showed significant improvements in forecasting results compared with the EMD ones. A detailed review of hybrid models based on the EMD and EEMD methods for wind speed and power prediction is given in [36]. This review focuses on various objectives, including the evolution of the EMD methods and novel ways of handling the intrinsic mode functions (IMFs) generated with the EMD/EEMD methods. It concluded that wind energy prediction favors hybrid models, which are more accurate than the non-hybridized ones. Similar conclusions are reported in recent studies [37-40].
The wind power time-series is not an independent phenomenon. It depends on various variables such as wind direction, wind speed, air temperature, and the turbines and their physical characteristics [41]. Wind power can be forecasted indirectly by forecasting wind speed and then transforming it to wind power with the help of the power curve, which reflects the cubic relation between wind speed and power [42-45]; that is, wind power is a function of the cube of the wind speed [44]. Hence, there are various stands on the best strategy for designing a prediction model for either wind power or wind speed [46].
In recent years, the PSF model has been applied in a variety of research domains. The authors in [47,48] proposed the forecasting of electricity prices with PSF models and compared them with ARIMA, ANN, weighted neural network (WNN) and mixed models. Those studies concluded that the PSF method was more accurate. It also outperformed ARIMA and kNN in forecasting the energy consumption of electric vehicle charging [49]. Further, the hybridization of NN with PSF methods was exercised in [50], and this hybrid method showed the best performance in forecasting electricity demand. In [51], the PSF algorithm was used to forecast energy demand based on photovoltaic energy records, using non-negative tensor factorization instead of the k-means clustering method. The PSF method was also compared with state-of-the-art forecasting methods in [52]. For the first time, [53] used the PSF method to forecast wind speed data. Those prediction results are promising and indicate that the prediction could be improved further with modifications of the PSF method.
In this paper, two hybrid prediction models (EEMD-PSF and EEMD-PSF-ARIMA) are proposed for both wind power and speed forecasting. In the first model (EEMD-PSF), the wind power/speed time-series is decomposed with the EEMD method and each decomposed sub-series is forecasted with PSF. In the second model (EEMD-PSF-ARIMA), the decomposed sub-series are categorized into stationary and non-stationary groups. The stationary sub-series are predicted with the PSF model, while the non-stationary sub-series are predicted with the ARIMA one. Finally, the performance of the proposed models is evaluated against eight models, including the PSF, ARIMA and least squares support vector machine (LSSVM) methods and their hybridized models with the EMD and EEMD methods. The hybrid EMD and EEMD models are compared in terms of prediction accuracy and computation time.
The rest of the paper is organized as follows. Section 2 focuses on the acquired methods and the proposed hybrid models in detail. The forecasting results of proposed models are evaluated and compared in Sections 3 and 4, where a case study is presented. Finally, the conclusions are presented in Section 5. Further, the R package which facilitates the efficient use of the proposed models is described in Appendix A.

State-of-the-Art Methods
This section presents a brief discussion of the conventional algorithms used in the proposed methodologies and a comparison of them. These methods include EMD, EEMD, PSF, and ARIMA.

Empirical Mode Decomposition (EMD) Method
EMD is a well-known and widely accepted decomposition method that is generally applied to non-stationary and nonlinear time-series [28]. It decomposes such a time-series into a finite number of intrinsic modes, known as IMFs, and a residual.
This process depends only on the statistical nature of the time-series. First, it finds the local minima and maxima of the time-series and generates lower and upper envelopes through the minima and maxima, respectively, by using an interpolation method. Then, the average of the lower and upper envelopes is subtracted from the original time-series to obtain a candidate IMF. This procedure is repeated until the following two conditions are satisfied:
• the mean of both envelopes (lower and upper) approaches zero;
• the number of extrema (minima and maxima) and the number of zero crossings differ at most by one.
This is the sifting process, and the resulting decomposition is represented as shown in (1):
$$X(t) = \sum_{i=1}^{N} IMF_i(t) + R_N(t) \qquad (1)$$
where $X(t)$ is the original time-series, and $IMF_i(t)$ and $R_N(t)$ represent the IMFs and the residue generated by the decomposition method.
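A single sifting iteration can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes piecewise-linear envelopes that are held flat beyond the first and last extremum, whereas EMD implementations commonly use cubic splines, and the function names are hypothetical.

```python
def local_extrema(x):
    """Return the indices of interior local maxima and minima of a sequence."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through x at the knot indices.

    Assumption: the envelope is held constant before the first knot
    and after the last knot.
    """
    env = []
    for t in range(len(x)):
        if t <= knots[0]:
            env.append(x[knots[0]])
        elif t >= knots[-1]:
            env.append(x[knots[-1]])
        else:
            for a, b in zip(knots, knots[1:]):
                if a <= t <= b:
                    w = (t - a) / (b - a)
                    env.append((1 - w) * x[a] + w * x[b])
                    break
    return env


def sift_once(x):
    """Subtract the mean of the upper and lower envelopes from the series."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return list(x)  # no oscillation left: the series is a residue
    upper = envelope(x, maxima)
    lower = envelope(x, minima)
    return [xi - (u + l) / 2 for xi, u, l in zip(x, upper, lower)]
```

Applying `sift_once` repeatedly until the two stopping conditions hold yields one IMF; subtracting that IMF from the series and repeating the procedure on the remainder yields the next one.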

Ensemble Empirical Mode Decomposition (EEMD) Method
The conventional EMD usually suffers from the problem of mode mixing, which is the presence of signal components with widely disparate scales within a single IMF. Refs. [33,54] discuss the mode mixing concept in detail. This problem is reduced significantly by the EEMD method, the successor of EMD, proposed in [32]. In the EEMD method, the final IMFs are estimated by averaging the IMFs obtained over trials with different added noise. Each trial consists of decomposing the signal together with added white noise of finite amplitude. It is interesting to observe that EEMD acts as a self-adaptive filter [55,56].
The procedure of the EEMD method begins by adding white noise to the original wind power/speed time-series. The IMFs and the residue are then generated with the EMD method. These two steps are repeated with different white noise realizations and the corresponding IMFs are obtained. The process is repeated a finite number of times, known as the ensemble number.
The final IMFs and residue of the EEMD method are obtained as the means of the IMFs and residues over all repetitions. The original time-series is not recovered exactly by adding these IMFs and the residue; nevertheless, the EEMD method is supported in [32-35] because of its better prediction performance and smoother IMFs. Each of these IMFs contains frequencies of similar scale. This phenomenon is discussed in more detail in [33,54].
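The ensemble averaging described above can be sketched as follows. The `decompose` argument is a placeholder for an EMD routine, and the sketch assumes it returns the same number of components in every trial, which a real EMD does not guarantee. The `toy_decompose` helper is a hypothetical stand-in used only to make the example runnable.

```python
import random


def eemd_average(signal, decompose, trials=100, noise_std=0.2, seed=0):
    """Average the components obtained from noise-perturbed copies of a signal.

    decompose: callable mapping a series to a list of equal-length components.
    Assumption: every trial yields the same number of components.
    """
    rng = random.Random(seed)
    sums = None
    for _ in range(trials):
        noisy = [s + rng.gauss(0.0, noise_std) for s in signal]
        components = decompose(noisy)
        if sums is None:
            sums = [[0.0] * len(signal) for _ in components]
        for acc, comp in zip(sums, components):
            for t, value in enumerate(comp):
                acc[t] += value
    return [[v / trials for v in acc] for acc in sums]


def toy_decompose(x):
    """Hypothetical stand-in for EMD: a 3-point moving average and the remainder."""
    n = len(x)
    slow = []
    for t in range(n):
        window = x[max(0, t - 1):min(n, t + 2)]
        slow.append(sum(window) / len(window))
    fast = [a - b for a, b in zip(x, slow)]
    return [fast, slow]
```

Summing the averaged components returns the original signal plus the mean of the added noise, which shrinks as the ensemble number grows but does not vanish. This is why the EEMD reconstruction is only approximate.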

Pattern Sequence based Forecasting (PSF) Method
The PSF method was proposed in [47], and its utility was explained in detail in [48]. Prediction with the PSF method depends on the patterns that occur in a time-series. It consists of sub-processes such as data normalization, clustering, forecasting based on the clustered data, and de-normalization. The novelty of this method is that it uses labels for the patterns present in the data instead of the original values. The normalization process eliminates redundancies present in the data. It is done by (2):
$$X_j \leftarrow \frac{X_j}{\frac{1}{N}\sum_{i=1}^{N} X_i} \qquad (2)$$
where $N$ is the size of a cycle in units of time and $X_j$ is the $j$th value of every cycle in the time-series. In the PSF algorithm, the patterns present in the data are replaced with labels by means of a clustering method, namely k-means clustering. The advantages of the k-means technique are that it is easy to use and requires very little computation time, but it needs prior information on the number of centers into which the data are to be clustered. Ref. [47] suggested the Silhouette index to decide a suitable number of cluster centers, whereas [48] suggested three different indices, i.e., the Dunn index [57], the Silhouette index [58], and the Davies-Bouldin index [59].
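The normalization in (2) can be illustrated with a short sketch. It assumes that the series splits into complete cycles of equal length and that no cycle has a zero mean; the function name is hypothetical and the code is not taken from the 'PSF' package.

```python
def normalize_cycles(series, cycle_length):
    """Split a series into cycles and divide each cycle by its own mean.

    Assumptions: incomplete trailing cycles are dropped and no cycle
    mean is zero (the division would otherwise fail).
    """
    last_start = len(series) - cycle_length
    cycles = [series[i:i + cycle_length]
              for i in range(0, last_start + 1, cycle_length)]
    means = [sum(c) / len(c) for c in cycles]
    normalized = [[v / m for v in c] for c, m in zip(cycles, means)]
    return normalized, means


# Two cycles with the same shape but different levels.
normalized, means = normalize_cycles([2, 4, 6, 10, 20, 30], cycle_length=3)
```

Both cycles map to the same normalized pattern, so a clustering step would assign them the same label, while the stored means allow the forecast to be de-normalized later.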
The 'best two among three' policy is used to find the optimum number of clusters. This means that the number of clusters is finalized as the number returned by more than one index. Refs. [49,60,61] suggested and used a single index, which simplifies the computation for the clustering process. The prediction procedure is then executed on the label series.
The prediction process of the PSF method consists of several steps, including optimum window size selection, matching of pattern sequences, and estimation. The sequence of the last W labels (the window) is searched for within the label series. If the window is not repeated at least once, the window size is reduced by one unit, and this process continues until the window repeats itself in the sequence of labels. During the search, the entries that immediately follow each matched sequence are stored, and their mean is taken as the next predicted value.
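The voting rule can be written in a few lines. The sketch below assumes each index has already proposed a number of clusters; the dictionary keys are illustrative names, and returning `None` when all three indices disagree is an assumption, since the text does not say how that case is resolved.

```python
from collections import Counter


def choose_cluster_count(k_by_index):
    """Return the cluster count proposed by at least two indices, else None."""
    counts = Counter(k_by_index.values())
    k, votes = counts.most_common(1)[0]
    return k if votes >= 2 else None


k = choose_cluster_count({"silhouette": 3, "dunn": 3, "davies_bouldin": 5})
```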
Finally, de-normalization is used to replace the labels with an appropriate value in the dataset. The predicted value is attached to the original time-series and the entire procedure is repeated to get the next forecasted value. This allows the prediction of multiple future values. The PSF algorithm is operated until the desired outcomes are obtained.
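The window search and averaging step can be sketched as follows. The sketch assumes the clustering has already produced one label per cycle, and that `cycles` holds the corresponding values. Falling back to the last cycle when no window matches is an assumption made only to keep the function total; the function name is hypothetical.

```python
def psf_predict_next(labels, cycles, window):
    """Predict the next cycle from the cycles that followed earlier matches.

    labels: one cluster label per past cycle.
    cycles: the values of each past cycle (same length as labels).
    window: initial window size W; it shrinks by one until a match is found.
    """
    w = window
    while w > 0:
        pattern = labels[-w:]
        # Earlier occurrences of the pattern that are followed by a cycle.
        followers = [cycles[i + w]
                     for i in range(len(labels) - w)
                     if labels[i:i + w] == pattern]
        if followers:
            size = len(followers[0])
            return [sum(f[j] for f in followers) / len(followers)
                    for j in range(size)]
        w -= 1
    return list(cycles[-1])  # assumed fallback when nothing matches
```

To forecast several cycles ahead, the predicted cycle is appended to the data, assigned a label, and the search is repeated, as described above.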
Selecting the optimum window size (W) is a challenging task in the PSF prediction process, since W must be chosen so that the prediction error is kept to a minimum. Mathematically, the window size is selected by minimizing (3):
$$\sum_{t \in T_s} \left\| \hat{X}(t) - X(t) \right\| \qquad (3)$$
where $X(t)$ is the original time-series at time $t$ and $\hat{X}(t)$ is the corresponding forecasted value. In practice, cross-validation is used to calculate the window size (W). The methodology of the PSF method discussed in [48] is shown in Figure 1. An R package for the PSF method is explained in [61]. This R package, 'PSF' [62], automatically calculates the parameters of the PSF method and forecasts the time-series.

Autoregressive Integrated Moving Average (ARIMA) Method
The ARIMA method combines differencing, autoregression and a moving average model [63]. To fit an ARIMA model, the data must be stationary; if they are not, they are made stationary by differencing.
An ARIMA model is written as ARIMA(p, d, q), where p is the order of the autoregression, d is the degree of differencing and q is the order of the moving average. Generally, these parameters (p, d, q) are determined with autocorrelation graphs, Akaike's information criterion (AIC) and the Schwarz Bayesian information criterion (BIC). The linear equation that states the ARIMA model for a time-series $Y_t$ is shown in (4):
$$\theta(B)\,(1 - B)^d\,(Y_t - \mu) = \varphi(B)\,\varepsilon_t \qquad (4)$$
where $\theta$ and $\varphi$ represent the autoregression and moving average polynomials in the backshift operator $B$, $\mu$ is the mean, and $\varepsilon_t$ is a white-noise error term.
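As a worked illustration, consider ARIMA(1, 1, 0), i.e., one autoregressive term, one difference and no moving average term. Following the text's notation, $\theta_1$ denotes the autoregressive coefficient; the sign convention $1 - \theta_1 B$ and the white-noise symbol $\varepsilon_t$ are assumptions made for this example. Because $(1 - B)\mu = 0$, the mean drops out after differencing:

$$(1 - \theta_1 B)(1 - B)\,Y_t = \varepsilon_t \;\;\Longrightarrow\;\; Y_t = (1 + \theta_1)\,Y_{t-1} - \theta_1\,Y_{t-2} + \varepsilon_t$$

The one-step forecast is obtained by setting $\varepsilon_t$ to its expected value of zero, so it is a weighted combination of the two most recent observations.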

Proposed Methods
As explained in the introduction, two new methodologies have been developed in this work to improve on the results obtained with the four methods described in Section 2. This section explains both proposed methodologies in detail.

Hybrid EEMD-PSF Model
PSF is a useful method that has been applied successfully to time-series forecasting in various domains with satisfactory results [49,51,64]. Although the simple PSF method has many advantages over conventional forecasting methods, it is difficult to forecast wind power or speed time-series accurately with it because of the highly non-stationary and intermittent nature of such datasets. To overcome this limitation of the PSF method, two new hybrid approaches are proposed in this paper. The first is the hybridization of the EEMD and PSF methods, denoted as the hybrid EEMD-PSF model. The second, denoted as the hybrid EEMD-PSF-ARIMA model, is the hybridization of the PSF, ARIMA and EEMD methods. This section and the next describe both proposed models in detail.
Most EMD based forecasting models follow a similar principle. First, they decompose a time-series into sub-series and forecast each sub-series with a suitable method. Second, the sum of all forecasted sub-series is taken as the final forecast result. A similar approach is used in the proposed hybrid model: all the sub-series (IMFs and the residue) are forecasted with the PSF method. The flowchart of the hybrid EEMD-PSF model is shown in Figure 2 and the corresponding procedure is as follows:
• Step 1: Apply the EEMD method to transform a time-series into a set of sub-series (IMFs and a residue).
• Step 2: Calculate the cluster size (K) and optimum window size (W) for the IMFs and the residue.
• Step 3: Use the PSF method to forecast all sub-series (IMFs and the residue).
• Step 4: Add the forecasted outcomes of all sub-series to obtain the final forecasting results.
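The four steps amount to a decompose, forecast and sum pipeline, sketched below. Both `decompose` and `forecast` are placeholders for the EEMD and PSF routines; their names and signatures are assumptions for illustration and are not the API of the 'decomposedPSF' package.

```python
def hybrid_forecast(series, decompose, forecast, horizon):
    """Forecast each sub-series separately and sum the forecasts.

    decompose: callable mapping a series to a list of sub-series
               (IMFs and residue in the EEMD-PSF model).
    forecast:  callable (sub_series, horizon) -> list of `horizon` values
               (PSF in the EEMD-PSF model).
    """
    total = [0.0] * horizon
    for sub in decompose(series):
        prediction = forecast(sub, horizon)
        for h in range(horizon):
            total[h] += prediction[h]
    return total
```

In the study, Step 2 selects K and W separately for every sub-series; in this sketch that tuning would live inside the `forecast` callable.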

Hybrid EEMD-PSF-ARIMA Model
This model is an extension of the EEMD-PSF model in association with the ARIMA method; it applies either PSF or ARIMA for prediction according to the time-series characteristics of the respective IMFs and residue. As discussed in the upcoming sections, different IMFs exhibit distinct time-series characteristics (see Figures 5 and 7). The first few IMFs exhibit much higher frequencies and reflect the random and mainly noisy information present in the wind power and wind speed time-series. The middle range of IMFs is more periodic and shows seasonal patterns compared with the earlier IMFs. Finally, the residue and the last few IMFs show the trend components of the series. With the PSF method alone, it may be difficult to achieve accurate predictions for all types of IMFs with such distinct characteristics. There is considerable evidence [48,49,51,61] of the superior performance of PSF for stationary, seasonal and cyclic time-series, but in most cases it fails to achieve comparably accurate predictions for trending and non-stationary time-series, because pattern sequences are unavailable in such series. In contrast, methods such as ARIMA, which belong to the autoregression family, achieve better prediction results for trending and non-stationary time-series by introducing stationarity into such series.
To exploit the advantages of autoregression based methods in the hybrid EEMD-PSF model, the non-stationary and trending sub-series are processed and predicted with the ARIMA method. This new model is named the hybrid EEMD-PSF-ARIMA model. The stationarity and trend characteristics of all IMFs and the residue are determined with the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test [65]. This test uses a linear regression technique and breaks a time-series into three parts: (a) a deterministic trend, (b) a random walk, and (c) a stationary error. These parts are the deciding factors for characterizing the stationarity and trend of a time-series statistically. The test determines whether a time-series is stationary around a mean or a linear trend. The null hypothesis of the KPSS test is that the data are trend stationary. This null hypothesis is usually rejected at the 5% significance level, i.e., if the p-value associated with the test is lower than 0.05. With the inclusion of the KPSS test and the ARIMA method, the proposed hybrid EEMD-PSF-ARIMA method is as shown in the flowchart in Figure 3. The corresponding steps of EEMD-PSF-ARIMA are as follows:
1. Step 1: Apply the EEMD method to transform a time-series into a set of sub-series (IMFs and a residue).
2. Step 2: Execute the KPSS test on all IMFs and the residue to separate them into stationary and non-stationary groups.
3. Step 3: Apply the PSF method to the stationary IMFs and the ARIMA method to the non-stationary IMFs.
4. Step 4: Add the forecasted outcomes of all sub-series to obtain the final forecasting results.
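The routing rule in Steps 2 and 3 can be sketched as follows. The KPSS p-value is supplied by a caller-provided function, because computing it needs a statistics library; `kpss_pvalue`, `psf_forecast` and `arima_forecast` are hypothetical placeholders rather than the authors' code.

```python
def forecast_with_routing(sub_series, kpss_pvalue, psf_forecast,
                          arima_forecast, horizon, alpha=0.05):
    """Route each sub-series to PSF or ARIMA by its KPSS p-value and sum.

    A p-value below alpha rejects the stationarity null hypothesis,
    so the sub-series is treated as non-stationary and sent to ARIMA.
    Returns the summed forecast and the model chosen for each sub-series.
    """
    total = [0.0] * horizon
    plan = []
    for sub in sub_series:
        non_stationary = kpss_pvalue(sub) < alpha
        model = arima_forecast if non_stationary else psf_forecast
        plan.append("ARIMA" if non_stationary else "PSF")
        for h, value in enumerate(model(sub, horizon)):
            total[h] += value
    return total, plan
```

Returning the plan alongside the forecast makes it easy to tabulate which method handled which IMF, in the spirit of Tables 2 and 8.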

Case Study
In this section, two case studies are discussed to evaluate the performance of the proposed methods. In the first one, they are examined on a wind power time-series, whereas in the second one, a wind speed time-series is used. In both cases, the short-term prediction is performed in two ways:
• one and two step ahead prediction with the iterated strategy;
• multiple step ahead prediction with the direct strategy (12 and 24 h).

Case Study 1-Wind Power Data
The wind power data used in this case study were collected from an online Government portal that shows the average hourly and daily generation of wind power in the state of Maharashtra, India. For this study, the data are taken from 1 January 2016 to 30 April 2016 and averaged over 1 h. No missing values were observed in the data within this period. The data from the first three months (January-March) are used for training and the remaining data are used for validation. The hourly behavior of the wind power in this dataset from 1 January 2016 to 31 March 2016 is illustrated in Figure 4. Furthermore, the statistical parameters, including mean, median and standard deviation, are given in Table 1.
To evaluate the prediction performance of the proposed methods, three error measures are used: root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). These error measures are defined in (5)-(7):
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{X}_i\right)^2} \qquad (5)$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|X_i - \hat{X}_i\right| \qquad (6)$$
$$MAPE = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{X_i - \hat{X}_i}{X_i}\right| \qquad (7)$$
where $X_i$ and $\hat{X}_i$ are the original and the forecasted values, respectively, and $n$ is the number of forecasted values. RMSE describes the sample standard deviation of the differences between the true and the predicted values, and MAE their average absolute magnitude. MAPE, in contrast, reflects the sensitivity to small changes in the time-series and is unitless. Furthermore, the computation time is considered as one of the metrics for comparing the prediction methods examined in the study.
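The three error measures referred to in (5)-(7) can be computed as in the following sketch, which uses their standard definitions. The numbers in the example are made up for illustration and are not taken from the study's data.

```python
import math


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast))
                     / len(actual))


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actual, forecast)) / len(actual)


actual = [100.0, 200.0, 400.0]      # illustrative values only
forecast = [110.0, 190.0, 380.0]
errors = (rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
```

RMSE and MAE carry the unit of the series, whereas MAPE is a percentage, so only MAPE can be compared directly between the wind power and wind speed case studies.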

Simulation
This section describes the proposed models (EEMD-PSF and EEMD-PSF-ARIMA) applied to the wind power time-series. Both models begin with the decomposition of the time-series into a finite number of IMFs and a residue with the EEMD technique. For the original series of mean hourly wind power data, 10 IMFs along with a residue are generated (Figure 5). In the EEMD-PSF model, all sub-series (IMFs and the residue) are forecasted using the PSF methodology. First, suitable values of the number of clusters and the window size are calculated for all IMFs, as shown in Table 2. With these parameters, a separate PSF model is fitted to each IMF, and the future values are predicted for the desired duration. The aggregation of the predicted values of all IMF series is taken as the prediction of the EEMD-PSF model.
Conversely, in the EEMD-PSF-ARIMA model the IMFs are divided into two groups (stationary and non-stationary). This grouping is performed with the KPSS test, whose null hypothesis is that the time-series is stationary. The IMF series with p-values lower than 0.05 are assigned to the non-stationary group and the other IMFs are kept in the stationary one. All stationary signals without trend characteristics are processed with the PSF method, and the trending, non-stationary signals are processed with the ARIMA method. The corresponding optimum window and cluster sizes for the PSF method and the (p, d, q) parameters for the ARIMA method for the respective IMFs and residue are shown in Table 2. Finally, the sum of all predicted values is taken as the final prediction of the EEMD-PSF-ARIMA model.
In this study, for evaluation of the proposed methods, forecasted results are compared with PSF, ARIMA and their hybrid combination models (EMD-PSF, EMD-ARIMA, EEMD-ARIMA, EMD-PSF-ARIMA). Furthermore, the benchmarked LSSVM and EEMD-LSSVM models generally used for wind power and wind speed prediction are compared in the study.

Comparison and Discussion
In this subsection, the performance of both proposed models is compared with that of various models, including distinct combinations of PSF, ARIMA, EMD, and EEMD. Further, a comparative analysis is performed with the LSSVM and EEMD-LSSVM models. To demonstrate the superiority of the proposed models, different forecast techniques and horizons are selected. Prediction performance is examined with two different prediction techniques, i.e., (a) the iterated strategy and (b) the direct strategy [66]. In the iterated strategy, the prediction model predicts a small horizon and uses the predicted values along with the input time-series to predict the following values. In the direct strategy, a model forecasts the whole horizon using only the observations, in a single iteration. In [67], the authors explained the difference between these strategies in this way: "Iterated multi-period ahead time-series forecasts are made using a one-period ahead model, iterated forward for the desired number of periods, whereas direct forecasts are made using a horizon-specific estimated model." In the iterated strategy, two cases are considered for comparison, i.e., one and two step ahead forecasts. In the one step ahead iterated approach, the very next value of a given time-series is predicted and this value is then used for the prediction of the next value. In the two step ahead iterated approach, two future values are predicted and these values are used for the prediction of the next two values.
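The difference between the two strategies can be sketched as follows. The `model` argument is a hypothetical callable that returns a requested number of future values from a history; the linear extrapolation used in the example is only a stand-in for PSF or ARIMA.

```python
def iterated_forecast(history, model, horizon, step=1):
    """Predict `step` values at a time, feeding predictions back as inputs."""
    extended = list(history)
    forecasts = []
    while len(forecasts) < horizon:
        block = model(extended, step)
        forecasts.extend(block)
        extended.extend(block)   # predictions become inputs for the next block
    return forecasts[:horizon]


def direct_forecast(history, model, horizon):
    """Predict the whole horizon in one call, from observations only."""
    return model(history, horizon)


def linear_extrapolation(x, n):
    """Toy model: continue the last observed slope for n steps."""
    slope = x[-1] - x[-2]
    return [x[-1] + k * slope for k in range(1, n + 1)]


one_step = iterated_forecast([1.0, 2.0, 3.0], linear_extrapolation, 4, step=1)
two_step = iterated_forecast([1.0, 2.0, 3.0], linear_extrapolation, 4, step=2)
direct = direct_forecast([1.0, 2.0, 3.0], linear_extrapolation, 4)
```

With this toy model all three calls agree, because a straight line is reproduced exactly. With a real forecaster the iterated variants accumulate their own prediction errors, which is one reason the two strategies are evaluated separately.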
To evaluate the performance of the proposed models, eight forecasting models are tested: PSF, ARIMA, LSSVM, hybrid models (EMD-PSF, EMD-ARIMA, EMD-LSSVM, EMD-PSF-ARIMA). One and two step ahead (iterated approach) predictions are carried out with all the models and future values are predicted for the next 24 h. Table 3 shows the prediction errors in terms of the RMSE, MAE, and MAPE measures. Similarly, the multiple-step ahead (direct strategy) forecasts for horizons of 12 and 24 h are shown in Table 4 with the same error measures (RMSE, MAE, and MAPE). From Tables 3 and 4, the following conclusions can be extracted:
• Compared with all models studied in the paper, the proposed models (EEMD-PSF and EEMD-PSF-ARIMA) show lower error values (for all error measures) in all cases, whether with the iterated strategy or the direct strategy of prediction. For example, the RMSE, MAE and MAPE values for EEMD-PSF-ARIMA are 30.18, 25.06 and 6.34% for one step ahead forecasts and 117.84, 117.43 and 17.73% for two step ahead forecasts. These error values are the minimum in the comparison table (see Table 3). EEMD-PSF is the second best model for the same prediction approach.
• A similar performance can be seen in the case of the multiple-step ahead (direct strategy) forecast for both horizons (12 and 24 h). Here also, EEMD-PSF performed comparably and ranked second best in the comparison table (see Table 4).
From Tables 3 and 4, it can be seen that both proposed models outperform all other models compared in the study. Error measures are usually the primary criterion for evaluating a prediction model, but computation time is also an important concern; Table 5 therefore shows the computation time of all models for 24 step ahead prediction (direct strategy). The computation time of the proposed models is greater than that of the models without decomposition, although it is lower than that of their EMD counterparts. For the 24 h horizon prediction (direct strategy), EEMD-PSF-ARIMA consumed 8.48 s and EEMD-PSF consumed 6.91 s, whereas the EMD-PSF-ARIMA, EMD-PSF, PSF and ARIMA models consumed 9.28, 7.10, 1.35 and 0.49 s, respectively, as shown in Table 5. In other words, the proposed methods forecast more accurately than the non-decomposed models at the cost of additional computation time. Another interesting observation can be made from Table 5. Comparing the PSF model (1.35 s) with its hybrid EMD-PSF model, the computation time increases to 7.10 s. This delay is introduced by the decomposition process and by the individual prediction of each IMF and the residue. For the wind power time-series, the average computation times of the EMD and EEMD decomposition methods are found to be 0.01 and 1.10 s, respectively. Even though the EEMD method itself consumes more computation time, its hybrid models with PSF, ARIMA, and PSF-ARIMA consume less computation time than the corresponding EMD hybrid models (see Table 5). There can be various reasons for this, but the most plausible one is that the IMFs of the EEMD method have similar scales, whereas the IMFs of the EMD method exhibit the mode mixing problem. Both PSF and ARIMA may forecast IMFs within a specific frequency range more quickly than IMFs containing a combination of multiple frequencies.
There is substantial evidence (discussed in Section 1) that hybridizing a time-series prediction algorithm with the EMD or EEMD methods improves the prediction accuracy considerably. Similar results are observed in this study. Compared with the PSF model, the hybrid EMD-PSF model reduces the error (in terms of RMSE) by 6.24% and the EEMD-PSF model reduces it by 15.29% for the 24 h horizon prediction with the direct strategy (see Table 4). Similar behavior is observed for ARIMA and its hybrid models (Table 4).
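The percentage reductions quoted here are presumably relative reductions in RMSE with respect to the non-hybrid reference model. Under that assumption, with $RMSE_{\mathrm{ref}}$ the error of the reference model (e.g., PSF) and $RMSE_{\mathrm{hyb}}$ the error of the hybrid model, the reduction is

$$\Delta_{\%} = \frac{RMSE_{\mathrm{ref}} - RMSE_{\mathrm{hyb}}}{RMSE_{\mathrm{ref}}} \times 100$$

so a reduction of 15.29% means that the hybrid model's RMSE is 84.71% of the reference model's RMSE.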
Finally, Table 6 presents the validation of the proposed hybrid EEMD-PSF-ARIMA model with an ANalysis Of VAriance (ANOVA) test [68,69]. ANOVA compares the means of two or more samples using the F distribution, under the assumption of normality. It tests the null hypothesis that the samples in two or more groups are drawn from populations with the same mean. Here, it is used to compare the results of the models selected for comparison. The test confirmed that the prediction results had different statistical behavior and that the improvements in precision between methods were statistically meaningful. In all comparisons, the one-sided p-values were significant at α = 0.05, which suggests that the selection of the methods was appropriate and that the proposed model can improve the prediction in most cases.
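For reference, the F statistic of a one-way ANOVA can be computed as in the sketch below. It illustrates the test in general and does not reproduce the study's analysis: the groups in the example are made-up numbers, and converting F to a p-value additionally requires the F distribution from a statistics library.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Illustrative groups, e.g. values produced by three different models.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

A large F value indicates that the group means differ by more than the within-group scatter would explain, which is the basis for rejecting the equal-means null hypothesis.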

Case Study 2-Wind Speed Data
In this section, the proposed models are examined on a wind speed time-series. The wind speed time-series in this study was collected in Galicia, an autonomous community located in north-western Spain. The mean wind speed values are sampled every 10 min at several measuring stations of the Galician meteorological network. The time-series covers four consecutive months, as shown in Figure 6. The data of the first three months are used for training, and the performance of the prediction models is validated on the data of the last month. The statistical characteristics of this time-series (including mean and standard deviation) are noted in Table 7. In case 2, the same performance measures are adopted as in case study 1, namely RMSE, MAE, and MAPE, along with a respective comparison using the ANOVA test. Furthermore, the computation time is considered as another parameter for comparison.
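For reference, the three error measures can be sketched with their standard textbook definitions, which are assumed here to match the ones used in the study. The function names and the sample values are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; undefined if any actual value is zero."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Invented wind speed values (m/s), not data from this study.
observed = [4.0, 5.0, 10.0]
forecast = [5.0, 5.0, 8.0]
print(rmse(observed, forecast), mae(observed, forecast), mape(observed, forecast))
```

Because MAPE divides by the observed value, calm periods with zero wind speed need separate handling before this measure can be applied.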

Simulation
In this case, a wind speed time-series is studied, whereas case 1 focused on wind power data. The same comparison background is maintained in case 2: a similar horizon length is used for testing and validating the proposed methods, with similar error performance metrics, and the same prediction models are examined as in case 1. Both proposed models (EEMD-PSF and EEMD-PSF-ARIMA) start by decomposing the wind speed time-series into a set of sub-series (IMFs and a residue). For the given wind speed time-series, 10 true IMFs and a residue are observed, as illustrated in Figure 7. For the EEMD-PSF model, the optimum window and cluster size parameters for all IMFs and the residue are estimated as shown in Table 8. In the EEMD-PSF-ARIMA model, KPSS tests are used to classify all IMFs and the residue into stationary and non-stationary clusters. Table 8 shows the optimum window and cluster size for stationary and non-trendy IMFs and the (p, d, q) parameters for non-stationary and trendy IMFs. With these parameters, the respective prediction methods (either PSF or ARIMA) are applied and future values are predicted. The aggregation of these values is noted as the prediction result of the EEMD-PSF-ARIMA model. In case 2 (similar to case 1), all contemporary methods are compared with the proposed ones and the comparison results are discussed in the next section.
Table 8. Prediction method selection for IMFs in EEMD-PSF and EEMD-PSF-ARIMA models for wind speed time-series.
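The selection-and-aggregation logic of the EEMD-PSF-ARIMA model can be sketched as follows. This is only a structural illustration: `toy_is_stationary` is a crude stand-in for the KPSS test, `mean_forecast` and `drift_forecast` are toy stand-ins for PSF and ARIMA, and the two components are invented rather than produced by EEMD. All of these names are hypothetical.

```python
from statistics import mean, pstdev


def toy_is_stationary(series):
    """Crude stand-in for a KPSS test: compare the means of the two halves."""
    half = len(series) // 2
    shift = abs(mean(series[:half]) - mean(series[half:]))
    return shift <= 0.5 * pstdev(series)


def mean_forecast(series, horizon):
    """Toy stand-in for PSF: repeat the series mean."""
    return [mean(series)] * horizon


def drift_forecast(series, horizon):
    """Toy stand-in for ARIMA: extrapolate the average slope."""
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + slope * (i + 1) for i in range(horizon)]


def hybrid_forecast(components, horizon, is_stationary,
                    forecast_stationary, forecast_nonstationary):
    """Forecast each sub-series with the selected model and sum the results."""
    total = [0.0] * horizon
    for series in components:
        model = forecast_stationary if is_stationary(series) else forecast_nonstationary
        prediction = model(series, horizon)
        total = [t + p for t, p in zip(total, prediction)]
    return total


# Invented sub-series: one oscillating component and one trending residue.
oscillation = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
trend = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(hybrid_forecast([oscillation, trend], 3,
                      toy_is_stationary, mean_forecast, drift_forecast))
```

Passing the test and the two forecasters as arguments keeps the routing separate from the models, so the same skeleton also covers the EEMD-PSF case by supplying a test that always returns true.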

Comparison and Discussion
This section briefly discusses the proposed hybrid models (EEMD-PSF and EEMD-PSF-ARIMA) for wind speed time-series prediction, as was done for wind power prediction in case 1. To examine the usability and stability of the proposed methods on wind speed data, the same analysis techniques and tests are applied in case 2.
To avoid redundancy, only the evaluations are given in this subsection. Tables 9 and 10 show the prediction results of both proposed models in various prediction environments, considering iterated and direct prediction strategies, respectively. In the iterated strategy, one-step and two-step forecasts are used for 24 h horizon prediction (24 × 6 = 144 instances, since the time-series is sampled at 10 min intervals). In the direct strategy, data for 12 and 24 h are predicted and the error measures are compared in Table 10. Furthermore, the computation time comparison for all models under study is noted in Table 5. Finally, the forecasts of the proposed EEMD-PSF-ARIMA model are validated with an ANOVA test, with the corresponding statistical significance shown in Table 6. The observations in Tables 3, 4, 9 and 10 indicate that the hybridization of the PSF and EEMD methods has improved the prediction accuracy, and that predicting the stationary IMF series with PSF and the remaining IMFs with ARIMA has further improved the overall performance of the hybrid EEMD-PSF-ARIMA method in both direct and iterated approaches. Furthermore, it can be seen that the main results are similar to those for the wind power data in case 1, such as:

1.
Prediction with the proposed models (EEMD-PSF and EEMD-PSF-ARIMA) is more accurate than with the other methods.

2.
The hybridization of the PSF, ARIMA, and LSSVM methods with the EMD and EEMD methods leads to more accurate predictions than the original forms of these methods.

3.
Similar to case 1, a trade-off between prediction accuracy and computation time is observed in case 2. For example, EEMD-ARIMA and EMD-ARIMA achieve higher prediction accuracy at the cost of additional computation time.

4.
Regarding computation time, there are a few differences from case 1: (a) the PSF model performs better than the EMD-PSF model in terms of both prediction accuracy and computation time, and (b) the computation time of the models hybridized with the EEMD method is longer than that of the models hybridized with the EMD method. For example, EEMD-PSF consumed 11.41 s, whereas EMD-PSF completed the task in 9.75 s.

5.
The ANOVA test results are shown in Table 6. The prediction results of the EEMD-PSF-ARIMA model show one-sided p-values that are significant at α = 0.05, which indicates that the differences found in the comparison are statistically significant.
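The difference between the iterated and direct strategies compared in Tables 9 and 10 can be sketched as follows. The sketch assumes that, in the iterated strategy, each short forecast is appended to the history before the next one is made; the forecaster `drift_model` is a toy placeholder, not PSF or ARIMA, and all names are hypothetical.

```python
def drift_model(series, n_ahead):
    """Toy forecaster: extrapolate the average slope of the series."""
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + slope * (i + 1) for i in range(n_ahead)]


def direct_forecast(history, horizon, model):
    """Predict the whole horizon in a single call."""
    return model(list(history), horizon)


def iterated_forecast(history, horizon, step, model):
    """Predict `step` values at a time, feeding each forecast back as input."""
    extended = list(history)
    forecast = []
    while len(forecast) < horizon:
        n_ahead = min(step, horizon - len(forecast))
        chunk = model(extended, n_ahead)
        forecast.extend(chunk)
        extended.extend(chunk)
    return forecast


# A 24 h horizon at 10 min sampling gives 24 * 6 = 144 instances.
horizon = 24 * 6
history = [0.0, 1.0, 2.0, 3.0]
one_step = iterated_forecast(history, horizon, 1, drift_model)
two_step = iterated_forecast(history, horizon, 2, drift_model)
direct = direct_forecast(history, horizon, drift_model)
print(len(one_step), len(two_step), len(direct))
```

For this linear toy series all three forecasts coincide; with real models they generally differ, because the iterated strategy refits on its own earlier predictions and can therefore accumulate error.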

Conclusions
Accurate modeling of wind power and wind speed is essential for securing reliable and clean energy production. An energy dispatch center uses real wind power and wind speed data to carry out unit commitment to the energy traders. Hence, energy scheduling and economic dispatch play an important role in such business models, which depend directly on the accurate forecast of wind power and speed data. In this paper, two hybrid prediction methods named EEMD-PSF and EEMD-PSF-ARIMA are developed to enhance the forecasting accuracy in both wind power and wind speed time-series modeling. The EEMD-PSF model is the hybridization of the PSF and EEMD methods, whereas the EEMD-PSF-ARIMA model is the conditional hybridization of the EEMD, PSF and ARIMA methods. In both models, the EEMD method is used to decompose the original wind speed or power series into a finite number of sub-series, and each sub-series is forecasted by its respective prediction method. In the EEMD-PSF model, all sub-series were predicted with the PSF method. In the EEMD-PSF-ARIMA model, on the other hand, the sub-series with high and low frequencies were predicted using PSF and ARIMA, respectively. The selection of the PSF or ARIMA method for the prediction process depended on the time-series characteristics of the decomposed series obtained with the EEMD method. The KPSS test was used to classify the sub-series into stationary and non-stationary subsets. In the EEMD-PSF-ARIMA model, stationary series were processed with the PSF method, while the remaining series were processed with the ARIMA method.
Both proposed models were evaluated with the iterated and direct prediction strategies for a target forecasting horizon of the next 24 h. Statistical indicators such as RMSE, MAE, and MAPE, together with the computation time, were used to evaluate the results. In all prediction environments, the EEMD-PSF-ARIMA model outperformed all the other methods, whereas the EEMD-PSF model secured the second-best position. Simulations reveal that both methods have the following advantages (all quantitative comparisons presented here are on the basis of the RMSE metric):

1.
In the case of short-term wind power time-series prediction, both proposed methods have shown at least 18.03 and 14.78 percent improvement in forecast accuracy compared to the contemporary methods considered in this study for direct and iterated strategies, respectively. Similarly, for wind speed data, those improvements are observed to be 20.00 and 23.80 percent, respectively.

2.
In all cases, EEMD-PSF-ARIMA has outperformed the other models, with prediction accuracy improvements of at least 10.03 and 8.33 percent for wind power and wind speed data, respectively. For wind power data, this improvement comes at the cost of a small computation delay, with the EEMD-PSF-ARIMA model taking only a few seconds longer than the EEMD-PSF model. Conversely, for the wind speed time-series, the EEMD-PSF-ARIMA model takes less computation time than the EEMD-PSF model. Hence, it can be stated that the benefits in forecasting accuracy are much greater than the harm produced by the time delay.

3.
Furthermore, the hybridization of a prediction method with the EEMD method has improved the prediction accuracy significantly. For example, for the wind power time-series, the EEMD-PSF, EEMD-ARIMA, and EEMD-LSSVM models have shown 23.56, 29.34 and 6.76 percent improvements in prediction accuracy over the simple PSF, ARIMA, and LSSVM models, respectively.
The comparison of the results indicates that ARIMA models predict low-frequency components (IMFs) more accurately than PSF models, but that the overall enhancement in prediction accuracy was achieved by predicting the high-frequency components with PSF models, as can be observed by comparing the prediction results of the EEMD-PSF and EEMD-PSF-ARIMA models with those of the EEMD-ARIMA model. In addition, the relation between the prediction error and the computation time of the models hybridized with EMD and EEMD is discussed, which shows an improvement in forecasting results at the cost of computation delays. In future, the computational delay can be reduced by aggregating IMFs with similar characteristics, thereby using fewer models than in the present study. Finally, the significance of the proposed models is confirmed with an ANOVA test, and the R package (decomposedPSF) for the proposed models is discussed in Appendix A.
time horizon to be predicted with respective functions. Of course, the proposed models will not necessarily perform as accurately on all other time-series as they do on wind power and wind speed data. Sometimes, it can be necessary to check the performance of other hybrid methods, too. Hence, other hybrid methods are also included in the proposed R package. Detailed information about the package is given in its vignette.
Table A1. Functions for respective models in 'decomposedPSF', an R package.