A Novel Hybrid Strategy Using Three-Phase Feature Extraction and a Weighted Regularized Extreme Learning Machine for Multi-Step Ahead Wind Speed Prediction

With the growing penetration of wind power into electric grids, improving wind speed prediction accuracy has become particularly valuable for the exploitation of wind power. In this paper, a novel hybrid strategy based on a three-phase signal decomposition (TPSD) technique, feature extraction (FE) and weighted regularized extreme learning machine (WRELM) is developed for multi-step ahead wind speed prediction. The TPSD including seasonal separation algorithm (SSA), fast ensemble empirical mode decomposition (FEEMD) and variational mode decomposition (VMD) is proposed for the first time to handle the complex and irregular natures of wind speed comprehensively. The FE process is used to capture the useful features of wind speed fluctuations and determine the optimal inputs for a prediction model. The WRELM is employed as a basic predictor for building the prediction model by these selected features. Four real wind speed prediction cases are utilized to evaluate the proposed model, and experimental results verify the effectiveness of the proposed model compared with the benchmark models.


Introduction
In the past few decades, to reduce dependence on fossil fuels with their negative effects on the environment, attention has turned to clean renewable energy sources throughout the world [1]. As one kind of the rapidly growing renewable energy sources, wind energy has been recognized as an attractive alternative to conventional fossil fuels due to several advantages, including renewability and pollution-free environment [2]. However, wind power is recognized as a stochastic process [3] because of the intermittent and multi-scale characteristics of wind speed fluctuation [4,5]. With the increasing penetration of wind power in electric grids, this presents a number of challenges to power system operation, both technically and economically [6]. An accurate wind speed forecast is considered as one of the most efficient ways to mitigate these challenges. Improving the prediction accuracy of wind speed is beneficial for increasing the security of wind energy utilization and reducing the risk of power outages [7].
In recent years, many prediction methods have been proposed in the literature. Weron [8] presented a review of the state-of-the-art with a look into the future for forecasting methods. These models are generally classified into two groups: physical models and statistical methods [9]. presented a hybrid model based on ANN and Markov chain for wind speed prediction, concluded that the developed model was better than the single methods.
Through the previous review, it can be found that most of these models are usually constructed by using the original wind speed signal directly. However, because of the inherent complexity of wind speed fluctuations, it is a difficult task to predict wind speed by these above-mentioned methods. To improve the predictive ability of these models, it is very necessary to consider and analyze the complicated characteristics of wind speed fluctuations. In recent years, a large amount of hybrid methods named decomposition prediction aggregation (DPA) models, which combine signal decomposition techniques and existing prediction models, have been proposed for short-term wind speed forecasting. The common modeling process of these DPA models can be summarized as follows: (1) decomposing raw wind speed signal into several sub-signals using some signal decomposition algorithms mainly including wavelet transform (WT) and empirical mode decomposition (EMD); (2) building forecasting models for each sub-signal; (3) obtaining the final forecasting results by sum of the forecasting result for each sub-signal. For example, De Giorgi et al. [31] built a combined model based on WT and SVM for wind speed prediction. In this method, WT was used to conduct the decomposition with the complicated multi-patterns signal, and SVM was constructed for forecasting all the sub-signals. The result showed that the WT could enhance the prediction performance of the standard SVM model. Ren et al. [32] presented a combined method based on EMD and ARMA for wind speed prediction. In this method, EMD was utilized to implement the decomposition of the original wind speed signal, ARMA was used for sub-signals forecasting, and the final forecasting result was calculated by the sum of the forecasting result of each sub-signal. This study proved that EMD could enhance the forecasting ability significantly. Liu et al. [33] presented a combined EMD-ANN model for wind speed forecasting, and concluded that EMD could enhance the forecasting performance of the ANN model.
Although the DPA models based on signal decomposition algorithms can improve the prediction ability to some extent, there still exist several deficiencies for these models. For example, adopting a single signal decomposition algorithm is inadequate to deal with non-stationary and inherent complexity of wind speed, constructing prediction models for each sub-series needs substantial computational resources and wastes training time, and ignoring the seasonal variation of wind speed will reduce the prediction precision of models. Thus, there still exist some probabilities for enhancing the prediction ability of these models. In this paper, a novel hybrid strategy based on three-phase signal decomposition (TPSD) technique, feature extraction (FE) and weighted regularized extreme learning machine (WRELM) is developed for multi-step ahead wind speed prediction. Firstly, a TPSD framework including seasonal separation algorithm (SSA), fast ensemble empirical mode decomposition (FEEMD), and variational mode decomposition (VMD) is for the first time developed to handle the complex and irregular natures of wind speed comprehensively. In the first phase, the original wind speed signal can be separated into season and trend components by SSA. In the second phase, the trend component can be decomposed into a number of intrinsic mode functions (IMFs) and a residual with different frequencies. For reducing the non-stationarity of the high frequency signal, the high frequencies IMFs (especially IMF1) can be further decomposed into several stationary modes in the third phase. Secondly, a feature extraction (FE) process including partial autocorrelation function (PACF) and regression analysis is proposed to capture the useful features of wind speed fluctuations and determine the optimal input features for a prediction model. Then, an improved extreme learning machine (ELM) named weighted regularized extreme learning machine (WRELM) is established using these selected features, and the prediction results of wind speed can be calculated by WRELM. Finally, the proposed approach is tested using four real wind speed datasets collected from a real-world wind farm of China. The main novelties and contributions of this study can be summarized as follows: (1) Compared with the single-step ahead wind speed prediction, multi-step ahead wind speed prediction can provide more time for wind power scheduling and wind turbines maintenance.
However, due to the cumulative error influence on the prediction accuracy, it is still a challenge task for multi-step ahead prediction. This study develops a novel hybrid strategy using three-phase feature extraction technique and weighted regularized extreme learning machine for multi-step ahead wind speed prediction. (2) Different from the traditional DPA models which build prediction models for each subseries obtained by signal decomposition algorithms, in order to decrease the computation time and increase the prediction accuracy, this study proposes a novel prediction framework which only establishes a prediction model using these selected features from all different subseries. (3) In order to capture the useful features of wind speed signal and obtain the optimal input-output sample pairs, this study proposes a novel feature extraction framework including three signal decomposition processes of SSA, FEEMD and VMD. First, the SSA is employed to separate the season and trend components of wind speed signal, and capture the seasonal features of wind speed fluctuations. Second, the FEEMD is applied to decompose the trend component into lots of intrinsic mode functions (IMFs) and a residual with different frequencies. Considering the negative effect of high frequencies IMFs (especially IMF1) on the prediction accuracy, the VMD is utilized to further decompose the high frequency IMF1 into several stationary modes for reducing the non-stationarity of the high frequency signal. Finally, a feature selection process is used to capture the useful features of wind speed fluctuations and determine the optimal inputs of the prediction models. (4) In order to avoid the over-fitting limitation and reduce the influence of outliers, an improved ELM named WRELM is employed as a basic predictor for building the prediction model by using these selected features.
The rest of this paper is organized as follows: Section 2 gives a brief description of SSA, FEEMD, VMD, PACF and WRELM. Section 3 presents the frame work of the proposed model and the different error criteria. Section 4 shows the numerical results obtained from four real datasets. Finally, the conclusions and future researches are summarized in Section 5.

Related Methodology
This study develops a novel hybrid strategy based on three-phase feature extraction technique and weighted regularized extreme learning machine for multi-step ahead wind speed forecasting. Before presenting the hybrid approach, the key components of the proposed model are introduced as follows.

Seasonal Separation Algorithm (SSA)
The Seasonal separation algorithm (SSA) can implement the separation of both season and trend components from seasonal time series [34]. As a climate-driven renewable resource, the seasonal variations and trend variations of wind speed are two most commonly encountered phenomena. As the first step of the TPSD technique, this study firstly employs the SSA to implement the decomposition of raw wind speed signal for both season and trend components and capture the seasonal features of wind speed fluctuations. The concrete process of the algorithm can be described as follows [35].
Assuming that x t denotes the wind speed at time t, t ∈ {1, 2, . . . , T}, and S j and Tr t represent the seasonal and trend components, respectively. Then Then, the seasonal index S j can be obtained by Because the trend component Tr t is unknown, it is need to be approximate by the average of x i in each cycle.
Assuming that T = l × m, and m and l denote the number of cycles and the number of data items in each cycle, respectively. Then, the dataset {x 1 , x 2 , . . . , x T } can be expressed as x 11 , . . . , x 1j , . . . , x 1l , x 2l , . . . , x 2j , . . . , x 2l , . . . , x k1 , . . . , x kl , . . . , x m1 , . . . , x mj , . . . , x ml (k = 1, 2, . . . , m; j = 1, 2, . . . , l), where x kj represents the j-th datum of the k-th cycle. Then, the average of the k-th cycle can be derived as follows: If S kj denotes the normalization data for items x kj , then: Then, S j can be defined as follows: This definition of S j conforms to the normalization process and is demonstrated as follows: Then, the trend component can be obtained as follows: Considering the cycle influence of wind speed data, in this paper, l = 24 is as a cycle and m = [T/l].

Fast Ensemble Empirical Mode Decomposition (FEEMD)
As a novel signal processing technology, EMD has been frequently adopted for analyzing nonlinear and stochastic signals [32,33]. Compared with the traditional signal processing techniques such as wavelet transform and Fourier transform, the EMD has better performance in multi-resolution and extensive practicability. However, this method presents a serious drawback of mode mixing. An ensemble EMD named EEMD was developed by Wu and Huang in 2008 for tackling the mode mixing problem [36]. Although the EEMD can effectively alleviate the mode mixing problem, it is time consuming to obtain the ensemble means. A new improved version of EMD called fast ensemble empirical mode decomposition (FEEMD) is proposed for overcoming two main disadvantages including mode mixing and time consuming. The superiority of FEEMD has been demonstrated in many fields [37,38]. The main steps of this algorithm can be summarized as follows: Step 1: Initializing the ensemble number en and the replication times M.
Step 2: Obtaining the noise-added signal x tn (t) by adding the Gaussian white noise n tn (t) to the original signal x(t): x tn (t) = x(t) + n tn (t) (8) where tn (tn = 1, 2, L, . . . , M) denotes the number of trial times, and t is the time script.
Step 3: Using EMD to decompose the noise-added signal x tn (t) into a set of intrinsic mode functions (IMFs) and a residue: where c i,tn (t) and r tn (t) represent the i-th IMF and the residue of the tn-th trial at time t, respectively.
Step 4: Adding different white noises to the original signal and repeat step (2) to step (3) until tn = M.
Step 5: Calculating the final decomposition results by the following equations: where c i (t) represents the ensemble mean of the i-th IMF, and r(t) denotes the ensemble mean of the residue components.
Considering the real empirical data, three important parameters of FEEMD which includethe ensemble number en and the replication times M, are respectively set as 8 and 200 in this paper.

Variational Mode Decomposition (VMD)
As a new signal processing technique, variational mode decomposition (VMD) can decompose a complicated signal into a discrete number of modes with specific sparsity properties [39]. Let x(t) be the original signal at time t, y i denotes the i-th component of signal x(t) by VMD, and cp i represents the corresponding center pulsation of y i . Along with the decomposition process of VMD, a center pulsation cp i can be obtained and each mode y i can be compressed around it. To estimate the bandwidth of each mode y i , three main steps can be considered: (i) using Hilbert transform to obtain unilateral frequency spectrum by calculating each mode y i ; (ii) shifting the frequency spectrum of each mode to baseband by mixing an exponential tuned to the respective estimated center frequency; (iii) estimating the bandwidth of each mode y i by making use of the H 1 Gaussian smoothness of the demodulated signal. Therefore, the decomposition process of VMD can be converted into the following optimization problem [40]: where y i (t) denotes the i-th component of signal at time t, δ(t) and ⊗ represent the Dirac distribution and convolution operator, respectively. Considering the penalty terms and Lagrange multiplier, the above constrained problem can be converted to the unconstrained one which is easier to be calculated. The process can be described as follows: where α and λ denote the balancing parameter of the data-fidelity constraint and Lagrange multiplier, respectively.
The alternate direction method of multipliers (ADMM) can be used to update y i and cp i in two directions, and complete the analysis process of VMD. Therefore, the solutions of y i and cp i can be calculated as follows:ŷ where in denotes the number of iterations,x(cp),ŷ z (cp),λ(cp) andŷ in+1 i (cp) represent the Fourier transforms of x(t), y z (t), λ(t) and y in+1 i (t), respectively.

Partial Autocorrelation Function (PACF)
In time series analysis issues, PACF is widely applied to determine the correlation between the current values and the past values of a time variable. Inspired by this, in this paper, PACF is employed to find the correlation between the current values and the past values of wind speed variable, and determine the input number of these prediction models. The brief introduction of the PACF is described as follows [41,42].
If x(t) (t = 1, 2, K, T) is the wind speed at time t and γ(h) denotes the covariance at lag h, then we can get the estimation valueγ(h) of γ(h) as follows: where x is the average value of time series x(t), T is the data size, and L is the maximum lag. The choice of L depends on the length of the data. In general, L = T/4. If ρ(h) is denoted as autocorrelation function (ACF) at lag h, then we can get the estimation valuê where h = 1, 2, L, L.
To assess the significance of autocorrelation between lags, the confidence intervals have been widely adopted. In this study, the 95% confidence interval is employed to determine the optimal lags of wind speed for all models. The definition can be described as follows: where T is the data size, r + 0.95 and r − 0.95 denote the upper and lower critical values, respectively. Ifβ(h, h) ∈ r − 0.95 , r + 0.95 , then x(t − h) is one of the input variables. Otherwise, it is not.

Extreme Learning Machine (ELM)
As a special kind of ANN, the ELM has the advantages of fast learning speed, high forecasting accuracy and better generalization ability relative to traditional ANNs [43]. Huang et al. have demonstrated that ELM can improve the prediction performance than the other ANN and SVM [44]. It has been successfully applied for time series prediction [45,46]. A standard structure of ELM is demonstrated in Figure 1. For given a dataset with N samples , , where Xi is the input vector and Xi ∈ R n , Yi is the target output vector and Yi ∈ R m , set g(X) as the activation function of hidden layer with Q nodes, and the working principle of the standard ELM can be described as follows [43]: where bq is randomly selected as the bias of the q-th hidden node, Wq is randomly selected as the input weight vector between the q-th hidden node and the input nodes, and Vq is the output weight vector between the q-th hidden node and the output nodes. Equations (18) can be simplified as a linear system:  HV Y (19) where H is the output results of hidden layers and: V is the output weights matrix between hidden layer and output layer and Y is the target output results of output layer and The optimal least squares solution V can be obtained by minimizing the empirical risk: where is the optimal least squares estimates of the output weights matrix V. Therefore, the prediction results of ELM can be expressed as: ˆ= Y HV (21) , where X i is the input vector and X i ∈ R n , Y i is the target output vector and Y i ∈ R m , set g(X) as the activation function of hidden layer with Q nodes, and the working principle of the standard ELM can be described as follows [43]: where b q is randomly selected as the bias of the q-th hidden node, W q is randomly selected as the input weight vector between the q-th hidden node and the input nodes, and V q is the output weight vector between the q-th hidden node and the output nodes. Equations (18) can be simplified as a linear system: where H is the output results of hidden layers and: V is the output weights matrix between hidden layer and output layer and and Y is the target output results of output layer and Y = The optimal least squares solution V can be obtained by minimizing the empirical risk: whereV is the optimal least squares estimates of the output weights matrix V. Therefore, the prediction results of ELM can be expressed as:

Weighted Regularized Extreme Learning Machine (WRELM)
As mentioned above, because the ELM considers only the empirical risk minimization, it still suffers from over-fitting in the modeling process. On the other hand, the prediction performance of ELM is also affected by the outliers in train samples. In order to overcome these limitations of ELM, an improved ELM named WRELM based on the principles of both empirical risk minimization and structural risk minimization simultaneously, is employed to build the wind speed predictor in this study. The mathematical expression of WRELM model is shown as follows [47]: where e i is the error variable W i is the weighing factor, W = diag{W 1 , W 2 , · · · , W N } is the extended form of e 2 2 e = [e 1 , e 2 , · · · , e N ] T . C is a regularization parameter. This following formula can get the optimal solution V for WRELM: where I is a unit matrix.
The prediction results of WRELM can be obtained similar to ELM. The related research has demonstrated that the WRELM can improve the prediction performance than other versions of ELM. The detailed process of model derivation and parameter setting of the WRELM can be found in [47].

The Framework of the Proposed Model
From the upper review, it can be found that these signal decomposition algorithms are often employed to enhance the prediction ability of the proposed models. However, due to the insufficiency of single signal decomposition algorithms for dealing with complex wind speed signal and the low computational efficiency caused by modeling for each subseries obtained from signal decomposition process, there still exist some probabilities for improving the prediction accuracy of these models. In this paper, a novel hybrid strategy based on three-phase signal decomposition (TPSD) technique, feature extraction (FE) and weighted regularized extreme learning machine (WRELM) is developed for improving the multi-step ahead wind speed prediction. In this proposed model, Firstly, a TPSD framework including seasonal separation algorithm (SSA), fast ensemble empirical mode decomposition (FEEMD), and variational mode decomposition (VMD) is utilized to handle the complex and irregular natures of wind speed comprehensively. Then, a feature extraction (FE) process including partial autocorrelation function (PACF) and regression analysis is proposed to capture the useful features of wind speed fluctuations and determine the optimal input features for a prediction model. Finally, an improved extreme learning machine (ELM) named weighted regularized extreme learning machine (WRELM) is established using these selected features to improve the forecasting accuracy and computational efficiency. Figure 2 shows the framework of the developed hybrid model, and the modeling process of the proposed approach can be briefly summarized as follows: (1) Develop a TPSD framework to handle the complex and irregular natures of wind speed signal comprehensively. In the first phase, the SSA is employed to separate the season and trend components of wind speed signal, and capture the seasonal features of wind speed fluctuations.
In the second phase, the FEEMD is applied to decompose the trend component into lots of intrinsic mode functions (IMFs) and a residual with different frequencies. Considering the negative effect of high frequencies IMFs (especially IMF1) on the prediction accuracy, the VMD is utilized to further decompose the high frequency IMF1 into several stationary modes for reducing the non-stationarity of the high frequency signal in the third phase. The full dimensions features of wind speed signal can be obtained by TPSD. (2) Propose a feature extraction process to capture the useful features of wind speed fluctuations and determine the optimal features for a prediction model. The PACF is first applied to find the (1) Develop a TPSD framework to handle the complex and irregular natures of wind speed signal comprehensively. In the first phase, the SSA is employed to separate the season and trend components of wind speed signal, and capture the seasonal features of wind speed fluctuations.
In the second phase, the FEEMD is applied to decompose the trend component into lots of intrinsic mode functions (IMFs) and a residual with different frequencies. Considering the negative effect of high frequencies IMFs (especially IMF1) on the prediction accuracy, the VMD is utilized to further decompose the high frequency IMF1 into several stationary modes for reducing the non-stationarity of the high frequency signal in the third phase. The full dimensions features of wind speed signal can be obtained by TPSD.
(2) Propose a feature extraction process to capture the useful features of wind speed fluctuations and determine the optimal features for a prediction model. The PACF is first applied to find the correlation between the current values and the past values of wind speed variable, and determine the initial features for the prediction model. In order to avoid over-fitting, a linear regression is further applied to select the optimal features for the prediction models. In the modeling process of linear regression, the top 80% of training data is called as the learning set which is applied to calculate the parameters of the model, and the remaining 20% of training data is called validation set which is applied to estimate the performance of the model. If one kind of feature combinations can generate the smallest validation error, then the corresponding feature combination is selected as the optimal input features subset for the prediction model. (3) Use these optimal features to build a WRELM prediction model. Different from the traditional signal decomposition-based prediction models which build a prediction model for each sub-series decomposed from original signal by signal decomposition algorithm, this study only constructs a prediction model using these selected optimal features for saving computation time and improving the prediction accuracy.

Evaluation Criteria
In this study, three error criteria which measure the deviation between the real and forecasting values are utilized to quantitatively evaluate prediction performance of all involved forecasting models. Three error criteria have the three measures including the mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE). In general, the smaller values of these measures indicate that the corresponding model has better prediction performance. These error measures are given as follows: where fn represents the number of forecasting samples, x(t) andx(t) denote the real value and prediction value of wind speed signal at time t, respectively.

Data Collection
Gansu Province in China has abundant wind energy resources due to its particular favorable terrain and the influence of atmospheric circulation. In this study, the mean hourly wind speed data with 24 observation values every day collected from a real wind farm located in Gansu Province was utilized to evaluate the prediction performance of the proposed model. In order to further verify the seasonal adaptability of the proposed model, four prediction cases were randomly selected:  Figure 3 shows the raw wind speed signals in the four cases. As shown in Figure 3, the wind speed has obvious random fluctuations and multi-pattern characteristics. Table 1 describes the statistical analysis results of these wind speed signals. From Table 1, it can be shown that the four datasets have different statistical measures. The basic idea is used to test if the proposed model can be applied for different seasonal prediction cases. Moreover, each case recorded 744 observation values that can be partitioned into both the training set (80%) and the validation set (20%) in the modeling process. The training set is used for building the prediction model, and the validation set is used to test the forecasting performance of the proposed model.
Energies 2018, 11, x FOR PEER REVIEW 12 of 33 modeling process. The training set is used for building the prediction model, and the validation set is used to test the forecasting performance of the proposed model.

Three-Phase Signal Decomposition of Wind Speed Signal
Due to the influence of the complex climate system on wind speed, wind speed fluctuations have complicated multi-pattern characteristics. It is essential to consider and analyze the complicated multi-pattern characteristics of wind speed fluctuations to improve the prediction ability of the models, but most of the existing studies either ignore the complex multi-pattern characteristics or adopt a single signal decomposition algorithm for capturing the complicated wind speed fluctuations. It is a difficult task to further improve the wind speed prediction accuracy using these mentioned methods. Many studies have shown that a best decomposition algorithm for all situations does not exist. In this paper, we make full use of the latest theoretical achievements of signal decomposition algorithms, and develop a novel combination framework including SSA, FEEMD and VMD for dealing comprehensively with the complicated characteristics of wind speed fluctuations. This algorithm first uses SSA to separate the season and trend components of wind speed signal, and capture the seasonal features of wind speed fluctuations. Then, the FEEMD is applied to decompose the trend component into lots of IMFs and a residual with different frequencies. Finally, considering the negative effect of high frequencies IMFs (especially IMF1) on the prediction accuracy, the VMD is utilized to further decompose the high frequency IMF1 into several stationary modes for reducing the non-stationarity of the high frequency signal. Figure 4 shows the SSA results of four original wind speed signals. From Figure 4, it can be shown that each wind speed dataset is separated into the seasonal variations and trend variations.

Three-Phase Signal Decomposition of Wind Speed Signal
Due to the influence of the complex climate system on wind speed, wind speed fluctuations have complicated multi-pattern characteristics. It is essential to consider and analyze the complicated multi-pattern characteristics of wind speed fluctuations to improve the prediction ability of the models, but most of the existing studies either ignore the complex multi-pattern characteristics or adopt a single signal decomposition algorithm for capturing the complicated wind speed fluctuations. It is a difficult task to further improve the wind speed prediction accuracy using these mentioned methods. Many studies have shown that a best decomposition algorithm for all situations does not exist. In this paper, we make full use of the latest theoretical achievements of signal decomposition algorithms, and develop a novel combination framework including SSA, FEEMD and VMD for dealing comprehensively with the complicated characteristics of wind speed fluctuations. This algorithm first uses SSA to separate the season and trend components of wind speed signal, and capture the seasonal features of wind speed fluctuations. Then, the FEEMD is applied to decompose the trend component into lots of IMFs and a residual with different frequencies. Finally, considering the negative effect of high frequencies IMFs (especially IMF1) on the prediction accuracy, the VMD is utilized to further decompose the high frequency IMF1 into several stationary modes for reducing the non-stationarity of the high frequency signal. Figure 4 shows the SSA results of four original wind speed signals. From Figure 4, it can be shown that each wind speed dataset is separated into the seasonal variations and trend variations. The seasonal features of each case are shown in Table 2, where the seasonal features of the four cases have the different characteristics because of the influence of the climate system on wind speed. The seasonal features of each case are shown in Table 2, where the seasonal features of the four cases have the different characteristics because of the influence of the climate system on wind speed.   Figure 5 shows the decomposed results of trend components using FEEMD in the four cases. From Figure 5, it can be shown that the different trend components in four cases are decomposed into a number of IMFs and a residual with different frequencies. The frequencies of IMFs reflected different natural oscillatory modes are ranged from high to low. The residual is the lowest frequency and represents the basic trend of signal. In this paper, each trend component is decomposed into totally 9 components which are respectively named as IMF1, IMF2, IMF3, IMF4, IMF5, IMF6, IMF7, IMF8 and residual. In addition, to further improve the forecasting performance of the model, the VMD is further employed to decompose the high frequency IMF1 into several stationary modes for   Figure 5 shows the decomposed results of trend components using FEEMD in the four cases. From Figure 5, it can be shown that the different trend components in four cases are decomposed into a number of IMFs and a residual with different frequencies. The frequencies of IMFs reflected different natural oscillatory modes are ranged from high to low. The residual is the lowest frequency and represents the basic trend of signal. In this paper, each trend component is decomposed into totally 9 components which are respectively named as IMF1, IMF2, IMF3, IMF4, IMF5, IMF6, IMF7, IMF8 and residual. In addition, to further improve the forecasting performance of the model, the VMD is further employed to decompose the high frequency IMF1 into several stationary modes for reducing the non-stationarity of the high frequency signal. Figure 6 shows the VMD results of the high frequency IMF1 in four cases. From Figure 6, it can be shown that each IMF1 of four cases is decomposed in total into three components which are respectively named as Mode1, Mode2, and Mode3. The complicated multi-patterns features of wind speed change have been decomposed thoroughly by the above three-phase signal decomposition technique. In next section, a feature selection process is employed to capture the useful features of wind speed fluctuations and determine the optimal features for a prediction model. reducing the non-stationarity of the high frequency signal. Figure 6 shows the VMD results of the high frequency IMF1 in four cases. From Figure 6, it can be shown that each IMF1 of four cases is decomposed in total into three components which are respectively named as Mode1, Mode2, and Mode3. The complicated multi-patterns features of wind speed change have been decomposed thoroughly by the above three-phase signal decomposition technique. In next section, a feature selection process is employed to capture the useful features of wind speed fluctuations and determine the optimal features for a prediction model.

The Initial Feature Selection Process of Training Samples Using PACF
The input nodes number of the model has to be determined from data signal before training the prediction model. In this study, the PACF is first utilized to find the correlation between the current values and the past values of wind speed variable, and determine the input nodes number of the model. Figure 7 shows the plot of PACF against the lag length in the four trend components, respectively. As shown in Figure 7, the PACF graph shows different cutoff phenomena in the four cases. In spring and winter, the PACF shows a cutoff phenomenon after lag 2. In summer and fall, the PACF shows a cutoff phenomenon after lag 3. Table 3 shows the input nodes number of the predictor. According to the PACF results of four cases, the training samples of different cases can be constructed from all the subseries and the corresponding trend component. For instance, the trend component in Spring is decomposed into totally 11 components by FEEMD and VMD which are respectively named as IMF2, IMF3, IMF4, IMF5, IMF6, IMF7, IMF8, residual, Mode1, Mode2, and Mode3, and the input nodes number is identified as 2 by PACF, then there are the 22 initial features of the training samples and can be shown in Table 4. Similarly, the initial features of the training samples in other cases can also be shown in Table 4. The input nodes number of the model has to be determined from data signal before training the prediction model. In this study, the PACF is first utilized to find the correlation between the current values and the past values of wind speed variable, and determine the input nodes number of the model. Figure 7 shows the plot of PACF against the lag length in the four trend components, respectively. As shown in Figure 7, the PACF graph shows different cutoff phenomena in the four cases. In spring and winter, the PACF shows a cutoff phenomenon after lag 2. In summer and fall, the PACF shows a cutoff phenomenon after lag 3. Table 3 shows the input nodes number of the predictor. According to the PACF results of four cases, the training samples of different cases can be constructed from all the subseries and the corresponding trend component. For instance, the trend component in Spring is decomposed into totally 11 components by FEEMD and VMD which are respectively named as IMF2, IMF3, IMF4, IMF5, IMF6, IMF7, IMF8, residual, Mode1, Mode2, and Mode3, and the input nodes number is identified as 2 by PACF, then there are the 22 initial features of the training samples and can be shown in Table 4. Similarly, the initial features of the training samples in other cases can also be shown in Table 4.

Cases Initial Features
Spring

The Optimal Feature Selection Process of Training Samples Using Regression Analysis
In order to select the optimal feature combination and avoid over-fitting, a linear regression is further applied to select the optimal features of training samples in four cases. In the modeling process of linear regression, the top 80% of training data is called as the learning set which is applied to calculate the parameters of the model, and the remaining 20% of training data is called validation set which is applied to estimate the performance of the model. A simple feature selection process which adds the more recent feature to the less recent feature one by one is adopted in this study. If one kind of feature combinations can generate the smallest validation error, then the corresponding feature combination is selected as the optimal features subset of training samples.
Here, RMSE is selected as the validation error, and Figure 8 shows the validation error against the feature number in the four cases. As is shown in Figure 8, the best feature numbers are 14, 19, 17, and 16 in the four cases, respectively. The optimal feature combinations of training samples in the four cases are summarized in Table 5.

The Optimal Feature Selection Process of Training Samples Using Regression Analysis
In order to select the optimal feature combination and avoid over-fitting, a linear regression is further applied to select the optimal features of training samples in four cases. In the modeling process of linear regression, the top 80% of training data is called as the learning set which is applied to calculate the parameters of the model, and the remaining 20% of training data is called validation set which is applied to estimate the performance of the model. A simple feature selection process which adds the more recent feature to the less recent feature one by one is adopted in this study. If one kind of feature combinations can generate the smallest validation error, then the corresponding feature combination is selected as the optimal features subset of training samples.
Here, RMSE is selected as the validation error, and Figure 8 shows the validation error against the feature number in the four cases. As is shown in Figure 8, the best feature numbers are 14, 19, 17, and 16 in the four cases, respectively. The optimal feature combinations of training samples in the four cases are summarized in Table 5.

The Prediction Results of Original Wind Speed Signal
As a novel machine learning algorithm, ELM has the advantages of fast learning speed, high forecasting accuracy and better generalization ability relative to traditional single-hidden layer feed-forward neural networks, and has been successfully applied in the field of time series prediction. However, because the ELM considers only the empirical risk minimization, it still suffers from over-fitting in the modeling process. On the other hand, the prediction performance of ELM is also affected by the outliers in train samples. In order to overcome these limitations of ELM, an improved ELM named WRELM based on the principles of both empirical risk minimization and structural risk minimization simultaneously, is employed to build the wind speed predictor in this study. Different from the traditional signal decomposition-based prediction models which build a prediction model for each sub-series decomposed from original signal by signal decomposition algorithm, this study only constructs a prediction model using these selected optimal features for improving the prediction accuracy. Figure 9 shows the multi-step ahead prediction results of WRELM model for trend components in four cases. From Figure 9, it can be seen that the WRELM model can capture the complicated features of trend component fluctuations from one-step ahead forecasting to three-step ahead forecasting. Finally, the prediction results of original wind speed can be calculated by aggregating the seasonal features to the prediction values of trend component. Figure 10 shows the multi-step ahead prediction results of original wind speed in four cases. Similarly, from Figure 10, it can be also shown that the prediction values of each original wind speed in four cases can capture the main trend of each corresponding original wind speed fluctuations.

The Prediction Results of Original Wind Speed Signal
As a novel machine learning algorithm, ELM has the advantages of fast learning speed, high forecasting accuracy and better generalization ability relative to traditional single-hidden layer feed-forward neural networks, and has been successfully applied in the field of time series prediction. However, because the ELM considers only the empirical risk minimization, it still suffers from over-fitting in the modeling process. On the other hand, the prediction performance of ELM is also affected by the outliers in train samples. In order to overcome these limitations of ELM, an improved ELM named WRELM based on the principles of both empirical risk minimization and structural risk minimization simultaneously, is employed to build the wind speed predictor in this study. Different from the traditional signal decomposition-based prediction models which build a prediction model for each sub-series decomposed from original signal by signal decomposition algorithm, this study only constructs a prediction model using these selected optimal features for improving the prediction accuracy. Figure 9 shows the multi-step ahead prediction results of WRELM model for trend components in four cases. From Figure 9, it can be seen that the WRELM model can capture the complicated features of trend component fluctuations from one-step ahead forecasting to three-step ahead forecasting. Finally, the prediction results of original wind speed can be calculated by aggregating the seasonal features to the prediction values of trend component. Figure 10 shows the multi-step ahead prediction results of original wind speed in four cases. Similarly, from Figure 10, it can be also shown that the prediction values of each original wind speed in four cases can capture the main trend of each corresponding original wind speed fluctuations.

Model Comparisons
In order to comprehensively evaluate the effectiveness of the proposed feature extraction method (FEM)-based prediction approach called FEM-SFVW model, a detailed comparative study is conducted for multi-step ahead wind speed forecasting in this section. Three kinds of models  Table 6 shows the comparison of multi-step ahead prediction performances of different models in spring. The smallest value of each column is marked as boldface in Table 6. As shown in Table 6, compared with these all benchmark models, the proposed model has the smallest error criteria in horizons of one-step, two-step and three-step ahead prediction. Tables 7-9 show the comparisons of multi-step ahead prediction performances of different models in other cases. A similar conclusion is deduced in Tables 7-9. To present the comparison more intuitively,  show the histograms of four cases based on the values of MAE, RMSE and MAPE of different models.

Model Comparisons
In order to comprehensively evaluate the effectiveness of the proposed feature extraction method (FEM)-based prediction approach called FEM-SFVW model, a detailed comparative study is conducted for multi-step ahead wind speed forecasting in this section. Three kinds of models including the single models ( Table 6 shows the comparison of multi-step ahead prediction performances of different models in spring. The smallest value of each column is marked as boldface in Table 6. As shown in Table 6, compared with these all benchmark models, the proposed model has the smallest error criteria in horizons of one-step, two-step and three-step ahead prediction. Tables 7-9 show the comparisons of multi-step ahead prediction performances of different models in other cases. A similar conclusion is deduced in Tables 7-9. To present the comparison more intuitively, Figures 11-14 show the histograms of four cases based on the values of MAE, RMSE and MAPE of different models.             From Figures 11-14, it can be shown that the proposed model has the smallest error criteria compared with other benchmark models. In a word, it is concluded that the proposed model can improve the prediction performance of multi-step ahead wind speed and is superior to all the benchmark models.
To further assess the influence of TPSD technique, FE process and WRELM on the proposed hybrid model, five experiments are designed as follows. Experiment I is designed for proving the advantages of WRELM, and three comparisons are conducted including ELM vs. BP, WRELM vs. BP, and WRELM vs. ELM. The comparison results of the experiment I in four cases are shown in Table 10.
From Table 10, it can be shown that the WRELM model has the smallest multi-step ahead prediction errors compared with BP and ELM. Therefore, we adopt WRELM as the basic predictor for wind speed forecasting in this study. Experiment II is designed to assess the influence of the FE process on the proposed hybrid model, and three comparisons are conducted including FEM-SFVB vs. DPA-SFVB, FEM-SFVE vs. DPA-SFVE, and FEM-SFVW vs. DPA-SFVW. The comparison results of the experiment II in four cases are also shown in Table 10, where it can be seen that the FE process can improve the prediction performance of multi-step ahead wind speed forecasting.
In order to assess the influence of the different signal decomposition algorithms on the proposed hybrid model, three experiments including experiment III, experiment IV and experiment V are designed. Experiment III is designed to evaluate that if the three-phase signal decomposition technique is better than two-phase signal decomposition technique, and nine comparisons are conducted including FEM-SFVB vs.  Table 11 shows the comparison results of the three experiments in spring over different horizons including one-step, two-step and three-step ahead wind speed prediction. From Table 11, it can be shown that the three-phase signal decomposition technique is better than two-phase signal decomposition technique, the two-phase signal decomposition technique is better than single signal decomposition technique, and the single signal decomposition technique is better than the no decomposition process. In a word, the multi-phase signal decomposition algorithms can effectively decrease the three prediction errors including MAE, RMSE and MAPE compared with the single models (BP, ELM and WRELM) in different prediction horizons. Similarly, Tables 12-14 show the comparison results of the three experiments in other three cases over different horizons including one-step, two-step and three-step ahead wind speed prediction. A similar conclusion is deduced in Tables 12-14.

Conclusions
Accurate wind speed prediction is beneficial for the exploitation and utilization of wind power. This study develops a novel hybrid strategy based on three-phase signal decomposition (TPSD) technique, feature extraction (FE) and weighted regularized extreme learning machine (WRELM) for multi-step ahead wind speed prediction. Firstly, a TPSD framework including seasonal separation algorithm (SSA), fast ensemble empirical mode decomposition (FEEMD), and variational mode decomposition (VMD) is developed for the first time to comprehensively handle the complex and irregular nature of wind speed. In the first phase, the original wind speed signal can be separated into season and trend components by SSA. In the second phase, the trend component can be decomposed into a number of intrinsic mode functions (IMFs) and a residual with different frequencies.
For reducing the non-stationarity of the high frequency signal, the highest frequency IMF1 can be further decomposed into several stationary modes in the third phase. Secondly, a feature extraction (FE) process including partial autocorrelation function (PACF) and regression analysis is proposed to capture the useful features of wind speed fluctuations and determine the optimal input features for a prediction model. Then, an improved extreme learning machine (ELM) named weighted regularized extreme learning machine (WRELM) is established using these selected features, and the prediction results of wind speed can be calculated by WRELM. Finally, four real wind speed prediction cases are used to evaluate the proposed model, experimental results show that: (1) both the TPSD technique and feature extraction can improve the prediction performance for wind speed; (2) the novel prediction framework which only establishes a prediction model using these selected features from all different subseries can increase the prediction accuracy; (3) the proposed model has the best prediction performance compared with the benchmark models.
Author Contributions: Jujie Wang and Yanfeng Wang conceived and designed the experiments; Yaning Li performed the experiments; Jujie Wang analyzed the data; Yaning Li contributed reagents/materials/analysis tools; Yanfeng Wang wrote the paper.

Conflicts of Interest:
The authors declare no conflict of interest.