Short-Term Wind Speed Forecasting Based on Low Redundancy Feature Selection

Wind speed forecasting is an indispensable part of wind energy assessment and power system scheduling. Existing wind speed forecasting models suffer from an excessively high input feature dimension, weak pertinence of the model to the forecast period, and a lack of consideration of the redundancy between features. To address these problems, a short-term wind speed forecasting method based on low redundancy feature selection is proposed. Firstly, complementary ensemble empirical mode decomposition (CEEMD) is used to pretreat the wind speed data to reduce their randomness and fluctuation. Secondly, conditional mutual information (CMI) is used to analyze the correlation between the input features and the wind speed series for different predicted days. The feature order based on CMI is used to reduce the redundancy between candidate features and to establish candidate feature subsets. After that, for the candidate feature subsets of each predicted day, the outlier-robust extreme learning machine (ORELM) is used to carry out forward feature selection and obtain the optimal feature subset for that day. Finally, the optimal prediction model is constructed with the optimal feature subset, and short-term wind speed forecasting is carried out. The validity and superiority of the new method are verified by comparison experiments on measured data.


Introduction
Wind energy is one of the renewable energy sources that could replace fossil fuels, and it has grown rapidly in the past decades. By the end of 2015, the cumulative installed capacity of wind energy around the world reached 432.9 GW [1,2]. Large-scale integration of wind energy into the power grid has brought operational problems because of the randomness and volatility of wind power generation [3,4]. High-precision wind power prediction is one of the solutions for optimizing power reserves, which are used to balance the fluctuations of wind power. With accurate wind power forecasting, the stability of power system operation and the grid's capacity to accommodate wind power can be improved [5,6].
Wind speed forecasting is one of the basic components of wind power forecasting. Forecasting wind speed 30 min to 6 h ahead is classified as short-term wind speed forecasting [7]; it meets the needs of power producers in managing grid operations and reduces the negative impact of wind power volatility on power grid operation [8]. The construction of existing wind speed forecasting models can generally be divided into three steps: data preprocessing, feature selection and optimal model construction.
The collected original wind speed data have strong randomness and volatility, and the outliers they contain can degrade the training of the prediction model. Existing research uses signal processing to preprocess the original wind speed data, which reduces the effect of outlier wind speed data on the prediction model [9]. Among the signal processing methods currently applied to wind speed data preprocessing, Empirical Mode Decomposition (EMD) has remarkable self-adaptability and is suitable for processing nonlinear data, but it is prone to mode mixing [10]. To solve the mode-mixing problem, Ensemble Empirical Mode Decomposition (EEMD) adds white noise to the original signal, but the decomposition result is then contaminated by residual noise components [11]. Complementary Ensemble Empirical Mode Decomposition (CEEMD) is an improvement on EEMD that addresses this defect by adding two groups of noise signals with opposite phases, so that the noise components cancel in the decomposition result [12]. Therefore, CEEMD can effectively reduce the randomness and volatility of wind speed data without contaminating the wind speed signal with noise components.
Feature selection is an effective approach to reducing the feature dimension of wind speed forecasting models, and it can directly improve prediction accuracy. By introducing as many meteorological factors as possible, the prediction model can reflect, to a certain extent, the effect of complex external conditions on wind speed. However, too many input features also greatly complicate the model, and high redundancy between input features reduces the accuracy and efficiency of the prediction model. To reduce this complexity, existing studies choose the optimal feature subset through a feature selection process [13]. Among the existing feature selection methods, filter methods use predefined evaluation criteria to evaluate the importance of features for prediction or classification. On this basis, forward or backward feature selection is carried out along a feature order obtained by a correlation analysis method. Filter methods are fast and computationally cheap, which gives them strong advantages in engineering applications [14].
The relevance and redundancy between features directly affect the result of filter feature selection methods. Among correlation analysis methods, Mutual Information (MI) can measure the correlation between a feature subset and the prediction target, but MI does not account for redundant features inside the feature subset [15]. When analyzing the relevance between a feature and the predicted object, Conditional Mutual Information (CMI) also considers the redundancy between that feature and the already selected features, so CMI keeps the redundancy within the feature subset low while ensuring that the subset is strongly relevant to the predicted object [16]. With the feature order obtained by CMI, filter feature selection can effectively improve prediction accuracy. Meanwhile, existing studies usually analyze the relationship between wind speed and related features based on annual historical data, without considering how the correlation between wind speed and complex meteorological factors changes across different periods of a year; such analyses cannot fully meet the needs of wind speed forecasting in different periods [17].
Methods for constructing wind speed predictors can be divided into physical, statistical, intelligent and hybrid methods [18]. Statistical methods only need historical data to establish the mapping between the input features and the wind speed time series, and then make predictions through this mapping [19]. Representative statistical methods include the Kalman filter [20] and autoregressive moving average (ARMA) methods [21]. Intelligent methods can mine the latent relationship between the input features and the wind speed time series from historical data, and they are better suited than traditional statistical methods to complicated relationships [22]. Intelligent methods include artificial neural networks [23] and Recurrent Neural Networks (RNNs) [24]. Artificial neural network methods [23–25] have advantages in constructing nonlinear prediction models; they have excellent generalization performance and are suitable for very short-term and short-term wind speed forecasting [26].
Among artificial neural network methods, the Extreme Learning Machine (ELM) offers extremely fast learning and good generalization compared with traditional neural networks, but it easily falls into local optima [27]. The Outlier-Robust Extreme Learning Machine (ORELM) improves the generalization ability of ELM by introducing a regularization parameter and an ℓ1-norm training loss, and it is therefore better suited to forecasting wind speed, which is highly random [28].
To address the deficiencies of existing methods, a multi-step prediction method based on low redundancy feature selection is proposed. Firstly, the CEEMD method is used to pretreat the wind speed data of the training set. Then, low redundancy forward feature selection is conducted based on ORELM and the feature importance order obtained by CMI. Finally, the optimal short-term wind speed forecasting model is constructed with the optimal feature subset to predict wind speed in a specific period. The feasibility and effectiveness of the new method are demonstrated with measured data from the US National Wind Technology Center (NWTC).

Structure and Methodology of the New Hybrid Model
The training set data are pretreated by CEEMD to reduce the effect of outliers on the prediction model. CMI is used to reduce the redundancy of the feature selection results. Finally, an ORELM predictor is constructed with the low redundancy optimal feature subset to improve the generalization ability of the predictor.

Complementary Ensemble Empirical Mode Decomposition (CEEMD)
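CEEMD is an improvement on EMD and EEMD. For a signal $S(t)$, EMD proceeds as follows [29]:
(a) Identify all local maxima and minima of $S(t)$.
(b) Connect the local maxima with a cubic spline to form the upper envelope $e_{\max}(t)$, and the local minima to form the lower envelope $e_{\min}(t)$.
(c) Compute the envelope mean $m_1(t) = [e_{\max}(t) + e_{\min}(t)]/2$ and the component $h_1(t) = S(t) - m_1(t)$.
(d) Repeat steps (a)–(c) on $h_1$ until the numbers of extrema and zero crossings are equal or differ by at most one, and the mean of $e_{\max}$ and $e_{\min}$ is zero. The resulting signal is the first Intrinsic Mode Function, $\mathrm{IMF}_1$.
(e) Subtract $\mathrm{IMF}_1$ from $S(t)$ and repeat the above iterations on the remainder until it can no longer be decomposed; the final remainder is the residual function.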
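With the assistance of added noise, EEMD compensates for the tendency of EMD toward mode mixing by exploiting the uniform spectral distribution of white noise [11]. However, the residual noise left in the reconstructed signal is difficult to eliminate, which reduces the efficiency of EEMD decomposition. CEEMD makes the following improvements on EEMD:
(a) Add pairs of white noise signals with equal amplitude and opposite signs, $N_j(t)$ and $-N_j(t)$, to the original signal $S(t)$, producing two groups of noisy signals $S(t) + N_j(t)$ and $S(t) - N_j(t)$.
(b) Decompose each noisy signal with EMD and average the corresponding IMFs over all of them; because each noise signal is paired with its negative, the added noise cancels in the ensemble average.
(c) Repeat the above steps on the data from which the extracted IMFs have been removed, until the signal cannot be decomposed further. The remainder function $r_n(t)$ is the residual of the signal, and the final decomposition result of CEEMD is:
$$S(t)=\sum_{i=1}^{n}\mathrm{IMF}_i(t)+r_n(t)$$
CEEMD thus overcomes the tendency of EMD toward mode mixing and, through complementary noise pairs, eliminates the residual white-noise effect introduced by EEMD [30].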
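The EMD sifting procedure and the complementary noise pairing of CEEMD described above can be illustrated with a short NumPy/SciPy sketch. It is not the authors' Matlab implementation: the end-point handling (pinning both envelopes to the first and last samples), the simplified stopping rule (envelope mean small relative to the component), the fixed number of IMFs per run and the conversion of the SNR in dB into a noise standard deviation are all assumptions made to keep the example short. The default of 250 noise pairs (500 EMD runs) is only one possible reading of the 500 iterations mentioned in the preprocessing step.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def envelope_mean(t, h):
    """Mean of the cubic-spline envelopes through the local maxima and minima."""
    d = np.diff(h)
    imax = np.flatnonzero((d[:-1] > 0) & (d[1:] <= 0)) + 1  # interior maxima
    imin = np.flatnonzero((d[:-1] < 0) & (d[1:] >= 0)) + 1  # interior minima
    if imax.size < 2 or imin.size < 2:
        return None  # too few extrema: h cannot be decomposed further
    # Simplified boundary handling (assumption): pin envelopes to the end samples.
    imax = np.r_[0, imax, len(h) - 1]
    imin = np.r_[0, imin, len(h) - 1]
    e_max = CubicSpline(t[imax], h[imax])(t)
    e_min = CubicSpline(t[imin], h[imin])(t)
    return 0.5 * (e_max + e_min)


def emd(s, n_imfs, max_sift=50, tol=1e-3):
    """Extract up to n_imfs IMFs; rows that could not be extracted stay zero."""
    t = np.arange(len(s), dtype=float)
    r = np.asarray(s, dtype=float).copy()
    imfs = np.zeros((n_imfs, len(r)))
    for k in range(n_imfs):
        if envelope_mean(t, r) is None:
            break  # the remainder is the residual function
        h = r.copy()
        for _ in range(max_sift):  # step (d): repeat sifting on h
            m = envelope_mean(t, h)
            if m is None:
                break
            h = h - m
            # Simplified stopping rule: the envelope mean is close to zero.
            if np.mean(m ** 2) < tol * np.mean(h ** 2):
                break
        imfs[k] = h
        r = r - h  # step (e): remove the IMF and continue on the remainder
    return imfs, r


def ceemd(s, n_imfs=8, n_pairs=250, snr_db=5.0, seed=0):
    """CEEMD sketch: EMD of s + noise and s - noise, averaged over all runs."""
    rng = np.random.default_rng(seed)
    s = np.asarray(s, dtype=float)
    # Noise std from an SNR in dB relative to the signal power (assumption).
    noise_std = np.std(s) * 10.0 ** (-snr_db / 20.0)
    imf_sum = np.zeros((n_imfs, len(s)))
    res_sum = np.zeros(len(s))
    for _ in range(n_pairs):
        noise = noise_std * rng.standard_normal(len(s))
        for signed in (noise, -noise):  # complementary (opposite-phase) pair
            imfs, r = emd(s + signed, n_imfs)
            imf_sum += imfs
            res_sum += r
    # Every +noise run is matched by a -noise run, so the noise cancels on average.
    return imf_sum / (2 * n_pairs), res_sum / (2 * n_pairs)
```

Because the averaged IMFs plus the averaged residual equal the average of the noisy inputs, and the paired noise sums to zero, `imfs.sum(axis=0) + residual` reproduces the original signal; with one-sided noise (plain EEMD) this would only hold approximately.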

Conditional Mutual Information (CMI)
The CMI method measures the correlation between a candidate feature and the predicted target while keeping the redundancy between that feature and the already selected features as low as possible. The MI method uses probability density functions to define the correlation between variables $X$ and $Y$ as follows [31]:
$$I(X;Y)=\iint P(x,y)\log\frac{P(x,y)}{P(x)\,P(y)}\,\mathrm{d}x\,\mathrm{d}y$$
where $P(x)$ and $P(y)$ are the marginal probability density functions of $X$ and $Y$, respectively, and $P(x,y)$ is their joint probability density function.
The larger the MI value of a feature, the stronger its correlation with the predicted target [16].
Given a discrete random variable $Z$, the CMI between $X$ and $Y$ is written $I(X;Y\mid Z)$. In wind speed forecasting, let the original feature set be $V$ and take an already selected feature $V_j$ as the given condition. The CMI between the target variable $C$ and a candidate feature $V_i$ is then:
$$I(C;V_i\mid V_j)=I(C;V_i)-I(C;V_i;V_j)$$
where $I(C;V_i)$ is the MI between the target variable $C$ and the candidate feature $V_i$ (the relevance of the feature), and $I(C;V_i;V_j)$ is the information that $V_i$ and $V_j$ share about the target variable (the redundancy between the features).
In summary, on top of the MI between features and the target variable, the CMI method uses the redundancy between features as an additional evaluation criterion. It evaluates not only the contribution of each feature to prediction accuracy but also ensures low redundancy in the resulting feature order. Therefore, CMI reduces the effect of redundant information between features on the feature selection results.
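A minimal sketch of how MI, CMI and a CMI-based feature order could be computed on sampled data. Two choices here are assumptions rather than details given in the text: continuous variables are discretized by equal-frequency binning before plug-in entropy estimation, and the order is built greedily with the conditional mutual information maximization (CMIM) rule, which scores each candidate $V_i$ by its smallest $I(C;V_i\mid V_j)$ over the already selected features $V_j$. The function names are illustrative.

```python
import numpy as np


def discretize(x, n_bins=8):
    """Equal-frequency binning into n_bins integer labels (assumed estimator)."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(x, edges)


def entropy(*cols):
    """Joint Shannon entropy (nats) of one or more discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))


def mi(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cmi(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def cmim_order(X, c, n_bins=8):
    """Greedy CMIM ranking of the columns of X with respect to target c."""
    Xd = np.column_stack([discretize(X[:, j], n_bins) for j in range(X.shape[1])])
    cd = discretize(c, n_bins)
    n = Xd.shape[1]
    # score[j] = min over selected s of I(C; V_j | V_s); starts at I(C; V_j).
    score = np.array([mi(cd, Xd[:, j]) for j in range(n)])
    order, remaining = [], set(range(n))
    while remaining:
        j = max(remaining, key=lambda k: score[k])
        order.append(j)
        remaining.remove(j)
        for k in remaining:
            # A candidate that duplicates the selected feature drops to ~0 here.
            score[k] = min(score[k], cmi(cd, Xd[:, k], Xd[:, j]))
    return order
```

The conditioning step is what separates this from a plain MI ranking: an exact copy of an already selected feature has high MI with the target but zero CMI given that feature, so it is pushed down the order in favour of features that add new information.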

Outlier-Robust Extreme Learning Machine (ORELM)
ELM makes predictions by minimizing the training error, which tends to reduce the generalization performance of the model. ORELM introduces a regularization parameter $C$ to improve the generalization ability of ELM. ORELM replaces the $\ell_2$-norm of the training error with the $\ell_1$-norm, converting the objective function from
$$\min_{\beta}\ \|y-H\beta\|_2^2 \tag{6}$$
to
$$\min_{\beta}\ \|e\|_1+\frac{1}{C}\|\beta\|_2^2\quad\text{s.t.}\quad e=y-H\beta \tag{7}$$
where $y$ is the output matrix, $\beta$ is the weight matrix between the hidden layer and the output layer, and $H$ is the hidden layer output matrix. To build the wind speed forecasting model, assume a data set of $N$ training samples $(x_i, y_i)$, $i \in [1, N]$, where $x_i$ is the input vector and $y_i$ is the corresponding output. Let $g(x)$ be the activation function and $L$ the number of hidden layer nodes. The ORELM algorithm proceeds as follows:
(a) Randomly generate the hidden layer node parameters, namely the input weights $w_i$ and the biases (thresholds) $b_i$, $i \in [1, L]$.
(b) Calculate the hidden layer output matrix [32]:
$$H=\begin{bmatrix} g(w_1\cdot x_1+b_1) & \cdots & g(w_L\cdot x_1+b_L)\\ \vdots & \ddots & \vdots\\ g(w_1\cdot x_N+b_1) & \cdots & g(w_L\cdot x_N+b_L)\end{bmatrix}_{N\times L}$$
(c) Initialize the parameters: the penalty coefficient $\mu = 2N/\|y\|_1$, $e_1 = 0$ and the Lagrange multiplier $\lambda_1 = 0$.
(d) Solve this constrained optimization problem with the augmented Lagrange multiplier (ALM) method, executing the following iteration until the loop counter $k$ reaches the maximum number of iterations:
$$\beta_{k+1}=\left(H^{T}H+\frac{2}{C\mu}I\right)^{-1}H^{T}\left(y-e_k+\frac{\lambda_k}{\mu}\right)$$
$$e_{k+1}=\operatorname{shrink}\left(y-H\beta_{k+1}+\frac{\lambda_k}{\mu},\ \frac{1}{\mu}\right)$$
$$\lambda_{k+1}=\lambda_k+\mu\left(y-H\beta_{k+1}-e_{k+1}\right)$$
where $\operatorname{shrink}(a,\kappa)=\operatorname{sign}(a)\max(|a|-\kappa,0)$ is the elementwise soft-thresholding operator.
ORELM avoids the intractable sparse ($\ell_0$-norm) error formulation by converting the ELM objective into a tractable convex relaxation. In addition, the ALM method is used to solve this convex relaxation problem, which strengthens the ability of the prediction model to handle outliers and improves the generalization performance of ELM [28].
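A compact NumPy sketch of ORELM steps (a)–(d). Assumptions not stated in the text: a sigmoid activation $g$, input weights and biases drawn uniformly from [−1, 1], the regularization parameter entering as a $1/C$ weight on $\|\beta\|_2^2$ (the usual ORELM formulation), and a fixed iteration count. The class name `ORELM` is illustrative; this is not the authors' Matlab code.

```python
import numpy as np


class ORELM:
    """Minimal ORELM sketch: random sigmoid hidden layer + l1-loss ALM solver."""

    def __init__(self, n_hidden=50, C=2.0 ** -10, max_iter=100, seed=0):
        self.n_hidden = n_hidden
        self.C = C
        self.max_iter = max_iter
        self.seed = seed

    def _hidden(self, X):
        # Step (b): H[i, j] = g(w_j . x_i + b_j) with a sigmoid g (assumption).
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float).reshape(len(X), -1)
        rng = np.random.default_rng(self.seed)
        # Step (a): random input weights w_j and biases b_j.
        self.W = rng.uniform(-1.0, 1.0, (X.shape[1], self.n_hidden))
        self.b = rng.uniform(-1.0, 1.0, self.n_hidden)
        H = self._hidden(X)
        # Step (c): mu = 2N / ||y||_1, e = 0, lambda = 0.
        mu = 2.0 * len(X) / np.abs(y).sum()
        e = np.zeros_like(y)
        lam = np.zeros_like(y)
        A = H.T @ H + (2.0 / (self.C * mu)) * np.eye(self.n_hidden)
        # Step (d): ALM iterations for min ||e||_1 + ||beta||^2 / C, e = y - H beta.
        for _ in range(self.max_iter):
            beta = np.linalg.solve(A, H.T @ (y - e + lam / mu))
            r = y - H @ beta + lam / mu
            e = np.sign(r) * np.maximum(np.abs(r) - 1.0 / mu, 0.0)  # soft threshold
            lam = lam + mu * (y - H @ beta - e)
        self.beta = beta
        return self

    def predict(self, X):
        return self._hidden(np.asarray(X, dtype=float)) @ self.beta
```

The first pass of the loop is an ordinary regularized ELM solution. The soft-thresholded error term `e` then gradually absorbs large residuals, so isolated outliers stop pulling the output weights toward them.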

Forecasting Accuracy Evaluations
To evaluate model performance, the Root-Mean-Square Error (RMSE) and the Mean Absolute Percentage Error (MAPE), which are widely used in wind speed forecasting, are adopted as evaluation indices. They are calculated as follows:
$$\mathrm{RMSE}=\sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(\hat{x}_t-x_t\right)^2}$$
$$\mathrm{MAPE}=\frac{1}{T}\sum_{t=1}^{T}\left|\frac{\hat{x}_t-x_t}{x_t}\right|\times 100\%$$
where $\hat{x}_t$ is the predicted value corresponding to the measured wind speed $x_t$, and $T$ is the number of wind speed samples.
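The two indices translate directly into code. A small NumPy sketch, evaluated on made-up toy values rather than data from the paper:

```python
import numpy as np


def rmse(actual, predicted):
    """Root-mean-square error, in the units of the data (m/s for wind speed)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


def mape(actual, predicted):
    """Mean absolute percentage error in %; undefined if any actual value is 0."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs((predicted - actual) / actual)))


# Toy values (illustrative only).
x = [4.0, 5.0, 8.0]
x_hat = [5.0, 5.0, 6.0]
print(rmse(x, x_hat))  # sqrt((1 + 0 + 4) / 3), about 1.291
print(mape(x, x_hat))  # (25 + 0 + 25) / 3, about 16.667
```

RMSE penalizes large errors more heavily, while MAPE normalizes each error by the measured wind speed, so errors at low wind speeds weigh more in MAPE.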

The Features of the Input Set
Wind speed has different characteristics in different periods of a year and is affected by surface roughness and air density. Wind shear (1/s) reflects the surface roughness around the wind tower and the rate of change of wind velocity with height [33]. Temperature (°C), pressure (mBar) and humidity influence air density and can therefore affect wind speed [34]. Therefore, when building the original feature set, the basic features include the wind speed $x_t$ (m/s), temperature $T_t$ (°C), relative humidity $R_t$ (%), absolute humidity $S_t$ (g/kg), atmospheric pressure $P_t$ (mBar) and wind shear $C_t$ (1/s) at 16 historical time points (15 min sampling interval) up to and including the prediction time point $t$, giving 6 × 16 = 96 historical features. In addition, the maximum, minimum, mean, standard deviation (std) and variance (var) of each basic feature are computed as supplementary statistical features (6 × 5 = 30), so the final feature dimension is 126. The original historical features and their feature numbers are shown in Table 1, and the original statistical features and their feature numbers are shown in Table 2.
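The 126-dimensional input vector can be sketched as follows. Several details are assumptions: the variable order (x, T, R, S, P, C), the statistic order (max, min, mean, std, var), computing the statistics over the same 16-point window, and using the population variance. This layout is chosen because it places the standard deviation of absolute humidity at number 115 and the variance of air pressure at number 121, the feature numbers cited in the feature correlation analysis. `build_features` is a hypothetical helper.

```python
import numpy as np

# Wind speed, temperature, relative humidity, absolute humidity, pressure, wind shear.
VARIABLES = ["x", "T", "R", "S", "P", "C"]
N_LAGS = 16  # 16 points at 15-min spacing, including the prediction time point t


def build_features(series, t):
    """Return the 126-dim feature vector for time index t (requires t >= 15)."""
    if t < N_LAGS - 1:
        raise ValueError("need 15 earlier samples before t")
    windows = [np.asarray(series[v], dtype=float)[t - N_LAGS + 1 : t + 1]
               for v in VARIABLES]
    historical = np.concatenate(windows)  # 6 x 16 = 96 values, numbers 1-96
    stats = np.concatenate([[w.max(), w.min(), w.mean(), w.std(), w.var()]
                            for w in windows])  # 6 x 5 = 30 values, numbers 97-126
    return np.concatenate([historical, stats])
```

Feature number n (1-based) corresponds to index n − 1 of the returned array.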
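Table 1. The original historical feature set and the features' serial numbers.

Feature type | Historical features | Numbers
Wind speed | $x_{t-15}, \ldots, x_t$ | 1–16
Temperature | $T_{t-15}, \ldots, T_t$ | 17–32
Relative humidity | $R_{t-15}, \ldots, R_t$ | 33–48
Absolute humidity | $S_{t-15}, \ldots, S_t$ | 49–64
Atmospheric pressure | $P_{t-15}, \ldots, P_t$ | 65–80
Wind shear | $C_{t-15}, \ldots, C_t$ | 81–96

Table 2. The original statistical feature set and the features' serial numbers.

Feature type | Statistical features | Numbers
Wind speed | max, min, mean, std, var | 97–101
Temperature | max, min, mean, std, var | 102–106
Relative humidity | max, min, mean, std, var | 107–111
Absolute humidity | max, min, mean, std, var | 112–116
Atmospheric pressure | max, min, mean, std, var | 117–121
Wind shear | max, min, mean, std, var | 122–126

Data were collected from the M2 wind tower of the National Wind Technology Center (NWTC). The geographical location of the NWTC and the M2 wind tower is shown in Figure 1 [35]. The site is located at 39.91° N and 105.29° W, and wind speed and temperature are measured at a height of 80 m. The test environment is a personal computer with 16 GB of memory and an Intel(R) Core(TM) i7-6820HK processor with an operating frequency of 2.7 GHz. The experimental platform is Matlab (R2016a, MathWorks, Natick, MA, USA). The multi-input multi-output (MIMO) strategy has been reported to be more accurate than the rolling prediction method [7]. Therefore, the new method adopts the MIMO prediction structure: it uses the optimal feature subset obtained by low redundancy forward feature selection to build a four-step wind speed forecasting model (step length 15 min) that meets the requirements of wind speed forecasting in different time periods.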
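The MIMO structure trains one model that outputs all four 15-min steps at once, instead of feeding one-step predictions back in as rolling prediction does. A sketch of how the input/target pairs could be arranged, using only lagged wind speed for brevity (the full method would use the selected feature subset instead). `make_mimo_pairs` is a hypothetical helper, and ordinary least squares stands in for ORELM.

```python
import numpy as np


def make_mimo_pairs(series, n_lags=16, horizon=4):
    """Pair each 16-point input window with the next `horizon` values."""
    s = np.asarray(series, dtype=float)
    X, Y = [], []
    for t in range(n_lags - 1, len(s) - horizon):
        X.append(s[t - n_lags + 1 : t + 1])   # x_{t-15}, ..., x_t
        Y.append(s[t + 1 : t + 1 + horizon])  # x_{t+1}, ..., x_{t+4}: 15-60 min ahead
    return np.array(X), np.array(Y)


# One multi-output model maps every input window to all four steps at once.
series = np.arange(30.0)
X, Y = make_mimo_pairs(series)
W, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least squares as a stand-in predictor
print(X.shape, Y.shape)  # (11, 16) (11, 4)
```

Because all four outputs come from the same forward pass, errors in the first step are not fed back into the inputs for later steps, which is the usual argument for MIMO over rolling prediction.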

Data Preprocessing
CEEMD is used to pretreat the wind speed time series. The signal is decomposed into a series of IMFs, $\mathrm{IMF}_1, \ldots, \mathrm{IMF}_n$. The CEEMD parameters are determined with reference to the existing literature and statistical experiments [7]: white noise with a signal-to-noise ratio of 5 dB is added to the original signal, and the number of iterations is 500. If a high-frequency IMF of the wind speed signal fluctuates repeatedly within a very short time, the mode contains mostly non-cyclical components. Although such a volatile component carries a small amount of wind speed information, the large number of outliers in the mode can greatly affect the accuracy and stability of the prediction model. Therefore, CEEMD is used to decompose the wind speed samples into 8 to 10 IMFs, the highly volatile IMFs are filtered out, and the remaining IMFs are used to reconstruct a new wind speed time series. This reduces the effect of outliers on the prediction model and improves its prediction accuracy.
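The text does not specify how a "highly volatile" IMF is identified. One simple possibility, used here purely as an assumption, is to drop any IMF whose zero-crossing rate exceeds a threshold and to sum the remaining IMFs with the residual. The function name `drop_volatile_imfs` and the 0.2 threshold are hypothetical.

```python
import numpy as np


def drop_volatile_imfs(imfs, residual, max_zcr=0.2):
    """Reconstruct a series from the IMFs whose zero-crossing rate is <= max_zcr."""
    imfs = np.asarray(imfs, dtype=float)
    negative = np.signbit(imfs)
    # Fraction of consecutive sample pairs at which the IMF changes sign.
    zcr = np.mean(negative[:, 1:] != negative[:, :-1], axis=1)
    keep = zcr <= max_zcr
    return imfs[keep].sum(axis=0) + residual, keep


t = np.arange(200)
fast = np.sin(2 * np.pi * t / 4 + 0.3)   # changes sign about every 2 samples
slow = np.sin(2 * np.pi * t / 100 + 0.3)
smoothed, keep = drop_volatile_imfs([fast, slow], residual=np.zeros(200))
print(keep)  # [False  True]
```

In practice the threshold (or an alternative criterion such as dropping a fixed number of the highest-frequency IMFs) would need to be tuned on validation data.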
To compare the pretreatment effect of CEEMD, EEMD and EMD, the wind speed time series of 31 March 2009 is taken as an example. Figure 2 shows the pretreatment results of the three decomposition methods. As shown in Figure 2, the wind speed curve obtained after CEEMD pretreatment has reduced volatility and follows the wind speed trend more accurately than the other methods in the period from 15:45 to 23:45, whereas the wind speed series after EMD and EEMD pretreatment hardly reflect the detailed trend of wind speed.

In addition, to test the effect of pretreatment on wind speed forecasting accuracy, a forecasting experiment is carried out with an ORELM-based predictor. The original feature set is used as the input feature set. The annual wind speed samples of 2008 are used as the training set, and the data from 30 March to 5 April 2009 are used as the test set.
Table 3 shows that forecasting the wind speed sequence obtained after CEEMD pretreatment gives the highest accuracy. Compared with the raw data, RMSE decreases by 0.8437 m/s and MAPE decreases by 13.93%, which shows that CEEMD effectively reduces the effect of outliers on wind speed forecasting and improves prediction accuracy.

Data Set Construction
The correlation between wind speed and meteorological factors differs across the year. To meet the requirements of wind speed forecasting in different time periods, a daily low redundancy wind speed feature selection method is proposed. Taking the predicted day as the target date, the data of the k days before and after the same calendar date in each of the previous four years are selected as the training set, and the correlation between features is analyzed on this data set. Meanwhile, the data of the week before the target date are selected as the validation set. This ensures that the feature selection process yields feature subsets with strong pertinence to the meteorological characteristics of each forecast period, and that the optimal feature set meets the requirements of the target date. Figure 3 shows the data sets for 6 April 2009 as an example.
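The date windows can be sketched with the Python standard library. Reading "the k days before and after the target date in the former four years" as a ±k-day window around the same calendar date in each of the four preceding years is an interpretation, and mapping 29 February to 28 February in non-leap years is an added assumption. `build_date_sets` is a hypothetical helper.

```python
from datetime import date, timedelta


def same_day_in_year(d, year):
    """Same month/day in another year; 29 Feb maps to 28 Feb (assumption)."""
    try:
        return d.replace(year=year)
    except ValueError:
        return d.replace(year=year, day=28)


def build_date_sets(target, k=30, n_years=4, n_val_days=7):
    """Training dates: +/- k days around the target's calendar date in each of the
    previous n_years. Validation dates: the n_val_days before the target date."""
    train = []
    for back in range(n_years, 0, -1):
        centre = same_day_in_year(target, target.year - back)
        train += [centre + timedelta(days=o) for o in range(-k, k + 1)]
    val = [target - timedelta(days=o) for o in range(n_val_days, 0, -1)]
    return train, val


train, val = build_date_sets(date(2009, 4, 6), k=30)
print(len(train), train[0], train[-1])  # 244 2005-03-07 2008-05-06
print(val[0], val[-1])                  # 2009-03-30 2009-04-05
```

With k = 30, each target day gets 4 × 61 = 244 training days drawn from the same season of earlier years, while the validation week tracks the most recent conditions.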
To ensure that the new method meets the needs of wind speed forecasting in different periods of a year, one day is selected randomly from each quarter of 2009 as the prediction target (the test set). To test the generalization performance of the new method, it is used to build an optimal prediction model for every predicted day.
Energies 2018, 11, x FOR PEER REVIEW 8 of 18

Data Set Determination
During training set construction, the parameter k directly affects the prediction accuracy and efficiency of the models. To obtain an appropriate value of k, multi-step prediction models were constructed with different values of k for 364 days (52 weeks) of 2009, and a series of statistical experiments was carried out. Figure 4 shows the weekly average error curves for different values of k. It can be seen from Figure 4 that the training time of the models increases with k, while MAPE improves by only 0.03% as k increases from 30 to 45. To balance the efficiency and accuracy of the new method, k is set to 30.

Analysis of Feature Correlation
To analyze the necessity of feature selection with historical neighborhood data and the advantage of the low redundancy feature set obtained by CMI, the feature correlations on 21 January, 17 April, 18 July and 7 November 2009 were analyzed with the adjacent-day data (AD) and the whole-year data (YD). Figure 5 shows the normalized analysis results of the Pearson Correlation Coefficient (PCC) [36], MI and CMI. Table 4 lists the feature numbers of the 30 most important features.
As can be seen from Figure 5, according to the analysis results on the AD data set, the importance of the same type of feature varies greatly between dates. For example, in the PCC results, the importance of feature $P_t$ on 17 April is significantly higher than in the other periods. Table 4 shows that the same correlation analysis method ranks the same feature differently on different dates, whereas the analysis results on the YD data set cannot reflect these differences.
Table 4. Feature numbers of the most important 30 dimensions (by period of time and analysis method).

Table 4 can be used to compare how well the different correlation analysis methods capture the redundancy between features. Overlap between features constitutes redundant information, and features of the same type are more redundant with one another. As can be seen from Table 4, historical wind speed features ($x_t$) appear most frequently. Among the top 30 features for 17 April, the PCC method selected 15 wind speed features and the MI method selected 14, while the CMI method selected only 11. Meanwhile, the CMI method brought the standard deviation of absolute humidity (feature 115) and the variance of air pressure (feature 121) into the top 30 features, supplementing the information on humidity and air pressure. Similar phenomena occurred in the other periods. This shows that the CMI method improves the information integrity of the feature subset.
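Reading Table 4 requires mapping feature numbers back to variables. A small decoder, assuming the 126 features are numbered as 16 historical values per variable (1–96) followed by five statistics per variable (97–126), in the variable order listed in the input-set description; this layout is an assumption that is consistent with features 115 and 121 being the std of absolute humidity and the variance of air pressure. `describe_feature` and `count_wind_speed_features` are hypothetical helpers.

```python
VARIABLES = ["wind speed", "temperature", "relative humidity",
             "absolute humidity", "atmospheric pressure", "wind shear"]
STATISTICS = ["max", "min", "mean", "std", "var"]
N_LAGS = 16


def describe_feature(number):
    """Map a 1-based feature number (1-126) to (variable, feature kind)."""
    if not 1 <= number <= 126:
        raise ValueError("feature numbers run from 1 to 126")
    i = number - 1
    if i < len(VARIABLES) * N_LAGS:  # features 1-96: historical values
        return VARIABLES[i // N_LAGS], "historical"
    i -= len(VARIABLES) * N_LAGS     # features 97-126: statistics
    return VARIABLES[i // len(STATISTICS)], STATISTICS[i % len(STATISTICS)]


def count_wind_speed_features(numbers):
    """How many features in a ranked list belong to the wind speed category."""
    return sum(describe_feature(n)[0] == "wind speed" for n in numbers)


print(describe_feature(115))  # ('absolute humidity', 'std')
print(describe_feature(121))  # ('atmospheric pressure', 'var')
```

Applied to a top-30 list from Table 4, `count_wind_speed_features` would give the per-method counts of wind speed features discussed above.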

Forward Feature Selection
The feature orders obtained by PCC, MI and CMI are each combined with ORELM, ELM and the Back-Propagation Neural Network (BPNN) [25] for forward feature selection. The parameters of ELM and BPNN are set according to the relevant references [26,27], and the regularization parameter $C$ of ORELM is set to $2^{-10}$ [28].
To compare how correlation analysis with the AD and YD data sets affects the predictive models, the feature orderings obtained from each data set are combined with each predictor. Forward feature selection is carried out for each forecast day using different training sets and the same validation set, and MAPE is used to evaluate the prediction accuracy of each method on each feature subset.
Figure 6 shows the test-set error curves for the four predicted days: 21 January, 17 April, 18 July and 7 November 2009. As shown in Figure 6, the optimal feature subset is the one that minimizes MAPE. Comparing Figure 6a and 6b, the error curves of feature selection with the AD data set converge rapidly and reach the minimum MAPE, whereas the error curves based on the YD data set converge slowly and their minimum MAPE is larger.
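The forward selection loop can be sketched as follows: features are added one at a time in ranked order, a predictor is retrained on each prefix, and the prefix with the lowest validation MAPE is kept (the minimum of an error curve like those in Figure 6). This is a minimal sketch assuming a ranked list such as the CMI ordering and a held-out validation set; `fit_predict_ls` is a hypothetical least-squares stand-in for ORELM, ELM or BPNN, and any callable with the same signature could be passed instead.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes y_true has no zeros)."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


def fit_predict_ls(X_tr, y_tr, X_va):
    """Hypothetical stand-in predictor: ordinary least squares with a bias term."""
    A = np.column_stack([X_tr, np.ones(len(X_tr))])
    w, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([X_va, np.ones(len(X_va))]) @ w


def forward_selection(order, X_tr, y_tr, X_va, y_va, fit_predict=fit_predict_ls):
    """Evaluate every prefix of the ranked list; return the best prefix and the MAPE curve."""
    curve = []
    for k in range(1, len(order) + 1):
        cols = order[:k]
        pred = fit_predict(X_tr[:, cols], y_tr, X_va[:, cols])
        curve.append(mape(y_va, pred))
    best_k = int(np.argmin(curve)) + 1
    return order[:best_k], curve
```

Because only prefixes of the ranking are tried, the search costs one model fit per feature rather than one per subset, which is why the quality of the ordering (and its redundancy) directly determines the subset that is found.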
Tables 5 and 6 show the prediction performance of the predictors built on the different optimal feature subsets. Compared with YD-ORELM, the optimal MAPE of AD-ORELM decreased by 4.8% on average; compared with YD-ELM, the optimal MAPE of AD-ELM decreased by 3.5% on average; and compared with YD-BPNN, the optimal MAPE of AD-BPNN decreased by 3.7% on average. This shows that analyzing feature correlation with AD data can improve the performance of feature selection. Meanwhile, AD-CMI-ORELM achieves the highest prediction accuracy on every predicted day, which preliminarily indicates that CMI reduces the redundancy among the features of the optimal feature subset and thereby improves the prediction accuracy of the models.
Table 7 shows the feature types in the optimal feature subsets obtained by the AD-ORELM method. Combining Tables 5-7, it can be seen that appropriately reducing similar features lowers the redundancy of the feature subsets and also improves prediction accuracy. Meanwhile, adding new types of highly correlated features enhances the information integrity of the feature subset and further reduces the MAPE of the optimal feature subset.
The CMI method accurately controls the redundancy between features on all four predicted days. On the first three predicted days, the feature subsets obtained by CMI introduce the standard deviation of absolute humidity (feature no. 115) and the variance of air pressure (feature no. 121) while keeping redundancy low, which gives the CMI subsets an advantage over those from MI and PCC in information integrity and yields a smaller MAPE. On 7 November, CMI likewise ensured the information integrity of the optimal feature subset, resulting in a lower MAPE than MI and PCC. In summary, using CMI for low-redundancy forward feature selection can effectively improve prediction accuracy while reducing the dimension of the feature subsets.

Predictive Effect and Model Comparison
To verify the effectiveness and advancement of the new method, the optimal feature subsets selected with AD data are combined with different predictors. Meanwhile, to make the comparison more broadly applicable, the Classification and Regression Tree [37] (CART), which automatically performs feature selection on the training samples and obtains an optimal feature set, is also used to predict the wind speed samples of the different periods. Figure 7 shows the prediction curves of the AD-CMI-ORELM method on the four predicted days.
On 21 January, Figure 7 shows that the wind speed covers a very wide range (from a minimum of 1.55 m/s to a maximum of 26.07 m/s) and rises rapidly from 2.43 m/s to 18.52 m/s, which places extremely high demands on the prediction models. On the other three predicted days, the wind speed stayed below 10 m/s and included sharp drops. In particular, on 18 July the wind speed fluctuated randomly below 5 m/s for up to 17 consecutive hours. Nevertheless, Figure 7 shows that, although the conditions of the four predicted days challenge the prediction model, the new method accurately follows the trend of the wind speed, demonstrating its effectiveness and advancement.
Table 8 shows the prediction errors of the predictors constructed with the different optimal feature subsets. It can be seen from Table 8 that, on every predicted day, the MAPE and RMSE obtained with the optimal feature subset from CMI are significantly lower than those based on PCC and MI. This indicates that a low-redundancy feature subset can effectively improve the prediction accuracy of the model.
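For reference, the standard definitions of the two error indexes are given below, where $y_i$ is the measured wind speed, $\hat{y}_i$ the forecast and $n$ the number of forecast points. This section does not restate the paper's own formulas, so it is assumed here that Tables 8 and 9 use these conventional forms.

```latex
\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```

MAPE weights each error relative to the measured speed, so it is sensitive to low-wind periods such as 18 July, while RMSE is in m/s and is dominated by large absolute errors, as on 21 January.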
Models built with ORELM yield lower MAPE and RMSE. This shows that ORELM, which introduces the regularization parameter C and modifies the objective function of ELM, effectively reduces the effect of outlier data on prediction precision and improves the generalization performance of the model.
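The ORELM objective can be illustrated with an augmented Lagrange multiplier (ALM) iteration, the approach usually used to solve it: minimize $\|e\|_1 + \frac{1}{C}\|\beta\|_2^2$ subject to $y - H\beta = e$, where $H$ is the random hidden-layer output of an ELM. The $\ell_1$ loss on the residual $e$ is what limits the influence of outliers, and C is the regularization parameter ($2^{-10}$ in the text). This is a sketch, not the authors' code: the sigmoid activation, hidden-layer size, iteration count and the penalty choice $\mu = 2N/\|y\|_1$ are assumptions, and inputs and targets are assumed to be normalized.

```python
import numpy as np


def elm_hidden(X, W, b):
    """Sigmoid hidden-layer output H for inputs X (random W, b are never trained)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def orelm_fit(X, y, n_hidden=50, C=2.0**-10, n_iter=50, seed=0):
    """ORELM via ALM: min ||e||_1 + (1/C)||beta||^2  s.t.  y - H beta = e."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, n_hidden)
    H = elm_hidden(X, W, b)
    mu = 2.0 * len(y) / np.sum(np.abs(y))  # ALM penalty parameter (assumed choice)
    e = np.zeros(len(y))
    lam = np.zeros(len(y))
    # The beta-system matrix does not change between iterations, so build it once.
    G = H.T @ H + (2.0 / (C * mu)) * np.eye(n_hidden)
    for _ in range(n_iter):
        beta = np.linalg.solve(G, H.T @ (y - e + lam / mu))  # ridge step for beta
        r = y - H @ beta + lam / mu
        e = np.sign(r) * np.maximum(np.abs(r) - 1.0 / mu, 0.0)  # soft-threshold step for e
        lam = lam + mu * (y - H @ beta - e)  # multiplier update
    return W, b, beta


def orelm_predict(model, X):
    W, b, beta = model
    return elm_hidden(X, W, b) @ beta
```

Each iteration solves a ridge system for β, moves large residuals into the sparse error term e instead of letting them bend the fit, and updates the multiplier; with e fixed at zero the β step reduces to an ordinary regularized ELM.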
The comparison experiment shows that the new method obtains better forecasts than the AD-CART method, which again confirms that using adjacent samples as the validation set improves the performance of feature selection and that ORELM is effective as a predictor. The AD-CMI-ORELM model also obtained the minimum MAPE and RMSE on each predicted day. Taking the results of 21 January as an example, compared with the worst model, AD-PCC-ELM, AD-CMI-ORELM reduced MAPE by 15.81% and RMSE by 20.6%; compared with the second-best model, AD-MI-ORELM, it reduced MAPE by 4.6% and RMSE by 11.4%. Similar improvements occur on the other three predicted days. Because the wind speed on 21 January fluctuated more frequently and more strongly, the RMSE on that day was slightly worse than on the other three predicted days; however, the new method still achieved a similar proportional improvement over the other methods, which further supports its effectiveness and advancement.
To further illustrate the effectiveness of the new method, seven days were randomly selected from each season of 2009 to form a test set, on which the prediction performance of the different models was evaluated. The average error of each model in each season is shown in Table 9.
As can be seen from Table 9, the error indexes of the AD-CMI-ORELM model adopted by the new method retain a clear advantage over all other models, which again confirms the validity of the new method.
EMD steps:
(a) The local maximum and minimum points of the original signal $S$ are connected by cubic splines to obtain the upper envelope $e_{\max}$ and the lower envelope $e_{\min}$.
(b) The mean sequence $m_1 = (e_{\max} + e_{\min})/2$ is obtained by averaging the two envelopes.
(c) The first component $h_1$ is obtained by removing $m_1$ from $S$:
$$h_1 = S - m_1$$

CEEMD steps:
(a) Two white noise signals with the same amplitude $N$ and opposite phase are added to the original signal $S$, giving two generated signals $M_1 = S + N$ and $M_2 = S - N$.
(b) Both sequences are decomposed with the EMD method, giving two groups of IMFs, $imf_{x1}$ and $imf_{x2}$. The IMFs of CEEMD are then obtained by averaging the corresponding components of the two groups:
$$IMF = \frac{imf_{x1} + imf_{x2}}{2}$$
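The EMD and CEEMD steps above can be sketched in NumPy as follows. This is a simplified illustration, not the authors' code: each IMF is extracted with a fixed number of sifting passes instead of a formal stopping criterion, the signal end points are used as extra spline knots, the noise amplitude is set relative to the signal's standard deviation, and every decomposition returns a fixed number of IMFs so that the two groups can be averaged component by component. `n_pairs=1` corresponds to the single complementary pair described above; larger values give the usual ensemble average.

```python
import numpy as np


def natural_cubic_spline(xk, yk, x):
    """Evaluate the natural cubic spline through knots (xk, yk) at points x."""
    n = len(xk)
    h = np.diff(xk)
    # Second derivatives M at the knots; natural boundary: M[0] = M[-1] = 0.
    A = np.eye(n)
    rhs = np.zeros(n)
    for i in range(1, n - 1):
        A[i, i - 1:i + 2] = [h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]]
        rhs[i] = 6.0 * ((yk[i + 1] - yk[i]) / h[i] - (yk[i] - yk[i - 1]) / h[i - 1])
    M = np.linalg.solve(A, rhs)
    j = np.clip(np.searchsorted(xk, x) - 1, 0, n - 2)  # knot interval of each point
    hj, t0, t1 = h[j], x - xk[j], xk[j + 1] - x
    return ((M[j] * t1**3 + M[j + 1] * t0**3) / (6.0 * hj)
            + (yk[j] / hj - M[j] * hj / 6.0) * t1
            + (yk[j + 1] / hj - M[j + 1] * hj / 6.0) * t0)


def envelope_mean(s):
    """Steps (a)-(b): mean of the upper and lower cubic-spline envelopes of s."""
    t = np.arange(len(s), dtype=float)
    d = np.diff(s)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    last = len(s) - 1
    # Assumption: end points are added as knots (simplest boundary handling).
    imax = np.concatenate(([0], maxima, [last]))
    imin = np.concatenate(([0], minima, [last]))
    e_max = natural_cubic_spline(t[imax], s[imax], t)
    e_min = natural_cubic_spline(t[imin], s[imin], t)
    return (e_max + e_min) / 2.0


def emd(s, n_imfs=4, n_sifts=10):
    """Simplified EMD returning n_imfs IMFs plus the residual (last row)."""
    residual = np.asarray(s, dtype=float).copy()
    rows = []
    for _ in range(n_imfs):
        h = residual.copy()
        for _ in range(n_sifts):  # assumed stopping rule: fixed number of sifts
            h = h - envelope_mean(h)  # step (c): h = S - m
        rows.append(h)
        residual = residual - h
    rows.append(residual)
    return np.vstack(rows)


def ceemd(s, noise_scale=0.2, n_pairs=1, n_imfs=4, seed=0):
    """Average the EMD components of M1 = S + N and M2 = S - N over n_pairs noise draws."""
    rng = np.random.default_rng(seed)
    s = np.asarray(s, dtype=float)
    total = np.zeros((n_imfs + 1, len(s)))
    for _ in range(n_pairs):
        noise = noise_scale * np.std(s) * rng.standard_normal(len(s))
        total += emd(s + noise, n_imfs) + emd(s - noise, n_imfs)
    return total / (2 * n_pairs)
```

Because each added noise signal is cancelled by its opposite-phase partner, the averaged components still sum back to $S$ up to floating-point rounding, while the noise helps separate oscillation scales during sifting.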

Figure 1. The geographical location of data measurement in NWTC.

Figure 2. Wind speed curves reconstructed by different decomposition methods.

Figure 3. Prediction of data set construction on 6 April 2009.

Figure 4. Average weekly error curves for different values of the parameter k.

Figure 5. Results of the features' correlation analysis. (a) The correlation analysis results on 21 January; (b) The correlation analysis results on 17 April; (c) The correlation analysis results on 18 July; (d) The correlation analysis results on 7 November; (e) The correlation analysis results for the whole year.

Figure 6. Optimal feature selection curves. (a) Analysis curves with AD data; (b) Analysis curves with YD data.

Figure 7. Prediction curves of the proposed method on different predicted days. (a) Forecast curve of 21 January; (b) Forecast curve of 17 April; (c) Forecast curve of 18 July; (d) Forecast curve of 7 November.

Table 2. The original statistical feature set and the features' serial numbers.

Table 3. Results of multistep wind speed forecasting with data before and after pretreatment.

Table 5. The optimal feature subsets of different predicted days obtained with AD data.

Table 6. The optimal feature subsets of different predicted days obtained with YD data.

Table 7. The feature variety of the optimal feature subset obtained by the AD-ORELM method.

Table 8. Prediction accuracy on different predicted days.

Table 9. Prediction results on all selected predicted days.