Multi-Step Ahead Wind Power Generation Prediction Based on Hybrid Machine Learning Techniques

Accurate generation prediction at multiple time-steps is of paramount importance for the reliable and economical operation of wind farms. This study proposes a novel algorithmic solution that combines several machine learning techniques in a hybrid manner, including phase space reconstruction (PSR), input variable selection (IVS), K-means clustering and an adaptive neuro-fuzzy inference system (ANFIS). The PSR technique transforms the historical time series into a set of phase-space variables, which are combined with numerical weather prediction (NWP) data to form the candidate inputs. A filtering approach based on the minimal redundancy maximal relevance (mRMR) criterion is then used to automatically select the optimal input variables for each step of the multi-step ahead prediction. Next, the input instances are divided into subsets using K-means clustering, and an ANFIS is trained on each subset. The ANFIS parameters are further optimized with the particle swarm optimization (PSO) algorithm to improve prediction performance. The proposed solution is extensively evaluated through case studies of two real wind farms, and the numerical results confirm its effectiveness and improved prediction accuracy compared with benchmark solutions.


Introduction
Currently, the urgent pursuit of a low-carbon economy and advances in wind power technologies are strongly driving the sustainable transition of the energy sector and the development of wind power across the world [1,2]. Owing to the intermittent and stochastic nature of wind, the power generation of wind farms needs to be accurately predicted at different time scales (e.g., daily, hourly or shorter) and reported to the dispatch center in a timely manner. Accurate short-term wind power forecasting can improve wind power utilization, increase system reliability, reduce operational cost and allow flexible dispatch strategies [3]. In particular, ultra-short-term wind power prediction a few hours ahead with a short prediction cycle (e.g., 15 min) can provide strong support for frequency regulation and spinning reserve optimization. However, such ultra-short-term prediction is a non-trivial task that demands advanced algorithmic solutions with sufficient accuracy and acceptable computational complexity.
In the literature, much research effort has been devoted to prediction problems in energy systems from different aspects, e.g., electricity pricing [4,5] and power generation [6]. The available tools and solutions for ultra-short-term prediction can be categorized into three classes: physical methods, statistical methods, and machine learning methods. Physical methods model the spatial and temporal factors with a full fluid-dynamics model of the atmosphere [7] and generally perform well over longer horizons. Statistical methods, e.g., the autoregressive (AR) model, the moving-average (MA) model, the autoregressive integrated moving average (ARIMA) model and Kalman filters, analyze historical data statistically to identify internal regularities and trends of variation, from which predictions are deduced. Machine learning methods include several supervised learning models, e.g., artificial neural networks (ANNs) [8], support vector machines (SVMs) [9], adaptive neuro-fuzzy inference systems (ANFISs) [10] and Gaussian processes (GPs) [11]. In addition, some combined or hybrid methods have been proposed to improve prediction performance. For example, a hybrid intelligent method was proposed in [3] using multiple support vector regression (SVR) models whose parameters were estimated with an enhanced harmony search (EHS) algorithm. In [12], a hybrid forecasting model based on K-means clustering and the Apriori algorithm was developed for short-term wind speed prediction, and the prediction errors were corrected using association rules.
In recent years, feature selection and analysis for machine learning based wind power prediction models have received much attention. Prediction accuracy can be improved by exploiting the information contained in historical wind speed and generation time series. In [13], features are first extracted from the historical power generation data, and the dataset is then split into subsets based on stationary patterns. In [14], a novel decomposition approach that fully considers the chaotic nature of wind power time series was proposed: the time series was separated into components with different frequency characteristics using ensemble empirical mode decomposition (EEMD) before chaotic time series analysis and singular spectrum analysis (SSA) were carried out. A forecasting model combining an SVM optimized by a genetic algorithm with feature selection based on phase space reconstruction was presented in [15] for short-term wind speed prediction. In addition, numerical weather prediction (NWP) data (including wind speed, wind direction, temperature, humidity, atmospheric pressure, etc.) have been adopted as input variables for supervised models. In [16], outputs from different NWP models were used, and the data with the minimum training error were selected for both ANN and SVM models; the forecasting errors were then corrected based on model output statistics (MOS). The study in [17] proposed a wind power prediction model based on a composite covariance function that considers the joint effects among NWP features. In [18], a data-driven feature extraction approach was developed to exploit unlabeled NWP data in supervised forecasting models.
It should be noted that, as the dimensionality of the input variables increases, irrelevant and redundant variables can deteriorate prediction performance. Therefore, appropriate variables need to be selected through dimensionality reduction [19], for which two techniques, feature selection (variable selection) and feature extraction (feature transformation), are commonly used [20]. Feature extraction produces a new feature space by mapping the original features into a lower-dimensional one, e.g., singular value decomposition (SVD), principal component analysis (PCA) and locally linear embedding (LLE). However, such methods often lose the physical meaning of the original variables, and the resulting features are difficult to interpret. In time series analysis, variable selection, which filters out uninformative attributes without any transformation, can therefore be more attractive than feature extraction. Variable selection methods choose a compact subset of the original variables to improve both the performance and the interpretability of the prediction model.
In general, three types of feature selection methods are considered: filter methods, wrapper methods and embedded methods [21]. A filter method ranks the input variables with a correlation or mutual information (MI) criterion and selects the highest-ranked variables; it is effectively a pre-processing step performed before the predictive model is developed [21]. A wrapper method searches over subsets of input variables and evaluates each subset by the prediction accuracy of the model trained on it. An embedded method builds the search into the construction of the classifier itself during training [22]. Wrapper and embedded techniques can generally achieve better accuracy than filter techniques, but filter methods are less prone to over-fitting and have lower computational complexity [23].
It should be highlighted that existing solutions have not been able either to fully exploit the available information (e.g., historical data and NWP) or to select appropriate variables for improving prediction accuracy. To the best of the authors' knowledge, ultra-short-term prediction of wind power generation remains a technical challenge, and hybrid approaches based on data mining and machine learning techniques have not been thoroughly exploited.
This paper addresses the challenge of ultra-short-term power generation prediction in wind farms. The main technical contribution of this work is a novel hybrid algorithmic solution that considers both historical generation data and NWP data, and selects the optimal combination of input features for each prediction step using a filter method. The proposed solution was extensively evaluated and validated through case studies of two real wind farms. The basic idea behind the proposed prediction solution is illustrated in Figure 1. First, the long-term nonlinear dynamic characteristics of the wind power time series are extracted and recovered by phase space reconstruction, with the reconstruction parameters determined by the C_C method. Afterwards, the most appropriate input variables for each forecasting step are selected from the reconstructed phase-space variables and the NWP features based on the minimal redundancy maximal relevance criterion. Finally, an adaptive neuro-fuzzy inference system with heuristically optimized parameters is built on the selected input variables to produce the prediction results.
Energies 2018, 7, x FOR PEER REVIEW
The rest of this paper is organized as follows: Section 2 describes the input variable selection (IVS) solution based on the phase space reconstruction (PSR) technique and the mRMR criterion. Section 3 presents the framework of the proposed hybrid intelligent prediction model. Section 4 presents the case studies and a set of key numerical results. Finally, conclusions are given in Section 5.

Input Variable Selection (IVS)
Due to the chaotic nature of the weather system, its dynamics are sensitive to initial conditions. The correlation between the historical time series and future wind power generation therefore decays rapidly as the forecasting step increases, and relying on historical data alone can even deteriorate prediction performance. Thus, both historical generation data and numerical weather prediction (NWP) data need to be adopted as input variables. NWP predicts the evolution of the weather by solving the thermodynamic and hydrodynamic equations of the atmosphere from meteorological data. However, it can only provide rough short-term prognoses of surface wind and other weather characteristics, which are not entirely adequate for specific local conditions [24]. NWP data are therefore often adopted to provide ancillary information, e.g., wind speed, wind direction, temperature, humidity and air pressure. For predictions at different steps ahead, these input variables can have different impacts on the forecasting targets. Therefore, a two-stage input variable selection is used in the proposed prediction solution. First, the initial variables are selected from the historical series data through the phase space reconstruction technique (PSRT) and combined with the NWP information to form the candidate inputs. Second, the optimal input variables are filtered from the candidates based on the mRMR criterion.

The Initial Input Variable Selection of Historical Series Using PSRT
The Lyapunov exponent can be used to show that the wind power generation time series has chaotic characteristics. Therefore, the nonlinear dynamic characteristics of the wind power time series can be extracted and recovered using phase space reconstruction theory. In [25], the time-delay technique was used to reconstruct a finite-dimensional phase space from the sampled time evolution of a system. In time-delay coordinate reconstruction, choosing an appropriate time delay τ and a good embedding dimension m is both very important and difficult, since real datasets are finite and noisy. There are currently two different viewpoints on estimating these parameters. One holds that they are independent and should be chosen separately: the time delay τ can be chosen using, e.g., the autocorrelation function, multiple autocorrelation or mutual information, while the G-P algorithm or the false nearest neighbor method can be used to find the embedding dimension m. The other holds that the delay time and embedding dimension are mutually dependent, so the delay time window should be estimated jointly for the choice of m and τ. The delay time window can be estimated using the C_C method [26], which was used in [27] to determine the optimal input variables from historical generation data with reduced computational complexity and enhanced efficiency.
Phase space reconstruction is an efficient tool for analyzing the dynamic pattern of a chaotic time series. The delay-coordinate method presented by Takens performs the phase space reconstruction. The time series $x = \{x_i,\ i = 1, 2, \ldots, N\}$ can be reconstructed into a multi-dimensional phase space $X = \{X_i\}$ representing the dynamic system according to:
$$X_i = \left[x_i,\ x_{i+\tau},\ \ldots,\ x_{i+(m-1)\tau}\right], \qquad i = 1, 2, \ldots, M \tag{1}$$
where $M = N - (m-1)\tau$, m is the embedding dimension, and τ is the delay time.
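The delay-coordinate reconstruction in Equation (1) can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' MATLAB implementation; the function name `delay_embed` is hypothetical, and indices are 0-based, so row i of the result corresponds to $X_{i+1}$ in the notation above.

```python
import numpy as np


def delay_embed(x, m, tau):
    """Phase-space vectors X_i = [x_i, x_{i+tau}, ..., x_{i+(m-1)tau}].

    Returns an array of shape (M, m) with M = N - (m - 1) * tau.
    """
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1) * tau
    if M <= 0:
        raise ValueError("series too short for the chosen m and tau")
    # Column k holds the samples delayed by k * tau.
    return np.column_stack([x[k * tau : k * tau + M] for k in range(m)])


X = delay_embed(np.arange(10), m=3, tau=2)
print(X.shape)  # (6, 3)
print(X[0])     # [0. 2. 4.]
```

Each row is one reconstructed state; in the case studies below, the same construction with τ = 26 yields the historical candidate inputs.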
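In this study, the C_C method [26], which is constructed from correlation integrals, is used to reconstruct the given time series $x = \{x_i\}$ and thus simplify the candidate input forms. As suggested in [26], the correlation integral of the embedded time series is defined as:
$$C(m, N, r, t) = \frac{2}{M(M-1)} \sum_{1 \le i < j \le M} \theta\left(r - \lVert X_i - X_j \rVert_\infty\right), \qquad \theta(a) = \begin{cases} 0, & a < 0 \\ 1, & a \ge 0 \end{cases} \tag{2}$$
where $\lVert \cdot \rVert_\infty$ is the infinity norm, t is the index lag used for embedding (so that $M = N - (m-1)t$), and θ is the Heaviside step function. $C(m, N, r, t)$ is a cumulative distribution function giving the probability that the distance between any two points in the phase space does not exceed the search radius r. To study nonlinear dependence and eliminate spurious temporal correlations, the time series $x = \{x_i\}$ is divided into t disjoint subseries:
$$\begin{aligned} &\{x_1, x_{t+1}, x_{2t+1}, \ldots\} \\ &\{x_2, x_{t+2}, x_{2t+2}, \ldots\} \\ &\quad \vdots \\ &\{x_t, x_{2t}, x_{3t}, \ldots\} \end{aligned} \tag{3}$$
where each subseries has length $l = \mathrm{INT}(N/t)$ and $\mathrm{INT}(\cdot)$ denotes taking the integer part.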
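The correlation integral in Equation (2) and the subseries split in Equation (3) can be sketched as follows, assuming NumPy; the function names are hypothetical. The full pairwise distance matrix costs $O(M^2 m)$ memory, which is fine for illustration but would be heavy for the N = 3000 series used here, where a chunked or tree-based neighbour count would be preferable.

```python
import numpy as np


def correlation_integral(x, m, r, t):
    """C(m, N, r, t) of Eq. (2): share of point pairs within radius r (infinity norm)."""
    x = np.asarray(x, dtype=float)
    M = len(x) - (m - 1) * t
    X = np.column_stack([x[k * t : k * t + M] for k in range(m)])
    # Chebyshev (infinity-norm) distance between every pair of embedded points
    d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
    iu = np.triu_indices(M, k=1)                    # pairs with i < j
    count = np.count_nonzero(r - d[iu] >= 0)        # Heaviside theta(r - d)
    return 2.0 * count / (M * (M - 1))


def disjoint_subseries(x, t):
    """Split x into the t subseries of Eq. (3), each of length INT(N / t).

    With 0-based s, subseries s is {x_s, x_{s+t}, ...}, i.e. s = 0 is the
    paper's first subseries {x_1, x_{t+1}, ...}.
    """
    x = np.asarray(x, dtype=float)
    length = len(x) // t
    return [x[s : s + length * t : t] for s in range(t)]


print(correlation_integral(np.zeros(20), m=2, r=0.1, t=1))
print([s.tolist() for s in disjoint_subseries(np.arange(7), 3)])
```

A constant series gives C = 1 because every pair of points coincides, which is a quick sanity check of the normalization factor $2/(M(M-1))$.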
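We then construct the statistic $S(m, N, r, t)$, the serial correlation of a nonlinear time series, which is a dimensionless measure of nonlinear dependence. For general t, using the disjoint subseries in Equation (3), $S(m, N, r, t)$ is defined as:
$$S(m, N, r, t) = \frac{1}{t} \sum_{s=1}^{t} \left[ C_s\left(m, \tfrac{N}{t}, r, t\right) - C_s^{m}\left(1, \tfrac{N}{t}, r, t\right) \right] \tag{4}$$
where $C_s$ is the correlation integral of the s-th subseries and $C_s^{m}(1, \cdot)$ is the m-th power of $C_s(1, \cdot)$. As $N \to \infty$, the following is obtained:
$$S(m, r, t) = \frac{1}{t} \sum_{s=1}^{t} \left[ C_s(m, r, t) - C_s^{m}(1, r, t) \right] \tag{5}$$
If the time series is independent and identically distributed, $S(m, r, t)$ is identically zero for fixed m and t as $N \to \infty$. However, since real datasets are finite and their components are correlated, $S(m, r, t)$ is generally non-zero [26]. The spread of $S(m, r_j, t)$ over all radii $r_j$ is defined as:
$$\Delta S(m, t) = \max_j S(m, r_j, t) - \min_j S(m, r_j, t) \tag{6}$$
Here, N, m and r are chosen based on the Brock-Dechert-Scheinkman (BDS) statistic as N = 3000, m = 2, 3, 4, 5 and $r_j = j\sigma/2$, j = 1, 2, 3, 4, respectively, where $\sigma = \mathrm{std}(x)$ is the standard deviation of the time series. Equations (7)-(9) can then be obtained:
$$\bar{S}(t) = \frac{1}{16} \sum_{m=2}^{5} \sum_{j=1}^{4} S(m, r_j, t) \tag{7}$$
$$\Delta\bar{S}(t) = \frac{1}{4} \sum_{m=2}^{5} \Delta S(m, t) \tag{8}$$
$$S_{\mathrm{cor}}(t) = \Delta\bar{S}(t) + \left|\bar{S}(t)\right| \tag{9}$$
The optimal delay time τ is the value of t at which $\bar{S}(t)$ first reaches zero or $\Delta\bar{S}(t)$ reaches its first local minimum. The optimal embedding window $\tau_w$ corresponds to the global minimum of $S_{\mathrm{cor}}(t)$. The embedding dimension m then follows from $m = \tau_w/\tau + 1$ (rounded to an integer).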
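Given the statistics $S(m, r_j, t)$ for a range of lags (for example, computed from the correlation integrals of the subseries), the selection rules in Equations (6)-(9) reduce to a few array reductions. The sketch below is a hedged illustration in Python with NumPy: it assumes S is stored as an array indexed by (m, r_j, t), uses only the first-local-minimum criterion for τ, and rounds $\tau_w/\tau$ to an integer; the function names are hypothetical.

```python
import numpy as np


def cc_select(S):
    """Apply Eqs. (6)-(9) to S[a, j, k] = S(m_a, r_j, t = k + 1).

    Returns (tau, tau_w, S_bar, dS_bar, S_cor): tau is the first local minimum
    of dS_bar (None if there is none), tau_w the global minimum of S_cor.
    """
    S = np.asarray(S, dtype=float)
    S_bar = S.mean(axis=(0, 1))              # Eq. (7): average over m and r_j
    dS = S.max(axis=1) - S.min(axis=1)       # Eq. (6): spread over r_j, shape (n_m, T)
    dS_bar = dS.mean(axis=0)                 # Eq. (8): average over m
    S_cor = dS_bar + np.abs(S_bar)           # Eq. (9)
    tau = None
    for k in range(1, len(dS_bar) - 1):
        if dS_bar[k] < dS_bar[k - 1] and dS_bar[k] <= dS_bar[k + 1]:
            tau = k + 1                      # lags are 1-based
            break
    tau_w = int(np.argmin(S_cor)) + 1
    return tau, tau_w, S_bar, dS_bar, S_cor


def embedding_dimension(tau_w, tau):
    """m = tau_w / tau + 1, rounded to an integer."""
    return int(round(tau_w / tau)) + 1


print(embedding_dimension(52, 26), embedding_dimension(134, 26))
```

With the values reported for the two wind farms, τ = 26 with $\tau_w$ = 52 gives m = 3, and $\tau_w$ = 134 gives m = 6.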
Once the reconstruction parameters, delay time τ and embedding dimension m, have been determined by the C_C method, the initial variables related to the historical sequence are obtained as $\mathrm{His}_{\mathrm{Inputs}}(t) = [x(t), x(t-\tau), \ldots, x(t-(m-1)\tau)]$, where x(t) is the power generation value observed at the current time t. The forecast weather variables provided by NWP can be written as $\mathrm{NWP}_{\mathrm{Inputs}}(t') = [V_{\mathrm{wind}}(t'), \sin(D_{\mathrm{wind}}(t')) + \cos(D_{\mathrm{wind}}(t')), T(t'), H(t'), P(t')]$, where $V_{\mathrm{wind}}(t')$, $D_{\mathrm{wind}}(t')$, $T(t')$, $H(t')$ and $P(t')$ are the wind speed, wind direction, temperature, humidity and air pressure at the predicted time t'. The candidate input variable set is therefore $V = \{\mathrm{His}_{\mathrm{Inputs}}(t), \mathrm{NWP}_{\mathrm{Inputs}}(t')\}$, and its dimension is $|V| = m + 5$.
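A candidate input vector V for one forecast origin t could be assembled as below. This is a sketch under stated assumptions: all series are NumPy arrays on a common 15-min grid, the horizon is given in time steps so that t' = t + horizon, the wind direction is in radians, and the dictionary keys and function name are hypothetical.

```python
import numpy as np


def candidate_inputs(power, nwp, t, tau, m, horizon):
    """Return V = [His_Inputs(t), NWP_Inputs(t')] with t' = t + horizon (length m + 5).

    power: 1-D array of measured generation on the 15-min grid.
    nwp:   dict of 1-D arrays on the same grid; the keys used below are hypothetical.
    """
    if t - (m - 1) * tau < 0:
        raise ValueError("not enough history before t for this (m, tau)")
    his = [power[t - k * tau] for k in range(m)]     # x(t), x(t - tau), ..., x(t - (m-1)tau)
    tp = t + horizon                                  # predicted time t'
    d = nwp["direction_rad"][tp]
    weather = [
        nwp["speed"][tp],
        np.sin(d) + np.cos(d),                        # trigonometric wind direction
        nwp["temperature"][tp],
        nwp["humidity"][tp],
        nwp["pressure"][tp],
    ]
    return np.array(his + weather, dtype=float)


n = 200
power = np.arange(n, dtype=float)
nwp = {"speed": np.full(n, 8.0), "direction_rad": np.zeros(n),
       "temperature": np.full(n, 20.0), "humidity": np.full(n, 0.5),
       "pressure": np.full(n, 1013.0)}
v = candidate_inputs(power, nwp, t=60, tau=26, m=3, horizon=4)   # 4 steps = 1 h ahead
print(v)
```

Only the NWP part depends on the prediction step, which is why the subsequent selection can favour different variables for different horizons.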

The Optimal Selection of Candidate Input Variables Using mRMR-Criterion Ranking
The input variables derived from historical generation and NWP are selected using the minimal redundancy maximal relevance (mRMR) criterion based on mutual information (MI) [28]. As MI can measure both linear and nonlinear dependency between variables, it has been applied to correlation measurement and variable selection [29]. The basic idea of MI-based variable selection is to select the best subset S from the original variable set X by maximizing the joint MI between S and the target output Y, namely I(S; Y). Many MI-based variable selection algorithms are available in the literature, e.g., mutual information feature selection (MIFS) [30], mutual information feature selection under uniform information distribution (MIFS-U) [31], minimal redundancy maximal relevance (mRMR) [28], and normalized mutual information feature selection (NMIFS) [32]. In this work, the MI-based mRMR criterion is adopted to find a compact and informative input space. The mRMR technique aims to find a subset of candidate variables with maximal dependency on the target to be predicted as well as minimal redundancy among the variables in the subset. The concept of MI is based on entropy, which is described as follows.
The entropy of a random variable is the average amount of information required to describe it [33], and has been adopted in many studies [34,35]. The entropy of a discrete random variable $X = (x_1, x_2, \ldots, x_N)$ is denoted by H(X), where $x_i$ refers to the possible values that X can take (or, for a continuous variable, the possible value ranges). H(X) is defined as:
$$H(X) = -\sum_{i=1}^{N} p(x_i) \log p(x_i) \tag{10}$$
where $p(x_i)$ is the probability mass function.
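For continuous quantities such as wind power, the probabilities in Equation (10) have to be estimated over value ranges. One simple, assumed approach is equal-width binning followed by empirical frequencies, sketched here with NumPy; the bin count and function names are illustrative choices, not taken from the paper.

```python
import numpy as np


def discretize(x, bins=10):
    """Map continuous samples to equal-width bin indices 0..bins-1."""
    x = np.asarray(x, dtype=float)
    edges = np.histogram_bin_edges(x, bins=bins)
    return np.digitize(x, edges[1:-1])        # interior edges only


def entropy(x, base=2.0):
    """Plug-in estimate of H(X) = -sum p(x_i) log p(x_i) from a discrete sample."""
    _, counts = np.unique(np.asarray(x), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)) / np.log(base))


print(entropy([0, 1, 0, 1]))                               # fair coin
print(entropy(discretize(np.linspace(0, 1, 1000), bins=4)))  # four equally filled bins
```

The estimate depends on the binning; finer bins raise the entropy of a continuous variable, so the same binning should be used for all candidate variables.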
For any two random variables X and $Y = (y_1, y_2, \ldots, y_M)$, the joint entropy is defined as:
$$H(X, Y) = -\sum_{i=1}^{N} \sum_{j=1}^{M} p(x_i, y_j) \log p(x_i, y_j) \tag{11}$$
where $p(x_i, y_j)$ is the joint probability mass function of X and Y. The conditional entropy of X given Y is defined as:
$$H(X \mid Y) = -\sum_{i=1}^{N} \sum_{j=1}^{M} p(x_i, y_j) \log p(x_i \mid y_j) \tag{12}$$
The conditional entropy is the amount of uncertainty left in X once Y is known, so it is less than or equal to the entropy H(X). It is equal to H(X) if, and only if, the two variables are independent. Mutual information (MI) is the amount of information shared by the two variables and is defined as:
$$I(X; Y) = \sum_{i=1}^{N} \sum_{j=1}^{M} p(x_i, y_j) \log \frac{p(x_i, y_j)}{p(x_i)\, p(y_j)} = H(X) - H(X \mid Y) \tag{13}$$
MI can be interpreted as the reduction in the uncertainty of one variable provided by knowledge of the other. MI is zero if the random variables are statistically independent. MI is symmetric, so:
$$I(X; Y) = I(Y; X) = H(X) + H(Y) - H(X, Y) \tag{14}$$
The minimal-redundancy-maximal-relevance (mRMR) criterion aims to identify a compact subset of informative input variables by simultaneously maximizing relevance and minimizing redundancy. Simply combining individually informative input variables does not necessarily achieve good forecasting performance [28], so both the informativeness of individual input variables and the redundancy between them should be considered. The informativeness score of an individual variable $v_i$ under the mRMR criterion is given by:
$$J(v_i, S) = I(v_i; t) - \frac{1}{|S|} \sum_{s_j \in S} I(v_i; s_j), \qquad v_i \in V - S \tag{15}$$
where V is the set of all candidate variables, S is the set of selected input variables, and $|\cdot|$ denotes the number of variables in a set. The first term, the mutual information $I(v_i; t)$ between the target t and the candidate input variable $v_i$, measures the relevance of $v_i$ to the forecasting target. The second term penalizes redundancy, favoring variables that share little information with those already selected, so that the selected set is more representative of the whole set.
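The identities in Equations (13)-(15) make MI easy to estimate from discrete (for example, binned) samples via joint entropies. The following NumPy sketch computes plug-in estimates in bits; the function names are hypothetical, and plug-in estimates are biased for small samples, so this is an illustration rather than a recommended estimator.

```python
import numpy as np


def joint_entropy(*samples):
    """Plug-in estimate (bits) of H(X1, ..., Xk) from discrete samples of equal length."""
    keys = np.stack([np.asarray(s) for s in samples], axis=1)
    _, counts = np.unique(keys, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def conditional_entropy(x, y):
    """H(X | Y) = H(X, Y) - H(Y)."""
    return joint_entropy(x, y) - joint_entropy(y)


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), Eq. (14)."""
    return joint_entropy(x) + joint_entropy(y) - joint_entropy(x, y)


def mrmr_score(v, target, selected):
    """J(v, S) of Eq. (15); with S empty only the relevance term remains."""
    relevance = mutual_information(v, target)
    if len(selected) == 0:
        return relevance
    redundancy = np.mean([mutual_information(v, s) for s in selected])
    return float(relevance - redundancy)


x = [0, 0, 1, 1]
y = [0, 1, 0, 1]   # independent of x over these four equally likely pairs
print(mutual_information(x, y), mutual_information(x, x))
```

A variable that merely duplicates an already selected one gets its relevance cancelled by the redundancy term, which is exactly the behaviour the criterion is designed for.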
In the implementation, a stepwise search, the incremental forward selection (IFS) method, is used to select input variables according to Equation (15), where a greater $J(v_i, S)$ score indicates a more promising input variable $v_i$. In the first step, the relevance score of every candidate input variable is calculated, and the variable with the maximum $I(v_i; t)$ is selected as the first promising input variable:
$$s_1 = \arg\max_{v_i \in V} I(v_i; t) \tag{16}$$
The remaining variables are then selected step by step according to the criterion in Equation (15). At step m ($2 \le m \le |V|$), suppose that a subset $S_{m-1}$ of $m-1$ promising variables $s_1, s_2, \ldots, s_{m-1}$ has been selected in the previous steps. The m-th promising variable is selected from $V - S_{m-1}$ by optimizing the following condition:
$$s_m = \arg\max_{v_i \in V - S_{m-1}} \left[ I(v_i; t) - \frac{1}{m-1} \sum_{s_j \in S_{m-1}} I(v_i; s_j) \right] \tag{17}$$
Since one input variable is added at each step, the promising variables can be retrieved incrementally until step |V|, at which point all candidate variables in V have been ranked. The informativeness score (InSc) at the m-th step is given by:
$$\mathrm{InSc}_m = I(s_m; t) - \frac{1}{m-1} \sum_{s_j \in S_{m-1}} I(s_m; s_j) \tag{18}$$
where $s_m$ is the variable selected at the m-th step according to Equation (17) (for m = 1, $\mathrm{InSc}_1 = I(s_1; t)$). Thus, the priority of the candidate input variables can be ranked through the mRMR-based IFS method. The cumulative amount of InSc, denoted $\mathrm{CumInSc}_m = \sum_{k=1}^{m} \mathrm{InSc}_k$, represents the total information contributed as each newly added variable is included. The final optimal number of input variables is determined according to the trend of CumInSc.
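The incremental forward selection of Equations (16)-(18) can be sketched as a greedy loop over precomputed MI values. This Python sketch assumes the relevance $I(v_i; t)$ and the pairwise redundancy $I(v_i; v_j)$ are already available (estimated, e.g., from binned data); the MI values in the example are made up for illustration, the function name is hypothetical, and keeping the variables up to the maximum of CumInSc is one reading of the trend-based stopping rule.

```python
import numpy as np


def mrmr_ifs(relevance, redundancy):
    """Greedy mRMR ranking, Eqs. (16)-(18).

    relevance[i]     -> I(v_i; t)
    redundancy[i, j] -> I(v_i; v_j)
    Returns the selection order, InSc per step and CumInSc.
    """
    relevance = np.asarray(relevance, dtype=float)
    redundancy = np.asarray(redundancy, dtype=float)
    remaining = list(range(len(relevance)))
    order, insc = [], []
    while remaining:
        if not order:
            scores = relevance[remaining]                      # Eq. (16)
        else:
            # Eq. (17): relevance minus mean redundancy with the selected set
            scores = relevance[remaining] - redundancy[np.ix_(remaining, order)].mean(axis=1)
        k = int(np.argmax(scores))
        insc.append(float(scores[k]))                          # Eq. (18)
        order.append(remaining.pop(k))
    return order, insc, np.cumsum(insc)


rel = [0.9, 0.8, 0.3]
red = np.array([[0.0, 0.7, 0.0],
                [0.7, 0.0, 0.1],
                [0.0, 0.1, 0.0]])
order, insc, cum = mrmr_ifs(rel, red)
n_keep = int(np.argmax(cum)) + 1   # keep variables up to the CumInSc maximum
print(order, n_keep)
```

In this toy case the second most relevant variable is ranked last because it is largely redundant with the first, illustrating why mRMR ranking can differ from a plain relevance ranking.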

Hybrid Intelligent Prediction Model
In this work, the proposed hybrid algorithmic solution combines K-means clustering, particle swarm optimization (PSO) and the adaptive neuro-fuzzy inference system (ANFIS) in the prediction model.

Subsets Partition Using K-Means Algorithm
The historical dataset is divided into subsets such that the data in each subset share similar meteorological features. Training a separate network on each subset can reduce the complexity of network training and improve generalization capability. The dataset partitioning is implemented with the K-means algorithm as follows.
Before clustering, the data need to be centralized and normalized. The Z-score standardization of the dataset is expressed as:
$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \qquad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, D \tag{19}$$
where $\mu_j$ is the mean of column j, $\sigma_j$ is the standard deviation of column j, N is the number of instances, and D is the dimensionality of the input variables. Given a training set $\{z_1, z_2, \ldots, z_N\}$ with $z_i \in \mathbb{R}^D$, the K-means clustering algorithm partitions the dataset into k cohesive groups through an unsupervised learning process [36]. The algorithm first chooses cluster centroids $\{a_1, a_2, \ldots, a_k\} \subset \mathbb{R}^D$ randomly, and then aims to minimize the following cost function:
$$J = \sum_{l=1}^{k} \sum_{z_i \in U_l} \lVert z_i - a_l \rVert^2 \tag{20}$$
where $\lVert \cdot \rVert$ is the Euclidean norm. After the cluster centers are determined, the training samples are grouped into the subsets of their nearest cluster centers. For the k subsets $U_1, U_2, \ldots, U_k$, the number of samples in each subset is recorded, and each subset $U_l$ determines an independent set of ANFIS network parameters during the training process. During online prediction, each new input vector is assigned to the subset whose cluster center is nearest to it, and the corresponding ANFIS is used to produce the prediction.
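The standardization, clustering and online routing steps can be sketched with NumPy as below. This is an illustrative Lloyd-style K-means with random initial centroids and a fixed seed; the function names, iteration limit and convergence test are assumptions, and a library implementation (e.g., scikit-learn's `KMeans`) would normally be preferred in practice.

```python
import numpy as np


def zscore(X):
    """Column-wise Z-score standardisation, Eq. (19); also returns the fitted mean and std."""
    X = np.asarray(X, dtype=float)
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)        # guard against constant columns
    return (X - mu) / sigma, mu, sigma


def _nearest(Z, centers):
    """Index of the nearest centroid (Euclidean) for every row of Z."""
    d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)


def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd iterations that decrease the cost of Eq. (20)."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]   # random initial centroids
    for _ in range(n_iter):
        labels = _nearest(Z, centers)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = _nearest(Z, centers)
    cost = float(np.sum((Z - centers[labels]) ** 2))
    return centers, labels, cost


def route(x_new, mu, sigma, centers):
    """Subset index (i.e. which ANFIS to use) for a new raw input vector."""
    z = (np.asarray(x_new, dtype=float) - mu) / sigma
    return int(np.linalg.norm(centers - z, axis=1).argmin())


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
Z, mu, sigma = zscore(X)
centers, labels, cost = kmeans(Z, k=2)
print(labels, route([10.5, 10.5], mu, sigma, centers))
```

The mean and standard deviation fitted on the training data are reused in `route`, so that new inputs are standardized exactly as the clustered training samples were.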

Adaptive Neuro-Fuzzy Inference System
The adaptive neuro-fuzzy inference system (ANFIS) is a data-driven modelling technique [37] that addresses multivariable nonlinear system prediction by combining a neural network with an adaptive fuzzy reasoning process. The fuzzy membership functions and fuzzy rules of the system are learned from historical data rather than derived from expert experience or intuition. Figure 2 illustrates the typical structure of a two-input ANFIS model. ANFIS is based on the Takagi-Sugeno inference approach, which creates a nonlinear mapping from the input space to the output space using fuzzy if-then rules. The ANFIS comprises five layers, as follows.
Fuzzifier (Layer 1): Neurons in this layer perform the fuzzification operation, obtaining the membership degree of each input in the different fuzzy sets (e.g., A1, A2, B1 and B2). Fuzzification is represented by a fuzzy membership function f, and the output membership degrees $\mu_A$ and $\mu_B$ for inputs $x_1$ and $x_2$ can be expressed as:
$$\mu_{A_i}(x_1) = f_{A_i}(x_1), \qquad \mu_{B_i}(x_2) = f_{B_i}(x_2), \qquad i = 1, 2 \tag{21}$$
Rules inference (Layer 2): The rule neurons receive inputs from their respective fuzzification neurons and calculate the firing strength (activation intensity) $\omega_n$ of each rule, where rule n combines the fuzzy sets $A_i$ and $B_j$:
$$\omega_n = \mu_{A_i}(x_1)\, \mu_{B_j}(x_2) \tag{22}$$
Normalization (Layer 3): Each neuron in this layer receives the outputs of all neurons in the previous layer and calculates the normalized firing strength of a given rule:
$$\bar{\omega}_n = \frac{\omega_n}{\sum_k \omega_k} \tag{23}$$
Defuzzifier (Layer 4): This layer computes the weighted consequent value of each rule, where $f_n$ is the first-order Takagi-Sugeno consequent with parameters $p_n$, $q_n$ and $r_n$:
$$\bar{\omega}_n f_n = \bar{\omega}_n \left(p_n x_1 + q_n x_2 + r_n\right) \tag{24}$$
Output (Layer 5): This layer sums the outputs of all defuzzification neurons to produce the final ANFIS output y:
$$y = \sum_n \bar{\omega}_n f_n = \frac{\sum_n \omega_n f_n}{\sum_n \omega_n} \tag{25}$$
In this work, an ANFIS is fitted to each sub-training set, with its parameters heuristically optimized by the particle swarm optimization (PSO) algorithm. The structure and initial parameters of each ANFIS are first determined using the fuzzy C-means (FCM) clustering algorithm [38], after which PSO is adopted to optimize the parameters. Each particle in PSO tracks the best solution it has found so far (Pbest), while the swarm collectively searches for the global best solution (Gbest) [39]. The velocity and position updates in PSO can be expressed as:
$$V_i^{k+1} = w V_i^{k} + c_1 r_{i1}\left(P_{\mathrm{best},i}^{k} - X_i^{k}\right) + c_2 r_{i2}\left(G_{\mathrm{best}}^{k} - X_i^{k}\right), \qquad X_i^{k+1} = X_i^{k} + V_i^{k+1} \tag{26}$$
where $r_{i1}$ and $r_{i2}$ are two random variables in the range [0, 1], $c_1$ and $c_2$ are positive constants, w is the inertia weight, and k is the iteration index. $X_i$ and $V_i$ represent the position and velocity of the i-th particle, respectively, and $P_{\mathrm{best},i}$ and $G_{\mathrm{best}}$ represent the best previous position of the i-th particle and the best previous position among all particles in the population, respectively.
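To make the five layers and the PSO update concrete, the sketch below implements a two-input, first-order Sugeno ANFIS forward pass in NumPy and tunes all of its parameters with the update of Equation (26) on toy data. Several choices are assumptions, not the paper's settings: Gaussian membership functions, a 2 x 2 grid of four rules, random initialization in place of the FCM-based initialization, and the constants w = 0.7 and c1 = c2 = 1.5. The function names are hypothetical.

```python
import numpy as np


def anfis_forward(params, X):
    """Two-input first-order Sugeno ANFIS with 2 Gaussian MFs per input (4 rules)."""
    c = params[0:4].reshape(2, 2)                    # MF centres c[input, mf]
    s = np.abs(params[4:8].reshape(2, 2)) + 1e-6     # MF widths, kept positive
    pqr = params[8:20].reshape(4, 3)                 # consequents [p_n, q_n, r_n]
    x1, x2 = X[:, 0:1], X[:, 1:2]                    # column vectors, shape (N, 1)
    # Layer 1: membership degrees, shape (N, 2) each
    mu_a = np.exp(-((x1 - c[0]) ** 2) / (2 * s[0] ** 2))
    mu_b = np.exp(-((x2 - c[1]) ** 2) / (2 * s[1] ** 2))
    # Layer 2: firing strength of rule (A_i, B_j) via product, shape (N, 4)
    w = (mu_a[:, :, None] * mu_b[:, None, :]).reshape(len(X), 4)
    # Layer 3: normalised firing strengths
    w_bar = w / (w.sum(axis=1, keepdims=True) + 1e-12)
    # Layer 4: linear consequents f_n = p_n x1 + q_n x2 + r_n, shape (N, 4)
    f = x1 * pqr[:, 0] + x2 * pqr[:, 1] + pqr[:, 2]
    # Layer 5: weighted sum of rule outputs
    return np.sum(w_bar * f, axis=1)


def pso(loss, dim, n_particles=30, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `loss` with the velocity/position update of Eq. (26)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_particles, dim))   # positions
    V = np.zeros_like(X)                                   # velocities
    P = X.copy()                                           # Pbest positions
    P_loss = np.array([loss(x) for x in X])
    g = P[P_loss.argmin()].copy()                          # Gbest position
    history = [P_loss.min()]
    for _ in range(n_iter):
        r1 = rng.random((n_particles, 1))                  # r_i1 per particle
        r2 = rng.random((n_particles, 1))                  # r_i2 per particle
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)
        X = X + V
        L = np.array([loss(x) for x in X])
        better = L < P_loss                                # update Pbest
        P[better] = X[better]
        P_loss[better] = L[better]
        g = P[P_loss.argmin()].copy()                      # update Gbest
        history.append(P_loss.min())
    return g, history


# Toy regression target on a grid of inputs
grid = np.linspace(-1.0, 1.0, 10)
X_train = np.array([[a, b] for a in grid for b in grid])
y_train = 0.5 * X_train[:, 0] - X_train[:, 1] + 0.2


def mse(p):
    return float(np.mean((anfis_forward(p, X_train) - y_train) ** 2))


best_params, history = pso(mse, dim=20)
print(history[0], history[-1])
```

Because Pbest and Gbest are only replaced by better positions, the recorded best loss never increases; in the workflow described above, the swarm would instead start from the FCM-derived ANFIS parameters of each subset.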

Performance Assessment and Numerical Results
To thoroughly verify the reliability of the proposed prediction solution, performance assessments are carried out using data collected from two real wind farms at different locations and in different seasons: the Anzishan wind farm (capacity 45 MW, Henan, China, hub height 70 m) in June 2017, and the Xuqiao wind farm (capacity 94 MW, Anhui, China, hub height 90 m) in December 2017. The reference height of both wind farms is 10 m. The measured power generation and NWP data have a 15-min interval (i.e., 96 observations per day). Three months of data (about 8640 observations) prior to each test month, March to May 2017 for the Anzishan farm and September to November 2017 for the Xuqiao farm, are used for training. The proposed hybrid solution is implemented in the MATLAB (version 8.3, MathWorks, Natick, MA, USA) programming environment.

Input Variable Selection Process
The time series data of power generation in the previous month are used to determine the phase-space reconstruction parameters through the C_C method.With different spatial and temporal characteristics, the parameters of each wind farm will be determined, respectively.For Anzishan farm, it can be observed in Figure 3 that ∆S have the first local minimum when t is equal to 26, thus the optimal delay time τ is set to 26.The global minimum point of S cor corresponds to the optimal embedding window = 52 as shown in the figure.Thus, the embedding dimension m can be calculated as 3.As for Xuqiao farm, the optimal delay time τ is also equal to 26 as shown in Figure 4.The embedding window is observed as 134, thus the optimal embedding dimension m is 6.The historical time series data can be selected as m-dimension input variables using the phase-space reconstruction.For the current time t, the initial input variables related to the historical data are , where ( )

The historical time series data can be selected as $m$-dimensional input variables using phase-space reconstruction. For the current time $t$, the initial input variables related to the historical data are

$$\mathrm{HisInputs}(t) = [x(t), x(t-\tau), \ldots, x(t-(m-1)\tau)],$$

where $x(t)$ is the power generation value observed at current time $t$. These $m$ input variables are in turn denoted as the set {Hin1, Hin2, ..., Hinm}. In the same way, the forecast weather variables provided by the NWP data include wind speed, trigonometric wind direction, temperature, humidity, and atmospheric pressure:

$$\mathrm{NWPInputs}(t) = [V_{wind}(t), \sin(D_{wind}(t)) + \cos(D_{wind}(t)), T(t), H(t), P(t)],$$

which is in turn denoted as the set {WindV, WindD, Temp, Hum, AirP}. Based on the reconstruction parameters obtained by the C_C method, the candidate input variable sets of the two wind farms are:

Anzishan: {Hin1, Hin2, Hin3, WindV, WindD, Temp, Hum, AirP}
Xuqiao: {Hin1, Hin2, Hin3, Hin4, Hin5, Hin6, WindV, WindD, Temp, Hum, AirP}

Afterwards, the candidate variables are ranked by the mRMR criterion according to their predictive strength. The optimal number of input variables is selected by observing the trend of the cumulative informativeness score (CumInSc), i.e., the point at which the CumInSc curve no longer increases or grows only very slowly. Because the input variable selection is carried out adaptively in the prediction model, the selected input variables may differ between prediction steps.
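The candidate-input construction described above can be sketched as follows. The helper names are hypothetical; the sketch assumes the NWP rows are already aligned index-by-index with the power samples and that wind direction is given in degrees, neither of which is specified in this section.

```python
import numpy as np

def phase_space_inputs(x, tau, m):
    """Rows are HisInputs(t) = [x(t), x(t - tau), ..., x(t - (m-1)tau)] for every t
    with a full history; column k corresponds to Hin(k+1)."""
    x = np.asarray(x, dtype=float)
    times = np.arange((m - 1) * tau, len(x))          # first t with a complete history
    cols = [x[times - k * tau] for k in range(m)]
    return times, np.stack(cols, axis=1)

def nwp_inputs(speed, direction_deg, temp, hum, pressure):
    """[WindV, WindD, Temp, Hum, AirP] with WindD = sin(D) + cos(D) (degrees assumed)."""
    d = np.deg2rad(np.asarray(direction_deg, dtype=float))
    return np.column_stack([speed, np.sin(d) + np.cos(d), temp, hum, pressure])

def candidate_inputs(power, nwp, tau, m):
    """Concatenate the PSR variables with the NWP variables at the same time index."""
    times, his = phase_space_inputs(power, tau, m)
    names = [f"Hin{k + 1}" for k in range(m)] + ["WindV", "WindD", "Temp", "Hum", "AirP"]
    return names, np.hstack([his, np.asarray(nwp, dtype=float)[times]])
```

With $\tau = 26$ and $m = 3$ (Anzishan), `candidate_inputs` returns the eight candidates listed above; with $m = 6$ (Xuqiao) it returns eleven.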
Using the proposed mRMR-criterion filter, Figure 5 shows the CumInSc curves of the Anzishan and Xuqiao wind farms for different prediction steps (1 h, 2 h, 3 h and 4 h ahead), and the number of input variables is selected accordingly. In this study, once the CumInSc curve reaches its maximum, adding the variables ranked after this extreme point is assumed to contribute no further useful information. Therefore, the candidate variables ranked up to the maximum of the CumInSc curve are regarded as the optimal or near-optimal inputs to the prediction model. The detailed ranking of the candidate variables and the number of selected inputs in multi-step ahead prediction are shown in Tables 1 and 2 for the two farms, respectively, where the selected input variables are shaded. The rankings at adjacent prediction steps are similar and change only gradually. For each prediction step, ranking the candidate variables by predictive strength allows the proposed hybrid solution to select a compact subset of informative inputs based on maximum relevance and minimum redundancy, which effectively reduces both the input dimension and the interfering information.
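A minimal sketch of the ranking and cut-off rule follows. It assumes the informativeness score (InSc) of a variable is its greedy mRMR score at the moment it is picked, i.e., its relevance $I(x_j; y)$ to the target minus its mean mutual information with the already-selected variables, so that CumInSc is the running sum of these scores. Mutual information is estimated here with equal-width histograms; the estimator, the bin count and the function names are assumptions, not the paper's implementation.

```python
import numpy as np

def mutual_info(a, b, bins=10):
    """Histogram estimate of I(a; b) in nats, using equal-width bins (assumption)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)        # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)        # marginal of b, shape (1, bins)
    nz = p_ab > 0
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))

def mrmr_rank(X, y, bins=10):
    """Greedy mRMR (difference form): repeatedly pick the variable maximising
    relevance I(x_j; y) minus mean redundancy with the variables already picked.
    Returns the ranking and the score of each pick (the assumed InSc)."""
    n_vars = X.shape[1]
    relevance = [mutual_info(X[:, j], y, bins) for j in range(n_vars)]
    order, scores, remaining = [], [], list(range(n_vars))
    while remaining:
        cand = []
        for j in remaining:
            redundancy = (np.mean([mutual_info(X[:, j], X[:, i], bins) for i in order])
                          if order else 0.0)
            cand.append(relevance[j] - redundancy)
        k = int(np.argmax(cand))
        order.append(remaining.pop(k))
        scores.append(cand[k])
    return order, np.array(scores)

def select_by_cum_score(order, scores):
    """Keep the ranked variables up to the maximum of the cumulative score (CumInSc)."""
    cum = np.cumsum(scores)
    k = int(np.argmax(cum)) + 1
    return order[:k], cum
```

In use, `X` would hold the candidate inputs of one farm and `y` the power at the target prediction step, so the ranking is repeated for each step, which is why the selected subsets in Tables 1 and 2 can differ between steps.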

Case Study and Numerical Results
For each prediction step, the samples with the correspondingly selected input variables are used to train the hybrid prediction solution described in Section 3. The main parameters and settings for training the optimal ANFIS are summarized in Table 3. After the ANFIS parameters are determined from the training samples, the multi-step ahead prediction results for the test month are obtained. Figure 6 presents 150 h of prediction results from the test month for each of the two wind farms. To evaluate the effectiveness and accuracy of the prediction solution, the normalized root mean squared error (nRMSE) and the normalized mean absolute error (nMAE) are adopted as performance metrics [40], as given in Equations (27) and (28), respectively. Smaller values of these measures indicate better prediction performance.
where $P_{pi}$ is the predicted power at time point $i$, $P_{mi}$ is the measured mean power at time point $i$, $N$ is the number of prediction samples, and $C_i$ is the operating capacity at time point $i$.
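Equations (27) and (28) do not survive in this text. Assuming the common capacity-normalized definitions consistent with the symbols above, $\mathrm{nRMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{P_{mi}-P_{pi}}{C_i}\right)^2}\times 100\%$ and $\mathrm{nMAE} = \frac{1}{N}\sum_{i=1}^{N}\frac{|P_{mi}-P_{pi}|}{C_i}\times 100\%$, the metrics can be computed as in the following sketch, which should be checked against the original equations before being relied on.

```python
import numpy as np

def _normalised_errors(p_pred, p_meas, capacity):
    """(P_mi - P_pi) / C_i; capacity may be a scalar or a per-point array."""
    return ((np.asarray(p_meas, dtype=float) - np.asarray(p_pred, dtype=float))
            / np.asarray(capacity, dtype=float))

def nrmse(p_pred, p_meas, capacity):
    """Capacity-normalised RMSE in percent (assumed form of Eq. (27))."""
    e = _normalised_errors(p_pred, p_meas, capacity)
    return 100.0 * np.sqrt(np.mean(e ** 2))

def nmae(p_pred, p_meas, capacity):
    """Capacity-normalised MAE in percent (assumed form of Eq. (28))."""
    e = _normalised_errors(p_pred, p_meas, capacity)
    return 100.0 * np.mean(np.abs(e))
```

For instance, predictions of [10, 20, 30] MW against measurements of [12, 18, 30] MW on a 45-MW farm give an nMAE of about 2.96% and an nRMSE of about 3.63%.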
To evaluate the prediction approach based on the proposed mRMR-criterion input variable selection (IVS), referred to as the mRMR-IVS model, a detailed comparative study was conducted for multi-step ahead wind power generation prediction. Two IVS-based prediction approaches, a phase space reconstruction based model (PSR-IVS) and a principal component analysis based model (PCA-IVS), were selected as benchmarks. For the PSR-IVS model, the input variables included the NWP variables and the phase-space reconstruction variables of the historical time series, with the reconstruction parameters also determined by the C_C method. In this model, all candidate variables of the proposed mRMR-IVS solution were used as inputs, without any further selection. For the PCA-IVS model, the input variables were obtained by transforming the combination of the NWP variables and a 2-h historical power time series, using principal component analysis (PCA) [41] to map the dataset from the original space to the principal component space. In this model, the original attribute variables are automatically reduced to a smaller set of inputs, and the uncorrelated principal components retain the key characteristics of the original variables. After the input variables were selected or extracted, all IVS-based approaches used the hybrid model introduced in Section 3 to carry out the prediction.
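The PCA-IVS benchmark can be sketched in the same spirit. Here the 2-h history is assumed to be the last 8 power values at 15-min resolution, the features are standardized before PCA, and the number of retained components is chosen by a cumulative explained-variance threshold of 95%; none of these settings is stated in this section, so the function names and parameters are illustrative only.

```python
import numpy as np

def pca_ivs_features(power, nwp, n_lags=8):
    """Stack the last n_lags power values (8 x 15 min = 2 h, assumed) with the NWP variables."""
    power = np.asarray(power, dtype=float)
    t = np.arange(n_lags - 1, len(power))                        # times with a full 2-h history
    lags = np.stack([power[t - k] for k in range(n_lags)], axis=1)
    return np.hstack([lags, np.asarray(nwp, dtype=float)[t]])

def pca_inputs(features, var_ratio=0.95):
    """Project standardised features onto the leading principal components that together
    explain at least var_ratio of the variance (threshold assumed)."""
    X = np.asarray(features, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                                            # guard constant columns
    Z = (X - mu) / sd
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    k = min(int(np.searchsorted(np.cumsum(explained), var_ratio)) + 1, len(s))
    return Z @ Vt[:k].T, explained[:k]                           # component scores, variance shares
```

Because the component scores are projections onto orthogonal directions of centered data, they are mutually uncorrelated, which is the property the PCA-IVS benchmark relies on.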
Two error criteria, nRMSE and nMAE, were used to assess the performance of all the prediction models considered. Tables 4 and 5 compare the multi-step ahead prediction performance of the different models for the two wind farms, respectively. As shown in Tables 4 and 5, the proposed model achieves the smallest error across all 16 prediction steps (4 h at a 15-min resolution) compared with both benchmark models. In addition, compared with the PCA-IVS model, the PSR-IVS model performs better at short prediction steps but has larger errors at longer horizons. Owing to its adaptive, step-specific input variable selection, the proposed model outperforms the benchmark models over the whole multi-step ahead prediction horizon.
To present the comparison more intuitively, Figures 7 and 8 plot the nRMSE and nMAE of the three models against the prediction step for the Anzishan and Xuqiao wind farms, respectively. As expected, the error increases with the prediction step. The curves of the two benchmark models fluctuate, especially at intermediate prediction steps, because their input variables do not adapt to each prediction step. For the proposed model, the error increases smoothly, indicating that the mRMR-IVS based model can automatically select optimal or near-optimal input variables for each prediction step and thereby reduce the error. This suggests that the proposed hybrid solution selects suitable input variables effectively in different geographical environments and seasons, showing better adaptability and robustness.
To further validate performance over the full ultra-short-term horizon, which is generally 0-4 h ahead at a 15-min resolution, 100 complete 4-h prediction sequences were randomly selected from the test-month forecasts, and the nRMSE and nMAE of each sequence were calculated. The probability of each error level was then estimated from its frequency of occurrence among these 100 results. The average errors and the probability distributions of typical error levels for the Anzishan and Xuqiao wind farms are shown in Tables 6 and 7, respectively. For both farms, the proposed mRMR-IVS based solution achieves the smallest mean errors, and the results in Tables 6 and 7 show that it attains more favorable probabilities at most of the error levels considered. This confirms that the proposed solution provides better prediction performance than the benchmark models.
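The sampling procedure above can be sketched as follows. Each "integrated time series" is assumed here to be one 4-h forecast trajectory (16 steps at 15 min) issued at a single time, and an error level is assumed to mean the share of sampled trajectories whose nRMSE or nMAE does not exceed a threshold. The thresholds, the capacity-normalized metric form and the function name are hypothetical, since Tables 6 and 7 are not reproduced here.

```python
import numpy as np

def trajectory_error_distribution(pred, meas, capacity, n_samples=100,
                                  levels=(5.0, 10.0, 15.0), seed=0):
    """pred, meas: (n_issue_times, 16) arrays; row k is the 4-h forecast issued at
    time k and the matching measurements. Returns mean nRMSE (%), mean nMAE (%)
    and, per level, the share of sampled trajectories at or below that level."""
    pred, meas = np.asarray(pred, dtype=float), np.asarray(meas, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pred), size=n_samples, replace=False)   # random issue times
    e = (meas[idx] - pred[idx]) / capacity                      # (n_samples, 16)
    nrmse = 100.0 * np.sqrt(np.mean(e ** 2, axis=1))            # one value per trajectory
    nmae = 100.0 * np.mean(np.abs(e), axis=1)
    prob = {lv: (float(np.mean(nrmse <= lv)), float(np.mean(nmae <= lv))) for lv in levels}
    return float(nrmse.mean()), float(nmae.mean()), prob
```

Sampling without replacement ensures that the 100 trajectories come from distinct issue times; the empirical shares then play the role of the probabilities reported in Tables 6 and 7.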

Conclusions and Future Work
This paper develops a novel algorithmic solution for ultra-short-term wind power generation prediction using hybrid machine learning techniques. The proposed solution is implemented in two steps. First, input variable selection (IVS) is carried out using the phase space reconstruction (PSR) technique and the minimal redundancy maximal relevance (mRMR) criterion. Second, the input instances are divided into subsets using K-means clustering to train the ANFIS, whose parameters are optimized using PSO. The proposed solution was extensively evaluated and validated through case studies of two real wind farms, and the numerical results demonstrate that the proposed model outperforms the benchmark models.
Two directions are considered worthy of future research. First, the proposed prediction solution could be further developed using other supervised learning algorithms and advanced error-correction techniques, and validated more extensively on additional field measurements. Second, the hybrid machine learning techniques could be extended and incorporated into the control and management strategies of renewable energy systems (e.g., [42][43][44][45]) to improve system operational performance.

Figure 1. The framework of the proposed hybrid algorithmic solution.

Figure 4. Results of $\Delta\bar{S}(t)$ and $S_{cor}(t)$ produced by the C_C method for Xuqiao farm.

Figure 5. Cumulative informativeness score curve for IVS of Anzishan and Xuqiao farms: (a) one-hour ahead prediction; (b) two-hour ahead prediction; (c) three-hour ahead prediction; and (d) four-hour ahead prediction.

2796 Energies
Figure 6. Partial results of multi-step ahead prediction in the test month: (a) power generation prediction of 150 h in Anzishan wind farm (June 2017); and (b) power generation prediction of 150 h in Xuqiao wind farm (December 2017).

Figure 7. Error trend in different prediction steps of Anzishan farm: (a) nRMSE of test month; and (b) nMAE of test month.

Figure 8. Error trend in different prediction steps of Xuqiao farm: (a) nRMSE of test month; and (b) nMAE of test month.

Table 1. Ranking and selection of candidate variables for Anzishan farm.

Table 2. Ranking and selection of candidate variables for Xuqiao farm.

Table 3. Main setting parameters for training.

Table 4. Comparison of the multi-step ahead prediction performance of Anzishan farm.

Table 5. Comparison of the multi-step ahead prediction performance of Xuqiao farm.

Table 6. Mean errors and probability distributions of Anzishan farm.

Table 7. Mean errors and probability distributions of Xuqiao farm.