A Critical Review of Wind Power Forecasting Methods—Past, Present and Future

Abstract: The largest obstacle to increasing wind power penetration within the power grid is the uncertainty and fluctuation of wind speeds. Accurate wind power forecasting is therefore a challenging task, which can significantly impact the effective operation of power systems. Wind power forecasting is also vital for planning unit commitment, maintenance scheduling and profit maximisation of power traders. The current development of cost-effective operation and maintenance methods for modern wind turbines benefits from the advancement of effective and accurate wind power forecasting approaches. This paper systematically reviews the state-of-the-art approaches to wind power forecasting with regard to physical, statistical (time series and artificial neural networks) and hybrid methods, including factors that affect accuracy and computational time in the predictive modelling efforts. In addition, this study provides a guideline for screening the wind power forecasting process, allowing wind turbine/farm operators to identify the most appropriate predictive methods based on time horizons, input features, computational time, error measurements, etc. Finally, further recommendations for the wind power forecasting research community are proposed based on the reviewed literature.


Introduction
In recent years, rapid economic growth has been driven by increased power production in various forms. Energy extracted from fossil fuels has many opponents because it leads to air pollution, ozone depletion and global warming [1]. According to the Paris Agreement, to limit the global temperature rise to under 2 °C, renewable energies have to supply two-thirds of the global energy demand up to 2050 [2]. Among all kinds of renewable energy, such as solar photovoltaic, tidal, wave and modern bioenergy, wind power has become extremely popular because it is highly efficient, cheap and beneficial for the environment [3]. Additionally, due to its abundance, wind energy plays a leading role in the electricity production of the renewable energy sector [4]. It has had the greatest demand and growth among all renewable energy sources over the last decade [5]. Recent investigations showed that the yearly increase in installed wind power capacity is around 30% [6].
In the UK, the total renewable energy production was 119 TWh in 2019, indicating an increase of 8.5% from the previous year. A large share of this growth belonged to wind energy, which is about 64 TWh (53.7%). This high capacity ranks the UK after China, USA, Germany, India and Spain in

Table 1. Prediction horizons in wind power forecasting.

| Time Horizon | Range | Applications |
| very short-term | few minutes to 30 min | regulation actions, real-time grid operations, market clearing, turbine control |
| short-term | 30 min to 6 h | load dispatch planning, load intelligent decisions |
| medium-term | 6 h to 1 day | operational security in the electricity market, energy trading, on-line and off-line generating decisions |
| long-term | 1 day to a month | reserve requirements, maintenance schedules, optimum operating cost, operation management |

Prediction Methodologies
According to the applied methodology, wind power forecasting models can be further divided into persistence methods, physical methods, time series models and artificial neural networks (ANNs). They differ in the required input data, the accuracy at different time scales and the complexity of the process.

Persistence Methods
This method, which is normally used as a reference, assumes that the wind power in the future will be equal to the power measured at present. It is commonly used as a benchmark against which novel short-term forecasting methods are compared to quantify their improvements [9][10][11][12][13][14]. The accuracy of this method deteriorates quickly as the prediction timescale increases [10]. Apart from being simple and economical, the main advantage of this method is that it requires neither parameter evaluation nor external variables [15].
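As an illustration of how such a reference forecast can be produced and scored, the following is a minimal sketch. The function names and the sample power readings are hypothetical and are not taken from any of the cited studies.

```python
def persistence_forecast(measured_power, horizon_steps):
    """Forecast each step as the value measured `horizon_steps` earlier."""
    if horizon_steps < 1:
        raise ValueError("horizon_steps must be at least 1")
    # The first `horizon_steps` points have no earlier measurement to copy.
    return [None] * horizon_steps + list(measured_power[:-horizon_steps])


def mean_absolute_error(measured, forecast):
    """MAE over the points for which a forecast exists."""
    pairs = [(m, f) for m, f in zip(measured, forecast) if f is not None]
    return sum(abs(m - f) for m, f in pairs) / len(pairs)


# Hypothetical 10 min power readings in kW (illustrative values only).
power_kw = [120.0, 135.0, 150.0, 140.0, 160.0, 155.0]
forecast_1 = persistence_forecast(power_kw, 1)  # one step ahead
forecast_3 = persistence_forecast(power_kw, 3)  # three steps ahead
```

For this toy series the MAE is 13.0 kW one step ahead and about 16.7 kW three steps ahead, which mirrors the loss of accuracy at longer horizons described above.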

Physical Methods
Physical methods use a detailed physical characterisation to model wind turbines/farms. This modelling is often carried out by downscaling numerical weather prediction (NWP) data, which requires a description of the area, such as roughness and obstacles, as well as weather forecast data such as temperature and pressure. These variables feed complex, time-consuming mathematical models that determine the wind speed. The predicted wind speed is then applied to the corresponding wind turbine power curve (normally provided by the turbine manufacturer) to forecast wind power. These methods do not need to be trained with historical data, but they depend on physical data [12]. In recent decades, many physical methods have been proposed. For example, Focken et al. [16] created a physical wind power forecasting approach for time scales up to 48 h ahead, which received its input data from a weather prediction model. The boundary layer was first characterised with respect to roughness, terrain and wake effect. In addition, the daily variation of the thermal stratification of the atmosphere was taken into account to estimate the wind speed at hub height [16]. De Felice et al. [14] used a physical model to predict the electricity demand in Italy, taking 14 months of hourly temperature as inputs. Comparing their method with the naive approach in terms of mean absolute error (MAE) showed that NWP models can improve the forecasting performance, especially for the hottest regions. Even though physical methods are perhaps the best choice for medium- to long-term wind power prediction, they are computationally complex and therefore need considerable computing resources [4].
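The final step of the physical chain, converting a predicted wind speed into power through a power curve, can be sketched as a simple lookup with linear interpolation. The curve points below are hypothetical and do not describe any real turbine; returning zero at and above the last curve point is also an assumption made for illustration.

```python
# Hypothetical (wind speed in m/s, power in kW) points, sorted by speed.
POWER_CURVE = [(3.0, 0.0), (6.0, 400.0), (9.0, 1300.0), (12.0, 2000.0), (25.0, 2000.0)]


def power_from_curve(wind_speed, curve):
    """Linearly interpolate power from sorted (speed, power) curve points."""
    first_speed, last_speed = curve[0][0], curve[-1][0]
    if wind_speed <= first_speed or wind_speed >= last_speed:
        return 0.0  # at or below cut-in, or at or above cut-out: no production
    for (s0, p0), (s1, p1) in zip(curve, curve[1:]):
        if s0 <= wind_speed <= s1:
            return p0 + (p1 - p0) * (wind_speed - s0) / (s1 - s0)


predicted_speed = 7.5  # hypothetical NWP-derived wind speed at hub height, m/s
predicted_power = power_from_curve(predicted_speed, POWER_CURVE)
```

In practice the manufacturer's tabulated curve would replace `POWER_CURVE`, and the wind speed would come from the downscaled NWP output.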

Statistical Methods
Statistical methods are generally based on developing the linear and non-linear relationships between NWP data (such as wind speed, wind direction and temperature) and the generated power. To define this statistical relationship, historical data are used as the training data. The model is then tuned by comparing the model prediction with the on-line measured power. After that, the model is ready to predict using the NWP forecast of the next few hours and the on-line measurements. This approach is easy to model and inexpensive [17]. It is suited to short-term periods, and its prediction accuracy decreases as the estimation time increases [13]. More specifically, statistical methods can be divided into two main subclasses: time series based and neural network (NN) based.

Time Series Models
These models, which were proposed by Box and Jenkins, apply historical data to build a mathematical model through model development, parameter estimation and checking of the simulation characteristics. The general form of the model can be described as:

$$X_t = \sum_{i=1}^{p} \phi_i X_{t-i} + \alpha_t - \sum_{j=1}^{q} \theta_j \alpha_{t-j}$$

where $\phi_i$ represents the autoregressive parameter, $\theta_j$ is the moving average parameter, $\alpha_t$ is the white noise, $p$ is the order of the autoregressive model, $q$ is the order of the moving average model and $X_t$ is the forecasted wind power at time $t$.
The whole equation represents the ARMA (AutoRegressive Moving Average) model. If p is assumed to be zero, it represents the moving average (MA) model; likewise, if q is assumed to be zero, it represents the autoregressive (AR) model [12]. Statistical methods based on this approach are easy to formulate and very applicable in short-term wind power forecasting [18]. They need only a low computation time. However, they may not provide adequate prediction performance, especially when the time series is nonstationary [5]. Table 2 shows the reviewed time series models with the specifications of their selected input features, including data size and sampling rate. Firat et al. [21] proposed a statistical model based on independent component analysis and the AR model for wind speed forecasting. Using six years of hourly wind speed from a wind farm in the Netherlands, the authors claimed that the proposed model could give higher accuracy than direct forecasting methods for 2-14 h ahead.
De Felice et al. [14] used NWP data and ARIMA models to forecast electricity demand in Italy. Temperatures from 14 months in the years between 2003 and 2009 were used as the main input values. A comparison of MAEs showed that the proposed model outperformed the persistence method, especially in the hottest locations. Duran et al. [19] designed an AR model with an exogenous variable (ARX). Using the wind speed as the exogenous variable, they compared the mean error of their model with those of the persistence and traditional AR models and showed significant improvements in accuracy.
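To make the AR special case (q = 0) concrete, the sketch below fits a first-order model by ordinary least squares and forecasts recursively. It is a minimal illustration under stated assumptions: the intercept `c` is added for convenience and is not part of the general form given above, the function names are hypothetical, and the sample series is synthetic rather than real wind data.

```python
def fit_ar1(series):
    """Least-squares estimate of (c, phi) in X_t = c + phi * X_{t-1} + noise."""
    x_prev, x_curr = series[:-1], series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_curr = sum(x_curr) / n
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(x_prev, x_curr))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted recursion `steps` times from the last observation."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts


# Synthetic, noise-free series generated by X_t = 2 + 0.5 * X_{t-1}.
series = [10.0, 7.0, 5.5, 4.75, 4.375]
c, phi = fit_ar1(series)
next_two = forecast_ar1(series[-1], c, phi, steps=2)
```

Because the synthetic series is noise-free, the fit recovers the generating coefficients; with real, nonstationary wind data the estimates would be noisy and differencing (as in ARIMA) may be needed first.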

ANNs
ANNs, one of the most commonly used methods for wind power prediction, can identify the non-linear relationships between input features and output data [22]. One reason for the tendency to use neural networks is that they avoid modelling the complex mechanical structure of wind turbines [23]. Typically, an ANN model consists of an input layer, one or more hidden layers [11] and an output layer, where the historical data/features are fed for training and testing (see Figure 1). It also consists of processing units called neurons, which are linked by weighted connections. The ANN adjusts the weights of these interconnections through the training process. If the desired output is known at the beginning of the process, the learning is called supervised; otherwise, it is called unsupervised [24]. Figure 1 shows a typical structure of an ANN model.

The performance of ANNs depends on many different factors, including data preprocessing, data structure, learning method, connections between input and output data and so forth [25]. There are more than 50 forms of ANNs, including multilayer perceptron (MLP) [1], wavelet neural network (WNN) [26], back-propagation neural network (BPNN), radial basis function neural network (RBFNN) [27], Elman neural network (ENN) [15], long short-term memory (LSTM) [4], convolutional neural network (CNN) [28], etc. Designing an ANN model involves two steps. The first is selecting the proper structure of the network and specifying the direction in which information is passed. There are two major topologies: feed-forward, which passes data in one direction from the input to the output layer, and recurrent, which passes data in both directions. The second step is picking the right learning algorithm among supervised, unsupervised and reinforcement learning [12].

Table 3 shows a summary of the ANN models reviewed in this study. The selected input features are introduced along with the specifications of the wind turbine or wind farm, data size and sampling rate. Jyothi and Rao [31] used an adaptive wavelet neural network (AWNN) for short-term wind power prediction. The minimum normalised root mean square error (NRMSE) that they achieved was 0.02. Bilal et al. [1] designed an MLP network to forecast the wind power of four different wind farms in Senegal. The main input of their model was wind speed, but they also assessed different combinations of input variables such as wind direction, temperature, humidity and solar radiation. The results showed that, apart from wind speed, air temperature has the highest impact on improving the accuracy of the model. Regarding the best structure for the MLP, the authors used the Levenberg-Marquardt backpropagation algorithm as the training algorithm and the log-sigmoid transfer function as the activation function. It was concluded that the MLP with three hidden layers (5, 7 and 8 neurons, respectively) has the lowest NMSE.
Xu and Mao [32] used an ENN to study a single 15 kW wind turbine at a wind farm in western China. Using wind speed, wind direction, humidity and temperature as input variables, the authors reported satisfactory accuracy, particularly after applying the particle swarm optimisation algorithm.
In another investigation, Chang [34] developed a model based on BPNN for 10 min ahead wind power prediction. The historical wind power data of a wind turbine in Taiwan were used to verify the efficiency of this method. The results showed that the proposed neural network could predict wind power easily with an average absolute error of 0.278%.
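The layered structure of a feed-forward ANN can be illustrated with a forward pass through a small MLP. The sketch below borrows the three-hidden-layer layout (5, 7 and 8 neurons) and the log-sigmoid activation mentioned for Bilal et al. [1], but everything else is an assumption made for illustration: the two normalised inputs, the random untrained weights and the linear output layer. It is not the authors' implementation, and no training (for example Levenberg-Marquardt) is shown.

```python
import math
import random


def log_sigmoid(x):
    """Log-sigmoid transfer function, mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs, layers):
    """Propagate inputs through log-sigmoid hidden layers and a linear output layer."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activations = sums if is_output else [log_sigmoid(s) for s in sums]
    return activations


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sizes = [2, 5, 7, 8, 1]  # inputs, three hidden layers, one output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
prediction = forward([0.6, 0.3], layers)  # hypothetical normalised speed and temperature
```

Training would adjust `weights` and `biases` so that `prediction` approaches the measured power; only the information flow from input to output is shown here.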

Hybrid Approach
Combinations of different forecasting methods, such as ANNs and fuzzy logic models, are called hybrid approaches [28]. The main aim is to retain the merits of each technique and improve the overall accuracy. In statistics and machine learning, diverse predictive models are often developed by using multiple algorithms and different training datasets. This process is often called ensemble modelling, which is a more advanced type of hybrid forecasting. A combination may not always lead to a better result than its constituents. However, it has been shown that combined models carry less risk in most situations [12]. Many hybrid methods have been proposed based on the combination of different models. Table 4 shows the hybrid methods reviewed in this study, including input features for training models, specifications of the dataset used, database size and sampling rate. A number of these methods are introduced in more detail below.

Hong and Rioflorido [28] proposed a hybrid 24-h-ahead wind power prediction model based on CNN. Different operations in the CNN, such as convolution, kernels and pooling, were used to extract the input features. The extracted features were then fed to an RBFNN that implemented the double Gaussian function (DGF) as its activation function. The authors also used adaptive moment estimation (ADAM) to further improve the CNN and RBFNN. Using one year of historical power data from a wind farm in Taiwan, the proposed approach provided the best performance compared with other methods such as the multilayer feedforward neural network with a genetic algorithm (MFNN-GA), RBFNN-GA, RBFNN-DGF, CNN-MFNN and CNN-RBFNN. The authors also concluded that applying the DGF in the RBFNN generated better results than the conventional RBFNN with the Gaussian function.
Lin et al. [36] implemented isolation forest (IF) along with a deep learning neural network to detect outliers for more accurate wind power forecasting. Wind speed, wind direction, air temperature, etc., were extracted from a supervisory control and data acquisition (SCADA) dataset of an offshore wind turbine and used as inputs, with wind power as the output of the predictive model. The comparison showed that IF is a more effective way of providing accurate forecasting, especially when the investigated data do not follow the normal distribution. In another paper [23], the authors critically evaluated eleven features from a 7 MW wind turbine in Scotland, including four wind speeds at different heights, average blade pitch angle, three measured pitch angles for the three blades, ambient temperature, yaw error and nacelle orientation. The results revealed that the blade pitch angle had the greatest effect on the performance of the prediction model, even more than wind speed and wind shear. In contrast, wind direction and air density were the least significant, which allowed them to be eliminated to reduce computational time.
Zhang et al. [4] used the LSTM network to predict the wind power production of a wind farm in China. Three months of wind speed data from NWP were used as inputs, and the produced wind power was treated as the output. Considering the Chinese grid corporation standard, the authors compared their model with radial basis function (RBF), wavelet, deep belief network (DBN), back propagation (BP) and Elman networks. The results showed that the proposed model improved the accuracy of forecasting, although its operating time was longer than the others. They also suggested that the performance of the model strongly depends on the range of wind speed. At high speeds, the wavelet provided better performance, while the other methods predicted better at lower speeds. The uncertainty of the forecasted power was also assessed by three different methods: mixture density neural network (MDN), relevance vector machine (RVM) and Gaussian mixture model (GMM). The results showed that the GMM gave the best performance.
Marcos et al. [37] used a mixture of a physical and a statistical model. The input data were atmospheric global-scale forecasts provided by the global forecasting system (GFS). A Brazilian NWP model (BRAMS) was also used to refine the atmospheric global-scale forecasts by applying physical considerations about the terrain, such as vegetation cover and soil texture. After that, a systematic error correction filter capable of learning the dynamic behaviour of wind data was used to reduce the biases of the forecasted wind. After elimination of the biases, two main methods were used for wind power forecasting: the manufacturer's power curve and regression equations derived from wind measurements and generated power data from SCADA systems. For generating the polynomial regressions, one year of observed wind and power data was considered using four equations: linear, quadratic, quadratic with previous power outputs, and cubic. Comparing these four equations with statistical indexes such as RMSE showed that the cubic regression provides the best results. As other factors can also influence the power output, such as air density, wake effect and orography, the Kalman filter was used again to eliminate systematic errors from the conversion model. Finally, it was concluded that using the Kalman filter decreased the value of RMSE and increased the values of the anomaly correlation coefficient (ACC) and the Nash-Sutcliffe coefficient (NSC), all indicating better forecasting.
Zhao et al. [5] proposed a bidirectional model for 1-6 h ahead wind power forecasting. In their model, the forecasted power from the forward model was used as the input for the optimisation algorithm of the backward model. Then, by comparing the difference between the forward and backward results, the authors were able to provide the final forecast. Eight months of hourly measured wind power from an American wind farm were used for training, while the other four months were used for further evaluation. Comparing the evaluation criteria of this model with those of the forward, backward and persistence methods showed that it outperformed the others.
Liu et al. [40] combined three different prediction models, namely BPNN, RBFNN and least square support vector machine (LSSVM), by means of an adaptive neuro-fuzzy inference system (ANFIS) for 48-h-ahead wind power forecasting. As the first step, a Pearson correlation coefficient (PCC) based method was used to eliminate outliers. Sixty-day datasets of a wind farm in China, containing wind speed, wind direction and temperature as inputs and generated power as the output, were used to train the three methods. The evaluation of the proposed hybrid model showed that it outperformed the three individual forecasting models, with a remarkable improvement in accuracy.
Zhao et al. [41] used a Kalman filter to decrease the systematic errors of the wind speed generated by a weather research and forecasting model. The filtered wind speed was then used together with wind direction, temperature and humidity as input variables for a multilayer feed-forward neural network (MFNN) to forecast day-ahead wind power. The results showed that filtering the raw speed and applying the MFNN could decrease the NRMSE from 17.81% to 16.47%.

Factors to Compare Different Methods
Methods of estimating wind power can be compared on various aspects, such as accuracy, input features and computational time, which are presented in Table 2 (time series models), Table 3 (ANN models) and Table 4 (hybrid methods). In these tables, after the details of the selected prediction method, the third column specifies the selected input data, which vary with the applied method. The fourth to sixth columns give other specifications of the database used, such as the location it belongs to. Local features, such as temperature and humidity, depend strongly on the selected region and directly affect the output power. Given the significance of the input data volume and sampling rate, these two factors have also been studied. Besides, the reviewed forecasting methods were evaluated by various criteria, which were also investigated in this study. These criteria are examined in Table 5 (time series models), Table 6 (ANN models) and Table 7 (hybrid models), where the metrics used in each reviewed article are summarised. Furthermore, in Section 3, the influencing factors of accuracy, input data and computational time will also be discussed in detail. Local measurement of both wind speed and direction improves the forecasting performance.

Accuracy
The accuracy of wind power forecasting is the most important factor for comparing different predictive methods, and it is determined by a chosen evaluation metric. Different power systems usually set threshold levels for these metrics, against which the adequacy of a model's accuracy can be checked. For example, some references mention that the RMSE should be within 10% of installed capacity for most models. In China, the State Grid Corporation has defined 20% as the maximum acceptable RMSE for short-term wind power forecasting and 15% for the forecasted value 4 h ahead [4]. Methods with a higher RMSE do not meet the required performance. In Ireland, the system managers (EirGrid and SONI) require a target accuracy of 6-8% [17].
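A threshold check of this kind can be sketched as follows. The function names, the sample data and the 50 MW installed capacity are hypothetical; the 20% limit is the short-term figure quoted above, and expressing RMSE as a percentage of installed capacity is the normalisation assumed here.

```python
import math


def nrmse_percent(measured, forecast, installed_capacity):
    """RMSE expressed as a percentage of installed capacity."""
    squared = [(f - m) ** 2 for m, f in zip(measured, forecast)]
    rmse = math.sqrt(sum(squared) / len(squared))
    return 100.0 * rmse / installed_capacity


def meets_limit(measured, forecast, installed_capacity, limit_percent):
    """True if the normalised RMSE does not exceed the required limit."""
    return nrmse_percent(measured, forecast, installed_capacity) <= limit_percent


# Hypothetical hourly values in MW for a 50 MW wind farm.
measured_mw = [10.0, 20.0, 30.0, 40.0]
forecast_mw = [12.0, 18.0, 33.0, 36.0]
score = nrmse_percent(measured_mw, forecast_mw, installed_capacity=50.0)
acceptable = meets_limit(measured_mw, forecast_mw, 50.0, limit_percent=20.0)
```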

Computational Time
Forecasting computational time (the time required for training/learning) is another significant factor in selecting a proper prediction model, especially for short-term forecasting. It also helps to establish whether a model can be applied in real time. For example, the approach proposed by Marcos et al. [37], which needed about 60-70 min of computational time for each 72-h NWP model simulation, was shown to be usable in real time for power system operation. The computational time depends on the methods used, the required accuracy, the volume and sampling rate of the input data, the computer used, etc. It also depends on the training algorithm. For instance, as Zhao et al. [5] discussed, the extreme learning machine (ELM) with a feed-forward neural network performed faster than networks based on the backpropagation algorithm.
Singh et al. [8] showed that the training and testing of two-month input data with a 10 min sampling rate for the proposed MLP network could be finished in 30 min on a Pentium 150 MHz computer. The authors also claimed that using a separate neural network for each turbine rather than the wind farm guarantees fast training because the size and complexity of the network will be diminished. Another benefit of this scheme is that the separate models will not be affected by the off-line turbines.
Lin and Liu [23], in an effort to reduce the computational time, removed features with minor influence from the proposed model, including air temperature, nacelle orientation and yaw error. Even though this reduction saved only a small amount of processing time (0.77 min) for a single wind turbine, the saving becomes considerable for a typical wind farm comprising more than 100 turbines (roughly 77 min for 100 turbines, assuming the per-turbine saving simply adds up).

Performance Evaluation in Wind Power Forecasting
To assess the performance of wind power forecasting methods, several statistical metrics are available that show the deviation of the forecasted from the measured wind power. A statistical description of how evaluation criteria were chosen in prior research is outlined in Figure 2, which is based on the information from Tables 5-7. As reflected in Figure 2, the majority of studies used a small set of evaluation criteria such as RMSE, NRMSE, MAE and mean absolute percentage error (MAPE). In the following, the techniques for assessing the accuracy of wind power forecasting methods are discussed in detail.

Normalised Error
The most common error measurements that evaluate the performance by specifying the degree of similarity between forecasted and measured data are [43]: where $i$ is the hour of the predicted data, $l$ is the time horizon, $T_N(i, l)$ is the forecasted power, $P_N(i, l)$ is the target measured power and $M$ is the total number of predicted data.

Normalised Mean Bias Error (NMBE)
The normalised mean bias error shows the difference between the average forecasted and observed wind power. Its sign shows whether the method overestimates (NMBE > 0) or underestimates (NMBE < 0). This metric reflects systematic errors rather than the overall capability of the forecasting method [5]. On its own, it does not offer enough information about the accuracy of the forecasting method.

Normalised Mean Absolute Error (%)
One of the most common wind power prediction performance indexes is the normalised mean absolute error [4]. It allows a more precise analysis of both random and systematic errors.
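The difference between the two indexes can be seen in a short sketch. Two assumptions are made here that are not stated in the text: the error is taken as forecast minus measured (consistent with NMBE > 0 meaning overestimation), and normalisation is by installed capacity. The data and the 50 MW capacity are hypothetical.

```python
def normalised_errors(measured, forecast, installed_capacity):
    """Per-sample error (forecast minus measured) divided by installed capacity."""
    return [(f - m) / installed_capacity for m, f in zip(measured, forecast)]


def nmbe(errors):
    """Mean of the signed errors: positive and negative deviations cancel."""
    return sum(errors) / len(errors)


def nmae(errors):
    """Mean of the absolute errors: deviations cannot cancel."""
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical hourly values in MW for a 50 MW wind farm.
measured_mw = [10.0, 20.0, 30.0, 40.0]
forecast_mw = [12.0, 18.0, 33.0, 36.0]
errors = normalised_errors(measured_mw, forecast_mw, installed_capacity=50.0)
bias = nmbe(errors)
absolute = nmae(errors)
```

Here the bias is close to zero (about -0.5% of capacity) even though the typical error is about 5.5% of capacity, which is why the bias alone says little about accuracy.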

Mean Squared Logarithmic Error (MSLE)
The MSLE is a risk metric corresponding to the expected value of the squared logarithmic error and can be expressed as:

$$\mathrm{MSLE} = \frac{1}{n} \sum_{k=1}^{n} \left( \ln\left(1 + (P_{measured})_k\right) - \ln\left(1 + (P_{predicted})_k\right) \right)^2$$

In this equation, $n$ is the number of data points, $(P_{measured})_k$ is the measured value of the $k$th data point from the SCADA database and $(P_{predicted})_k$ is the predicted wind power of the $k$th data point from deep learning modelling.

R-Square (R2)
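The coefficient of determination indicates how much of the variance in the measured data is captured by the prediction. The maximum possible value of $R^2$ is 1.0, while lower values, which can be negative, indicate a worse prediction. It can be defined as:

$$R^2 = 1 - \frac{\sum_{k=1}^{n} \left( (P_{measured})_k - (P_{predicted})_k \right)^2}{\sum_{k=1}^{n} \left( (P_{measured})_k - \bar{P}_{measured} \right)^2}$$

where $\bar{P}_{measured}$ is the mean of the measured values.

Explained Variance Score (EVS)
The explained variance score estimates the proportion of the dispersion of a given dataset that a forecasting model accounts for. For the best prediction, the value of EVS is 1.0, while lower scores represent worse prediction [23]. The EVS is defined as follows:

$$\mathrm{EVS} = 1 - \frac{\mathrm{Var}\left( P_{measured} - P_{predicted} \right)}{\mathrm{Var}\left( P_{measured} \right)}$$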

Mean Absolute Error (MAE)
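MAE is a risk metric corresponding to the expected value of the absolute error. It is a non-negative floating-point value, and its best value is 0.0. MAE is defined as:

$$\mathrm{MAE} = \frac{1}{n} \sum_{k=1}^{n} \left| (P_{measured})_k - (P_{predicted})_k \right|$$

where $n$ is the number of data points, $P_{measured}$ is the measured value from the SCADA database and $P_{predicted}$ is the predicted wind power from deep learning modelling.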
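The SCADA-based metrics above (MSLE, R2, EVS and the absolute error averaged over all points) can be computed with a few lines of plain Python. This is a sketch using the standard textbook definitions with population variance; the function names and the sample values are hypothetical, and the cited studies may have used library implementations instead.

```python
import math


def msle(measured, predicted):
    """Mean squared difference of log(1 + value); inputs must be non-negative."""
    terms = [(math.log1p(m) - math.log1p(p)) ** 2 for m, p in zip(measured, predicted)]
    return sum(terms) / len(terms)


def variance(values):
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def r2_score(measured, predicted):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def explained_variance(measured, predicted):
    """1 minus the variance of the residuals over the variance of the data."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    return 1.0 - variance(residuals) / variance(measured)


def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


# Hypothetical values in MW; every prediction is 0.5 MW too high.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
scores = {
    "MSLE": msle(measured, predicted),
    "R2": r2_score(measured, predicted),
    "EVS": explained_variance(measured, predicted),
    "MAE": mean_absolute_error(measured, predicted),
}
```

With a constant offset the EVS stays at 1.0 while R2 drops to 0.8, which shows that the EVS ignores a systematic bias that R2 penalises.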

The Amplitude and Phase Error
The amplitude error shows whether the predicted power is overestimated or underestimated, whereas the phase error results from a timing shift between forecasted and real data (Figure 3). With these two types of errors, the standard deviation error (SDE) can be defined [5]: here, $\hat{E}_i$ is the mean normalised error, $SD_{bias}$ is the amplitude error and $DISP$ is the phase error.
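A commonly used form of this decomposition is sketched below. It is stated as an assumption and not as the exact expression of [5]; the symbols $\sigma_f$ and $\sigma_m$ (standard deviations of the forecasted and measured power) and $r$ (their correlation coefficient) are introduced here for illustration.

```latex
\begin{aligned}
SDE^2 &= SD_{bias}^2 + DISP^2, \\
SD_{bias} &= \sigma_f - \sigma_m, \\
DISP &= \sqrt{2\,\sigma_f\,\sigma_m\,(1 - r)}.
\end{aligned}
```

The identity follows from expanding the variance of the error, $\sigma_f^2 + \sigma_m^2 - 2 r \sigma_f \sigma_m = (\sigma_f - \sigma_m)^2 + 2 \sigma_f \sigma_m (1 - r)$. A perfectly correlated forecast ($r = 1$) therefore has no phase error, and any remaining SDE comes from the amplitude term alone.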

Statistical Error Distribution
Error distribution can be investigated by two statistical metrics: the skewness (SKEW) and the kurtosis (KURT). The SKEW is a measure of the symmetry of the distribution. If it is near zero, the distribution is symmetric, whereas negative or positive values show that the distribution is inclined to the left or right, respectively. The KURT, on the other hand, describes the peakedness of the distribution relative to a normal distribution [43]. These parameters can be determined as follows:
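Both quantities can be computed from the central moments of the forecast errors. The sketch below assumes the common moment-based definitions, with 3 subtracted from the kurtosis so that a normal distribution scores zero; that convention and the function names are assumptions, and the formulas in [43] may differ in detail.

```python
def central_moment(values, order):
    """Average of (value - mean) raised to the given order."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third central moment scaled by the cube of the standard deviation."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment over squared variance, minus 3 (normal = 0)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical forecast errors: one symmetric set, one with a long right tail.
symmetric_errors = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_tailed_errors = [0.0, 0.0, 0.0, 0.0, 10.0]
skew_symmetric = skewness(symmetric_errors)
skew_right = skewness(right_tailed_errors)
kurt_symmetric = excess_kurtosis(symmetric_errors)
```

The symmetric set gives a skewness of zero, while the set with a few large positive errors gives a positive value.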

Enhancement of Predictive Accuracy
Almost all current modelling efforts in wind power prediction aim to reduce forecasting errors. These efforts have led to various enhancements, which are summarised below.

Kalman Filtering
Since the accuracy of NWP data has a very important effect on the accuracy of wind power prediction, one way to improve forecasting performance is to reduce the uncertainty of the NWPs. For this purpose, the Kalman filtering algorithm is used to eliminate systematic errors. The Kalman filter is a set of mathematical equations that provides an optimal estimate by combining weighted recent observations to mitigate the related biases. This method adapts easily to any change in the observations and does not need a long series of historical data.
In a study by Louka et al. [44], the Kalman filter was used to improve the input data for the model that predicted wind power. The results showed that Kalman-filtered wind information improved the model for long-term forecasting. These results also showed that, instead of spending money on high-resolution applications (<6 km), a more moderate NWP resolution combined with the flexible statistical technique of the Kalman filter can be used to provide more accurate results. In addition, Marcos et al. [37] used the Kalman filter twice in their forecasting model for a wind farm in Brazil. The first application targeted the wind speed forecasting error, while the second aimed to eliminate the systematic errors of the model that converts wind speed to wind power. The latter, in particular, was needed because of the impact of other variables on generated power, such as roughness, air density and wake effect. The results showed an obvious reduction of RMSE after the application of the Kalman filter.
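The idea of removing a systematic error with a Kalman filter can be illustrated with a scalar filter that tracks a slowly varying additive bias. This is a minimal sketch under stated assumptions, not the scheme of Louka et al. [44] or Marcos et al. [37]: the bias is modelled as a random walk, and the function name, noise variances and sample data are hypothetical.

```python
def kalman_bias_correction(forecasts, observations, process_var=0.01, obs_var=1.0):
    """Track an additive forecast bias and return bias-corrected forecasts."""
    bias, bias_var = 0.0, 1.0  # initial bias estimate and its variance
    corrected = []
    for forecast, observed in zip(forecasts, observations):
        # Correct the current forecast with the bias learned from earlier steps only.
        corrected.append(forecast - bias)
        # Predict: the bias follows a random walk, so only its variance grows.
        bias_var += process_var
        # Update: blend the newly observed forecast error into the bias estimate.
        gain = bias_var / (bias_var + obs_var)
        bias += gain * ((forecast - observed) - bias)
        bias_var *= 1.0 - gain
    return corrected


# Hypothetical wind speeds in m/s: the NWP forecast is consistently 2 m/s too high.
nwp_speed = [10.0] * 20
measured_speed = [8.0] * 20
corrected_speed = kalman_bias_correction(nwp_speed, measured_speed)
```

The first corrected value equals the raw forecast because nothing has been learned yet; later values move towards the measured speed as the bias estimate converges. The ratio of `process_var` to `obs_var` controls how quickly the filter adapts to a change in the bias.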

Outlier Detection
Outliers in SCADA data, which can make wind power prediction inaccurate, are usually caused by uncalibrated sensors or sensor degradation over time [45]. As a technique for improving model accuracy, the detection and elimination of these outliers have been investigated in previous studies. Yang et al. [46] used an algorithm for preprocessing SCADA data to enhance condition monitoring (CM) quality after examining the factors influencing a wind turbine, including structural integrity and turbulence. Manobel et al. [47] applied the Gaussian process (GP) for detecting and removing outliers from SCADA data, where RMSE was improved by 25% in comparison with the standard forecasting methods. In addition, Lin et al. [30,36] used IF to deal with outliers and increase accuracy. The results showed that preprocessing the SCADA data yields more accurate forecasting.
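As a simple stand-in for the GP and IF techniques cited above, the sketch below removes outliers with the interquartile range (IQR) rule. This is a different and much simpler technique than those used in the cited studies, chosen only because it needs no external library; the function names, the factor of 1.5 and the sample residuals are assumptions.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Lower and upper fences placed k times the IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Hypothetical residuals (measured minus expected power, MW) with one faulty reading.
residuals = [0.1, -0.2, 0.0, 0.3, -0.1, 5.0, 0.2, -0.3]
cleaned = remove_outliers(residuals)
```

A fixed-fence rule like this implicitly assumes a roughly unimodal error distribution, which is one reason methods such as IF can be preferable when the data do not follow a normal distribution.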

Optimal Combinations
By combining different NWP data and prediction methods, their individual benefits can be merged. This combination also reduces the negative effect of the errors that each technique makes in certain situations. In the section on hybrid methods, references were made to several combinations of different approaches and their effect on increasing efficiency. In Section 5.3, however, the optimal combination of NWP data will be discussed.
Vaccaro et al. [48] designed an adaptive framework for wind power forecasting based on a combination of different data sources. The novel part of their investigation was a flexible supervised learning system called the lazy learning algorithm, which combined meteorological data from different sources. This algorithm could be updated continuously. Using 12 months of wind speed observations from a generator site in Italy, the authors assessed the forecasting data by MAE and mean square error (MSE). The results showed that the proposed model outperformed local atmospheric models in wind power prediction. Peng et al. [49] showed that combining physical and statistical forecasting techniques can improve accuracy after evaluating a proposed ANN model. The authors achieved an 80% reduction in RMSE. In another study, Lange and Focken [50] presented in detail the benefits of combining NWP models in the German weather service.
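One simple way to merge several forecast sources is to weight each source by the inverse of its past MSE, so that historically more accurate sources contribute more. The sketch below illustrates this idea only; it is not the lazy learning algorithm of [48], and the function names and the inverse-MSE weighting rule are assumptions chosen for illustration.

```python
def inverse_mse_weights(past_forecasts, past_observations):
    """Return one weight per source, proportional to 1 / MSE on past data, summing to 1.

    `past_forecasts` is a list of forecast series, one series per source,
    each aligned with `past_observations`.
    """
    inverse_errors = []
    for forecasts in past_forecasts:
        squared = [(f - o) ** 2 for f, o in zip(forecasts, past_observations)]
        mse = sum(squared) / len(squared)
        inverse_errors.append(1.0 / max(mse, 1e-12))  # guard against a zero MSE
    total = sum(inverse_errors)
    return [value / total for value in inverse_errors]


def combine_forecasts(new_forecasts, weights):
    """Weighted average of the latest forecast from each source."""
    return sum(w * f for w, f in zip(weights, new_forecasts))
```

For example, if one source has a past MSE of 1 and another a past MSE of 4, the weights are 0.8 and 0.2, so new forecasts of 10 and 5 are combined into 9. Recomputing the weights over a sliding window would let the combination adapt as the relative quality of the sources changes.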

Input Parameters
To establish the most efficient wind power prediction model, the next critical factor is the selection of the best input features from the system [15]. This selection is extremely important for increasing the accuracy of prediction models. As shown in Tables 2-4, wind speed is by far the most used input variable for wind power forecasting. This is because wind power is proportional to the cube of wind speed according to Equation (16) [51].
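The cubic dependence can be checked with a short sketch. It assumes that Equation (16) takes the standard form P = 0.5 ρ π R² Cp u³, with ρ the air density, R the rotor radius, Cp the power coefficient and u the wind speed; the function name and the default values (Cp = 0.4 and a standard sea-level air density of 1.225 kg/m³) are illustrative assumptions rather than values from the reviewed studies.

```python
import math


def wind_power_watts(wind_speed, rotor_radius, power_coefficient=0.4, air_density=1.225):
    """Power extracted by a rotor, assuming P = 0.5 * rho * pi * R^2 * Cp * u^3.

    wind_speed in m/s, rotor_radius in m, air_density in kg/m^3; result in W.
    """
    swept_area = math.pi * rotor_radius ** 2
    return 0.5 * air_density * swept_area * power_coefficient * wind_speed ** 3
```

Under these assumptions, doubling the wind speed from 5 m/s to 10 m/s multiplies the power by eight, which is why a small error in forecasted wind speed turns into a much larger error in forecasted power.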
Zhang et al. [4] considered data from three different units in North China in their investigations. The results showed that wind speed, among all the NWP data, is the most important parameter in terms of accuracy. The authors demonstrated this by comparing the forecasting results for units #10 and #16, which had high-accuracy wind speed data, with those for unit #58. The authors also noted that the performance of the forecasting method is very sensitive to a change in the location of wind turbines, because of the resulting change in the wake effect, topography and shadow effect.
Wind direction is another factor that affects power generation. In the current design of wind turbines, the turbines are allowed to face into the wind during operation [23]. Singh et al. [8] showed through their MLP prediction model that wind speed and wind direction were the two most influential factors in wind power prediction. Lin and Liu [23] reported that wind speed, wind direction, temperature and humidity had been the most used input features in the literature they reviewed. They proposed a novel hybrid model, using wind speed at different heights, blade pitch angle, temperature, yaw error and nacelle orientation as input features. Blade pitch angle was used because it plays a vital role in adjusting the blades to obtain safe power generation. After discussing the effect of air density on wind power (according to Equation (16)), the authors clarified that air density itself depends on air temperature, air pressure and relative humidity.
In this review, more than 40 wind power forecasting articles were investigated. Figure 4 gives a view of how different input features were used in the reviewed literature. As can be seen, apart from wind speed, other variables like temperature, wind direction, relative humidity and air pressure are also often used.
As discussed, choosing the right type of input features for a wind energy prediction model is critical to its performance. This has led to various research on input selection, data processing and the combination of different input information to increase the accuracy of models. As presented, wind speed is the most significant parameter for wind power prediction. However, additional parameters have also been used to consider the benefits of atmospheric data, etc. Giorgi et al. [15] investigated the impact of numerical weather parameters, such as daily wind speed, pressure, relative humidity and ambient temperature, on the performance of a wind farm power prediction. The authors designed eight different forecasting models with a variety of combinations of different ANNs and numerical weather parameters.
The assessment of those models by normalised mean absolute percentage error (NMAPE) revealed that, apart from the clear importance of the predicted wind velocity, pressure and temperature bring the highest benefits to the prediction model among the other NWP parameters. Besides, Lange et al. [52] included the predicted wind speed at 100 m height in their investigations. The results showed that the prediction errors (RMSE) decreased by more than 20%.
In another investigation, Velazquez et al. [53] assessed the influence of three input variables for ANN models, including wind speed, wind power density and power output. It was concluded that considering the wind direction as an input will lead to a decrease in forecasting error (MARE).
The research of Bilal et al. [1] on input and output data from four different sites in Senegal showed that a higher standard deviation of wind velocity could lead to a lower average fitting rate for the prediction. The authors also showed that considering other climatic variables such as temperature, humidity and solar radiation could improve accuracy by 0.3%, and that using wind direction improved the fitting rate of their method's prediction by 0.25%.
Apart from the type of input data, the data period and the sampling rate also affect the accuracy and performance of wind power prediction models. A short data period cannot provide enough information for training refined prediction models. On the other hand, a long period will not be representative of the current situation of the wind farm. Recent investigations have shown that the older part of a long data record leads to distortion of the prediction [19]. Duran et al. [19] used training periods from 3 months to 2.5 years to assess the effect of the training period on prediction accuracy. The results showed that, although data with a period longer than one year gave similar accuracy, the results for 2.5 years were worse than those for two years. Among the cases shorter than one year, the shortest period had the lowest accuracy.

Statistical Downscaling
Statistical downscaling has been used to increase the quality of NWP data and thereby improve wind power forecasting. NWPs are usually provided for a wider area than the wind farm location, whereas statistical downscaling allows higher-resolution computations to be employed to estimate the wind speed at the wind turbine locations. Power predictions with these downscaled NWPs have higher accuracy. Al-Yahyai et al. [54] showed the impact of this factor by discussing the reduction of the prediction error as a result of higher resolution. The authors showed that increasing the model's resolution to 7 km provided a better understanding of wind speed.
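A very simple statistical form of downscaling is a regression that maps the coarse-grid NWP wind speed to the wind speed measured at the turbine site. The sketch below fits such a linear relation by ordinary least squares. It is only an illustrative assumption of how a downscaling step might look, not the procedure of [54]; the function names and the choice of a single linear predictor are hypothetical.

```python
def fit_linear_downscaling(grid_speeds, site_speeds):
    """Least-squares fit of site_speed = slope * grid_speed + intercept."""
    n = len(grid_speeds)
    mean_x = sum(grid_speeds) / n
    mean_y = sum(site_speeds) / n
    sxx = sum((x - mean_x) ** 2 for x in grid_speeds)
    if sxx == 0.0:
        raise ValueError("grid_speeds must contain at least two distinct values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(grid_speeds, site_speeds))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def downscale(grid_speed, slope, intercept):
    """Estimate the on-site wind speed from a grid value; clipped at zero."""
    return max(0.0, slope * grid_speed + intercept)
```

Once fitted on a period for which both NWP output and on-site measurements are available, the relation can be applied to new NWP values before they are passed to the power conversion step. In practice, further predictors such as wind direction or stability indicators could be added, which this sketch leaves out.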

Flowchart of Wind Power Forecasting
A flowchart is provided according to what was discussed in the previous sections, which makes the selection of a wind power forecasting model more effective. As can be seen in the flowchart of Figure 5, three time horizons are considered at the outset, based on different functional requirements: short, medium and long term. For short-term prediction, statistical methods have the best performance, and they are easy to model and inexpensive. For medium-term horizons, hybrid approaches are suggested, while the selection of the best prediction method for long-term periods depends on access to sufficient computing resources as well as on the computational time.

The flowchart for physical methods is provided based on the explanations given in the previous sections. In this approach, after detailed physical characterisation and downscaling of the NWP data for wind speed calculation, the wind turbine power curve is used to forecast the wind power. Comparing this predicted power with on-line measurements using evaluation criteria such as RMSE gives a performance assessment of the corresponding method.
In the statistical method, after the selection of input features and data preprocessing, the proper ANN shall be selected. Then, the appropriate structure, including the numbers of layers and neurons, along with suitable activation and training algorithms, will be identified. The next step is the training process, which is followed by the evaluation of the model. Low accuracy of the model at this stage can be improved by modifying the structure or considering other available input features. If these measures do not improve the performance to the desired level reported in similar studies, the use of the hybrid method is recommended. In the hybrid approach, the most important factor is to use different combinations of methods to achieve the best prediction performance. A variety of combinations of data preprocessing, error post-processing and parameter selection and optimisation are available to provide the desired performance.

Conclusions
This study critically reviewed investigations regarding wind power forecasting models, focusing on methods of analysis, prediction time scales, error measurements and accuracy improvements. It was concluded that, under the same conditions, physical methods are more complex and need considerable computing resources, but are suitable for medium- to long-term prediction. On the other hand, statistical methods, which performed better over short- to medium-term periods, were easy to model and inexpensive. Combining the merits of these two major methods led to the promising hybrid methods. Besides wind speed, temperature, wind direction, relative humidity and air pressure were the most often used input features in the reviewed studies. Additionally, a one-year period and a sampling rate of 10 min were the most common choices for input data. Based on the discussions in this paper, a flowchart for wind power prediction is put forward, allowing users to select an appropriate prediction procedure based on different time horizons, analysis methods, error measurements, etc.
Based on the reviews in this paper, further recommendations are summarised as follows:
(1) With the continuous development of high-rated wind turbines, power forecasting will play an increasingly significant role in the operating stages of wind turbines. More advanced and cost-effective prediction methods need to be developed to better forecast the power generated by large-scale wind farms. More specifically, new hybrid methods, including those incorporating numerical simulations and neural networks, and more advanced combinations, such as ensemble learning methods, are recommended.
(2) The development of modern computers and storage methods allows larger databases to be handled. Meanwhile, the larger size of the databases has generated new challenges in terms of data preprocessing and error post-processing. Future studies should focus on developing less computationally intensive methods and removing the noise from the raw data.
(3) Future wind farms are gradually moving from onshore sites to offshore ones. Offshore wind turbines, especially floating wind turbines, operate in different weather conditions and terrain. Future studies should focus on developing appropriate methods for offshore wind prediction, especially the selection of features in coastal and offshore zones to balance accuracy and efficiency.
(4) In wind power forecasting (a typical regression problem), a perfect predictive model would give zero error, which is the best possible performance. However, all wind power forecasting models contain errors due to the stochastic nature of wind and therefore, a perfect score does not exist in practice. Many factors can influence the accuracy of a predictive model, such as the sizes and sampling rates of the training/testing/validation datasets, the algorithms used and the model optimisations.
Overall, the performance of predictive models is relative and needs to be evaluated against a baseline model. Based on the reviewed literature, most investigators used diverse robust baseline models to compare the performance of their newly developed methods. Nevertheless, the current research community has not yet reached a common view on a widely accepted baseline method for wind power forecasting. Further investigation is still required to develop a baseline model that works reliably in benchmarking other forecasting methods.

Funding: This research was funded by the EPSRC Doctoral Training Partnership (EP/R513222/1).

Conflicts of Interest:
The authors declare no conflict of interest.

Latin symbols
C_p              power coefficient
E_i(l)           normalised error
E_i              mean normalised error
i                hour
l                time horizon
M                total number of predicted data
n                number of data points
P                wind power
p                order of autoregressive model
(P_measured)_k   measured value of the k-th SCADA data point
(P_predicted)_k  predicted wind power of the k-th data point from deep learning modelling
P_measured       measured value from the SCADA database
P_predicted      predicted wind power from deep learning modelling
P_N(i, l)        target measured power
q                order of moving average model
R                radius of the rotor
T(i, l)          forecasted power
u                wind speed
X_t              forecasted wind power