Ultra-Short-Term Prediction of Wind Power Based on Error Following Forget Gate-Based Long Short-Term Memory

To improve the accuracy of ultra-short-term wind power prediction, this paper proposes a prediction model based on a modified long short-term memory (LSTM) network. Because the forget gate of the standard LSTM cannot reflect the corrective effect of prediction errors on ultra-short-term prediction, this paper develops the error following forget gate (EFFG)-based LSTM model. The proposed EFFG-based LSTM model updates the output of the forget gate using the difference between the predicted value and the actual value, thereby reducing the impact of the prediction error at the previous moment on the prediction accuracy at the current moment and improving the rolling prediction accuracy of wind power. A case study is performed using historical wind power data and numerical weather prediction data from an actual wind farm. The results indicate that the root mean square error of the EFFG-based LSTM wind power prediction model is less than 3%, while the accuracy rate and qualified rate are more than 90%. The EFFG-based LSTM model performs better than the support vector machine (SVM) and the standard LSTM model.


Introduction
Renewable energy is increasingly discussed as a way to phase out fossil fuel power generation in response to changing conditions such as climate change and serious air pollution. Wind power, as one of the major sources of renewable energy, varies through time and space due to factors such as wind speed, wind direction, and temperature. When large-scale wind power is integrated into the grid, its fluctuations bring challenges to the operation of power systems [1]. Accurate wind power prediction enables secure and economic operation, as well as better utilization of wind power [2]. According to the forecast time scale, wind power forecasting is divided into ultra-short-term forecasting within 4 h ahead, short-term forecasting within 3 days, medium-term forecasting from 1 week to 1 month, and long-term forecasting within 1 year [3]. Ultra-short-term wind power prediction refers to forecasting the wind power over the next 15 min to 4 h. The prediction interval is 15 min, and the prediction is rolled forward at each step [4]. Existing wind power forecasting methods can be divided into two categories: physical methods and data-driven methods (statistical and artificial intelligence methods) [5].
The physical models first compute the wind speed at the hub height of the wind turbines based on terrain, wind farm layout, numerical weather prediction (NWP) data, and environmental characteristics (topography, land use, etc.). They then calculate the wind power from the wind speed power curve or a meteorological power parameter model. This approach reflects the physical essence of the meteorological factors well. References [6,7] used weather data (wind, temperature, lightning density, humidity, barometric pressure, etc.) to predict wind power. However, physical models rely heavily on meteorological data. Because these data are typically provided only once or twice a day, a physical model has to wait for hours between two iterations, which limits its application to ultra-short-term forecasting.
To solve the problems of the physical model, researchers have shifted to data-driven models that can deal with large amounts of multivariate data. Data-driven methods usually use historical wind power and weather data for prediction. The auto-regressive integrated moving average (ARIMA), artificial neural network (ANN), and support vector machine (SVM) methods are commonly used at present [8,9]. Some researchers decompose the wind power and then establish a prediction method based on a time series model [10,11]. However, decomposing the wind speed and power sequences generates a decomposition error, which is transferred to the prediction model and reduces the prediction accuracy. Reference [12] used wind speed, wind direction, air temperature, and air pressure as input variables and established an SVM model for short-term wind power forecasting, whose prediction accuracy is better than that of the artificial neural network. Wind power has time series characteristics: the wind power at the predicted time is related not only to the state at the last moment but also to earlier moments. The existing time series models (such as ARMA) and commonly used learning models (ANN, SVM, etc.) cannot learn the correlation between wind power and wind speed, wind direction, etc. Therefore, it is difficult for these prediction models to further improve the prediction accuracy.
To solve the problems of the traditional neural network model, researchers have shifted to deep learning models that can deal with time series data. Long short-term memory (LSTM) is a neural network based on deep learning. Compared with the traditional neural network, LSTM has obvious advantages in dealing with large numbers of samples and nonlinear data. LSTM has recently received researchers' attention and has been used in the field of power system prediction [13,14]. Some researchers have applied it to wind power prediction [15][16][17]. For example, there are prediction methods based on the traditional LSTM or on LSTM combined with an optimization algorithm [18,19]. These studies show that the prediction accuracy of LSTM is better than that of ANN, support vector regression (SVR), back propagation (BP), and Bayesian networks. Because the LSTM network has short-term memory capability, it can model the influence of the wind power at the previous time on the wind power at the current time. Reference [20] used empirical mode decomposition and vibration mode decomposition to decompose the wind power sequence, then applied the LSTM model to predict wind power. However, using multiple decompositions increases the prediction error, because decomposing the original sequence produces decomposition errors that are transferred to the prediction model. Reference [21] used principal component analysis to select input variables and established a short-term wind power forecasting model based on the LSTM network. The input data are NWP data, without considering the impact of historical data on the prediction moment [22]. Most of these methods do not adapt the LSTM itself to wind power prediction, even though wind power and meteorological data are dynamic time series with large randomness.
To solve the above problems of the traditional LSTM model for wind power prediction, this paper proposes a modified LSTM, called the error following forget gate (EFFG)-based LSTM, which updates the output of the forget gate using the difference between the predicted value and the actual value. The input data of this paper are the integrated data of NWP and historical wind power. First, Spearman rank correlation analysis is performed between wind power and the NWP variables to select the weather factors that influence wind power. Second, Spearman rank correlation analysis is performed among historical wind powers to determine the time step of the prediction network. Finally, the EFFG-based LSTM is developed for ultra-short-term wind power prediction. The resulting model provides one-step-ahead forecasting at a 15-min resolution.

Correlation Analysis to Determine Input Variables for LSTM
There are many factors affecting wind power. Redundant information will be introduced if all meteorological factors are used as input variables. Conversely, too few factors will result in insufficient information. Wind power also has time series characteristics: the wind power at the present moment is related to historical wind power. The time step in the LSTM model determines how much historical wind power data is used as input. If the time step is too small, the model lacks prediction information. If it is too large, model performance decreases.
Because the probability distribution of NWP data and wind power data is not a normal distribution, Spearman rank correlation coefficient analysis is used to select meteorological factors and determine the time step. The Spearman correlation coefficient is calculated as follows:
$$\rho = 1 - \frac{6\sum_{i=1}^{n}\left(\operatorname{rg}(X_i) - \operatorname{rg}(Y_i)\right)^2}{n\left(n^2 - 1\right)} \tag{1}$$
where $\operatorname{rg}(X_i)$ and $\operatorname{rg}(Y_i)$ are the ranks of $X_i$ and $Y_i$ within the sequences $X$ and $Y$, and $n$ is the number of samples. $X$ and $Y$ are the two sequences being compared, for example two sequences of wind power. Figure 1 shows the procedure: the correlation analysis between wind power and NWP data and the correlation analysis among wind power time series are carried out using the Spearman rank coefficient to determine the input variables of the LSTM. A time series comprising the real wind power (RWP) at historical times and the NWP at the forecast time is then reconstructed and used as the input of each prediction step.
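The rank-correlation screening described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' code: the function names `average_ranks`, `spearman`, and `lag_correlations` are hypothetical, and the coefficient is computed as the Pearson correlation of the ranks, which coincides with the usual sum-of-squared-rank-differences form when there are no ties.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient as the Pearson correlation of the two rank sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5  # undefined for a constant sequence

def lag_correlations(power, max_lag):
    """Spearman coefficient between P(t) and P(t - k) for k = 1..max_lag."""
    return {k: spearman(power[k:], power[:-k]) for k in range(1, max_lag + 1)}
```

Under these assumptions, a candidate NWP variable or lag would be kept as an input when its coefficient with the wind power exceeds the chosen threshold.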

Error Following Forget Gate-Based LSTM
LSTM has three gates (input gate, output gate, and forget gate) to protect and control the state of the LSTM cell [23]. The inputs of the LSTM cell at time t include the input variables at time t, the output at time t−1, and the state variable at time t−1. The outputs of the LSTM cell include the output value at time t and the LSTM cell state at time t. Figure 2 shows the LSTM model and its internal structure.
The forget gate update of the traditional LSTM uses the output of the previous moment and the input of the prediction moment. According to the output at time t−1 and the input at time t, the forget gate $f(t)$ determines how much cell information is retained in the cell state at time t. The mathematical expression of the forget gate is:
$$f(t) = S\left(W_f \cdot \left[h(t-1),\, x(t)\right] + b_f\right) \tag{2}$$
where $h(t-1)$ is the output at time t−1 and $x(t)$ is the input at time t; $S(\cdot)$ is the sigmoid activation function; $W_f$ is the weight matrix of the forget gate, and $b_f$ is the bias of the forget gate.
Ultra-short-term wind power prediction here means predicting the wind power in the next 15 min. Therefore, when a prediction is carried out at time t, both the actual power value at time t and the value the model predicted for it are available. The error between the actual value and the predicted value should determine how much historical information is forgotten. From Equation (2), it can be seen that the traditional LSTM forget gate is not updated with this error, so it cannot take into account the adjustment effect of the deviation between the predicted value and the actual value on the next prediction.
In the ultra-short-term prediction of wind power, the deviation between the actual and the predicted value at the previous moment not only reflects the prediction ability of the model but also reflects how useful the historical information is for the output. If the deviation is large, the previous moment should have little influence on the prediction moment. This paper therefore proposes using the deviation between the predicted value and the actual value at the previous moment to update the input of the forget gate. The newly proposed forget gate is called the error following forget gate (EFFG). The new forget gate is calculated by Equation (3), in which Wf and bf are the weight matrix and bias of the forget gate. Wf and bf are optimized in the model training stage by an optimization algorithm such as Adam.
In comparison with the traditional LSTM network, the forget gate input is enhanced to take into account the deviation between the predicted value and the actual value at the previous moment. The input gate and output gate remain the same as in the traditional LSTM. Figure 3 shows the new structure of the EFFG-based LSTM network at time t.
The update formula of the input gate is given by Formula (4):
$$i(t) = S\left(W_i \cdot \left[h(t-1),\, x(t)\right] + b_i\right) \tag{4}$$
where $h(t-1)$ is the output at time t−1 and $x(t)$ is the input at time t; $S(\cdot)$ is the sigmoid activation function; $W_i$ is the weight matrix of the input gate, and $b_i$ is the offset term of the input gate.
The update formula of the current-time memory is given by Formula (5):
$$\tilde{c}(t) = T\left(W_c \cdot \left[h(t-1),\, x(t)\right] + b_c\right) \tag{5}$$
where $T(\cdot)$ is the tanh activation function; $W_c$ is the weight matrix of the current-time memory, and $b_c$ is its offset term.
The update formula of the new cell state is given by Formula (6):
$$c(t) = f(t) \odot c(t-1) + i(t) \odot \tilde{c}(t) \tag{6}$$
where $\odot$ denotes element-wise multiplication and $c(t-1)$ is the cell state at time t−1.
The update formula of the output gate is given by Formula (7):
$$o(t) = S\left(W_o \cdot \left[h(t-1),\, x(t)\right] + b_o\right) \tag{7}$$
where $S(\cdot)$ is the sigmoid activation function; $W_o$ is the weight matrix of the output gate, and $b_o$ is the offset term of the output gate.
The output of the EFFG-based LSTM network is given by Formula (8):
$$h(t) = o(t) \odot T\left(c(t)\right) \tag{8}$$
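To make the gate updates in this section concrete, the following sketch runs one cell step in plain Python. It is an illustration under a stated assumption, not the authors' implementation: the previous-step error `err_prev` (predicted minus actual power at t−1) is simply appended to the forget-gate input, which is one plausible reading of "updating the input for the forget gate". The names `effg_lstm_step`, `affine`, and the `params` dictionary keys are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W @ v + b for a matrix W given as a list of rows."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]

def effg_lstm_step(x_t, h_prev, c_prev, err_prev, params):
    """One cell step; returns the new output h, cell state c, and forget gate f.

    ASSUMPTION: the error enters the cell only as an extra forget-gate input.
    The input gate, candidate memory, and output gate follow the standard LSTM.
    """
    z = h_prev + x_t                  # concatenation [h(t-1), x(t)]
    z_f = z + [err_prev]              # the forget gate additionally sees the error
    f = [sigmoid(a) for a in affine(params["Wf"], z_f, params["bf"])]
    i = [sigmoid(a) for a in affine(params["Wi"], z, params["bi"])]
    c_hat = [math.tanh(a) for a in affine(params["Wc"], z, params["bc"])]
    o = [sigmoid(a) for a in affine(params["Wo"], z, params["bo"])]
    # New cell state: keep a fraction f of the old state, add gated candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_hat)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c, f
```

With a negative learned weight on the error column of `Wf`, a larger positive error pushes the forget gate toward zero, so less of the old cell state is carried forward. In a trained model the sign and size of that weight would be set by the optimizer rather than by hand.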

Ultra-Short-Term Wind Power Prediction Model Based on EFFG-Based LSTM
The wind power prediction model based on the EFFG-based LSTM is shown in Figure 4. For a given wind farm, suppose that, according to the Spearman correlation analysis, the RWP input consists of the power at the three moments preceding the prediction moment, and the NWP input consists of the wind speed and direction at the prediction moment. The deviation between the predicted value and the actual value at the last moment is used as an input of the EFFG-based LSTM model to update the forget gate.
The EFFG-based LSTM is composed of one input layer, one hidden layer, and one output layer. The static network structure is shown in Figure 5.
Input layer: X is the historical wind power at past times together with the wind speed and direction at the predicted time. The input layer normalizes the input data according to Equation (9):
$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{9}$$
where $x^{*}$ is the normalized value, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the variable, respectively.
Hidden layer: The hidden layer is the EFFG-based LSTM network. A single hidden layer keeps the prediction fast and avoids the overfitting caused by too many hidden layers.
Output layer: f(·) is the activation function of the output layer. Y is the predicted value of wind power at the next time. The output layer applies weights and biases to the output of the hidden layer and outputs a one-dimensional predicted wind power. The inverse normalization is calculated according to Equation (10):
$$y = y^{*}\left(x_{\max} - x_{\min}\right) + x_{\min} \tag{10}$$
where $y^{*}$ is the normalized network output and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of the wind power. Finally, the predicted value of wind power is obtained.
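The normalization in the input layer and the inverse normalization in the output layer amount to min-max scaling and its inverse. The sketch below shows the round trip; the function names are hypothetical, and it assumes the minimum and maximum are taken from the training data of each variable.

```python
def fit_min_max(values):
    """Return (x_min, x_max) of a variable, e.g. from the training set."""
    return min(values), max(values)

def normalize(x, x_min, x_max):
    """Map x from [x_min, x_max] to [0, 1]."""
    return (x - x_min) / (x_max - x_min)

def denormalize(x_norm, x_min, x_max):
    """Map a normalized value back to the original physical units."""
    return x_norm * (x_max - x_min) + x_min
```

For example, with a power range of 0 to 50 MW, a network output of 0.5 maps back to 25 MW.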

Data Description and Test Design
The data (NWP and historical real wind power) are from an operating wind farm in northwest China. The period is from January to December 2017. The sampling interval of both the wind power and the NWP data is 15 min. The NWP data include wind speed and direction at 170 m, 100 m, and 30 m, as well as temperature, pressure, and humidity. The NWP data are issued every 15 min by a rapid refresh model, a weather forecasting method based on a mathematical model of atmospheric movement that uses the current weather conditions as input data. The forecast is computed for the next 15 min time step.
First, the Spearman correlation coefficients between the wind power and the NWP features, and between the wind power and the RWP at historical moments, are calculated, and the input variables of the prediction model are selected accordingly.
It can be seen from Table 1 that the average correlation coefficients of the wind speed and wind direction at 100 m are 0.57 and 0.52, respectively, and the average correlation coefficient of the wind speed at 30 m is 0.51. Two variables are considered strongly correlated when the correlation coefficient is greater than 0.5 [24]. Therefore, when forecasting the wind power at time t, the wind speed and wind direction at 100 m and the wind speed at 30 m at time t are put into the newly constructed input time series. It can be seen from Table 2 that the correlation coefficients of the power at times t−1, t−2, and t−3 are greater than 0.5, so the wind power at time t has a strong correlation with the power at times t−1, t−2, and t−3. When forecasting the wind power at time t, the wind power at the past three times is therefore also put into the newly constructed input time series. The time step of the EFFG-based LSTM network is set to 4, covering the three historical moments of RWP and the NWP at time t. The number of hidden layer neurons (EFFG-based LSTM cells) is not linearly related to the forecast accuracy (for example, the root mean square error (RMSE)). It should be determined according to the number of input features and the model training accuracy. Here the number of neurons in the hidden layer is set to 12, which gave the best prediction accuracy. The gate activation functions of the EFFG-based LSTM remain at their default values.
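The input construction described above can be sketched as follows. This is a hedged illustration: `build_samples` is a hypothetical helper, the NWP tuple is assumed to be ordered as (wind speed at 100 m, wind direction at 100 m, wind speed at 30 m), and the features are laid out as one flat vector per target, whereas the paper arranges them as a sequence of four time steps whose exact layout is not given here.

```python
def build_samples(power, nwp, n_lags=3):
    """Pair each target P(t) with P(t-3), P(t-2), P(t-1) and the NWP at time t.

    power: real wind power values at 15-min resolution.
    nwp:   one tuple of selected NWP variables per time index, aligned with power.
    """
    samples = []
    for t in range(n_lags, len(power)):
        history = list(power[t - n_lags:t])   # oldest to newest
        features = history + list(nwp[t])     # NWP comes from the forecast time t
        samples.append((features, power[t]))
    return samples
```

The first `n_lags` time indices yield no sample because their history is incomplete.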
Finally, the historical data of the wind farm for March, June, September, and December 2017 are used as the training data set of the model. Four groups of prediction tests were then conducted as follows: (1) Group 1: to predict the wind power within 24 h of February 22.

Forecast Error Computation
Because the actual wind power contains zero values, the mean absolute percentage error (MAPE) is undefined and therefore meaningless as an evaluation index [25]. The root mean square error (RMSE), the accuracy rate R1, and the qualified rate R2 are used instead to evaluate the prediction results.
The calculation formula of RMSE is:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - \hat{P}_i\right)^2}$$
where $n$ is the number of forecast samples; $P_i$ is the actual value of wind power and $\hat{P}_i$ is the predicted value of wind power; $i$ is the serial number of the actual value and the predicted value.
The calculation formula of the accuracy rate $R_1$ is:
$$R_1 = \left(1 - \sqrt{\frac{1}{N}\sum_{K=1}^{N}\left(\frac{P_{MK} - P_{PK}}{P_{cap}}\right)^2}\,\right) \times 100\%$$
In the formula, $P_{MK}$ is the average value of the actual power in period $K$, $P_{PK}$ is the average value of the predicted power in period $K$, $P_{cap}$ is the starting capacity of the wind farm in the corresponding period, and $N$ is the total number of predicted periods.
The calculation formula for the pass rate R 2 is: In the formula, 1 −
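The RMSE and the accuracy rate can be computed as in the sketch below. It assumes the accuracy rate takes the commonly used form of one minus the capacity-normalized root mean square of the per-period mean power errors; the function names are hypothetical, and the qualified rate is left out because its definition is not fully given in the text.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between actual and predicted power."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def accuracy_rate(actual_means, predicted_means, capacity):
    """Accuracy rate R1 in percent.

    actual_means / predicted_means: per-period mean power values.
    capacity: wind farm capacity in the same unit as the power values.
    """
    n = len(actual_means)
    mean_sq = sum(((a - p) / capacity) ** 2
                  for a, p in zip(actual_means, predicted_means)) / n
    return (1.0 - math.sqrt(mean_sq)) * 100.0
```

Because the error is divided by the capacity rather than by the actual power, zero actual values cause no division problem, which is the reason given above for avoiding MAPE.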

Comparison of Prediction Results
To verify that the EFFG-based LSTM model has higher prediction accuracy, SVM and standard LSTM wind power prediction models are used for comparison. The same input data are used for all three methods in the prediction tests. The predicted wind power and prediction error curves of Group 1 are shown in Figures 6 and 7, those of Group 2 in Figures 8 and 9, those of Group 3 in Figures 10 and 11, and those of Group 4 in Figures 12 and 13. The time on the x-axis is the local time in all figures.
It can be seen from Figures 6-13 that the SVM prediction model is the worst of the three models. The reason is that SVM is not a prediction model suited to time series and cannot process time series information. The LSTM and EFFG-based LSTM models not only use deep learning to optimize the network parameters but also, as time series models, capture the temporal correlation within the wind power series. When the wind power fluctuates, the accuracy of the EFFG-based LSTM model is better than that of the traditional LSTM model. The reason is that the traditional LSTM forget gate is updated by the last output and the input data, which cannot reflect the influence of the error between the predicted value and the actual power value on the forget gate. The EFFG-based LSTM forget gate is updated by the error between the predicted value and the actual power value. When the error is large, the forgetting coefficient of the model is large, so more of the historical output is forgotten and the effect of historical values on the model is smaller. When the wind power changes suddenly, the wind power at the next moment is no longer correlated with the power at the previous moment, so the role of historical values must be reduced. Therefore, the EFFG-based LSTM prediction model has the highest prediction accuracy.
The wind power slope (ramp) is difficult to predict, so this paper uses historical wind power slope data to train the prediction model. Nevertheless, Figure 14, which shows the up-ramp period of wind power from Figure 8, reveals a prediction delay and a relatively large prediction error. The maximum prediction error of the best model (EFFG-based LSTM) is over 10 MW. The same problem appears in the up-ramp period of Figure 10. This is one of the tasks that needs further study.
It can be seen from Tables 3-6 that the RMSE, R1, and R2 of the EFFG-based LSTM prediction model are the best among the SVM, LSTM, and EFFG-based LSTM prediction models. The RMSE of the EFFG-based LSTM model changed little from Group 1 to Group 4, and its R1 and R2 were above 90% in the 24 h predictions. The accuracy rate and qualified rate of the SVM and LSTM models decreased from Group 1 to Group 4. The SVM and traditional LSTM models cannot deal with periods when the wind power changes dramatically. The improved forget gate addresses this problem, so the performance of the EFFG-based LSTM prediction model is the best.

Conclusions
This paper proposed the EFFG-based LSTM model for ultra-short-term wind power forecasting. The experimental outcomes are as follows:
(1) The Spearman correlation coefficient method can better find the relationship between the predictive factors affecting wind power, because the probability distributions of NWP data and wind power data do not follow a known distribution.
(2) The input data are the integrated data of NWP and historical wind power. Compared with single-type input data, they provide better support for wind power prediction.
(3) The proposed method realizes ultra-short-term wind power prediction while considering the influence of the error between the predicted value and the actual value on the prediction model. The forecast accuracy of the EFFG-based LSTM model is better than that of SVM and the traditional LSTM model.
Author Contributions: Conceptualization, P.Z. and C.P.; software, C.L. and R.Y.; validation, C.L. and P.Z.; formal analysis, C.L. and R.Y.; writing (original draft preparation), J.T. and C.L.; writing (review and editing), P.Z. and C.P.; visualization, J.T. and C.L.; supervision, P.Z. and M.S. All authors have read and agreed to the published version of the manuscript.