Article

Ultra-Short-Term Wind Power Prediction Based on LSTM with Loss Shrinkage Adam

1 College of Information Engineering, Henan University of Science and Technology, Luoyang 471023, China
2 Tongliao SPIC Power Generation Corporation Limited, Tongliao 028001, China
* Author to whom correspondence should be addressed.
Energies 2023, 16(9), 3789; https://doi.org/10.3390/en16093789
Submission received: 30 March 2023 / Revised: 24 April 2023 / Accepted: 27 April 2023 / Published: 28 April 2023
(This article belongs to the Section A3: Wind, Wave and Tidal Energy)

Abstract
With the rapid growth of wind power, its strong randomness poses great challenges to power system operation. Accurate and timely ultra-short-term wind power prediction is essential for the stable operation of power systems. In this paper, an LsAdam–LSTM model is proposed for ultra-short-term wind power prediction, which is obtained by accelerating the long short-term memory (LSTM) network using an improved Adam optimizer with loss shrinkage (LsAdam). For a specific network topology, training progress heavily depends on the learning rate. To make the training loss of the LSTM shrink faster than with standard Adam, we use past loss-change information to fine-tune the next learning rate. Specifically, we design a gain coefficient based on the loss change to adjust the global learning rate in every epoch. In this way, the loss change in the training process is incorporated into the learning progress and a closed-loop adaptive learning-rate tuning mechanism is constructed. Drastic changes in network parameters deteriorate learning progress and can even prevent the model from converging, so the gain coefficient is built on the arctangent function, which has self-limiting properties. Because the learning rate is iteratively tuned with past loss-change information, the trained model achieves better performance. Test results on a wind turbine show that the LsAdam–LSTM model obtains higher prediction accuracy with far fewer training epochs than Adam–LSTM, and its prediction accuracy improves significantly over BP and SVR models.

1. Introduction

In recent years, wind energy has gained significant attention as an environmentally friendly, clean renewable energy. Wind power generation has the potential to alleviate the shortage of conventional energy and mitigate increasing environmental pollution [1,2]. However, due to the strong volatility and randomness of wind power, its integration into the grid on a large scale can adversely impact the stability of the power system [3]. Accurate prediction of wind power can aid power companies, power plants and grid management departments in planning and dispatching wind power generation systems, improving the reliability and stability of power systems and reducing dependence on traditional energy sources [4,5].
Wind power prediction methods mainly include physical methods, statistical methods and combination methods [6]. Physical methods must model the complex physical relationships among quantities such as meteorological and geomorphological information, which makes prediction computationally expensive and unsuitable for ultra-short-term horizons [7]. Statistical methods establish a model that expresses the mapping between input and output by mining the inherent patterns in large amounts of historical data [8]. Commonly used statistical methods include time series analysis [9,10,11], the BP neural network (BPNN) [12,13], support vector regression (SVR) [14,15,16,17] and deep learning networks [18,19]. Wind power is a time series with strong non-linearity. Time series analysis, the BPNN and SVR have difficulty extracting the deeper features of a wind power series and cannot cope with its complex change trends, so these methods struggle to predict wind power accurately. In contrast, deep learning models have stronger non-linear mapping, self-learning and feature extraction abilities [20]. In particular, the long short-term memory (LSTM) network is suitable for wind power prediction due to its unique network structure [21]. Several LSTM-based wind power prediction models have been proposed in [22,23,24,25,26]. These models use the LSTM network to learn the time series characteristics of wind power data and achieve higher prediction accuracy than linear models, traditional machine learning models and artificial neural networks.
The performance of deep learning networks heavily depends on the learning progress, so the optimization algorithm used for network training is crucial [27]. For the stochastic gradient descent method, the learning rate of the next training round can be adjusted according to the loss change during training and the current learning rate, which improves the training speed and accuracy of the model [28]. However, this method cannot adaptively update every parameter in the network, and the prediction accuracy of the resulting model is therefore limited. The adaptive moment estimation (Adam) algorithm improves on the root mean square propagation (RMSProp) algorithm [29]. It can adaptively update each parameter in a deep learning network and is one of the most commonly used optimizers for deep learning. However, Adam's global learning rate is fixed, and to avoid non-convergence caused by an overly large learning rate in the later training stage, it is usually set very small; this greatly limits Adam's optimization effect on the network.
To further improve the prediction accuracy and modeling efficiency, we propose a loss shrinkage Adam (LsAdam) optimization algorithm that adjusts the global learning rate of Adam according to the loss change during model training, and we construct an LsAdam–LSTM wind power prediction model.

2. Principle of LSTM Ultra-Short-Term Wind Power Prediction

2.1. LSTM Model Structure

The recurrent neural network (RNN) is well suited to time series problems because of its recurrent structure, and the LSTM network is a deep learning network that improves on the RNN. LSTM differs from the RNN in that it uses gating structures, known as the “forget gate”, “input gate” and “output gate”, to control information flow [30]. The basic structure of the LSTM cell is shown in Figure 1.
In Figure 1, $f_t$ is the “forget gate”, $i_t$ is the “input gate”, $o_t$ is the “output gate” and $\hat{c}_t$ is the temporary state. The “forget gate” determines how much historical information is discarded. The “input gate” determines what is retained from the input. The “output gate” controls the information that the current unit passes to the next unit. Incoming information first becomes the temporary state of the unit; the historical information and the current temporary state are then combined to obtain the current cell state, and, finally, the output of the current unit is obtained by passing the state through the output gate. These gating structures effectively mitigate the gradient explosion and vanishing gradient problems of the RNN, which means that the LSTM network can better learn the long-term dependence in wind power data. The calculation performed by the LSTM cell is shown in Algorithm 1.
Algorithm 1 LSTM algorithm
Require: random initialization of $W_f, W_i, W_o, W_c, U_f, U_i, U_o, U_c, b_f, b_i, b_o, b_c$; $h_0 = 0$, $c_0 = 0$
for $t \leftarrow 1$ to $n$ do
   Calculate “forget gate”: $f_t = \sigma(h_{t-1} W_f + x_t U_f + b_f)$
   Calculate “input gate”: $i_t = \sigma(h_{t-1} W_i + x_t U_i + b_i)$
   Calculate “output gate”: $o_t = \sigma(h_{t-1} W_o + x_t U_o + b_o)$
   Calculate temporary state: $\hat{c}_t = \tanh(h_{t-1} W_c + x_t U_c + b_c)$
   Calculate current state: $c_t = f_t \odot c_{t-1} + i_t \odot \hat{c}_t$
   Calculate output: $h_t = o_t \odot \tanh(c_t)$
end for
return $h_1, h_2, \ldots, h_n$
where $c_{t-1}$ is the state of the cell at the previous time step, $h_{t-1}$ is the output of the cell at the previous time step, $x_t$ is the input of the cell at the current time step, $W_f, W_i, W_o, W_c$ are the weights of $h_{t-1}$; $U_f, U_i, U_o, U_c$ are the weights of $x_t$; $b_f, b_i, b_o, b_c$ are the biases; $n$ is the length of the time series; $\sigma$ is the Sigmoid function; $f_t$, $i_t$ and $o_t$ are the outputs of the forget, input and output gates; $\hat{c}_t$ is the temporary state in the cell; $c_t$ is the current state in the cell; and $h_t$ is the output of the cell at the current time step.
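To make Algorithm 1 concrete, the following minimal NumPy sketch runs one forward pass of the LSTM cell over a sequence. The dimensions, initialization scale and variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq, W, U, b, n_hidden):
    """Run Algorithm 1 over a sequence; x_seq has shape (n, n_input)."""
    h = np.zeros(n_hidden)          # h_0 = 0
    c = np.zeros(n_hidden)          # c_0 = 0
    outputs = []
    for x_t in x_seq:
        f_t = sigmoid(h @ W["f"] + x_t @ U["f"] + b["f"])    # forget gate
        i_t = sigmoid(h @ W["i"] + x_t @ U["i"] + b["i"])    # input gate
        o_t = sigmoid(h @ W["o"] + x_t @ U["o"] + b["o"])    # output gate
        c_hat = np.tanh(h @ W["c"] + x_t @ U["c"] + b["c"])  # temporary state
        c = f_t * c + i_t * c_hat                            # current state
        h = o_t * np.tanh(c)                                 # output
        outputs.append(h)
    return outputs

# Hypothetical dimensions: 1 input feature, 4 hidden units, sequence length 10.
rng = np.random.default_rng(0)
n_in, n_h = 1, 4
W = {k: rng.normal(scale=0.1, size=(n_h, n_h)) for k in "fioc"}
U = {k: rng.normal(scale=0.1, size=(n_in, n_h)) for k in "fioc"}
b = {k: np.zeros(n_h) for k in "fioc"}
h_seq = lstm_forward(rng.normal(size=(10, n_in)), W, U, b, n_h)
```

The element-wise products make each gate a bounded mixing coefficient, which is what lets the cell keep or discard long-range information.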

2.2. Learning Rate Update Strategy Based on Adam

As one of the commonly used optimization algorithms, Adam can adaptively update each parameter in the deep learning network. The optimization process of Adam is shown in Algorithm 2.
In Algorithm 2, $t$ is the iteration counter; $\beta_1$ and $\beta_2$ are the exponential decay rates of the moment estimates, with $\beta_1, \beta_2 \in [0, 1)$ and usually $\beta_1 = 0.9$, $\beta_2 = 0.999$; $\mu$ is a tiny constant that prevents the denominator from being 0, usually $\mu = 1 \times 10^{-8}$; $\theta_0$ is the initial parameter vector; $g_t$ is the current gradient of the parameters; $m_t$ and $v_t$ are the biased first and second moment estimates; $\hat{m}_t$ and $\hat{v}_t$ are the corrected first and second moment estimates; $lr$ is the global learning rate; $\Delta\theta$ is the parameter update; and $\theta_t$ is the updated parameter vector.
Algorithm 2 Adam optimization algorithm
Require: $\beta_1, \beta_2 \in [0, 1)$; $\mu = 1 \times 10^{-8}$; $m_0 = 0$; $v_0 = 0$; $t = 0$; $\theta_0$: initial parameter vector; $lr$: global learning rate
while $\theta_t$ not converged do
   $t \leftarrow t + 1$
   Calculate the gradient at step $t$:
    $g_t = \nabla_\theta f_t(\theta_{t-1})$
   Update biased first and second moment estimates:
    $m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$
    $v_t = \beta_2 v_{t-1} + (1 - \beta_2)\,(g_t \odot g_t)$
   Update bias-corrected first and second moment estimates:
    $\hat{m}_t = m_t / (1 - \beta_1^t)$
    $\hat{v}_t = v_t / (1 - \beta_2^t)$
   Update the change of the parameters:
    $\Delta\theta = lr \cdot \hat{m}_t / (\mu + \sqrt{\hat{v}_t})$
   Update the parameters:
    $\theta_t = \theta_{t-1} - \Delta\theta$
end while
return $\theta_t$
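For reference, one Adam update from Algorithm 2 can be written in a few lines of NumPy. This is a generic sketch of the standard algorithm using the notation above; `grad` would come from back-propagation.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, mu=1e-8):
    """One Adam update (Algorithm 2): returns updated theta, m, v."""
    m = beta1 * m + (1 - beta1) * grad          # biased first moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # biased second moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (mu + np.sqrt(v_hat))
    return theta, m, v
```

Note that `lr` enters every update as a global multiplier, which is exactly the quantity the LsAdam scheme in Section 3 makes adaptive.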
When the Adam algorithm optimizes the LSTM network, although each parameter in the network can be adaptively updated, its global learning rate is fixed. In addition, to ensure that the LSTM network can converge in the later training stage, the global learning rate can only be set to a small value, and the convergence speed and prediction accuracy of the network will be limited.
Much information is fed back during network training, and some of it reflects how well the learning process is going. This feedback can be used to adjust the global learning rate and further improve the training of the network.

3. LsAdam–LSTM Wind Power Prediction Model

Loss change is very important in the training process because the direction and magnitude of the loss change reflect the quality of the network learning process. Therefore, to increase the training speed of the LSTM-based wind power prediction model, further improve its prediction accuracy and overcome the limitations of a fixed global learning rate, the global learning rate should be adjusted adaptively according to the loss change during network training. If the loss decreases after a training epoch, it is reasonable to assume that the parameter adjustment is gradually making the network converge, and the global learning rate can be appropriately increased to accelerate training. Conversely, if the loss increases after a training epoch, the current parameter adjustment is not conducive to training, and the global learning rate can be appropriately reduced to stabilize the training process. To ensure faster loss shrinkage after each training epoch, a gain coefficient based on the loss change is used to dynamically adjust the global learning rate, as shown in Equation (1).
$\Delta\theta = lr_T \cdot \hat{m}_t / (\mu + \sqrt{\hat{v}_t})$ (1)
where $lr_T$ is the global learning rate at the $T$-th training epoch, $lr_T = G_T \cdot lr_{T-1}$, and $G_T$ is the gain coefficient at the $T$-th training epoch.
Unlike the Adam algorithm, in which the global learning rate is a fixed value, we use the loss change during training to adjust the global learning rate in real time and construct the LsAdam algorithm, which determines the global learning rate at each training epoch based on Equation (1). Building on Adam, the proposed algorithm adaptively adjusts the global learning rate through a gain coefficient that carries loss-change information, which effectively alleviates the limitation imposed by a small, fixed global learning rate during training.
To effectively determine the gain coefficient, the loss change before and after each training epoch must be evaluated. To better reflect the magnitude of the loss change during training, we use the relative loss change, as shown in Equation (2).
$\Delta l_T = (l_{T-1} - l_T) / l_{T-1}$ (2)
where $\Delta l_T$ is the relative change of the loss after the $T$-th training epoch compared with the previous epoch, $l_{T-1}$ is the loss value of the network after the $(T-1)$-th training epoch, and $l_T$ is the loss value of the network after the $T$-th training epoch.
To prevent the learning rate from becoming too large or too small due to the introduction of the gain coefficient, the coefficient should be negatively correlated with the current global learning rate. To reduce the influence of random factors such as data noise, an insensitive range $[-\varepsilon, \varepsilon]$ is set: the global learning rate is kept constant when $\Delta l_T \in [-\varepsilon, \varepsilon]$, i.e., the gain coefficient is 1.
Therefore, the rules for the value of the gain coefficient must satisfy the following conditions:
  • $G_{T+1}$ should be bounded and positive.
  • When $-\varepsilon \le \Delta l_T \le \varepsilon$, $G_{T+1} = 1$.
  • When $\Delta l_T > \varepsilon$, $G_{T+1} > 1$ and $G_{T+1}$ is positively correlated with $\Delta l_T$.
  • When $\Delta l_T < -\varepsilon$, $G_{T+1} < 1$ and $G_{T+1}$ is negatively correlated with $|\Delta l_T|$.
  • When $\Delta l_T < -\varepsilon$ or $\Delta l_T > \varepsilon$, $G_{T+1}$ is negatively correlated with the current global learning rate $lr_T$.
To this end, the gain coefficient is determined on the basis of the magnitude of the loss change and the current global learning rate. The arctangent function is monotonically increasing and self-limiting, and its range over the positive half-axis of the independent variable is $(0, \pi/2)$. Therefore, the arctangent function is used as the basis for constructing the gain-coefficient rules. We construct the gain coefficient as shown in Equation (3); the corresponding update rule of the global learning rate is shown in Equation (4).
$G_{T+1} = \begin{cases} 1 + \arctan(k_2\,\Delta l_T + lr_1/lr_T)/k_1, & \Delta l_T > \varepsilon \\ 1 - \arctan(k_2\,|\Delta l_T| + lr_T/lr_1)/k_1, & \Delta l_T < -\varepsilon \\ 1, & -\varepsilon \le \Delta l_T \le \varepsilon \end{cases}$ (3)

$lr_{T+1} = \begin{cases} lr_T\left(1 + \arctan(k_2\,\Delta l_T + lr_1/lr_T)/k_1\right), & \Delta l_T > \varepsilon \\ lr_T\left(1 - \arctan(k_2\,|\Delta l_T| + lr_T/lr_1)/k_1\right), & \Delta l_T < -\varepsilon \\ lr_T, & -\varepsilon \le \Delta l_T \le \varepsilon \end{cases}$ (4)
where $lr_1$ is the initial global learning rate, generally taken as a value commonly used with the Adam optimizer, such as $lr_1 = 0.01$; $\varepsilon$ is the insensitivity threshold of the gain coefficient, taken as $\varepsilon = 0.001$ in this paper. $k_1$ is a constant that controls the range of $G_{T+1}$ and hence the change in the global learning rate. To guarantee $G_{T+1} > 0$, $k_1 > \pi/2$ is required. A smaller $k_1$ produces larger changes in the global learning rate and a more pronounced optimization effect on the model; a larger $k_1$ produces smaller changes and a lower risk of non-convergence. We therefore take $k_1 = 5\pi$ in this paper. Because $\Delta l_T$ can vary over a wide range, the bounded arctangent mapping of $|\Delta l_T|$ prevents it from having an excessive effect on the gain coefficient. $k_2$ is a positive constant that regulates the sensitivity of the gain coefficient to $|\Delta l_T|$. A smaller $k_2$ makes it easier for the global learning rate to settle to smaller values in the later training stages, which benefits convergence; a larger $k_2$ lets the global learning rate change more strongly with $|\Delta l_T|$, which strengthens the acceleration effect. We therefore take $k_2 = 10$ in this paper.
In addition, when $lr_T > lr_1$, $lr_T$ suppresses the increase in the gain coefficient and facilitates its decrease; when $lr_T < lr_1$, $lr_T$ facilitates the increase and suppresses the decrease. As the network converges, the magnitude of the loss change tends to decrease and $lr_T$ increasingly determines $lr_{T+1}$, which drives the global learning rate back toward $lr_1$ and is conducive to model convergence.
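Because Equations (2)–(4) fully specify the learning-rate schedule, they translate directly into code. The Python sketch below uses the parameter choices stated above ($k_1 = 5\pi$, $k_2 = 10$, $\varepsilon = 0.001$); the function name is our own.

```python
import math

def lsadam_lr_update(lr_T, lr_1, loss_prev, loss_curr,
                     k1=5 * math.pi, k2=10.0, eps=1e-3):
    """Update the global learning rate from the relative loss change.

    Implements Equations (2)-(4): the arctangent gain keeps the new
    learning rate bounded and positive (since k1 > pi/2).
    """
    delta_l = (loss_prev - loss_curr) / loss_prev    # Equation (2)
    if delta_l > eps:        # loss shrank: grow the learning rate
        gain = 1 + math.atan(k2 * delta_l + lr_1 / lr_T) / k1
    elif delta_l < -eps:     # loss grew: shrink the learning rate
        gain = 1 - math.atan(k2 * abs(delta_l) + lr_T / lr_1) / k1
    else:                    # inside the insensitive band: keep it
        gain = 1.0
    return gain * lr_T       # Equation (4): lr_{T+1} = G_{T+1} * lr_T
```

With $k_1 = 5\pi$, the gain is confined to roughly $(0.9, 1.1)$, so no single epoch can move the learning rate drastically.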
In the LsAdam algorithm, the global learning rate is kept constant when the loss change is within the threshold range, and when the loss change is beyond the threshold range, the loss change and the current global learning rate will jointly determine the value of the global learning rate in the next training epoch. The process of LsAdam–LSTM is shown in Algorithm 3.
Algorithm 3 LsAdam–LSTM algorithm
Require: $\beta_1, \beta_2 \in [0, 1)$; $\mu = 1 \times 10^{-8}$; $k_1 > \pi/2$; $k_2 > 0$; $\varepsilon = 0.001$; $m_0 = 0$; $v_0 = 0$; $lr_1$: initial global learning rate; $\theta_0$: initial parameter vector; $T_{\max}$: max epochs; $loss_{aim}$: accuracy requirement; $I$: number of iterations per epoch
$T \leftarrow 0$, $t \leftarrow 0$
LSTM model initialization
Calculate model loss $L_T$
while $T < T_{\max}$ and $L_T > loss_{aim}$ do
   $T \leftarrow T + 1$
   $i \leftarrow 0$
   while $i < I$ do
      $i \leftarrow i + 1$
      $t \leftarrow t + 1$
      Calculate the gradient at step $t$: $g_t = \nabla_\theta f_t(\theta_{t-1})$
      Update biased first moment estimate: $m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$
      Update biased second moment estimate: $v_t = \beta_2 v_{t-1} + (1 - \beta_2)\,(g_t \odot g_t)$
      Update bias-corrected first moment estimate: $\hat{m}_t = m_t / (1 - \beta_1^t)$
      Update bias-corrected second moment estimate: $\hat{v}_t = v_t / (1 - \beta_2^t)$
      Update change of the LSTM model parameters: $\Delta\theta = lr_T \cdot \hat{m}_t / (\mu + \sqrt{\hat{v}_t})$
      Update LSTM model parameters: $\theta_t = \theta_{t-1} - \Delta\theta$
   end while
   Calculate model loss $L_T$
   Calculate $\Delta l_T$ by Equation (2)
   Calculate $lr_{T+1}$ by Equation (4)
end while
During training, the global learning rate of LsAdam–LSTM follows a characteristic trend. In the early training stage, the loss generally decreases rapidly, and the global learning rate increases as the loss shrinks, which improves the convergence speed of the model. In the middle and late stages, the loss decreases more slowly, and the global learning rate gradually falls. In the final stage, the global learning rate stabilizes near the initial value, which is conducive to model convergence. Owing to this behavior, the proposed model achieves higher prediction accuracy and a faster convergence speed.
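Putting the pieces together, the outer loop of Algorithm 3 can be sketched as follows, reusing `lsadam_lr_update` from above. The callables `train_one_epoch` and `compute_loss` are hypothetical stand-ins for one full-batch Adam epoch at the current global learning rate and for evaluating the training loss.

```python
def train_lsadam_lstm(train_one_epoch, compute_loss,
                      lr_1=0.01, max_epochs=500, loss_aim=1e-4):
    """Outer loop of Algorithm 3: closed-loop learning-rate scheduling.

    train_one_epoch(lr) runs the inner Adam loop with learning rate lr_T;
    compute_loss() returns the model loss L_T after the current epoch.
    """
    lr = lr_1
    loss_prev = compute_loss()                   # initial loss L_0
    for _ in range(max_epochs):
        train_one_epoch(lr)                      # inner loop of Algorithm 3
        loss_curr = compute_loss()               # L_T after this epoch
        if loss_curr <= loss_aim:                # accuracy requirement met
            break
        lr = lsadam_lr_update(lr, lr_1, loss_prev, loss_curr)  # Equation (4)
        loss_prev = loss_curr
    return lr
```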

4. Experimental Verification

4.1. Data Preparation

In this work, SCADA data, including wind speed, wind direction and generated power, are used to test the proposed LsAdam–LSTM. The data were sampled at 10 min intervals from the SCADA system of a wind turbine operating and generating power in Turkey. Only the past generated-power series was used to predict the next power output (10 min ahead). One month of observations from April 2018 was used to construct the sample set, a total of 4320 data points, as shown in Figure 2. When the wind speed is between the cut-in and rated wind speeds, the output power of the wind turbine increases with wind speed. When the wind speed is between the rated and cut-out wind speeds, the wind power is approximately the rated power. However, when the wind speed is below the cut-in speed or above the cut-out speed, the wind power is zero. Because wind speed fluctuates over a wide range, wind power exhibits strong randomness and intermittency.
To evaluate the prediction performance more objectively, the max–min normalization method was used to scale the data to the interval [0, 1], as shown in Equation (5).
$x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$ (5)
where $x$ is the original data, $x_{\min}$ is the minimum value in the original data, $x_{\max}$ is the maximum value in the original data, and $x'$ is the scaled data.
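Equation (5) can be applied with a short NumPy helper (a sketch assuming the power series is a 1-D array):

```python
import numpy as np

def minmax_scale(x):
    """Scale data to [0, 1] as in Equation (5)."""
    return (x - x.min()) / (x.max() - x.min())
```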
For ultra-short-term power prediction, the actual power generated over the past 100 min was used to predict the power output of the next 10 min. To implement the prediction task with deep learning methods, the samples were constructed with a sliding (receding) window. The window width was set to 10 intervals, so every sample contains 11 data points: as shown in Table 1, the 1st to 10th values of a sample are the past actual power, and the last value is the future generated power to be predicted. From one month of power generation records, a sample set of 4310 samples was obtained. To evaluate the generalization performance of the prediction models, the first 80% of the samples (3448 samples) were used as the training set, and the remaining 20% (862 samples) were used as the test set.
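The sample construction can be sketched as follows; the function name and split logic are illustrative, but the counts match the text (4320 observations yield 4310 samples; 3448 train, 862 test).

```python
import numpy as np

def make_samples(power, width=10):
    """Sliding-window samples: 10 past values as input, the next as label."""
    X = np.stack([power[i:i + width] for i in range(len(power) - width)])
    y = power[width:]
    return X, y

# Usage sketch: X, y = make_samples(scaled_power)
# split = int(0.8 * len(X))   # first 80% for training, rest for testing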

4.2. Evaluation Index of Ultra-Short-Term Wind Power Prediction Model

According to the Chinese national standard “Technical requirements for dispatching side forecasting system of wind or photovoltaic power” (GB/T 40607-2021), the mean square error (MSE), mean absolute error (MAE) and correlation coefficient (R) of the predicted power and actual power were used as evaluation indexes. Among them, MSE and MAE can reflect the difference between the predicted power and the actual power, and R can reflect the closeness of the correlation between the predicted power and the actual power. These three evaluation indexes were used to measure prediction performance, and the calculation of each index is shown in Equations (6)–(8).
$MSE = \dfrac{1}{n}\sum_{k=1}^{n}\left(P_{M,k} - P_{P,k}\right)^2$ (6)
$MAE = \dfrac{1}{n}\sum_{k=1}^{n}\left|P_{M,k} - P_{P,k}\right|$ (7)
$R = \dfrac{\sum_{k=1}^{n}\left(P_{M,k} - \bar{P}_M\right)\left(P_{P,k} - \bar{P}_P\right)}{\sqrt{\sum_{k=1}^{n}\left(P_{M,k} - \bar{P}_M\right)^2}\sqrt{\sum_{k=1}^{n}\left(P_{P,k} - \bar{P}_P\right)^2}}$ (8)
where $n$ is the number of samples, $P_{M,k}$ is the $k$-th true power value, $P_{P,k}$ is the $k$-th predicted power value, $\bar{P}_M$ is the average of the true power values, and $\bar{P}_P$ is the average of the predicted power values.
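A compact NumPy sketch of Equations (6)–(8), written as our own helper for checking predictions:

```python
import numpy as np

def evaluate(p_true, p_pred):
    """Return MSE, MAE and correlation coefficient R of Equations (6)-(8)."""
    mse = np.mean((p_true - p_pred) ** 2)
    mae = np.mean(np.abs(p_true - p_pred))
    dm, dp = p_true - p_true.mean(), p_pred - p_pred.mean()
    r = np.sum(dm * dp) / np.sqrt(np.sum(dm ** 2) * np.sum(dp ** 2))
    return mse, mae, r
```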

4.3. Experiment and Discussion

To verify the ultra-short-term wind power prediction performance, several models were trained and tested on the same wind turbine data: the traditional shallow BP network, the classical machine learning method SVR, Adam–LSTM and the proposed LsAdam–LSTM.
For the deep learning method LSTM, the optimization algorithm used in training has a strong effect on the final model. To evaluate the improved algorithm, experiments were carried out using LsAdam and standard Adam as the optimization methods for LSTM training, yielding two ultra-short-term wind power prediction models: LsAdam–LSTM and Adam–LSTM. To further evaluate the prediction accuracy of the proposed method, comparison models were built with the BP neural network and the SVR algorithm.
The parameters of all the algorithms used in the experiments were fine-tuned; the main settings were as follows. The LSTM network contained two hidden layers of 64 cells each, with the hyperbolic tangent (Tanh) activation function, and the LsAdam parameters were chosen as $k_1 = 5\pi$, $k_2 = 10$ and $\varepsilon = 0.001$. When training the LsAdam–LSTM and Adam–LSTM models, the maximum number of training epochs was set to 500, the initial global learning rate to 0.01, and full-batch training was used, with MSE as the error (loss) in back-propagation. To further check the adaptability and stability of the LsAdam–LSTM model, both LSTM models were also tested with the commonly used initial global learning rates of 0.005, 0.02 and 0.03. The BP neural network had a 10-32-1 structure, a learning rate of 0.1, 300 training epochs and the Sigmoid activation function. The SVR model used the radial basis function kernel with an error penalty factor of 1.
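Under these stated hyperparameters, the LSTM network could be assembled as in the following Keras sketch. This is a plausible reconstruction, not the authors' code; Keras' built-in Adam is shown, and the LsAdam schedule would be applied by resetting the optimizer's learning rate each epoch, as in the training-loop sketch of Section 3.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two hidden LSTM layers of 64 cells with tanh activation, one output unit;
# input: 10 past power values. MSE loss, initial global learning rate 0.01.
model = keras.Sequential([
    layers.Input(shape=(10, 1)),
    layers.LSTM(64, activation="tanh", return_sequences=True),
    layers.LSTM(64, activation="tanh"),
    layers.Dense(1),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01), loss="mse")
# Full-batch training: batch_size equal to the training-set size (3448)
# model.fit(X_train, y_train, epochs=500, batch_size=3448)
```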
The experiments were run with initial global learning rates of 0.005, 0.01, 0.02 and 0.03, and the loss-changing trends were similar. As shown in Figure 3 (initial global learning rate 0.01), the training progress of LsAdam–LSTM was clearly superior to that of Adam–LSTM. Although the loss of both methods decreased globally with the training epoch, the loss of LsAdam–LSTM dropped more sharply in the early training stage and reached a lower final value. Figure 3a details the loss change in the early training stage: the training of LsAdam–LSTM was more stable and its loss declined faster, whereas Adam–LSTM showed several large fluctuations, and its loss after the 4th training epoch was even greater than the initial loss. Figure 3b details the loss change in the middle and late training stages: the loss of LsAdam–LSTM converged to lower values more quickly and remained lower overall than that of Adam–LSTM. In addition, Adam–LSTM underwent relatively drastic fluctuations after about the 270th epoch, which harmed its convergence.
The global learning rate during the training of the two LSTM models is shown in Figure 4. The global learning rate of LsAdam–LSTM first increased, then decreased, and eventually settled near the initial value, while that of Adam–LSTM remained fixed. Compared with Adam–LSTM, LsAdam–LSTM introduces a gain coefficient carrying loss-change information to regulate the global learning rate during training. When the loss decreases, the global learning rate is appropriately increased, which raises the convergence speed of the model, so the loss of LsAdam–LSTM decreased faster. When the loss increases, the opposite occurs: the global learning rate is appropriately reduced, which prevents large oscillations during convergence, so LsAdam–LSTM had a much smoother loss descent. In this way, LsAdam–LSTM forms a closed-loop feedback mechanism between the training loss and the learning rate, accelerating the shrinkage of the model's loss.
The performance metrics of the LsAdam–LSTM and Adam–LSTM models under the commonly used initial global learning rates are shown in Table 2. For every initial global learning rate, the prediction accuracy of the LsAdam–LSTM model improved over the Adam–LSTM model, and the number of training epochs was reduced by at least 68. This shows that the LsAdam–LSTM model converges faster while maintaining higher prediction accuracy across all the initial global learning rates. Moreover, the prediction results of the LsAdam–LSTM model on the test set varied less across the different initial global learning rates, which indicates that the proposed model is less sensitive to the initial global learning rate and has strong adaptability and stability. Meanwhile, the results on the training and test sets show that the proposed model generalizes well.
To evaluate the prediction accuracy, we obtained the ultra-short-term wind power prediction models using BP, SVR, Adam–LSTM and LsAdam–LSTM, respectively. The prediction results and local details of the four models on the test set are shown in Figure 5, where the prediction results of both LSTM models are those at an initial global learning rate of 0.01. It can be seen that the prediction results of BP deviate significantly from the true value in almost the entire power range, and the prediction error of SVR is largest near both ends of the power range, while the prediction results of the Adam–LSTM model are improved significantly. The results are improved further with LsAdam–LSTM.
Table 3 shows the prediction performance of all the above models. All of the parameters are the same as those in Figure 5. As for the MSE between the predicted values and true values, the prediction results of LsAdam–LSTM in the training set improved by 62.79%, 48.77% and 4.08% over BP, SVR and Adam–LSTM, respectively. The prediction results in the test set improved by 66.97%, 51.22% and 3.33%, respectively. The prediction accuracy of the LsAdam–LSTM also improved in terms of other evaluation metrics. It can be seen that the prediction results of the LsAdam–LSTM model outperformed the other models on both the training and test sets, which means that the model has higher prediction accuracy and a better generalization ability.
In summary, the proposed LsAdam–LSTM predicts ultra-short-term wind power more accurately and efficiently. By integrating loss-shrinkage information, the training of the LSTM is accelerated and generalization improves. Compared with Adam, the global learning rate of every epoch is tuned adaptively, which allows the learning progress to improve continuously. Compared with the traditional machine learning methods BP and SVR, the LsAdam–LSTM model benefits from its deep learning architecture and stronger learning ability, and its results are significantly better.

5. Conclusions

Ultra-short-term wind power prediction estimates power generation for a short future period, which supports more reasonable power dispatching plans and a more stable supply-demand balance in the power system. Traditional machine learning methods, such as SVR and the BPNN, have difficulty learning the deeper features of wind power sequences. In contrast, LSTM can extract time series features from a wind power series and has great potential advantages in wind power prediction.
To improve the training efficiency and prediction accuracy of the ultra-short-term wind power prediction model, a novel method named LsAdam–LSTM is proposed. By introducing a gain coefficient carrying loss-change information during training, the global learning rate is continuously and adaptively tuned in every epoch according to the loss change. By accounting for the non-linear relationship between the loss change and the learning rate, this updating strategy improves the learning progress. Thus, LsAdam–LSTM alleviates the limitation caused by a small fixed global learning rate and effectively improves convergence speed and prediction accuracy. Experimental results on a wind turbine showed that the prediction accuracy of LsAdam–LSTM is higher than that of Adam–LSTM, BP and SVR. In addition, the reduced number of training epochs means better training efficiency, making the proposed approach an efficient option for the ultra-short-term prediction of wind power.

Author Contributions

Conceptualization, J.H.; formal analysis, G.N.; methodology, J.H.; project administration, S.S.; resources, H.G.; software, G.N.; supervision, S.S.; validation, H.G.; writing—original draft, G.N.; writing—review and editing, J.H. and G.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Nature Science Foundation of China, grant number U1504617.

Data Availability Statement

The data used in this work are from the publicly archived datasets on Kaggle and are available at https://www.kaggle.com/berkerisen/wind-turbine-scada-dataset, accessed on 3 March 2022.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Veers, P.; Dykes, K.; Lantz, E.; Barth, S.; Bottasso, C.L.; Carlson, O.; Wiser, R. Grand challenges in the science of wind energy. Science 2019, 366, eaau2027. [Google Scholar] [CrossRef] [PubMed]
  2. Kumar, Y.; Ringenberg, J.; Depuru, S.S.; Devabhaktuni, V.K.; Lee, J.W.; Nikolaidis, E.; Afjeh, A. Wind energy: Trends and enabling technologies. Renew. Sustain. Energy Rev. 2016, 53, 209–224. [Google Scholar] [CrossRef]
  3. Tawn, R.; Browell, J. A review of very short-term wind and solar power forecasting. Renew. Sustain. Energy Rev. 2022, 153, 111758. [Google Scholar] [CrossRef]
  4. Bazionis, I.K.; Karafotis, P.A.; Georgilakis, P.S. A review of short-term wind power probabilistic forecasting and a taxonomy focused on input data. IET Renew. Power Gener. 2022, 16, 77–91. [Google Scholar] [CrossRef]
  5. Chen, Y.; Hu, X.; Zhang, L. A review of ultra-short-term forecasting of wind power based on data decomposition-forecasting technology combination model. Energy Rep. 2022, 8, 14200–14219. [Google Scholar] [CrossRef]
  6. Wang, H.; Lei, Z.; Zhang, X.; Zhou, B.; Peng, J. A review of deep learning for renewable energy forecasting. Energy Convers. Manag. 2019, 198, 111799. [Google Scholar] [CrossRef]
  7. He, B.; Ye, L.; Pei, M.; Lu, P.; Dai, B.; Li, Z.; Wang, K. A combined model for short-term wind power forecasting based on the analysis of numerical weather prediction data. Energy Rep. 2022, 8, 929–939. [Google Scholar] [CrossRef]
  8. Hanifi, S.; Liu, X.; Lin, Z.; Lotfian, S. A Critical Review of Wind Power Forecasting Methods-Past, Present and Future. Energies 2020, 13, 3764. [Google Scholar] [CrossRef]
  9. Zhang, F.; Li, P.C.; Gao, L.; Liu, Y.Q.; Ren, X.Y. Application of autoregressive dynamic adaptive (ARDA) model in realtime wind power forecasting. Renew. Energy 2021, 169, 129–143. [Google Scholar] [CrossRef]
  10. Karakus, O.; Kuruoglu, E.E.; Altinkaya, M.A. One-day ahead wind speed/power prediction based on polynomial autoregressive model. IET Renew. Power Gener. 2017, 11, 1430–1439. [Google Scholar] [CrossRef]
  11. Wang, Y.; Wang, D.; Tang, Y. Clustered Hybrid Wind Power Prediction Model Based on ARMA, PSO-SVM, and Clustering Methods. IEEE Access 2020, 8, 17071–17079. [Google Scholar] [CrossRef]
  12. Gao, Y.; Qu, C.; Zhang, K. A Hybrid Method Based on Singular Spectrum Analysis, Firefly Algorithm, and BP Neural Network for Short-Term Wind Speed Forecasting. Energies 2016, 9, 757. [Google Scholar] [CrossRef]
  13. Viet, D.T.; Phuong, V.V.; Duong, M.Q.; Tran, Q.T. Models for Short-Term Wind Power Forecasting Based on Improved Artificial Neural Network Using Particle Swarm Optimization and Genetic Algorithms. Energies 2020, 13, 2873. [Google Scholar] [CrossRef]
  14. Zendehboudi, A.; Baseer, M.A.; Saidur, R. Application of support vector machine models for forecasting solar and wind energy resources: A review. J. Clean. Prod. 2018, 199, 272–285. [Google Scholar] [CrossRef]
  15. Duan, R.; Peng, X.; Li, C.; Yang, Z.; Jiang, Y.; Li, X.; Liu, S. A Hybrid Three-Staged, Short-Term Wind-Power Prediction Method Based on SDAE-SVR Deep Learning and BA Optimization. IEEE Access 2022, 10, 123595–123604. [Google Scholar] [CrossRef]
  16. Alkesaiberi, A.; Harrou, F.; Sun, Y. Efficient Wind Power Prediction Using Machine Learning Methods: A Comparative Study. Energies 2022, 15, 2327. [Google Scholar] [CrossRef]
  17. Tu, C.S.; Hong, C.M.; Huang, H.S.; Chen, C.H. Short Term Wind Power Prediction Based on Data Regression and Enhanced Support Vector Machine. Energies 2020, 13, 6319. [Google Scholar] [CrossRef]
  18. Liu, B.; Zhao, S.; Yu, X.; Zhang, L.; Wang, Q. A Novel Deep Learning Approach for Wind Power Forecasting Based on WD-LSTM Model. Energies 2020, 13, 4964. [Google Scholar] [CrossRef]
  19. Wang, Y.S.; Gao, J.; Xu, Z.W.; Luo, J.; Li, L. A Prediction Model for Ultra-Short-Term Output Power of Wind Farms Based on Deep Learning. Int. J. Comput. Commun. Control. 2020, 15, 3901. [Google Scholar] [CrossRef]
  20. Xiong, B.; Lou, L.; Meng, X.; Wang, X.; Ma, H.; Wang, Z. Short-Term wind power forecasting based on Attention Mechanism and Deep Learning. Electr. Power Syst. Res. 2022, 206, 107776. [Google Scholar] [CrossRef]
  21. Wu, Q.; Guan, F.; Lv, C.; Huang, Y. Ultra-short-term multi-step wind power forecasting based on CNN-LSTM. IET Renew. Power Gener. 2021, 15, 1019–1029. [Google Scholar] [CrossRef]
  22. Son, N.; Yang, S.; Na, J. Hybrid Forecasting Model for Short-Term Wind Power Prediction Using Modified Long Short-Term Memory. Energies 2019, 12, 3901. [Google Scholar] [CrossRef]
  23. Han, L.; Jing, H.; Zhang, R.; Gao, Z. Wind power forecast based on improved Long Short Term Memory network. Energy 2019, 189, 116300. [Google Scholar] [CrossRef]
  24. Xiang, L.; Liu, J.; Yang, X.; Hu, A.; Su, H. Ultra-short term wind power prediction applying a novel model named SATCN-LSTM. Energy Convers. Manag. 2022, 252, 115036. [Google Scholar] [CrossRef]
  25. Wang, D.; Cui, X.; Niu, D. Wind Power Forecasting Based on LSTM Improved by EMD-PCA-RF. Sustainability 2022, 14, 7307. [Google Scholar] [CrossRef]
  26. Huang, Q.; Wang, X. A Forecasting Model of Wind Power Based on IPSO-LSTM and Classified Fusion. Energies 2022, 15, 5531. [Google Scholar] [CrossRef]
  27. Liang, J.; Xu, Y.; Bao, C.; Quan, Y.; Ji, H. Barzilai-Borwein-based adaptive learning rate for deep learning. Pattern Recognit. Lett. 2019, 128, 197–203. [Google Scholar] [CrossRef]
  28. Li, Y.; Ren, X.; Zhao, F.; Yang, S. A Zeroth-Order Adaptive Learning Rate Method to Reduce Cost of Hyperparameter Tuning for Deep Learning. Appl. Sci.-Basel 2021, 11, 10184. [Google Scholar] [CrossRef]
  29. Yang, L.; Cai, D. AdaDB: An adaptive gradient method with data-dependent bound. Neurocomputing 2021, 419, 183–189. [Google Scholar] [CrossRef]
  30. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Comput. 2019, 31, 1235–1270. [Google Scholar] [CrossRef]
Figure 1. Structure of LSTM cell.
Figure 2. Wind power data in this work.
Figure 3. Loss change during training (initial global learning rate 0.01). (a) Loss change during the initial stage; (b) Loss change during the later stage.
Figure 4. Global learning rate changes during training (initial global learning rate 0.01).
Figure 5. Prediction results for each model on the test set.
Table 1. Experimental samples.

Sample Index   Input (Past 10 Observations)               Label (Future Power)
1              3603.64, 3603.22, …, 3603.00, 3602.82      3602.66
2              3603.22, 3603.36, …, 3602.82, 3602.66      3602.86
…              …                                          …
4308           232.15, 176.85, …, 59.71, 57.46            3.15
4309           176.85, 166.88, …, 57.46, 3.15             74.91
4310           166.88, 138.42, …, 3.15, 74.91             97.37
Table 2. Performance of LsAdam–LSTM and Adam–LSTM with different initial global learning rates.

Initial Global Learning Rate        0.005     0.01      0.02      0.03
Trained Epoch      LsAdam–LSTM      319       241       257       282
                   Adam–LSTM        413       313       325       398
MSEtrain (×10−3)   LsAdam–LSTM      3.5798    3.5961    3.5872    3.5667
                   Adam–LSTM        3.7459    3.7490    3.7469    3.7484
MAEtrain (×10−2)   LsAdam–LSTM      2.6517    2.6637    2.6797    2.7085
                   Adam–LSTM        2.7446    2.7493    2.7335    2.7585
Rtrain             LsAdam–LSTM      0.9831    0.9830    0.9830    0.9833
                   Adam–LSTM        0.9823    0.9823    0.9823    0.9823
MSEtest (×10−3)    LsAdam–LSTM      2.1560    2.1628    2.1639    2.1483
                   Adam–LSTM        2.2752    2.2374    2.1990    2.2461
MAEtest (×10−2)    LsAdam–LSTM      2.6069    2.6016    2.6202    2.6323
                   Adam–LSTM        2.6992    2.6215    2.6405    2.6108
Rtest              LsAdam–LSTM      0.9903    0.9903    0.9903    0.9903
                   Adam–LSTM        0.9898    0.9900    0.9901    0.9898
Table 3. Comparison of the prediction performance of each model.

Dataset     Model          MSE (×10−3)   MAE (×10−2)   R
Train set   BP             9.6654        4.9652        0.9537
            SVR            7.0193        7.0999        0.9813
            Adam–LSTM      3.7490        2.7493        0.9823
            LsAdam–LSTM    3.5961        2.6637        0.9830
Test set    BP             6.5487        4.9301        0.9702
            SVR            4.4337        5.3506        0.9867
            Adam–LSTM      2.2374        2.6215        0.9900
            LsAdam–LSTM    2.1628        2.6016        0.9903

