1. Introduction
The issue of climate warming is currently a global concern, and one of the major contributors to the problem is the use of fossil fuels. In recent years, countries around the world have been actively promoting a clean and low-carbon energy transition, forming a new global energy pattern by formulating energy transition policies and increasing investment in renewable energy and other programs. In this context, China has taken a series of measures to promote a low-carbon transition in its energy supply system. It is foreseeable that as this energy transition deepens, wind power will keep developing rapidly and become an important part of China’s new power system construction. However, because wind power is affected by wind speed, temperature, and other environmental factors and exhibits high randomness and fluctuation, large-scale wind power installation presents great challenges to the safe and stable operation of the power system. Thus, in order to improve wind power consumption capacity and realize the optimal dispatch of the power system, it is important to improve the accuracy of short-term wind power forecasting. Current wind power forecasting models are, on the one hand, constrained by the source data of wind farms and tend to ignore the influence of various environmental factors on wind power, so that the multivariate series of environmental factors are not used effectively. On the other hand, due to the nonlinear variation of the wind power and multivariate environmental information series [1,2,3], the convergence of forecasting models gradually slows down, and over-fitting problems can occur as the number of input variables increases.
The main current wind power forecasting methods are the physical, statistical, and machine learning methods. The physical method involves the construction of a physical model to forecast from meteorological data and the surface information around the wind farm [4], represented by NWP (Numerical Weather Prediction). The main statistical methods are the time series method [5], the gray forecasting method [6], the autoregressive moving average method [7], and a few others; these construct a nonlinear relationship between historical data and wind power data to make forecasts. The main machine learning methods are neural networks [8], support vector machines [9], extreme learning machines [10], etc. Because neural networks can mine the nonlinear relationships and deep features in training data, they generally have better forecasting performance [11,12], and are now widely used in forecasting for wind power generation [13]. Comparisons in the literature [14,15,16,17] between LSTM (Long Short-Term Memory) networks and other forecasting models have shown LSTM models to be better for both long-term and short-term forecasting; however, the LSTM model suffers from high model complexity and long training time. Thus, proper data processing can enhance the learning effect of the LSTM model. In [18], the authors proposed a PCA (Principal Component Analysis)-based LSTM forecasting model which was able to effectively reduce the dimensionality of the input variables of the LSTM. In [19], the authors proposed an LSTM wind power forecasting method based on RR (Robust Regression) and VMD (Variational Mode Decomposition) which improved LSTM forecasting accuracy by decomposing the wind power series to eliminate noise. In [20], the authors proposed a multivariate LSTM algorithm, performed dimensionality reduction on the original data using wavelet noise reduction, then reconstructed the pre-selected input data using chaos analysis and the classification forecast tree method, showing that dimensionality reduction can effectively improve LSTM forecasting accuracy. Both [21,22] improved the LSTM model by introducing an attention mechanism, which allocates limited computational resources to the more important tasks while alleviating the information overload problem. In [23,24], the authors similarly introduced an attention mechanism into the CNN-LSTM forecasting model to improve forecasting accuracy. The above examples illustrate that improving the quality of input data is beneficial to the forecasting accuracy of the LSTM model.
To improve the accuracy of wind power forecasting, the data quality first needs to be improved from several perspectives, including data decomposition and dimensionality reduction. Wind power series and multivariate environmental factor series are often non-stationary [25], and forecasting the original series directly may result in large errors. Decomposing the signal of the original series can improve forecasting accuracy while reducing the complexity of the data [26,27]. The most commonly used signal decomposition methods are the FFT (Fast Fourier Transform), WT (Wavelet Transform), and EMD (Empirical Mode Decomposition). In [28,29], the authors optimized the signal time-domain waveform using the FFT, although they did not consider local time-frequency features. In [30,31], the authors used the Wavelet Transform to decompose the wind speed time series, which has the drawback of requiring manual parameter setting. Compared to the two methods mentioned above, EMD [32] can adaptively identify the features of the input data with multi-resolution. In [33], the authors proposed an EMD-MODBN (Multi-objective Deep Belief Network) model, and their experimental results proved that decomposing data with EMD helps neural networks capture the potential connections in the data. In [34], the authors decomposed the wind speed series using EMD to remove noise from the original data, leading to improved forecasting accuracy. In [35], the authors decomposed the residential load signal using EMD; their results showed that series with more peaks are decomposed more effectively by EMD. The above examples illustrate the superiority of the EMD algorithm in data decomposition. Thus, in this paper, we chose to decompose the environmental factor series using the EMD algorithm.
In addition to the accuracy and stationarity of the data, the dimensionality plays an important role in the forecasting accuracy of the model, and introducing a dimensionality reduction algorithm into the forecasting model can improve its computational efficiency. PCA (Principal Component Analysis) is an effective method for reducing the dimensionality of data: it analyzes the covariance structure of a multivariate data series, calculates the contribution of each series, and selects the primary series to be retained [36,37,38]. In [39], the authors used PCA based on high-frequency data to forecast the high-dimensional covariance matrix, which can well characterize the long-memory behavior of realized eigenvalue series and be easily estimated by OLS. In [40], the authors used the PCA-BP method to filter the model input data, eliminating redundant and irrelevant information, reducing the complexity of the model, and improving the forecasting performance of the subsequent model. In [41], the authors preprocessed the data using PCA in order to achieve dimensionality reduction, which reduced the computation time of the subsequent model. RF (Random Forest) is a supervised learning algorithm which likewise represents an effective method of dimensionality reduction. In [42], the authors used the RF algorithm to rank the importance of individual features based on the Gini index for feature filtering, and experimental results showed that this feature importance ranking was able to filter features and improve the accuracy of subsequent models. In [43], the authors compared a multiple linear regression model with a random forest model and found that the RF model was far superior in terms of both goodness-of-fit and other evaluation indicators. In [44], the authors selected features via the RF algorithm, and the experimental results showed that random forest feature selection can effectively reduce data redundancy, eliminate features with poor differentiability, and improve the recognition performance of the model. In [45,46,47,48], the authors computed feature importance with RF algorithms in different fields, in each case improving the subsequent model. Thus, in this paper, we chose to reduce the dimensionality of the EMD-decomposed series using a combined PCA-RF algorithm.
To date, wind power forecasting based on data decomposition and data dimensionality reduction has been studied by several scholars. In [49], the authors used CEEMDAN (Complete Ensemble Empirical Mode Decomposition with Adaptive Noise) to divide the volatility into several fluctuation components with different frequency characteristics; their experimental results demonstrated the advantages of data decomposition. In [50], the authors used VMD to decompose wind speed into a nonlinear part, a linear part, and a noise part, and their experimental results showed that the proposed model accurately reflected the characteristics of the wind speed. In [51], the authors extracted principal components from high-dimensional raw data using PCA as input variables for subsequent models; their experimental results demonstrated improved forecasting accuracy. In [52], the authors processed the input variables through VMD decomposition and PCA dimensionality reduction, and the experimental results showed that the proposed method had higher forecasting accuracy than traditional methods. These examples show that both data decomposition and data dimensionality reduction are widely used in the field of wind power forecasting, although few studies combine the two.
In summary, to solve the problems of inaccurate feature identification and slow convergence in traditional wind power forecasting models, this paper makes full use of the environmental factor series that affect wind power, mines the features of wind power and environmental factors over time, and constructs an EMD-PCA-RF-LSTM forecasting model to improve the forecasting accuracy of wind power, thereby providing technical support for the safe dispatch of power systems and enhancing the power system’s capacity to consume wind power. First, the environmental factors are filtered by the LASSO (Least Absolute Shrinkage and Selection Operator) algorithm. Second, the environmental factor series are decomposed by the EMD algorithm to reduce non-stationarity. Third, the key influencing factor series are extracted and feature importance is calculated by the PCA-RF algorithm for further feature extraction. Finally, dynamic time modeling of the multivariate feature series is performed by the LSTM algorithm to forecast the wind power. A comparative analysis including a traditional BP neural network, SVM, EMD-PCA-RF-BP, EMD-PCA-RF-SVM, a single LSTM, EMD-PCA-LSTM, and EMD-RF-LSTM showed that the combined forecasting model proposed in this paper had the highest accuracy.
The rest of this paper is arranged as follows: the relevant theories of each algorithm are described in Section 2; data preprocessing, the design of the combined model, and the evaluation indicators are described in Section 3; in Section 4, the performance of the proposed model is analyzed according to the results of five combined models and three benchmark models on a case study; finally, the conclusions of this study are presented in Section 5.
2. Methods
2.1. LASSO Algorithm
The LASSO algorithm, first proposed by Robert Tibshirani [53], is a regression analysis method that enables simultaneous variable selection and parameter estimation. The LASSO method compresses the regression coefficients by constructing a penalty function; coefficients with small absolute values are compressed to zero, thus achieving variable selection. The process is as follows.
A general linear regression model can be expressed by the following formula:

${y}_{i}=\alpha +{\beta}_{1}{x}_{i1}+{\beta}_{2}{x}_{i2}+\cdots +{\beta}_{p}{x}_{ip}+{\epsilon}_{i}$ (1)

In Formula (1), ${y}_{i}$ is the dependent variable, $i=1,2,\cdots ,n$; $\alpha $ is a constant term; ${\beta}_{1},{\beta}_{2},\cdots ,{\beta}_{p}$ are the regression parameters; ${x}_{ij}$ is the independent variable that affects the dependent variable; and ${\epsilon}_{i}$ is a random error term following a normal distribution with zero mean.
The LASSO estimator $\left(\widehat{\alpha},\widehat{\beta}\right)$ can be expressed by the following formula:

$\left(\widehat{\alpha},\widehat{\beta}\right)=\mathrm{argmin}{\displaystyle \sum _{i=1}^{n}{\left({y}_{i}-\alpha -{\displaystyle \sum _{j=1}^{p}{\beta}_{j}{x}_{ij}}\right)}^{2}}\text{, subject to }{\displaystyle \sum _{j=1}^{p}\left|{\beta}_{j}\right|}\le t$ (2)

In Formula (2), $t\ge 0$ is an adjustable parameter. For all values of $t$, the estimate of $\alpha $ satisfies $\widehat{\alpha}=\overline{y}$. Without loss of generality, assume that $\overline{y}=0$; then, $\widehat{\alpha}=0$ and Formula (1) can be written as follows:

${y}_{i}={\displaystyle \sum _{j=1}^{p}{\beta}_{j}{x}_{ij}}+{\epsilon}_{i}$ (3)
Let ${t}_{0}={\displaystyle \sum _{j=1}^{p}\left|{\widehat{\beta}}_{j}^{0}\right|}$, where ${\widehat{\beta}}_{j}^{0}$ is the least squares estimate of the regression coefficient. When $t<{t}_{0}$, the regression coefficients are compressed, and some of them converge to 0 or even equal 0. The constrained problem in Formula (3) can then be written in the equivalent penalized form:

$\widehat{\beta}=\mathrm{argmin}\left\{{\displaystyle \sum _{i=1}^{n}{\left({y}_{i}-{\displaystyle \sum _{j=1}^{p}{\beta}_{j}{x}_{ij}}\right)}^{2}}+\lambda {\displaystyle \sum _{j=1}^{p}\left|{\beta}_{j}\right|}\right\}$ (4)
where $\sum _{i=1}^{n}{\left({y}_{i}-{\displaystyle \sum _{j=1}^{p}{\beta}_{j}{x}_{ij}}\right)}^{2}$ is the loss function reflecting the fitting effect of the model, $\lambda {\displaystyle \sum _{j=1}^{p}\left|{\beta}_{j}\right|}$ is the penalty function reflecting how strongly the model compresses the coefficients, and $\lambda \in \left[0,\infty \right)$ is the penalty parameter determining the compression strength. As $\lambda $ increases, the coefficients of the variables in the model are gradually compressed; when $\lambda $ reaches a certain value, the coefficients of certain variables are compressed to zero, achieving variable selection.
In this paper, K-fold Cross-Validation is used to determine the penalty parameter, $\lambda $. The sample datasets are divided into $K$ subsets. One subset is used as the dataset for validating the model, while the other $K-1$ subsets are used to construct the model. The cross-validation repeats $K$ times, with each subset validated once. The process is as follows.
Step 1. Divide the sample datasets $T$ into $K$ subsets, $T=\left\{{T}_{1},{T}_{2},\cdots ,{T}_{K}\right\}$.
Step 2. Use one of the subsets as the validation set and the remaining $K-1$ subsets as the training set, that is, train $K$ times and validate $K$ times.
Step 3. For each penalty parameter $\lambda $, use the training set to find the estimator ${\widehat{\beta}}^{\left(k\right)}\left(\lambda \right)$ of $\beta $ with the $k$-th subset held out; the statistic for cross-validation is

$CV\left(\lambda \right)={\displaystyle \sum _{k=1}^{K}{\displaystyle \sum _{i\in {T}_{k}}{\left({y}_{i}-{\displaystyle \sum _{j=1}^{p}{\widehat{\beta}}_{j}^{\left(k\right)}\left(\lambda \right){x}_{ij}}\right)}^{2}}}$
Step 4. The optimal penalty parameter is obtained by minimizing $CV\left(\lambda \right)$:

$\widehat{\lambda}=\underset{\lambda}{\mathrm{argmin}}\text{ }CV\left(\lambda \right)$
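As a concrete illustration of the shrinkage behavior described above, the following pure-Python sketch fits a LASSO model by cyclic coordinate descent with soft-thresholding. It is a minimal sketch under simplifying assumptions (standardized predictors, no intercept, toy data), not the implementation used in this paper; in practice, a library routine such as scikit-learn's `LassoCV` would combine this with the K-fold search for the penalty parameter.

```python
# Minimal LASSO sketch via cyclic coordinate descent with soft-thresholding.
# The toy data and helper names are illustrative, not from the paper.

def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward 0 by lam, clipping to 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=200):
    """Minimize sum((y - X.beta)^2) + lam * sum(|beta_j|).
    Assumes the columns of X are standardized and the intercept is removed."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual excluding feature j
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            # Coordinate update: larger lam shrinks more coefficients to 0
            beta[j] = soft_threshold(rho, lam / 2) / z
    return beta
```

With `lam = 0` this reduces to ordinary least squares; as `lam` grows, coefficients of weakly informative variables hit exactly zero, which is the variable-selection effect described above.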
2.2. EMD Algorithm
In this paper, the EMD algorithm [34] is used to decompose the environmental factor series. The EMD algorithm can be used to obtain the local features of the environmental factor series that affect the wind power at different time scales, which results in a more detailed, though larger, set of data series.
EMD is a data-driven, adaptive decomposition method for nonlinear time-varying signals that was developed from the ideas of Fourier analysis and Wavelet analysis, and it is applicable for smoothing nonlinear, non-stationary signals in a step-by-step process. The filtering process of the EMD algorithm decomposes complex time series data into a finite number of Intrinsic Mode Functions (IMFs); these decomposed IMFs contain the fluctuation information of the original data on different time scales [54]. The process is as follows.
Step 1. For any signal $x\left(t\right)$ to be processed, determine its local maxima and minima, construct the upper and lower envelopes, and record the difference between the signal $x\left(t\right)$ and the mean ${m}_{1}\left(t\right)$ of the upper and lower envelopes as ${h}_{1}\left(t\right)$, which can be expressed as follows:

${h}_{1}\left(t\right)=x\left(t\right)-{m}_{1}\left(t\right)$
Step 2. Repeat the above process until ${h}_{1}\left(t\right)$ satisfies the IMF conditions; this first-stage IMF filtered from the original signal usually contains the highest-frequency component of the signal. Separate ${h}_{1}\left(t\right)$ from $x\left(t\right)$ to obtain the difference signal ${r}_{1}\left(t\right)$ with the high-frequency component removed:

${r}_{1}\left(t\right)=x\left(t\right)-{h}_{1}\left(t\right)$

Repeat the above filtering steps with ${r}_{1}\left(t\right)$ as the new signal until the residual signal ${r}_{n}\left(t\right)$ of the $n$-th stage is a monotonic function from which no further IMF can be filtered out.
According to the decomposition algorithm, $x\left(t\right)$ can be expressed as the sum of $n$ IMFs and a single residual, with the following expression:

$x\left(t\right)={\displaystyle \sum _{j=1}^{n}{h}_{j}\left(t\right)}+{r}_{n}\left(t\right)$ (9)
In Formula (9), ${r}_{n}\left(t\right)$ is the residual, indicating the average trend in the signal, while ${h}_{j}\left(t\right)$ is the j-th IMF, j = 1, 2,…, n, which denote the different components of the signal from high to low frequencies, respectively.
The end of the filtering process is mainly determined by a Cauchy-like convergence criterion [55]; the standard deviation (SD) between two consecutive sifting results is usually set from 0.2 to 0.3, with the following expression:

$SD={\displaystyle \sum _{t=0}^{T}\frac{{\left|{h}_{k-1}\left(t\right)-{h}_{k}\left(t\right)\right|}^{2}}{{h}_{k-1}^{2}\left(t\right)}}$
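The stopping rule above can be sketched directly: the following pure-Python snippet computes the SD between two consecutive sifting results and checks it against a threshold. This is an illustrative sketch only (the envelope-fitting part of sifting is omitted, and the small `eps` guard against division by zero is our addition), not a full EMD implementation.

```python
# Sketch of the Cauchy-like stopping criterion that ends EMD sifting:
# sifting stops once the standard deviation (SD) between two consecutive
# sifting results h_{k-1}(t) and h_k(t) falls below a threshold,
# typically chosen in the 0.2-0.3 range.

def sifting_sd(h_prev, h_curr, eps=1e-12):
    """SD = sum_t |h_{k-1}(t) - h_k(t)|^2 / h_{k-1}(t)^2 over the samples."""
    return sum((a - b) ** 2 / (a ** 2 + eps) for a, b in zip(h_prev, h_curr))

def should_stop(h_prev, h_curr, threshold=0.3):
    """Stop sifting once SD drops below the chosen threshold."""
    return sifting_sd(h_prev, h_curr) < threshold
```

When two successive sifting results barely change, the SD is near zero and sifting stops; a large change keeps the sifting loop running.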
2.3. PCA-RF Combined Algorithm
In this paper, the PCA-RF combined algorithm is used to reduce the feature dimensions without losing the original data information. The results are then used as the input of the LSTM forecasting model in order to improve the calculation efficiency and accuracy of feature extraction.
2.3.1. PCA Algorithm
The data series obtained by EMD decomposition enriches the set of feature series, although the dimensionality of the input variables increases as well. PCA, first proposed by Karl Pearson in 1901, has been widely used to reduce the number of feature vectors and thus achieve dimensionality reduction of the data. It can be used to reduce the computational load of artificial neural networks and increase their calculation speed. The process is as follows [56].
Assume an original dataset $X=\left\{{x}_{11},{x}_{12},\cdots ,{x}_{ij},\cdots ,{x}_{mn}\right\}$, where $i$ is the time observation point and $j$ indexes the environmental factor series constituting the dataset matrix.
Step 1. Normalize the data series to obtain the normalization matrix ${X}^{\ast}$.
Step 2. Use a linear transformation to obtain the covariance matrix $R$:

$R=\frac{1}{n-1}{\left({X}^{\ast}\right)}^{T}{X}^{\ast}$
Step 3. Solve $\left|\lambda I-R\right|=0$ to obtain the eigenvalues ${\lambda}_{1}\ge {\lambda}_{2}\ge \cdots \ge {\lambda}_{n}$; finally, calculate the accumulated contribution rate ${\beta}_{i}$:

${\beta}_{i}={\displaystyle \sum _{j=1}^{i}{\lambda}_{j}}\Big/{\displaystyle \sum _{j=1}^{n}{\lambda}_{j}}$
Usually, the first $k$ principal components, with an accumulated contribution of 75% to 95%, are able to contain most of the information that the $n$ original variables can provide.
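To make Steps 1–3 concrete, the following pure-Python sketch runs them for two series: it standardizes the data, forms the covariance (here correlation) matrix $R$, obtains its eigenvalues in closed form for the 2-by-2 case, and computes the accumulated contribution rates. The toy series and function names are illustrative assumptions, not data from this paper.

```python
import math

# PCA sketch for two environmental-factor series: standardize, build R,
# take eigenvalues (closed form for the 2x2 correlation matrix), and
# compute the accumulated contribution rates beta_i.

def standardize(series):
    """Center and scale a series (sample standard deviation)."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / (n - 1))
    return [(v - mean) / sd for v in series]

def pca_contributions(s1, s2):
    x1, x2 = standardize(s1), standardize(s2)
    n = len(x1)
    # Off-diagonal of R for standardized data is the correlation r;
    # the diagonal entries are 1.
    r = sum(a * b for a, b in zip(x1, x2)) / (n - 1)
    # Eigenvalues of [[1, r], [r, 1]] are 1 + |r| and 1 - |r|.
    lams = sorted([1 + abs(r), 1 - abs(r)], reverse=True)
    total = sum(lams)
    # Accumulated contribution rate: beta_i = sum_{j<=i} lam_j / sum_j lam_j
    acc, betas = 0.0, []
    for lam in lams:
        acc += lam
        betas.append(acc / total)
    return lams, betas
```

For two perfectly correlated series, the first eigenvalue carries the full variance, so the first accumulated contribution rate is already 1.0 and a single principal component suffices, mirroring the 75–95% selection rule described above.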
2.3.2. RF Algorithm
Following PCA dimensionality reduction, the optimal feature series are selected via RF by evaluating the feature importance of each series. The RF algorithm, first proposed by Breiman in 2001 [57], is an ensemble algorithm based on decision trees. The sampling method of RF leaves about 1/3 of the samples unselected; this part of the sample becomes the out-of-bag (OOB) data.
RF obtains the feature importance by perturbing the out-of-bag data that are not involved in decision tree training and calculating the resulting difference in classification accuracy. The process is as follows [44].
Step 1. RF carries out Bootstrap sampling by taking $K$ sample datasets to generate $K$ decision trees, each of which is generated independently.
Step 2. Let $k=1$ to train decision tree ${T}_{k}$. The training input is the $k$-th dataset; calculate the accuracy, ${L}_{k}$, for the $k$-th out-of-bag data.
Step 3. Rearrange the feature, $f$, in the out-of-bag dataset and calculate the accuracy, ${L}_{k}^{f}$.
Step 4. Repeat Steps 2 and 3 for all sample datasets $k=2,3,\cdots ,K$.
Step 5. Calculate the classification accuracy error after feature rearrangement for each tree, which can be expressed as

${e}_{k}^{f}={L}_{k}-{L}_{k}^{f}$ (15)

Step 6. From Formula (15), we can obtain the average influence of feature $f$ on the accuracy of the out-of-bag data:

${e}^{f}=\frac{1}{K}{\displaystyle \sum _{k=1}^{K}{e}_{k}^{f}}$ (16)

where the variance of the ${e}_{k}^{f}$ is as follows:

${\left({\sigma}^{f}\right)}^{2}=\frac{1}{K-1}{\displaystyle \sum _{k=1}^{K}{\left({e}_{k}^{f}-{e}^{f}\right)}^{2}}$ (17)

Step 7. From Formulas (16) and (17), the importance of a feature $f$ can be calculated:

${f}_{VI}=\frac{{e}^{f}}{{\sigma}^{f}}$ (18)

Step 8. From Formula (18), the ${f}_{VI}$ of all the features can be obtained.
To select the optimal feature subset, candidate subsets are generated by removing one feature at a time from the sorted feature set; the accuracy of each subset is calculated, and the subset with the highest accuracy is finally selected as the optimal feature set.
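The out-of-bag permutation procedure in Steps 2–5 can be sketched as follows. Here the "trees" are stand-in predictor functions and the toy data are illustrative assumptions (a real Random Forest would use trained decision trees); the snippet only demonstrates the accuracy-drop computation that yields the average influence of a feature.

```python
import random

# Permutation-importance sketch: for each tree k, compare out-of-bag
# accuracy L_k before and after shuffling one feature column, then
# average the per-tree drops e_k^f = L_k - L_k^f.

def accuracy(model, rows, labels):
    """Fraction of out-of-bag rows the model classifies correctly."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(models, oob_sets, feature_idx, seed=0):
    rng = random.Random(seed)
    drops = []
    for model, (rows, labels) in zip(models, oob_sets):
        l_k = accuracy(model, rows, labels)           # L_k
        shuffled = [list(r) for r in rows]
        col = [r[feature_idx] for r in shuffled]
        rng.shuffle(col)                              # rearrange feature f
        for r, v in zip(shuffled, col):
            r[feature_idx] = v
        l_k_f = accuracy(model, shuffled, labels)     # L_k^f
        drops.append(l_k - l_k_f)                     # per-tree drop
    return sum(drops) / len(drops)                    # mean accuracy drop
```

Shuffling a feature the predictor ignores leaves accuracy unchanged (drop of 0), while shuffling an informative feature tends to degrade accuracy; ranking features by this mean drop gives the importance ordering used for feature elimination.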
2.4. LSTM Algorithm
A Long Short-Term Memory (LSTM) network is an extension of a Recurrent Neural Network (RNN) [58]. The LSTM network contains one input layer, one output layer, and several hidden layers; its structure is shown in Figure 1. The key to the LSTM network is the cell state; ${C}_{t-1}$ and ${C}_{t}$ are the old and new states of the cell, respectively. In an LSTM, the flow of information into and out of the cell state is controlled by the gate structure, which allows for selective passage of information. A gate consists of a sigmoid neural network layer and a pointwise multiplication operation. The sigmoid is a nonlinear activation function contained in the gate structure; the gate output ranges from 0 to 1 and defines the degree to which information passes through. The tanh layer in Figure 1 is an activation function that maps a real number input to the range [−1, 1].
The cell of an LSTM includes an input gate, an output gate, and a forget gate, which together control the information flow. In the following formulas, ${i}_{t}$, ${o}_{t}$, and ${f}_{t}$ denote the state values of the input gate, output gate, and forget gate, respectively [24].
Step 1. The sigmoid layer of the forget gate decides which information is forgotten from the old cell state ${C}_{t-1}$, based on the input ${x}_{t}$ of the current step and the output ${h}_{t-1}$ of the previous step; the forget gate output is as follows:

${f}_{t}=\sigma \left({W}_{1}^{f}{x}_{t}+{W}_{h}^{f}{h}_{t-1}+{b}_{f}\right)$ (19)
Step 2. To generate the information that needs to be updated and stored in the cell state, the input gate first produces the update signal ${i}_{t}$ through its sigmoid layer; the tanh layer then generates the new candidate value ${\tilde{C}}_{t}$ to be added to the cell state:

${i}_{t}=\sigma \left({W}_{1}^{i}{x}_{t}+{W}_{h}^{i}{h}_{t-1}+{b}_{i}\right)$ (20)

${\tilde{C}}_{t}=\mathrm{tanh}\left({W}_{1}^{C}{x}_{t}+{W}_{h}^{C}{h}_{t-1}+{b}_{C}\right)$ (21)

The new cell state ${C}_{t}$ is obtained by multiplying the old cell state by ${f}_{t}$ to forget the unwanted information and adding the new candidate information ${i}_{t}\cdot {\tilde{C}}_{t}$:

${C}_{t}={f}_{t}\cdot {C}_{t-1}+{i}_{t}\cdot {\tilde{C}}_{t}$ (22)
Step 3. To determine the information emitted by the output gate, the initial output is first obtained through its sigmoid layer; the cell state is then scaled to [−1, 1] by the tanh layer and multiplied pointwise with the sigmoid output to obtain the output ${h}_{t}$:

${o}_{t}=\sigma \left({W}_{1}^{o}{x}_{t}+{W}_{h}^{o}{h}_{t-1}+{b}_{o}\right),\text{ }{h}_{t}={o}_{t}\cdot \mathrm{tanh}\left({C}_{t}\right)$ (23)
In Formulas (19)–(23), ${W}_{1}^{i}$, ${W}_{1}^{f}$, ${W}_{1}^{o}$, ${W}_{1}^{C}$ are the weight matrices communicating ${x}_{t}$ with the input gate, forget gate, output gate, and cell input, respectively; ${W}_{h}^{i}$, ${W}_{h}^{f}$, ${W}_{h}^{o}$, ${W}_{h}^{C}$ are the weight matrices connecting ${h}_{t-1}$ with the input gate, forget gate, output gate, and cell input, respectively; ${b}_{i}$, ${b}_{f}$, ${b}_{o}$, ${b}_{C}$ are the biases of the input gate, forget gate, output gate, and cell input, respectively; and $\sigma $ is the sigmoid activation function.
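A single forward step of the cell described in Formulas (19)–(23) can be sketched in pure Python for the scalar case; the weight and bias containers are illustrative assumptions chosen for readability, not the network configuration used in this paper.

```python
import math

# One forward step of an LSTM cell (scalar states for readability).
# W maps a gate name to its (input weight W_1, recurrent weight W_h) pair;
# b maps a gate name to its bias.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    f_t = sigmoid(W["f"][0] * x_t + W["f"][1] * h_prev + b["f"])        # forget gate
    i_t = sigmoid(W["i"][0] * x_t + W["i"][1] * h_prev + b["i"])        # input gate
    c_tilde = math.tanh(W["C"][0] * x_t + W["C"][1] * h_prev + b["C"])  # candidate
    c_t = f_t * c_prev + i_t * c_tilde                                  # new cell state
    o_t = sigmoid(W["o"][0] * x_t + W["o"][1] * h_prev + b["o"])        # output gate
    h_t = o_t * math.tanh(c_t)                                          # hidden output
    return h_t, c_t
```

With all weights and biases at zero, each gate outputs sigmoid(0) = 0.5, so the step simply halves the old cell state and emits half of its tanh, which makes the gating arithmetic easy to verify by hand.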
5. Conclusions
In this paper, we have proposed a combined EMD-PCA-RF-LSTM forecasting model to address the fluctuation and randomness of wind power; the model fully considers the five main environmental factors that affect wind power generation, namely, temperature, air pressure, humidity, wind direction, and wind speed. Our main conclusions are as follows.
The environmental factors were filtered using the Least Absolute Shrinkage and Selection Operator (LASSO) method; variables with regression coefficients of 0 were eliminated in order to achieve variable selection. The time series of the five environmental factors were decomposed using the Empirical Mode Decomposition (EMD) method to obtain IMFs, and the key influencing series affecting wind power were filtered out by Principal Component Analysis (PCA) to reduce the dimensionality. The key influencing factor series after PCA dimensionality reduction were subjected to further feature extraction by Random Forest (RF): a certain percentage of features were eliminated according to the feature importance ranking, and a new feature set was ultimately obtained as the input of the LSTM model. Finally, the non-linear relationship between the environmental factor time series and the wind power series was modeled dynamically by a Long Short-Term Memory (LSTM) neural network to construct the wind power forecasting model.
By comparing SVM, BP, LSTM, EMD-PCA-RF-SVM, EMD-PCA-RF-BP, EMD-RF-LSTM, EMD-PCA-LSTM, and EMD-PCA-RF-LSTM, it was determined that the combined forecasting model proposed in this paper performs best. With the combined EMD-PCA-RF decomposition and dimensionality reduction method, the MSE, RMSE, and MAE of SVM, BP, and LSTM all decreased; the adj-R^{2} of SVM improved by 42.02%, the adj-R^{2} of BP improved by 22.99%, and the adj-R^{2} of LSTM improved by 2.24% to 0.9699203, the best result among all of the models.
This paper verifies the practicality of the EMD-PCA-RF-LSTM model in the field of wind power forecasting and extends the application scope of deep learning techniques. The proposed forecasting method provides a new perspective for in-depth study of the economic operation and scheduling of wind power grid-connected systems, and it has good application prospects and engineering value. Related research is already in progress.