An Advanced BiLSTM Prediction Model for Short-Term Wind-Storage Power Prediction

Lv, Muyao; Liu, Zejia; Wang, Guoqing; Zhang, Chao; Liu, Yanling; Luo, Chao; Yu, Jiawei; Zhu, Yihua

doi:10.3390/en19112666


by Muyao Lv ¹, Zejia Liu ¹, Guoqing Wang ¹, Chao Zhang ¹,*, Yanling Liu ¹, Chao Luo ², Jiawei Yu ² and Yihua Zhu ²

¹ School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, China

² State Key Laboratory of HVDC, Electric Power Research Institute, China Southern Power Grid, Guangzhou 510663, China

* Author to whom correspondence should be addressed.

Energies 2026, 19(11), 2666; https://doi.org/10.3390/en19112666

Submission received: 11 April 2026 / Revised: 27 May 2026 / Accepted: 29 May 2026 / Published: 31 May 2026

(This article belongs to the Special Issue Energy Storage Systems: Modeling, State Estimation and Optimal Dispatch)


Abstract

To improve the accuracy of short-term wind-storage power prediction, this paper introduces an advanced BiLSTM prediction model that integrates data preprocessing based on density-based spatial clustering of applications with noise (DBSCAN), partial least squares regression (PLSR), and particle swarm optimization (PSO). In this paper, “wind-storage power” refers to the net power output of a wind farm integrated with a battery energy storage system (BESS), where the measured data already embed the effects of charge/discharge operations. First, outage and missing data are removed from the historical dataset. DBSCAN is then employed to identify abnormal samples in wind-storage power and meteorological variables (wind speed, wind direction, atmospheric pressure, temperature, and humidity), and linear regression is used to correct the detected noise points. Correlation analysis is then conducted to identify the most relevant meteorological inputs, namely wind speed, wind direction, and atmospheric pressure. Next, the PLSR model generates a preliminary prediction of wind-storage output. The BiLSTM network then predicts the residual error, which mainly reflects the nonlinear characteristics not captured by the preliminary prediction, while PSO determines the core hyperparameters of the BiLSTM architecture. Finally, the preliminary PLSR result is corrected by the predicted residual to obtain the final wind-storage power prediction. The DBSCAN parameters are selected via a k-distance plot (ε = 0.9, MinPts = 2.5), and the number of PLSR components is set to A = 3 based on five-fold cross-validation. Case studies show that, for the 24 h prediction horizon, the proposed method improves prediction accuracy by 2.29%, 11.47%, and 5.54% compared with the BP, Wavelet-LSTM, and standard LSTM models, respectively. Furthermore, statistical significance is confirmed by Diebold–Mariano tests and 10-run confidence intervals.

Keywords:

wind-storage power prediction; BiLSTM; DBSCAN; PLSR

1. Introduction

Renewable energy has become increasingly important in modern power systems. Among the available renewable resources, wind-storage power has been widely integrated due to its sustainability, environmental benefits, and cost competitiveness [1]. Nevertheless, the intrinsic intermittency and stochastic variability of wind-storage resources pose substantial challenges to secure grid operation and economic dispatch [2]. Wind energy forecasting is an effective way to mitigate these challenges. By predicting future wind-storage power output and enabling more coordinated dispatch strategies, the negative impacts of wind-storage power intermittency can be significantly reduced. Therefore, accurate and dependable wind-storage power forecasting is indispensable for the large-scale integration of wind energy into power systems [3,4,5,6]. Wind-storage hybrid systems, which couple wind farms with battery energy storage, have emerged as a promising solution to mitigate the intermittency of wind power. The storage system can absorb or release power according to grid needs, thereby improving dispatchability and reducing curtailment. Consequently, accurate prediction of the net power output of such wind-storage systems is essential for optimal energy management and grid integration.

In general, wind-storage power prediction approaches can be classified into two categories: physical and statistical model-based approaches [7,8,9]. Physical models are developed by characterizing the physical relationships among meteorological variables, geographical features, and wind speed. Wind speed is first estimated using equations derived from atmospheric dynamics and meteorological data and subsequently transformed into power output through the turbine power curve. A representative example is the numerical weather prediction (NWP) model, which has been extensively utilized in wind-storage energy prediction [10,11,12]. Physical models typically do not rely on historical power generation data. In contrast, statistical models forecast future power output through the extraction of statistical dependencies and temporal patterns from historical observations. Representative techniques include the Kalman filter, autoregressive (AR) models [6], support vector machines (SVMs) [11], random forests, and XGBoost [13]. For example, Bouyeddou et al. [14] evaluated PCR and PLSR models for wind-storage power forecasting and presented determination coefficients (R²) equal to 0.930 and 0.931, respectively. Although these methods are relatively efficient and practical, their capability to capture the highly nonlinear and time-varying characteristics of wind-storage power is still limited.

Over the past decade, the rapid advancement of neural networks and deep learning has further promoted their application in wind-storage power forecasting [15,16]. Architectures including back propagation neural networks, long short-term memory networks, and convolutional neural networks have been broadly investigated. In ref. [17], a BP neural network tuned by the t-Tent-SSA method was proposed to enhance prediction accuracy. Chen et al. [18] combined CNN with genetic algorithms to estimate wind-storage-related variables, achieving a correlation coefficient of 0.835. Ref. [19] divided wind speed samples into different intervals according to the Beaufort wind scale. The study authors then employed TimeGAN to amplify samples in high wind speed intervals where data were scarce and subsequently used a neural network for power forecasting. With this sample expansion, the RMSE decreased by 2.57% compared to traditional LSTM networks. Hu et al. [20] established a BiLSTM-Attention algorithm to forecast wind-storage power output and utilized the Whale Optimization Algorithm to select optimal network hyperparameters, achieving an R² of 93.23%. In ref. [21], a self-attention mechanism was introduced to develop a hybrid model combining an SATCN and an LSTM. The proposed model reduced the RMSE by 17.56% and 10.99% compared with standard LSTM and TCN, respectively. In ref. [22], a wavelet activation kernel was introduced into the LSTM layer to develop a deep learning-based WN-LSTM model, achieving an MAE of 0.0050. In ref. [23], a CNN–LSTM model whose hyperparameters have been optimized using the Coati Optimization Algorithm (COA) was proposed, reducing the RMSE by 0.5% and 5.8% for day-ahead and hour-ahead short-term forecasting, respectively. Despite these advances, existing deep learning methods still face two important challenges. On the one hand, the quality of input data has a considerable influence on their prediction performance. 
On the other hand, it remains difficult to effectively combine preliminary linear prediction with subsequent residual correction within a unified forecasting framework.

The literature has explored hybrid strategies that combine linear models with neural networks to improve forecasting accuracy. For instance, Bouyeddou et al. [14] evaluated PCR and PLSR for wind power forecasting, achieving R² values of 0.930 and 0.931, respectively, confirming the effectiveness of PLSR as a baseline linear predictor. In terms of deep learning enhancement, Kari et al. [24] constructed a BiLSTM network to capture temporal correlations and combined it with wavelet transform and an improved genetic algorithm, demonstrating that hybrid configurations can further enhance prediction accuracy. A systematic review by Hanifi et al. [25] classified wind power forecasting methods into physical, statistical, and hybrid categories, highlighting that hybrid models currently achieve the best performance. These findings motivate our PLSR-BiLSTM residual correction framework.

Additionally, the quality of the original dataset significantly affects prediction accuracy. During the operation of wind-storage power plants, anomalous data are frequently observed due to downtime, malfunctions, and other factors. Although these outliers typically constitute a small portion of the dataset, they can significantly degrade prediction performance. Therefore, eliminating anomalous data is crucial [26]. Common approaches include missing value imputation via interpolation, normalization to mitigate dimensionality effects [26], wavelet transform-based data decomposition [27], and clustering-based methods [28] to capture correlations between historical data and predicted values. However, recent studies have pointed out that decomposition-based forecasting models, including wavelet-based methods, may suffer from the boundary issue if future data are inadvertently used during decomposition, leading to unrealistically high performance [29]. In the context of wind turbine SCADA data, recent studies have systematically compared DBSCAN with other anomaly detection methods. Pawlik [30] evaluated Autoencoder (AE), LSTM-Autoencoder (LSTM-AE), One-Class SVM (OCSVM), and Isolation Forest (IF), concluding that reconstruction-based approaches are effective for capturing temporal dependencies. Similarly, Gück et al. [31] reported that LSTM-AE outperformed Isolation Forest in a benchmark study. More importantly, Mehmood and Wang [32] proposed a hybrid iForest-DBSCAN framework for anomaly detection and power curve modelling, achieving over 99% accuracy in both offshore and onshore wind farms, further demonstrating that DBSCAN-based methods can be effectively integrated with other techniques to enhance detection performance. These works confirm that DBSCAN remains advantageous for identifying arbitrarily shaped clusters without requiring prior data labeling. 
While clustering algorithms are widely adopted, the conventional K-means method fails to identify clusters with complex (e.g., nested) structures. Therefore, density-based approaches, particularly DBSCAN, have been introduced to address this limitation. Nevertheless, data preprocessing alone is not sufficient to fully improve forecasting performance, and an effective prediction framework is still needed to further enhance model accuracy.

Despite these advances, existing hybrid forecasting methods have several limitations that are often overlooked. First, decomposition-based approaches (e.g., Wavelet-LSTM) suffer from the boundary issue when applied in strict out-of-sample multi-step forecasting, as recently reviewed by Chen et al. [29]; using future data during decomposition artificially inflates performance, while a truly online setting leads to error accumulation. Second, most residual-correction models are designed for pure wind power and do not account for the additional noise and dynamics introduced by battery charge/discharge in wind-storage hybrid systems. Third, although DBSCAN has been used for anomaly detection [30,31,32], few studies systematically integrate it with both linear (PLSR) and nonlinear (BiLSTM) components for wind-storage power prediction, nor do they provide rigorous statistical significance tests such as Diebold–Mariano or multiple-run confidence intervals. Therefore, a framework that explicitly handles storage-induced noise, combines linear extraction with residual learning, and offers comprehensive statistical validation is still lacking. To fill these gaps, the work presented herein proposes an advanced BiLSTM prediction model with the following contributions.

The main contributions of this work are as follows:

(1): We propose a wind-storage hybrid system-oriented forecasting framework that explicitly accounts for battery charge/discharge dynamics and the resulting noise in net power measurements.
(2): We integrate DBSCAN-based anomaly detection with linear regression correction to clean wind-storage power data, which is shown to significantly improve prediction accuracy.
(3): We develop a PLSR + BiLSTM residual correction architecture that decomposes the prediction task into linear (PLSR) and nonlinear (BiLSTM) parts and optimize the BiLSTM hyperparameters via PSO.
(4): We provide extensive statistical validation including Diebold–Mariano tests, multiple-run confidence intervals, and comparison with a Wavelet-LSTM baseline.

The remainder of this paper is organized as follows. Section 2 describes the power characteristics of wind-storage hybrid systems. Section 3 presents the DBSCAN-based noise identification and correction method. Section 4 introduces the PLSR-BiLSTM-PSO forecasting model, including principles and cross-validation for PLSR components. Section 5 presents the case study, experimental settings, comparative results, and statistical significance analysis. Section 6 concludes the paper.

To ensure comparability with established benchmarks in wind power forecasting, our case study follows similar validation protocols as the widely used GEFCom2014 dataset [33], which includes NWP and power observations from multiple wind farms. Results obtained from simulations confirm that, compared with BP, Wavelet-LSTM, and standard LSTM models, the proposed method improves prediction accuracy by 2.29%, 11.47%, and 5.54% for the 24 h horizon, respectively.

The main variables and parameters used in this study are summarized in Table 1:

2. Power Characteristics of Wind-Storage Hybrid Systems

A wind-storage hybrid system consists of a wind farm and a battery energy storage system (BESS) connected at the point of common coupling. The net output power $P_{net}$ is the algebraic sum of the wind power $P_{wind}$ and the storage power $P_{storage}$ (positive when discharging, negative when charging), as expressed in Equation (1):

$$P_{net} = P_{wind} + P_{storage} \tag{1}$$

The BESS can smooth fluctuations, provide frequency regulation, and store excess wind energy. However, the actual net power is also influenced by the state of charge (SoC) of the battery and the control strategy. In practice, measured data from wind-storage plants often contain anomalies caused by both wind turbine malfunctions and battery operation constraints (e.g., forced charging/discharging limits). Figure 1 shows the raw distribution of wind speed versus net power collected from an actual wind-storage farm in Gansu Province over 1 month.

When wind speed falls within the normal operating range of the turbine, wind speed and wind-storage power usually exhibit a strong positive correlation. This characteristic can be used for effective noise identification. According to Figure 1, the raw wind speed versus wind-storage power data contain many abnormal points. Such anomalies do not reflect the genuine operating conditions of the turbines and, because wind-storage power data are temporally dependent, may degrade prediction accuracy. Therefore, systematic data preprocessing is essential before model construction.

Table 2 summarizes the typical types of noise data. Noise caused by mechanical failures or human interventions can significantly distort the normal data distribution. This distortion interferes with supervised model training and weakens the capacity for generalization of the prediction model. Therefore, systematic noise identification and correction can effectively reduce data interference and improve overall wind-storage power prediction performance.

It should be noted that the data used in this study correspond to the net power output of the wind-storage hybrid system, where the effects of battery charge/discharge are already embedded in the measured power values. The specific battery type is lithium-ion with a rated capacity of 48 MWh and a rated power of 24 MW. The control strategy is peak shaving and valley filling, power smoothing, and tracking the dispatched power schedule, thereby improving the dispatchability and economy of the wind-storage hybrid system. Our prediction model does not require explicit modelling of the battery’s internal dynamics, as the net power time series contains the historical impact of storage operations.

3. Noise Identification

3.1. DBSCAN Algorithm

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering technique. In contrast to K-means, it does not require the number of clusters to be specified in advance and can recognize clusters of arbitrary shape. Moreover, DBSCAN is more suitable for datasets with irregular sample distributions and noise points, whereas K-means often performs poorly when handling non-convex data distributions.

DBSCAN relies on two key parameters, (ε, MinPts), to characterize the local density of samples. Here, ε denotes the neighborhood radius of a sample, and MinPts denotes the minimum number of neighboring points required for that sample to be considered a core point. Based on these two parameters, DBSCAN classifies samples into core points, border points, and noise points, thereby identifying dense regions and isolating abnormal observations.
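The core/border/noise classification described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `classify_points`, the toy samples, and the values `eps=0.5` and `min_pts=3` are all assumptions chosen for readability, and each point is counted as a member of its own neighborhood.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise' (DBSCAN definitions)."""
    n = len(points)
    # Assumption: a point's eps-neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Toy (wind speed, power) samples; the values are illustrative only.
samples = [(5.0, 10.0), (5.2, 10.3), (5.1, 9.8), (5.6, 10.1), (9.0, 1.0)]
labels = classify_points(samples, eps=0.5, min_pts=3)
print(labels)
```

In this toy set the isolated sample at (9.0, 1.0) has no neighbors within ε and is therefore labeled as noise, which is the behavior exploited for anomaly detection.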

Figure 2 illustrates the basic principle of DBSCAN. Once MinPts is specified, the red circles represent core points, the yellow circles represent density-reachable points, and the blue circles represent noise points. Figure 3 illustrates the main execution stages of DBSCAN.

3.2. Noise Recognition

The performance of DBSCAN is significantly influenced by the selection of ε and MinPts. A smaller ε or a larger MinPts reduces the number of core points, which may cause normal samples to be incorrectly identified as noise and increase the number of clusters. Conversely, a larger ε or a smaller MinPts increases the number of core points, which may cause some abnormal samples to be absorbed into normal clusters and remain undetected. Therefore, appropriate parameter selection is essential for reliable noise identification.

As shown in Figure 4, there is a clear correlation between wind speed and wind-storage power when the turbine operates within its normal range. Generally, wind-storage power increases with wind speed under normal conditions. However, some samples deviate significantly from this relationship due to sensor malfunctions or data acquisition failures; these should be identified as noise points. Figure 4b shows that inappropriate parameter settings incorrectly classify many high-density normal samples as noise, while Figure 4c shows that many low-density abnormal samples remain undetected. In contrast, the parameter setting in Figure 4a (ε = 0.9, MinPts = 2.5) better preserves the original wind speed vs. wind-storage power distribution and effectively identifies abnormal samples. Therefore, the selected parameters can reduce the interference of noisy data in subsequent time-series forecasting.

To determine the DBSCAN parameter ε in a systematic way, we construct a k-distance plot. For each data point (wind speed, power), we compute its distance to the k-th nearest neighbor, where k = MinPts − 1. Here we set MinPts = 2.5 (approximately 2 × dimensionality, with the features being wind speed and power), so k = 2. The k-distance values are then sorted in descending order and plotted against the point index, as shown in Figure 5. The curve drops sharply from a large value (over 7) and begins to flatten at a distance around 0.9. This “elbow” point indicates the threshold where points transition from sparse (noise) to dense (cluster) regions. Therefore, we choose the elbow value as the neighborhood radius: ε = 0.9. Figure 4a demonstrates that, with ε = 0.9 and MinPts = 2.5, the DBSCAN clustering successfully identifies abnormal samples while preserving the normal wind-speed–power relationship. In contrast, inappropriate parameter settings (Figure 4b,c) either misclassify normal points as noise or fail to detect abnormal points.
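The k-distance construction can be sketched as follows. This is a hedged illustration using only the standard library: the helper name `k_distances` and the toy points are assumptions, and a real dataset would normally be plotted so that the elbow can be read off the sorted curve.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted descending."""
    result = []
    for i, p in enumerate(points):
        # Distances to all other points, nearest first.
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return sorted(result, reverse=True)

# Four clustered points and one isolated point (illustrative values only).
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
curve = k_distances(samples, k=2)
print(curve)
```

The isolated point produces one large k-distance at the head of the curve, after which the values flatten; the flattening level plays the role of the elbow used to choose ε.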

3.3. Noise Data Correction

After the noise points are identified, they must be further processed. Common treatment methods include direct deletion and interpolation. When the noisy data account for only a small and discontinuous portion of the original dataset, they may be directly removed. Although deletion is simple to implement, excessive data removal leads to information loss and may further reduce forecasting accuracy.

Alternatively, interpolation replaces abnormal samples with estimated values derived from normal data. Compared with direct deletion, interpolation preserves more of the original data information and imposes fewer restrictions on the distribution of noisy samples. Traditional interpolation methods, such as mean-value or median-value replacement, are easy to implement. However, they may distort the inherent data distribution and consequently affect wind-storage power forecasting performance.

To overcome this drawback, a linear regression method is utilized in the present study to correct the identified noise points. Specifically, a regression equation is established using the non-noise samples, and the fitted relationship is then used to estimate the values of the abnormal points. This method can effectively handle continuous noisy samples while preserving the original data distribution as much as possible.
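A minimal sketch of this correction step is given below. It assumes a simple one-variable regression of power on wind speed fitted by ordinary least squares; the function names `fit_line` and `correct_noise`, the noise flags, and the sample values are hypothetical and only illustrate the idea.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def correct_noise(speed, power, is_noise):
    """Replace power at flagged samples with the value fitted on normal samples."""
    xs = [s for s, flag in zip(speed, is_noise) if not flag]
    ys = [p for p, flag in zip(power, is_noise) if not flag]
    a, b = fit_line(xs, ys)
    return [a + b * s if flag else p
            for s, p, flag in zip(speed, power, is_noise)]

# Illustrative data: the third sample is flagged as noise (e.g., by DBSCAN).
speed = [4.0, 5.0, 6.0, 7.0, 8.0]
power = [8.0, 10.0, 0.5, 14.0, 16.0]
is_noise = [False, False, True, False, False]
corrected = correct_noise(speed, power, is_noise)
print(corrected)
```

Only the flagged sample is changed, so the time series keeps its original length and ordering, which is the advantage over direct deletion noted above.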

As shown in Figure 6, most of the corrected noise points are relocated to the neighborhood of the normal wind speed versus wind-storage power relationship after linear regression. Compared with the raw noisy data, the processed dataset follows this relationship more closely and retains more valid information for subsequent forecasting.

4. Advanced BiLSTM Predictive Model

4.1. BiLSTM Principles

LSTM is a special type of recurrent neural network (RNN). By introducing a cell state and three gating mechanisms, namely the input, output, and forget gates, LSTM significantly alleviates the vanishing and exploding gradient problems. As a result, LSTM has stronger capabilities in learning from historical data and modeling long-term dependencies, and it is well suited for complex nonlinear problems.

A bidirectional LSTM (BiLSTM) network consists of two parallel LSTM networks and performs separate processing of the input sequence in forward and reverse directions. With such a structure, the model can capture context from time steps that are both past and future, thereby improving the completeness of feature extraction. The memory module organization of the BiLSTM network is visualized in Figure 7.

4.2. PSO Principles

The performance of the BiLSTM network is strongly influenced by its hyperparameters, such as the number of hidden neurons. Therefore, these hyperparameters are optimized using the PSO algorithm.

Being a search algorithm, PSO features relatively fast convergence. During each iteration, the algorithm identifies the current best particle, and the remaining particles update their trajectories accordingly. Particles are updated by considering their own and the global best position [11]. The PSO algorithm performs the search by updating particle velocity and position. The iterative update formula is given in Equation (2).

$$\begin{aligned} V_{i}(k+1) &= w V_{i}(k) + c_{1} r_{1} \left(p_{ibest} - X_{i}(k)\right) + c_{2} r_{2} \left(g_{best} - X_{i}(k)\right) \\ X_{i}(k+1) &= X_{i}(k) + V_{i}(k+1) \end{aligned} \tag{2}$$

where $V_{i}$ and $X_{i}$ are the velocity and position of particle $i$ at iteration $k$; $w$ is the inertia weight; $c_{1}$ and $c_{2}$ are the cognitive (personal best) and social (global best) acceleration coefficients, respectively; $r_{1}$ and $r_{2}$ are random numbers uniformly distributed in [0, 1]; $p_{ibest}$ is the personal best position of particle $i$; and $g_{best}$ is the global best position of the swarm.
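The velocity and position updates of Equation (2) can be illustrated with a small self-contained PSO loop. This is a sketch under stated assumptions rather than the configuration used in the paper: the values of `w`, `c1`, `c2`, the swarm size, the iteration count, the clamping of positions to the search bounds, and the sphere test function are all illustrative choices.

```python
import random

def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic PSO (Eq. 2)."""
    rng = random.Random(seed)
    dim = len(bounds)
    X = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in X]                 # personal best positions
    p_val = [objective(x) for x in X]          # personal best values
    g_idx = min(range(n_particles), key=p_val.__getitem__)
    g_best, g_val = p_best[g_idx][:], p_val[g_idx]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + cognitive term + social term.
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (p_best[i][d] - X[i][d])
                           + c2 * r2 * (g_best[d] - X[i][d]))
                # Position update; clamping to the bounds is an added assumption.
                lo, hi = bounds[d]
                X[i][d] = min(max(X[i][d] + V[i][d], lo), hi)
            val = objective(X[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = X[i][:], val
                if val < g_val:
                    g_best, g_val = X[i][:], val
    return g_best, g_val

# Toy objective (sphere function) standing in for a validation loss.
best_x, best_val = pso_minimize(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 2)
print(best_x, best_val)
```

For hyperparameter tuning, each particle position would encode candidate settings (for example, hidden units and learning rate) and the objective would be a validation error of the trained network, which is far more expensive to evaluate than this toy function.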

4.3. PLSR Principles

Assume that there are $p$ independent variables $\{x_{1}, \dots, x_{p}\}$. Let $X$ and $Y$ denote the explanatory-variable matrix and the response-variable matrix, respectively. The objective of PLSR is to extract the components $t_{1}$ and $u_{1}$ from $X$ and $Y$, respectively. These components should retain as much of the variation in their corresponding data matrices as possible. At the same time, the correlation between $t_{1}$ and $u_{1}$ should be maximized.

After the first component pair $t_{1}$ and $u_{1}$ is extracted, regressions of $X$ on $t_{1}$ and $Y$ on $u_{1}$ are performed in sequence. If the regression equations achieve the required accuracy, the extraction process stops. Otherwise, the residual information is used to extract the next pair of components, namely $t_{2}$ and $u_{2}$. This procedure is repeated until the required accuracy is reached. If a total of $A$ components $(t_{1}, \dots, t_{A})$ is ultimately extracted from $X$, the regression of $Y$ on these components can be transformed into a regression equation of $Y$ with respect to the original variables, and the PLSR model is then obtained. The detailed steps are as follows.

(1) Observe $n$ samples of the independent variables to form matrix $X$; the dependent variable forms matrix $Y$. The data matrices are defined in Equation (3):

$$X = \begin{bmatrix} x_{11} & \dots & x_{1p} \\ \vdots & \ddots & \vdots \\ x_{n1} & \dots & x_{np} \end{bmatrix}, \quad Y = \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} \tag{3}$$

where $X$ is the $n \times p$ matrix of independent variables, $Y$ is the $n \times q$ matrix of dependent variables (here $q = 1$), and $n$ is the number of samples. The raw data matrices $X$ and $Y$ are standardized to obtain the normalized variable matrices $E_{0}$ and $F_{0}$, respectively.

(2) Extract the first pair of components $t_{1}$ and $u_{1}$. According to the principal component criterion, the covariance between $t_{1}$ and $u_{1}$ should be maximized, as expressed in Equation (4):

$$\mathrm{Cov}(t_{1}, u_{1}) \to \max \tag{4}$$

where $t_{1}$ and $u_{1}$ are the first pair of latent components extracted from $X$ and $Y$, respectively, and $\mathrm{Cov}$ denotes the covariance. Here, $t_{1}$ is a linear combination of the columns of $E_{0}$ with weight vector $w_{1}$, that is, $t_{1} = E_{0} w_{1}$. Similarly, $u_{1}$ is a linear combination of the columns of $F_{0}$ with weight vector $c_{1}$, that is, $u_{1} = F_{0} c_{1}$, where $w_{1}$ and $c_{1}$ are unit vectors.

(3) The regression relationships among $E_{0}$, $F_{0}$, $t_{1}$, and $u_{1}$ are established as Equations (5)–(7):

$$E_{0} = t_{1} p_{1}^{T} + E_{1} \tag{5}$$

$$F_{0} = u_{1} q_{1}^{T} + F_{1}^{*} \tag{6}$$

$$F_{0} = t_{1} r_{1}^{T} + F_{1} \tag{7}$$

where $E_{0}$ and $F_{0}$ are the standardized matrices of $X$ and $Y$; $t_{1}$ and $u_{1}$ are the first components extracted from $X$ and $Y$, respectively; $p_{1}$ and $q_{1}$ are the corresponding loading vectors; $r_{1}$ is the regression vector of $F_{0}$ on $t_{1}$; and $E_{1}$, $F_{1}^{*}$, and $F_{1}$ are the residual matrices after extracting the first component.

The regression vectors (loadings and coefficients) are obtained by ordinary least squares (OLS) regressions of the residual matrices on the extracted components. For the regression of $E_{0}$ on $t_{1}$, the OLS estimate of the loading vector $p_{1}$ minimizes $\| E_{0} - t_{1} p_{1}^{T} \|^{2}$, yielding $p_{1}^{T} = (t_{1}^{T} t_{1})^{-1} t_{1}^{T} E_{0}$. Because $t_{1}^{T} t_{1} = \| t_{1} \|^{2}$ is a scalar (equal to 1 if $t_{1}$ is implemented with unit norm), this simplifies to $p_{1} = E_{0}^{T} t_{1} / \| t_{1} \|^{2}$. The same derivation applies to $q_{1}$ and $r_{1}$ [34]. The regression vectors are calculated using Equation (8):

$$p_{1} = \frac{E_{0}^{T} t_{1}}{\| t_{1} \|^{2}}, \quad q_{1} = \frac{F_{0}^{T} u_{1}}{\| u_{1} \|^{2}}, \quad r_{1} = \frac{F_{0}^{T} t_{1}}{\| t_{1} \|^{2}} \tag{8}$$

Here, $E_{1}$, $F_{1}^{*}$, and $F_{1}$ in Equations (5)–(7) are the corresponding residual matrices.

(4) Replace $E_{0}$ and $F_{0}$ with the residual matrices $E_{1}$ and $F_{1}$. Then derive the second pair of weight vectors $w_{2}$ and $c_{2}$, as well as the second pair of components $t_{2}$ and $u_{2}$.

(5) Repeat steps 3 and 4. If the rank of $E_{0}$ equals $A$, the decomposition can be formulated as Equations (9) and (10):

$$E_{0} = t_{1} p_{1}^{T} + \dots + t_{A} p_{A}^{T} \tag{9}$$

$$F_{0} = t_{1} r_{1}^{T} + \dots + t_{A} r_{A}^{T} + F_{A} \tag{10}$$

where $F_{A}$ is the residual matrix of $F_{0}$ after extracting $A$ components (the corresponding residual $E_{A}$ of $E_{0}$ vanishes when $A$ equals the rank of $E_{0}$), $t_{i}$ are the extracted components, and $p_{i}$ and $r_{i}$ are the loading and regression vectors, respectively.

(6) Since $t_{1}, \dots, t_{A}$ can be represented as linear combinations of the columns of $E_{0}$, the above formulas can be transformed into the final PLSR regression equation, given in Equation (11):

$$y = \beta_{1} x_{1} + \beta_{2} x_{2} + \dots + \beta_{p} x_{p} + y_{h} \tag{11}$$

where $\beta_{1}, \dots, \beta_{p}$ are the regression coefficients and $y_{h}$ is the residual term.
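Steps (1) to (5) can be sketched for the single-response case ($q = 1$) in plain Python. This is an illustrative NIPALS-style implementation under stated assumptions, not the authors' code: with one response the weight vector is taken as $w = E^{T} f / \| E^{T} f \|$, the function names `standardize` and `pls1` and the toy data are hypothetical, and the sample standard deviation is used for standardization.

```python
import math

def standardize(cols):
    """Column-wise z-score (sample standard deviation, an assumption)."""
    out = []
    for c in cols:
        m = sum(c) / len(c)
        s = math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
        out.append([(v - m) / s for v in c])
    return out

def pls1(E_cols, f, n_components):
    """Extract PLS components for one response.

    E_cols: list of standardized predictor columns; f: standardized response.
    Returns the score vectors t_a and the residual of f after deflation.
    """
    E = [c[:] for c in E_cols]
    f = f[:]
    n = len(f)
    scores = []
    for _ in range(n_components):
        # Unit weight vector w = E^T f / ||E^T f||.
        w = [sum(e * y for e, y in zip(col, f)) for col in E]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        # Score t = E w.
        t = [sum(w[j] * E[j][i] for j in range(len(E))) for i in range(n)]
        tt = sum(v * v for v in t)
        # Loading p = E^T t / ||t||^2 and coefficient r = f^T t / ||t||^2 (Eq. 8).
        p = [sum(e * s for e, s in zip(col, t)) / tt for col in E]
        r = sum(y * s for y, s in zip(f, t)) / tt
        # Deflation: E <- E - t p^T and f <- f - t r (Eqs. 5 and 7).
        E = [[e - s * pj for e, s in zip(col, t)] for col, pj in zip(E, p)]
        f = [y - s * r for y, s in zip(f, t)]
        scores.append(t)
    return scores, f

# Toy data: y is an exact linear function of two predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [2 * a - b for a, b in zip(x1, x2)]
E0 = standardize([x1, x2])
F0 = standardize([y])[0]
scores, resid = pls1(E0, F0, n_components=2)
print(max(abs(v) for v in resid))
```

Because the toy response is exactly linear in the two predictors and two components equal the rank of $E_{0}$, the residual of the response is numerically zero, in line with Equations (9) and (10).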

A five-fold cross-validation is performed on the training set to determine the optimal number of PLSR components. As shown in Figure 8, the cross-validation mean squared error (CV-MSE) drops sharply from 2169.10 (A = 1) to 527.06 (A = 2) and further decreases to 469.67 (A = 3). Adding more components beyond three yields negligible improvement in CV-MSE and may risk overfitting. Therefore, we select A = 3 components for the PLSR model.

4.4. Data Standardization and Preprocessing Process

To prevent the BiLSTM network from becoming trapped in local optima during training and to eliminate the influence of different input units and scales, the input variables are standardized to zero mean and unit variance using Equation (12):

$$x_{ij} = \frac{x_{ij}^{'} - \bar{x}_{j}}{\sigma_{j}} \tag{12}$$

where $x_{ij}^{'}$ and $x_{ij}$ are the raw and standardized values of the $i$-th sample of the $j$-th input variable, respectively, $\bar{x}_{j}$ is the mean of the $j$-th input variable, and $\sigma_{j}$ is its standard deviation.
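Equation (12) amounts to a column-wise z-score, which can be sketched as follows. The helper name `standardize_columns` and the sample values are illustrative, and the use of the population standard deviation is an assumption, since the text does not state which estimator is used.

```python
import statistics

def standardize_columns(rows):
    """Z-score each column of a row-major table (Eq. 12)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Assumption: population standard deviation (divides by n).
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Two illustrative input variables on very different scales.
raw = [[2.0, 100.0], [4.0, 300.0], [6.0, 200.0]]
z = standardize_columns(raw)
print(z)
```

After the transformation both columns have zero mean and unit standard deviation, so neither variable dominates the other purely because of its units.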

After the input variables are standardized, correlation analysis is conducted to filter the input variables. This paper adopts the Pearson correlation coefficient to quantify the linear dependence between each input feature and the output power, calculated as in Equation (13):

$$r = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})(y_{i} - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2} \sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}} \tag{13}$$

where $r$ is the Pearson correlation coefficient between variables $x$ and $y$, $x_{i}$ and $y_{i}$ are their $i$-th samples, $n$ is the number of samples, and $\bar{x}$ and $\bar{y}$ are the sample means.

The correlation strength is interpreted as follows. When $|r| \leq 0.3$, no linear correlation is considered to exist. When $0.3 < |r| \leq 0.5$, a weak linear association is present. When $0.5 < |r| \leq 0.8$, the linear relationship is considered significant. When $|r| > 0.8$, a strong linear relationship can be inferred.
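Equation (13) and the interpretation thresholds can be combined in a short sketch. The function names `pearson_r` and `strength` and the sample values are illustrative assumptions; the thresholds are those stated above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient (Eq. 13)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Map |r| to the qualitative categories used in the text."""
    a = abs(r)
    if a <= 0.3:
        return "none"
    if a <= 0.5:
        return "weak"
    if a <= 0.8:
        return "significant"
    return "strong"

# Illustrative values: power is an exact multiple of wind speed here.
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0]
power = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson_r(wind_speed, power)
print(r, strength(r))
```

In an input-selection step, variables whose category falls in the "none" range would be the candidates for exclusion.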

As shown in Table 3, temperature and humidity exhibit almost no linear correlation with output power. Therefore, they are excluded from the input variables used for linear regression. To better retain the unprocessed data information and lessen the disruption caused by noise in time-series data, raw data preprocessing is required before constructing the forecasting model. The detailed preprocessing procedure is presented in Figure 9.

The Spearman correlation and NMI confirm that wind speed maintains the strongest monotonic and mutual-information relationship with power, while temperature and humidity remain weakly correlated. Therefore, the same three variables (wind speed, direction, pressure) are retained as inputs for the PLSR component, consistent with the linear analysis.

4.5. Predictive Model Building

Based on the above data standardization, correlation analysis, and noise preprocessing, the predictive model is constructed using the processed dataset. Figure 10 illustrates the complete workflow of our advanced BiLSTM forecasting model.

First of all, outage data and missing data are removed from the raw dataset. DBSCAN is then used to identify noisy samples, and the detected abnormal samples are corrected by linear imputation. After that, correlation analysis is performed to select the input variables, so that noise interference can be reduced while the temporal characteristics of the original data are preserved as much as possible.

Next, the PLSR model serves to generate the preliminary prediction of wind-storage power. On this basis, the BiLSTM network is employed to forecast the residual error, with the aim of further optimizing prediction precision. Meanwhile, the PSO serves to select the optimal hyperparameters for the BiLSTM network.

The overall procedure of the proposed PLSR–PSO–BiLSTM forecasting framework is summarized in Algorithm 1:

Algorithm 1: PLSR–PSO–BiLSTM wind power forecasting method.

Input:
        Wind power dataset D
        Population size N
        Maximum iterations T
        Learning rate range $[lr_{\min}, lr_{\max}]$
        Hidden units range H
        MiniBatchSize range B
Output:
        Predicted wind power sequence $\hat{Y}$

1: Normalize dataset D using Min-Max normalization
2: Divide dataset into training set and testing set
3: Construct PLSR model using meteorological variables
4: Obtain preliminary prediction sequence $Y_{fit}$
5: Calculate residual sequence: $R = Y - Y_{fit}$
6: Construct BiLSTM input sequences: $X = [V_{w}, D_{w}, T, Y_{fit}]$
7: Initialize PSO population: $X_{i} = [Batch_{i}, LR_{i}, Hidden_{i}]$
8: for t = 1 to T do
9:         for each particle $X_{i}$ do
10:                 Build BiLSTM model using:
                                 MiniBatchSize = Batch
                                 LearningRate = LR
                                 HiddenUnits = Hidden
11:                 Train BiLSTM model
12:                 Predict residual sequence $\hat{R}$
13:                 Recover final prediction: $\hat{Y} = Y_{fit} + \hat{R}$
14:                 Compute fitness value: $Fitness(X_{i}) = -R^{2}$
15:                 Update personal best position $P_{best}$
16:                 Update global best position $G_{best}$
17:         end for
18:         for each particle $X_{i}$ do
19:                 Update inertia weight: $w = w_{max} - \frac{(w_{max} - w_{min}) t}{T}$
20:                 Update particle velocity
21:                 Update particle position
22:                 Apply boundary constraints
23:         end for
24: end for
25: Obtain optimal parameters: $Batch^{*}, LR^{*}, Hidden^{*}$
26: Rebuild optimized BiLSTM model
27: Train final forecasting model
28: Predict wind power on testing set
29: Compute evaluation metrics: $NMAE, NRMSE, R^{2}$
30: return forecasting results
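The PSO loop in steps 8 to 24 of Algorithm 1 can be sketched in plain Python as follows. This is a hedged illustration rather than the authors' implementation: the function `pso_minimize` and the objective `toy_loss` are hypothetical, and `toy_loss` is a cheap stand-in for "train a BiLSTM and return $-R^{2}$". The default population size, iteration count, inertia weight range, and acceleration coefficients follow the values listed in Table 5, while the search bounds are made up for the example.

```python
import random


def pso_minimize(objective, bounds, n_particles=5, n_iters=15,
                 w_max=1.2, w_min=0.8, c1=2.5, c2=2.0, seed=0):
    """Minimize `objective` over box `bounds` with a basic PSO loop."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [float("inf")] * n_particles
    gbest, gbest_fit = pos[0][:], float("inf")
    history = []
    for t in range(1, n_iters + 1):
        # Evaluate every particle, then refresh personal and global bests.
        for i in range(n_particles):
            fit = objective(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
            if fit < gbest_fit:
                gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
        # Linearly decreasing inertia weight (step 19 of Algorithm 1).
        w = w_max - (w_max - w_min) * t / n_iters
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Boundary constraint: clip the new position to the box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
    return gbest, gbest_fit, history


def toy_loss(params):
    """Hypothetical stand-in for 'train BiLSTM and return -R^2'."""
    return sum((p - 1.0) ** 2 for p in params)


best, best_fit, history = pso_minimize(toy_loss, [(-5.0, 5.0)] * 3)
```

The `history` list records the best fitness found so far in each generation, which is the quantity a convergence curve would plot; by construction it never increases.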

Specifically, the residual series is obtained by subtracting the preliminary PLSR prediction from the actual wind-storage power in the training set. The selected meteorological variables, together with the linear prediction component, are then used as inputs to the BiLSTM model. After PSO-based hyperparameter optimization, the BiLSTM model is trained to predict the residual error. Ultimately, the preliminary PLSR result is corrected by the predicted residual to obtain the final wind-storage power prediction.
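The residual-correction idea can be illustrated with a toy Python sketch. Two simplifications are assumed and should not be read as the paper's method: a one-variable least-squares line stands in for the PLSR stage, and a two-nearest-neighbour average of training residuals stands in for the BiLSTM stage. The data-generating function `target` is also hypothetical. Only the structure (linear fit, residual $R = Y - Y_{fit}$, corrected forecast $Y_{fit} + \hat{R}$) mirrors the text.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def knn_residual(x_train, r_train, x_query, k=2):
    """Average the residuals of the k training points nearest to x_query."""
    nearest = sorted(range(len(x_train)),
                     key=lambda i: abs(x_train[i] - x_query))[:k]
    return sum(r_train[i] for i in nearest) / k


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def target(x):
    """Hypothetical nonlinear relationship between input and output."""
    return 2.0 * x + 0.5 * x * x


x_train = [float(i) for i in range(11)]
y_train = [target(x) for x in x_train]
x_test = [i + 0.5 for i in range(10)]
y_test = [target(x) for x in x_test]

# Stage 1: linear preliminary prediction (stand-in for PLSR).
intercept, slope = fit_line(x_train, y_train)
fit_train = [intercept + slope * x for x in x_train]
fit_test = [intercept + slope * x for x in x_test]

# Stage 2: model the training residuals R = Y - Y_fit (stand-in for BiLSTM).
residuals = [y - f for y, f in zip(y_train, fit_train)]
r_hat = [knn_residual(x_train, residuals, x) for x in x_test]

# Stage 3: corrected forecast = preliminary prediction + predicted residual.
y_hat = [f + r for f, r in zip(fit_test, r_hat)]

mae_linear = mae(y_test, fit_test)
mae_corrected = mae(y_test, y_hat)
```

On this toy problem the linear stage captures the trend and the residual stage recovers most of the curvature, so `mae_corrected` is much smaller than `mae_linear`.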

By combining data preprocessing, preliminary prediction, residual correction, and hyperparameter optimization within a unified framework, the proposed method achieves good robustness and prediction accuracy.

4.6. Prediction Evaluation

Prediction performance is evaluated with two indices: the NRMSE, which measures the overall dispersion of the prediction error, and the NMAE, which measures the mean absolute deviation between the forecast and actual values. They are defined in Equations (14) and (15):

NRMSE = \frac{\sqrt{\frac{1}{N} \sum_{i=1}^{N} (p_{i} - p_{i}^{'})^{2}}}{P_{cap}} \times 100\%

(14)

NMAE = \frac{1}{N} \sum_{i=1}^{N} \frac{|p_{i} - p_{i}^{'}|}{P_{cap}} \times 100\%

(15)

Here, $P_{cap}$ represents the total rated capacity of the wind farm, $p_{i}^{'}$ denotes the predicted power at the $i$-th time index, $p_{i}$ denotes the measured power at the $i$-th time index, and $N$ denotes the number of prediction points in the evaluation period.
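Equations (14) and (15) translate into a few lines of Python. The sketch below uses hypothetical function names (`nrmse`, `nmae`) and made-up power values; the capacity of 201.5 MW is the value given in Table 1.

```python
import math


def nrmse(measured, predicted, capacity):
    """Normalized root mean square error in percent, Equation (14)."""
    n = len(measured)
    mse = sum((p - f) ** 2 for p, f in zip(measured, predicted)) / n
    return math.sqrt(mse) / capacity * 100.0


def nmae(measured, predicted, capacity):
    """Normalized mean absolute error in percent, Equation (15)."""
    n = len(measured)
    return sum(abs(p - f) for p, f in zip(measured, predicted)) / n / capacity * 100.0


P_CAP = 201.5  # MW, total rated capacity from Table 1

# Hypothetical measured and predicted net power values (MW).
measured = [120.0, 95.5, 60.2, 150.8]
predicted = [112.3, 101.0, 58.9, 139.4]
print(f"NRMSE = {nrmse(measured, predicted, P_CAP):.2f}%")
print(f"NMAE  = {nmae(measured, predicted, P_CAP):.2f}%")
```

Because both indices are normalized by rated capacity rather than by the measured value, they stay well defined when the measured power is close to zero.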

5. Case Analysis

5.1. Simulation

The data were collected from 1 September 2020 to 30 September 2020, with a 15 min sampling period. The wind-storage farm comprises 25 wind turbines rated at 2 MW and 101 turbines rated at 1.5 MW (201.5 MW in total), coupled with a lithium-ion battery energy storage system of 48 MWh/24 MW. The net power output (wind + storage) is measured at the point of common coupling. The battery operates under a multi-function control strategy that (i) charges when wind power exceeds a threshold and discharges during peak hours (peak shaving), (ii) smooths short-term wind power fluctuations via ramp-rate limiting, and (iii) tracks the dispatched power schedule to meet grid requirements. The net power time series thus inherently reflects storage actions; no separate battery state data are used as input. The wind speed, wind direction, and atmospheric pressure are selected as the input variables. After data preprocessing, a total of 2396 valid samples are retained. The corresponding data trends are shown in Figure 11.

5.2. Simulation Comparison

To verify the effectiveness of the DBSCAN-based noise identification method, prediction models based on BP and BiLSTM are first constructed using the datasets before and after data preprocessing. The input variables include wind speed, wind direction, and atmospheric pressure. Figure 12 illustrates the scatter distribution of the prediction results before and after preprocessing, and Table 4 presents the corresponding evaluation indices.

As shown in Figure 12 and Table 4, compared with the models trained on the raw data, both models trained on the preprocessed data achieve better prediction performance. In addition, the scatter distribution between the observed and the forecasted values is closer to the ideal line y = x. These outcomes verify the effectiveness of the proposed noise identification and preprocessing procedure.

To evaluate the proposed model, the prediction results of BP, standard LSTM, Wavelet-LSTM, and the proposed advanced BiLSTM are compared under identical data preprocessing and data split settings. The dataset is split chronologically: the first 2252 samples (approximately 94%) are used for training, and the remaining 144 samples (corresponding to 36 h, given the 15 min sampling interval) are used for testing. No separate validation set is created; instead, early stopping with a patience of 20 epochs is applied based on the training loss to prevent overfitting.
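The chronological split described above can be sketched as follows. The helper name `chronological_split` is hypothetical, and integer indices stand in for the 2396 preprocessed samples.

```python
def chronological_split(samples, n_test):
    """Keep time order: the last n_test samples form the test set."""
    if not 0 < n_test < len(samples):
        raise ValueError("n_test must be between 1 and len(samples) - 1")
    return samples[:-n_test], samples[-n_test:]


samples = list(range(2396))          # placeholder for the time-ordered samples
train, test = chronological_split(samples, n_test=144)

train_share = len(train) / len(samples)   # fraction used for training
test_hours = len(test) * 15 / 60          # 15 min sampling interval
```

With these numbers the training set holds 2252 samples (about 94%) and the test set covers 36 h, matching the figures quoted in the text. No shuffling is applied, so every test sample lies after every training sample in time.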

The hyperparameters of each model are set as follows. For the BP neural network, the hidden layer contains 11 neurons, the learning rate is 0.01, and the training epoch is 150. For the standard LSTM model, the network has one hidden layer with 10 neurons and a learning rate of 0.01 and is trained for 150 epochs using the Adam optimizer. For the proposed advanced BiLSTM model, wind speed, wind direction, and atmospheric pressure are used as the inputs of the PLSR component. After optimization by the PSO algorithm, the BiLSTM network contains one hidden layer with 63 neurons and a learning rate of 0.0114 and is trained for 150 epochs with the Adam optimizer. The BiLSTM adopts a recursive multi-step forecasting strategy without a fixed-length sliding window; it utilizes the entire past sequence and updates its internal state step by step. The activation functions are tanh for the state activation and sigmoid for the gate activations inside the BiLSTM layer, while the output fully connected layer uses linear activation (no activation function). All models are trained with the same chronological split and early stopping policy to ensure a fair comparison.

The algorithm parameters used in this study are listed in Table 5:

Figures 13–15 show the prediction curves of the four methods (BP, LSTM, Wavelet-LSTM, and the proposed Advanced BiLSTM) for the 12 h, 24 h, and 36 h prediction horizons, and Table 6 presents the corresponding quantitative evaluation results. In comparison with the BP and conventional LSTM models, the proposed Advanced BiLSTM model exhibits improved prediction accuracy at all considered horizons. According to both evaluation indices, the predictions of the proposed model are more consistent with the measured data. These results demonstrate the performance and robustness of the proposed Advanced BiLSTM model for wind-storage power prediction under noisy data conditions.

To provide a more comprehensive benchmark, we additionally implement a wavelet-based hybrid model (Wavelet-LSTM) using a three-level discrete wavelet transform (DWT) with the db4 wavelet. The original wind power series is decomposed into four sub-series (A3, D3, D2, D1). Each sub-series is predicted independently by an LSTM with the same hyperparameters as the standard LSTM, and the predictions are summed to obtain the final forecast. Strict out-of-sample decomposition is applied to avoid information leakage.

The inferior performance of Wavelet-LSTM can be attributed to two inherent limitations of decomposition-based forecasting methods. First, the boundary effect occurs because wavelet decomposition uses a finite-length signal; at the test set boundary, the decomposed sub-series are distorted, leading to inaccurate predictions. Second, error accumulation arises because the high-frequency components (details) are close to random noise and difficult to predict recursively. Any small error in high-frequency predictions propagates through the summation step and contaminates the final forecast. Recent critical reviews [35] have pointed out that, when wavelet decomposition is applied in a strict out-of-sample rolling prediction setting, the performance often degrades sharply compared to in-sample or offline settings. In contrast, the proposed PLSR-BiLSTM residual correction framework avoids direct prediction of high-frequency components, thereby achieving much better robustness and accuracy.

5.3. Ablation Study on PSO Parameters

To verify the rationality of the key hyperparameters of the particle swarm optimization (PSO) algorithm used in this work and to avoid arbitrary parameter specification, we conducted a systematic ablation study on four core parameters: population size, number of iterations, inertia weight range, and acceleration coefficients ($c_{1}$, $c_{2}$). For each candidate parameter value, we independently repeated the complete PSO search and BiLSTM training procedure three times and recorded the normalized root mean square error (NRMSE) on the 24 h prediction task. The mean and standard deviation of the NRMSE over the three runs for each setting are summarized in Table 7.

As shown in Table 7, when the population size increases from three to five, the mean NRMSE changes only slightly (8.87% vs. 8.90%). However, further increasing the population size to 10 or 15 leads to a noticeable deterioration in prediction accuracy (mean NRMSE rises to 10.07% and 10.74%, respectively), and the standard deviation expands significantly from 0.66% to 2.65%. This trend suggests that, under a limited number of iterations, an excessively large population size, although theoretically increasing population diversity, may prevent the swarm from converging sufficiently and introduce additional random fluctuations. Therefore, a population size of five offers a good balance between solution quality and search stability.

Regarding the number of iterations, increasing the iterations from 10 to 15 reduces the mean NRMSE from 9.66% to 8.90%, a substantial improvement. At 20 iterations the NRMSE slightly increases to 9.29%, while 30 iterations achieve the lowest mean NRMSE of 8.60%. However, as can be seen from the convergence analysis below, the PSO algorithm reaches its best solution within the early iterations, and the gain beyond 15 iterations is small while the computational cost increases proportionally. Hence, 15 iterations are chosen as the most cost-effective setting.

For the inertia weight range, the classic setting [0.8, 1.2] yields a mean NRMSE of 8.90%, which is very close to the best performing range [0.4, 0.9] (8.73%). The difference is well within one standard deviation and is not statistically significant. Given that [0.8, 1.2] is the most widely used linear decreasing inertia weight scheme in PSO literature, offering better interpretability and reproducibility, we retain this setting.

Among the tested ($c_{1}$, $c_{2}$) pairs, the combination $c_{1} = 2.5$, $c_{2} = 2.0$ achieves the lowest mean NRMSE (8.90%), outperforming the other three configurations. This indicates that moderately increasing the cognitive component ($c_{1}$) helps particles more thoroughly explore their own historical best neighborhoods, leading to a better solution in the BiLSTM hyperparameter search space.

In summary, the PSO hyperparameter configuration adopted in this paper is not arbitrary but has been validated through systematic ablation experiments. The chosen parameters either are optimal in terms of accuracy or represent a reasonable trade-off between performance and computational cost, providing a reliable optimization foundation for the construction of the forecasting model.

To further illustrate the search dynamics of the PSO algorithm with the selected parameters (population size = 5, number of iterations = 15, inertia weight range = [0.8, 1.2], $c_{1} = 2.5$, $c_{2} = 2.0$), we present a typical convergence curve in Figure 16, which records the evolution of the best NRMSE found by the swarm in each generation.

As shown in Figure 16, the global best NRMSE is 7.59% in the first generation, drops sharply to 7.01% in the second generation, and then remains at that level until the 15th generation, with the curve becoming flat. This behavior indicates that the PSO algorithm quickly locates a high-quality search region and converges to a competitive hyperparameter combination within the first few iterations. It also confirms that 15 iterations are sufficient to ensure convergence; further iterations would not bring meaningful improvement, but would increase computational time.

Together, the convergence curve and the ablation results demonstrate that the PSO parameter settings used in this paper neither waste computational resources nor risk premature termination that would miss a better solution. The algorithm reliably and efficiently provides competitive hyperparameters for the BiLSTM network.

5.4. Statistical Significance Analysis

To verify that the observed performance improvements are not due to random chance, two complementary analyses are conducted.

For each pair of models, the Diebold–Mariano (DM) test is applied to the 24 h ahead forecast error sequences (96 time points) [36]. The null hypothesis is that the two models have equal predictive accuracy. Table 8 reports the DM statistic and the corresponding $p$-value. All $p$-values are below 0.05, strongly rejecting the null hypothesis and confirming that the proposed Advanced BiLSTM significantly outperforms BP, LSTM, and the Wavelet-LSTM baseline.
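A basic form of the DM statistic can be sketched in Python as below. Several choices are assumptions, since the text does not specify them: squared-error loss, a rectangular-kernel long-run variance with `h - 1` autocovariance lags, a normal approximation for the two-sided $p$-value, and the function name `diebold_mariano`. The error sequences are hypothetical.

```python
import math


def diebold_mariano(errors_a, errors_b, h=1):
    """DM statistic and two-sided p-value for squared-error loss.

    A positive statistic means model A has the larger loss.
    """
    n = len(errors_a)
    # Loss differential d_t = e_a(t)^2 - e_b(t)^2.
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    d_bar = sum(d) / n

    def autocov(lag):
        return sum((d[t] - d_bar) * (d[t - lag] - d_bar)
                   for t in range(lag, n)) / n

    # Long-run variance of d_t using lags 0 .. h-1 (assumed estimator).
    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if long_run_var <= 0:
        raise ValueError("non-positive variance estimate")
    dm = d_bar / math.sqrt(long_run_var / n)
    # Two-sided p-value under the standard normal approximation.
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return dm, p_value


# Hypothetical forecast errors of a baseline (a) and a better model (b).
errors_a = [1.0, -2.0, 1.5, -1.0, 2.5, -1.5, 2.0, -2.5]
errors_b = [0.5, -0.5, 0.2, -0.4, 0.3, -0.1, 0.6, -0.2]
dm_stat, p_val = diebold_mariano(errors_a, errors_b)
```

For multi-step forecasts, larger values of `h` add autocovariance terms to account for serially correlated loss differentials; small-sample corrections are omitted in this sketch.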

To provide a non-parametric complement to the Diebold–Mariano test, we applied the Wilcoxon signed-rank test [37] to the 24 h NRMSE values obtained from the 10 independent runs. This test does not assume normality of the performance metrics and is suitable for small-sample paired comparisons. The results are summarized in Table 9.
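For a sample as small as 10 paired runs, the signed-rank test can be evaluated exactly by enumerating all sign patterns. The sketch below is an illustration under stated assumptions: the function name `wilcoxon_signed_rank_exact` is hypothetical, zero differences are dropped, tied magnitudes receive average ranks, and the paired NRMSE values are invented for the example.

```python
from itertools import product


def wilcoxon_signed_rank_exact(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Group tied magnitudes and assign them the average rank.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern is equally likely.
    count = 0
    for signs in product((0, 1), repeat=n):
        plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(plus, total - plus) <= w:
            count += 1
    return w, count / 2 ** n


# Hypothetical 24 h NRMSE values (%) over 10 paired runs.
proposed = [7.1, 8.4, 9.0, 6.9, 10.2, 8.8, 9.5, 7.7, 11.3, 8.6]
baseline = [9.6, 9.7, 9.4, 9.9, 10.3, 9.3, 9.8, 9.2, 11.5, 9.65]
w_stat, p_exact = wilcoxon_signed_rank_exact(proposed, baseline)
```

When all 10 differences share the same sign, as in this example, the exact two-sided $p$-value is 2/1024 ≈ 0.0020, which is the smallest value attainable with 10 pairs.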

As shown in Table 9, the proposed Advanced BiLSTM significantly outperforms BP (p = 0.0254), LSTM (p = 0.0195), and Wavelet-LSTM (p = 0.0020) at the 0.05 significance level. These results are consistent with the Diebold–Mariano test (Table 8), confirming the statistical reliability of the proposed model’s superiority over all baselines.

The performance of deep learning models can be affected by random initialization. Therefore, the proposed Advanced BiLSTM is trained 10 independent times using the same hyperparameters and data split, with only the random seed changed. Table 10 summarizes the 24 h NRMSE over these 10 runs. The mean NRMSE is 8.96% with a standard deviation of 1.40%, and the 95% interval, computed as the mean ± 1.96 standard deviations, is [6.22%, 11.70%]. The best single run achieves 6.93% NRMSE, which is 2.58 and 5.74 percentage points lower than BP (9.51%) and LSTM (12.67%), respectively. Even the worst run (11.31%) remains below the NRMSE of the standard LSTM. These results indicate that the superiority of the proposed model is statistically reliable and not due to a lucky initialization.
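The quoted interval can be reproduced from the reported mean and standard deviation, as the short Python check below shows. The second interval, for the mean itself under a normal approximation, is added here only for comparison and is an assumption of this sketch, not a result from the paper.

```python
import math

mean_nrmse, sd_nrmse, n_runs = 8.96, 1.40, 10  # values quoted in the text

# Spread of individual runs: mean +/- 1.96 standard deviations.
run_interval = (mean_nrmse - 1.96 * sd_nrmse, mean_nrmse + 1.96 * sd_nrmse)

# Interval for the mean itself (normal approximation, an assumption here).
standard_error = sd_nrmse / math.sqrt(n_runs)
mean_interval = (mean_nrmse - 1.96 * standard_error,
                 mean_nrmse + 1.96 * standard_error)

print(f"run spread:        [{run_interval[0]:.2f}%, {run_interval[1]:.2f}%]")
print(f"interval for mean: [{mean_interval[0]:.2f}%, {mean_interval[1]:.2f}%]")
```

The first interval matches [6.22%, 11.70%] and describes where a single run is expected to fall; the second is narrower because it describes the uncertainty of the 10-run average.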

6. Conclusions

In this work, an advanced BiLSTM prediction model is developed to reduce the influence of data outliers and improve wind-storage power prediction accuracy using measured wind-storage farm data. First, the DBSCAN parameters (ε = 0.9, MinPts = 2.5) are systematically determined via a k-distance plot, DBSCAN is employed to identify abnormal samples in the historical dataset, and linear regression is used to correct the detected noise points. Then, the PLSR model, with the number of components set to A = 3 based on five-fold cross-validation, is used to generate the preliminary wind-storage power prediction. In addition, the BiLSTM network predicts the residual error, which mainly reflects the nonlinear characteristics not captured by the preliminary prediction. Finally, the preliminary PLSR result is corrected by the predicted residual to obtain the final wind-storage power forecast. The proposed method is compared not only with BP and standard LSTM, but also with a Wavelet-LSTM baseline with strict out-of-sample decomposition; the proposed model significantly outperforms all baselines. Although the model is validated on wind-storage hybrid data, it is generic and applicable to pure wind power or other renewable-plus-storage systems, provided that the net power time series is used as the target variable.

Comparative simulation results show that, for the 24 h prediction horizon, the proposed method improves prediction accuracy by 2.29%, 11.47%, and 5.54% compared with the BP, Wavelet-LSTM, and standard LSTM models, respectively. Furthermore, the Diebold–Mariano test and 10-independent-run confidence intervals confirm the statistical significance of the improvements. The obtained results reflect that the proposed advanced BiLSTM prediction model is valuable for wind-storage power prediction in noisy data environments.

Author Contributions

Methodology, M.L., Y.Z. and Z.L.; validation, G.W. and J.Y.; formal analysis, Y.L.; investigation, C.L.; writing—original draft, Z.L. and M.L.; writing—review and editing, C.Z.; supervision, J.Y.; project administration, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China under Grant 2023YFB4203200. The authors greatly appreciate the financial support of the Natural Science Basic Research Program of Shaanxi (2024JC-YBMS-434, 2025JC-YBMS-504) and the Key Research and Development Program of Xianyang City (S2025-ZDYF-GDZB-4806).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Chao Luo, Jiawei Yu, and Yihua Zhu were employed by the company China Southern Power Grid. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. An, G.; Jiang, Z.; Cao, X.; Liang, Y.; Zhao, Y.; Li, Z.; Dong, W.; Sun, H. Short-term wind power prediction based on particle swarm optimization-extreme learning machine model combined with AdaBoost algorithm. IEEE Access 2021, 9, 94040–94052.
2. Zhang, G.; Liu, H.; Zhang, J.; Yan, Y.; Zhang, L.; Wu, C.; Hua, X.; Wang, Y. Wind power prediction based on variational mode decomposition multi-frequency combinations. J. Mod. Power Syst. Clean Energy 2019, 7, 281–288.
3. Xu, G.; Tang, Z.; Han, H.; Bu, B. Prediction of short-term wind power based on ESN improved by VMD. In Proceedings of the 2021 33rd Chinese Control and Decision Conference (CCDC), Kunming, China, 22–24 May 2021; pp. 674–678.
4. Li, C.; Tang, G.; Xue, X.; Chen, X.; Wang, R.; Zhang, C. The short-term interval prediction of wind power using the deep learning model with gradient descend optimization. Renew. Energy 2020, 155, 197–211.
5. Li, H. SCADA data based wind power interval prediction using LUBE-based deep residual networks. Front. Energy Res. 2022, 10, 920837.
6. Naik, J.; Dash, P.; Dhar, S. A multi-objective wind speed and wind power prediction interval forecasting using variational modes decomposition based multi-kernel robust ridge regression. Renew. Energy 2019, 136, 701–731.
7. Ouyang, T.; Huang, H.; He, Y.; Tang, Z. Chaotic wind power time series prediction via switching data-driven modes. Renew. Energy 2020, 145, 270–281.
8. Ding, Y.; Chen, Z.; Zhang, H.; Wang, X.; Guo, Y. A short-term wind power prediction model based on CEEMD and WOA-KELM. Renew. Energy 2022, 189, 188–198.
9. Bokde, N.; Feijóo, A.; Villanueva, D.; Kulat, K. A review on hybrid empirical mode decomposition models for wind speed and wind power prediction. Energies 2019, 12, 254.
10. Wang, H.; Han, S.; Liu, Y.; Yan, J.; Li, L. Sequence transfer correction algorithm for numerical weather prediction wind speed and its application in a wind power prediction system. Appl. Energy 2019, 237, 1–10.
11. Zhang, Y.; Sun, H.; Guo, Y. Wind power prediction based on PSO-SVR and grey combination model. IEEE Access 2019, 7, 136254–136267.
12. Donadio, L.; Fang, J.; Porté-Agel, F. Numerical weather prediction and artificial neural network coupling for wind energy forecast. Energies 2021, 14, 338.
13. Liu, W.; Jia, L. Wind power prediction based on the stacking model of XGBoost and random forest. In Proceedings of the 2022 IEEE 11th Data Driven Control and Learning Systems Conference (DDCLS), 2022; IEEE: New York, NY, USA, 2022; pp. 1118–1123.
14. Bouyeddou, B.; Harrou, F.; Saidi, A.; Sun, Y. An effective wind power prediction using latent regression models. In Proceedings of the 2021 International Conference on ICT for Smart Society (ICISS), 2021; IEEE: New York, NY, USA, 2021; pp. 1–6.
15. Yang, Z.; Peng, X.; Song, J.; Duan, R.; Jiang, Y.; Liu, S. Short-term wind power prediction based on multi-parameters similarity wind process matching and weighed-voting-based deep learning model selection. IEEE Trans. Power Syst. 2024, 39, 2129–2142.
16. Deng, W.; Dai, Z.; Chen, R.; Wang, H.; Lu, S.; Li, C.; Zhou, B. Wind power interval prediction based on CGAN and KELM under extreme weather scenarios. In Proceedings of the 2023 IEEE/IAS Industrial and Commercial Power System Asia (I&CPS Asia), Chongqing, China, 7–9 July 2023; pp. 2341–2346.
17. Chen, B.; Ma, Z.; Zhou, Q. Short-term wind power prediction based on BP neural network improved by t-tent-SSA algorithm. In Proceedings of the 2022 7th International Conference on Power and Renewable Energy (ICPRE), 2022; IEEE: New York, NY, USA, 2022; pp. 844–848.
18. Chen, G.; Shan, J.; Li, D.Y.; Wang, C.; Li, C.; Zhou, Z.; Wang, X.; Li, Z.; Hao, J.J. Research on wind power prediction method based on convolutional neural network and genetic algorithm. In Proceedings of the 2019 IEEE Innovative Smart Grid Technologies-Asia (ISGT Asia), 2019; IEEE: New York, NY, USA, 2019; pp. 3573–3578.
19. Deng, W.; Dai, Z.; Liu, X.; Chen, R.; Wang, H.; Zhou, B.; Tian, W.; Lu, S.; Zhang, X. Short-term wind power prediction based on wind speed interval division and TimeGAN for gale weather. In Proceedings of the 2023 International Conference on Power Energy Systems and Applications (ICoPESA), 2023; IEEE: New York, NY, USA, 2023; pp. 352–357.
20. Hu, S.; Wang, Z.; Hu, N. Short-term wind power prediction based on whale algorithm optimization and attention mechanism of BiLSTM neural network. In Proceedings of the 2023 IEEE 5th International Conference on Power, Intelligent Computing and Systems (ICPICS), 2023; IEEE: New York, NY, USA, 2023; pp. 915–920.
21. Xiang, L.; Liu, J.; Yang, X.; Hu, A.; Su, H. Ultra-short term wind power prediction applying a novel model named SATCN-LSTM. Energy Convers. Manag. 2022, 252, 115036.
22. Shahid, F.; Zameer, A.; Mehmood, A.; Raja, M.A.Z. A novel wavenets long short term memory paradigm for wind power prediction. Appl. Energy 2020, 269, 115098.
23. Houran, M.A.; Bukhari, S.M.S.; Zafar, M.H.; Mansoor, M.; Chen, W. COA-CNN-LSTM: Coati optimization algorithm-based hybrid deep learning model for PV/wind power prediction in smart grid applications. Appl. Energy 2023, 349, 121638.
24. Kari, T.; Guoliang, S.; Kesong, L.; Xiaojing, M.; Xian, W. Short-Term Wind Power Prediction Based on Combinatorial Neural Networks. Intell. Autom. Soft Comput. 2023, 37, 1437–1452.
25. Hanifi, S.; Liu, X.; Lin, Z.; Lotfian, S. A Critical Review of Wind Power Forecasting Methods—Past, Present and Future. Energies 2020, 13, 3764.
26. Wang, S.; Li, B.; Li, G.; Yao, B.; Wu, J. Short-term wind power prediction based on multidimensional data cleaning and feature reconfiguration. Appl. Energy 2021, 292, 116851.
27. Liu, Y.; Guan, L.; Hou, C.; Han, H.; Liu, Z.; Sun, Y.; Zheng, M. Wind power short-term prediction based on LSTM and discrete wavelet transform. Appl. Sci. 2019, 9, 1108.
28. Zhou, B.; Ma, X.; Luo, Y.; Yang, D. Wind power prediction based on LSTM networks and nonparametric kernel density estimation. IEEE Access 2019, 7, 165279–165292.
29. Chen, Y.; Yu, S.; Islam, S.; Lim, C.P.; Muyeen, S.M. Decomposition-based wind power forecasting models and their boundary issue: An in-depth review and comprehensive discussion on potential solutions. Energy Rep. 2022, 8, 8805–8820.
30. Pawlik, L. Evaluating Reconstruction-Based and Proximity-Based Methods: A Four-Way Comparison (AE, LSTM-AE, OCSVM, IF) in SCADA Anomaly Detection Under Inverted Imbalance. Future Internet 2026, 18, 96.
31. Ibrahim, I.Y. An Integrated Python Framework for Wind Turbine Predictive Maintenance: LSTM-Autoencoder Anomaly Detection Benchmarked on the CARE-to-Compare Dataset with Digital Twin Power-Curve Modelling and Monte Carlo Uncertainty Quantification (1.0). Zenodo 2026.
32. Mehmood, Z.; Wang, Z. Hybrid iForest-DBSCAN for anomaly detection and wind power curve modelling. Expert Syst. Appl. 2025, 289, 128381.
33. Juban, R.; Ohlsson, H.; Maasoumy, M.; Poirier, L.; Kolter, J.Z. A multiple quantile regression approach to the wind, solar, and price tracks of GEFCom2014. Int. J. Forecast. 2016, 32, 1094–1102.
34. Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
35. Cheng, R.; Yang, D.; Liu, D.; Zhang, G. A reconstruction-based secondary decomposition-ensemble framework for wind power forecasting. Energy 2024, 308, 132895.
36. Diebold, F.X.; Mariano, R.S. Comparing Predictive Accuracy. J. Bus. Econ. Stat. 1995, 13, 253–263.
37. Shahid, F.; Zameer, A.; Muneeb, M. A novel genetic LSTM model for wind power forecast. Energy 2021, 223, 120069.

Figure 1. Wind speed vs. net power distribution. Blue dots represent normal operating data; red circles highlight malfunction data; the arrow points to outage data.

Figure 2. Schematic diagram of DBSCAN principle. Red balls: core points; yellow balls: border points; blue balls: noise points. The outer circle (dashed) denotes the ε-neighborhood of a core point; arrows indicate the density-reachable connections from a core point to points inside its ε-neighborhood.

Figure 3. Basic steps of DBSCAN.

Figure 4. Noise identification with different parameters: (a) ε = 0.9, MinPts = 2.5; (b) ε = 0.9, MinPts = 5; (c) ε = 1.2, MinPts = 2.5.

Figure 5. K-distance.

Figure 6. Schematic diagram of noise correction results.

Figure 7. BiLSTM memory module structure. The arrows indicate the direction of information flow: rightward arrows represent the forward pass, and leftward arrows represent the backward pass. The circles (or rectangles) denote the memory cells at different time steps. The labels W1–W6 are the trainable weight matrices associated with the input, forget, and output gates and the candidate cell. The horizontal lines at the top and bottom represent the cell state flow through the module. Blue components correspond to forward hidden states; orange components correspond to backward hidden states.

Figure 8. Five-fold cross-validation MSE for PLSR component selection.

Figure 9. Flow chart of original data preprocessing.

Figure 10. Framework of the proposed forecasting model.

Figure 11. Data trend chart after processing: (a) wind power, (b) wind speed, (c) wind direction, (d) pressure.

Figure 12. Comparison of noise identification effects using different methods: (a) before data preprocessing, (b) after data preprocessing.

Figure 13. Predicted 12 h wind-storage power.

Figure 14. Predicted 24 h wind-storage power.

Figure 15. Predicted 36 h wind-storage power.

Figure 16. Convergence curve of the PSO algorithm with the selected parameters.

Table 1. Nomenclature—variables and parameters used in this study.

Symbol	Description	Unit
$P_{net}$	Net power output of wind-storage hybrid system	MW
$P_{wind}$	Wind power	MW
$P_{storage}$	Battery storage power (positive: discharging, negative: charging)	MW
$P_{cap}$	Total rated capacity of the wind farm (201.5 MW)	MW
$v_{i}^{t}$	Velocity of particle i at iteration t	—
$x_{i}^{t}$	Position of particle i at iteration t	—
$w$	Inertia weight in PSO	—
$c_{1}, c_{2}$	Cognitive and social acceleration coefficients	—
$r_{1}, r_{2}$	Random numbers uniformly distributed in [0, 1]	—
$p_{i}^{t}$	Personal best position of particle i	—
$g^{t}$	Global best position of the swarm	—
$X, Y$	Matrices of independent and dependent variables (PLSR)	—
$t_{i}, u_{i}$	Latent components extracted by PLSR	—
$β$	Regression coefficient matrix	—
$A$	Number of PLSR components (here A = 3)	—
$ϵ$	DBSCAN neighborhood radius	—
MinPts	DBSCAN minimum number of points	—
$N$	Number of prediction points	—
$p_{i}$	Measured power at time i	MW
$p_{i}^{'}$	Predicted power at time i	MW
NRMSE	Normalized root mean square error	%
NMAE	Normalized mean absolute error	%
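
The symbols N, p_i, p_i' and P_cap in Table 1 are enough to sketch how the two error metrics could be computed. The following is a minimal sketch, assuming that both metrics are normalized by the rated capacity P_cap; the function names and the sample values are illustrative and do not come from the paper.

```python
import math

P_CAP_MW = 201.5  # total rated capacity of the wind farm, from Table 1


def nrmse_percent(measured, predicted, p_cap=P_CAP_MW):
    """Root mean square error as a percentage of p_cap (assumed normalization)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("series must be non-empty and of equal length")
    n = len(measured)
    mse = sum((p - q) ** 2 for p, q in zip(measured, predicted)) / n
    return 100.0 * math.sqrt(mse) / p_cap


def nmae_percent(measured, predicted, p_cap=P_CAP_MW):
    """Mean absolute error as a percentage of p_cap (assumed normalization)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("series must be non-empty and of equal length")
    n = len(measured)
    mae = sum(abs(p - q) for p, q in zip(measured, predicted)) / n
    return 100.0 * mae / p_cap


# Illustrative values in MW, not taken from the paper's dataset.
measured = [100.0, 120.0, 80.0, 60.0]
predicted = [110.0, 110.0, 90.0, 50.0]
print(nrmse_percent(measured, predicted), nmae_percent(measured, predicted))
```

Because every error in this toy series has magnitude 10 MW, the two metrics coincide; on real data NRMSE is at least as large as NMAE.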

Table 2. Classic noise data types.

Type of Noise Data	Wind Speed (m/s)	Wind-Storage Power (MW)	Data Features
Missing data	/	/	Data are missing
Malfunction data	1.68	99.368	Illogical correlation between the wind speed and power output
Outage data	14.24	0	Wind speed exceeds the cut-in speed, but power output is zero

“/” indicates that no corresponding value is available due to missing data.
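
The missing and outage rows of Table 2 can be screened with simple rules before any clustering is applied. The sketch below is one possible rule-based filter; the cut-in speed of 3.0 m/s and the function name are hypothetical, and samples that pass the rules are only marked for later DBSCAN screening, since malfunction data cannot be identified by a fixed threshold.

```python
import math

CUT_IN_SPEED_MS = 3.0  # hypothetical cut-in speed in m/s; not stated in this section


def classify_sample(wind_speed, power_mw, cut_in=CUT_IN_SPEED_MS):
    """Label a (wind speed, power) pair as missing, outage, or to be screened later."""

    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    if is_missing(wind_speed) or is_missing(power_mw):
        return "missing"
    # Outage: enough wind to generate, yet zero power output.
    if wind_speed > cut_in and power_mw == 0:
        return "outage"
    # Everything else, including malfunction data, is left for DBSCAN.
    return "screen_with_dbscan"


# The three example rows of Table 2.
rows = [(None, None), (1.68, 99.368), (14.24, 0.0)]
labels = [classify_sample(v, p) for v, p in rows]
print(labels)
```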

Table 3. Correlation analysis between meteorological variables and wind power using Pearson's correlation coefficient, Spearman's rank correlation, and normalized mutual information (NMI).

Variable	Pearson	Spearman	NMI
Wind speed	0.896	0.888	0.309
Wind direction	−0.366	−0.451	0.147
Pressure	−0.435	−0.427	0.183
Temperature	−0.061	−0.030	0.159
Humidity	0.001	0.113	0.163
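
To make the difference between the Pearson and Spearman columns concrete, the following sketch computes both coefficients in plain Python. The helper names and the cubic toy series are assumptions for illustration; the NMI column is not reproduced here because it depends on a binning choice that this section does not specify.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy monotone but nonlinear relation (power grows with the cube of wind speed).
wind_speed = [2.0, 4.0, 6.0, 8.0, 10.0]
power = [v ** 3 / 8.0 for v in wind_speed]
print(pearson(wind_speed, power), spearman(wind_speed, power))
```

For a strictly monotone relation the Spearman coefficient is exactly 1, while the Pearson coefficient stays below 1 because the relation is not linear.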

Table 4. Effect evaluation index before and after data noise preprocessing.

Model	NRMSE before preprocessing (%)	NMAE before preprocessing (%)	NRMSE after preprocessing (%)	NMAE after preprocessing (%)
BP	10.90	7.78	9.37	6.72
BiLSTM	5.25	2.76	3.52	2.23

Table 5. Algorithm parameters and their values used in this study.

Algorithm	Parameter	Value
DBSCAN	Neighborhood radius ϵ	0.9
DBSCAN	Minimum number of points MinPts	2.5
PLSR	Number of components A	3
PLSR	Cross-validation folds	5
PSO	Population size	5
	Maximum iterations	15
	Inertia weight w	Linearly decreasing from 1.2 to 0.8
	Cognitive coefficient c₁	2.5
	Social coefficient c₂	2
LSTM	Hidden layer neurons	11
	Learning rate	0.01
	Epochs	150
	Optimizer	Adam
BiLSTM (proposed)	Hidden layer neurons	63
	Learning rate	0.0114
	Batch size	247
	Epochs	150
	Optimizer	Adam
	State activation	tanh
	Gate activation	Sigmoid
	Output activation	Linear
	Regularization	Early stopping (patience = 20)
Wavelet-LSTM	Wavelet type	db4
	Decomposition level	3
	Sub-series	A3, D3, D2, D1
	LSTM hidden units	10 (same as standard LSTM)
	Learning rate	0.01
	Epochs	150
	Optimizer	Adam
	Decomposition strategy	Strict out-of-sample
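
The PSO settings in Table 5 (population size 5, 15 iterations, inertia weight decreasing linearly from 1.2 to 0.8, c₁ = 2.5, c₂ = 2) can be illustrated with a compact particle swarm loop. This is a sketch under stated assumptions rather than the authors' implementation: the function name, the position clamping, and the toy quadratic objective are hypothetical stand-ins for the BiLSTM validation error.

```python
import random


def pso_minimize(objective, bounds, n_particles=5, n_iter=15,
                 w_start=1.2, w_end=0.8, c1=2.5, c2=2.0, seed=0):
    """Minimize objective over box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]  # best value found so far, per iteration
    for t in range(n_iter):
        # Inertia weight decreases linearly from w_start to w_end.
        w = w_start - (w_start - w_end) * t / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp to the search box (an assumption of this sketch).
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def toy_objective(x):
    """Hypothetical smooth surrogate for a validation error surface."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


search_box = [(-5.0, 5.0), (-5.0, 5.0)]
best_x, best_val, history = pso_minimize(toy_objective, search_box)
print(best_x, best_val)
```

The recorded history is non-increasing by construction, which is the property a convergence curve such as Figure 16 displays. With an inertia weight at or above 1 in the early iterations, individual particles can overshoot, so the clamping step matters in this sketch.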

Table 6. Evaluation indexes of power prediction effect of different methods.

Model	NRMSE 12 h (%)	NRMSE 24 h (%)	NRMSE 36 h (%)	NMAE 12 h (%)	NMAE 24 h (%)	NMAE 36 h (%)
BP	9.69	9.51	10.02	6.86	6.74	7.16
LSTM	17.65	12.67	12.73	12.11	8.61	8.76
Advanced BiLSTM	7.25	7.22	8.01	5.33	5.35	5.81
Wavelet-LSTM	17.08	18.69	19.34	13.79	15.43	16.47
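
The accuracy improvements quoted for the 24 h horizon can be read directly off Table 6 as differences in NRMSE. The short sketch below computes these differences in percentage points, together with the corresponding relative reductions; the variable names are illustrative.

```python
# 24 h NRMSE values (%) copied from Table 6.
nrmse_24h = {
    "BP": 9.51,
    "LSTM": 12.67,
    "Advanced BiLSTM": 7.22,
    "Wavelet-LSTM": 18.69,
}

proposed = nrmse_24h["Advanced BiLSTM"]
baselines = {m: v for m, v in nrmse_24h.items() if m != "Advanced BiLSTM"}

# Absolute reduction in percentage points.
absolute_reduction = {m: round(v - proposed, 2) for m, v in baselines.items()}
# Relative reduction as a percentage of each baseline's own NRMSE.
relative_reduction = {m: 100.0 * (v - proposed) / v for m, v in baselines.items()}

for model in baselines:
    print(model, absolute_reduction[model], round(relative_reduction[model], 1))
```

The differences for BP (2.29) and Wavelet-LSTM (11.47) reproduce the figures quoted in the abstract, which indicates that those figures are percentage-point differences rather than relative changes. The LSTM difference computed from the table is 5.45, whereas the abstract quotes 5.54.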

Table 7. PSO hyperparameter ablation results (24 h NRMSE).

Parameter	Value	Mean NRMSE (%)	Std NRMSE (%)
Population size	3	8.87	0.66
	5	8.90	1.18
	10	10.07	1.94
	15	10.74	2.65
Number of iterations	10	9.66	0.43
	15	8.90	1.18
	20	9.29	0.25
	30	8.60	0.84
Inertia weight range	[0.5, 1.0]	9.04	1.07
	[0.8, 1.2]	8.90	1.18
	[0.4, 0.9]	8.73	1.00
	[0.9, 1.2]	9.52	0.35
Acceleration coefficients (c₁, c₂)	2.5, 2.0	8.90	1.18
	2.0, 2.0	9.36	1.37
	2.0, 2.5	9.18	0.90
	1.5, 1.5	9.03	0.71

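Note: The final parameter configuration selected in this paper (set in bold in the original table) is population size 5, 15 iterations, inertia weight range [0.8, 1.2], and (c₁, c₂) = (2.5, 2.0); these are the rows with mean NRMSE 8.90% and standard deviation 1.18%.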

Table 8. Diebold–Mariano test results (24 h ahead).

Comparison	DM Statistic	p-Value	Significant (α = 0.05)
BP vs. Advanced BiLSTM	2.6778	0.0087	Significant
LSTM vs. Advanced BiLSTM	2.3666	0.0200	Significant
Wavelet-LSTM vs. Advanced BiLSTM	8.8150	<0.0001	Significant
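
For readers unfamiliar with the Diebold–Mariano test, the sketch below shows the basic statistic for two forecast error series. It assumes a squared-error loss, a sign convention in which a positive statistic means the second model is more accurate, and a plain sample variance without the autocorrelation correction that multi-step forecasts usually require. None of these choices is confirmed by this section, so the sketch is not expected to reproduce the values in Table 8.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """Return (DM statistic, two-sided normal p-value) for squared-error loss.

    A positive statistic means model A has the larger average loss.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two error series of equal length >= 2")
    # Loss differential at each time step.
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    dm = mean_d / math.sqrt(var_d / n)
    # Two-sided p-value under a standard normal reference distribution.
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return dm, p_value


# Illustrative forecast errors in MW, not taken from the paper.
errors_baseline = [2.0, -3.0, 2.5, -2.0, 3.0, -2.5]
errors_proposed = [1.0, -1.0, 0.5, -1.5, 1.0, -0.5]
dm_stat, p_val = diebold_mariano(errors_baseline, errors_proposed)
print(dm_stat, p_val)
```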

Table 9. Wilcoxon signed-rank test results (24 h NRMSE).

Comparison	p-Value	Significant (α = 0.05)
BP vs. Advanced BiLSTM	0.0254	Significant
LSTM vs. Advanced BiLSTM	0.0195	Significant
Wavelet-LSTM vs. Advanced BiLSTM	0.0020	Significant
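
The p-values in Table 9 can be put in context with an exact Wilcoxon signed-rank computation. The sketch below enumerates all sign assignments, assuming no zero differences and no tied absolute differences. It also assumes that the test was run on 10 paired NRMSE values, matching the 10 independent runs of Table 10; the pairing is not stated in this section, and the sample differences are illustrative.

```python
from itertools import product


def wilcoxon_exact_p(differences):
    """Two-sided exact p-value of the Wilcoxon signed-rank test.

    Assumes no zero differences and no ties among absolute differences.
    """
    n = len(differences)
    order = sorted(range(n), key=lambda i: abs(differences[i]))
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    w_plus = sum(r for r, x in zip(ranks, differences) if x > 0)
    total = n * (n + 1) // 2
    w_obs = min(w_plus, total - w_plus)
    # Count sign patterns at least as extreme as the observed one.
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= w_obs:
            count += 1
    return count / 2 ** n


# Ten illustrative paired differences (baseline minus proposed), all positive.
diffs = [0.8, 1.1, 2.3, 0.5, 1.7, 3.0, 0.9, 1.4, 2.6, 0.3]
p_min = wilcoxon_exact_p(diffs)
print(p_min)  # 0.001953125, i.e. 2 / 2**10
```

With 10 pairs, exact p-values are multiples of 1/1024. The smallest attainable two-sided value, 2/1024 ≈ 0.0020, matches the Wavelet-LSTM row, and the other two rows are close to 26/1024 ≈ 0.0254 and 20/1024 ≈ 0.0195, which is consistent with the assumed sample size.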

Table 10. Statistical summary of 10 independent runs of Advanced BiLSTM (24 h NRMSE).

Statistic	Value (%)
Mean	8.96
Standard deviation	1.40
95% confidence interval	[6.22, 11.70]
Minimum (best run)	6.93
Maximum (worst run)	11.31
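
The interval in Table 10 can be checked against the reported mean and standard deviation. The sketch below compares two constructions: the mean plus or minus 1.96 standard deviations, and a t-based confidence interval for the mean over 10 runs. The critical value 2.262 is the standard two-sided 95% value for 9 degrees of freedom; which construction the authors used is an inference from the numbers, not something stated in this section.

```python
import math

mean_nrmse = 8.96   # %, from Table 10
std_nrmse = 1.40    # %, from Table 10
n_runs = 10

# Construction 1: normal-theory spread of individual runs.
half_spread = 1.96 * std_nrmse
run_interval = (round(mean_nrmse - half_spread, 2),
                round(mean_nrmse + half_spread, 2))

# Construction 2: t-based confidence interval for the mean of 10 runs.
T_CRIT_9_DOF = 2.262  # two-sided 95% critical value, 9 degrees of freedom
half_ci = T_CRIT_9_DOF * std_nrmse / math.sqrt(n_runs)
mean_interval = (round(mean_nrmse - half_ci, 2),
                 round(mean_nrmse + half_ci, 2))

print(run_interval, mean_interval)
```

The first construction reproduces the tabulated bounds [6.22, 11.70], so the reported interval describes the expected spread of individual runs under a normal assumption. A confidence interval for the mean itself would be narrower, roughly [7.96, 9.96].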


Share and Cite

MDPI and ACS Style

Lv, M.; Liu, Z.; Wang, G.; Zhang, C.; Liu, Y.; Luo, C.; Yu, J.; Zhu, Y. An Advanced BiLSTM Prediction Model for Short-Term Wind-Storage Power Prediction. Energies 2026, 19, 2666. https://doi.org/10.3390/en19112666

