1. Introduction
Renewable energy has become increasingly important in modern power systems. Among the available renewable resources, wind power has been widely integrated due to its sustainability, environmental benefits, and cost competitiveness [1]. Nevertheless, the intrinsic intermittency and stochastic variability of wind resources create substantial challenges for secure grid operation and economic dispatch [2]. Wind power forecasting is an effective way to mitigate these challenges. By predicting future wind power output and enabling more coordinated dispatch strategies, the negative impacts of wind power intermittency can be significantly reduced. Therefore, for the large-scale integration of wind energy into power systems, accurate and dependable wind power forecasting is indispensable [3,4,5,6]. Wind-storage hybrid systems, which couple wind farms with battery energy storage, have emerged as a promising solution to mitigate the intermittency of wind power. The storage system can absorb or release power according to grid needs, thereby improving dispatchability and reducing curtailment. Consequently, accurate prediction of the net power output of such wind-storage systems is essential for optimal energy management and grid integration.
In general, wind-storage power prediction approaches can be classified into two categories: physical and statistical model-based approaches [7,8,9]. Physical models are developed by characterizing the physical relationships among meteorological variables, geographical features, and wind speed. Wind speed is first estimated using equations derived from atmospheric dynamics and meteorological data and subsequently transformed into power output through the turbine power curve. A representative example is the numerical weather prediction (NWP) model, which has been extensively utilized in wind-storage energy prediction [10,11,12]. Physical models typically do not rely on historical power generation data. In contrast, statistical models forecast future power output through the extraction of statistical dependencies and temporal patterns from historical observations. Representative techniques include the Kalman filter, autoregressive (AR) models [6], support vector machines (SVMs) [11], random forests, and XGBoost [13]. For example, Bouyeddou et al. [14] evaluated PCR and PLSR models for wind-storage power forecasting and presented determination coefficients ($R^2$) equal to 0.930 and 0.931, respectively. Although these methods are relatively efficient and practical, their capability to capture the highly nonlinear and time-varying characteristics of wind-storage power is still limited.
Over the past decade, the rapid advancement of neural networks and deep learning has further promoted their application in wind-storage power forecasting [15,16]. Architectures including back propagation neural networks, long short-term memory networks, and convolutional neural networks have been broadly investigated. In ref. [17], a BP neural network tuned by the t-Tent-SSA method was proposed to enhance prediction accuracy. Chen et al. [18] combined CNN with genetic algorithms to estimate wind-storage-related variables, achieving a correlation coefficient of 0.835. Ref. [19] divided wind speed samples into different intervals according to the Beaufort wind scale. The study authors then employed TimeGAN to amplify samples in high wind speed intervals where data were scarce and subsequently used a neural network for power forecasting. With this sample expansion, the RMSE decreased by 2.57% compared to traditional LSTM networks. Hu et al. [20] established a BiLSTM-Attention algorithm to forecast wind-storage power output and utilized the Whale Optimization Algorithm to select optimal network hyperparameters, achieving an $R^2$ of 93.23%. In ref. [21], a self-attention mechanism was introduced to develop a hybrid model combining an SATCN and an LSTM. The proposed model reduced the RMSE by 17.56% and 10.99% compared with standard LSTM and TCN, respectively. In ref. [22], a wavelet activation kernel was introduced into the LSTM layer to develop a deep learning-based WN-LSTM model, achieving an MAE of 0.0050. In ref. [23], a CNN–LSTM model whose hyperparameters have been optimized using the Coati Optimization Algorithm (COA) was proposed, reducing the RMSE by 0.5% and 5.8% for day-ahead and hour-ahead short-term forecasting, respectively. Despite these advances, existing deep learning methods still face two important challenges. On the one hand, the quality of input data has a considerable influence on their prediction performance. On the other hand, it remains difficult to effectively combine preliminary linear prediction with subsequent residual correction within a unified forecasting framework.
The literature has explored hybrid strategies that combine linear models with neural networks to improve forecasting accuracy. For instance, Bouyeddou et al. [14] evaluated PCR and PLSR for wind power forecasting, achieving $R^2$ values of 0.930 and 0.931, respectively, confirming the effectiveness of PLSR as a baseline linear predictor. In terms of deep learning enhancement, Kari et al. [24] constructed a BiLSTM network to capture temporal correlations and combined it with wavelet transform and an improved genetic algorithm, demonstrating that hybrid configurations can further enhance prediction accuracy. A systematic review by Hanifi et al. [25] classified wind power forecasting methods into physical, statistical, and hybrid categories, highlighting that hybrid models currently achieve the best performance. These findings motivate our PLSR-BiLSTM residual correction framework.
Additionally, the quality of the original dataset significantly affects prediction accuracy. During the operation of wind-storage power plants, anomalous data are frequently observed due to downtime, malfunctions, and other factors. Although these outliers typically constitute a small portion of the dataset, they can significantly degrade prediction performance. Therefore, eliminating anomalous data is crucial [26]. Common approaches include missing value imputation via interpolation, normalization to mitigate dimensionality effects [26], wavelet transform-based data decomposition [27], and clustering-based methods [28] to capture correlations between historical data and predicted values. However, recent studies have pointed out that decomposition-based forecasting models, including wavelet-based methods, may suffer from the boundary issue if future data are inadvertently used during decomposition, leading to unrealistically high performance [29]. In the context of wind turbine SCADA data, recent studies have systematically compared DBSCAN with other anomaly detection methods. Pawlik [30] evaluated Autoencoder (AE), LSTM-Autoencoder (LSTM-AE), One-Class SVM (OCSVM), and Isolation Forest (IF), concluding that reconstruction-based approaches are effective for capturing temporal dependencies. Similarly, Gück et al. [31] reported that LSTM-AE outperformed Isolation Forest in a benchmark study. More importantly, Mehmood and Wang [32] proposed a hybrid iForest-DBSCAN framework for anomaly detection and power curve modelling, achieving over 99% accuracy in both offshore and onshore wind farms, further demonstrating that DBSCAN-based methods can be effectively integrated with other techniques to enhance detection performance. These works confirm that DBSCAN remains advantageous for identifying arbitrarily shaped clusters without requiring prior data labeling. While clustering algorithms are widely adopted, the conventional K-means method fails to identify clusters with complex (e.g., nested) structures. Therefore, density-based approaches, particularly DBSCAN, have been introduced to address this limitation. Nevertheless, data preprocessing alone is not sufficient to fully improve forecasting performance, and an effective prediction framework is still needed to further enhance model accuracy.
Despite these advances, existing hybrid forecasting methods have several limitations that are often overlooked. First, decomposition-based approaches (e.g., Wavelet-LSTM) suffer from the boundary issue when applied in strict out-of-sample multi-step forecasting, as recently reviewed by Chen et al. [29]; using future data during decomposition artificially inflates performance, while a truly online setting leads to error accumulation. Second, most residual-correction models are designed for pure wind power and do not account for the additional noise and dynamics introduced by battery charge/discharge in wind-storage hybrid systems. Third, although DBSCAN has been used for anomaly detection [30,31,32], few studies systematically integrate it with both linear (PLSR) and nonlinear (BiLSTM) components for wind-storage power prediction, nor do they provide rigorous statistical significance tests such as Diebold–Mariano or multiple-run confidence intervals. Therefore, a framework that explicitly handles storage-induced noise, combines linear extraction with residual learning, and offers comprehensive statistical validation is still lacking. To fill these gaps, the work presented herein proposes an advanced BiLSTM prediction model with the following contributions.
The main contributions of this work are as follows:
(1) We propose a wind-storage hybrid system-oriented forecasting framework that explicitly accounts for battery charge/discharge dynamics and the resulting noise in net power measurements.
(2) We integrate DBSCAN-based anomaly detection with linear regression correction to clean wind-storage power data, which is shown to significantly improve prediction accuracy.
(3) We develop a PLSR + BiLSTM residual correction architecture that decomposes the prediction task into linear (PLSR) and nonlinear (BiLSTM) parts and optimizes the BiLSTM hyperparameters via PSO.
(4) We provide extensive statistical validation including Diebold–Mariano tests, multiple-run confidence intervals, and comparison with a Wavelet-LSTM baseline.
The remainder of this paper is organized as follows.
Section 2 describes the power characteristics of wind-storage hybrid systems.
Section 3 presents the DBSCAN-based noise identification and correction method.
Section 4 introduces the PLSR-BiLSTM-PSO forecasting model, including principles and cross-validation for PLSR components.
Section 5 presents the case study, experimental settings, comparative results, and statistical significance analysis.
Section 6 concludes the paper.
To ensure comparability with established benchmarks in wind power forecasting, our case study follows validation protocols similar to those of the widely used GEFCom2014 dataset [33], which includes NWP and power observations from multiple wind farms. Simulation results confirm that, compared with BP, Wavelet-LSTM, and standard LSTM models, the proposed method improves prediction accuracy by 2.29%, 11.47%, and 5.54% for the 24 h horizon, respectively.
The main variables and parameters used in this study are summarized in Table 1.
2. Power Characteristics of Wind-Storage Hybrid Systems
A wind-storage hybrid system consists of a wind farm and a battery energy storage system (BESS) connected at the point of common coupling. The net output power $P_{\mathrm{net}}$ is the algebraic sum of the wind power $P_{\mathrm{w}}$ and the storage power $P_{\mathrm{s}}$ (positive when discharging, negative when charging), as expressed in Equation (1):

$$P_{\mathrm{net}}(t) = P_{\mathrm{w}}(t) + P_{\mathrm{s}}(t) \tag{1}$$
The BESS can smooth fluctuations, provide frequency regulation, and store excess wind energy. However, the actual net power is also influenced by the state of charge (SoC) of the battery and the control strategy. In practice, measured data from wind-storage plants often contain anomalies caused by both wind turbine malfunctions and battery operation constraints (e.g., forced charging/discharging limits).
Figure 1 shows the raw distribution of wind speed versus net power collected from an actual wind-storage farm in Gansu Province over 1 month.
When wind speed falls within the normal operating range of the turbine, wind speed and wind-storage power usually exhibit a strong positive correlation. This characteristic can be used for effective noise identification. According to Figure 1, the raw wind speed versus power data contain many abnormal points. Owing to the temporal dependence of wind-storage power data, such anomalies do not reflect the genuine operating conditions of the turbines and may degrade prediction accuracy. Therefore, systematic data preprocessing is essential before model construction.
Table 2 summarizes the typical types of noise data. Noise caused by mechanical failures or human interventions can significantly distort the normal data distribution. This distortion interferes with supervised model training and weakens the capacity for generalization of the prediction model. Therefore, systematic noise identification and correction can effectively reduce data interference and improve overall wind-storage power prediction performance.
It should be noted that the data used in this study correspond to the net power output of the wind-storage hybrid system, where the effects of battery charge/discharge are already embedded in the measured power values. The specific battery type is lithium-ion with a rated capacity of 48 MWh and a rated power of 24 MW. The control strategy is peak shaving and valley filling, power smoothing, and tracking the dispatched power schedule, thereby improving the dispatchability and economy of the wind-storage hybrid system. Our prediction model does not require explicit modelling of the battery’s internal dynamics, as the net power time series contains the historical impact of storage operations.
3. Noise Identification
3.1. DBSCAN Algorithm
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering technique. In contrast to K-means, it does not require the number of clusters to be specified in advance and can recognize clusters of arbitrary shape. Moreover, DBSCAN is more suitable for datasets with irregular sample distributions and noise points, whereas K-means often performs poorly when handling non-convex data distributions.
DBSCAN relies on two key parameters, (ε, MinPts), to characterize the local density of samples. Here, ε denotes the neighborhood radius of a sample, and MinPts denotes the minimum number of neighboring points required for that sample to be considered a core point. Based on these two parameters, DBSCAN classifies samples into core points, border points, and noise points, thereby identifying dense regions and isolating abnormal observations.
Figure 2 illustrates the basic principle of DBSCAN. Once MinPts is specified, the red circles represent core points, the yellow circles represent density-reachable points, and the blue circles represent noise points.
Figure 3 illustrates the primary execution stages of DBSCAN.
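As a minimal sketch of the core/border/noise rule described in this subsection, the following pure-Python snippet labels each point for a given neighborhood radius and MinPts. The function name `classify_points`, the toy coordinates, and the convention that a point counts as its own neighbor are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each 2-D point as 'core', 'border', or 'noise' (DBSCAN roles)."""
    n = len(points)
    # eps-neighborhood of every point (assumption: the point itself is counted)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            labels.append("border")  # not dense itself, but within eps of a core point
        else:
            labels.append("noise")
    return labels

# Hypothetical toy data: a dense group, one fringe point, one isolated point
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.55, 0.1), (5.0, 5.0)]
roles = classify_points(toy, eps=0.5, min_pts=4)
```

The brute-force neighbor search is quadratic in the number of samples, which is acceptable for a dataset of a few thousand points but would normally be replaced by a spatial index for larger ones.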
3.2. Noise Recognition
The performance of DBSCAN is significantly influenced by the selection of ε and MinPts. A smaller ε or a larger MinPts reduces the number of core points, which may cause normal samples to be incorrectly identified as noise and increase the number of clusters. Conversely, a larger ε or a smaller MinPts increases the number of core points, which may cause some abnormal samples to be absorbed into normal clusters and remain undetected. Therefore, appropriate parameter selection is essential for reliable noise identification.
As shown in Figure 4, there is a clear correlation between wind speed and wind-storage power when the turbine operates within its normal range. Generally, wind-storage power increases with wind speed under normal conditions. However, some samples deviate significantly from this relationship due to sensor malfunctions or data acquisition failures; these should be identified as noise points.
Figure 4b shows that inappropriate parameter settings incorrectly classify many high-density normal samples as noise, while Figure 4c shows that many low-density abnormal samples remain undetected. In contrast, the parameter setting in Figure 4a (ε = 0.9, MinPts = 2.5) better preserves the original wind speed vs. wind-storage power distribution and effectively identifies abnormal samples. Therefore, the selected parameters can reduce the interference of noisy data in subsequent time-series forecasting.
To determine the DBSCAN parameter ε in a systematic way, we construct a k-distance plot. For each data point (wind speed, power), we compute its distance to the k-th nearest neighbor, where k = MinPts − 1. Here we set MinPts = 2.5 (approximately 2 × dimensionality, with the features being wind speed and power), so k = 2. The k-distance values are then sorted in descending order and plotted against the point index, as shown in Figure 5. The curve drops sharply from a large value (over 7) and begins to flatten at a distance around 0.9. This “elbow” point indicates the threshold where points transition from sparse (noise) to dense (cluster) regions. Therefore, we choose the elbow value as the neighborhood radius: ε = 0.9.
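The k-distance curve can be sketched in a few lines of pure Python. The helper name `k_distances` and the toy samples are hypothetical, and the snippet assumes the two features have already been brought to comparable scales, which the text does not state explicitly. Reading the elbow off the sorted curve is left to visual inspection, as in the paragraph above.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted in descending order."""
    result = []
    for i, p in enumerate(points):
        # distances to all other points, ascending
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return sorted(result, reverse=True)

# Hypothetical (wind speed, power) pairs: four clustered points and one outlier
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
curve = k_distances(samples, k=2)
```

In this toy case the curve starts at a large value contributed by the isolated point and then stays flat at the spacing of the clustered points; the flat level is the natural candidate for the neighborhood radius.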
Figure 4a demonstrates that, with ε = 0.9 and MinPts = 2.5, the DBSCAN clustering successfully identifies abnormal samples while preserving the normal wind-speed–power relationship. In contrast, inappropriate parameter settings (Figure 4b,c) either misclassify normal points as noise or fail to detect abnormal points.
3.3. Noise Data Correction
After the noise points are identified, they must be further processed. Common treatment methods include direct deletion and interpolation. When the noisy data account for only a small and discontinuous portion of the original dataset, they may be directly removed. Although deletion is simple to implement, excessive data removal leads to information loss and may further reduce forecasting accuracy.
Alternatively, interpolation replaces abnormal samples with estimated values derived from normal data. Compared with direct deletion, interpolation preserves more of the original data information and imposes fewer restrictions on the distribution of noisy samples. Traditional interpolation methods, such as mean-value or median-value replacement, are easy to implement. However, they may distort the inherent data distribution and consequently affect wind-storage power forecasting performance.
To overcome this drawback, a linear regression method is utilized in the present study to correct the identified noise points. Specifically, a regression equation is established using the non-noise samples, and the fitted relationship is then used to estimate the values of the abnormal points. This method can effectively handle continuous noisy samples while preserving the original data distribution as much as possible.
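A minimal sketch of this correction step is given below. It assumes a single regressor (power regressed on wind speed) and a boolean noise flag per sample coming from the DBSCAN stage; the function names and toy values are hypothetical, and the authors' regression may use additional variables.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def correct_noise(speed, power, is_noise):
    """Replace power at flagged samples with the value fitted on non-noise samples."""
    clean_x = [s for s, f in zip(speed, is_noise) if not f]
    clean_y = [p for p, f in zip(power, is_noise) if not f]
    a, b = fit_line(clean_x, clean_y)
    return [a * s + b if f else p for s, p, f in zip(speed, power, is_noise)]

# Hypothetical data: power = 2 * speed, except one flagged sample
speed = [4.0, 5.0, 6.0, 7.0, 8.0]
power = [8.0, 10.0, 0.0, 14.0, 16.0]
flags = [False, False, True, False, False]
corrected = correct_noise(speed, power, flags)
```

Only the flagged samples are overwritten, so the time index stays intact and consecutive noisy samples can be corrected without leaving gaps in the series.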
As shown in Figure 6, most of the corrected noise points are relocated to the neighborhood of the normal wind speed–power relationship after linear regression. Compared with the raw noisy data, the processed dataset follows the wind speed–power relationship more closely. It also retains more valid information for subsequent forecasting.
4. Advanced BiLSTM Predictive Model
4.1. BiLSTM Principles
LSTM is a specialized type of recurrent neural network (RNN). By introducing a cell state and three gate mechanisms, namely the input, output, and forget gates, LSTM significantly alleviates the vanishing and exploding gradient problems. As a result, LSTM has stronger capabilities in historical data learning and long-term dependency modeling, and it is well suited for complex nonlinear problems.
A bidirectional LSTM (BiLSTM) network consists of two parallel LSTM networks that process the input sequence in the forward and reverse directions, respectively. With such a structure, the model can capture context from both past and future time steps, thereby improving the completeness of feature extraction. The memory module organization of the BiLSTM network is visualized in Figure 7.
4.2. PSO Principles
The performance of the BiLSTM network is strongly influenced by its hyperparameters, such as the number of hidden neurons. Therefore, these hyperparameters are optimized using the PSO algorithm.
PSO is a population-based search algorithm with relatively fast convergence. During each iteration, the algorithm identifies the current best particle, and the remaining particles update their trajectories accordingly. Each particle is updated by considering its own best position and the global best position [11]. The search proceeds by updating particle velocity and position, as given in Equation (2):

$$v_i^{k+1} = w\,v_i^{k} + c_1 r_1 \left(p_i^{k} - x_i^{k}\right) + c_2 r_2 \left(g^{k} - x_i^{k}\right), \qquad x_i^{k+1} = x_i^{k} + v_i^{k+1} \tag{2}$$

where $v_i^{k}$ and $x_i^{k}$ are the velocity and position of particle $i$ at iteration $k$; $w$ is the inertia weight; $c_1$ and $c_2$ are the cognitive and social acceleration coefficients, which weight the influence of the personal best and the global best, respectively; $r_1$ and $r_2$ are random numbers uniformly distributed in [0, 1]; $p_i^{k}$ is the personal best position of particle $i$; and $g^{k}$ is the global best position of the swarm.
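The velocity and position update can be illustrated with a small, self-contained PSO loop. This is a sketch under stated assumptions rather than the authors' optimizer: the function name `pso_minimize`, the coefficient values (w = 0.7, c1 = c2 = 1.5), the swarm size, the position clamping, and the quadratic test objective are all illustrative choices, and the global best is refreshed as soon as a particle improves on it.

```python
import random

def pso_minimize(f, bounds, n_particles=20, n_iter=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over a box using the standard PSO velocity/position update."""
    rng = random.Random(seed)
    dim = len(bounds)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]
    pbest_val = [f(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia term + cognitive (personal best) term + social (global best) term
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                lo, hi = bounds[d]
                # position update, clamped to the search box (boundary constraint)
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)
            val = f(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val

# Hypothetical objective with its minimum at (3, -1)
best_x, best_val = pso_minimize(
    lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2,
    bounds=[(-10.0, 10.0), (-10.0, 10.0)],
)
```

For hyperparameter tuning, the objective `f` would wrap a full train-and-validate cycle of the network, so each fitness evaluation is far more expensive than in this toy example and the swarm size and iteration count have to be chosen accordingly.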
4.3. PLSR Principles
Assume that there are $p$ independent variables $x_1, x_2, \ldots, x_p$. Let $X$ and $Y$ denote the explanatory-variable matrix and the response-variable matrix, respectively. The objective of PLSR is to extract the components $t_1$ and $u_1$ from $X$ and $Y$. These components should retain as much of the variation in their corresponding data matrices as possible. At the same time, the correlation between $t_1$ and $u_1$ should be maximized.
After the initial component pair $t_1$ and $u_1$ is extracted, regressions of $X$ on $t_1$ and of $Y$ on $t_1$ are performed in sequence. If the regression formulas achieve the required accuracy, the extraction process stops. Otherwise, the residual information is used to extract the next pair of components, namely $t_2$ and $u_2$. This procedure is repeated until the necessary level of accuracy is reached. If a total of $A$ components $t_1, \ldots, t_A$ is ultimately extracted from $X$, the regression on these components can be transformed into a regression equation of $Y$ with respect to the original variables, and the PLSR model is then obtained. The detailed steps are as follows.
(1) Observe $n$ sets of samples for the independent variables and form matrix $X$. The dependent variable forms matrix $Y$. The data matrices are defined in Equation (3):

$$X = [x_{ij}]_{n \times p}, \qquad Y = [y_{ij}]_{n \times q} \tag{3}$$

where $X$ is the $n \times p$ matrix of independent variables, $Y$ is the $n \times q$ matrix of dependent variables, and $n$ is the number of samples. The raw data matrices $X$ and $Y$ are standardized to obtain the normalized variable matrices $E_0$ and $F_0$, respectively.
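(2) Extract the first pair of components $t_1$ and $u_1$. According to the principal component criterion, the covariance between $t_1$ and $u_1$ should be maximized. The covariance criterion is expressed as Equation (4):

$$\max_{w_1,\, c_1} \operatorname{Cov}(t_1, u_1) = \max_{w_1,\, c_1} \operatorname{Cov}(E_0 w_1, F_0 c_1), \quad \text{s.t. } \|w_1\| = 1,\ \|c_1\| = 1 \tag{4}$$

where $t_1$ and $u_1$ are the first pair of latent components extracted from $E_0$ and $F_0$, respectively, and $\operatorname{Cov}(\cdot,\cdot)$ denotes the covariance. Here, $t_1$ is a linear combination of the columns of $E_0$ with weight vector $w_1$, that is, $t_1 = E_0 w_1$. Similarly, $u_1$ is a linear combination of the columns of $F_0$ with weight vector $c_1$, that is, $u_1 = F_0 c_1$, where $w_1$ and $c_1$ are unit vectors.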
(3) The regression relationships among $E_0$, $F_0$, $t_1$, and $u_1$ are established as Equations (5)–(7):

$$E_0 = t_1 p_1^{\mathsf{T}} + E_1 \tag{5}$$

$$F_0 = u_1 q_1^{\mathsf{T}} + F_1^{*} \tag{6}$$

$$F_0 = t_1 r_1^{\mathsf{T}} + F_1 \tag{7}$$

where $E_0$ and $F_0$ are the standardized matrices of $X$ and $Y$; $t_1$ and $u_1$ are the first components extracted from $E_0$ and $F_0$, respectively; $p_1$, $q_1$, and $r_1$ are the corresponding loading vectors; and $E_1$, $F_1^{*}$, and $F_1$ are the residual matrices after extracting the first component.
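The regression vectors (loadings and coefficients) are obtained by ordinary least squares (OLS) regressions of the residual matrices on the extracted components. For the regression of $E_0$ on $t_1$, the OLS estimate of the loading vector $p_1$ minimizes $\|E_0 - t_1 p_1^{\mathsf{T}}\|^2$, yielding $p_1 = E_0^{\mathsf{T}} t_1 / (t_1^{\mathsf{T}} t_1)$. If $t_1$ is normalized to unit norm ($t_1^{\mathsf{T}} t_1 = 1$; otherwise the denominator is $\|t_1\|^2$), this simplifies to $p_1 = E_0^{\mathsf{T}} t_1$. The same derivation applies to $q_1$ and $r_1$ [34]. The regression vectors are calculated using Equation (8):

$$p_1 = \frac{E_0^{\mathsf{T}} t_1}{\|t_1\|^2}, \qquad q_1 = \frac{F_0^{\mathsf{T}} u_1}{\|u_1\|^2}, \qquad r_1 = \frac{F_0^{\mathsf{T}} t_1}{\|t_1\|^2} \tag{8}$$

where $E_1$, $F_1^{*}$, and $F_1$ in Equations (5)–(7) are the corresponding residual matrices.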
(4) Replace $E_0$ and $F_0$ with the residual matrices $E_1$ and $F_1$. Subsequently, derive the second pair of weight vectors $w_2$ and $c_2$, as well as the second pair of components $t_2$ and $u_2$.
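(5) Repeat steps 3 and 4. If the rank of $X$ equals $A$, the decomposition can be formulated as Equations (9) and (10):

$$E_0 = t_1 p_1^{\mathsf{T}} + t_2 p_2^{\mathsf{T}} + \cdots + t_A p_A^{\mathsf{T}} + E_A \tag{9}$$

$$F_0 = t_1 r_1^{\mathsf{T}} + t_2 r_2^{\mathsf{T}} + \cdots + t_A r_A^{\mathsf{T}} + F_A \tag{10}$$

where $E_A$ and $F_A$ are the residual matrices after extracting $A$ components, $t_1, \ldots, t_A$ are the extracted components, and $p_h$ and $r_h$ ($h = 1, \ldots, A$) are the corresponding loading vectors.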
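(6) Since $t_1, \ldots, t_A$ can be represented as linear combinations of the columns of $E_0$, the above formulas can be transformed into a regression equation in the standardized variables, which maps back to the original variables by inverting the standardization. The final PLSR regression equation is given in Equation (11):

$$F_0 = E_0 B + F_A \tag{11}$$

where $B$ is the regression coefficient matrix, and $F_A$ is the residual term.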
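Steps (1)–(6) can be sketched for the single-response case (often called PLS1), which is the relevant one here because net power is the only response. The notation in the code is an assumption made for this sketch: `E` is the already standardized predictor matrix as a list of rows, `f` the standardized response, `w` the unit weight vector, `t` the score, and `pl` and `r` the loadings obtained by least squares on `t`. The function name `pls1_fit` and the toy data (centered but not scaled, for readability) are hypothetical.

```python
def pls1_fit(E, f, n_components):
    """NIPALS-style PLS1 with deflation. Returns (scores, x_loadings, y_coefs, residual)."""
    n, p = len(E), len(E[0])
    E = [row[:] for row in E]  # work on copies; both are deflated in place below
    f = f[:]
    scores, x_loadings, y_coefs = [], [], []
    for _ in range(n_components):
        # weight vector w = E^T f / ||E^T f||: unit vector maximizing Cov(E w, f)
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(wj * wj for wj in w) ** 0.5
        if norm == 0.0:
            break  # nothing left in f that the predictors can explain
        w = [wj / norm for wj in w]
        # score t = E w
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(ti * ti for ti in t)
        # least-squares loadings: pl = E^T t / ||t||^2, r = f^T t / ||t||^2
        pl = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        r = sum(f[i] * t[i] for i in range(n)) / tt
        # deflation: E <- E - t pl^T, f <- f - t r
        for i in range(n):
            for j in range(p):
                E[i][j] -= t[i] * pl[j]
            f[i] -= t[i] * r
        scores.append(t)
        x_loadings.append(pl)
        y_coefs.append(r)
    return scores, x_loadings, y_coefs, f

# Hypothetical centered data: the response is exactly 2 * col1 + col2
col1 = [-1.0, -1.0, 1.0, 1.0]
col2 = [-2.0, 1.0, 0.0, 1.0]
E0 = [[a, b] for a, b in zip(col1, col2)]
f0 = [2.0 * a + b for a, b in zip(col1, col2)]
scores, x_loadings, y_coefs, residual = pls1_fit(E0, f0, n_components=2)
```

With as many components as the rank of the predictor matrix, the fit coincides with ordinary least squares, so the residual vanishes in this noise-free toy case; with fewer components, the residual is what a second-stage model would be asked to learn.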
A five-fold cross-validation is performed on the training set to determine the optimal number of PLSR components. As shown in Figure 8, the cross-validation mean squared error (CV-MSE) drops sharply from 2169.10 (A = 1) to 527.06 (A = 2) and further decreases to 469.67 (A = 3). Adding more components beyond three yields negligible improvement in CV-MSE and may risk overfitting. Therefore, we select A = 3 components for the PLSR model.
4.4. Data Standardization and Preprocessing Process
To prevent the BiLSTM network from becoming trapped in local optima during training and to eliminate the influence of different input dimensions, the input variables should be standardized to zero mean and unit variance. The input variables are standardized using Equation (12):

$$x_{ij}^{*} = \frac{x_{ij} - \bar{x}_j}{s_j} \tag{12}$$

where $x_{ij}$ is the $i$-th sample of the $j$-th input variable, $\bar{x}_j$ is the mean of the $j$-th input variable, and $s_j$ is the standard deviation of the $j$-th input variable.
After the input variables are standardized, correlation analysis is further conducted to filter the input variables. The Pearson correlation coefficient is adopted to quantify the linear dependence between each input feature and the output power. It is calculated as in Equation (13):

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{13}$$

where $r_{xy}$ is the Pearson correlation coefficient between variables $x$ and $y$, $n$ is the number of samples, and $\bar{x}$ and $\bar{y}$ are the sample means.
The correlation strength is interpreted as follows. When $|r|$ does not exceed 0.3, no linear correlation exists. A weak linear association is present provided that $|r|$ satisfies $0.3 < |r| \le 0.5$. For $|r|$ greater than 0.5 but not exceeding 0.8, the linear relationship is considered significant. A strong linear relationship can be inferred when $|r| > 0.8$.
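The standardization, the Pearson coefficient, and the strength bands above can be sketched as follows. The function names and the toy series are hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption, since the text does not say which estimator is used.

```python
import math

def standardize(values):
    """Scale a series to zero mean and unit (sample) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]

def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Map |r| to the bands used in the text: none, weak, significant, strong."""
    a = abs(r)
    if a <= 0.3:
        return "none"
    if a <= 0.5:
        return "weak"
    if a <= 0.8:
        return "significant"
    return "strong"

# Hypothetical toy series
wind_speed = [3.0, 5.0, 7.0, 9.0]
net_power = [10.0, 20.0, 30.0, 40.0]
r_speed = pearson(standardize(wind_speed), standardize(net_power))
label = strength(r_speed)
```

Because the Pearson coefficient is invariant to shifting and positive scaling, standardizing first does not change its value; the two steps are shown together only because the text applies them in sequence.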
As shown in Table 3, temperature and humidity exhibit almost no linear correlation with output power. Therefore, they are excluded from the input variables used for linear regression. To better retain the information in the raw data and lessen the disruption caused by noise in the time-series data, the raw data must be preprocessed before constructing the forecasting model. The detailed preprocessing procedure is presented in Figure 9.
The Spearman correlation and NMI confirm that wind speed maintains the strongest monotonic and mutual-information relationship with power, while temperature and humidity remain weakly correlated. Therefore, the same three variables (wind speed, direction, pressure) are retained as inputs for the PLSR component, consistent with the linear analysis.
4.5. Predictive Model Building
Based on the above data standardization, correlation analysis, and noise preprocessing, the predictive model is constructed using the processed dataset.
Figure 10 illustrates the complete workflow of our advanced BiLSTM forecasting model.
First, outage data and missing data are removed from the raw dataset. DBSCAN is then used to identify noisy samples, and the detected abnormal samples are corrected by linear regression. After that, correlation analysis is performed to select the input variables, so that noise interference can be reduced while the temporal characteristics of the original data are preserved as much as possible.
Next, the PLSR model serves to generate the preliminary prediction of wind-storage power. On this basis, the BiLSTM network is employed to forecast the residual error, with the aim of further optimizing prediction precision. Meanwhile, the PSO serves to select the optimal hyperparameters for the BiLSTM network.
The overall procedure of the proposed PLSR–PSO–BiLSTM forecasting framework is summarized in Algorithm 1:
Algorithm 1: PLSR–PSO–BiLSTM wind power forecasting method.
Input:
  Wind power dataset D
  Population size N
  Maximum iterations T
  Learning rate range
  Hidden units range H
  MiniBatchSize range B
Output:
  Predicted wind power sequence
1: Normalize dataset D using Min-Max normalization
2: Divide dataset into training set and testing set
3: Construct PLSR model using meteorological variables
4: Obtain preliminary prediction sequence
5: Calculate residual sequence: actual power minus preliminary PLSR prediction
6: Construct BiLSTM input sequences from the selected meteorological variables and the PLSR prediction
7: Initialize PSO population
8: for t = 1 to T do
9:   for each particle do
10:     Build BiLSTM model using MiniBatchSize = Batch, LearningRate = LR, HiddenUnits = Hidden
11:     Train BiLSTM model
12:     Predict residual sequence
13:     Recover final prediction: preliminary PLSR prediction plus predicted residual
14:     Compute fitness value
15:     Update personal best position
16:     Update global best position
17:   end for
18:   for each particle do
19:     Update inertia weight
20:     Update particle velocity
21:     Update particle position
22:     Apply boundary constraints
23:   end for
24: end for
25: Obtain optimal parameters: MiniBatchSize, LearningRate, HiddenUnits
26: Rebuild optimized BiLSTM model
27: Train final forecasting model
28: Predict wind power on testing set
29: Compute evaluation metrics: NRMSE and NMAE
30: return forecasting results
Specifically, the residual series is obtained by subtracting the preliminary PLSR prediction from the actual wind-storage power in the training set. The selected meteorological variables, together with the linear prediction component, are then used as inputs to the BiLSTM model. After PSO-based hyperparameter optimization, the BiLSTM model is trained to predict the residual error. Ultimately, the preliminary PLSR result is corrected by the predicted residual to obtain the final wind-storage power prediction.
By combining data preprocessing, preliminary prediction, residual correction, and hyperparameter optimization within a unified framework, the proposed method achieves good robustness and prediction accuracy.
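The residual-correction idea can be made concrete with a deliberately simplified sketch. Both stages are stand-ins and are labeled as such: a fixed linear function plays the role of the PLSR stage, and a persistence rule (predict each residual as the previously observed one) plays the role of the BiLSTM stage. All names and numbers are hypothetical, and the one-step-ahead use of the previous observed residual is a simplification of the multi-step setting used in the paper.

```python
def linear_prediction(x):
    # Stand-in for the PLSR stage (hypothetical fixed linear model)
    return 2.0 * x

def persistence_residuals(residuals):
    # Stand-in for the BiLSTM stage: predict each residual as the previous observed one
    return [0.0] + residuals[:-1]

def mae(pred, obs):
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

# Hypothetical series: a linear part plus a slowly varying offset
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
offset = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 1.0]
actual = [2.0 * xi + oi for xi, oi in zip(x, offset)]

prelim = [linear_prediction(xi) for xi in x]          # preliminary prediction
residual = [a - p for a, p in zip(actual, prelim)]    # residual series
predicted_residual = persistence_residuals(residual)  # second-stage forecast
final = [p + r for p, r in zip(prelim, predicted_residual)]  # corrected prediction

mae_linear = mae(prelim, actual)
mae_corrected = mae(final, actual)
```

In this toy case the correction helps because the residual is strongly autocorrelated; when the residual is close to white noise, a second stage has little to learn and the linear prediction alone is already near the achievable accuracy.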
4.6. Prediction Evaluation
To evaluate prediction performance, the NRMSE serves to measure the overall dispersion of the prediction error. In addition, the NMAE is utilized to determine the mean deviation between the forecast and actual values. The accuracy is measured by NRMSE and NMAE, defined in Equations (14) and (15):

$$\mathrm{NRMSE} = \frac{1}{C} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{P}_i - P_i\right)^2} \times 100\% \tag{14}$$

$$\mathrm{NMAE} = \frac{1}{nC} \sum_{i=1}^{n} \left|\hat{P}_i - P_i\right| \times 100\% \tag{15}$$

Here, $C$ represents the total rated capacity of the wind farm, $\hat{P}_i$ denotes the predicted power at the $i$-th time index, $P_i$ denotes the measured power at the $i$-th time index, and $n$ denotes the number of prediction points during the evaluation day.
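Both metrics normalize the error by the rated capacity: NRMSE is the root mean squared error divided by capacity, and NMAE is the mean absolute error divided by capacity. The sketch below expresses them as percentages, which is an assumption about the scaling. The capacity of 201.5 MW is derived from the turbine counts given in the case study (25 × 2 MW + 101 × 1.5 MW); the predicted and measured values are hypothetical.

```python
import math

def nrmse(pred, meas, capacity):
    """Root mean squared error normalized by rated capacity, in percent."""
    n = len(pred)
    rmse = math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, meas)) / n)
    return rmse / capacity * 100.0

def nmae(pred, meas, capacity):
    """Mean absolute error normalized by rated capacity, in percent."""
    n = len(pred)
    mae = sum(abs(p - m) for p, m in zip(pred, meas)) / n
    return mae / capacity * 100.0

# Rated wind capacity implied by the case study: 25 x 2 MW + 101 x 1.5 MW
capacity_mw = 25 * 2.0 + 101 * 1.5

# Hypothetical predicted and measured net power values in MW
pred = [100.0, 120.0, 90.0, 60.0]
meas = [96.0, 123.0, 90.0, 65.0]
score_nrmse = nrmse(pred, meas, capacity_mw)
score_nmae = nmae(pred, meas, capacity_mw)
```

NRMSE is never smaller than NMAE for the same errors, and the gap between them widens when a few large errors dominate, which makes the pair useful for spotting occasional large misses.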
5. Case Analysis
5.1. Simulation
The data were collected from 1 September 2020 to 30 September 2020, with a 15 min sampling period. The wind-storage farm comprises 25 wind turbines rated at 2 MW and 101 turbines rated at 1.5 MW, coupled with a lithium-ion battery energy storage system of 48 MWh/24 MW. The net power output (wind + storage) is measured at the point of common coupling. The battery operates under a multi-function control strategy that (i) charges when wind power exceeds a threshold and discharges during peak hours (peak shaving), (ii) smooths short-term wind power fluctuations via ramp-rate limiting, and (iii) tracks the dispatched power schedule to meet grid requirements. The net power time series thus inherently reflects storage actions; no separate battery state data are used as input. The wind speed, wind direction, and atmospheric pressure are selected as the input variables. After data preprocessing, a total of 2396 valid samples are retained. The corresponding data trends are shown in Figure 11.
5.2. Simulation Comparison
For the purpose of verifying the effectiveness of the DBSCAN-based noise identification method, prediction models based on BP and BiLSTM are first constructed using the datasets before and after data preprocessing. The input variables include wind speed, wind direction, and atmospheric pressure.
Figure 12 illustrates the scatter distribution of the prediction results before and after preprocessing, and
Table 4 presents the corresponding evaluation index.
As shown in
Figure 12 and
Table 4, compared with the models trained on the raw data, both models trained on the preprocessed data achieve better prediction performance. In addition, the scatter distribution of the observed versus forecasted values lies closer to the ideal line y = x. These outcomes verify the effectiveness of the proposed noise identification and preprocessing procedure.
To evaluate the proposed model, the prediction results of BP, standard LSTM, Wavelet-LSTM, and the proposed advanced BiLSTM are compared under identical data preprocessing and data split settings. The dataset is split chronologically: the first 2252 samples (approximately 94%) are used for training, and the remaining 144 samples (corresponding to 36 h, given the 15 min sampling interval) are used for testing. No separate validation set is created; instead, early stopping with a patience of 20 epochs is applied based on the training loss to prevent overfitting.
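The split and the stopping rule described above can be sketched as follows. The helper names, the `min_delta` argument, and the synthetic loss curve are hypothetical; only the sample counts, the sampling period, and the patience of 20 epochs come from the text.

```python
def chronological_split(samples, n_train):
    """Split an ordered sequence without shuffling: earliest part for training."""
    return samples[:n_train], samples[n_train:]

def early_stopping_epoch(train_losses, patience=20, min_delta=0.0):
    """Return the 1-based epoch at which training stops.

    Training stops once the monitored loss has failed to improve on its best
    value for `patience` consecutive epochs; otherwise it runs to the end.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(train_losses, start=1):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(train_losses)

samples = list(range(2396))                      # stand-in for the ordered dataset
train, test = chronological_split(samples, 2252)
test_hours = len(test) * 15 / 60                 # 15 min sampling period

# Hypothetical training-loss curve: improves for 30 epochs, then plateaus.
losses = [1.0 / e for e in range(1, 31)] + [1.0 / 30] * 120
stop_epoch = early_stopping_epoch(losses, patience=20)
print(len(train), len(test), test_hours, stop_epoch)
```

With this synthetic curve the rule stops at epoch 50, twenty epochs after the last improvement. Since the monitored quantity is the training loss, the rule mainly bounds training time once the optimizer has plateaued.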
The hyperparameters of each model are set as follows. For the BP neural network, the hidden layer contains 11 neurons, the learning rate is 0.01, and the number of training epochs is 150. For the standard LSTM model, the network has one hidden layer with 10 neurons, uses a learning rate of 0.01, and is trained for 150 epochs with the Adam optimizer. For the proposed advanced BiLSTM model, wind speed, wind direction, and atmospheric pressure are used as the inputs of the PLSR component. After optimization by the PSO algorithm, the BiLSTM network contains one hidden layer with 63 neurons, uses a learning rate of 0.0114, and is trained for 150 epochs with the Adam optimizer. The BiLSTM adopts a recursive multi-step forecasting strategy without a fixed-length sliding window; it utilizes the entire past sequence and updates its internal state step by step. Inside the BiLSTM layer, tanh is used for the state activation and sigmoid for the gate activations, while the fully connected output layer is linear (no activation function). All models are trained with the same chronological split and early stopping policy to ensure a fair comparison.
The algorithm parameters used in this study are listed in
Table 5.
Figure 13,
Figure 14 and
Figure 15 show the prediction curves of the four methods (BP, LSTM, Wavelet-LSTM, and the proposed Advanced BiLSTM) at distinct temporal resolutions, and
Table 6 presents the corresponding quantitative evaluation results. Compared with the BP and conventional LSTM models, the proposed Advanced BiLSTM model achieves higher prediction accuracy at all considered time scales. Under both evaluation indices, its forecasts agree more closely with the measured data. These outcomes demonstrate the accuracy and robustness of the proposed Advanced BiLSTM model for wind-storage power prediction under noisy data conditions.
To provide a more comprehensive benchmark, we additionally implement a wavelet-based hybrid model (Wavelet-LSTM) using a three-level discrete wavelet transform (DWT) with the db4 wavelet. The original wind power series is decomposed into four sub-series (A3, D3, D2, D1). Each sub-series is predicted independently by an LSTM with the same hyperparameters as the standard LSTM, and the predictions are summed to obtain the final forecast. Strict out-of-sample decomposition is applied to avoid information leakage.
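The decompose, predict, and sum structure of this baseline can be illustrated with a small sketch. For brevity it uses an additive Haar multiresolution decomposition instead of the db4 wavelet, and a persistence forecaster as a stand-in for the per-component LSTM; both substitutions, the function names, and the data are assumptions made for illustration only.

```python
def block_means(x, width):
    """Replace each consecutive block of `width` samples by the block mean."""
    out = []
    for start in range(0, len(x), width):
        block = x[start:start + width]
        out.extend([sum(block) / len(block)] * len(block))
    return out

def haar_mra(x, levels=3):
    """Additive Haar decomposition: returns [A_L, D_L, ..., D_1], each len(x)."""
    details = []
    current = list(x)
    for level in range(1, levels + 1):
        approx = block_means(x, 2 ** level)       # coarser approximation
        details.append([c - a for c, a in zip(current, approx)])
        current = approx
    return [current] + details[::-1]

# Hypothetical history up to the forecast origin. Only past data is decomposed,
# so no test-period information enters the sub-series (out-of-sample setting).
history = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 11.0,
           10.0, 12.0, 13.0, 12.0, 14.0, 16.0, 15.0, 17.0]
components = haar_mra(history, levels=3)          # A3, D3, D2, D1
reconstructed = [sum(values) for values in zip(*components)]

# Stand-in forecaster: each sub-series is predicted by its last value, and the
# component forecasts are summed to form the final forecast.
forecast = sum(series[-1] for series in components)
print(len(components), forecast)
```

Because the components add up to the original series, any error made on one component carries over unchanged into the summed forecast.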
The inferior performance of Wavelet-LSTM can be attributed to two inherent limitations of decomposition-based forecasting methods. First, the boundary effect occurs because wavelet decomposition uses a finite-length signal; at the test set boundary, the decomposed sub-series are distorted, leading to inaccurate predictions. Second, error accumulation arises because the high-frequency components (details) are close to random noise and difficult to predict recursively. Any small error in high-frequency predictions propagates through the summation step and contaminates the final forecast. Recent critical reviews [
35] have pointed out that, when wavelet decomposition is applied in a strict out-of-sample rolling prediction setting, the performance often degrades sharply compared to in-sample or offline settings. In contrast, the proposed PLSR-BiLSTM residual correction framework avoids direct prediction of high-frequency components, thereby achieving much better robustness and accuracy.
5.3. Ablation Study on PSO Parameters
To verify the rationality of the key hyperparameters of the particle swarm optimization (PSO) algorithm used in this work and to avoid arbitrary parameter specification, we conducted a systematic ablation study on four core parameters: population size, number of iterations, inertia weight range, and acceleration coefficients ($c_1$, $c_2$). For each candidate parameter value, we independently repeated the complete PSO search and BiLSTM training procedure three times and recorded the normalized root mean square error (NRMSE) on the 24 h prediction task. The mean and standard deviation of the NRMSE over the three runs for each setting are summarized in
Table 7.
As shown in
Table 7, when the population size increases from three to five, the mean NRMSE changes only slightly (8.87% vs. 8.90%). However, further increasing the population size to 10 or 15 leads to a noticeable deterioration in prediction accuracy (mean NRMSE rises to 10.07% and 10.74%, respectively), and the standard deviation expands significantly from 0.66% to 2.65%. This trend suggests that, under a limited number of iterations, an excessively large population size, although theoretically increasing population diversity, may prevent the swarm from converging sufficiently and introduce additional random fluctuations. Therefore, a population size of five offers a good balance between solution quality and search stability.
Regarding the number of iterations, increasing it from 10 to 15 reduces the mean NRMSE from 9.66% to 8.90%, a substantial improvement. At 20 iterations the mean NRMSE rises slightly to 9.29%, while 30 iterations achieve the lowest mean NRMSE of 8.60%. However, as the convergence analysis below shows, the swarm's best solution stabilizes within the first few iterations, so the additional 0.30 percentage points gained at 30 iterations come at twice the computational cost of 15 iterations. Hence, 15 iterations are adopted as the most cost-effective setting.
For the inertia weight range, the setting [0.8, 1.2] yields a mean NRMSE of 8.90%, which is very close to that of the best-performing range [0.4, 0.9] (8.73%). The difference of 0.17 percentage points is well within one standard deviation and is therefore not regarded as significant. Given that [0.8, 1.2] is a commonly used linearly decreasing inertia weight scheme in the PSO literature, offering good interpretability and reproducibility, we retain this setting.
Among the tested ($c_1$, $c_2$) pairs, the adopted combination achieves the lowest mean NRMSE (8.90%), outperforming the other three configurations. This indicates that moderately increasing the cognitive component ($c_1$) helps particles explore the neighborhoods of their own historical best positions more thoroughly, leading to a better solution in the BiLSTM hyperparameter search space.
In summary, the PSO hyperparameter configuration adopted in this paper is not arbitrary but has been validated through systematic ablation experiments. The chosen parameters either are optimal in terms of accuracy or represent a reasonable trade-off between performance and computational cost, providing a reliable optimization foundation for the construction of the forecasting model.
To further illustrate the search dynamics of the PSO algorithm with the selected parameters (population size = 5, number of iterations = 15, inertia weight range = [0.8, 1.2], and the selected acceleration coefficients $c_1$ and $c_2$), we present a typical convergence curve in
Figure 16, which records the evolution of the best NRMSE found by the swarm in each generation.
As shown in
Figure 16, the global best NRMSE is 7.59% in the first generation, drops sharply to 7.01% in the second generation, and then remains at that level until the 15th generation, so the curve is flat from the second generation onward. This behavior indicates that the PSO algorithm quickly locates a high-quality search region and converges to a competitive hyperparameter combination within the first few iterations. It also suggests that 15 iterations are sufficient for convergence in this run; further iterations would not bring meaningful improvement but would increase computational time.
Together, the convergence curve and the ablation results demonstrate that the PSO parameter settings used in this paper neither waste computational resources nor risk premature termination that would miss a better solution. The algorithm reliably and efficiently provides competitive hyperparameters for the BiLSTM network.
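A minimal PSO sketch with the population size, iteration count, and linearly decreasing inertia weight discussed above is given below. The acceleration coefficients `c1 = c2 = 1.5`, the position clipping, and the quadratic toy objective are assumptions for illustration; in the study, each objective evaluation would instead train a BiLSTM and return its NRMSE.

```python
import random

def pso(objective, bounds, n_particles=5, n_iter=15,
        w_max=1.2, w_min=0.8, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds`; returns (position, value, history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = []                                   # best value per generation
    for it in range(n_iter):
        # Inertia weight decreases linearly from w_max to w_min.
        w = w_max - (w_max - w_min) * it / max(n_iter - 1, 1)
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive term
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # social term
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clip to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history

def toy_objective(p):
    """Hypothetical smooth surrogate standing in for the BiLSTM validation error."""
    return (p[0] - 0.3) ** 2 + (p[1] + 0.2) ** 2

bounds = [(-1.0, 1.0), (-1.0, 1.0)]
best_pos, best_val, history = pso(toy_objective, bounds)
print(best_pos, best_val)
```

The `history` list plays the role of a convergence curve: it is non-increasing by construction, and a long flat tail indicates that the remaining iterations no longer improve the best solution.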
5.4. Statistical Significance Analysis
To verify that the observed performance improvements are not due to random chance, two complementary analyses are conducted.
For each pair of models, the Diebold–Mariano (DM) test is applied to the 24 h ahead forecast error sequences (96 time points) [
36]. The null hypothesis is that the two models have equal predictive accuracy.
Table 8 reports the DM statistic and the corresponding p-value. All p-values are below 0.05, rejecting the null hypothesis and confirming that the proposed Advanced BiLSTM significantly outperforms BP, LSTM, and the Wavelet-LSTM baseline.
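A simplified form of the DM statistic is sketched below. It assumes a squared-error loss, a two-sided normal approximation, and a serially uncorrelated loss differential; multi-step forecast errors are often autocorrelated, in which case a long-run variance estimate would replace the plain variance. The function name and the synthetic error sequences are hypothetical.

```python
import math

def diebold_mariano(errors_a, errors_b):
    """Return (DM statistic, two-sided p-value) under squared-error loss.

    A negative statistic means model A has the smaller average loss.
    No autocorrelation correction is applied in this sketch.
    """
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    dm = mean_d / math.sqrt(var_d / n)
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|dm|))
    return dm, p_value

# Synthetic error sequences with 96 points (one day at 15 min resolution).
errors_a = [1.0 if t % 2 == 0 else -1.0 for t in range(96)]
errors_b = [2.0 if t % 3 == 0 else -1.5 for t in range(96)]

dm_stat, p_val = diebold_mariano(errors_a, errors_b)
print(dm_stat, p_val)
```

Swapping the two error sequences flips the sign of the statistic and leaves the p-value unchanged, which is a convenient sanity check on an implementation.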
To provide a non-parametric complement to the Diebold–Mariano test, we applied the Wilcoxon signed-rank test [
37] to the 24 h NRMSE values obtained from the 10 independent runs. This test does not assume normality of the performance metrics and is suitable for small-sample paired comparisons. The results are summarized in
Table 9.
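For small samples the Wilcoxon signed-rank p-value can be computed exactly by enumeration. The sketch below assumes paired per-run NRMSE values with no zero or tied differences; the function name and the values (in hundredths of a percent) are hypothetical.

```python
from itertools import product

def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Assumes no zero differences and no ties among absolute differences.
    Returns (statistic, p-value), where the statistic is min(W+, W-).
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    stat = min(w_plus, w_minus)
    # Under the null hypothesis every sign pattern over ranks 1..n is equally likely.
    total = n * (n + 1) // 2
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= stat:
            count += 1
    return stat, count / 2 ** n

# Hypothetical NRMSE values for 10 paired runs, in hundredths of a percent.
proposed = [810, 740, 900, 860, 790, 930, 880, 720, 840, 960]
baseline = [930, 910, 990, 1040, 1020, 1000, 1130, 910, 1140, 1170]

w_stat, p_exact = wilcoxon_exact(proposed, baseline)
print(w_stat, p_exact)
```

When one model is better in all 10 pairs, the statistic is 0 and the exact two-sided p-value is 2/1024, about 0.0020, which is the smallest value this test can return for 10 pairs.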
As shown in
Table 9, the proposed Advanced BiLSTM significantly outperforms BP (p = 0.0254), LSTM (p = 0.0195), and Wavelet-LSTM (p = 0.0020) at the 0.05 significance level. These results are consistent with the Diebold–Mariano test (Table 8), confirming the statistical reliability of the proposed model's superiority over all baselines.
The performance of deep learning models can be affected by random initialization. Therefore, the proposed Advanced BiLSTM is trained 10 independent times using the same hyperparameters and data split, with only the random seed changed.
Table 10 summarizes the 24 h NRMSE over these 10 runs. The mean NRMSE is 8.96% with a standard deviation of 1.40%, and the 95% interval, computed as the mean ± 1.96 standard deviations, is [6.22%, 11.70%]. The best single run achieves 6.93% NRMSE, which is 2.58 and 5.74 percentage points lower than BP (9.51%) and LSTM (12.67%), respectively. Even the worst run (11.31%) remains below the standard LSTM (12.67%). These results indicate that the advantage of the proposed model is not the product of a single favorable initialization.
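The reported interval can be reproduced from the mean and standard deviation alone. The sketch below also shows, for comparison, a t-based confidence interval for the mean over 10 runs; the critical value 2.262 (9 degrees of freedom) is a standard table value, and the comparison itself is an addition for illustration rather than part of the reported analysis.

```python
import math

mean_nrmse, sd_nrmse, n_runs = 8.96, 1.40, 10     # values quoted in the text (%)

# Range expected to cover about 95% of individual runs (normal approximation).
run_lo = mean_nrmse - 1.96 * sd_nrmse
run_hi = mean_nrmse + 1.96 * sd_nrmse

# t-based 95% confidence interval for the mean NRMSE itself.
t_crit = 2.262                                    # t quantile 0.975, 9 d.o.f.
half_width = t_crit * sd_nrmse / math.sqrt(n_runs)
mean_lo = mean_nrmse - half_width
mean_hi = mean_nrmse + half_width

print(f"run-level range: [{run_lo:.2f}, {run_hi:.2f}]")
print(f"CI of the mean:  [{mean_lo:.2f}, {mean_hi:.2f}]")
```

The first range matches the reported [6.22%, 11.70%] and describes the spread of single runs, while the narrower second interval, about [7.96%, 9.96%], describes the uncertainty of the mean.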
6. Conclusions
In this work, an advanced BiLSTM prediction model is developed to reduce the influence of data outliers and improve wind-storage power prediction accuracy using measured wind-storage farm data. First, the DBSCAN parameters (ε = 0.9, MinPts = 2.5) are determined systematically via a k-distance plot, DBSCAN is employed to identify abnormal samples in the historical dataset, and linear regression is used to correct the detected noise points. Then, a PLSR model, with the number of components set to A = 3 by five-fold cross-validation, generates the preliminary wind-storage power prediction. Next, the BiLSTM network predicts the residual error, which mainly reflects the nonlinear characteristics not captured by the preliminary prediction. Finally, the preliminary PLSR result is corrected by the predicted residual to obtain the final wind-storage power forecast. The proposed method is compared not only with BP and standard LSTM but also with a Wavelet-LSTM baseline using strict out-of-sample decomposition; it significantly outperforms all baselines. Although the model is validated on wind-storage hybrid data, it is generic and applicable to pure wind power or other renewable-plus-storage systems, provided that the net power time series is used as the target variable.
Comparative simulation results show that, for the 24 h prediction horizon, the proposed method improves prediction accuracy by 2.29%, 11.47%, and 5.54% compared with the BP, Wavelet-LSTM, and standard LSTM models, respectively. Furthermore, the Diebold–Mariano test, the Wilcoxon signed-rank test, and the intervals from 10 independent runs support the statistical significance of the improvements. These results indicate that the proposed advanced BiLSTM prediction model is valuable for wind-storage power prediction in noisy data environments.