A DBSCAN-Based Data Cleaning and TCN-BiLSTM-PRGO Hybrid Model for Wind Power Forecasting

Lv, Muyao; Liu, Zejia; Zhang, Chao; Yu, Jiawei; Luo, Chao; Zhu, Yihua

doi:10.3390/eng7060272


by Muyao Lv ¹, Zejia Liu ¹, Chao Zhang ¹,*, Jiawei Yu ², Chao Luo ² and Yihua Zhu ²

¹ School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, China
² State Key Laboratory of HVDC, Electric Power Research Institute, China Southern Power Grid, Guangzhou 510663, China
* Author to whom correspondence should be addressed.

Eng 2026, 7(6), 272; https://doi.org/10.3390/eng7060272

Submission received: 11 April 2026 / Revised: 23 May 2026 / Accepted: 29 May 2026 / Published: 1 June 2026

(This article belongs to the Special Issue Emerging Trends in Numerical Methods for Renewable Energy Technologies)


Abstract

Wind power forecasting is essential for improving renewable energy utilization and maintaining power system stability. However, because it is influenced by factors such as wind speed, wind direction, and atmospheric pressure, wind power exhibits strong variability and uncertainty. Moreover, raw data often contain missing values, shutdown periods, and anomalies, which can degrade forecasting performance. To address these challenges, this study develops a wind power forecasting approach that integrates data cleaning with a hybrid prediction model. In the preprocessing stage, correlation analysis is employed to select the meteorological variables most strongly associated with power output as input features, thereby reducing redundancy and improving model effectiveness. Missing values and shutdown records are then removed, and an improved DBSCAN method is applied to detect anomalous samples. These outliers are corrected using least-squares regression, which enhances data quality while preserving continuity. In the forecasting stage, a hybrid model integrating a temporal convolutional network (TCN), a bidirectional long short-term memory network (BiLSTM), and the Plant Root Growth Optimization (PRGO) algorithm is developed. Specifically, TCN captures local temporal features, while BiLSTM extracts bidirectional temporal dependencies. PRGO globally optimizes model architecture parameters and key hyperparameters, improving convergence efficiency and generalization performance. Experiments on real wind farm data demonstrate that the proposed TCN-BiLSTM-PRGO model consistently outperforms all baselines (TCN, LSTM, TCN-BiLSTM, TCN-Transformer, and TCN-BiLSTM-WOA) across 12 h, 24 h, and 48 h horizons. At 12 h, it achieves a mean R² of 0.942, NMAE of 6.014%, and NRMSE of 7.539% over five runs, improving R² by 0.008–0.123 and reducing NMAE by 0.37–4.57 percentage points compared with the other models. It also attains the highest R² at 24 h (0.791) and 48 h (0.833).
Statistical significance tests (p < 0.05) and chronological split tests (R² = 0.940) further confirm its robustness and generalization. The proposed method offers a reliable solution for high-precision wind power forecasting.

Keywords:

wind power forecasting; data cleaning; anomaly detection; DBSCAN algorithm; TCN-BiLSTM hybrid model; PRGO algorithm

1. Introduction

Wind power has become a cornerstone of the global low-carbon energy transition. However, its inherent intermittency and volatility, caused by fluctuating wind speed, wind direction, and air density, pose significant challenges to the secure operation of power systems. Accurate wind power forecasting can reduce reserve capacity requirements, mitigate wind curtailment, and enhance grid dispatch capability. Consequently, it has become a crucial research topic in the field of renewable energy [1,2,3,4,5,6,7,8].

To address data quality issues, extensive research has been conducted on anomaly detection and data-cleaning methods for wind power data. Traditional statistical approaches, such as threshold filtering and the 3σ rule, are commonly used for outlier detection; however, their effectiveness largely depends on the assumption that the data follow a specific distribution. Wind power series often exhibit non-Gaussian, heavy-tailed distributions, which make such methods prone to false detections and missed anomalies. A survey on anomaly detection noted that a key limitation of statistical methods is their strong reliance on distributional assumptions; once the observed data deviate from these assumptions, detection performance degrades significantly [9]. Studies on energy systems have highlighted that although smoothing operations can reduce noise, they may also remove important transient features reflecting real operating conditions, which is equally problematic for anomaly detection in wind power systems [10]. Similarly, it has been pointed out that excessive smoothing or fixed-parameter control strategies may obscure the true thermal response of energy systems under peak load conditions, resulting in inaccurate estimation of energy efficiency and power demand [11]. An integrated anomaly detection framework for wind farm SCADA data was built by combining EEMD-BiLSTM, inter-turbine wind speed correlation analysis, and dynamic power-curve fitting, and its results demonstrated the effectiveness of multi-strategy synergy in identifying complex anomalies [12]. Nevertheless, most existing studies either directly remove DBSCAN-detected outliers or rely on simple imputation techniques, such as mean substitution and linear interpolation, for data repair.
Such approaches often ignore the local temporal variation around anomalous samples, disrupt data continuity and intrinsic structure, and thereby adversely affect the training performance and generalization capability of subsequent predictive models. It has also been emphasized that simply removing or replacing outliers without considering their local context can distort the intrinsic data structure, reducing the reliability of subsequent analyses [13]. A structure called ReTraTree was proposed to preserve anomalous sub-trajectories and enable subsequent reclustering instead of directly discarding outliers, thereby improving the robustness and information integrity of clustering analysis [14].

Beyond data quality, prediction-model design is another critical factor affecting forecasting accuracy: accurate forecasting depends not only on reliable data preprocessing but also on the ability of the prediction model to capture complex nonlinear temporal dependencies. Early studies primarily relied on physical approaches or statistical models such as ARIMA and Kalman filtering, which have limited capability in modeling nonlinear dynamic systems. A review of machine-learning applications for wind power forecasting pointed out the limitations of conventional statistical models in complex nonlinear situations [3]. Driven by progress in deep learning, diverse neural network paradigms have been introduced for wind power forecasting. The GRU architecture demonstrated the effectiveness of gated recurrent structures in sequence modeling tasks; however, GRU-based models still struggle to capture extreme fluctuation patterns in wind power forecasting [15]. A CNN-LSTM-AM model for offshore wind turbines integrated convolutional feature extraction with an attention mechanism to enhance forecasting accuracy [16]. A systematic comparison of LSTM and Transformer models for wind power forecasting found that LSTM performs better in short-term dependency modeling but relies heavily on manual hyperparameter tuning, resulting in unstable generalization performance [6]. Empirical evidence showed that TCN outperforms LSTM and GRU across multiple sequence modeling tasks and offers a longer effective memory [17]. TCNFormer integrates TCN with a Transformer encoder and achieves superior accuracy over existing models in short-horizon wind speed forecasting [18]. The Transformer relies entirely on self-attention mechanisms and achieved a breakthrough in machine translation tasks [19].
Informer was proposed for long-sequence forecasting, significantly improving long-horizon prediction efficiency through ProbSparse self-attention and a generative decoder [20]. Further studies demonstrated that Transformer-based architectures can effectively capture global temporal dependencies in wind power forecasting, while hybrid Transformer-LSTM structures further improve predictive accuracy [6]. Despite these advances, existing standalone or partially hybridized models still struggle to simultaneously achieve robust local feature extraction, bidirectional temporal dependency learning, and adaptive global hyperparameter optimization under highly volatile wind power conditions. In particular, TCN is strong at extracting local temporal features through dilated causal convolutions, whereas BiLSTM is more effective at capturing bidirectional temporal dependencies. However, the predictive performance of such hybrid deep learning models remains highly sensitive to hyperparameter configuration, making efficient global optimization essential for stable forecasting performance.

To improve model performance, researchers have introduced optimization algorithms for automatic hyperparameter tuning in deep learning models. Particle Swarm Optimization (PSO) is widely used due to its simplicity and fast convergence; however, it is prone to premature convergence in high-dimensional spaces, leading to a loss of population diversity [21,22]. The Gray Wolf Optimizer (GWO) mimics the dominance hierarchy and hunting strategy of gray wolves and achieves a better balance between exploration and exploitation [23]; however, its relatively simple position-updating mechanism still makes it prone to local optima in multimodal optimization problems. Differential Evolution (DE) possesses excellent global exploration capability but is highly sensitive to control parameters, making parameter tuning itself a nontrivial issue [24]. Although Genetic Algorithms (GAs) are effective for discrete search spaces, they suffer from high computational cost and slow convergence [25]. To address these limitations, the Whale Optimization Algorithm (WOA) was combined with a CNN-BiLSTM model, leveraging WOA’s prey-encircling mechanism for global hyperparameter optimization; however, WOA still suffers from premature loss of diversity in high-dimensional search spaces [26]. An Enhanced Grasshopper Optimization Algorithm (EGOA) improves the exploration-exploitation balance using chaos theory and Lévy flight, but its parameter configuration remains relatively complex [27]. The Butterfly Optimization Algorithm (BOA) was employed to jointly refine VMD decomposition parameters and LSTM hyperparameters; however, BOA suffers from insufficient convergence accuracy in high-dimensional nonlinear optimization [28]. An improved Harris Hawks Optimization (IHHO) algorithm effectively avoids local optima through multiple strategies, but at the expense of increased computational complexity [29].
Another study combined the sparrow search algorithm with the firefly algorithm to jointly optimize a BiLSTM model [30]. Although this hybrid strategy enhances global search capability, the cooperation of multiple algorithms increases computational overhead. More recently, hybrid deep learning and intelligent optimization frameworks have shown promising potential in renewable energy forecasting. For example, a study titled “Intelligent Approaches for Global Horizontal Irradiance Forecasting in Saudi Arabia: Benchmarking Deep Learning Coupled with RMSprop Optimization for Solar Power Systems” employed deep neural networks integrated with advanced optimization strategies to improve solar irradiance forecasting accuracy, highlighting the growing importance of combining intelligent optimization algorithms with deep learning architectures in renewable-energy systems [31].

These studies indicate that optimization-driven hybrid AI frameworks can substantially enhance forecasting robustness and generalization under complex renewable-energy scenarios. Nevertheless, many of the optimization strategies above still suffer from premature convergence, insufficient exploration capability, or excessive computational complexity when dealing with high-dimensional nonlinear optimization problems.

To tackle the aforementioned issues, this study proposes a data cleaning approach that integrates DBSCAN-based anomaly detection with least-squares correction. The proposed method first selects key feature variables via correlation analysis as inputs for the subsequent power forecasting model. After removing missing and outage data, DBSCAN is applied to identify anomalies, which are then corrected using least-squares regression. This process improves data quality while preserving the intrinsic temporal continuity of the wind power sequence. Compared with conventional direct deletion or interpolation strategies, the proposed correction mechanism better maintains local variation characteristics and reduces information loss caused by abnormal samples.

To further improve the reliability of wind power forecasting, this study develops a hybrid TCN-BiLSTM-PRGO forecasting framework integrating temporal deep learning with swarm intelligence optimization. The proposed model combines TCN for local temporal feature extraction, BiLSTM for bidirectional dependency learning, and Plant Root Growth Optimization (PRGO) for adaptive global hyperparameter optimization. Unlike conventional optimization methods, PRGO enhances population diversity through guided growth and stochastic exploration mechanisms, thereby improving the ability to escape local optima in high-dimensional hyperparameter search spaces. Experimental results on real wind-farm datasets demonstrate that the proposed method consistently outperforms existing benchmark models across multiple forecasting horizons, exhibiting superior forecasting accuracy, robustness, and temporal generalization capability. Therefore, the proposed framework provides a reliable and effective solution for high-precision wind power forecasting in complex renewable-energy environments.

2. An Improved DBSCAN-Based Method for Wind Power Data Anomaly Detection and Correction

2.1. Correlation Analysis

First, to identify the feature variables most strongly correlated with power output, the Spearman correlation coefficient is used to quantify the monotonic (possibly nonlinear) relationships between actual power and each of the following variables: average wind speed, wind direction, atmospheric pressure, humidity, and temperature. The Spearman correlation coefficient is defined in Equation (1).

ρ = 1 - \frac{6 \sum_{i = 1}^{n} d_{i}^{2}}{n (n^{2} - 1)}

(1)

where ρ denotes the Spearman correlation coefficient, n is the total number of samples, and d_i is the difference between the ranks of the two variables for the i-th observation; this closed form holds exactly when there are no tied ranks. A value of |ρ| close to 1 indicates a strong monotonic dependence, whereas a value near 0 indicates a weak monotonic relationship. The correlation coefficients between the meteorological factors and actual power are illustrated in Figure 1.

As shown in Figure 1, wind speed exhibits the strongest positive correlation with power output, followed by wind direction and atmospheric pressure, while temperature and humidity show relatively weak correlations. Therefore, wind speed, wind direction, and atmospheric pressure are designated as the input feature variables for the subsequent forecasting model.
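For illustration, Equation (1) can be computed directly from ranks. The Python sketch below assumes no tied values (with ties, the usual remedy is to assign average ranks and take the Pearson correlation of the ranks, which `scipy.stats.spearmanr` does); the function names and the toy series are hypothetical and are not the data of this study.

```python
def ranks(values):
    """Return 1-based ranks; assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman_rho(x, y):
    """Spearman's rho via Equation (1): 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical toy data: power rises monotonically (but nonlinearly) with wind speed.
wind_speed = [3.1, 5.4, 7.2, 9.8, 12.0]
power = [0.05, 0.4, 1.1, 1.4, 1.5]
print(spearman_rho(wind_speed, power))  # 1.0: a perfectly monotonic relationship
```

Because only ranks enter Equation (1), the saturating shape of the toy power curve does not reduce ρ, which is why the coefficient suits the nonlinear speed-power relationship.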

2.2. DBSCAN Noise Detection

Before DBSCAN clustering is performed, the raw data are subjected to basic preprocessing: outage and missing records are removed to reduce the influence of clearly invalid samples, as shown in Figure 2.

Since different meteorological variables have different units and scales, the data are first mapped into the unified range [0, 1] through min-max normalization, as given in Equation (2).

x^{'} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

(2)

Here, x denotes the raw value, x' the normalized value, and x_min and x_max the minimum and maximum values of the samples, respectively. This normalization mitigates the influence of differing units on distance computation and improves the stability of DBSCAN clustering.
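A minimal sketch of Equation (2), applied column by column so that each feature is scaled independently. The helper names and sample values are hypothetical, and mapping a constant column to 0 is an assumption made here to avoid division by zero. In a forecasting pipeline, x_min and x_max are typically taken from the training portion only, so that no test-set information leaks into the scaling.

```python
def min_max_normalize(column):
    """Map one feature column to [0, 1] as in Equation (2)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: Equation (2) would divide by zero; map everything to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def min_max_normalize_columns(rows):
    """Normalize each feature (column) of a list of equal-length rows independently."""
    cols = list(zip(*rows))
    norm_cols = [min_max_normalize(list(c)) for c in cols]
    return [list(r) for r in zip(*norm_cols)]


# Hypothetical samples: (wind speed in m/s, pressure in hPa)
samples = [[2.0, 850.0], [6.0, 855.0], [10.0, 860.0]]
print(min_max_normalize_columns(samples))
# [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```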

Next, DBSCAN is employed for anomaly detection. Its core idea is to form clusters based on density reachability and identify low-density points as noise.

DBSCAN distinguishes core, border, and noise points. Denoting the ε-neighborhood of a point x_i as N_ε(x_i) = {x_j : dist(x_i, x_j) ≤ ε}, a core point satisfies Equation (3).

|N_{ε}(x_{i})| \geq \text{min\_samples}

(3)

If a point x_j lies within the ε-neighborhood of a core point x_i, that is,

x_{j} \in N_{ε} (x_{i})

then x_j is directly density-reachable from x_i and is assigned to the same cluster. DBSCAN requires two key parameters. The first, min_samples, is the minimum number of points required in the ε-neighborhood of a core point and therefore acts as a density threshold; it is set according to Equation (4).

\text{min\_samples} \approx \ln(n)

(4)

Here, n is the total number of observations. A smaller min_samples makes points more likely to form clusters, so fewer points are flagged as noise; conversely, a larger value imposes stricter clustering conditions, and more points are classified as noise.

The second key parameter is the neighborhood radius ε, which directly affects the clustering results. A larger ε facilitates cluster formation and reduces the number of noise points, whereas a smaller ε imposes stricter clustering criteria and increases it. Selecting ε therefore requires a trade-off between denoising capability and data retention.
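To make the roles of ε and min_samples concrete, the following pure-Python sketch implements textbook DBSCAN with Euclidean distance and labels noise as -1. It does not reproduce the specific improvements of the DBSCAN variant used in this study, which are not detailed here, and the toy points and ε value are hypothetical. In practice an optimized implementation such as scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=...)`, whose `fit_predict` also marks noise with -1, would normally be used.

```python
import math


def region_query(points, i, eps):
    """Indices j with dist(points[i], points[j]) <= eps, i.e. N_eps(x_i) including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Minimal O(n^2) DBSCAN; returns one label per point, with -1 marking noise."""
    labels = [None] * len(points)              # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:       # Equation (3) fails: not a core point
            labels[i] = -1                     # provisional noise (may become a border point)
            continue
        labels[i] = cluster
        seeds, k = list(neighbors), 0
        while k < len(seeds):                  # expand the cluster by density reachability
            j = seeds[k]
            k += 1
            if labels[j] == -1:
                labels[j] = cluster            # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:  # j is also a core point: keep expanding
                seeds.extend(j_neighbors)
        cluster += 1
    return labels


# Equation (4) for the 8219 valid samples of Section 4.1: ln(8219) is about 9.01.
print(round(math.log(8219)))  # 9

# Hypothetical normalized (wind speed, power) pairs: a dense band plus one outlier.
pts = [(0.10, 0.08), (0.12, 0.10), (0.14, 0.12), (0.16, 0.14), (0.18, 0.16), (0.15, 0.90)]
print(dbscan(pts, eps=0.05, min_samples=3))  # [0, 0, 0, 0, 0, -1]
```

Shrinking `eps` below the 0.028 spacing of the band would turn every point into noise, which illustrates the denoising-versus-retention trade-off described above.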

2.3. Least-Squares Correction

For the noise points identified by DBSCAN, instead of being directly removed, this study applies Ordinary Least Squares (OLS) regression for correction to reduce data loss and improve data continuity.

First, a linear relationship between wind power and input features is assumed, as shown in Equation (5).

y = β_{0} + β_{1} x_{1} + β_{2} x_{2} + \cdots + β_{p} x_{p} + ε

(5)

Here, y denotes wind power, x_1, …, x_p are the input features (e.g., wind speed, wind direction, and pressure), β_0, …, β_p are the regression coefficients, and ε is the error term (not to be confused with the DBSCAN radius ε).

The least-squares method minimizes the sum of squared errors; its closed-form solution in matrix form is given in Equation (6).

\hat{β} = (X^{T} X)^{-1} X^{T} y

(6)

Here, X is the feature matrix (with a leading column of ones for the intercept β_0), y is the vector of observed power outputs, and β̂ is the estimated regression coefficient vector.

Finally, normal points retain their original values, while the power values of the DBSCAN-identified outliers are replaced by the model predictions computed from their feature matrix X_noise, as shown in Equation (7).

\hat{y}_{noise} = X_{noise} \hat{β}

(7)
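The correction step of Equations (5)–(7) can be sketched as follows. The text does not state which samples are used to estimate β̂; this sketch assumes the fit uses only the points DBSCAN did not flag as noise, and it solves the normal equations with plain Gaussian elimination instead of forming (XᵀX)⁻¹ explicitly, which gives the same β̂ whenever XᵀX is invertible. All names and the toy data are hypothetical.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(features, y):
    """beta_hat from the normal equations (X^T X) beta = X^T y, Equation (6)."""
    X = [[1.0] + list(row) for row in features]       # leading ones -> intercept beta_0
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def correct_noise(features, power, labels):
    """Replace power at noise points (label -1) by OLS predictions, Equation (7).

    Assumption: beta is fitted on the non-noise points only.
    """
    keep = [i for i, lab in enumerate(labels) if lab != -1]
    beta = ols_fit([features[i] for i in keep], [power[i] for i in keep])
    corrected = list(power)
    for i, lab in enumerate(labels):
        if lab == -1:
            corrected[i] = beta[0] + sum(b * x for b, x in zip(beta[1:], features[i]))
    return corrected


# Hypothetical features (x1, x2) and power; the last point is a DBSCAN outlier.
feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
power = [1.0, 3.0, 4.0, 6.0, 8.0, 0.0]     # normal points follow y = 1 + 2*x1 + 3*x2
labels = [0, 0, 0, 0, 0, -1]
corrected = correct_noise(feats, power, labels)
print(round(corrected[5], 6))  # 9.0
```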

The overall workflow of the DBSCAN-based wind power data anomaly detection and correction method is shown in Figure 3.

3. Wind Power Forecasting Model Based on TCN-BiLSTM-PRGO

3.1. TCN Model

TCN, a convolutional neural network architecture, is designed specifically for time series data. Its core idea is to model long-term dependencies using one-dimensional causal convolution combined with dilated convolution, while maintaining parallelism in the training process.

First, causal convolution in TCN guarantees that the output at time t depends only on the current and past inputs, as presented in Equation (8).

y_{t} = \sum_{i = 0}^{k - 1} ω_{i} \cdot x_{t - i}

(8)

Here, k is the kernel size and ω_i are the convolution weights. This structure prevents leakage of future information and is therefore well suited to time series forecasting.

Next, to enlarge the receptive field, TCN introduces dilated convolution, as formulated in Equation (9).

y_{t} = \sum_{i = 0}^{k - 1} ω_{i} \cdot x_{t - d \cdot i}

(9)

Here, d denotes the dilation factor. For a stack of L layers with dilation factors d_l and one dilated convolution per layer, the receptive field of the TCN is given by Equation (10).

R = (k - 1) \sum_{l = 0}^{L - 1} d_{l} + 1

(10)
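Equations (8)–(10) can be checked numerically with the sketch below. Zero left-padding, the kernel size of 3, and the six-layer doubling dilation schedule are illustrative assumptions, not settings reported in this study. With them, Equation (10) gives R = 2 × (1 + 2 + 4 + 8 + 16 + 32) + 1 = 127, which would be enough to span the 96-step (24 h) look-back window defined in Section 3.4.

```python
def dilated_causal_conv1d(x, weights, dilation=1):
    """y_t = sum_{i=0}^{k-1} w_i * x_{t - d*i}  (Equation (9)); dilation=1 gives Equation (8).

    Indices before the start of the series (t - d*i < 0) contribute zero
    (implicit left zero-padding), so len(y) == len(x) and y_t never uses future inputs.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - dilation * i
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Equation (10): R = (k - 1) * sum(d_l) + 1, assuming one dilated convolution per layer."""
    return (kernel_size - 1) * sum(dilations) + 1


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(dilated_causal_conv1d(x, [1.0, 1.0], dilation=1))  # [1.0, 3.0, 5.0, 7.0, 9.0]
print(dilated_causal_conv1d(x, [1.0, 1.0], dilation=2))  # [1.0, 2.0, 4.0, 6.0, 8.0]

# Hypothetical configuration: kernel size 3, dilations doubling over 6 layers.
print(receptive_field(3, [2 ** layer for layer in range(6)]))  # 127
```

Many TCN implementations stack two dilated convolutions in each residual block; under that design the (k − 1)Σd_l term in Equation (10) is doubled.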

With this structure, the model can capture long-range dependencies using fewer layers. Finally, TCN introduces residual connections to mitigate the difficulty in training deep networks, as illustrated in Equation (11).

y = F (x) + x

(11)

In summary, the overall TCN workflow is illustrated in Figure 4.

Compared with traditional RNNs/LSTMs, TCN offers strong parallel computing capability, high training efficiency, more stable gradients (it is much less prone to vanishing gradients), and strong long-sequence modeling ability. However, because its receptive field is finite, it still has limitations in capturing global temporal dependencies.

3.2. BiLSTM Model

BiLSTM extends LSTM by introducing forward and backward processing paths to capture complete temporal dependencies. Its basic structure is presented in Figure 5.

In LSTM, information flow is controlled by gating mechanisms (input, forget, and output gates). BiLSTM runs one LSTM forward and another backward over the sequence and concatenates the two hidden states at each time step, as shown in Equation (12).

h_{t} = [\overrightarrow{h}_{t}, \overleftarrow{h}_{t}]

(12)

The bidirectional structure of BiLSTM enables it to capture complete temporal context and enhances its ability to model complex sequences. However, it suffers from long training time, sensitivity to hyperparameters, and susceptibility to local optima.
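As a concrete view of Equation (12), the sketch below runs one LSTM cell over the sequence in time order and a second cell in reverse order, then concatenates the two states at each step, so every output has 2h entries. The weights are random and untrained, and the hidden size of 4 is purely illustrative; a practical model would rely on a library layer such as PyTorch's `torch.nn.LSTM` with `bidirectional=True`, whose per-step output likewise has 2 × hidden_size features.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Minimal LSTM cell with input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size + 1  # weights act on [x_t, h_{t-1}, 1 (bias)]
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                        for _ in range(4 * hidden_size)]

    def step(self, x, h_prev, c_prev):
        z = list(x) + list(h_prev) + [1.0]
        a = [sum(w * v for w, v in zip(row, z)) for row in self.weights]
        n = self.hidden_size
        i = [sigmoid(v) for v in a[0:n]]
        f = [sigmoid(v) for v in a[n:2 * n]]
        o = [sigmoid(v) for v in a[2 * n:3 * n]]
        g = [math.tanh(v) for v in a[3 * n:4 * n]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def run(self, seq):
        """Return the hidden state after each step of `seq`, starting from zeros."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        states = []
        for x in seq:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


def bilstm(seq, forward_cell, backward_cell):
    """Equation (12): h_t = [forward h_t, backward h_t] for every time step t."""
    fwd = forward_cell.run(seq)
    bwd = backward_cell.run(seq[::-1])[::-1]  # re-align backward states with index t
    return [hf + hb for hf, hb in zip(fwd, bwd)]


rng = random.Random(0)
seq = [[0.2, 0.5, 0.1], [0.4, 0.6, 0.3], [0.9, 0.1, 0.7]]  # T = 3 steps, 3 features
out = bilstm(seq, LSTMCell(3, 4, rng), LSTMCell(3, 4, rng))
print(len(out), len(out[0]))  # 3 8  (each step carries 2h = 8 values)
```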

3.3. PRGO-Based Hyperparameter Optimization

PRGO is a swarm intelligence optimization algorithm inspired by plant root growth behavior. Its core idea is to explore optimal solutions within the search space through a “guided growth + random exploration” mechanism. It consists of four processes: main root growth (exploitation), lateral shoot expansion (local exploration), random root search (global exploration), and root population competition (selection).

First, the main root grows toward the current best solution, as shown in Equation (13).

X_{i}^{t+1} = X_{i}^{t} + α (X_{best} - X_{i}^{t})

(13)

Here, X_best denotes the current best solution, and α is the step-size factor. Then, local exploration is performed by perturbing around the current solution, as shown in Equation (14).

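X_{i}^{t+1} = X_{i}^{t} + β \cdot \mathcal{N}(0, 1)

(14)

Here, β controls the perturbation magnitude and 𝒩(0, 1) is a standard normal random variable. To escape local optima, random root search re-samples individuals across the search space, providing global exploration, as shown in Equation (15).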

X_{i}^{t+1} = X_{\min} + \text{rand}(-1, 1) \cdot (X_{\max} - X_{\min})

(15)

Finally, the best individual is selected based on the fitness function, as shown in Equation (16).

X_{best} = \arg\min_{X_{i}} f(X_{i})

(16)
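The four PRGO processes can be illustrated on a toy continuous problem. The paper does not specify how the operators are scheduled, so this sketch assumes that each individual undergoes random root search (Equation (15)) with probability p_random and otherwise main root growth followed by lateral expansion (Equations (13) and (14)), and it models root competition as greedy parent-offspring replacement. Because Equation (15) with rand(−1, 1) can produce values below X_min, candidates are clipped to the bounds, mirroring the bound enforcement of Equation (21). The parameter values, the sphere objective, and all function names are hypothetical.

```python
import random


def clip(x, lo, hi):
    return [min(max(v, a), b) for v, a, b in zip(x, lo, hi)]


def main_root_growth(x, x_best, alpha):
    """Equation (13): X + alpha * (X_best - X)."""
    return [xi + alpha * (bi - xi) for xi, bi in zip(x, x_best)]


def lateral_expansion(x, beta, rng):
    """Equation (14): X + beta * N(0, 1), drawn independently per dimension."""
    return [xi + beta * rng.gauss(0.0, 1.0) for xi in x]


def random_root_search(lo, hi, rng):
    """Equation (15): X_min + rand(-1, 1) * (X_max - X_min), clipped to the bounds."""
    return clip([a + rng.uniform(-1.0, 1.0) * (b - a) for a, b in zip(lo, hi)], lo, hi)


def prgo_minimize(f, lo, hi, n_pop=5, iters=50, alpha=0.5, beta=0.1, p_random=0.2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(a, b) for a, b in zip(lo, hi)] for _ in range(n_pop)]
    best = min(pop, key=f)                                  # Equation (16)
    history = [f(best)]
    for _ in range(iters):
        new_pop = []
        for x in pop:
            if rng.random() < p_random:                     # global exploration
                cand = random_root_search(lo, hi, rng)
            else:                                           # exploitation + local search
                grown = main_root_growth(x, best, alpha)
                cand = clip(lateral_expansion(grown, beta, rng), lo, hi)
            # Competition (assumed greedy): the offspring replaces its parent only if fitter.
            new_pop.append(cand if f(cand) < f(x) else x)
        pop = new_pop
        best = min(pop + [best], key=f)                     # Equation (16), keeps best-so-far
        history.append(f(best))
    return best, history


def sphere(x):
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)


best, history = prgo_minimize(sphere, lo=[-5.0, -5.0], hi=[5.0, 5.0])
print(best, history[-1])
```

Because the best-so-far solution is always retained, the recorded best fitness can only stay the same or decrease across generations.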

The population is initialized by randomly sampling each individual within the search space, as shown in Equation (17).

X^{(0)} = \{x_{1}^{(0)}, x_{2}^{(0)}, \dots, x_{N}^{(0)}\}, \quad x_{i}^{(0)} = (lr_{i}^{(0)}, h_{i}^{(0)})

(17)

where N = 5 is the population size, lr_i^(0) ~ U[L_min, L_max] is uniformly sampled from the learning-rate interval, and h_i^(0) is uniformly sampled from the discrete set H.

Let the hyperparameter vector be denoted as θ = (lr, h), where lr is the learning rate and h is the number of BiLSTM hidden units. The search space is defined in Equation (18).

lr \in [L_{min}, L_{max}] = [5 \times 10^{-5}, 3 \times 10^{-4}], \quad h \in H = \{64, 96, 128, 160\}

(18)

where L_min and L_max are the lower and upper bounds of the learning rate, and H is the discrete set of allowable hidden-unit counts.

The objective of PRGO is to find the hyperparameters that minimize the training loss after a short pre-training of E_PRGO epochs. The loss for a candidate θ is computed as the mean squared error (MSE) over a mini-batch, and the optimal hyperparameters are given by Equation (19).

θ^{*} = \arg\min_{θ} \frac{1}{N_{batch}} \sum_{k=1}^{N_{batch}} (y_{k} - \hat{y}_{k}(θ))^{2}

(19)

where N_batch is the batch size, y_k is the true power value, and ŷ_k(θ) is the prediction of the TCN-BiLSTM model configured with hyperparameters θ. The fitness f(θ) of an individual is this mini-batch MSE, so θ* is the minimizer of f. The individual with the smallest fitness is recorded as the global best θ_best.

After the update rules in Equations (13)–(15) are applied, the following elitism and boundary constraints are enforced, as shown in Equations (20) and (21).

\text{if } f(θ_{i}^{(t+1)}) < f(θ_{best}), \text{ then } θ_{best} \leftarrow θ_{i}^{(t+1)}

(20)

θ_{1}^{(t+1)} = θ_{best}, \quad lr_{new} = \max(L_{min}, \min(L_{max}, lr_{rand}))

(21)

where θ_i^(t+1) denotes the i-th individual in the next generation, and lr_rand is the learning rate after the random root search (Equation (15)). Equation (20) updates the global best, and Equation (21) carries the best individual into the next generation (elitism) and keeps the learning rate within the predefined bounds.

In summary, the PRGO-based hyperparameter optimization process is visualized in Figure 6.

3.4. Construction of the TCN-BiLSTM-PRGO Model

The TCN-BiLSTM-PRGO model is a hybrid forecasting framework that integrates deep learning with an intelligent optimization algorithm. In this model, TCN is used for local temporal feature extraction, BiLSTM for modeling long- and short-term dependencies, and PRGO for automatic hyperparameter optimization, thereby improving overall prediction performance. The overall model is expressed in Equation (22).

\hat{y} = f_{FC}(f_{BiLSTM}(f_{TCN}(X)))

(22)

Here, f_TCN denotes local feature extraction, f_BiLSTM represents temporal dependency modeling, and f_FC refers to nonlinear mapping. The optimization objective is given in Equation (23).

\min_{θ} \frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}

(23)

PRGO is used to search for the optimal hyperparameters, as shown in Equation (24).

θ^{*} = \arg \min f (θ)

(24)

To make the coupling mathematically explicit, we define the input sequence and the output dimensions of each module. Let the input be a multivariate time series, as shown in Equation (25).

X = [x_{1}, x_{2}, \dots, x_{T}] \in ℝ^{T \times d_{in}}

(25)

where T = 96 is the look-back window length (i.e., 24 h at 15 min intervals) and d_in = 4 is the number of input features (wind speed, wind direction, pressure, and historical power). The TCN first extracts local temporal features, as shown in Equation (26).

H_{TCN} = TCN(X) \in ℝ^{T \times d_{tcn}}

(26)

where d_tcn = 64 is the output channel dimension of the TCN.

The resulting feature map is then fed into a BiLSTM layer, which processes the sequence in both forward and backward directions as shown in Equation (27).

H_{BiLSTM} = BiLSTM(H_{TCN}) \in ℝ^{T \times 2h}

(27)

where h is the number of hidden units per direction, a key hyperparameter optimized by PRGO. The hidden state at the last time step, h_last ∈ ℝ^{2h}, is extracted from H_BiLSTM and passed to a fully connected layer to produce the final prediction, as shown in Equation (28).

\hat{y} = W_{fc} h_{last} + b_{fc}

(28)

where W_fc ∈ ℝ^{1×2h} and b_fc ∈ ℝ are the learnable weights and bias of the output layer, and ŷ is the predicted wind power for the next time step (15 min ahead).
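The data flow of Equations (25)–(28) can be traced with the sketch below, which builds one-step-ahead training windows and lists the per-sample tensor shapes. It assumes each window contains only the 96 rows preceding the target step and that historical power is appended as the fourth input feature; h = 128 is merely one admissible value from H. The function names and the synthetic series are hypothetical.

```python
def make_windows(features, power, lookback=96):
    """Build (X, y) pairs for one-step-ahead forecasting.

    X[j] holds rows t - lookback, ..., t - 1, each row being the exogenous features
    plus historical power (so d_in = len(features[0]) + 1); y[j] is power at step t.
    """
    if len(features) != len(power):
        raise ValueError("features and power must have the same length")
    rows = [list(f) + [p] for f, p in zip(features, power)]
    X, y = [], []
    for t in range(lookback, len(rows)):
        X.append(rows[t - lookback:t])   # T x d_in window, Equation (25)
        y.append(power[t])               # target: the next 15 min step
    return X, y


def module_shapes(T=96, d_in=4, d_tcn=64, h=128):
    """Per-sample tensor shapes of Equations (25)-(28) (batch axis omitted)."""
    return {
        "X": (T, d_in),
        "H_TCN": (T, d_tcn),
        "H_BiLSTM": (T, 2 * h),
        "h_last": (2 * h,),
        "y_hat": (1,),
    }


# Hypothetical series: 200 steps of (wind speed, wind direction, pressure) and power.
feats = [[0.1 * (t % 10), 0.5, 0.3] for t in range(200)]
power = [0.01 * t for t in range(200)]
X, y = make_windows(feats, power)
print(len(X), len(X[0]), len(X[0][0]))   # 104 96 4
print(module_shapes()["H_BiLSTM"])        # (96, 256)
```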

The PRGO algorithm optimizes the hyperparameter set θ = (lr, h) over the search space defined in Equation (18) by minimizing the validation loss, as shown in Equation (29).

θ^{*} = \arg\min_{θ} \frac{1}{N_{val}} \sum_{i=1}^{N_{val}} (y_{i} - \hat{y}_{i}(θ))^{2}

(29)

where N_val is the number of samples in the validation set, y_i is the true value, and ŷ_i(θ) is the prediction of the TCN-BiLSTM model configured with θ. After the optimal hyperparameters are found, the final model is retrained on the entire training set for E_final = 100 epochs, as described in Algorithm 1.

Algorithm 1 TCN-BiLSTM-PRGO Procedure

Input:
    Wind power dataset D
    Population size N
    Maximum iterations T
    Search ranges: learning rate ∈ [lr_min, lr_max]; hidden units ∈ H
    Epochs for PRGO evaluation E1 (E_PRGO)
    Epochs for final training E2 (E_final)

Output:
    Optimized TCN-BiLSTM forecasting model

1.  Normalize dataset D using min-max normalization
2.  Construct time-series samples using a sliding window
3.  Split the dataset into a training set and a testing set
4.  Initialize the PRGO population: X_i = [lr_i, hidden_i], i = 1, 2, …, N
5.  FOR t = 1 TO T DO
6.      FOR each individual X_i DO
7.          Build a TCN-BiLSTM model with learning rate = lr_i and hidden units = hidden_i
8.          Train the model for E1 epochs on the training set
9.          Compute the fitness value: Fitness(X_i) = MSE loss
10.         Update the global best solution X_best
11.     END FOR
12.     FOR each individual X_i DO
13.         Update the learning rate: lr_new = lr + α(lr_best − lr) + β · rand(−1, 1) · lr
14.         Random exploration: IF rand < p THEN randomly reset lr and hidden units END IF
15.         Clip lr_new into the valid range [lr_min, lr_max]
16.         Randomly update the hidden units
17.         Generate the new individual X_i_new
18.     END FOR
19. END FOR
20. Obtain the optimal parameters lr_best, hidden_best
21. Build the final TCN-BiLSTM model using the optimal parameters
22. Train the final model on the training set for E2 epochs
23. Predict wind power on the testing set
24. Compute the evaluation metrics R², NMAE, and NRMSE
25. Return the forecasting results
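A runnable skeleton of the PRGO loop in Algorithm 1 is sketched below. Training a TCN-BiLSTM model is out of scope here, so `surrogate_fitness` is a hypothetical stand-in for steps 7–9; α, β, p, the number of iterations, and the rule used in step 16 (redrawing the hidden size with probability 0.5) are likewise assumptions, since Algorithm 1 does not fix them. The bounds and the set H follow Equation (18), elitism follows Equation (21), and N = 5 matches the population size given with Equation (17).

```python
import random

L_MIN, L_MAX = 5e-5, 3e-4        # learning-rate bounds from Equation (18)
H = [64, 96, 128, 160]           # allowed hidden-unit counts from Equation (18)


def surrogate_fitness(lr, hidden):
    """HYPOTHETICAL stand-in for steps 7-9 (build, train for E1 epochs, return MSE).

    A real run would train the TCN-BiLSTM model here; this smooth toy function
    only lets the search loop execute.
    """
    return ((lr - 1.5e-4) / 1e-4) ** 2 + abs(hidden - 128) / 128


def prgo_search(fitness, n_pop=5, iters=10, alpha=0.5, beta=0.2, p_reset=0.2, seed=0):
    rng = random.Random(seed)
    # Step 4 / Equation (17): lr ~ U[L_MIN, L_MAX], hidden drawn uniformly from H.
    pop = [(rng.uniform(L_MIN, L_MAX), rng.choice(H)) for _ in range(n_pop)]
    best, best_fit = None, float("inf")
    for _ in range(iters):                                    # step 5
        for lr, hidden in pop:                                # steps 6-11
            fit = fitness(lr, hidden)
            if fit < best_fit:                                # Equation (20)
                best, best_fit = (lr, hidden), fit
        new_pop = [best]                                      # elitism, Equation (21)
        for lr, hidden in pop[1:]:                            # steps 12-18
            # Step 13: move toward the best learning rate plus a relative perturbation.
            lr_new = lr + alpha * (best[0] - lr) + beta * rng.uniform(-1.0, 1.0) * lr
            if rng.random() < p_reset:                        # step 14: random exploration
                lr_new, hidden = rng.uniform(L_MIN, L_MAX), rng.choice(H)
            lr_new = max(L_MIN, min(L_MAX, lr_new))           # step 15 / Equation (21)
            if rng.random() < 0.5:                            # step 16 (assumed rule)
                hidden = rng.choice(H)
            new_pop.append((lr_new, hidden))
        pop = new_pop
    return best, best_fit                                     # step 20


best, best_fit = prgo_search(surrogate_fitness)
print(best, best_fit)
```

Replacing `surrogate_fitness` with a function that trains the model for E1 epochs and returns its MSE would turn this skeleton into the search described by Algorithm 1, at the cost of N × T training runs.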

The operational flow for predicting wind power using the TCN-BiLSTM-PRGO is plotted in Figure 7.

A standalone TCN lacks global dependency modeling capability, while a standalone LSTM is less effective at extracting local features. The TCN-BiLSTM model combines TCN for local temporal pattern capture with BiLSTM for global bidirectional dependency modeling, improving feature extraction efficiency and prediction accuracy. In addition, PRGO automatically searches for suitable hyperparameters (the learning rate and the number of hidden units), reducing the instability and risk of local optima associated with manual tuning and enabling efficient modeling of complex wind power data.

3.5. Performance Evaluation Metrics

This study uses the coefficient of determination (R²), NMAE, and NRMSE as evaluation metrics to appraise the performance of the wind power forecasting model, as shown in Equations (30)–(32).

NMAE = \frac{1}{n} \sum_{i=1}^{n} \frac{|y_{i} - \hat{y}_{i}|}{P_{max}}

(30)

NRMSE = \frac{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}}{P_{max}}

(31)

R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}

(32)

Here, n is the total number of observations; y_i is the actual wind power of the i-th sample; ŷ_i is the corresponding predicted value; ȳ is the mean of all actual power values; and P_max is the maximum power used to normalize the errors.
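For reference, Equations (30)–(32) translate directly into code. The sketch below returns NMAE and NRMSE as fractions of P_max (multiply by 100 for percentages); the toy values and P_max = 100 are hypothetical.

```python
import math


def forecast_metrics(y_true, y_pred, p_max):
    """NMAE and NRMSE (Equations (30)-(31), as fractions of p_max) and R^2 (Equation (32))."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    nmae = sum(abs(e) for e in errors) / n / p_max
    nrmse = math.sqrt(sum(e * e for e in errors) / n) / p_max
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    r2 = 1 - ss_res / ss_tot
    return nmae, nrmse, r2


y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 30.0, 44.0]
nmae, nrmse, r2 = forecast_metrics(y_true, y_pred, p_max=100.0)
print(f"NMAE = {100 * nmae:.2f}%, NRMSE = {100 * nrmse:.2f}%, R2 = {r2:.3f}")
# NMAE = 2.00%, NRMSE = 2.45%, R2 = 0.952
```

Because NRMSE squares the errors before averaging, the single 4-unit error dominates it more than it does NMAE, which is why the two normalized metrics are reported together.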

In summary, the overall framework of this study, including data preprocessing, DBSCAN-based noise detection, least-squares correction, and the TCN-BiLSTM-PRGO wind power forecasting model, is illustrated in Figure 8.

4. Experimental Results

4.1. Data Description and Parameter Settings

The wind power data used in this study are sourced from the actual operational records of 126 wind turbines at a wind farm in Gansu Province, comprising 101 units with a nameplate capacity of 1.5 MW and 25 units with a nameplate capacity of 2 MW. The data cover the period from 1 July 2020 to 1 September 2020 at a sampling interval of 15 min.

During data preprocessing, the average wind speed (m/s), wind direction, and pressure (hPa) are selected as input variables for the forecasting model. After outage records and missing samples are removed, a total of 8219 valid data points remain. The distribution of the processed data is shown in Figure 9.

4.2. Feature Sensitivity Analysis

Section 2.1 employed the Spearman correlation coefficient to identify the meteorological variables most strongly associated with wind power. The results indicated that wind speed has the highest correlation (ρ ≈ 0.85), followed by wind direction and pressure (ρ ≈ 0.30 and 0.25, respectively), while temperature and humidity exhibit very weak correlations. Based on this quantitative assessment, wind speed, wind direction, and pressure were selected as input features for the forecasting model.

To validate this feature selection, we performed an ablation study using the proposed TCN-BiLSTM-PRGO model on a fixed chronological split (80% training, 20% testing). The model was trained and evaluated on four feature sets: (1) all three features, (2) without pressure, (3) without wind speed, and (4) without wind direction. Table 1 reports the performance on the test set.

Removing wind speed causes the most severe performance degradation: R² drops from 0.940 to 0.929, while NMAE and NRMSE increase by 0.52 and 0.60 percentage points, respectively. This confirms that wind speed is the most important predictor, consistent with its high correlation coefficient. Removing wind direction also reduces performance, though to a lesser extent (R² = 0.938, NMAE = 4.252%, NRMSE = 7.333%), indicating that wind direction provides useful but secondary information. Interestingly, removing pressure yields a slight improvement (R² = 0.943, NMAE = 4.006%). A plausible explanation is the weak correlation between pressure and power (ρ ≈ 0.25) combined with the limited three-month dataset: a weak feature may introduce noise or cause mild overfitting. Nevertheless, we retain pressure in the final model for two reasons. First, from a physical standpoint, pressure affects air density and thus wind energy conversion, and its importance may become more pronounced under extreme weather or at longer forecasting horizons. Second, the gain from removing pressure is marginal (0.003 in R²) and practically negligible. Overall, the ablation study supports the feature selection made in Section 2.1 and justifies using wind speed, wind direction, and pressure as inputs.
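The changes relative to the full feature set can be recomputed directly from Table 1. The snippet below does only that arithmetic; the values are copied from Table 1 and no new results are introduced.

```python
# Table 1 values: feature set -> (R², NMAE in %, NRMSE in %)
TABLE_1 = {
    "all": (0.940, 4.250, 7.275),
    "without pressure": (0.943, 4.006, 7.066),
    "without wind speed": (0.929, 4.773, 7.875),
    "without wind direction": (0.938, 4.252, 7.333),
}


def change_vs_full(feature_set):
    """Change in (R², NMAE, NRMSE) relative to the full feature set, rounded to 3 decimals."""
    return tuple(round(v - ref, 3) for v, ref in zip(TABLE_1[feature_set], TABLE_1["all"]))


for name in TABLE_1:
    if name != "all":
        print(name, change_vs_full(name))
```

NMAE and NRMSE are already percentages, so their differences are in percentage points.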

4.3. Experimental Results and Comparison

A total of 8219 data samples are used in this study. According to Equation (4), min_samples is set to approximately 9. With min_samples = 9, the K-distance curve is used to determine ε, as shown in Figure 10.
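Equation (4) is not reproduced in this section, but the value min_samples ≈ 9 for n = 8219 is consistent with the heuristic min_samples ≈ ln(n) (ln 8219 ≈ 9.01); that form is assumed in the sketch below. The k-distance curve sorts, over all samples, the distance from each sample to its k-th nearest neighbor, and the knee of that curve is a usual starting point for ε. The brute-force distance matrix keeps the example dependency-light, but for 8219 points the float64 matrix alone takes about 0.54 GB; a tree-based neighbor search such as scikit-learn's `NearestNeighbors` is more practical at that size. Conventions differ on whether a point counts toward its own min_samples, so k is left as a parameter.

```python
import math

import numpy as np


def min_samples_from_n(n):
    """Assumed form of Equation (4): min_samples ≈ ln(n), rounded to an integer."""
    return round(math.log(n))


def k_distance_curve(X, k):
    """Ascending distances from each point to its k-th nearest other point."""
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # pairwise, O(n^2)
    dist.sort(axis=1)          # column 0 holds each point's zero distance to itself
    return np.sort(dist[:, k])


print(min_samples_from_n(8219))  # 9
```

Plotting `k_distance_curve(X, 9)` against the sample index and reading off the knee gives a candidate ε, which can then be refined by inspecting the noise ratio.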

Figure 10 illustrates the core behavior of DBSCAN: a larger ε makes clusters easier to form and yields fewer noise points. A clear inflection region appears between 0.075 and 0.125, where the noise ratio falls from 20% to 8% and then below 5%, after which the curve remains relatively stable. The suitable range of ε is therefore taken as 0.075–0.125, since it removes obvious outliers while preserving most normal data and avoiding over-cleaning. Within this range, ε values of 0.08, 0.10, and 0.12 are compared, and the corresponding noise detection results are presented in Figure 11.
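Because DBSCAN labels a point as noise exactly when it is neither a core point nor within ε of one, the noise ratio plotted against ε can be computed without running the full cluster expansion. The sketch below does this with NumPy under two assumptions: the features are already scaled to comparable ranges (the scaling is not stated in this section), and, as in scikit-learn's DBSCAN, a point counts toward its own neighborhood. `dbscan_noise_ratio` is a hypothetical helper, and the demo data are synthetic.

```python
import numpy as np


def dbscan_noise_ratio(X, eps, min_samples):
    """Fraction of points DBSCAN would label as noise for the given eps and min_samples."""
    X = np.asarray(X, dtype=float)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))  # O(n^2) memory
    within = dist <= eps                          # eps-neighborhoods, each point included
    is_core = within.sum(axis=1) >= min_samples   # core points have dense neighborhoods
    # A point is not noise if some core point (possibly itself) lies within eps of it.
    reached = (within & is_core[None, :]).any(axis=1)
    return float(np.mean(~reached))


# Synthetic, scaled (speed, power) pairs used only to show the sweep over Figure 11's eps values.
rng = np.random.default_rng(0)
speed = rng.uniform(0.0, 1.0, 300)
power = np.clip(speed ** 3 + rng.normal(0.0, 0.02, 300), 0.0, 1.0)
X_demo = np.column_stack([speed, power])
for eps in (0.08, 0.10, 0.12):
    print(eps, dbscan_noise_ratio(X_demo, eps, min_samples=9))
```

Sweeping ε this way reproduces the shape of the trade-off described above: small ε marks many border-region samples as noise, large ε absorbs genuine outliers into clusters.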

As shown in Figure 11, the dataset contains many scattered outliers, with pronounced fluctuations in the medium-to-high wind speed range (approximately 8–15 m/s). These outliers deviate significantly from the physical wind speed-power relationship and, if not properly handled, may severely interfere with model training. When ε = 0.08, the neighborhood radius is relatively small, so samples struggle to form effective density connections; many border points are then misclassified as noise and the noise ratio is high. This setting over-cleans the data: normal samples may be removed, distorting the data distribution. When ε = 0.12, the neighborhood radius becomes considerably larger and many true outliers are absorbed into clusters. The noise ratio drops markedly, indicating under-cleaning: a considerable amount of anomalous data remains and degrades the accuracy of subsequent model training. When ε = 0.10, the noise ratio is approximately 5%, striking a good balance between anomaly detection and data retention. Considering both denoising performance and data preservation, ε = 0.10 is therefore selected, as it removes clear anomalies while avoiding the information loss caused by over-cleaning.

With min_samples = 9 and ε = 0.10, the noise points detected by DBSCAN are corrected using least-squares regression; the results are shown in Figure 12.

As shown in Figure 12, the corrected data converge toward the main wind speed-power trend and form a smoother pattern that better conforms to physical laws. Compared with direct outlier removal, this approach avoids the data sparsity and information loss caused by deleting samples. Moreover, the corrected data match the true wind power curve more closely, reducing the impact of abnormal fluctuations on subsequent model training.
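This section does not give the regression form used for the least-squares correction. The sketch below assumes a cubic polynomial power curve in wind speed, fitted by ordinary least squares (`numpy.polyfit`) on the samples DBSCAN kept, and replaces only the power values of flagged samples with the fitted curve; the optional clipping to [0, P_max] is an extra assumption, and `correct_outliers` is a hypothetical name.

```python
import numpy as np


def correct_outliers(speed, power, noise_mask, degree=3, p_max=None):
    """Replace the power of DBSCAN-flagged samples with a least-squares power-curve fit.

    The polynomial form and degree are assumptions; only non-noise samples are used for fitting.
    """
    speed = np.asarray(speed, dtype=float)
    power = np.asarray(power, dtype=float)
    noise_mask = np.asarray(noise_mask, dtype=bool)
    coeffs = np.polyfit(speed[~noise_mask], power[~noise_mask], degree)  # ordinary least squares
    fitted = np.polyval(coeffs, speed[noise_mask])
    if p_max is not None:
        fitted = np.clip(fitted, 0.0, p_max)  # keep corrected values within physical bounds (assumption)
    corrected = power.copy()
    corrected[noise_mask] = fitted            # only flagged samples change
    return corrected


# Illustrative data: an exact cubic curve with two corrupted samples flagged as noise.
sp = np.linspace(3.0, 12.0, 20)
clean = 0.5 * sp ** 3
noisy = clean.copy()
noisy[[5, 10]] += 200.0
mask = np.zeros(20, dtype=bool)
mask[[5, 10]] = True
print(np.allclose(correct_outliers(sp, noisy, mask), clean))  # True
```

Keeping every timestamp, rather than deleting flagged rows, is what preserves the continuity of the series that the sliding-window models below rely on.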

After DBSCAN-based anomaly detection and least-squares correction, the processed dataset is used for the wind power forecasting experiments. Prediction results and errors are analyzed at the 12 h, 24 h, and 48 h forecasting horizons. To examine the effectiveness of the proposed TCN-BiLSTM-PRGO model, benchmark models are built with the same data inputs and training settings, including a TCN model, a unidirectional LSTM model, and a TCN-BiLSTM hybrid model; TCN-Transformer and TCN-BiLSTM-WOA models are also included in the comparisons of Tables 3–5. For all models, the inputs are multivariate time series with a window length of 96 time steps (24 h at the 15 min sampling interval) and an input feature dimension of 4: average wind speed, wind direction, pressure, and historical power. The output layer contains a single neuron that predicts future wind power. The LSTM model uses a single layer with 64 hidden units and is trained for 100 epochs with the Adam optimizer and an initial learning rate of 0.001. The TCN model uses convolutional channel sizes of 32, 32, and 64, and its stacked dilated convolutions progressively expand the receptive field to capture local dynamic features at different temporal scales. In the TCN-BiLSTM model, the TCN first extracts local temporal features, which are then fed into a BiLSTM network for further modeling. On this basis, PRGO is incorporated to adaptively optimize the key hyperparameters of the TCN-BiLSTM model. The parameter settings of PRGO are presented in Table 2.
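The input layout described above, 96 past steps of four features predicting one power value, can be sketched as a sliding window. The column order (speed, direction, pressure, power), the use of the last column as the target, and a one-step-ahead target (horizon = 1) are assumptions; this section does not state how the 12 h, 24 h, and 48 h horizons map onto the single output neuron. The second helper computes the receptive field of a TCN built from residual blocks with two dilated causal convolutions each, as in Bai et al.; the kernel size and the dilations (1, 2, 4) for the three levels are also assumptions, since only the channel sizes are given.

```python
import numpy as np


def make_windows(series, window=96, horizon=1, target_col=-1):
    """Slice a (time, features) array into inputs (samples, window, features) and targets.

    Sample i uses rows i .. i + window - 1 as input and row i + window + horizon - 1
    of `target_col` as its target.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window and horizon")
    X = np.stack([series[i:i + window] for i in range(n_samples)])
    start = window + horizon - 1
    y = series[start:start + n_samples, target_col]
    return X, y


def tcn_receptive_field(kernel_size, dilations, convs_per_block=2):
    """Receptive field (time steps) of stacked residual blocks of dilated causal convolutions."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


# Hypothetical data: 8219 time-ordered rows of (speed, direction, pressure, power).
data = np.random.default_rng(0).random((8219, 4))
X, y = make_windows(data)
print(X.shape, y.shape)                   # (8123, 96, 4) (8123,)
print(tcn_receptive_field(3, [1, 2, 4]))  # 29
```

Under these assumptions a kernel size of 3 gives a 29-step receptive field and a kernel size of 7 gives 85 steps, both shorter than the 96-step window; the BiLSTM stage still sees the full sequence of TCN outputs.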

The convergence curve of the PRGO process obtained under the above parameter settings is shown in Figure 13.

The PRGO loss curve shows a rapid decline, followed by stage-wise plateaus and further decreases. In the initial stage (iterations 1–2), the loss drops sharply to approximately 0.0015, indicating strong global search capability. It then remains relatively stable during iterations 4–6 and 7–14, suggesting a shift toward local search and gradual convergence toward a local optimum. Small additional reductions at iterations 6–7 and 14–15 suggest that the algorithm can escape local optima and continue exploring the solution space. After iteration 15, the curve stabilizes, indicating convergence. Overall, PRGO combines fast convergence with effective global search on this problem.
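The curve in Figure 13 only stays flat or drops. A common way to obtain such a curve for a population-based optimizer, and an assumption here about how Figure 13 was drawn, is to record the best fitness found so far at each iteration. The sketch below shows that bookkeeping with the Table 2 budget (N = 5, T = 20, so a loop of this shape evaluates N × T = 100 candidates) and search space (learning rate in [5 × 10⁻⁵, 3 × 10⁻⁴], hidden units in {64, 96, 128, 160}). The candidate update is a plain Gaussian perturbation standing in for PRGO's growth and perturbation operators, which are not reproduced here, and `toy_loss` is a hypothetical placeholder for training and validating a TCN-BiLSTM.

```python
import math
import random

LR_LOW, LR_HIGH = 5e-5, 3e-4          # learning-rate range from Table 2
HIDDEN_CHOICES = [64, 96, 128, 160]   # hidden-unit options from Table 2
N, T = 5, 20                          # population size and max iterations from Table 2


def decode(position):
    """Map a position in [0, 1]^2 to (learning rate, hidden units).

    Log-scale interpolation for the learning rate and equal-width bins for the
    hidden units are assumptions about the encoding.
    """
    u, v = position
    lr = math.exp(math.log(LR_LOW) + u * (math.log(LR_HIGH) - math.log(LR_LOW)))
    idx = min(int(v * len(HIDDEN_CHOICES)), len(HIDDEN_CHOICES) - 1)
    return lr, HIDDEN_CHOICES[idx]


def toy_loss(lr, hidden):
    """Hypothetical stand-in for training a TCN-BiLSTM and returning its validation loss."""
    return 1e-3 + 1e-3 * abs(math.log10(lr / 1.5e-4)) + 1e-6 * abs(hidden - 128)


def _clip01(x):
    return min(max(x, 0.0), 1.0)


def search(seed=0):
    """Best-so-far bookkeeping over N x T evaluations; the update rule is NOT PRGO's."""
    rng = random.Random(seed)
    population = [(rng.random(), rng.random()) for _ in range(N)]
    best_pos, best_loss, curve = None, math.inf, []
    for _ in range(T):
        for pos in population:
            loss = toy_loss(*decode(pos))
            if loss < best_loss:
                best_pos, best_loss = pos, loss
        curve.append(best_loss)  # a best-so-far curve can only stay flat or fall
        # Stand-in update: Gaussian perturbations around the current best position.
        population = [(_clip01(best_pos[0] + rng.gauss(0.0, 0.1)),
                       _clip01(best_pos[1] + rng.gauss(0.0, 0.1))) for _ in range(N)]
    return decode(best_pos), curve


(best_lr, best_hidden), loss_curve = search()
print(best_lr, best_hidden, loss_curve[-1])
```

Plateaus in such a curve correspond to iterations in which no candidate beats the incumbent, and each step down marks a new best solution.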

To verify the effectiveness of DBSCAN-based noise detection and least-squares correction in improving power forecasting accuracy, Figure 14a,b present comparisons between predicted and actual values before and after DBSCAN processing, respectively.

From the overall distribution, the unprocessed data in Figure 14a shows a high degree of scatter, with many points deviating from the diagonal line. In particular, noticeable prediction errors appear in the medium-to-high power range, indicating that anomalous samples in the raw data interfere with model fitting.

In Figure 14b, after DBSCAN-based detection and correction, the data points are more densely concentrated around the diagonal line, with significantly reduced dispersion and fewer outliers, indicating an effective improvement in data quality. Meanwhile, the regression fitting line aligns more closely with the ideal diagonal, demonstrating enhanced consistency between model outputs and ground truth.

Overall, combining DBSCAN with least-squares correction reduces the impact of anomalous data on model training and improves the consistency of the data distribution, thereby improving the accuracy and reliability of wind power forecasting.

Figures 15–17 compare the predicted and observed power curves for the 12 h, 24 h, and 48 h horizons, respectively (seed = 42).

To evaluate the robustness and statistical reliability of the proposed model, five independent experiments were conducted for all compared models using different random seeds: [42, 52, 62, 72, 82]. The random seeds affect neural network weight initialization, dropout operations, and mini-batch shuffling during training. For each model, the mean ± standard deviation of all evaluation metrics across the five runs is reported in Tables 3–5.
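The seed protocol and the mean ± std summary of Tables 3–5 can be sketched as follows. Whether the tables use the sample (n − 1) or population (n) standard deviation is not stated; the sample form is assumed. Only Python's and NumPy's generators are seeded here; the training framework (not named in this section) would need its own seeding call, and GPU kernels may still be nondeterministic. `train_and_evaluate` in the commented loop is hypothetical.

```python
import random
import statistics

import numpy as np

SEEDS = [42, 52, 62, 72, 82]  # the five seeds listed above


def set_seeds(seed):
    """Seed Python's and NumPy's global generators (a DL framework needs its own call)."""
    random.seed(seed)
    np.random.seed(seed)


def summarize(values):
    """Format per-seed results as 'mean ± std' using the sample standard deviation (n - 1)."""
    return f"{statistics.mean(values):.3f} ± {statistics.stdev(values):.3f}"


# Sketch of the experiment loop; train_and_evaluate is a hypothetical function
# that trains one model and returns a test metric for that run.
# scores = []
# for seed in SEEDS:
#     set_seeds(seed)
#     scores.append(train_and_evaluate(seed))
# print(summarize(scores))
```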

Among all compared models, the proposed TCN-BiLSTM-PRGO model achieves the best predictive performance at the 12 h forecasting horizon, with the highest mean R² (0.942) and the lowest NMAE (6.014%) and NRMSE (7.539%). The standard deviations of all metrics are also small (e.g., ±0.006 for R²), indicating good robustness and training stability under different random initializations.

Compared with TCN-BiLSTM-WOA, the second-best model in terms of NMAE, the proposed method improves R² from 0.934 to 0.942 while reducing NMAE and NRMSE by approximately 0.37 and 0.49 percentage points, respectively. Although WOA also improves on the baseline TCN-BiLSTM model, PRGO consistently achieves better results, suggesting a more effective balance between global exploration and local exploitation during hyperparameter optimization.

In contrast, the TCN-Transformer model exhibits relatively poor performance, with a mean R² of only 0.819 and a relatively large standard deviation of 0.076. This may be attributed to the relatively limited training data, which can restrict the ability of Transformer-based architectures to fully learn long-range attention patterns.

As the forecasting horizon increases to 24 h, the prediction difficulty becomes significantly higher for all models. Nevertheless, the proposed TCN-BiLSTM-PRGO model still achieves the best overall performance, with a mean R² of 0.791, NMAE of 8.248%, and NRMSE of 13.387%.

Compared with TCN-BiLSTM-WOA, PRGO improves the mean R² from 0.760 to 0.791 while reducing NMAE from 8.736% to 8.248%. Compared with the conventional LSTM model, PRGO improves R² by approximately 0.060 and reduces NMAE by nearly 1 percentage point (from 9.176% to 8.248%). These results indicate that the proposed optimization strategy remains effective under medium-term forecasting conditions.

Furthermore, the standard deviations of the PRGO model remain comparable to those of the baseline methods, indicating stable predictive performance across different random seeds.

Under the more challenging 48 h forecasting horizon, the proposed TCN-BiLSTM-PRGO model continues to achieve the best mean predictive accuracy among all compared methods, with an R² of 0.833, NMAE of 6.805%, and NRMSE of 11.312%.

Compared with TCN-BiLSTM-WOA, PRGO further improves R² from 0.806 to 0.833 while reducing both NMAE and NRMSE. However, the standard deviation of R² for PRGO (±0.096) is about 2.5–3 times that of the baseline models (±0.032 to ±0.038), so its R² varies more across seeds at this horizon; its NMAE and NRMSE standard deviations (±0.002 and ±0.007) are nevertheless the smallest among all models. This indicates that the proposed optimization strategy retains strong predictive capability at longer forecasting horizons.

Overall, the proposed TCN-BiLSTM-PRGO framework shows clear and consistent advantages over all baseline models across the forecasting horizons considered. Combining TCN for local temporal feature extraction, BiLSTM for bidirectional sequence modeling, and PRGO for hyperparameter optimization improves forecasting accuracy and robustness. Although the additional optimization process lengthens training (about 824–863 s per run, versus 21–136 s for the non-optimized baselines in Tables 3–5), the overhead remains acceptable for offline wind power forecasting applications.

4.4. Statistical Significance Analysis

To further verify whether the observed performance improvements are statistically reliable rather than caused by random fluctuations, statistical significance tests were conducted based on the repeated experimental results under the 48 h forecasting horizon.

Using the NMAE results obtained from the five independent runs, paired t-tests and Wilcoxon signed-rank tests were performed between the proposed TCN-BiLSTM-PRGO model and each baseline model. Table 6 reports the corresponding p-values.

From the p-values presented in Table 6, the proposed TCN-BiLSTM-PRGO model significantly outperforms all baseline models at the 48 h forecasting horizon. For TCN, LSTM, and TCN-BiLSTM, both the paired t-test and the Wilcoxon signed-rank test yield extremely small p-values (all far below 0.001), indicating highly significant improvements. For TCN-Transformer, both tests give p = 0.025, which is still significant at the 0.05 level.

Compared with TCN-BiLSTM-WOA, the proposed PRGO model is also significantly better, with t-test and Wilcoxon p-values of 0.042 and 0.046, respectively. Although WOA improves forecasting performance to some extent, PRGO still shows better optimization capability in the more challenging long-term forecasting scenario.

These results indicate that, at the 48 h horizon, the performance improvements achieved by the proposed method are statistically reliable rather than caused by random initialization or stochastic training fluctuations. The PRGO-based optimization framework therefore shows good robustness for wind power forecasting tasks.
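According to the text, Table 6 pairs the five per-seed NMAE values of the proposed model with those of each baseline. The sketch below shows, in plain Python, how the paired t statistic and an exact two-sided Wilcoxon signed-rank p-value can be computed for such pairs; it is an illustration, not the authors' code, and the NMAE values in the usage lines are hypothetical. One property is worth keeping in mind when reading p-values from five runs: with five nonzero paired differences there are only 2⁵ = 32 equally likely sign patterns under the null hypothesis, so the smallest attainable exact two-sided Wilcoxon p-value is 2/32 = 0.0625. Converting the t statistic into a p-value requires the Student t distribution with n − 1 degrees of freedom (for example via `scipy.stats.t.sf`), which is omitted to keep the sketch dependency-free.

```python
import math
import statistics
from itertools import product


def paired_t_statistic(x, y):
    """Paired t statistic for matched runs; compare with Student t on len(x) - 1 d.o.f."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def _average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_exact_two_sided(x, y):
    """Exact two-sided Wilcoxon signed-rank p-value by enumerating all sign patterns."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are discarded
    if not diffs:
        return 1.0
    ranks = _average_ranks([abs(d) for d in diffs])
    w_obs = sum(r for r, d in zip(ranks, diffs) if d > 0)  # sum of positive-difference ranks
    n_low = n_high = 0
    for signs in product((False, True), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        n_low += w <= w_obs + 1e-9
        n_high += w >= w_obs - 1e-9
    return min(1.0, 2 * min(n_low, n_high) / 2 ** len(ranks))


# Hypothetical per-seed NMAE values (%), for illustration only.
baseline = [7.13, 7.12, 7.14, 7.11, 7.13]
proposed = [6.80, 6.81, 6.80, 6.79, 6.81]
print(paired_t_statistic(baseline, proposed))
print(wilcoxon_exact_two_sided(baseline, proposed))  # 0.0625, the floor for five pairs
```

Enumerating all 2ⁿ sign patterns is only practical for small n, which is exactly the regime of a five-seed comparison.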

4.5. Temporal Generalization Evaluation

To further evaluate the generalization capability of the proposed method on truly unseen future data, a strict chronological split strategy was adopted. The preprocessed dataset was divided chronologically into a training set containing the first 80% of samples and a test set containing the remaining 20% future samples. This setting ensures that all testing samples come from future time periods never observed during training.
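A chronological split differs from a random split only in that samples are never shuffled: the first 80% of rows train the model and the last 20% test it. The sketch below also fits min-max scaling on the training block only, a common precaution against leaking test-period statistics that this section does not mention, so it is an assumption rather than part of the described protocol. The function names are hypothetical.

```python
import numpy as np


def chronological_split(data, train_frac=0.8):
    """Split time-ordered rows into a leading training block and a trailing test block."""
    data = np.asarray(data, dtype=float)
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]


def minmax_from_train(train, test):
    """Scale both blocks with min/max taken from the training block only (assumed precaution)."""
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (train - lo) / span, (test - lo) / span


rows = np.arange(8219 * 4, dtype=float).reshape(8219, 4)  # placeholder for 8219 time-ordered samples
train, test = chronological_split(rows)
train_s, test_s = minmax_from_train(train, test)
print(len(train), len(test))  # 6575 1644
```

Scaled test values can fall outside [0, 1] when the future period contains conditions not seen in training, which is part of what a chronological test is meant to expose.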

All compared models, including TCN, LSTM, TCN-BiLSTM, TCN-Transformer, TCN-BiLSTM-WOA, and the proposed TCN-BiLSTM-PRGO model, were trained and evaluated under the same chronological split condition. The corresponding results are presented in Table 7.

As shown in Table 7, the proposed TCN-BiLSTM-PRGO model achieves the best overall performance on unseen future data, with the highest R² value of 0.940 and the lowest NMAE (4.224%) and NRMSE (7.232%).

Compared with the second-best model, LSTM, PRGO improves R² from 0.935 to 0.940 while reducing NMAE and NRMSE by approximately 5.4% and 3.6% in relative terms (0.24 and 0.27 percentage points), respectively. PRGO also outperforms both the conventional TCN-BiLSTM model and its WOA-optimized variant, whose R² values remain around 0.926.

Although WOA improves the baseline model to some extent, it still fails to match the predictive performance of PRGO, further demonstrating the effectiveness of the proposed optimization strategy. Meanwhile, the TCN-Transformer model exhibits the weakest generalization performance, likely because Transformer-based architectures generally require larger datasets to effectively learn long-range attention relationships.

Overall, the proposed PRGO-optimized framework maintains strong predictive accuracy under strict chronological testing conditions, demonstrating excellent temporal generalization capability for real-world wind power forecasting applications.

Figure 18 illustrates the prediction curves of all compared models on the unseen future test set under the chronological split.

The true power values are shown in black, and the predictions of TCN-BiLSTM-PRGO are highlighted. The proposed PRGO-optimized model closely follows the actual power fluctuations across the entire test period, capturing both the general trend and local variations. In contrast, the other models (TCN, LSTM, TCN-BiLSTM, TCN-Transformer, and TCN-BiLSTM-WOA) deviate more noticeably from the ground truth, especially during periods of rapid power change. This visual comparison is consistent with the quantitative results in Table 7 and illustrates the superior tracking ability and generalization performance of the proposed TCN-BiLSTM-PRGO model on unseen future data.

5. Discussion

The proposed TCN-BiLSTM-PRGO model introduces additional computational overhead because of PRGO's iterative hyperparameter optimization (more than 800 s per training run). This cost is acceptable for offline training, where high accuracy takes priority over real-time constraints. For very large wind farms with hundreds of turbines and multi-year data, training time may increase proportionally. Several strategies can mitigate this: (1) parallelization, since PRGO individuals within an iteration are independent and can be evaluated in parallel on multiple GPUs/CPUs; (2) early stopping, which truncates a fitness evaluation when the validation error stalls; (3) surrogate-assisted optimization, in which a lightweight proxy model replaces expensive retraining during early iterations; and (4) transfer learning, in which a model pre-trained on a subset of turbines is fine-tuned for the others, substantially reducing overall training time. With these engineering measures, the proposed method remains feasible for large-scale applications.
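Strategies (1) and (2) can be sketched together. The code below is a hypothetical illustration, not the authors' implementation: `toy_val_losses` stands in for the per-epoch validation losses of a TCN-BiLSTM candidate, the patience and min_delta values are arbitrary, and a thread pool is used only to keep the sketch self-contained; real training jobs would more likely be dispatched to separate processes or GPUs.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def early_stopped_best(val_losses, patience=3, min_delta=1e-4):
    """Return (best validation loss, epochs run), stopping after `patience` epochs without improvement."""
    best, stall, epochs = math.inf, 0, 0
    for loss in val_losses:
        epochs += 1
        if loss < best - min_delta:
            best, stall = loss, 0
        else:
            stall += 1
            if stall >= patience:
                break
    return best, epochs


def toy_val_losses(lr, hidden, max_epochs=100):
    """Hypothetical stand-in for per-epoch validation losses of one candidate model."""
    base = abs(math.log10(lr) + 3.8) * 0.01 + abs(hidden - 128) / 1e4
    for epoch in range(max_epochs):
        yield base + 0.05 / (1 + epoch)


def fitness(candidate):
    lr, hidden = candidate
    return early_stopped_best(toy_val_losses(lr, hidden))


def evaluate_population(population, workers=4):
    """Evaluate independent candidates concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))


population = [(1e-4, 128), (3e-4, 64), (5e-5, 160), (1.6e-4, 96), (1e-4, 160)]
results = evaluate_population(population)
best_index = min(range(len(results)), key=lambda i: results[i][0])
print(best_index, results[best_index])
```

Because `pool.map` returns results in population order, the optimizer's selection logic is unchanged by parallel evaluation, and early stopping caps each candidate's cost well below the full epoch budget.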

6. Conclusions

This study makes three main contributions to wind power forecasting.

First, we propose a data cleaning approach that integrates DBSCAN-based anomaly detection with least-squares regression correction. Unlike conventional methods that simply discard outliers, this strategy preserves data continuity and improves the quality of training data.

Second, we develop a hybrid deep learning architecture, TCN-BiLSTM, which combines the local feature extraction capability of TCN with the bidirectional temporal dependency modeling of BiLSTM.

Third, we introduce the Plant Root Growth Optimization (PRGO) algorithm to automatically optimize the key hyperparameters (learning rate and hidden units) of the TCN-BiLSTM model. PRGO balances global exploration and local exploitation, which helps it avoid the local optima commonly encountered in manual tuning or other metaheuristic algorithms.

Experimental validation on real wind farm data over multiple forecasting horizons (12 h, 24 h, 48 h) shows that the proposed TCN-BiLSTM-PRGO model consistently outperforms all baselines (TCN, LSTM, TCN-BiLSTM, TCN-Transformer, and TCN-BiLSTM-WOA) in terms of mean R², NMAE, and NRMSE. Statistical significance tests at the 48 h horizon (p < 0.05) indicate that the improvements are not due to random fluctuations. A chronological split experiment further verifies the model's strong generalization to unseen future data (R² = 0.940).

Future work will focus on reducing the computational cost of PRGO for mega-scale wind farms, extending the framework to other renewable energy forecasting tasks (e.g., solar PV), and incorporating probabilistic forecasting to quantify prediction uncertainty.

Author Contributions

Methodology, M.L. and Z.L.; validation, J.Y.; formal analysis, C.L.; investigation, Y.Z.; writing—original draft, Z.L. and M.L.; writing—review and editing, C.Z.; supervision, J.Y.; project administration, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China under Grant 2023YFB4203200, the Natural Science Basic Research Program of Shaanxi (2024JC-YBMS-434, 2025JC-YBMS-504), and the Key Research and Development Program of Xianyang City (S2025-ZDYF-GDZB-4806).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Jiawei Yu, Chao Luo and Yihua Zhu are employed by China Southern Power Grid (China). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Yang, Y.; Lou, H.; Wu, J.; Zhang, S.; Gao, S. A survey on wind power forecasting with machine learning approaches. Neural Comput. Appl. 2024, 36, 12753–12773.
Rajaperumal, T.A.; Christopher Columbus, C. Enhanced wind power forecasting using machine learning, deep learning models and ensemble integration. Sci. Rep. 2025, 15, 20572.
Haq, I.U.; Kumar, A.; Rathore, P.S. Machine learning approaches for wind power forecasting: A comprehensive review. Discov. Appl. Sci. 2025, 7, 1139.
Zhang, Y.; Liu, L.; Xiong, X.; Li, G.; Wang, G.; Lin, L. Long-term Wind Power Forecasting with Hierarchical Spatial-Temporal Transformer. arXiv 2023.
Sarkar, M.R.; Anavatti, S.G.; Dam, T.; Pratama, M.; Al Kindhi, B. Enhancing Wind Power Forecast Precision via Multi-head Attention Transformer: An Investigation on Single-step and Multi-step Forecasting. arXiv 2023.
Arun Kumar, M.; Rithick Joshua, K.; Sahana, R.; Caroline Dorathy Esther, J.; Kavitha Devi, M.K. Predicting wind power using LSTM, Transformer, and other techniques. Clean Technol. Recycl. 2024, 4, 125–145.
Tsai, W.-C.; Hong, C.-M.; Tu, C.-S.; Lin, W.-M.; Chen, C.-H. A Review of Modern Wind Power Generation Forecasting Technologies. Sustainability 2023, 15, 10757.
Lin, W.-H.; Wang, P.; Chao, K.-M.; Lin, H.-C.; Yang, Z.-Y.; Lai, Y.-H. Wind Power Forecasting with Deep Learning Networks: Time-Series Forecasting. Appl. Sci. 2021, 11, 10335.
Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 15.
Lygnerud, K.; Ottosson, J.; Kensby, J.; Johansson, L. Business models combining heat pumps and district heating in buildings generate cost and emission savings. Energy 2021, 234, 121202.
Beaudry, G.; Pasquier, P.; Marcotte, D.; Zarrella, A. Flow rate control in standing column wells: A flexible solution for reducing the energy use and peak power demand of the built environment. Appl. Energy 2022, 313, 118774.
Wen, W.; Liu, Y.; Sun, R.; Liu, Y. Research on Anomaly Detection of Wind Farm SCADA Wind Speed Data. Energies 2022, 15, 5869.
Folayan, A.J.; Dosunmu, A.; Oriji, B. Aerobic and anaerobic biodegradation of synthetic drilling fluids in marine deep-water offshore environments: Process variables and empirical investigations. Energy Rep. 2023, 9, 2153–2168.
Pelekis, N.; Tampakis, P.; Vodas, M.; Doulkeridis, C.; Theodoridis, Y. On temporal-constrained sub-trajectory cluster analysis. Data Min. Knowl. Discov. 2017, 31, 1294–1330.
Cho, K.; Merrienboer, B.V.; Gülçehre, Ç.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014.
Sun, Y.; Zhou, Q.; Sun, L.; Sun, L.; Kang, J.; Li, H. CNN-LSTM-AM: A power prediction model for offshore wind turbines. Ocean Eng. 2024, 301, 117598.
Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018.
Zim, A.H.; Iqbal, A.; Malik, A.; Dong, Z.; Wu, H. TCNFormer: Temporal Convolutional Network Former for Short-Term Wind Speed Forecasting. arXiv 2024, arXiv:2408.15737.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All you Need. arXiv 2017.
Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115.
Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948.
Shami, T.M.; El-Saleh, A.A.; Alswaitti, M.; Al-Tashi, Q.; Summakieh, M.A.; Mirjalili, S. Particle Swarm Optimization: A Comprehensive Survey. IEEE Access 2022, 10, 10031–10061.
Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
Al-Dabbagh, R.D.; Neri, F.; Idris, N.; Baba, M.S. Algorithmic design issues in adaptive differential evolution schemes: Review and taxonomy. Swarm Evol. Comput. 2018, 43, 284–311.
Yen, J.; Liao, J.; Lee, B.; Randolph, D. A Hybrid Approach to Modeling Metabolic Systems Using a Genetic Algorithm and Simplex Method. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 1998, 28, 173–191. Available online: https://www.academia.edu/45358435/ (accessed on 28 May 2026).
Cong, B.; Ma, S.; Li, S. Research and Application of WOA-CNN-BiLSTM Based Model for Wind Power Prediction. In Proceedings of the 2024 IEEE 6th International Conference on Civil Aviation Safety and Information Technology (ICCASIT), Hangzhou, China, 23–25 October 2024; pp. 831–840.
Jalali, S.M.J.; Ahmadian, S.; Khodayar, M.; Khosravi, A.; Ghasemi, V.; Shafie-Khah, M.; Nahavandi, S.; Catalão, J.P.S. Towards novel deep neuroevolution models: Chaotic levy grasshopper optimization for short-term wind speed forecasting. Eng. Comput. 2022, 38, 1787–1811.
Wang, Y.; Zhao, K.; Hao, Y.; Yao, Y. Short-term wind power prediction using a novel model based on butterfly optimization algorithm-variational mode decomposition-long short-term memory. Appl. Energy 2024, 366, 123313.
Huang, J.; Qin, J.; Song, S. A Novel Wind Power Outlier Detection Method with Support Vector Machine Optimized by Improved Harris Hawk. Energies 2023, 16, 7998.
Zhang, W.; Yan, H.; Xiang, L.; Shao, L. Wind power generation prediction using LSTM model optimized by sparrow search algorithm and firefly algorithm. Energy Inform. 2025, 8, 35.
Al Garni, H.Z.; Sundaram, A.; Zayed, M.E.; Mas’ud, A.A.; Rehman, S.; Baseer, M.A.; Al-Anazi, M.A.; Al-Shammari, S.J. Intelligent approaches for global horizontal irradiance forecasting in Saudi Arabia: Benchmarking deep learning coupled with RMSprop optimization for solar power systems. Results Eng. 2026, 30, 110877.

Figure 1. Feature correlation with power output.

Figure 2. Scatter plot of wind speed versus actual power.

Figure 3. DBSCAN-based wind power data anomaly detection and correction.

Figure 4. TCN model. Blue: input layer; Orange: hidden layers; Green: output layer; Gray: 1×1 convolution.

Figure 5. BiLSTM model. Colored arrows: the forward pass (blue arrows) processes the sequence from left to right, and the backward pass (red arrows) processes it from right to left. Both directions feed into the output layer. Symbols: X_{t-1}, X_{t}, X_{t+1} are input feature vectors at consecutive time steps; y_{t-1}, y_{t}, y_{t+1} are the corresponding output predictions.


Figure 6. PRGO framework.

Figure 7. TCN-BiLSTM-PRGO model. Colored arrows: the forward pass (blue arrows) processes the sequence from left to right, and the backward pass (red arrows) processes it from right to left. Both directions feed into the output layer. Symbols: X_{t-1}, X_{t}, X_{t+1} are input feature vectors at consecutive time steps; y_{t-1}, y_{t}, y_{t+1} are the corresponding output predictions.


Figure 8. TCN-BiLSTM-PRGO wind power forecasting model.

Figure 9. Historical data.

Figure 10. Effect of eps on noise ratio.

Figure 11. Corresponding noise detection results.

Figure 12. Least squares correction.

Figure 13. PRGO loss.

Figure 14. Comparison of pre- and post-DBSCAN correction. The red line is the fitted line; the orange area is the confidence band; the blue dots represent the actual data points.

Figure 15. 12 h wind power prediction (seed = 42).

Figure 16. 24 h wind power prediction (seed = 42).

Figure 17. 48 h wind power prediction (seed = 42).

Figure 18. Prediction curves on the future test set.

Table 1. Feature ablation results using TCN-BiLSTM-PRGO on the future test set.

Feature Set	R²	NMAE (%)	NRMSE (%)
All (speed + direction + pressure)	0.940	4.250	7.275
Without pressure	0.943	4.006	7.066
Without wind speed	0.929	4.773	7.875
Without wind direction	0.938	4.252	7.333

Table 2. Parameter settings of PRGO.

Parameter	Symbol	Value/Range
Population	N	5
Max iterations	T	20
Learning rate	Lr	[5 × 10⁻⁵, 3 × 10⁻⁴]
Hidden units	H	{64, 96, 128, 160}
Growth factor	α	0.5
Perturbation factor	β	0.3
Random search probability	P	0.2

Table 3. Performance comparison over five random seeds (mean ± std) for the 12 h forecasting horizon.

Model	R²	NMAE (%)	NRMSE (%)	Train Time (s)
TCN	0.932 ± 0.005	6.539 ± 0.003	8.155 ± 0.003	56.134 ± 4.204
LSTM	0.934 ± 0.006	6.424 ± 0.004	8.027 ± 0.004	21.011 ± 1.861
TCN-BiLSTM	0.931 ± 0.004	6.664 ± 0.002	8.232 ± 0.002	66.347 ± 6.780
TCN-Transformer	0.819 ± 0.076	10.582 ± 0.027	13.054 ± 0.030	123.078 ± 5.132
TCN-BiLSTM-WOA	0.934 ± 0.010	6.382 ± 0.005	8.031 ± 0.006	738.407 ± 124.720
TCN-BiLSTM-PRGO	0.942 ± 0.006	6.014 ± 0.003	7.539 ± 0.004	863.388 ± 131.679

Table 4. Performance comparison over five random seeds (mean ± std) for the 24 h forecasting horizon.

Model	R²	NMAE (%)	NRMSE (%)	Train Time (s)
TCN	0.747 ± 0.063	9.262 ± 0.011	14.665 ± 0.017	46.986 ± 3.495
LSTM	0.731 ± 0.045	9.176 ± 0.007	15.153 ± 0.013	22.405 ± 0.498
TCN-BiLSTM	0.761 ± 0.020	9.041 ± 0.002	14.323 ± 0.006	75.259 ± 3.962
TCN-Transformer	0.759 ± 0.044	10.132 ± 0.016	14.355 ± 0.014	135.783 ± 2.361
TCN-BiLSTM-WOA	0.760 ± 0.031	8.736 ± 0.007	14.332 ± 0.010	880.152 ± 97.330
TCN-BiLSTM-PRGO	0.791 ± 0.031	8.248 ± 0.004	13.387 ± 0.009	824.006 ± 98.130

Table 5. Performance comparison over five random seeds (mean ± std) for the 48 h forecasting horizon.

Model	R²	NMAE (%)	NRMSE (%)	Train Time (s)
TCN	0.818 ± 0.032	7.126 ± 0.008	11.777 ± 0.013	50.983 ± 8.549
LSTM	0.786 ± 0.036	7.487 ± 0.006	12.792 ± 0.011	23.041 ± 0.906
TCN-BiLSTM	0.801 ± 0.032	7.561 ± 0.005	12.330 ± 0.010	70.193 ± 1.328
TCN-Transformer	0.794 ± 0.038	8.772 ± 0.015	12.548 ± 0.125	115.692 ± 1.274
TCN-BiLSTM-WOA	0.806 ± 0.035	7.184 ± 0.008	12.187 ± 0.011	753.275 ± 116.009
TCN-BiLSTM-PRGO	0.833 ± 0.096	6.805 ± 0.002	11.312 ± 0.007	838.083 ± 149.760
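The headline figures in the abstract can be checked directly against Tables 3 to 5. The snippet below re-derives them from the tabulated means only and introduces no new results.

```python
# Mean R² and mean NMAE (%) at 12 h for the five baselines, from Table 3
baselines_12h = {
    "TCN": (0.932, 6.539),
    "LSTM": (0.934, 6.424),
    "TCN-BiLSTM": (0.931, 6.664),
    "TCN-Transformer": (0.819, 10.582),
    "TCN-BiLSTM-WOA": (0.934, 6.382),
}
prgo_r2_12h, prgo_nmae_12h = 0.942, 6.014

r2_gains = [prgo_r2_12h - r2 for r2, _ in baselines_12h.values()]
nmae_drops = [nmae - prgo_nmae_12h for _, nmae in baselines_12h.values()]
r2_gain_range = (round(min(r2_gains), 3), round(max(r2_gains), 3))
nmae_drop_range = (round(min(nmae_drops), 2), round(max(nmae_drops), 2))

# Mean R² of all six models at 24 h (Table 4) and 48 h (Table 5)
r2_24h = {"TCN": 0.747, "LSTM": 0.731, "TCN-BiLSTM": 0.761,
          "TCN-Transformer": 0.759, "TCN-BiLSTM-WOA": 0.760, "TCN-BiLSTM-PRGO": 0.791}
r2_48h = {"TCN": 0.818, "LSTM": 0.786, "TCN-BiLSTM": 0.801,
          "TCN-Transformer": 0.794, "TCN-BiLSTM-WOA": 0.806, "TCN-BiLSTM-PRGO": 0.833}

print("12 h R² gain range:", r2_gain_range)                # (0.008, 0.123)
print("12 h NMAE reduction range (pp):", nmae_drop_range)  # (0.37, 4.57)
print("best at 24 h:", max(r2_24h, key=r2_24h.get))
print("best at 48 h:", max(r2_48h, key=r2_48h.get))
```

These comparisons use means only. The spread across seeds, for example ± 0.096 for the proposed model's R² at 48 h, should be read alongside them.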

Table 6. Statistical significance test results for the 48 h forecasting horizon.

Compared Model	t-Test p-Value	Wilcoxon p-Value	Significant (p < 0.05)
TCN	4.73 × 10⁻¹¹	6.27 × 10⁻¹²	Yes
LSTM	2.87 × 10⁻¹²	4.50 × 10⁻⁹	Yes
TCN-BiLSTM	3.24 × 10⁻¹⁶	2.35 × 10⁻¹⁴	Yes
TCN-Transformer	0.025	0.025	Yes
TCN-BiLSTM-WOA	0.042	0.046	Yes
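Table 6 reports both a paired t-test and a Wilcoxon signed-rank test. The table does not state which paired samples were compared. The very small p-values suggest many paired observations, so the sketch below assumes the pairs are per-time-step absolute errors of TCN-BiLSTM-PRGO and one baseline on the same 48 h test points. It is a standard-library-only illustration that uses large-sample normal approximations for both tests and no continuity correction. For exact small-sample p-values, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` are the usual tools. The function names are hypothetical.

```python
import math
import statistics

def _two_sided_normal_p(z):
    """Two-sided p-value of a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def paired_t_test(a, b):
    """Paired t-test on a - b; the p-value uses the large-n normal approximation to t(n-1)."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean_d = statistics.fmean(d)
    sd_d = statistics.stdev(d)  # assumes the differences are not all identical
    t = mean_d / (sd_d / math.sqrt(n))
    return t, _two_sided_normal_p(t)

def wilcoxon_signed_rank(a, b):
    """Wilcoxon signed-rank test on a - b (zero differences dropped), normal approximation."""
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    # Rank |d| ascending, giving tied magnitudes their average rank
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        group = j - i + 1
        tie_term += group ** 3 - group
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    mean_w = n * (n + 1) / 4.0
    var_w = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0  # tie-corrected variance
    z = (w_plus - mean_w) / math.sqrt(var_w)
    return w_plus, _two_sided_normal_p(z)

# Toy absolute errors for illustration only: baseline errors minus proposed-model errors
baseline_err = [float(v) for v in range(2, 12)]
proposed_err = [1.0] * 10
print(paired_t_test(baseline_err, proposed_err))
print(wilcoxon_signed_rank(baseline_err, proposed_err))
```

The t-test assumes roughly normal differences, whereas the signed-rank test only uses their signs and ranks. Reporting both, as Table 6 does, guards against conclusions that depend on the normality assumption.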

Table 7. Temporal generalization performance on the future test set.

Model	R²	NMAE (%)	NRMSE (%)
TCN	0.926	4.863	8.047
LSTM	0.935	4.463	7.502
TCN-BiLSTM	0.926	4.852	8.035
TCN-Transformer	0.919	5.948	8.395
TCN-BiLSTM-WOA	0.926	4.849	8.024
TCN-BiLSTM-PRGO	0.940	4.224	7.232
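Table 7 evaluates the models on a "future" test set, but the exact split dates are not given in this table. The sketch below only illustrates the general idea of a chronological hold-out. It assumes that the future set consists of samples strictly after a cutoff time and that normalization statistics are fitted on the earlier data only. `future_holdout_split`, `fit_minmax` and the hourly toy series are hypothetical.

```python
from datetime import datetime, timedelta

def future_holdout_split(records, cutoff):
    """Split (timestamp, value) records into history (< cutoff) and future (>= cutoff).

    Records are sorted by time first and never shuffled, so every future sample
    is strictly later than every history sample.
    """
    ordered = sorted(records, key=lambda r: r[0])
    history = [r for r in ordered if r[0] < cutoff]
    future = [r for r in ordered if r[0] >= cutoff]
    return history, future

def fit_minmax(values):
    """Min-max scaler fitted on history only, so no future statistics leak into training."""
    lo, hi = min(values), max(values)
    span = hi - lo if hi > lo else 1.0
    return lambda v: (v - lo) / span

# Hypothetical hourly series, deliberately supplied out of order
start = datetime(2024, 1, 1)
records = [(start + timedelta(hours=h), float(h)) for h in reversed(range(48))]
history, future = future_holdout_split(records, cutoff=start + timedelta(hours=36))
scale = fit_minmax([v for _, v in history])
print(len(history), len(future))  # 36 12
print(scale(future[-1][1]))       # future values may fall outside [0, 1]
```

Because the scaler never sees the future period, values beyond the training range map outside [0, 1]. This is one of the conditions a temporal generalization test is meant to expose.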

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lv, M.; Liu, Z.; Zhang, C.; Yu, J.; Luo, C.; Zhu, Y. A DBSCAN-Based Data Cleaning and TCN-BiLSTM-PRGO Hybrid Model for Wind Power Forecasting. Eng 2026, 7, 272. https://doi.org/10.3390/eng7060272

AMA Style

Lv M, Liu Z, Zhang C, Yu J, Luo C, Zhu Y. A DBSCAN-Based Data Cleaning and TCN-BiLSTM-PRGO Hybrid Model for Wind Power Forecasting. Eng. 2026; 7(6):272. https://doi.org/10.3390/eng7060272

Chicago/Turabian Style

Lv, Muyao, Zejia Liu, Chao Zhang, Jiawei Yu, Chao Luo, and Yihua Zhu. 2026. "A DBSCAN-Based Data Cleaning and TCN-BiLSTM-PRGO Hybrid Model for Wind Power Forecasting" Eng 7, no. 6: 272. https://doi.org/10.3390/eng7060272

APA Style

Lv, M., Liu, Z., Zhang, C., Yu, J., Luo, C., & Zhu, Y. (2026). A DBSCAN-Based Data Cleaning and TCN-BiLSTM-PRGO Hybrid Model for Wind Power Forecasting. Eng, 7(6), 272. https://doi.org/10.3390/eng7060272
