Article

An Adaptive Early Warning Method for Wind Power Prediction Error

1
Electric Power Research Institute of State Grid Tianjin Electric Power Company, Tianjin 300384, China
2
State Grid Tianjin Electric Power Company, Tianjin 300384, China
3
College of Electrical Engineering, Zhejiang University, Hangzhou 310058, China
4
State Grid Tianjin Electric Power Company Ninghe Branch, Tianjin 301300, China
*
Author to whom correspondence should be addressed.
Processes 2025, 13(12), 3941; https://doi.org/10.3390/pr13123941
Submission received: 4 November 2025 / Revised: 26 November 2025 / Accepted: 3 December 2025 / Published: 5 December 2025
(This article belongs to the Special Issue Applications of Smart Microgrids in Renewable Energy Development)

Abstract

Despite the continuous development of wind power forecasting methods, forecasting errors remain unavoidable, especially during extreme weather events. However, current research on quantifying these errors is quite limited. This paper proposes an adaptive error risk early warning method that directly predicts the magnitude of forecast errors, classifies the associated risk, and issues warnings, thereby achieving proactive risk management. The method comprises three core designs. First, mechanism-based feature engineering captures the driving factors of error generation, including numerical weather prediction bias, atmospheric instability, and meteorological dynamics, all of which are key contributors to forecast bias. Second, a stacked ensemble method integrates Quantile Regression, Random Forest, and Gradient Boosting, utilizing their complementary learning capabilities to handle high-dimensional, non-stationary error patterns. Third, K-means clustering establishes dynamic risk thresholds that adapt to seasonal changes in the error distribution, overcoming the limitations of fixed thresholds. Validation on actual wind farm operation data demonstrates significant improvements: the proposed ensemble model reduces the Root Mean Square Error (RMSE) by 2.5% compared to the best single model, and the dynamic threshold mechanism increases the High-Risk Recall rate from 89.7% to 96.9%. These results confirm that the method can effectively warn of high-error events and provide timely, actionable decision support to enhance grid stability and security.

1. Introduction

Wind power, with its increasing penetration, has emerged as a key component in the global transition to renewable energy. However, the inherent variability and uncertainty of wind resources pose significant challenges for power system operation [1,2]. Despite advances in forecasting techniques, wind power prediction remains particularly vulnerable during extreme weather events, when accurate forecasts are most critical for grid stability.
This vulnerability stems from a fundamental limitation. Wind power forecasting relies heavily on Numerical Weather Prediction (NWP) data as input. When atmospheric conditions become unstable, NWP errors can exceed 30%, directly propagating through even sophisticated forecasting models [3,4]. This creates a ceiling effect: no amount of model refinement can compensate for fundamentally flawed input data.
Recent research has focused on improving forecast accuracy through advanced architectures. Long Short-Term Memory networks capture temporal dependencies [5,6]. Transformer-based models leverage attention mechanisms for better pattern recognition [7,8]. Ensemble methods combine multiple predictors to enhance robustness [9,10]. While these approaches achieve incremental improvements under normal conditions, they fail to address the core problem during extreme events.
Beyond model architecture optimization, significant research efforts have also been directed toward post-processing error correction to mitigate NWP-induced deviations. These methods typically analyze historical error sequences to effectively “calibrate” the final output. For instance, decomposition-based strategies, such as those utilizing Variational Mode Decomposition, separate the error signal into multiple frequency components to predict and correct residuals [11]. Similarly, advanced error correction frameworks using secondary neural networks have been developed to capture non-linear error trends [12]. However, these approaches often rely on the assumption of error inertia—that past errors predict future ones. This assumption frequently fails during rapid weather transitions where error patterns shift abruptly, rendering historical residual learning ineffective.
Furthermore, in the domain of risk management, researchers have developed specialized early warning systems for ramp events. Methods such as the swinging door algorithm and its variants have been widely used to detect potential large-scale power fluctuations [13]. Simultaneously, probabilistic forecasting techniques have evolved to provide prediction intervals rather than point estimates, offering a quantification of uncertainty [14]. Despite these advancements, a gap remains in operational usability. Existing warning methods largely rely on fixed thresholds that do not adapt to seasonal error distributions. Conversely, probabilistic methods often output wide confidence intervals that, while statistically rigorous, can be difficult for grid operators to translate into immediate risk decisions during high-pressure dispatch scenarios.
Therefore, this paper proposes a paradigm shift: instead of pursuing marginal accuracy improvements, we focus on predicting forecast errors and then quantifying their magnitudes. This approach recognizes two key insights. First, forecast errors exhibit predictable patterns correlated with specific meteorological conditions and NWP biases. Second, knowing the potential error magnitude enables grid operators to implement differentiated response strategies, transforming passive deviation acceptance into active risk management.
The concept builds on emerging work in uncertainty quantification. Probabilistic forecasting methods provide prediction intervals rather than point estimates. Direct error modeling approaches attempt to predict deviations between forecasts and actual generation. However, existing methods lack an end-to-end framework that maps meteorological inputs directly to actionable risk warnings, particularly under non-stationary extreme conditions.
We propose an adaptive early warning method that addresses this gap through two integrated components. First, a Stacking-based ensemble predictor combines complementary strengths of multiple algorithms to achieve robust error magnitude prediction. Second, a K-means clustering mechanism generates dynamic risk thresholds that adapt to changing error distributions. This approach provides grid operators with timely, actionable warnings before large deviations occur.
The main contributions are threefold:
  • Develop an end-to-end framework that directly maps NWP data to risk warnings, filling a technical gap in operational error management.
  • Design a multi-model ensemble using Stacking integration, combining Quantile Regression, Random Forest, and Gradient Boosting to handle the complexity of error prediction.
  • Introduce data-driven dynamic thresholds through K-means clustering, overcoming limitations of fixed thresholds under varying conditions.
The rest of this paper is organized as follows. Section 2 presents the paradigm shift from forecasting to warning. Section 3 details the methodology. Section 4 analyzes experimental results. Section 5 concludes the paper.

2. From Forecasting to Warning: A Paradigm Shift in Error Management

2.1. The Fundamental Challenge

Wind power forecasting has reached a technical ceiling. Despite advances in deep learning architectures, forecast accuracy degrades catastrophically during extreme weather events—precisely when grid operators need reliable predictions most. This degradation stems from a fundamental bottleneck: the systematic amplification of Numerical Weather Prediction errors under unstable atmospheric conditions.
To understand this limitation, consider the traditional forecasting pipeline: NWP data feeds into sophisticated power prediction models, which generate forecasts for grid dispatch. This serial architecture contains an inherent vulnerability. When atmospheric dynamics become chaotic during extreme events, NWP errors can exceed 30%, directly propagating through even the most advanced neural networks. No amount of model refinement can compensate for fundamentally flawed input data [15,16].
This realization motivates our paradigm shift. Rather than pursuing marginal improvements in generation forecasting—a path with diminishing returns—we propose directly predicting and quantifying the forecast error itself. The rationale builds on a key observation: while power output under extreme weather is highly uncertain, the magnitude of forecast error exhibits predictable patterns driven by measurable atmospheric instabilities and NWP biases.

2.2. Architecting the Solution: A Dual-Stage Framework

The transition from forecasting to warning faces two key technical challenges. First, robustly predicting the magnitude of error is difficult. This is complicated by a high-dimensional and diverse feature space, as well as highly complex, non-linear relationships. Stable and generalizable prediction of this non-stationary error magnitude requires a model that excels at integrating different features. Second, adaptation to dynamic error distributions is needed, as these distributions change significantly across seasons. The practical value of a warning system depends on accurate risk warning. Conventional fixed thresholds are insufficient for this purpose, creating a need for adaptive risk assessment methods.
These challenges shape our framework architecture through a natural progression, as shown in Figure 1:
The framework consists of an Offline Stage and an Online Stage. The Offline Stage focuses on building the predictive models and calibrating risk thresholds through K-means clustering of historical prediction errors. The Online Stage processes real-time NWP data through the trained ensemble to generate actionable risk signals. The overall architecture and its two stages are detailed in Figure 2, while the specific core technologies used in both stages are elaborated upon in Section 3.
This architectural decoupling provides three key benefits. First, accuracy limitations are overcome by focusing on the more predictable error signal. Second, the method is given a forward-looking view of risk; warnings are generated based on atmospheric precursors before output deviations occur. Third, implementation independence is achieved. The error-warning module runs alongside existing commercial forecasting systems without replacing them, which ensures both compatibility and future scalability.

3. Core Technologies of the Early Warning Method

This section explains the internal mechanisms of the framework’s dual-stage architecture. The description follows the method’s information flow. First, raw data from multiple sources is processed into engineered features (Section 3.1). Then, these features are used by an ensemble learning method to generate robust error predictions (Section 3.2). Finally, the error predictions are converted into adaptive risk thresholds through a clustering technique (Section 3.3).
To provide a clear overview of the mathematical modeling workflow and the interconnections between the core components, Figure 3 shows a comprehensive flowchart. The diagram illustrates the step-by-step process from raw data input to the final risk warning, mapping the specific equations to their corresponding functional stages within the dual-stage framework.

3.1. Feature Engineering: Capturing Error Generation Mechanisms

3.1.1. Problem Formulation

Let $X_t \in \mathbb{R}^d$ represent the raw meteorological inputs at time $t$ and $E_t$ denote the wind power forecast error. Our objective is to construct a mapping function $f: X_t \mapsto \hat{E}_t$ that predicts the error magnitude. The challenge lies in transforming high-dimensional raw data into informative features that capture error generation mechanisms. As discussed in Section 2, forecast errors stem from three primary mechanisms: NWP biases, atmospheric instability [17], and atmospheric transition dynamics [18,19]. Therefore, our feature engineering strategy is mechanism-driven: each feature category is explicitly designed to quantify these error drivers.
We thus propose four complementary feature sets, as shown in Figure 4, each targeting a specific source of forecast error.
The complete feature vector, which combines all components, is given by:
$$x_t = \left[ \Phi^{(1)},\ \phi_{WS}^{(2)},\ \Phi^{(3)},\ \phi_h^{(4)},\ \phi_m^{(5)} \right]^T,$$
This feature vector $x_t$ serves as the input to our ensemble prediction framework.

3.1.2. Meteorological Prediction Bias Features

Biases in predicted meteorological variables from NWP inputs directly lead to wind power forecast errors. Thus, quantifying these biases serves as a leading indicator of the error magnitude. For each meteorological variable $\nu \in V = \{T, P, H, W, \dots\}$, where $T$ denotes temperature, $P$ pressure, $H$ humidity, and $W$ wind speed, we compute:
$$\phi_{\nu}^{(1)} = \left| \nu_{\mathrm{actual}} - \nu_{\mathrm{NWP}} \right|,$$
where $\nu_{\mathrm{actual}}$ is the actual value of the meteorological variable, and $\nu_{\mathrm{NWP}}$ is its predicted value. These features form the bias feature vector $\Phi^{(1)} = \left[ \phi_T^{(1)}, \phi_P^{(1)}, \phi_H^{(1)}, \phi_W^{(1)}, \dots \right]^T$.
The absolute error formulation captures deviation magnitude regardless of sign, as both overestimation and underestimation degrade power forecast accuracy.

3.1.3. Atmospheric Stability Features

Atmospheric instability, especially rapid shifts in vertical wind, is a key driver of sudden power fluctuations that NWP models often miss. Wind shear—the vertical gradient of horizontal wind speed—is a reliable proxy for this instability. Strong wind shear signals unstable conditions from events like frontal passages or low-level jets. These events cause non-linear wind changes at turbine height, creating a divergence from NWP forecasts.
Wind shear is computed from multi-altitude wind speed measurements. For heights $z_1$ and $z_2$, the shear index is:
$$\phi_{WS}^{(2)} = \frac{W_{z_2} - W_{z_1}}{z_2 - z_1},$$
where $W_{z_i}$ denotes the wind speed at height $z_i$. High shear values indicate unstable vertical stratification.

3.1.4. Dynamic Rate-of-Change Features

Forecast errors in wind power frequently occur during transitions between weather regimes, driven by large-scale weather evolution. In these periods, multiple meteorological variables—including pressure, temperature, wind speed, and direction—change rapidly and in a coordinated manner. Relying only on static meteorological values is inadequate for identifying such unstable transition periods. Instead, the temporal gradients of these key variables must be explicitly calculated. For any given meteorological variable $\nu$, its rate-of-change feature over a time span $\Delta t$ is defined as:
$$\phi_{\nu}^{(3)} = \frac{\nu_t - \nu_{t-\Delta t}}{\Delta t},$$
where $\nu_t$ is the observed value at time $t$, and $\nu_{t-\Delta t}$ is the observed value at time $t - \Delta t$.
These rate-of-change features form $\Phi^{(3)} = \left[ \phi_T^{(3)}, \phi_P^{(3)}, \phi_H^{(3)}, \phi_W^{(3)}, \dots \right]^T$, capturing rapid meteorological changes that signal evolving weather systems.

3.1.5. Temporal Periodicity Features

Forecast errors exhibit periodic patterns. Error statistics vary systematically by time of day due to diurnal heating cycles and by season due to changes in large-scale atmospheric circulation. We encode these periodicities using harmonic functions:
$$\phi_h^{(4)} = \left[ \sin\!\left( \tfrac{2\pi h}{24} \right),\ \cos\!\left( \tfrac{2\pi h}{24} \right) \right]^T, \qquad \phi_m^{(5)} = \left[ \sin\!\left( \tfrac{2\pi m}{12} \right),\ \cos\!\left( \tfrac{2\pi m}{12} \right) \right]^T,$$
where $h \in [0, 23]$ and $m \in [1, 12]$ represent the hour and month, respectively. The sine–cosine pairs ensure continuity at period boundaries and enable the model to learn phase-shifted patterns.
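As a concrete illustration, the four feature families of Section 3.1 can be assembled with pandas and numpy. This is a minimal sketch under assumed column names (e.g. `ws_actual`, `ws_nwp`) and assumed measurement heights; none of these identifiers come from the paper.

```python
import numpy as np
import pandas as pd

def build_features(df, dt_hours=1.0):
    """Sketch of the four feature families from Section 3.1.

    Assumes `df` has a DatetimeIndex, actual/NWP column pairs such as
    'ws_actual'/'ws_nwp', and wind speeds at two hub heights
    ('w_z1', 'w_z2'). Column names and heights are illustrative.
    """
    z1, z2 = 10.0, 70.0  # assumed measurement heights in metres
    feats = pd.DataFrame(index=df.index)

    # (1) NWP bias features: absolute deviation |v_actual - v_NWP|
    for v in ["ws", "temp", "press", "hum"]:
        feats[f"bias_{v}"] = (df[f"{v}_actual"] - df[f"{v}_nwp"]).abs()

    # (2) Wind shear: vertical gradient of horizontal wind speed
    feats["shear"] = (df["w_z2"] - df["w_z1"]) / (z2 - z1)

    # (3) Rate-of-change features: temporal gradient over dt
    for v in ["ws", "temp", "press", "hum"]:
        feats[f"roc_{v}"] = df[f"{v}_actual"].diff() / dt_hours

    # (4) Harmonic encodings of hour-of-day and month-of-year
    h, m = df.index.hour, df.index.month
    feats["sin_h"] = np.sin(2 * np.pi * h / 24)
    feats["cos_h"] = np.cos(2 * np.pi * h / 24)
    feats["sin_m"] = np.sin(2 * np.pi * m / 12)
    feats["cos_m"] = np.cos(2 * np.pi * m / 12)
    return feats
```

The harmonic pairs keep midnight adjacent to 23:00 and December adjacent to January, which a raw hour or month integer would not.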

3.2. Multi-Model Ensemble for Error Magnitude Prediction

As noted in Section 2, error forecasting during extreme weather faces a key robustness issue: error patterns are highly non-stationary and depend on multiple factors. A single model often fails to capture all of these simultaneously: linear trends in error magnitude, non-linear high-frequency fluctuations, and probabilistic uncertainty bounds. Using only one model may lead to overfitting one pattern while generalizing poorly to others.
To address this limitation, a Stacking ensemble framework is adopted. Unlike simple averaging or voting methods, Stacking uses a meta-learner to learn optimal combinations of multiple base models [20,21]. This integrates information from different models, each specializing in different aspects of the error generation process. By leveraging their complementary strengths, the ensemble improves both robustness and accuracy beyond what any single model can achieve. Accordingly, the following section details the individual base learners and their integration methodology.

3.2.1. Base Learner Construction

Given the feature vector $x_t$ from Section 3.1, we construct three base learners with complementary capabilities. Each learner $h_i$ produces an error magnitude estimate.
1. Quantile Regression (QR): Uncertainty Quantification
Unlike standard regression, which estimates conditional means, QR models the full conditional distribution by targeting specific quantiles [22]. This approach is particularly valuable in error forecasting for power systems. Grid operators require information on both the most likely error magnitude and the potential for extreme scenarios, such as the 90th percentile error, to ensure adequate reserve capacity is allocated [23,24]. This essential probabilistic information is provided directly by QR.
For the q-th quantile [25], QR solves:
$$\hat{h}_{QR} = \arg\min_h \sum_{i=1}^{N} \rho_q\!\left( E_i - h(x_i) \right),$$
where $\hat{h}_{QR}$ is the quantile regression prediction function, $h$ denotes the predictive function to be optimized, $N$ represents the number of samples, $\rho_q$ is the pinball loss function, $E_i$ is the true value of the $i$-th sample, and $h(x_i)$ is the predicted value for the $i$-th sample. The formula finds the optimal $h$ by minimizing the sum of pinball losses across all samples, enabling estimation of the $q$-th conditional quantile.
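For concreteness, the pinball loss $\rho_q$ can be written in a few lines of numpy. This is an illustrative sketch, not the paper's implementation:

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss rho_q averaged over samples.

    Under-prediction is penalised by q and over-prediction by (1 - q),
    so minimising this loss yields the q-th conditional quantile.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))
```

For $q = 0.9$, under-predicting by one unit costs 0.9 while over-predicting by one unit costs only 0.1, which pushes the fitted function toward the upper tail of the error distribution.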
2. Random Forest (RF): Robustness in High-Dimensional Noise
The feature space constructed in Section 3.1 is high-dimensional and heterogeneous. Collinearity and measurement noise inevitably exist among these features. RF excels in such environments due to its dual randomization strategy: bootstrap sampling of training data and random feature selection at each split [26]. This built-in regularization makes RF intrinsically resistant to overfitting and noise, eliminating the need for extensive preprocessing or normalization.
The RF prediction aggregates B decision trees [27]:
$$\hat{h}_{RF}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x),$$
where $\hat{h}_{RF}(x)$ is the Random Forest prediction for input $x$, $B$ denotes the number of decision trees, and $T_b(x)$ is the prediction of the $b$-th decision tree. Each tree $T_b$ is trained on a bootstrap sample with a random feature subset, ensuring diversity; averaging across the $B$ trees reduces variance and improves prediction robustness.
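With the hyperparameters reported in Section 4.1.2 (200 trees, maximum depth 15, 70% feature sampling), the RF base learner can be sketched with scikit-learn; the training data below is a synthetic stand-in, not the wind farm dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Random Forest configured as in Section 4.1.2
rf = RandomForestRegressor(
    n_estimators=200,   # B = 200 bootstrap-trained trees
    max_depth=15,
    max_features=0.7,   # random feature subset ratio per split
    random_state=42,
)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                      # stand-in feature matrix
y = np.abs(X[:, 0]) + 0.1 * rng.random(500)         # stand-in error magnitudes
rf.fit(X, y)
pred = rf.predict(X[:5])  # averaged prediction of the 200 trees
```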
3. Gradient Boosting Machine (GBM): Deep Non-Linear Pattern Mining
Extreme error events—the primary targets of this early warning method—are driven by subtle, complex non-linear interactions among features. For instance, the combined effect of high wind shear, rapid pressure gradients, and NWP temperature bias can trigger much larger errors, which cannot be captured by linear models or shallow trees. GBM is specifically designed to identify these deep non-linear dependencies through iterative residual fitting [28].
GBM builds an additive model through sequential optimization [29]:
$$\hat{h}_{GBM}^{(m)}(x) = \hat{h}_{GBM}^{(m-1)}(x) + \gamma_m T_m(x),$$
where $\hat{h}_{GBM}^{(m)}(x)$ is the GBM prediction at the $m$-th iteration, $\hat{h}_{GBM}^{(m-1)}(x)$ is the prediction from the previous iteration, $\gamma_m$ is the learning rate, and $T_m(x)$ is the $m$-th decision tree. $T_m$ is trained to fit the negative gradient of the loss function with respect to $\hat{h}_{GBM}^{(m-1)}$, so the additive model is built by iteratively correcting the residual errors of its predecessor.
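The additive update above can be made concrete with a minimal squared-loss boosting loop: for squared loss, the negative gradient at the current prediction is simply the residual. This is a pedagogical sketch, not the paper's tuned GBM:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, n_estimators=150, learning_rate=0.05, max_depth=3):
    """Minimal gradient boosting for squared loss.

    Each tree T_m fits the residuals (the negative gradient of the
    squared loss), and the model is updated additively:
    h_m = h_{m-1} + learning_rate * T_m.
    """
    h = np.full(len(y), y.mean())  # h^(0): constant initial model
    trees = []
    for _ in range(n_estimators):
        residual = y - h                      # negative gradient
        t = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        h = h + learning_rate * t.predict(X)  # additive update
        trees.append(t)
    return y.mean(), trees

def gbm_predict(init, trees, X, learning_rate=0.05):
    h = np.full(X.shape[0], init)
    for t in trees:
        h = h + learning_rate * t.predict(X)
    return h
```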

3.2.2. Stacking Integration

The base learner predictions form a new feature space:
$$z_t = \left[ \hat{h}_{QR}(x_t),\ \hat{h}_{RF}(x_t),\ \hat{h}_{GBM}(x_t) \right]^T.$$
Linear Regression (LR) is selected as the meta-learner, whose task is to learn how to synthesize the outputs of the first-layer base learners. The choice of LR as the meta-learner is based on the following considerations: its simplicity, computational efficiency, and strong interpretability, which effectively avoids the risk of overfitting in the second layer where the feature dimension is low [30].
The meta-learner combines these predictions through linear regression:
$$\hat{E}_t = \beta_0 + \beta^T z_t = \beta_0 + \beta_1 \hat{h}_{QR} + \beta_2 \hat{h}_{RF} + \beta_3 \hat{h}_{GBM},$$
where $\beta_0$ is the intercept term and $\beta = [\beta_1, \beta_2, \beta_3]^T$ is the vector of linear weights. The meta-learner thus learns the optimal combination weights for the Quantile Regression, Random Forest, and Gradient Boosting Machine predictions.
The optimal weights $\beta^* = [\beta_1, \beta_2, \beta_3]^T$ are obtained by minimizing:
$$\beta^* = \arg\min_{\beta} \sum_{i=1}^{N} \left( E_i - \beta_0 - \beta^T z_i \right)^2.$$
This yields the final error magnitude prediction $\hat{E}_t$, which serves as input to the risk classification stage.

3.3. Adaptive Dynamic Risk Thresholds

3.3.1. From Prediction to Risk Classification

The predicted error magnitude $\hat{E}_t$ from Section 3.2 must be mapped to discrete risk levels $R \in \{\mathrm{Low}, \mathrm{Medium}, \mathrm{High}\}$ for operational decision-making. Traditional approaches rely on fixed thresholds, defined by the piecewise function:
$$R = \begin{cases} \mathrm{Low}, & \text{if } \hat{E}_t < T_1, \\ \mathrm{Medium}, & \text{if } T_1 \le \hat{E}_t < T_2, \\ \mathrm{High}, & \text{if } \hat{E}_t \ge T_2, \end{cases}$$
where $T_1$ and $T_2$ are static thresholds. Their critical shortcoming is that they fail to adapt to seasonal variations in error distributions.

3.3.2. Adaptive Threshold via K-Means Clustering

To address the limitations of fixed thresholds—especially their failure to adapt to temporal changes in error distributions—this subsection proposes an adaptive method. This method dynamically derives risk boundaries from the statistical features of predicted errors.
The core of this method lies in the adoption of the K-Means clustering algorithm (with K = 3), chosen primarily based on three pragmatic considerations:
First, as an unsupervised learning method, it does not require labeled risk data and only needs the input of error magnitudes to conduct analysis;
Second, it features low computational cost, which supports periodic re-calibration during operational deployment;
Finally, it inherently identifies density-based groupings, a characteristic that aligns well with the error generation process: normal operations produce a dense cluster of small errors, while extreme events form a sparse cluster of large errors.
Specifically, the method establishes a historical error database by collecting the predicted errors $\{\hat{E}_1, \hat{E}_2, \dots, \hat{E}_M\}$ generated during the validation period. These historical errors represent the expected error distribution of the model and serve as the input to the clustering algorithm. They are partitioned into three clusters using K-means (K = 3), corresponding to the Low-, Medium-, and High-risk levels, respectively.
The clustering aims to minimize the within-cluster sum of squared errors [31], expressed formally as:
$$J = \sum_{k=1}^{3} \sum_{\hat{E}_i \in C_k} \left\| \hat{E}_i - \mu_k \right\|^2,$$
where $C_k$ is the $k$-th cluster and $\mu_k$ is its centroid (the mean of all errors in the cluster).
This ensures errors in each cluster are highly similar while clusters are significantly different. Thus, the boundaries between the three clusters form dynamic risk thresholds. These thresholds can adapt to changes in the underlying distribution of predicted errors over time, effectively addressing the rigidity of fixed thresholds.

3.3.3. Threshold Calculation from Clusters

After the algorithm converges on the historical data, three cluster centers are generated and ordered as $\mu_{\mathrm{Low}} < \mu_{\mathrm{Medium}} < \mu_{\mathrm{High}}$, where $\mu_{\mathrm{Low}}$, $\mu_{\mathrm{Medium}}$, and $\mu_{\mathrm{High}}$ are the centers of the low-, medium-, and high-risk clusters, respectively.
The risk thresholds are computed as the midpoints between adjacent clusters:
$$T_3 = \frac{\mu_{\mathrm{Low}} + \mu_{\mathrm{Medium}}}{2}, \qquad T_4 = \frac{\mu_{\mathrm{Medium}} + \mu_{\mathrm{High}}}{2}.$$
Once calculated, the values $T_3$ and $T_4$ are fixed and deployed as the operational boundaries of the online warning system: $T_3$ demarcates low risk from medium risk, while $T_4$ distinguishes medium risk from high risk. This ensures that the risk classification criteria are objectively derived from the statistical patterns of past model performance rather than from subjective estimation.
The final risk classification combines the ensemble prediction with dynamic thresholds:
$$R = \begin{cases} \mathrm{Low}, & \text{if } \hat{E}_t < T_3, \\ \mathrm{Medium}, & \text{if } T_3 \le \hat{E}_t < T_4, \\ \mathrm{High}, & \text{if } \hat{E}_t \ge T_4. \end{cases}$$
This formulation makes the complete pipeline explicit: raw data $X_t$ is transformed into features $x_t$, which feed the base learners producing $z_t$; $z_t$ is combined by the meta-learner to yield $\hat{E}_t$, which is finally classified into risk level $R$ using the adaptive thresholds.
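The threshold calibration and final classification can be sketched as follows; the K-means settings follow Section 4.1.2, while the function and variable names are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def calibrate_thresholds(errors):
    """Derive dynamic risk thresholds T3, T4 from historical predicted
    errors via K-means (K=3, k-means++ init, tol=1e-4)."""
    errors = np.asarray(errors, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=3, init="k-means++", tol=1e-4,
                n_init=10, random_state=0).fit(errors)
    # Sort centers so they map to Low < Medium < High
    mu_low, mu_med, mu_high = np.sort(km.cluster_centers_.ravel())
    return (mu_low + mu_med) / 2, (mu_med + mu_high) / 2

def classify_risk(e_hat, t3, t4):
    """Map a predicted error magnitude to a discrete risk level."""
    if e_hat < t3:
        return "Low"
    return "Medium" if e_hat < t4 else "High"
```

Re-running `calibrate_thresholds` on a sliding window of recent validation errors is what makes the thresholds "dynamic": as the error distribution drifts with the seasons, the cluster centers, and hence $T_3$ and $T_4$, move with it.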

3.4. Performance Metrics

To fully evaluate the performance of the proposed method, evaluation metric systems are established for its two core tasks: “error magnitude prediction” (a regression task) and “risk level warning” (a classification task). The specific definitions and explanations are provided below.

3.4.1. Regression Metrics for Error Magnitude

This task focuses on the accuracy of the predicted error magnitude, which is crucial for determining the necessary reserve capacity. Two standard yet complementary metrics are employed—mean absolute error (MAE) and root mean square error (RMSE) [32]:
$$\mathrm{MAE} = \frac{1}{M} \sum_{i=1}^{M} \left| E_i - \hat{E}_i \right|, \qquad \mathrm{RMSE} = \sqrt{ \frac{1}{M} \sum_{i=1}^{M} \left( E_i - \hat{E}_i \right)^2 },$$
where M is the test set size. MAE measures average performance while RMSE emphasizes large errors critical for extreme event detection.
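Both metrics are one-liners in numpy. For residuals of [0, 0, 3], MAE = 1 while RMSE = √3 ≈ 1.73, illustrating RMSE's heavier penalty on large errors:

```python
import numpy as np

def mae(e_true, e_pred):
    """Mean absolute error between actual and predicted magnitudes."""
    return np.mean(np.abs(np.asarray(e_true) - np.asarray(e_pred)))

def rmse(e_true, e_pred):
    """Root mean square error; squaring emphasises large deviations."""
    return np.sqrt(np.mean((np.asarray(e_true) - np.asarray(e_pred)) ** 2))
```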

3.4.2. Classification Metrics for Risk Warning

The warning task is a multi-class classification problem. Metrics derived from the confusion matrix are used to quantify the operational reliability of the method [33], with a specific focus on the High-Risk category, since misclassification there leads to the most severe consequences. To quantify this performance, three key metrics are defined:
1. High-Risk Recall ($\mathrm{Recall}_{\mathrm{High}}$)
This metric assesses the proportion of actual high-risk events that are correctly identified by the system [34]. It is formulated as:
$$\mathrm{Recall}_{\mathrm{High}} = \frac{\left| \{ i : R_i = \mathrm{High} \wedge \hat{R}_i = \mathrm{High} \} \right|}{\left| \{ i : R_i = \mathrm{High} \} \right|},$$
where $R_i$ is the true risk label of the $i$-th event, and $\hat{R}_i$ is the predicted risk label.
2. High-Risk Precision ($\mathrm{Precision}_{\mathrm{High}}$)
This metric reflects the proportion of predicted high-risk events that are actually high-risk [34]. Its formula is:
$$\mathrm{Precision}_{\mathrm{High}} = \frac{\left| \{ i : R_i = \mathrm{High} \wedge \hat{R}_i = \mathrm{High} \} \right|}{\left| \{ i : \hat{R}_i = \mathrm{High} \} \right|}.$$
3. High-Risk F1 Score ($F1_{\mathrm{High}}$)
As the harmonic mean of precision and recall, this metric balances the trade-off between the two [34]. It is defined as:
$$F1_{\mathrm{High}} = \frac{2 \cdot \mathrm{Precision}_{\mathrm{High}} \cdot \mathrm{Recall}_{\mathrm{High}}}{\mathrm{Precision}_{\mathrm{High}} + \mathrm{Recall}_{\mathrm{High}}}.$$
These metrics quantify the method’s ability to identify critical events while managing false alarm rates, directly measuring operational value for grid dispatch.
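The three High-Risk metrics can be computed directly from the label vectors; this is an illustrative sketch with assumed label strings:

```python
import numpy as np

def high_risk_metrics(r_true, r_pred, label="High"):
    """Recall, precision and F1 restricted to the High-risk class."""
    r_true, r_pred = np.asarray(r_true), np.asarray(r_pred)
    tp = np.sum((r_true == label) & (r_pred == label))  # true positives
    recall = tp / max(np.sum(r_true == label), 1)       # missed-event rate
    precision = tp / max(np.sum(r_pred == label), 1)    # false-alarm rate
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    return recall, precision, f1
```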

4. Results and Analysis

4.1. Dataset and Experimental Setup

4.1.1. Dataset Description

The case study for this research is a 100 MW wind farm located in the mountainous terrain of Tianjin, China. This region is prone to cold waves and strong convective weather. The study utilizes a two-year dataset of continuous operational records from January 2023 to December 2024, with a 15 min temporal resolution. The data comprises four main categories:
  • Historical actual power output data from the wind farm (SCADA): The 15 min average net output power recorded by the wind farm’s monitoring system.
  • Historical predicted power output data from the wind farm: Day-ahead predicted power generated by the commercial forecasting system used by the wind farm operator.
  • Historical measured meteorological data from the wind farm: Including wind speed and direction at multiple altitudes, as well as temperature, atmospheric pressure, and humidity.
  • NWP historical weather forecast data: Containing weather type, wind speed and direction at multiple altitudes, as well as temperature and atmospheric pressure.

4.1.2. Experimental Configuration

To ensure reproducibility and evaluate model robustness, the following specific configurations were adopted for the models used in this study.
1. Quantile Regression
  • Target Quantiles: {0.5, 0.1, 0.9};
  • Regularization Type: L2 Regularization;
  • Regularization Parameter: 0.01.
2. Random Forest
  • Number of Decision Trees: 200;
  • Maximum Tree Depth: 15;
  • Feature Sampling Ratio: 0.7.
3. Gradient Boosting Machine
The Gradient Boosting Machine was tuned with the following hyperparameters to optimize its sequential learning process:
  • Learning Rate: 0.05;
  • Number of Estimators: 150.
4. Stacking Ensemble
The Stacking Ensemble model utilized a two-layer structure: the first layer consists of the QR, RF, and GBM models (details above), and the second layer uses Linear Regression as the meta-learner.
5. K-Means Clustering
  • Number of Clusters: 3;
  • Initial Centroid Selection: K-means++ algorithm;
  • Convergence Threshold: 1 × 10⁻⁴.

4.1.3. Validation Strategy

A rigorous validation workflow was designed to simulate real-world operational scenarios and avoid data leakage. The workflow included two key components:
1. Time-Series Stratified Data Splitting
Unlike random data splitting, the dataset was divided chronologically to mimic practical forecasting cycles:
  • Training Set (January 2023–February 2024, 14 months): Used for initial model fitting and feature importance analysis.
  • Validation Set (March 2024–July 2024, 5 months): Exclusive for hyperparameter tuning and K-means cluster center calibration. No overlap with the training set was allowed to prevent overfitting to specific time periods.
  • Test Set (August 2024–December 2024, 5 months): A fully independent dataset used for final performance evaluation. This ensured the results reflect the method’s generalization ability to unseen future data.
2. Time-Series 5-Fold Cross-Validation
To further optimize model parameters and mitigate overfitting, time-series 5-fold cross-validation was applied to the training set. Each fold maintained temporal continuity, avoiding the “look-ahead bias” inherent in traditional random cross-validation. For each fold, the model was trained on the earlier subset and validated on the subsequent subset, with average performance across folds guiding parameter adjustment.
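The chronological folding described above corresponds to scikit-learn's `TimeSeriesSplit`; a minimal sketch with dummy time-ordered data:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Time-series 5-fold CV: each fold trains on an earlier contiguous
# block and validates on the block that follows it, so no fold ever
# sees the "future" of its validation data (no look-ahead bias).
tscv = TimeSeriesSplit(n_splits=5)
X = np.arange(120).reshape(-1, 1)  # 120 dummy time-ordered samples

folds = []
for train_idx, val_idx in tscv.split(X):
    # training indices always precede validation indices
    folds.append((train_idx, val_idx))
```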
All data preprocessing, model training, and forecasting simulations in this study were implemented using Python 3.11.

4.2. Experimental Results

4.2.1. Performance of the Error Prediction Ensemble Model

To evaluate the ensemble model’s capability to handle complex error patterns, its predictive performance was first analyzed under typical high-error scenarios. Figure 5 shows a case study of four days with significant weather anomalies. The model’s predicted errors (red line) closely track the actual errors (blue line) in their overall trend. This visual observation is supported by quantitative metrics: the average Pearson correlation coefficient is 0.968 and the average distance metric is 0.15, reflecting strong linear correlation and close morphological similarity, respectively, between the predicted and actual curves. These values indicate high predictability.
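As a minimal illustration of the correlation metric, the Pearson coefficient between predicted and actual error curves can be computed as below; the curves and sampling interval are hypothetical, and the paper's unnamed distance metric is not reproduced here.

```python
import numpy as np

def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted error curves."""
    return float(np.corrcoef(actual, predicted)[0, 1])

# Illustrative curves: the predicted series tracks the actual trend
# with small random noise, qualitatively as in Figure 5.
t = np.linspace(0, 4 * np.pi, 96)                    # four hypothetical days
actual = 10 + 8 * np.sin(t)                          # actual error curve (MW)
predicted = actual + np.random.default_rng(0).normal(scale=0.5, size=t.size)
r = pearson_r(actual, predicted)
```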
Furthermore, the predicted curve is smoother than the actual curve. This smoothness indicates that the Stacking-based ensemble effectively filters random noise while capturing the core, predictable error trend, which yields robust performance in the presence of heterogeneous features. The model successfully identifies the patterns of error fluctuations and provides timely warnings for all critical error peak periods (e.g., 03:00–06:00 on 18 November 2024, and 15:00–18:00 on 5 December 2024).
The primary objective of the early warning method is the accurate classification of risk levels. Figure 6 shows the method’s confusion matrix on the test set. The results indicate that most low-risk and high-risk events were correctly classified. The method correctly identified 85.6% of low-risk samples, demonstrating its reliability in recognizing safe operational conditions. For the critical high-risk category, a recall rate of over 95% was achieved, confirming the method’s capability to detect the vast majority of actual high-risk events.
Notably, the identification accuracy for medium-risk samples was 72.8%. Among the misclassified medium-risk samples, 25.0% were predicted as high-risk. This pattern confirms the model’s safety-conservative characteristic. When prediction uncertainty exists, the model is inclined to escalate the risk level, thereby ensuring that stronger warnings are triggered as a precautionary measure.
To thoroughly validate the effectiveness of each component in the proposed framework, a systematic ablation study was conducted. The study evaluated four configurations: (1) QR alone, (2) RF alone, (3) GBM alone, and (4) the proposed Stacking ensemble with meta-learner optimization.
Table 1 shows the ablation study results. The Stacking ensemble achieved the best performance on both metrics, with MAE = 2.801 MW and RMSE = 4.477 MW. This improvement confirms that the Stacking framework effectively leverages the complementary strengths of the base learners. Among the individual models, RF achieved the lowest MAE (2.852 MW) and GBM the lowest RMSE (4.593 MW), confirming the ability of tree-based ensembles to capture complex non-linear patterns, while QR exhibited the highest errors (MAE = 3.066 MW) owing to its linear assumptions.
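The two metrics reported in Table 1 follow their standard definitions, sketched below with a small worked check (the example values are illustrative).

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean Absolute Error: average magnitude of the prediction error.
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    # Root Mean Square Error: penalizes large deviations more heavily.
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Worked check: errors of 3 MW and 4 MW give MAE = 3.5 and RMSE = sqrt(12.5).
y_true = np.array([10.0, 20.0])
y_pred = np.array([13.0, 16.0])
```

Because RMSE squares each residual before averaging, it weights the large errors that matter most for risk warning more heavily than MAE does, which is why both metrics are reported.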
Furthermore, using four metrics calculated from the confusion matrix, this study evaluated the performance of the ensemble model against three base classifiers—QR, RF, and GBM—on an independent test set, as illustrated in Figure 7. The experimental results demonstrate that the ensemble model outperformed all three individual classifiers across all evaluation metrics.

4.2.2. Effectiveness Validation of the Dynamic Threshold Mechanism

K-means clustering identified three distinct clusters in the validation set errors, corresponding to low, medium, and high error levels. Figure 8 shows the probability density distribution of the errors and the dynamic thresholds derived from the cluster centers. The thresholds are not located at random points within the distribution but are positioned to separate the different density regions of the data. This placement demonstrates the data-driven nature of the method.
Note that the curve originates from negative values on the x-axis. This is a common visual artifact resulting from the smoothing algorithm used for chart generation and does not indicate the presence of negative absolute errors in the data.
To quantify the advantages of dynamic thresholds, Table 2 compares their performance with a fixed-threshold baseline in the risk level classification task. The results clearly show that the dynamic threshold method outperforms the fixed-threshold method across all key metrics. A particularly significant improvement is observed in the Recall for high-risk events, which increases from 0.897 to 0.969. This notable increase provides direct evidence that the K-means dynamic threshold mechanism successfully addresses the core challenge. It indicates that fixed thresholds tend to miss truly dangerous high-error events, whereas data-driven dynamic thresholds can more sensitively detect these critical minority samples by aligning classification boundaries with the current statistical distribution of errors.
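One plausible implementation of the threshold derivation is to fit K-means to the one-dimensional absolute errors and place each risk boundary midway between adjacent cluster centers; the midpoint rule and the synthetic error distribution below are assumptions, as the paper does not state the exact mapping from cluster centers to thresholds.

```python
import numpy as np
from sklearn.cluster import KMeans

def dynamic_thresholds(abs_errors, k=3, seed=0):
    """Derive k-1 risk thresholds from 1-D K-means cluster centers."""
    km = KMeans(n_clusters=k, init="k-means++", tol=1e-4, n_init=10,
                random_state=seed)
    km.fit(np.asarray(abs_errors).reshape(-1, 1))
    centers = np.sort(km.cluster_centers_.ravel())
    # Assumed rule: each boundary sits midway between adjacent centers.
    return (centers[:-1] + centers[1:]) / 2

# Synthetic validation errors with low / medium / high modes (MW).
rng = np.random.default_rng(0)
abs_errors = np.abs(np.concatenate([
    rng.normal(1.5, 0.5, 600),    # low-error regime
    rng.normal(6.0, 1.0, 300),    # medium-error regime
    rng.normal(15.0, 2.0, 100),   # high-error regime
]))
t_low_med, t_med_high = dynamic_thresholds(abs_errors)
```

Because the boundaries move with the fitted centers, re-running the clustering on each season's validation errors shifts the low/medium and medium/high cut points automatically, which is the adaptivity that a fixed threshold lacks.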

4.2.3. Performance Analysis Under a Typical Extreme Weather Event

To evaluate the method’s performance under real-world conditions, a typical winter cold wave event with rapid wind power ramp-up was selected from the test set for detailed analysis. As shown in Figure 9, the actual power error on that day (upper subplot) exhibited strong fluctuations, peaking at over 30 MW—significantly higher than the model’s average error under normal conditions. This scenario provided a suitable test case for evaluating the model’s warning capability.
In this event, the warning method successfully captured danger signals and presented an intuitive risk display, as seen in the lower subplot of Figure 9. The color scheme indicates risk levels: blue for low, yellow for medium, and red for high. The method generated time-series warnings corresponding to the error fluctuations. A comparison between the upper and lower subplots reveals that the macro-level trend of the predicted error closely aligns with the actual error, successfully capturing its variations.
A key observation occurred after 21:00, when the actual error began to rise sharply. The lower subplot shows that the model’s predicted error also increased rapidly, either simultaneously or slightly ahead of the actual rise, and correctly flagged the event with medium- or high-risk warnings. This result supports the effectiveness of the proposed risk warning mechanism. The model not only anticipated the sharp error increase but also provided progressive alerts through its multi-level thresholds, offering valuable response time for system operators.
It is also noteworthy that during the evening high-risk period, the model’s predicted error peak was slightly higher than the actual error peak. Such moderate over-prediction is often acceptable—and sometimes preferable—in risk management. From a safety standpoint, a method that tends to estimate risk conservatively helps avoid significant losses caused by overly optimistic predictions, thereby enhancing the model’s practical utility.

5. Conclusions

A two-year empirical study using operational data from a Chinese wind farm validates the proposed adaptive error early warning framework. The study confirms that the framework effectively overcomes key limitations of traditional forecasting methods.
First, the feasibility of the end-to-end framework, which integrates specialized error prediction with dynamic risk quantification, is demonstrated. The Stacking-based multi-model ensemble effectively predicts complex, non-linear error magnitudes, achieving optimal performance metrics and resolving the challenge of robust prediction.
Second, the method shows significant superiority over single-model and fixed-threshold approaches, particularly in risk perception. The K-Means dynamic threshold mechanism enhances adaptability, which leads to a substantial increase in the Recall rate for high-risk events. This confirms the framework’s crucial ability to detect critical minority-class, high-error events.
Finally, analysis of typical extreme weather events verifies the practical value of the method. Its safety-conservative prediction strategy and timely alerts provide valuable response time and decision-making support for grid operators. This result underscores the framework’s high reliability and significant utility for mitigating risks in complex, real-world environments.
Despite the promising results achieved in this study, several limitations should be acknowledged.
First, regarding data requirements, the proposed method necessitates at least 12 months of historical data to ensure effective model training, which may restrict its applicability to newly commissioned wind farms that lack sufficient historical operational data.
Second, the method remains dependent on the quality of Numerical Weather Prediction data; although efforts have been made to enhance robustness, catastrophic NWP failures cannot be fully mitigated, which may introduce uncertainties to the final results under extreme weather prediction scenarios.

Author Contributions

Conceptualization, L.Z.; methodology, L.Z.; software, C.W.; validation, F.H. and Z.H.; formal analysis, F.H.; investigation, M.C.; resources, C.H.; data curation, M.C.; writing—original draft preparation, L.Z.; writing—review and editing, L.Y.; visualization, C.W.; supervision, Z.H.; project administration, C.H. and L.Y.; funding acquisition, Z.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science and Technology Project of State Grid Tianjin Electric Power Company (No. R&D 2024-01, Project Name: “Research on High Risk Warning and Probability Prediction Technology for New Energy Power Prediction Error under Extreme Meteorological Conditions”).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Li Zhang and Chun He were employed by the Electric Power Research Institute of State Grid Tianjin Electric Power Company. Authors Facai He and Zhigang Huang were employed by the State Grid Tianjin Electric Power Company. Author Chao Wang was employed by the State Grid Tianjin Electric Power Company Ninghe Branch. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NWP: Numerical Weather Prediction
QR: Quantile Regression
RF: Random Forest
GBM: Gradient Boosting Machine
LR: Linear Regression
MAE: Mean Absolute Error
RMSE: Root Mean Square Error
SCADA: Supervisory Control and Data Acquisition

Figure 1. Framework Architecture.
Figure 2. A Dual-Stage Framework.
Figure 3. Flowchart of the mathematical modeling and early warning process.
Figure 4. Feature Sets.
Figure 5. Analysis of Model Performance on Four Extreme Days.
Figure 6. Confusion Matrix of Risk Level Classification.
Figure 7. Model Performance Comparison.
Figure 8. Prediction Error Distribution and Dynamic Thresholds Determined by K-Means Clustering.
Figure 9. Time-series Plot of Risk Warnings under a Typical Extreme Weather Event.
Table 1. Performance Comparison of Error Magnitude Prediction.

Model | MAE (MW) | RMSE (MW)
Quantile Regression | 3.066 | 5.101
Random Forest | 2.852 | 4.597
Gradient Boosting Machine | 2.864 | 4.593
Ensemble model | 2.801 | 4.477
Table 2. Compared to Fixed Thresholds.

Threshold Method | Overall Accuracy | High-Risk Recall
Dynamic Threshold | 0.838 | 0.969
Fixed Threshold | 0.802 | 0.897