Research and Application of a Model Selection Forecasting System for Wind Speed and Theoretical Power Generation

Zeng, Ming; Jia, Qianqian; Wen, Zhenming; Mao, Fang; Huang, Haotao; Pan, Jingyuan

doi:10.3390/fi18010007



by Ming Zeng ¹, Qianqian Jia ¹,*, Zhenming Wen ², Fang Mao ¹, Haotao Huang ¹ and Jingyuan Pan ¹

¹ Guangzhou Power Supply Bureau, Guangdong Power Grid Co., Ltd., Guangzhou 510520, China

² School of Mechatronic Engineering, Guangdong Polytechnic Normal University, Guangzhou 510450, China

* Author to whom correspondence should be addressed.

Future Internet 2026, 18(1), 7; https://doi.org/10.3390/fi18010007

Submission received: 15 November 2025 / Revised: 17 December 2025 / Accepted: 17 December 2025 / Published: 22 December 2025


Abstract

Accurate short-term wind speed forecasting is essential for mitigating wind power variability and supporting stable grid operation. This study proposes a model selection forecasting system (MSFS) that dynamically integrates six deep learning models to enhance predictive accuracy and robustness. Using multi-turbine data from a wind farm in northwest China, the framework identifies the optimal model at each time step through iterative evaluation and retrains the selected models to further improve performance. The Kruskal–Wallis test shows that all forecasting models, including MSFS, maintain statistical consistency with the real wind speed distribution at the 95% confidence level. Uncertainty analysis demonstrates that MSFS yields more reliable forecasting intervals. By coupling MSFS-derived wind speed forecasts with turbine-specific power curves, the system enables reliable theoretical power estimation, offering critical reference information for dispatch planning, reserve allocation, and distinguishing resource-driven variability from turbine performance deviations. The slightly conservative yet highly stable forecasting behavior of MSFS reduces overestimation risks and enhances decision reliability. Overall, the proposed MSFS framework provides a robust, interpretable, and operationally valuable solution for short-term wind energy forecasting, with strong potential for wind farm operation and power system management.

Keywords:

model selection forecasting system; wind speed forecasting; theoretical power generation; uncertainty analysis

1. Introduction

By the end of 2024, global wind power capacity had continued to expand, reaching approximately 1136 GW, thereby consolidating wind energy as a cornerstone in the transformation of the global energy structure (Figure 1 illustrates the overall trajectory of global wind power development). Within this process, the top ten wind power countries made substantial contributions, with China standing out by achieving 520.6 GW of cumulative installed capacity and 79.8 GW of annual additions, accounting for 45.8% and nearly 68% of the global totals, respectively. This not only reinforced China’s leading position but also underscored its pivotal role in shaping the global wind power landscape. Importantly, China’s dominance is not confined to sheer scale; it is equally reflected in the widespread deployment of onshore projects, the rapid rise of offshore wind, and the sustained export of technology and manufacturing capabilities. Meanwhile, countries such as the United States, Germany, India, Brazil, the United Kingdom, Spain, and France continue to play critical roles within their respective regions, collectively driving the multipolar evolution of global wind power [1].

Nevertheless, the expansion of installed capacity does not automatically translate into the efficient utilization of wind resources. The decisive challenge lies in ensuring the reliable and stable integration of wind power into electricity grids, which remains an unresolved issue [2]. The inherent variability and intermittency of wind speed introduce significant uncertainty into grid integration, thereby constraining both the effective absorption of wind energy and the flexibility of system scheduling. As a result, wind speed forecasting has emerged as a central focus of research, aiming to enhance forecasting accuracy as a means to mitigate the destabilizing effects of wind fluctuations on grid operation, and ultimately to improve the controllability and security of energy systems. However, the nonlinear and complex nature of wind speed data continues to pose formidable challenges, limiting the capacity of existing methods to fully capture its dynamic patterns and constituting a fundamental bottleneck to methodological innovation and theoretical advancement in this field.

To address these challenges, a wide variety of wind power forecasting approaches have been developed, which can generally be grouped into four categories: (1) Statistical models, such as autoregressive integrated moving average (ARIMA), Kalman filters, and regression approaches, which effectively capture temporal dependencies but remain inadequate for nonlinear and non-stationary characteristics [3]; (2) Numerical weather prediction (NWP) models, which simulate atmospheric dynamics and thermodynamics through physical equations, thereby offering physically consistent large-scale meteorological inputs, yet with high computational costs and limited adaptability to local contexts [4]; (3) Spatio-temporal models, including graph neural networks (GNNs), spatio-temporal convolutions, and Transformers, which capture interdependencies across turbines and sites, significantly improving predictive accuracy, though requiring high-quality data and complex modeling [5]; (4) Artificial intelligence (AI)-based models, covering deep learning, ensemble learning, and hybrid optimization approaches, which provide powerful nonlinear modeling and feature extraction capabilities but are often criticized for limited interpretability, lack of physical consistency, and sensitivity to data drift [6].

Although different forecasting approaches each offer unique strengths, no single method can simultaneously satisfy the requirements of accuracy, physical plausibility, interpretability, and real-time applicability. Statistical models are transparent but less accurate; NWP models are physically consistent but computationally expensive; spatio-temporal models improve performance but suffer from limited transferability; and AI models provide strong nonlinear learning but face interpretability and credibility challenges. As a result, research has increasingly shifted toward multi-model fusion. By combining the interpretability of statistical models, the physical consistency of NWP, the dependency-modeling capability of spatio-temporal methods, and the nonlinear representational power of AI, hybrid forecasting frameworks have emerged as a more robust and practical solution for large-scale wind integration and low-carbon energy development [7].

Recent studies further highlight the growing importance of hybrid statistical–AI methodologies. Wang et al. [8] reviewed uncertainty-modeling strategies and interval metrics, outlining key challenges and future directions. Building on this, innovative frameworks have been proposed to address the nonlinear and uncertain nature of wind data. For example, Zhao et al. [9] introduced an interpretable contrastive learning model (ICoTF), while Li et al. [10] developed a deep fuzzy inference system to manage asymmetric and heavy-tailed errors. Chen et al. [11] combined Bayesian wavelet denoising with deep Gaussian processes, significantly improving accuracy across multiple wind farms. He et al. [12] addressed concept drift via an online probabilistic LSTM framework, and Gao et al. [13] embedded physical power-distribution constraints into LSTM training.

Advances have also been made in privacy-preserving learning and real-time adaptation [14]. Wang and Zhou [15] introduced a federated learning scheme for multi-site forecasting, whereas Nayak et al. [16] proposed an online ensemble regression model for continuous data streams. Additional contributions include Bayesian DLMs for hydrogen-related forecasting [17] and statistical downscaling frameworks for complex-terrain wind modeling [18].

Collectively, these works show that no single technique can ensure accuracy, interpretability, robustness, and physical consistency simultaneously. Instead, hybrid and ensemble methodologies—integrating statistical modeling, physics-based constraints, spatio-temporal learning, and AI—have become the dominant direction for reliable renewable-energy forecasting.

Spatio-temporal (ST) models, in particular, have emerged as an important complement to these methods. By jointly capturing temporal dynamics and spatial dependencies, ST models provide a more comprehensive representation of renewable-energy variability. Zhao et al. [19] developed a multi-site Transformer framework with gated fusion and frequency-enhancement modules, while Ma et al. [20] proposed an ST graph network (MFSGN) to model static and dynamic correlations among turbine clusters. Both significantly outperform traditional approaches. High-resolution regional forecasting has also benefited from deep learning, with Resifi et al. [21] introducing a hybrid recursive–downscaling strategy, Zhang and Yin [22] developing a PCA-enhanced U-Net architecture, and Zhang et al. [23] presenting a multimodal ST-DFNet incorporating meteorological and topographical factors for accurate multi-step prediction.

Further, spatio-temporal models have been applied to offshore wind and interpretability-focused studies. Hu et al. [24] presented a dual-branch attention neural network (DBANN) for offshore wind forecasting, which separately captures spatial and temporal dependencies before combining them in a fusion module, yielding large error reductions in real-world datasets. Y. N. Zhao et al. [25] proposed a framework to interpret LASSO regression by quantifying spatio-temporal correlations, revealing how feature collinearity and spatial distribution impact predictive accuracy. Verdone et al. [26] reviewed advances in multi-site forecasting, emphasizing that integrating spatial information not only improves predictive performance but also enhances model robustness, while noting the pressing need for standardized ST benchmarks to facilitate cross-study comparisons.

Collectively, these studies highlight that spatio-temporal models—often integrated with attention mechanisms, graph learning, and multimodal feature fusion—represent a critical pathway toward more accurate, robust, and interpretable wind forecasting. By combining the ability to model temporal dependencies, capture inter-site correlations, and incorporate diverse data sources, ST-based frameworks extend the capabilities of traditional approaches and provide strong support for applications such as grid scheduling, offshore wind integration, and renewable energy planning. The convergence of spatio-temporal learning with statistical, physics-informed, and AI-based methods thus forms a promising research frontier for achieving high-penetration renewable integration.

Numerical Weather Prediction (NWP) models provide physically consistent, large-scale meteorological inputs and are indispensable for medium- and long-term renewable energy forecasting. However, their high computational cost, limited local adaptability, and cumulative forecast errors motivate the development of hybrid approaches. Recent studies increasingly integrate NWP with statistical, physical, and deep learning methods to enhance forecasting accuracy, robustness, and interpretability.

Wan et al. [27] proposed the NWP-Time2Vec-xLSTM-TCCNN model, which incorporates NWP-derived variables and a hybrid xLSTM–CNN architecture. This method effectively captures nonlinear temporal features and achieved notable improvements, reducing MAE by 12.28–20% for photovoltaic forecasting and 10.33–11.53% for wind power forecasting. Ignatev et al. [28] refined NWP-based forecasts through localized statistical calibration and incorporated operational turbine data to simulate site-specific power output, demonstrating higher reliability across diverse conditions. Michalakopoulos et al. [29] further integrated NWP inputs, physical turbine specifications, and power curve equations into a Temporal Fusion Transformer (TFT). Their framework yielded substantial performance gains, including RMSE reductions of up to 60% and R² values exceeding 99%.

Overall, the hybridization of NWP with statistical inference, physical modeling, and deep learning effectively compensates for the limitations of standalone models. By combining large-scale atmospheric information with local corrections and nonlinear learning, these integrated frameworks deliver more accurate, robust, and physically coherent wind power forecasts, thereby supporting grid planning, renewable integration, and market optimization.

Artificial intelligence (AI) models have become a cornerstone of data-driven wind resource modeling due to their strong nonlinear representation capability. However, challenges related to interpretability, robustness under nonstationary conditions, and cross-dataset adaptability remain. Recent work therefore increasingly combines AI models with decomposition techniques, optimization algorithms, frequency-domain analysis, graph learning, and interpretability frameworks, forming hybrid approaches with improved accuracy and stability.

Xu et al. [30] provided a cross-dataset benchmark for neural networks in wind power forecasting, showing that autoregressive models perform more stably in short-term horizons, while neural networks excel in long-term forecasting. Hybrid decomposition–AI models have also demonstrated strong gains. Dong et al. [31] applied CEEMD–Hilbert decomposition with LSTM, while Ullah et al. [32] combined TVFEMD, DWT, GRU, and Transformer architectures for multiscale feature extraction. Other studies further integrate VMD, graph convolution, LSTM, evolutionary optimization, or PatchTST to better capture complex, nonstationary wind patterns [33,34,35]. Frequency-domain enhancement has also been explored, such as the dual-path frequency Mamba-Transformer of Hong et al. [36], which reduced RMSE by more than 60%.

Interpretability has become another major direction. Methods combining feature selection, TFT, SHAP [37] analysis, and MVMD-based [38,39] decomposition have provided more transparent forecasting while maintaining accuracy. Ensemble approaches such as ISI-Net [40] also support both interpretability and intelligent model selection. Optimization and ensemble learning continue to reinforce AI frameworks. Examples include offshore short-term models based on EEMD-BO-BiGRU [41], feature-enhanced xLSTM systems with strong transferability [42], and hybrid point–interval forecasting frameworks optimized by multi-objective strategies [43].

In addition to traditional forecasting approaches, recent advances in model and algorithm selection have emerged as an important research direction closely related to this study. Model selection is a fundamental component of automated machine learning (AutoML), where the goal is to automatically identify the most suitable model or ensemble for a given task or data regime [44]. Contemporary AutoML frameworks incorporate a variety of selection strategies, including meta-learning, Bayesian optimization, reinforcement learning, evolutionary search, and multi-objective model merging [45]. For example, the HM3 framework introduces hierarchical multi-objective model merging to integrate pretrained models efficiently, while recent surveys highlight the increasing role of evolutionary computation and neural model selection in large-scale systems [46,47]. Although these methods demonstrate strong adaptability, they are typically designed for offline optimization, require substantial computational resources, and do not account for the nonstationarity and fine-grained temporal dynamics characteristic of wind speed forecasting [48]. In contrast, the proposed MSFS framework provides a lightweight, time-step-wise model selection strategy tailored to rapidly evolving wind conditions, thereby extending AutoML concepts into a real-time, domain-specific forecasting context.

In this study, the six selected forecasting models were chosen to represent the major architectural paradigms that have demonstrated strong performance in nonlinear and nonstationary wind speed forecasting. Hybrid convolution–recurrent structures (CNN-LSTM, CNN-GRU) effectively capture local temporal patterns and sequential dependencies, while dilated temporal convolutions combined with recurrence (TCN-LSTM) provide multi-scale receptive fields for modeling long-range dynamics. The LSTM–XGBoost hybrid integrates deep temporal feature extraction with nonlinear ensemble learning, offering robustness under varying data regimes. Attention-based models such as the Transformer have recently emerged as state-of-the-art for capturing global dependencies, and graph-enhanced architectures (GNN-TCN) further incorporate spatial interactions among turbines. Together, these models offer complementary representational strengths that are well-suited for evaluating a dynamic model-selection framework.

Overall, the AI-based studies reviewed above show that AI alone cannot fully satisfy requirements for accuracy, robustness, interpretability, and generalizability. Hybrid AI frameworks, which integrate decomposition, optimization, graph learning, frequency-domain modeling, and interpretability modules, have emerged as a dominant direction. Meanwhile, effective wind forecasting increasingly relies on multi-model designs that combine statistical, physical, spatio-temporal, and AI methods. By incorporating multivariable drivers such as meteorology, spatial heterogeneity, and turbine characteristics, these integrated frameworks more accurately capture wind-to-power dynamics, thereby improving forecasting reliability, supporting grid integration, and advancing low-carbon energy systems.

Building on these insights, the main contributions of this paper are summarized as follows:

Multi-model ensemble framework for wind speed forecasting. We employ multiple forecasting models in parallel and identify the time-varying optimal model using absolute error as the selection criterion, thereby exploiting complementary strengths across regimes to improve accuracy and robustness.

Dynamic model selection mechanism. We introduce an online learning–based retraining strategy for the identified optimal models and develop a classification model that automatically selects the most suitable predictor at each time point, enabling intelligent and adaptive model choice.

Multivariable wind speed–power conversion model. We construct a conversion curve that augments the traditional wind speed–power relationship with temperature and altitude, yielding a more faithful representation of the nonlinear mapping from wind speed to power output and enhancing engineering applicability.

Uncertainty analysis for the conversion process. We incorporate probabilistic modeling to quantify uncertainty in predicted power, explicitly accounting for environmental perturbations and model error, thereby strengthening decision support for grid scheduling, risk management, and energy optimization.

Operational integration via theoretical power forecasting. By combining wind speed forecasts with the multivariable conversion model, we derive theoretical power forecasts that provide actionable guidance for dispatch, generation planning, and large-scale wind integration, improving wind-farm operational efficiency and grid stability.

2. Conceptual Framework of the MSFSC Model

In the MSFSC design, four principal elements are integrated, including forecasting models, a classifier, and an algorithmic module for optimization.

2.1. Framework of Predictive Models and Classifiers

This section details six advanced sub-models utilized in the model selection component of wind speed prediction. The suite of models comprises Transformer [49], CNN-LSTM [50], TCN-LSTM [51], CNN-GRU [52], LSTM-XGBoost [53], and GNN-TCN [54]. These algorithms are recognized for their robustness, computational effectiveness, and strong forecasting capability. Furthermore, the Transformer is applied as an auxiliary tool to improve the selection of candidate models [55].

2.2. Classification-Integrated Framework for Model Selection Forecasting

To address the challenges of accuracy, adaptability, and reliability in wind speed and power forecasting, this study proposes an integrated forecasting framework that combines data-driven deep learning models with physical and probabilistic modeling techniques. The proposed framework is designed to capture the complex temporal dependencies of wind speed, adaptively select the most suitable forecasting model, and quantify forecasting uncertainty to enhance its practical applicability for wind farm operation and grid management. The methodological process consists of seven main steps, as illustrated in Figure 2.

Step 1 Data Collection and Pre-training: In this study, operational data from six wind turbines were collected, each containing 115 days of 10 min wind speed measurements. The first 108 days were used for model pre-training through 100 rolling-window iterations. In each iteration, a fixed window of 1008 consecutive samples (7 days) was assigned as the training set, followed by the next 144 samples as the test set. The entire window was then shifted forward by 144 samples (equivalent to one day of data), and the process was repeated until 100 windows had been evaluated. This rolling-window strategy preserves temporal causality, avoids data leakage, and enables model assessment under diverse wind regimes. For each window, six advanced deep learning models (CNN-LSTM, Transformer, TCN-LSTM, CNN-GRU, LSTM-XGBoost, and GNN-TCN) were independently trained and tested. At each time step of the test portion of each window, the model with the smallest absolute error was recorded as the optimal model label. These labels are used solely for training the classifier and do not involve running all six models during real-time forecasting. Accordingly, for wind turbine i at rolling-window iteration T, the constructed input-output dataset is formulated as shown in Equation (1).

D_{i,T} = \left[\begin{array}{ccc|c} d_{1,i} & \cdots & d_{p,i} & d_{p+1,i} \\ \vdots & \ddots & \vdots & \vdots \\ d_{1008-p,i} & \cdots & d_{1007,i} & d_{1008,i} \\ \hline d_{1009-p,i} & \cdots & d_{1008,i} & d_{1009,i} \\ \vdots & \ddots & \vdots & \vdots \\ d_{1152-p,i} & \cdots & d_{1151,i} & d_{1152,i} \end{array}\right] \begin{array}{l} {} \\ \text{Training data} \\ {} \\ {} \\ \text{Testing data} \\ {} \end{array}

(1)

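where i denotes the index of the wind turbine (i = 1, 2, \dots, 6), T represents the rolling-window iteration number (T = 1, 2, \dots, 100), and p is the number of lagged wind speed values used as inputs. Sample subscripts such as 1008 and 1152 are counted from the start of window T.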
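The rolling-window construction of Step 1 and the lagged layout of Equation (1) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the lag length P = 6 is a hypothetical choice (p is not fixed in this section), and each turbine's wind speeds are assumed to be a one-dimensional array of 10 min samples.

```python
import numpy as np

# Window sizes from Step 1 (10 min samples); P is a hypothetical lag length.
TRAIN_LEN, TEST_LEN, STEP, N_WINDOWS = 1008, 144, 144, 100
P = 6


def lagged_matrix(window: np.ndarray, p: int) -> np.ndarray:
    """Stack rows [d_{t-p}, ..., d_{t-1}, d_t] for every target t with p lags available."""
    return np.stack([window[t - p : t + 1] for t in range(p, len(window))])


def rolling_windows(series: np.ndarray, p: int = P):
    """Yield (X_train, y_train, X_test, y_test) for each rolling-window iteration T."""
    span = TRAIN_LEN + TEST_LEN
    for T in range(N_WINDOWS):
        window = series[T * STEP : T * STEP + span]
        if len(window) < span:
            break  # not enough data left for a complete window
        D = lagged_matrix(window, p)
        n_train = TRAIN_LEN - p  # rows whose target lies in the first 1008 samples
        train, test = D[:n_train], D[n_train:]
        yield train[:, :-1], train[:, -1], test[:, :-1], test[:, -1]


if __name__ == "__main__":
    # Synthetic stand-in for one turbine's 108 days of 10 min wind speeds.
    series = np.random.default_rng(0).gamma(2.0, 3.0, size=108 * 144)
    windows = list(rolling_windows(series))
    X_train, y_train, X_test, y_test = windows[0]
    print(len(windows), X_train.shape, X_test.shape)  # 100 (1002, 6) (144, 6)
```

Because the test rows of each window always come after its training rows, this construction keeps the temporal ordering described in Step 1, and 108 days of data (15,552 samples) are exactly enough for 100 full windows.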

Step 2 Retraining of the forecasting models using the optimal datasets: To fully exploit the predictive capability of each model, the samples at which each model was identified as optimal were consolidated into that model's optimal dataset and used for retraining. This process enabled fine-tuning of model parameters, thereby enhancing the robustness, adaptability, and generalization performance of each model.

Step 3 Development of the model selection mechanism: The model selection mechanism was designed to enable adaptive identification of the most suitable forecasting model at each time step. As formulated in Equation (2), the optimal model label was generated by selecting, at every time point, the forecasting model that yielded the minimal absolute error among the six candidates. Importantly, although these labels are derived from forecasting performance, the classifier uses only the raw input features and does not access forecasting outputs, errors, or model parameters, thereby avoiding any form of information leakage. The classifier is trained to learn the mapping from current input conditions to the most suitable forecasting model prior to prediction, enabling dynamic adaptability during online forecasting.

D_{i,t}^{C_{j}} = \{ d_{i,t-p}^{C_{j}}, \dots, d_{i,t-1}^{C_{j}} \to M_{i,k^{*}} \}, \quad k^{*} = \arg\min_{k \in \{1, \dots, 6\}} \left| d_{i,t}^{C_{j}} - \hat{d}_{i,t,k}^{C_{j}} \right|

(2)

The optimal forecasting model at each time step is selected based on the absolute error (AE). This criterion is particularly suitable for point-wise model selection, as it directly measures the deviation between the predicted and observed values at each individual time step without introducing scale-dependent or distributional biases. Compared with alternative metrics such as RMSE or MAPE, AE offers greater robustness under nonstationary wind conditions, where abrupt fluctuations and occasional extreme values are common. The squared penalty in RMSE tends to overemphasize these rare outliers, which can distort the model ranking and lead to unstable selection decisions. Preliminary comparisons conducted in this study showed that AE provides consistent model rankings across different wind regimes while avoiding excessive sensitivity to extreme deviations. Therefore, AE is adopted as the primary criterion for identifying the best-performing model at each forecasting step.
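The labeling rule of Equation (2) amounts to an argmin over per-model absolute errors at each time step. The sketch below assumes the six candidate forecasts for one turbine are stacked into an array of shape (6, n_steps); the model ordering and the toy numbers are illustrative only.

```python
import numpy as np

# Candidate order is an illustrative assumption; any fixed order works.
MODEL_NAMES = ["Transformer", "CNN-LSTM", "TCN-LSTM", "CNN-GRU", "LSTM-XGBoost", "GNN-TCN"]


def optimal_model_labels(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Return, for every time step, the index of the model with the smallest absolute error.

    y_true has shape (n_steps,); y_pred has shape (n_models, n_steps).
    Ties resolve to the lowest model index, following np.argmin.
    """
    abs_err = np.abs(y_pred - y_true[np.newaxis, :])  # AE of each model at each step
    return np.argmin(abs_err, axis=0)


y_true = np.array([5.0, 6.0, 7.0])
y_pred = np.array([
    [5.5, 6.1, 9.0],   # Transformer
    [4.9, 7.0, 7.2],   # CNN-LSTM
    [6.0, 5.5, 6.95],  # TCN-LSTM
    [5.3, 6.4, 7.5],   # CNN-GRU
    [4.5, 6.3, 7.4],   # LSTM-XGBoost
    [5.2, 5.7, 6.8],   # GNN-TCN
])
labels = optimal_model_labels(y_true, y_pred)
print([MODEL_NAMES[k] for k in labels])  # ['CNN-LSTM', 'Transformer', 'TCN-LSTM']
```

In the workflow described above, such labels, paired with the lagged inputs of Equation (1), would serve as the training targets of the model-selection classifier.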

Step 4 Wind speed forecasting: During the online forecasting stage, the newly received wind speed inputs are first passed to the trained classification model, which rapidly identifies the most suitable forecasting model based on the learned selection mechanism. Only the selected model is then executed to generate the final wind speed prediction, meaning that the forecasting phase requires running a single model rather than all six candidates. The classifier itself is lightweight and adds negligible computational overhead, ensuring real-time responsiveness. This adaptive forecasting procedure enhances prediction accuracy while maintaining computational efficiency and avoiding performance degradation associated with relying on a single model.
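The online stage can be sketched as a dispatcher that asks the classifier for a model index and then calls only that model. The classifier and forecasters below are hypothetical stand-ins written as plain callables; in practice they would be the trained classifier and the retrained forecasting models.

```python
from __future__ import annotations

from typing import Callable, Sequence

import numpy as np

Forecaster = Callable[[np.ndarray], float]
Selector = Callable[[np.ndarray], int]


def msfs_forecast(x: np.ndarray, classifier: Selector, models: Sequence[Forecaster]) -> tuple[int, float]:
    """Pick a model index from the raw input window, then run only that model."""
    k = int(classifier(x))
    if not 0 <= k < len(models):
        raise IndexError(f"classifier returned invalid model index {k}")
    return k, float(models[k](x))


calls: list[int] = []  # records which forecasters were actually executed


def make_model(k: int, bias: float) -> Forecaster:
    """Hypothetical persistence-plus-bias forecaster that logs its own calls."""
    def model(x: np.ndarray) -> float:
        calls.append(k)
        return float(x[-1] + bias)
    return model


models = [make_model(k, 0.1 * k) for k in range(6)]


def toy_classifier(x: np.ndarray) -> int:
    """Hypothetical rule: bucket the latest wind speed into 2 m/s bins."""
    return min(int(x[-1] // 2), len(models) - 1)


k, y_hat = msfs_forecast(np.array([4.0, 5.0, 6.5]), toy_classifier, models)
print(k, round(y_hat, 2), calls)  # 3 6.8 [3]
```

The `calls` log makes the cost argument concrete: whatever the classifier decides, exactly one forecaster runs per time step.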

Step 5 Wind speed–power curve: Based on the historical operational data, a wind speed–power conversion curve was established to characterize the nonlinear relationship between wind speed and turbine output power. Environmental factors such as air temperature, altitude, and humidity were incorporated into the modeling process to improve the generalization of the conversion law.

The output power of a wind turbine originates from the kinetic energy carried by the moving air mass, and its theoretical value can be derived from the principle of fluid dynamics. Given the rotor swept area A, air density ρ, and wind speed v, the total power available in the wind flow can be expressed as:

P_{wind} = \frac{1}{2} ρ A v^{3} .

(3)

However, a wind turbine cannot convert all the available wind energy into mechanical energy. According to Betz’s Law [56], the theoretical maximum efficiency of energy extraction from the wind flow is limited to 59.3%. This conversion efficiency is characterized by the power coefficient C_{p}, defined as the ratio of the actual mechanical output power of the turbine to the theoretical wind power:

C_{p} = \frac{P_{mech}}{P_{wind}} .

(4)

Therefore, under ideal operating conditions, the mechanical output power of a wind turbine can be expressed as:

P_{mech} = \frac{1}{2} ρ A C_{p} v^{3} .

(5)
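Equations (3) to (5) can be checked numerically. The sketch assumes a circular rotor of diameter D, so that A = πD²/4, and enforces the Betz limit as 16/27 ≈ 0.593; the rotor size, air density, and C_p value in the usage line are illustrative, not parameters of the turbines studied here.

```python
import math

BETZ_LIMIT = 16 / 27  # maximum theoretical power coefficient (about 0.593)


def wind_power(rho: float, rotor_diameter: float, v: float) -> float:
    """Equation (3): power in the wind flow, P_wind = 0.5 * rho * A * v**3, in watts."""
    area = math.pi * rotor_diameter ** 2 / 4  # swept area A of a circular rotor
    return 0.5 * rho * area * v ** 3


def mech_power(rho: float, rotor_diameter: float, v: float, cp: float) -> float:
    """Equation (5): P_mech = C_p * P_wind, with C_p bounded by the Betz limit."""
    if not 0.0 <= cp <= BETZ_LIMIT:
        raise ValueError(f"C_p={cp} lies outside [0, {BETZ_LIMIT:.3f}]")
    return cp * wind_power(rho, rotor_diameter, v)


# Illustrative values: 100 m rotor, sea-level density, 10 m/s, C_p = 0.45.
print(f"{mech_power(1.225, 100.0, 10.0, 0.45) / 1e6:.2f} MW")  # 2.16 MW
```

The cubic dependence on v is the main reason small wind speed errors translate into large power errors: doubling the wind speed multiplies the available power by eight.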

Temperature and altitude do not enter the model as arbitrary additional features, but through their physical effect on air density ρ. Using standard thermodynamic relationships, ρ is modeled as a function of temperature and altitude, and variations in ρ induce proportional variations in P_{mech} at a given wind speed. Therefore, the “weights” of temperature and altitude in the multivariable wind speed–power model are not free parameters but are determined explicitly by the air-density correction function.
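The exact air-density correction is not spelled out in this section, so the following is only one plausible sketch under stated assumptions: pressure is estimated from altitude with the International Standard Atmosphere barometric formula, density follows the ideal gas law for dry air at the measured temperature (humidity is ignored), and power at a given wind speed is scaled by ρ/ρ_ref, matching the proportional effect described above. The physical constants are standard; the function names and the reference density of 1.225 kg/m³ are assumptions.

```python
# ISA constants (SI units)
P0 = 101_325.0          # sea-level standard pressure, Pa
T0 = 288.15             # sea-level standard temperature, K
LAPSE = 0.0065          # temperature lapse rate, K/m
G = 9.80665             # gravitational acceleration, m/s^2
M_AIR = 0.0289644       # molar mass of dry air, kg/mol
R_UNIV = 8.314462618    # universal gas constant, J/(mol K)
R_DRY = R_UNIV / M_AIR  # specific gas constant of dry air, about 287.06 J/(kg K)
RHO_REF = 1.225         # assumed reference density of the power curve, kg/m^3


def air_density(temp_c: float, altitude_m: float) -> float:
    """Dry-air density from measured temperature and site altitude (assumed model)."""
    exponent = G * M_AIR / (R_UNIV * LAPSE)  # about 5.256
    pressure = P0 * (1 - LAPSE * altitude_m / T0) ** exponent
    return pressure / (R_DRY * (temp_c + 273.15))


def density_corrected_power(p_ref: float, temp_c: float, altitude_m: float) -> float:
    """Scale power defined at RHO_REF proportionally to rho / RHO_REF."""
    return p_ref * air_density(temp_c, altitude_m) / RHO_REF


print(round(air_density(15.0, 0.0), 3))  # 1.225
print(density_corrected_power(1500.0, 5.0, 1500.0))  # a cold, high site
```

Because ρ enters Equation (5) linearly, the correction leaves the shape of the curve unchanged and only rescales it, which is why no additional trainable weights are needed for temperature and altitude.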

This wind speed–power conversion model provides a physical foundation for validating and interpreting the forecasting results obtained in the previous steps.

Step 6 Theoretical power forecasting: The wind speed values were input into the established wind speed–power conversion curve to forecast the theoretical power outputs for different wind turbines. This process yielded the estimated mechanical power under ideal operating conditions, serving as a reference for wind farm dispatching and power grid operation. The theoretical power forecasting results provide valuable insights for short-term operational planning and contribute to maintaining grid stability and optimizing energy management strategies.
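Step 6 amounts to passing each forecast wind speed through the fitted curve. A minimal sketch, assuming the curve is stored as a table of (wind speed, power) points with linear interpolation between them and zero output outside hypothetical cut-in and cut-out speeds; the table values are invented for illustration and do not describe the turbines in this study.

```python
import numpy as np

# Hypothetical tabulated power curve (m/s -> kW) and operating limits.
CURVE_V = np.array([3.0, 5.0, 7.0, 9.0, 11.0, 12.0])
CURVE_P = np.array([0.0, 250.0, 800.0, 1500.0, 2000.0, 2000.0])
CUT_IN, CUT_OUT = 3.0, 25.0


def theoretical_power(v_forecast, curve_v=CURVE_V, curve_p=CURVE_P,
                      cut_in=CUT_IN, cut_out=CUT_OUT) -> np.ndarray:
    """Map forecast wind speeds to theoretical power via the tabulated curve."""
    v = np.asarray(v_forecast, dtype=float)
    # np.interp interpolates linearly and holds the end values beyond the table,
    # so speeds between 12 m/s and cut-out stay at rated power.
    p = np.interp(v, curve_v, curve_p)
    return np.where((v >= cut_in) & (v <= cut_out), p, 0.0)


print(theoretical_power([2.0, 6.0, 13.0, 26.0]).tolist())  # [0.0, 525.0, 2000.0, 0.0]
```

Applying the air-density correction described in Step 5 to the interpolated value would make the resulting theoretical power temperature- and altitude-aware.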

In practical wind turbine operations, the power coefficient C_{p} may gradually degrade over time due to blade fouling, mechanical wear, and other aging effects. To ensure that the wind speed–power conversion model remains aligned with the turbine’s current operating conditions, this study employs a short-term rolling-window strategy. Specifically, only the most recent 108 days of operational data are used to construct the empirical wind speed–power curve. As new measurements become available, older samples are discarded, and both the forecasting models and the estimated C_{p} curve are continuously updated. This rolling-update mechanism allows C_{p} to reflect the turbine’s recent performance characteristics and implicitly captures gradual efficiency changes without relying on long-term static assumptions. Consequently, long-term degradation effects do not accumulate in our framework, and the theoretical power estimation remains robust and representative of current turbine conditions.
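A rolling estimate of the empirical curve can be sketched with a bounded buffer that keeps only the latest 108 days of 10 min samples (15,552 points) and recomputes the mean power per wind speed bin. The binned-mean estimator (the "method of bins" familiar from power performance testing) is an assumption, since the curve estimator is not specified here, and the bin width is illustrative.

```python
from __future__ import annotations

from collections import deque

import numpy as np

WINDOW = 108 * 144  # latest 108 days of 10 min samples


class RollingPowerCurve:
    """Empirical wind speed-power curve re-estimated from a sliding sample buffer."""

    def __init__(self, bin_width: float = 0.5, window: int = WINDOW):
        self.bin_width = bin_width
        self.v = deque(maxlen=window)  # oldest samples are discarded automatically
        self.p = deque(maxlen=window)

    def update(self, v_new, p_new) -> None:
        """Append new (wind speed, power) measurements of equal length."""
        if len(v_new) != len(p_new):
            raise ValueError("wind speed and power batches must have equal length")
        self.v.extend(v_new)
        self.p.extend(p_new)

    def curve(self) -> tuple[np.ndarray, np.ndarray]:
        """Return bin centers and mean power per occupied bin."""
        v = np.fromiter(self.v, dtype=float)
        p = np.fromiter(self.p, dtype=float)
        idx = np.floor(v / self.bin_width).astype(int)
        bins = np.unique(idx)
        means = np.array([p[idx == b].mean() for b in bins])
        return (bins + 0.5) * self.bin_width, means


rc = RollingPowerCurve(bin_width=1.0, window=4)
rc.update([1.2, 1.8, 2.5], [10.0, 20.0, 30.0])
rc.update([2.4, 2.6], [40.0, 50.0])  # pushes out the oldest sample (1.2, 10.0)
centers, means = rc.curve()
print(centers.tolist(), means.tolist())  # [1.5, 2.5] [20.0, 40.0]
```

Because evicted samples no longer influence the bin means, a gradual drop in C_p shows up in the curve within one window length instead of being averaged away by older data.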

Step 7 Uncertainty analysis: To further enhance the reliability and practical applicability of the proposed forecasting framework, probabilistic modeling was incorporated for uncertainty analysis. Based on the predicted wind speed values, a probability distribution model was constructed to capture the stochastic characteristics of wind variations and model forecasting errors. This evaluation of uncertainty examines the combined effects of environmental disturbances and model-generated variability, thereby reinforcing the stability and dependability of the forecasted results. It also provides a foundation for risk-aware operational planning in power grids and the optimization of renewable energy utilization. The metrics applied in this analysis are summarized in Table 1.
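Table 1 is not reproduced in this section, so the following only illustrates one common way to turn point forecasts into intervals and score them: an empirical-quantile interval built from historical residuals, evaluated by the prediction interval coverage probability (PICP) and the mean interval width. The nonparametric residual distribution and both metrics are assumptions, not necessarily the probability model or metrics used in the study.

```python
import numpy as np


def empirical_interval(point_forecast, residuals, alpha: float = 0.05):
    """Central (1 - alpha) interval: forecast plus empirical quantiles of past residuals.

    Residuals are defined as observed minus forecast.
    """
    lo_q, hi_q = np.quantile(residuals, [alpha / 2, 1 - alpha / 2])
    f = np.asarray(point_forecast, dtype=float)
    return f + lo_q, f + hi_q


def picp(y_obs, lower, upper) -> float:
    """Prediction interval coverage probability: share of observations inside the interval."""
    y = np.asarray(y_obs, dtype=float)
    return float(np.mean((y >= lower) & (y <= upper)))


def mean_width(lower, upper) -> float:
    """Average interval width, in the units of the forecast (m/s for wind speed)."""
    return float(np.mean(np.asarray(upper) - np.asarray(lower)))


residuals = np.linspace(-1.0, 1.0, 101)  # stand-in for historical forecast errors
lower, upper = empirical_interval([5.2, 5.0, 7.9], residuals)
print(round(picp([5.0, 6.5, 7.0], lower, upper), 3), round(mean_width(lower, upper), 2))  # 0.667 1.9
```

A reliable interval keeps PICP close to the nominal level (95% here) without becoming unnecessarily wide, so the two numbers are usually read together.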

The flowchart representing both the wind speed forecasting system and the wind energy conversion system is presented in Figure 2.

3. Experiment and Analysis

In this section, three experiments are designed to examine both the predictive performance of the proposed MSFSC and the theoretical power generation for individual wind turbines. Before detailing the experimental procedures, the relevant dataset and the criteria for evaluation are introduced.

3.1. Data Source

High-resolution wind speed data, sampled at 10 min intervals from six turbines in a wind farm located in northwestern China, constituted the primary dataset for this investigation. Each dataset encompassed 15,552 continuous measurements over a 108-day period. Figure 3 summarizes the statistical characteristics of these datasets. For model construction, the majority of data (approximately 99%) were utilized for training, while the final 6.5% were set aside for forecasting performance evaluation, ensuring a rigorous validation framework.

3.2. Evaluation Criteria

In this study, multiple statistical indicators were employed to comprehensively evaluate the forecasting performance from different perspectives. The error-based metrics (MAE and RMSE) measure the overall deviation between predicted and observed wind speeds, reflecting the absolute accuracy and robustness of the model. The relative error and stability metrics (MAPE and STDAPE) assess the magnitude and variability of percentage errors, providing insights into model consistency under varying wind conditions. The directional accuracy metric (DA) evaluates the model’s capability to correctly capture the changing trends of wind speed, which is essential for short-term operational control. Finally, the correlation and overall performance metrics (R² and TIC) quantify the explanatory power and overall efficiency of the forecasting model. These evaluation indices collectively form a comprehensive, multidimensional framework encompassing forecast accuracy, stability, trend-following capacity, and correlation strength, thereby providing a rigorous assessment of wind speed prediction performance. In addition, classification accuracy (ACC) is used to evaluate the model-selection classifier. The precise definitions of these indices are detailed in Table 2.
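Because Table 2 is not reproduced here, the sketch below implements commonly used forms of the listed indicators; the exact formulas in the study may differ, in particular for DA (here, agreement in the sign of the change relative to the previous observation) and TIC (Theil's inequality coefficient in its U1 form). MAPE and STDAPE assume no zero observations and are reported in percent.

```python
import numpy as np


def forecast_metrics(y_obs, y_pred) -> dict:
    """Common point-forecast metrics; percentage metrics require nonzero observations."""
    y = np.asarray(y_obs, dtype=float)
    f = np.asarray(y_pred, dtype=float)
    err = y - f
    ape = np.abs(err / y)
    rmse = np.sqrt(np.mean(err ** 2))
    # DA: does the forecast move in the same direction as the observation,
    # both measured from the previous observed value?
    same_dir = (y[1:] - y[:-1]) * (f[1:] - y[:-1]) > 0
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(rmse),
        "MAPE": float(100 * np.mean(ape)),
        "STDAPE": float(100 * np.std(ape)),
        "DA": float(100 * np.mean(same_dir)),
        "R2": float(1 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)),
        "TIC": float(rmse / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(f ** 2)))),
    }


def classification_accuracy(true_labels, pred_labels) -> float:
    """ACC of the model-selection classifier: share of correctly predicted model labels."""
    return float(np.mean(np.asarray(true_labels) == np.asarray(pred_labels)))


m = forecast_metrics([2.0, 4.0], [3.0, 3.0])
print(m["MAE"], m["RMSE"], m["MAPE"], m["R2"])  # 1.0 1.0 37.5 0.0
```

Lower MAE, RMSE, MAPE, STDAPE, and TIC and higher DA and R² indicate better forecasts; ACC is reported separately because it scores the selector, not the wind speed values.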

The six forecasting models differ in both architectural and computational complexity. CNN-LSTM and CNN-GRU are the most efficient due to their lightweight convolution–recurrent structures, while LSTM-XGBoost and TCN-LSTM exhibit medium complexity. The Transformer and GNN-TCN are the most computationally demanding because of quadratic self-attention and graph-convolution operations.

Using the training dataset (1008 × 7) and test dataset (144 × 7), and running on an Intel i9-13900K CPU, 64 GB RAM, and an NVIDIA RTX 4090 GPU with a batch size of 16 and 150 epochs, all models can be trained within tens of seconds per turbine, and inference remains at the millisecond level. The approximate runtime differences are summarized in Appendix A Table A1. Overall, despite the varying complexity, all six models are computationally feasible for real-time or near real-time short-term wind speed forecasting, supporting the practical deployment of the MSFS framework.

3.3. Experiment I: Forecasting System for Model Selection Driven by Classification

In this experiment, wind speed forecasts were generated using six deep learning models for each of the six wind turbines over 100 iterations, yielding a total of 14,400 test results. The 100 iterations were used to determine the optimal forecasting model at each time point: the model whose forecast had the smallest absolute error among the six at that time point was labeled as optimal. The single-model results are reported in Table 3, and the results of the forecasting system in Figure 4 and Table 4.
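The labeling rule described above (the model with the smallest absolute error at each time point is marked optimal) can be sketched as follows. The sketch assumes the six forecasts are stacked in an array of shape `(n_models, n_steps)` in the order in which the models are listed for Turbine #1; `optimal_model_labels` and `optimal_point_counts` are illustrative names. The per-model counts always sum to the number of test points, which gives a quick sanity check on reported optimal-point (OP) tallies.

```python
import numpy as np

MODEL_NAMES = ["CNN-LSTM", "Transformers", "TCN-LSTM",
               "CNN-GRU", "LSTM-XGBoost", "GNN-TCN"]


def optimal_model_labels(forecasts, y_true):
    """Index of the model with the smallest absolute error at each time step.

    forecasts: array of shape (n_models, n_steps)
    y_true:    array of shape (n_steps,)
    Ties are resolved in favour of the lower model index.
    """
    abs_err = np.abs(np.asarray(forecasts, dtype=float)
                     - np.asarray(y_true, dtype=float)[None, :])
    return np.argmin(abs_err, axis=0)


def optimal_point_counts(labels, n_models=len(MODEL_NAMES)):
    """Number of optimal points (OP) per model; the counts sum to n_steps."""
    return np.bincount(labels, minlength=n_models)


# Toy example with three time steps; only the first three models ever win.
y = np.array([5.0, 6.0, 7.0])
fc = np.array([[5.1, 6.5, 7.0], [4.0, 6.0, 7.3], [5.0, 5.0, 6.0],
               [9.0, 9.0, 9.0], [9.0, 9.0, 9.0], [9.0, 9.0, 9.0]])
labels = optimal_model_labels(fc, y)
print([MODEL_NAMES[i] for i in labels])  # ['TCN-LSTM', 'Transformers', 'CNN-LSTM']
print(dict(zip(MODEL_NAMES, optimal_point_counts(labels).tolist())))
```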

Table 3 summarizes the wind speed forecasting performance of the six wind turbines under the different forecasting models. Overall, the performance differences among models across turbines are relatively small, indicating that each deep learning model exhibits strong generalization capability and stable predictive behavior. Among them, the CNN-GRU, CNN-LSTM, and LSTM-XGBoost models achieve the highest accuracy for most turbines, as reflected by their lower aggregate error metrics and higher correlation coefficients.

Table 4 summarizes the wind speed forecasting performance for the six wind turbines using the classification-driven Model Selection Forecasting System (MSFSC). Across the 14,400 test samples, the number of times each model is selected as optimal varies substantially, reflecting the non-stationary and time-dependent behavior of wind speed, which causes the optimal forecasting model to change continually over time.

The experimental results for the six wind turbines under different forecasting strategies are summarized in Tables 3 and 4 and Figure 4. Overall, all six deep learning models achieved high forecasting accuracy, but their relative performance differed across turbines and time intervals, illustrating the dynamic and non-stationary properties of wind speed data. For all 14,400 test points, the optimal model at each time step was determined by the minimum-absolute-error criterion, providing a systematic basis for adaptive model selection.

For Turbine #1, the distribution of optimal points (OP) shows that CNN-LSTM, Transformers, TCN-LSTM, CNN-GRU, LSTM-XGBoost, and GNN-TCN were optimal at 1925, 2737, 2638, 2139, 2031, and 2930 time points, respectively, which together account for all 14,400 test points. Similar patterns were observed for the other turbines: no single model consistently dominates, because model performance varies with the dynamic characteristics of the wind speed data. When the Transformers-based MSFS was applied, forecasting accuracy improved markedly, with an MAE of 0.0650, an RMSE of 0.1052, and a MAPE of 1.71%, outperforming the average single-model results (MAE ≈ 0.11, RMSE ≈ 0.16). Directional accuracy (DA) also rose from about 75% for single models to over 93% under MSFS, confirming the system’s improved ability to capture wind speed fluctuations and its enhanced temporal adaptability.

As shown in Table 4 and Figure 4 (Part C), the proposed system closely reproduced the true distribution of optimal models, and the Transformers classifier identified the optimal model at most time steps. The model selection accuracy (SA) for Turbines #1 to #6 was 94.59%, 91.34%, 94.61%, 92.89%, 92.00%, and 91.53%, respectively, yielding an average model selection accuracy of approximately 92.8%.
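Once the classifier has produced a label for each time step, both the selection accuracy and the MSFS output reduce to simple array operations. The minimal sketch below assumes that predicted and optimal labels are integer model indices. It does not reproduce the Transformer classifier, and the helper names are hypothetical. The final lines check the reported average of the six per-turbine SA values.

```python
import numpy as np


def selection_accuracy(pred_labels, true_labels):
    """Share (%) of time steps at which the selected model is the optimal one."""
    return 100.0 * float(np.mean(np.asarray(pred_labels) == np.asarray(true_labels)))


def assemble_msfs(forecasts, pred_labels):
    """MSFS output: at each step, keep the forecast of the selected model.

    forecasts: array of shape (n_models, n_steps); pred_labels: (n_steps,)
    """
    forecasts = np.asarray(forecasts, dtype=float)
    steps = np.arange(forecasts.shape[1])
    return forecasts[np.asarray(pred_labels), steps]


# Per-turbine SA values reported for Turbines #1 to #6.
sa = [94.59, 91.34, 94.61, 92.89, 92.00, 91.53]
print(round(float(np.mean(sa)), 2))  # 92.83
```

When the classifier picks a suboptimal model, MSFS simply inherits that model's error at that step, which is why SA is the main driver of the system's overall accuracy.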

For the other turbines, the proposed MSFS likewise outperformed all single models across the four categories of evaluation metrics. In terms of error-based metrics, MSFS achieved MAE < 0.07 and RMSE < 0.11, indicating a substantial reduction in overall forecasting errors. Regarding relative error and stability metrics, MAPE remained below 2% and STDAPE values were notably lower, reflecting stable and reliable performance under varying wind conditions. The improvement in directional accuracy (DA) from around 75% for single models to over 92% demonstrates the system’s superior ability to capture wind speed fluctuations and trend reversals. Moreover, R² values exceeding 0.99 and TIC values below 0.008 confirm the strong consistency and robustness of the forecasts. Collectively, these results verify that MSFS provides accurate, stable, and adaptive wind speed forecasting across multiple turbines and operating environments.

Remark. The proposed MSFS successfully integrates six deep learning models, achieving an average model selection accuracy above 92% across 14,400 test instances, while significantly improving overall forecasting precision (R² > 0.99, MAPE < 2%) and directional consistency. These findings validate the robustness, adaptability, and practical value of the Transformers-driven dynamic model selection approach for wind speed forecasting and subsequent wind power forecasting applications.

3.4. Experiment II: Analysis of Classification and Forecasting Outcomes for Various Wind Turbine Categories

The above results demonstrate that the proposed MSFS substantially enhances model selection accuracy and forecasting stability, outperforming all individual forecasting models. In the forecasting phase of this study, a total of 1008 time points were predicted. Building on this foundation, Table 5 and Figure 5 provide a comprehensive evaluation of the forecasting performance of each individual model and of the MSFS framework, focusing on overall accuracy, trend consistency, and error distribution, to further assess the effectiveness and applicability of the proposed approach.

Table 5 and Figure 5 show the forecasting performance of each model evaluated on its own optimal forecasting points, together with that of MSFS, which combines the best-performing model at each time step. At first glance, some standalone models appear to outperform MSFS for certain turbines. For example, in Turbine #2, the Transformers model has 170 optimal forecasting points, with an MAE of 0.0724 and an RMSE of 0.1345, both lower than the corresponding MSFS values. Similarly, in Turbine #4, LSTM-XGBoost obtains 145 optimal forecasting points and achieves an MAE of 0.1095, lower than the MSFS MAE of 0.1419 for that turbine. These local advantages do not indicate superior overall performance. They arise because these models were optimal at a relatively large number of time steps for these turbines, and because each single model’s metrics are computed only over its own optimal points, where by construction its error was the smallest among the six models. The metrics are therefore skewed toward each model’s best-performing intervals, which artificially reduces its average errors.

Meanwhile, Table 5 also reveals that some models exhibit relatively high MAPE values. For instance, the MAPE of TCN-LSTM reaches 3.55% in Turbine #2 and 3.44% in Turbine #2. As illustrated in Figure 5 (Part B), these inflated MAPE values primarily occur in the 0–3 m/s low-wind speed region, which is below the cut-in wind speed of the turbines. In this range, the actual wind speed is close to zero, causing proportional error metrics such as MAPE to be artificially magnified due to the small denominator. Importantly, because turbines do not generate power in this wind speed region, such errors do not affect the accuracy of the wind-power conversion process, nor do they pose risks for power system operation or grid dispatching.
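The small-denominator effect can be made concrete with a worked example: the same 0.1 m/s absolute error gives a tenfold larger percentage error at 1 m/s than at 10 m/s. The sketch also shows one possible remedy, which is to compute MAPE only at or above a cut-in threshold. The 3.0 m/s default is taken from the low-wind region described above, and `mape_above_cut_in` is a hypothetical helper rather than a metric used in this study.

```python
import numpy as np


def ape_percent(actual, forecast):
    """Absolute percentage error of a single forecast, in percent."""
    return abs(forecast - actual) / actual * 100.0


def mape_above_cut_in(actual, forecast, cut_in=3.0):
    """MAPE (%) restricted to observations at or above the cut-in speed."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    mask = a >= cut_in
    return float(np.mean(np.abs(f[mask] - a[mask]) / a[mask]) * 100.0)


# The same 0.1 m/s absolute error at two different wind speeds.
print(f"{ape_percent(1.0, 1.1):.1f}%")    # 10.0%
print(f"{ape_percent(10.0, 10.1):.1f}%")  # 1.0%
# Only the 10 m/s point enters the filtered MAPE, so it is close to 1%.
print(mape_above_cut_in([1.0, 10.0], [1.1, 10.1]))
```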

Despite the localized advantages observed in certain models at their OP intervals, the MSFS consistently demonstrates superior and more stable performance across all turbines. The MAE remains within 0.105–0.129, the RMSE is maintained between 0.177 and 0.221, and all R² values exceed 0.98, indicating that the system reliably captures the temporal dynamics of wind speed and produces highly consistent forecasts. Furthermore, the model selection accuracy (SA) of MSFS remains within 91–95% across the six turbines, confirming that the Transformers-based classifier effectively identifies the optimal model at most time steps—an essential factor contributing to MSFS’s overall performance advantages.

Remark: although standalone models may show better performance at their optimal points and proportional errors may appear elevated in the low-wind speed region, these factors do not alter the broader conclusion. Considering that the single-model metrics are inherently OP-based, the MSFS—enabled by its high-precision dynamic model selection mechanism—achieves more stable, accurate, and generalizable forecasting performance across the full time sequence, making it a more practical and reliable approach for real-world wind speed forecasting applications.

3.5. Experiment III: Theoretical Assessment of Power Generation for Individual Wind Turbines

Accurately estimating theoretical power output is essential for evaluating wind turbine performance, optimizing wind farm operational strategies, and supporting power grid dispatching. Because theoretical power is highly sensitive to wind speed, typically following a cubic relationship, accurate wind speed forecasting is a critical prerequisite for reliable theoretical power estimation: even minor deviations in predicted wind speed can be significantly amplified in the calculated power.
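The amplification can be quantified directly. Assuming the cubic law P ∝ v³ holds (that is, below rated wind speed, with ρ, A, and C_p fixed), a relative wind speed error e maps to a relative power error of (1 + e)³ − 1. The function name below is illustrative.

```python
def power_error_from_speed_error(rel_speed_error: float) -> float:
    """Relative power error implied by P ∝ v³ for a relative wind speed error."""
    return (1.0 + rel_speed_error) ** 3 - 1.0


for e in (0.02, 0.05, 0.10):
    print(f"speed error {e:.0%} -> power error {power_error_from_speed_error(e):.1%}")
```

For example, a 5% wind speed over-forecast yields roughly a 15.8% over-estimate of theoretical power, about three times the original error.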

Building on the reliable wind speed forecasting results obtained in the previous section, this study converts the forecast wind speeds into corresponding theoretical power outputs using turbine-specific wind speed–power characteristic curves. The theoretical power represents the maximum mechanical power a wind turbine can achieve under ideal operating conditions, serving as an upper reference bound for actual power generation and providing essential insights for operational state assessment and short-term power planning. By examining how wind speed forecasting accuracy influences the stability and reliability of power conversion, this section further evaluates the applicability of the proposed forecasting framework to short-term wind power forecasting and wind farm operational decision-making.

Based on the above analysis, Table 6 and Table 7 provide a summary of the wind turbine parameters and the corresponding power curve configurations used in this study. The standard power curve is constructed based on the turbine’s aerodynamic characteristics under standard air-density conditions, while the power coefficient is represented by a polynomial function of wind speed obtained from manufacturer data.

To account for variations in atmospheric temperature and altitude, an air-density correction function is applied to adjust the standard curve to actual operating conditions. In addition, three versions of the actual power coefficient curve—representing the lower bound, standard condition, and upper bound—are included to reflect the uncertainty and performance variability of six wind turbines.
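A minimal sketch of converting wind speeds to theoretical power with a polynomial C_p(v), a density term, and a rated-power cap is given below. Tables 6 and 7 are not reproduced here, so the rotor radius, rated power, cut-in and cut-out speeds, C_p coefficients, and the 0.95/1.05 scaling used to mimic the lower-bound and upper-bound C_p curves are all hypothetical placeholders. Clipping C_p at the Betz limit is likewise an added safeguard and not necessarily part of the study's procedure.

```python
import numpy as np

BETZ_LIMIT = 16.0 / 27.0


def theoretical_power_kw(v, rho, radius_m, cp_coeffs, rated_kw,
                         cut_in=3.0, cut_out=25.0):
    """Theoretical power 0.5 * rho * A * Cp(v) * v^3 in kW, capped at rated power.

    cp_coeffs: coefficients of the Cp(v) polynomial, highest degree first
    (numpy.polyval convention). Output is zero outside [cut_in, cut_out].
    """
    v = np.asarray(v, dtype=float)
    area = np.pi * radius_m ** 2
    cp = np.clip(np.polyval(cp_coeffs, v), 0.0, BETZ_LIMIT)
    p_kw = 0.5 * rho * area * cp * v ** 3 / 1000.0
    p_kw = np.minimum(p_kw, rated_kw)
    return np.where((v >= cut_in) & (v <= cut_out), p_kw, 0.0)


# Hypothetical turbine: R = 50 m, rated 2000 kW, constant standard Cp = 0.45.
speeds = np.array([2.0, 6.0, 12.0, 26.0])  # m/s
cp_standard = [0.45]
for label, scale in {"lower": 0.95, "standard": 1.0, "upper": 1.05}.items():
    cp_curve = np.multiply(scale, cp_standard)
    print(label, np.round(theoretical_power_kw(speeds, 0.92, 50.0, cp_curve, 2000.0), 1))
```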

Accurate estimation of aerodynamic power requires a physically consistent representation of air density, because mechanical power extraction is directly proportional to $\rho$. Manufacturers typically assume the standard air density $\rho_{0} = 1.225\ \mathrm{kg/m^{3}}$, defined at sea-level pressure and a reference temperature of $15^{\circ}\mathrm{C}$. However, the turbines investigated in this study are located at an elevation of $H = 1152$ m, where both atmospheric pressure and temperature differ significantly from standard conditions. As a result, using $\rho_{0}$ introduces a systematic bias that propagates into the theoretical power estimation.

To account for local atmospheric conditions, air density was recalculated for each time step using the temperature–altitude correction function:

ρ (T, H) = \frac{353.1 e^{\frac{- 342 H}{T}}}{T},

where $T$ is the ambient temperature in Kelvin, $H$ is the turbine elevation (in kilometers under the model’s empirical scaling), and the coefficient 353.1 arises from the combination of the hydrostatic equation, the gas constant, and an empirical lapse-rate approximation.

This formulation incorporates two fundamental physical mechanisms:

Pressure reduction with altitude: $p (H) \propto e^{- k H}$, describing the exponential decay of atmospheric pressure as elevation increases.

Thermal expansion of air: $\rho \propto \frac{1}{T}$, reflecting the ideal-gas relationship between density and temperature.
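The two mechanisms above can be combined in a textbook isothermal form: the ideal-gas law ρ = p/(R_d T) together with hydrostatic pressure decay p(H) = p_0 exp(−gH/(R_d T)). The sketch below uses standard constants (p_0 = 101325 Pa, R_d = 287.05 J kg⁻¹ K⁻¹, g = 9.80665 m/s²) and takes H in metres. It is an illustrative physical reference, not the empirical correction function and scaling used in this study, so its values need not coincide with those reported here. Note that p_0/R_d ≈ 353, which is close to the leading coefficient of the correction function.

```python
import math

P0 = 101325.0  # standard sea-level pressure, Pa
R_D = 287.05   # specific gas constant of dry air, J/(kg K)
G = 9.80665    # standard gravity, m/s^2


def air_density_isothermal(temp_k: float, height_m: float) -> float:
    """Air density (kg/m^3) from the ideal-gas law and isothermal hydrostatics.

    Pressure reduction with altitude: p(H) = P0 * exp(-G * H / (R_D * T))
    Thermal expansion of air:         rho = p(H) / (R_D * T), so rho ∝ 1/T
    """
    pressure = P0 * math.exp(-G * height_m / (R_D * temp_k))
    return pressure / (R_D * temp_k)


print(round(air_density_isothermal(288.15, 0.0), 3))  # 1.225 at sea level, 15 °C
```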

Both factors significantly influence air density in mountainous regions. Applying the above correction to the 144-sample dataset produced density values ranging from 0.88 to 0.95 kg/m³, substantially lower than the constant value $\rho_{0} = 1.225\ \mathrm{kg/m^{3}}$. This reduction aligns with theoretical atmospheric models and field measurements at elevations above 1000 m.

Since aerodynamic mechanical power is given by

P_{theoretical} = \frac{1}{2} ρ A C_{p} v^{3},

where $A = \pi R^{2}$ is the swept area and $C_{p} (v)$ is estimated empirically from operational data, any deviation in air density directly scales the theoretical power estimate. Substituting $\rho_{0}$ into the power equation yields

P_{ρ_{0}} (v) = \frac{1}{2} ρ_{0} A C_{p} v^{3},

whereas using the corrected density gives

P_{ρ (T, H)} (v) = \frac{1}{2} ρ (T, H) A C_{p} v^{3} .

Thus, the ratio of the two theoretical power estimates becomes

\frac{P_{ρ (T, H)}}{P_{ρ_{0}}} = \frac{ρ (T, H)}{ρ_{0}},

which for the observed dataset lies between 0.72 and 0.78, implying that the density-corrected theoretical power is approximately 22–28% lower than the value given by the manufacturer-based power curve evaluated at $\rho_{0}$. Such discrepancies are particularly problematic for wind farms located at high altitudes, where theoretical curves derived from $\rho_{0}$ no longer form realistic upper performance bounds.
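The cancellation in the ratio above can be checked numerically. In the sketch below, the wind speed, rotor radius, and C_p are arbitrary placeholders (they cancel out), and only the density range of 0.88–0.95 kg/m³ reported earlier is taken from the study.

```python
import math

RHO_0 = 1.225  # manufacturer reference density, kg/m^3


def aero_power_w(rho, v, radius_m, cp):
    """Aerodynamic power P = 0.5 * rho * A * Cp * v^3 in watts, A = pi * R^2."""
    return 0.5 * rho * math.pi * radius_m ** 2 * cp * v ** 3


v, radius_m, cp = 8.0, 50.0, 0.45  # placeholder operating point; cancels out
for rho in (0.88, 0.95):           # density range reported for the 144 samples
    ratio = aero_power_w(rho, v, radius_m, cp) / aero_power_w(RHO_0, v, radius_m, cp)
    print(f"rho = {rho}: ratio = {ratio:.3f}, corrected power {1 - ratio:.1%} lower")
```

The two densities give ratios of about 0.718 and 0.776, so the corrected power is roughly 28% and 22% below the ρ_0-based value regardless of the placeholder operating point.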

In contrast, the corrected air density $\rho (T, H)$ yields theoretical power curves that closely follow the empirical envelopes observed in SCADA data, thereby improving the physical interpretability of the wind–power conversion model. This correction ensures that theoretical power outputs accurately reflect the turbine’s instantaneous operating environment and prevents systematic bias in the subsequent wind power forecasting analysis.

Based on the forecast wind speeds of each wind turbine and the power-curve parameters in Tables 6 and 7, the theoretical power output of each wind turbine can then be derived. This step enables a direct evaluation of how wind speed forecasting accuracy propagates into the estimation of wind energy production. To illustrate this process and assess the reliability of the proposed forecasting framework from an energy perspective, Figure 6 presents the theoretical power curves, the corresponding power forecasting results, and a comparison between the individual forecasting models and MSFS. The figure provides a comprehensive view of the temporal evolution of the predicted theoretical power and of the accuracy improvements achieved through dynamic model selection.

The accuracy of theoretical power forecasting is strongly governed by the precision of wind speed forecasting. As shown in Table 8, the NRMD values across all turbines remain within 3.8–5.1% for single models, while the MSFS further reduces this range to 3.2–4.5%, indicating smaller deviations between predicted and reference theoretical power. Similarly, RMSE values are consistently lower for MSFS (e.g., 67.69 kW for Turbine #1 and 57.85 kW for Turbine #2), compared with 70–80 kW exhibited by most single models. The R² values further confirm this improvement, with MSFS achieving 0.995–0.998, compared to 0.982–0.994 for single models. These results indicate that the enhanced wind speed accuracy achieved by MSFS directly translates into more reliable theoretical power estimation.

A notable feature is that the mean percentage error (MPE) is consistently negative for all turbines, typically ranging from −2% to −6%, indicating that the predicted theoretical power is slightly lower than the reference theoretical output. Such a conservative bias is operationally beneficial: under-forecasting avoids the risk of overestimating available wind power, which could otherwise lead to dispatch imbalance or insufficient reserves in grid scheduling. Importantly, this conservative tendency does not compromise forecasting accuracy, as evidenced by the high R² and low NRMD values. Therefore, the forecasting framework offers both high precision and operational safety, making it particularly suitable for real-time power dispatching and short-term wind farm operational planning.
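The RMSE, R², and MPE reported for theoretical power can be computed as in the sketch below. The sign convention (forecast minus reference, so that under-forecasting gives a negative MPE) follows the interpretation given above. Excluding zero-power points from MPE is an assumption made to avoid division by zero, and NRMD is omitted because its normalization is defined in Table 8, which is not reproduced here. The function name is hypothetical.

```python
import numpy as np


def power_error_metrics(p_ref_kw, p_pred_kw):
    """RMSE (kW), R² and signed mean percentage error (MPE, %) for power.

    Errors are forecast minus reference, so a negative MPE means the
    forecast lies below the reference theoretical power on average.
    Zero-power reference points are skipped in MPE to avoid division by zero.
    """
    ref = np.asarray(p_ref_kw, dtype=float)
    pred = np.asarray(p_pred_kw, dtype=float)
    err = pred - ref
    nonzero = ref > 0
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "R2": float(1.0 - np.sum(err ** 2) / np.sum((ref - ref.mean()) ** 2)),
        "MPE": float(100.0 * np.mean(err[nonzero] / ref[nonzero])),
    }


# A forecast that is uniformly 4% low gives MPE = -4%.
print(power_error_metrics([0.0, 500.0, 1000.0, 1500.0], [0.0, 480.0, 960.0, 1440.0]))
```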

Remark: The experimental results demonstrate that the proposed forecasting framework provides accurate and operationally reliable theoretical power forecasting. By enhancing wind speed forecasting accuracy through dynamic model selection, the framework achieves low conversion errors and high consistency with turbine power characteristics. The consistently negative but small MPE values indicate a conservative bias that avoids overestimation, thereby reducing risks in grid scheduling and short-term dispatch decisions. Overall, the forecasting system not only ensures high forecasting fidelity but also offers a safety margin beneficial for wind farm operation and power system management, making it particularly suitable for practical dispatch applications.

4. Discussion

In this section, a thorough examination of the proposed forecasting system is provided, focusing on two key areas: the significance and uncertainty evaluation of the system, and an analysis of wind energy generation within wind farms.

Analysis of the Proposed Model’s Significance and Uncertainty

To evaluate whether the wind speed forecasts generated by the seven predictive models preserve the statistical characteristics of the observed wind speeds, both the Kruskal–Wallis (KW) test [57] and the Kolmogorov–Smirnov (KS) [58] test were employed. These complementary non-parametric approaches enable a rigorous assessment of central tendency and distributional fidelity, which is essential given that wind speed data follow a Gamma distribution and deviate substantially from normality.

Together, the KW and KS tests provide a comprehensive framework for evaluating whether each forecasting model preserves both the central tendency and the full distributional properties of the observed wind speeds, extending the assessment beyond pointwise accuracy metrics. The KW statistic $H$ approximately follows a chi-square distribution with $k - 1$ degrees of freedom, where $k$ is the number of groups, and larger values of $H$ indicate more pronounced differences between groups. The KS statistic $D_{n}$ is the supremum of the absolute difference between the two empirical cumulative distribution functions (CDFs). In both cases, a p-value below the selected significance level leads to rejection of the null hypothesis that the predictions and observations are drawn from the same distribution. The two test statistics are computed as follows:

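H = \frac{12}{N (N + 1)} \sum_{j = 1}^{k} \frac{p_{j}^{2}}{n_{j}} - 3 (N + 1),

D_{n} = \sup_{x} | F_{n} (x) - G_{n} (x) | .

(6)

where $N$ is the total number of pooled observations, $n_{j}$ and $p_{j}$ are the sample size and rank sum of group $j$, and $F_{n}$ and $G_{n}$ are the empirical CDFs of the forecast and observed wind speeds, respectively.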
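Both tests are available in SciPy as `scipy.stats.kruskal` and `scipy.stats.ks_2samp`. The sketch below compares each forecast series with the observations at a 0.05 significance level. The Gamma-distributed data are synthetic stand-ins, and `distribution_tests` is a hypothetical helper. A forecast identical to the observations gives a KW statistic of 0 and a p-value of 1.

```python
import numpy as np
from scipy import stats


def distribution_tests(observed, forecasts, alpha=0.05):
    """KW and two-sample KS tests of each forecast series against observations.

    forecasts: dict mapping model name -> forecast array.
    Returns {name: (kw_pvalue, ks_pvalue, consistent)}, where `consistent`
    means neither test rejects the null hypothesis at level alpha.
    """
    results = {}
    for name, pred in forecasts.items():
        kw = stats.kruskal(observed, pred)
        ks = stats.ks_2samp(observed, pred)
        consistent = bool(kw.pvalue >= alpha and ks.pvalue >= alpha)
        results[name] = (float(kw.pvalue), float(ks.pvalue), consistent)
    return results


rng = np.random.default_rng(0)
obs = rng.gamma(shape=4.0, scale=2.0, size=144)  # synthetic stand-in data
res = distribution_tests(obs, {"identical": obs.copy(), "shifted": obs + 5.0})
for name, (kw_p, ks_p, ok) in res.items():
    print(f"{name}: KW p={kw_p:.3f}, KS p={ks_p:.3f}, consistent={ok}")
```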

To rigorously assess the uncertainty associated with wind speed forecasting, the t-Location-scale distribution [59] is utilized to model the forecast error, characterized by the location parameter (μ), scale parameter (σ), and degrees of freedom (ν). The corresponding probability density function (PDF) of this distribution is formulated as follows:

f (x; \nu, \mu, \sigma) = \frac{\Gamma (\frac{\nu + 1}{2})}{\sigma \sqrt{\nu \pi} \Gamma (\frac{\nu}{2})} {[\frac{\nu + {(\frac{x - \mu}{\sigma})}^{2}}{\nu}]}^{- (\frac{\nu + 1}{2})}

(7)

where Γ(•) is the gamma function, μ is the location parameter, σ is the scale parameter, and ν is the shape parameter (degrees of freedom).
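A t location-scale fit and the resulting 95% forecast interval can be sketched with `scipy.stats.t`, whose `fit` method returns the parameters in the order (ν, μ, σ). The interval metrics below use common definitions: FICP as the percentage of observations inside the interval, and FINAW as the mean interval width divided by the range of the observations. These are assumptions, since their formal definitions are not given in this section, and AWD is omitted. Residuals and forecasts are synthetic, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def t_error_interval(point_forecast, residuals, level=0.95):
    """Forecast interval from a t location-scale fit to past errors.

    residuals: historical errors defined as observation minus forecast.
    Returns (lower, upper, (nu, mu, sigma)).
    """
    nu, mu, sigma = stats.t.fit(residuals)  # SciPy order: df, loc, scale
    q_low, q_high = stats.t.interval(level, nu, loc=mu, scale=sigma)
    f = np.asarray(point_forecast, dtype=float)
    return f + q_low, f + q_high, (nu, mu, sigma)


def ficp(y, lower, upper):
    """Forecast interval coverage probability, in percent."""
    y = np.asarray(y, dtype=float)
    return 100.0 * float(np.mean((y >= lower) & (y <= upper)))


def finaw(y, lower, upper):
    """Mean interval width normalised by the range of the observations."""
    y = np.asarray(y, dtype=float)
    return float(np.mean(upper - lower) / (y.max() - y.min()))


rng = np.random.default_rng(1)
past_errors = 0.1 * rng.standard_t(3, size=2000)  # synthetic residuals
forecast = np.full(500, 8.0)                      # synthetic point forecasts
observed = forecast + 0.1 * rng.standard_t(3, size=500)
lower, upper, params = t_error_interval(forecast, past_errors)
print(params, ficp(observed, lower, upper), finaw(observed, lower, upper))
```

The heavy tails of the t distribution (small ν) widen the interval relative to a normal fit with the same scale, which is why it is a natural choice for wind speed errors that contain occasional large deviations.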

As shown in Table 9, all model-versus-observation comparisons in the Kruskal–Wallis test yield p-values of 1.000, well above the 0.05 significance level. Hence, at the 95% confidence level, none of the forecasting models exhibits a significant distributional difference from the observed wind speed data, confirming that their outputs preserve the underlying wind speed distribution. Building on this statistically consistent foundation, MSFS further improves forecasting performance by dynamically combining the strengths of the individual models, producing more accurate and distributionally stable wind speed forecasts than any single model.

As shown in Table 9 and Figure 7, the uncertainty analysis at the 95% confidence level, based on FICP, FINAW, and AWD, demonstrates the superior interval performance of MSFS. Compared with the single models, MSFS consistently achieves the highest coverage, with FICP reaching 93–95%, indicating that its forecast intervals more reliably encompass actual wind speed variations. MSFS also produces the narrowest normalized interval widths (FINAW ≈ 0.008–0.015), reflecting tighter and more informative uncertainty bounds, and the lowest weighted deviation (AWD ≈ 0.17–0.37), indicating minimal discrepancy between the forecast intervals and the observed values. Collectively, these results confirm that MSFS provides more accurate, compact, and reliable uncertainty estimates than any single forecasting model.

Remark: The KW test and uncertainty analysis confirm that the MSFS forecasting framework preserves the statistical characteristics of real wind speeds while providing more accurate and reliable forecasting value than any single model.

5. Conclusions

This study proposed a classification-based MSFS framework for short-term wind speed forecasting that integrates six deep learning models through time-step-wise model selection. Using multi-turbine operational datasets, the framework was further extended to theoretical power forecasting and uncertainty analysis, addressing the challenges of wind variability and short-term operational decision-making.

The experimental results show that MSFS achieves superior wind-speed prediction performance relative to single-model approaches. The Kruskal–Wallis test shows that all models exhibit no significant deviation from the actual data at the 95% confidence level, confirming strong distributional consistency. The uncertainty analysis further indicates that MSFS achieves higher interval coverage (up to 93–95%), narrower interval widths, and lower weighted deviation than any single model, demonstrating superior interval reliability. When combined with turbine-specific power curves, the high-accuracy wind speed forecasts enable stable and reliable estimation of theoretical power output.

Accurate wind speed and theoretical power forecasting plays a crucial role in power system operation, as wind speed dictates aerodynamic turbine performance and theoretical power serves as an upper bound for dispatch planning and reserve scheduling. The MSFS framework, with its slightly conservative yet highly precise forecasts, effectively reduces the risk of overestimation-induced scheduling imbalance while providing operators with dependable uncertainty bounds for risk-aware decision-making. These characteristics make the proposed approach particularly suitable for wind farm operation, short-term grid scheduling, and safety margin planning.

Future research will explore the integration of multi-source environmental data, the extension of MSFS to actual power forecasting, and the application of the forecasting framework within real dispatch optimization systems. Scaling the approach to multiple wind farms and larger regional grids will also be investigated to enhance its practical applicability.

Author Contributions

Conceptualization, M.Z.; methodology, M.Z. and Q.J.; software, Z.W. and F.M.; validation, H.H. and J.P.; formal analysis, J.P.; data curation, F.M.; writing-original draft preparation, M.Z., Q.J. and Z.W.; writing-review and editing, M.Z., Q.J., Z.W., F.M., H.H. and J.P.; visualization, H.H.; supervision, Q.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets analyzed during the current study are available from the corresponding author on reasonable request. The data are not publicly available due to confidentiality agreements.

Conflicts of Interest

Authors Ming Zeng, Qianqian Jia, Fang Mao, Haotao Huang, and Jingyuan Pan were employed by Guangzhou Power Supply Bureau, a subsidiary of Guangdong Power Grid. The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

Appendix A

1. Model Description

The six forecasting models used in this study span a wide spectrum of architectural and computational complexity, offering complementary strengths for dynamic model selection. Their hyperparameters, structural complexity, computational complexity, and runtime measurements are listed in Table A1. A detailed comparison is provided below.

Table A1. Default model settings and computational cost for the six models.

Model	Hyperparameter	Default	Structural Complexity	Computational Complexity	Training Time
CNN-LSTM	NumFilters	32	CNN filters = 32, kernel size = 3 LSTM hidden units = 64 Parameter scale: medium-to-low Two-stage hybrid structure (CNN → LSTM), a typical mixed model	CNN: O(32 × k × L) LSTM: O(L × 64²) Overall complexity: O(L × 4k + L × 4096), medium level	10–20 s Runs efficiently on GPU/CPU Suitable for medium-scale real-time forecasting
	FilterSize	3
	NumHiddenUnits	64
	Dropout	0.1
	InitialLearnRate	1.00 × 10⁻³
	MiniBatchSize	16
	MaxEpochs	150
Transformers	NumEncoderLayers	2	Self-attention + FFN Parameter scale: medium Small d_model, considered a lightweight Transformer	Multi-head attention: O(L² × d_model) FFN: O(L × 64 × 32) Main bottleneck comes from the quadratic attention term L²	8–18 s Significantly higher cost than RNN/CNN for long sequences
	dModel	32
	NumHeads	4
	NumHiddenUnits	64
	Dropout	0.1
	InitialLearnRate	1.00 × 10⁻³
TCN-LSTM	NumFilters	32	32 filters × dilation = [1, 2, 4, 8] × 2 residual blocks LSTM hidden units = 64 Parameter scale: medium-to-high	TCN (dilated convolution): O(L × 32 × 3 × blocks) LSTM: O(L × 64²) More complex than CNN-LSTM but does not suffer from the L² cost of Transformers	15–25 s Lighter than Transformers Considered a medium-complexity model
	FilterSize	3
	DilationFactor	[1 2 4 8]
	NumResidualBlocks	2
	NumHiddenUnits	64
	Dropout	0.1
CNN-GRU	NumFilters	32	CNN filters = 32 GRU hidden units = 64 Parameter scale similar to CNN-LSTM, but GRU is lighter than LSTM	CNN: O(L × 32 × 3) GRU: O(L × 64²) × 3 gates Lower than LSTM (which has 4 gates)	15–30 s Typically faster than LSTM Medium-to-low complexity
	FilterSize	3
	NumHiddenUnits	64
	Dropout	0.1
	InitialLearnRate	1.00 × 10⁻³
LSTM-XGBoost	NumHiddenUnits	32	LSTM hidden units = 32 (smaller than other hybrid models) XGBoost trees = 200, depth = 3 Overall architecture: hybrid but not heavy	LSTM: O(L × 32²) (small scale) XGBoost: O(trees × depth × features) Overall complexity: medium-to-high depending on the number of trees	20–40 s XGBoost training can be slow on CPU
	Dropout	0.1
	max_depth	3
	learning_rate	0.05
	n_estimators	200
	subsample	0.8
	colsample_bytree	0.8
GNN-TCN	NumHiddenUnits	32	GNN layers = 2, hidden size = 32 TCN filters = 32 with dilation = [1, 2, 4] Requires graph convolution based on adjacency matrix Parameter scale: relatively high	GNN: O(N_edges × 32) TCN: O(L × 32 × 3 × layers) Combined complexity is higher than pure CNN/RNN models	25–45 s Runs efficiently on GPU/CPU
	NumGraphLayers	2
	NumFilters	32
	DilationFactor	[1 2 4]
	Dropout	0.1
	InitialLearnRate	1.00 × 10⁻³

(1): Low-Complexity Models

The CNN-LSTM and CNN-GRU represent lightweight hybrid architectures that combine convolutional feature extraction with recurrent units. Owing to their shallow convolutional layers and moderate hidden dimensions, these models exhibit low to medium structural complexity, with computational costs scaling as O(F × k × L) for convolution and O(L × H²) for recurrent operations. Their training times (8–20 s) are the shortest among all models, and inference remains at the sub-millisecond level. These properties make them well suited for scenarios requiring high responsiveness and low resource consumption.
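The gate counts in Table A1 translate directly into parameter counts. The sketch below uses the basic single-bias formulation, in which each gate block has input weights, recurrent weights, and one bias vector. Frameworks such as PyTorch keep two bias vectors per gate block, so their counts differ slightly. The input dimension of 32 (the CNN filter count feeding the recurrent layer) is an assumption.

```python
def recurrent_params(input_dim: int, hidden: int, gate_blocks: int) -> int:
    """Parameters of one recurrent layer: input weights, recurrent weights, one bias per block."""
    return gate_blocks * (hidden * (input_dim + hidden) + hidden)


lstm = recurrent_params(input_dim=32, hidden=64, gate_blocks=4)  # 3 gates + cell candidate
gru = recurrent_params(input_dim=32, hidden=64, gate_blocks=3)   # 2 gates + candidate state
print(lstm, gru, gru / lstm)  # 24832 18624 0.75
```

With 64 hidden units, the GRU needs exactly three quarters of the LSTM's parameters, consistent with its three gate blocks versus four.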

(2): Medium-Complexity Models

The TCN-LSTM and LSTM–XGBoost models fall into the medium to medium–high complexity range.

TCN-LSTM extends the temporal receptive field through dilated causal convolutions, enabling efficient multi-scale pattern extraction. However, combining TCN with LSTM increases computational overhead, resulting in training times of 15–30 s.

LSTM–XGBoost combines sequential learning with ensemble-based nonlinear regression, yielding moderate structural complexity and training times of 15–25 s. Both models provide enhanced representational capacity while maintaining practical computational demands.

(3): High-Complexity Models

The Transformer and GNN–TCN exhibit the highest structural and computational complexity.

The Transformer introduces a quadratic self-attention mechanism (O(L² × d_model)), enabling the learning of global dependencies at the cost of increased computation, though its parallelization keeps training efficient (20–40 s).

The GNN–TCN is the most computationally intensive architecture, combining graph message passing (O(E × H)) with multi-scale temporal convolutions, resulting in training times of 25–45 s. These models offer the strongest expressive power but require greater computational resources.

Despite the significant differences in model complexity, all six models achieve millisecond-level inference latency on the experimental platform, ensuring real-time feasibility. Their diverse structural and computational properties justify the proposed MSFS framework, where the classifier adaptively selects the most appropriate model under varying wind conditions to balance predictive accuracy and computational efficiency.

Table A2. The Transformer Classifier Configuration and Computational Characteristics.

Category	Description
Model Type	Transformer encoder classifier
Hyperparameters	- Encoder layers: 2 - Model dimension d_model: 32 - Attention heads: 4 - FFN hidden size: 64 - Dropout: 0.1 - Optimizer: Adam, LR = 1 × 10⁻³ - Batch size: 32 - Epochs: 100
Training Input	10,080 samples × 7 features
Testing Input	4320 samples × 7 features
Structural Complexity	- Lightweight encoder architecture - Total parameters ≈ 20k - Significantly smaller than forecasting models
Computational Complexity	O(N L² d + N L d d_ff), with L = 7, d = 32, and d_ff = 64: extremely low in practice
Training Time	3–8 s on RTX 4090 for 100 epochs (few tens of seconds on CPU)
Inference Time	Millisecond-level for 4320 samples (negligible overhead in MSFS)
Practical Notes	- Very efficient due to short sequence length (L = 7) - Introduces negligible computational cost compared to the forecasting models - Well suited for real-time model selection

Table A3. The performance of each distribution for uncertainty estimation.

NO	Distribution	Parameter	CNN-LSTM	CNN-GRU	GNN-TCN	LSTM-XGBoost	TCN-LSTM	Transformers	MSFS
Turbine 1	normal	‘mu’	−0.0009	−0.0028	0.0010	−0.0037	−0.0024	−0.0012	−0.0005
		‘sigma’	0.1403	0.1304	0.0452	0.1329	0.1415	0.1244	0.0826
		SSE	22.7525	23.7055	36.4974	29.2852	19.0430	25.0445	48.5762
	logistic	‘mu’	0.0008	−0.0023	0.0011	−0.0024	−0.0002	−0.0008	0.0002
		‘sigma’	0.0663	0.0627	0.0240	0.0602	0.0680	0.0571	0.0369
		SSE	10.9773	12.1976	33.0165	12.6268	8.6780	9.6383	13.5334
	generalized extreme value	‘k’	−0.1654	−0.1418	−0.1932	−0.2102	−0.2666	−0.2168	−0.1488
		‘sigma’	0.1634	0.1472	0.0473	0.1673	0.1632	0.1498	0.1008
		‘mu’	−0.0591	−0.0580	−0.0167	−0.0561	−0.0541	−0.0498	−0.0350
		SSE	33.4475	34.1444	61.6511	44.6973	27.0634	38.3722	85.7774
	tlocationscale	‘mu’	0.0035	0.0002	0.0011	−0.0001	0.0016	0.0006	0.0014
		‘sigma’	0.0632	0.0618	0.0351	0.0553	0.0701	0.0561	0.0398
		‘nu’	1.8106	1.8891	5.1360	1.7513	2.0201	1.9176	2.2681
		SSE	1.5362	2.9071	36.4651	1.0454	1.1662	1.0650	7.6752
Turbine 2	normal	‘mu’	−0.0031	−0.0015	−0.0086	−0.0018	−0.0168	0.0003	−0.0036
		‘sigma’	0.3063	0.3297	0.4203	0.0946	0.2939	0.2798	0.3412
		SSE	8.7714	3.3898	6.5115	37.4751	9.2970	13.3985	7.4163
	logistic	‘mu’	−0.0061	−0.0064	−0.0018	−0.0021	−0.0146	0.0002	−0.0077
		‘sigma’	0.1454	0.1551	0.1803	0.0410	0.1377	0.1306	0.1601
		SSE	5.2544	1.7794	3.9987	10.9500	5.4140	8.7500	4.0357
	generalized extreme value	‘k’	−0.1761	−0.1463	−0.1541	−0.1438	−0.1429	−0.2046	−0.1859
		‘sigma’	0.3316	0.3671	0.4861	0.1142	0.3283	0.3138	0.3697
		‘mu’	−0.1252	−0.1370	−0.1828	−0.0411	−0.1404	−0.1088	−0.1378
		SSE	10.7998	4.5071	7.9097	59.8068	12.1690	16.0977	9.2571
	tlocationscale	‘mu’	−0.0025	−0.0060	0.0029	−0.0017	−0.0073	0.0015	−0.0087
		‘sigma’	0.1288	0.1348	0.0979	0.0408	0.1222	0.0980	0.1424
		‘nu’	1.6361	1.5962	1.0650	2.0172	1.6481	1.3634	1.6557
		SSE	1.2709	0.1652	0.1060	4.4801	1.3445	1.3377	0.7493
Turbine 3	normal	‘mu’	0.0044	0.0150	−0.0059	−0.0130	0.0268	0.0086	0.0023
		‘sigma’	0.3934	0.7600	0.7122	0.7975	0.6672	0.6798	0.2491
		SSE	2.8572	0.0547	0.0254	0.0442	0.4714	0.0623	9.7058
	logistic	‘mu’	0.0102	0.0157	−0.0041	−0.0057	0.0225	0.0129	0.0049
		‘sigma’	0.1836	0.4143	0.3930	0.4338	0.3550	0.3654	0.1151
		SSE	1.4026	0.0251	0.0100	0.0230	0.2635	0.0134	5.1727
	generalized extreme value	‘k’	−0.2296	−0.2292	−0.2837	−0.1937	−0.2190	−0.2342	−0.1907
		‘sigma’	0.4551	0.7775	0.7374	0.8335	0.6929	0.7220	0.2968
		‘mu’	−0.1474	−0.2726	−0.2567	−0.3324	−0.2290	−0.2491	−0.0988
		SSE	3.9163	0.0958	0.0530	0.1073	0.6244	0.1347	13.7061
	tlocationscale	‘mu’	0.0109	0.0157	−0.0029	−0.0047	0.0196	0.0138	0.0071
		‘sigma’	0.1674	0.6265	0.6173	0.6578	0.4909	0.5391	0.1082
		‘nu’	1.7131	6.0411	7.9591	6.2193	3.9390	5.2836	1.7880
		SSE	0.1603	0.0245	0.0093	0.0230	0.1997	0.0112	1.4037
Turbine 4	normal	‘mu’	−0.0005	−0.0024	−0.0012	−0.0028	−0.0037	−0.0009	0.0010
		‘sigma’	0.0826	0.1415	0.1244	0.1304	0.1329	0.1403	0.0452
		SSE	48.5762	19.0430	25.0445	23.7055	29.2852	22.7525	36.4974
	logistic	‘mu’	0.0002	−0.0002	−0.0008	−0.0023	−0.0024	0.0008	0.0011
		‘sigma’	0.0369	0.0680	0.0571	0.0627	0.0602	0.0663	0.0240
		SSE	13.5334	8.6780	9.6383	12.1976	12.6268	10.9773	33.0165
	generalized extreme value	‘k’	−0.1488	−0.2666	−0.2168	−0.1418	−0.2102	−0.1654	−0.1932
		‘sigma’	0.1008	0.1632	0.1498	0.1472	0.1673	0.1634	0.0473
		‘mu’	−0.0350	−0.0541	−0.0498	−0.0580	−0.0561	−0.0591	−0.0167
		SSE	85.7774	27.0634	38.3722	34.1444	44.6973	33.4475	61.6511
	tlocationscale	‘mu’	0.0014	0.0016	0.0006	0.0002	−0.0001	0.0035	0.0011
		‘sigma’	0.0398	0.0701	0.0561	0.0618	0.0553	0.0632	0.0351
		‘nu’	2.2681	2.0201	1.9176	1.8891	1.7513	1.8106	5.1360
		SSE	7.6752	1.1662	1.0650	2.9071	1.0454	1.5362	36.4651
Turbine 5	normal	‘mu’	−0.0086	−0.0080	−0.0076	−0.0079	−0.0104	−0.0113	−0.0013
		‘sigma’	0.1656	0.2200	0.2354	0.2457	0.2388	0.2550	0.0923
		SSE	36.8238	17.7037	10.3676	10.8713	4.8369	12.6825	31.4714
	logistic	‘mu’	−0.0040	−0.0072	−0.0082	−0.0082	−0.0082	−0.0091	−0.0012
		‘sigma’	0.0592	0.0949	0.1075	0.1111	0.1025	0.1183	0.0343
		SSE	14.7899	9.5452	5.1610	5.6601	1.6649	7.9237	5.1605
	generalized extreme value	‘k’	−0.1141	−0.1097	−0.1397	−0.1112	−0.1733	−0.1687	−0.0567
		‘sigma’	0.2331	0.2430	0.2670	0.2689	0.3552	0.3180	0.1090
		‘mu’	−0.0797	−0.0997	−0.1056	−0.1113	−0.1105	−0.1181	−0.0391
		SSE	51.4102	22.3621	14.2799	14.3606	9.0344	17.2393	47.7776
	tlocationscale	‘mu’	0.0003	0.0004	−0.0047	−0.0033	−0.0045	−0.0068	−0.0002
		‘sigma’	0.0420	0.0715	0.0982	0.1010	0.0884	0.1003	0.0362
		‘nu’	1.4376	1.4075	1.7277	1.7181	1.6342	1.5550	2.3046
		SSE	0.1742	0.6407	0.9189	1.1361	0.1456	1.9040	0.2898
Turbine 6	normal	‘mu’	−0.0059	−0.0100	−0.0099	−0.0088	−0.0119	0.0002	−0.0011
		‘sigma’	0.1343	0.1889	0.1788	0.2056	0.1860	0.2007	0.0695
		SSE	36.4818	9.4050	15.2219	17.5967	14.5273	10.4848	44.0969
	logistic	‘mu’	−0.0014	−0.0044	−0.0060	−0.0063	−0.0048	0.0006	0.0005
		‘sigma’	0.0515	0.0897	0.0820	0.0995	0.0852	0.0949	0.0316
		SSE	14.0928	4.4375	7.7534	10.0849	7.5479	5.4956	9.3376
	generalized extreme value	‘k’	−0.1777	−0.2106	−0.1901	−0.2204	−0.2112	−0.2020	−0.2341
		‘sigma’	0.1924	0.2202	0.2093	0.2246	0.2308	0.2358	0.0884
		‘mu’	−0.0647	−0.0854	−0.0831	−0.0875	−0.0872	−0.0787	−0.0283
		SSE	53.5973	13.8068	20.7958	22.2693	20.7974	14.7346	86.6710
	tlocationscale	‘mu’	0.0000	−0.0016	−0.0039	−0.0031	−0.0015	0.0006	0.0005
		‘sigma’	0.0398	0.0886	0.0691	0.0945	0.0737	0.0885	0.0353
		‘nu’	1.5459	1.9040	1.5669	1.7844	1.6105	1.7506	2.4024
		SSE	0.3054	0.6006	0.4354	2.2271	0.8118	0.7631	5.4124
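Table A3 reports fitted parameters and an SSE score for each candidate distribution, but it does not state how the SSE was obtained. One common approach compares a normalized histogram of the fitted quantity with each fitted probability density at the bin centers, and the sketch below assumes that approach. It also assumes the fitted quantity is the forecast error, since the location parameters are all close to zero. The sketch uses SciPy's maximum-likelihood `fit` for the normal, logistic and Student's t (location-scale) distributions. The function name, bin count and central-99% histogram range are illustrative choices. Synthetic heavy-tailed errors stand in for the real data.

```python
import numpy as np
from scipy import stats


def fit_error_distributions(errors: np.ndarray, bins: int = 50) -> dict:
    """Fit candidate distributions to forecast errors and score each by SSE.

    SSE is the sum of squared differences between the empirical density
    (normalized histogram) and the fitted PDF at the bin centers.
    """
    # Histogram over the central 99% so a few extreme errors do not stretch the bins
    lo, hi = np.percentile(errors, [0.5, 99.5])
    density, edges = np.histogram(errors, bins=bins, range=(lo, hi), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])

    candidates = {
        "normal": stats.norm,
        "logistic": stats.logistic,
        "tlocationscale": stats.t,   # Student's t with loc/scale = t location-scale
    }
    results = {}
    for name, dist in candidates.items():
        params = dist.fit(errors)                    # maximum-likelihood estimates
        pdf = dist.pdf(centers, *params)
        results[name] = {"params": params, "sse": float(np.sum((density - pdf) ** 2))}
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    errors = 0.05 * rng.standard_t(2.0, size=20000)   # synthetic heavy-tailed errors
    for name, res in fit_error_distributions(errors).items():
        print(f"{name:15s} SSE = {res['sse']:.4f}")
```

A generalized extreme value candidate can be added with `scipy.stats.genextreme`. SciPy's shape parameter `c` has the opposite sign to the conventional shape ξ, so it should be negated before comparing it with a `k` column that follows the ξ convention.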

References

Global Wind Energy Council. Global Wind Report 2024; Global Wind Energy Council (GWEC): Brussels, Belgium, 2024.
Yang, D.Z.; Wang, W.T.; Hong, T. A historical weather forecast dataset from the European Centre for Medium-Range Weather Forecasts (ECMWF) for energy forecasting. Sol. Energy 2022, 232, 263–274.
Alsamamra, H.R.; Salah, S.; Shoqeir, J.H. Performance analysis of ARIMA model for wind speed forecasting in Jerusalem, Palestine. Energy Explor. Exploit. 2024, 42, 1727–1746.
Dupuy, F.; Durand, P.; Hedde, T. Downscaling of surface wind forecasts using convolutional neural networks. Nonlin. Process. Geophys. 2023, 30, 553–570.
Daenens, S.; Verstraeten, T.; Daems, P.J.; Nowé, A.; Helsen, J. Spatio-temporal graph neural networks for power prediction in offshore wind farms using SCADA data. Wind Energy Sci. 2025, 10, 1137–1152.
Liao, W.; Fang, J.; Ye, L.; Bak-Jensen, B.; Yang, Z.; Porte-Agel, F. Can we trust explainable artificial intelligence in wind power forecasting? Appl. Energy 2024, 376, 124273.
Liu, Z.; Guo, H.; Zhang, Y.; Zuo, Z. A comprehensive review of wind power prediction based on machine learning: Models, applications, and challenges. Energies 2025, 18, 350.
Wang, Y.; Zhang, F.; Kou, H.; Zou, R.; Hu, Q.; Wang, J.; Srinivasan, D. A review of predictive uncertainty modeling techniques and evaluation metrics in probabilistic wind speed and wind power forecasting. Appl. Energy 2025, 396, 126234.
Zhao, Y.; Liao, H.; Zhao, Y.; Pan, S. Data-augmented trend-fluctuation representations by interpretable contrastive learning for wind power forecasting. Appl. Energy 2025, 380, 125052.
Li, J.; Jia, L.; Zhou, C. Deep fuzzy inference system fused with probability density function control for wind power forecasting with asymmetric error distribution. J. Clean. Prod. 2025, 511, 145590.
Chen, H.; Jiang, X.; Hui, H.; Zhang, K.; Meng, W.; Cheynet, E. Enhancing probabilistic wind speed forecasting by integrating self-adaptive Bayesian wavelet denoising with deep Gaussian process regression under uncertainties. Renew. Energy 2026, 256, 123966.
He, Y.Y.; Yu, N.N.; Wang, B. Online probability density prediction of wind power considering virtual and real concept drift detection. Appl. Energy 2025, 396, 126318.
Gao, J.; Cheng, Y.; Zhang, D.; Chen, Y. Physics-constrained wind power forecasting aligned with probability distributions for noise-resilient deep learning. Appl. Energy 2025, 383, 125295.
Huang, H.; Xue, S.; Zhao, L.; Wang, W.; Wu, H. Privacy-preserving smart energy management by consumer-electronic chips and federated learning. IEEE Trans. Consum. Electron. 2024, 70, 2200–2209.
Wang, X.R.; Zhou, Y.Z. Privacy-preserving probabilistic wind power forecasting: An adaptive federated approach. Appl. Energy 2025, 396, 126177.
Nayak, A.K.; Sharma, K.C.; Bhakar, R.; Tiwari, H. Probabilistic online learning framework for short-term wind power forecasting using ensemble bagging regression model. Energy Convers. Manag. 2025, 323, 119142.
Leal, J.I.; Pitombeira-Neto, A.R.; Bueno, A.V.; Rocha, P.A.C.; de Andrade, C.F. Probabilistic wind speed forecasting via Bayesian DLMs and its application in green hydrogen production. Appl. Energy 2025, 382, 125286.
Chen, H. A novel wind model downscaling with statistical regression and forecast for the cleaner energy. J. Clean. Prod. 2024, 434, 140217.
Zhao, L.; Liu, C.; Yang, C.; Liu, S.; Zhang, Y.; Li, Y. A location-centric transformer framework for multi-location short-term wind speed forecasting. Energy Convers. Manag. 2025, 328, 119627.
Ma, J.; Du, J.; Chen, Q.; Jiang, X.; Pan, L. Multi-feature extraction spatio-temporal interaction graph network for wind speed forecasting in windfarm. Energy 2025, 333, 137229.
Resifi, S.; Al Aawar, E.; Dasari, H.P.; Jebari, H.; Hoteit, I. A novel deep learning approach for regional high-resolution spatio-temporal wind speed forecasting for energy applications. Energy 2025, 328, 136356.
Zhang, Z.G.; Yin, J.C. Incremental principal component analysis based depthwise separable UNet model for complex wind system forecasting. Energy 2025, 334, 137751.
Zhang, J.; Zhang, Y.; Liu, K.; Zhao, C. Multi-step prediction of spatio-temporal wind speed based on the multimodal coupled ST-DFNet model. Energy 2025, 334, 137670.
Hu, D.; He, F.; Fan, W.; Feng, W. DBANN: Dual-branch attention neural networks with hierarchical spatiotemporal-perception for multi-node offshore wind power forecasting. Energy 2025, 334, 137521.
Zhao, Y.; Zhao, Y.; Liao, H.; Pan, S.; Zheng, Y. Interpreting lasso regression model by feature space matching analysis for spatio-temporal correlation based wind power forecasting. Appl. Energy 2025, 380, 124954.
Verdone, A.; Panella, M.; De Santis, E.; Rizzi, A. A review of solar and wind energy forecasting: From single-site to multi-site paradigm. Appl. Energy 2025, 392, 126016.
Wan, H.; Wang, J.; Gan, Q.; Xia, Y.; Chang, Y.; Yan, H. Addressing intermittency in medium-term photovoltaic and wind power forecasting using a hybrid xLSTM-TCCNN model with numerical weather predictions. Renew. Energy 2025, 253, 123618.
Ignatev, E.; Deriugina, G.; Suslov, K.; Balaban, G. Development of a hybrid model for medium-term wind farm power output forecasting. Renew. Energy 2025, 249, 123200.
Michalakopoulos, V.; Zakynthinos, A.; Sarmas, E.; Marinakis, V.; Askounis, D. Hybrid short-term wind power forecasting model using theoretical power curves and temporal fusion transformers. Renew. Energy 2026, 256, 124008.
Xu, X.; Cao, Q.; Deng, R.; Guo, Z.; Chen, Y.; Yan, J. A cross-dataset benchmark for neural network-based wind power forecasting. Renew. Energy 2025, 254, 123463.
Dong, Y.; Zhou, B.; Zhang, H.; Yang, G.; Ma, S. A deep time-frequency augmented wind power forecasting model. Renew. Energy 2026, 256, 123550.
Ullah, S.; Chen, X.; Han, H.; Wu, J.; Dong, J.; Liu, R.; Ding, W.; Liu, M.; Li, Q.; Qi, H.; et al. A novel hybrid ensemble approach for wind speed forecasting with dual-stage decomposition strategy using optimized GRU and Transformer models. Energy 2025, 329, 136739.
Li, S.; Guo, L.; Zhu, J.; Liu, M.; Chen, J.; Meng, Z. Short-term multi-step wind speed forecasting with multi-feature inputs using variational mode decomposition, a novel artificial intelligence network, and the polar lights optimizer. Renew. Energy 2026, 256, 123965.
Ma, C.; Zhang, C.; Yao, J.; Zhang, X.; Nazir, M.S.; Peng, T. Enhancement of wind speed forecasting using optimized decomposition technique, entropy-based reconstruction, and evolutionary PatchTST. Energy Convers. Manag. 2025, 333, 119819.
Wu, X.; Wang, D.; Yang, M.; Liang, C. CEEMDAN-SE-HDBSCAN-VMD-TCN-BiGRU: A two-stage decomposition-based parallel model for multi-altitude ultra-short-term wind speed forecasting. Energy 2025, 330, 136660.
Hong, J.T.; Han, S.; Yan, J.; Liu, Y.Q. Dual-path frequency Mamba-Transformer model for wind power forecasting. Energy 2025, 332, 137225.
Wu, B.; Lin, J.; Liu, R.; Wang, L. A multi-dimensional interpretable wind speed forecasting model with two-stage feature exploring. Renew. Energy 2026, 256, 124028.
Zeng, H.; Wu, B.; Fang, H.; Lin, J. Interpretable wind speed forecasting through two-stage decomposition with comprehensive relative importance analysis. Appl. Energy 2025, 392, 126015.
Xu, R.; Fang, H.; Zeng, H.; Wu, B. A novel interpretable wind speed forecasting based on the multivariate variational mode decomposition and temporal fusion transformer. Energy 2025, 331, 136497.
Liang, B.J.; Tian, Z.R. ISI net: A novel paradigm integrating interpretability and intelligent selection in ensemble learning for accurate wind power forecasting. Energy Convers. Manag. 2025, 332, 119752.
Wang, Q.; Xu, F.; He, J.; Luo, K.; Fan, J. A new fusion model for enhanced ultra-short-term offshore wind power forecasting. Renew. Energy 2026, 256.
Li, M.; Zhang, K.; Kou, M.; Ma, Y. An offshore wind speed forecasting system based on feature enhancement, deep time series clustering, and extended LSTM. Energy 2025, 333, 137335.
Cui, X.; Yu, X.; Niu, H.; Niu, D.; Liu, D. A novel data-driven multi-step wind power point-interval prediction framework integrating sliding window-based two-layer adaptive decomposition and multi-objective optimization for balancing prediction accuracy and stability. Appl. Energy 2025, 397, 126348.
He, X.; Zhao, K.; Chu, X. AutoML: A survey of the state-of-the-art. Knowl.-Based Syst. 2021, 212, 106622.
Zöller, M.A.; Huber, M.F. Benchmark and survey of automated machine learning frameworks. J. Artif. Intell. Res. 2021, 70, 409–472.
Zhou, Y.; Wu, X.; Wu, J.; Feng, L.; Tan, K.C. HM3: Hierarchical multi-objective model merging for pretrained models. arXiv 2024, arXiv:2409.18893.
Telikani, A.; Tahmassebi, A.; Banzhaf, W.; Gandomi, A.H. Evolutionary machine learning: A survey. ACM Comput. Surv. 2021, 54, 1–35.
Cerqueira, V.; Torgo, L.; Pinto, F.; Soares, C. Arbitrated ensembles for time series forecasting in online learning scenarios. Expert Syst. Appl. 2019, 118, 271–282.
Jiang, W.; Liu, B.; Liang, Y.; Gao, H.; Lin, P.; Zhang, D.; Hu, G. Applicability analysis of transformer to wind speed forecasting by a novel deep learning framework with multiple atmospheric variables. Appl. Energy 2024, 353, 122155.
Li, W.; Li, Y.; Garg, A.; Gao, L. Enhancing real-time degradation prediction of lithium-ion battery: A digital twin framework with CNN-LSTM-attention model. Energy 2024, 286, 129681.
Liu, X.; Zhang, L.; Wang, J.; Zhou, Y.; Gan, W. A unified multi-step wind speed forecasting framework based on numerical weather prediction grids and wind farm monitoring data. Renew. Energy 2023, 211, 948–963.
Su, T.S.; Weng, X.Y.; Vincent, F.Y.; Wu, C.C. Optimal maintenance planning for offshore wind farms under an uncertain environment. Ocean Eng. 2023, 283, 115033.
Semmelmann, L.; Henni, S.; Weinhardt, C. Load forecasting for energy communities: A novel LSTM-XGBoost hybrid model based on smart meter data. Energy Inform. 2022, 5, 24.
Hedegaard, L.; Heidari, N.; Iosifidis, A. Continual spatio-temporal graph convolutional networks. Pattern Recognit. 2023, 140, 109528.
Huang, H.; Lin, L.; Zhao, L.; Ding, S.; Huang, H. Time series focused neural network for accurate wireless human gesture recognition. IEEE Trans. Netw. Sci. Eng. 2025, 113, 118–129.
Goldstein, S. On the vortex theory of screw propeller. Proc. R. Soc. A Math. Phys. Eng. Sci. 1929, 123, 440–465.
McKnight, P.E.; Najab, J. Kruskal–Wallis test. In The Corsini Encyclopedia of Psychology; John Wiley & Sons: Hoboken, NJ, USA, 2010; Volume 4.
Xiaojia, H.; Wang, C.; Zhang, S. Research and application of a model selection forecasting system for wind speed and theoretical power generation in wind farms based on classification and wind conversion. Energy 2024, 293, 130606.
Tedeschi, P.; Sciancalepore, S.; Pietro, R. Modelling a communication channel under jamming: Experimental model and applications. In Proceedings of the Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), New York, NY, USA, 30 September–3 October 2021.

Figure 1. Global wind energy development map based on installed capacity.

Figure 2. Flowchart Illustrating the Forecasting System Framework. (First, multi-turbine operational data are collected and pre-processed for model pre-training. The models are then retrained on the datasets corresponding to their optimal performance to enhance adaptability and robustness. A classification-based mechanism is developed to adaptively select the most suitable forecasting model at each time step. The trained system generates short-term wind speed forecasts, which are converted into theoretical power outputs through a physical wind speed–power curve. The resulting power forecasts support wind farm dispatching and grid operation. Finally, uncertainty analysis is conducted to assess the effects of environmental variability and model error, thereby improving the framework’s reliability and practical applicability.)
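A minimal sketch can illustrate the classification-based selection step of the framework. The caption does not state the optimality criterion. The sketch therefore assumes that the optimal model at each time step is the one with the smallest absolute error, and it represents the classifier's output simply as an array of predicted model indices. All function names are hypothetical, and the toy numbers are illustrative only.

```python
import numpy as np


def selection_labels(forecasts: np.ndarray, actual: np.ndarray) -> np.ndarray:
    """Index of the model with the smallest absolute error at each time step.

    forecasts: shape (n_models, n_steps); actual: shape (n_steps,).
    """
    abs_err = np.abs(forecasts - actual[None, :])
    return np.argmin(abs_err, axis=0)


def hybrid_forecast(forecasts: np.ndarray, chosen: np.ndarray) -> np.ndarray:
    """Combined forecast that takes the chosen model's value at each step."""
    steps = np.arange(forecasts.shape[1])
    return forecasts[chosen, steps]


def selection_accuracy(predicted: np.ndarray, true_labels: np.ndarray) -> float:
    """Fraction of steps where the classifier picked the optimal model."""
    return float(np.mean(predicted == true_labels))


if __name__ == "__main__":
    actual = np.array([5.0, 6.0, 7.0, 6.5, 5.5])
    forecasts = np.array([[5.1, 6.4, 6.9, 6.0, 5.0],    # model 0
                          [4.7, 6.1, 7.5, 6.4, 5.6],    # model 1
                          [5.3, 5.5, 7.2, 6.9, 5.3]])   # model 2
    labels = selection_labels(forecasts, actual)        # training targets for the classifier
    predicted = np.array([0, 1, 2, 1, 1])               # stand-in for classifier output
    print(labels, hybrid_forecast(forecasts, predicted), selection_accuracy(predicted, labels))
```

In this toy case the optimal indices are `[0, 1, 0, 1, 1]`. The stand-in classifier output misses one step, so the selection accuracy is 0.8.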

Figure 3. Statistical analysis of wind speed and the corresponding power output for each of the six turbines. The wind speed data are accurately fitted by a Gamma distribution, with coefficients of determination (R²) from 0.970 to 0.976, indicating an excellent goodness of fit. The fitted shape (1.90–2.23) and scale (2.37–2.97) parameters, together with mean values of around 5 m/s and standard deviations of 3.6–4.0 m/s, reveal a pronounced right skew with a long upper tail. These results confirm that the Gamma distribution provides an appropriate statistical representation of both the stochastic fluctuations and the extreme events inherent in wind speed dynamics.
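The reported moments follow from the fitted Gamma parameters through the standard identities: mean = kθ, standard deviation = √k·θ and skewness = 2/√k for shape k and scale θ. The caption gives only ranges, not per-turbine parameter pairs, so the values in the sketch below are illustrative. `gamma_summary` is a hypothetical helper. For k between 1.90 and 2.23 the skewness lies between about 1.34 and 1.45, which is consistent with the right skew described above.

```python
import math


def gamma_summary(shape: float, scale: float) -> dict:
    """Mean, standard deviation and skewness of a Gamma(shape k, scale theta) distribution."""
    return {
        "mean": shape * scale,               # k * theta
        "std": math.sqrt(shape) * scale,     # sqrt(k) * theta
        "skewness": 2.0 / math.sqrt(shape),  # 2 / sqrt(k); positive means right-skewed
    }


if __name__ == "__main__":
    # Endpoints of the shape range in the caption, with an illustrative scale of 2.5
    for k in (1.90, 2.23):
        s = gamma_summary(k, 2.5)
        print(f"k = {k:.2f}: mean = {s['mean']:.2f} m/s, std = {s['std']:.2f} m/s, "
              f"skewness = {s['skewness']:.3f}")
```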

Figure 4. Classification-Based Model Selection and Hybrid Model Forecast Results.

Figure 5. Forecasting Performance and Model Selection Accuracy Across Individual Turbines.

Figure 6. (A,B) Forecast-Driven Wind Energy Conversion Across Individual Turbines.

Figure 7. (A,B) Wind Speed Prediction Intervals Derived from Each Model for Individual Turbines.

Table 1. Evaluation Metrics for the Comparative Analysis of Uncertainty Forecasting.

Metric	Definition	Mathematical Expression
Upper Bound	Upper bound of the wind speed forecast	$U(i) = F(i) + \frac{K_{1-\alpha/2}\,\sigma}{\sqrt{N}}$
Lower Bound	Lower bound of the wind speed forecast	$L(i) = F(i) - \frac{K_{1-\alpha/2}\,\sigma}{\sqrt{N}}$
FICP	Forecast interval coverage probability on the testing dataset	$FICP = \frac{1}{N}\sum_{i=1}^{N} c_i \times 100\%$
FINAW	Forecast interval normalized average width on the testing dataset	$FINAW = \frac{1}{NR}\sum_{i=1}^{N}\left(U(i) - L(i)\right)$
AWD_i	Accumulated width deviation of testing sample i	$AWD_i = \begin{cases} \dfrac{L(i) - Au_i}{U(i) - L(i)}, & Au_i < L(i) \\ 0, & Au_i \in [L(i), U(i)] \\ \dfrac{Au_i - U(i)}{U(i) - L(i)}, & Au_i > U(i) \end{cases}$
AWD	Accumulated width deviation of the testing dataset	$AWD = \frac{1}{N}\sum_{i=1}^{N} AWD_i$

Note: At each point i, the forecast value is denoted by $F(i)$, and $\alpha$ is the significance level. The parameters $K$ and $\sigma$ define the quantile and scale of the t-location-scale distribution. $N$ represents the total forecast length, while $R$ quantifies the range between the observed maximum and minimum values. The binary indicator $c_i$ takes the value 1 if the actual value $Au_i$ falls within the interval $[L(i), U(i)]$, and 0 otherwise.
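The interval metrics of Table 1 can be computed directly once the bounds L(i) and U(i) are available. The sketch below takes the bounds as inputs rather than deriving them from the t-location-scale quantile. `interval_metrics` is a hypothetical helper. As an assumption of this sketch, AWD is taken as the mean of AWD_i over the N test samples, and R is the range of the observed values.

```python
import numpy as np


def interval_metrics(actual, lower, upper) -> dict:
    """FICP (%), FINAW and AWD for a set of forecast intervals [lower, upper]."""
    actual, lower, upper = (np.asarray(x, dtype=float) for x in (actual, lower, upper))
    width = upper - lower                                  # U(i) - L(i), assumed > 0
    covered = (actual >= lower) & (actual <= upper)        # indicator c_i
    ficp = covered.mean() * 100.0
    value_range = actual.max() - actual.min()              # R
    finaw = width.sum() / (actual.size * value_range)
    # AWD_i: distance outside the interval, normalized by the interval width
    awd_i = np.where(actual < lower, (lower - actual) / width,
                     np.where(actual > upper, (actual - upper) / width, 0.0))
    return {"FICP": ficp, "FINAW": finaw, "AWD": awd_i.mean()}


if __name__ == "__main__":
    print(interval_metrics(actual=[1.0, 2.0, 3.0, 4.0],
                           lower=[0.5, 2.5, 2.5, 3.0],
                           upper=[1.5, 3.0, 3.5, 3.5]))
```

In the toy call, two of the four observations fall outside their intervals, each by one interval width. The call therefore gives FICP = 50%, AWD = 0.5 and FINAW = 3.0 / (4 × 3) = 0.25.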

Table 2. Evaluation Metrics for the Assessment of Wind Speed Forecasting.

Index	Description	Mathematical Expression
MAE	MAE measures the average magnitude of forecasting errors without considering their direction.	$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|$
RMSE	Evaluates the square root of the mean squared error; more sensitive to large deviations and outliers.	$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^{2}}$
MAPE	MAPE expresses forecasting accuracy as a percentage; a lower MAPE means higher accuracy.	$MAPE = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \times 100\%$
STDAPE	The standard deviation of the absolute percentage error, used to measure the stability of a forecasting model.	$STDAPE = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\left|\frac{\hat{y}_i - y_i}{y_i}\right| - \frac{1}{N}\sum_{j=1}^{N}\left|\frac{\hat{y}_j - y_j}{y_j}\right|\right)^{2}} \times 100\%$
DA	Measures the ability of the model to correctly forecast the direction (increase or decrease) of changes; a higher DA means better trend-following capability.	$DA = \frac{1}{N-1}\sum_{i=1}^{N-1} a_i, \quad a_i = \begin{cases} 1, & \text{if } (y_{i+1} - \hat{y}_i)(\hat{y}_{i+1} - y_i) > 0 \\ 0, & \text{otherwise} \end{cases}$
TIC	Theil's inequality coefficient quantifies the relative accuracy of forecasts, with values ranging from 0 to 1; smaller values correspond to higher predictive accuracy.	$TIC = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^{2}}}{\sqrt{\frac{1}{N}\sum_{i=1}^{N} y_i^{2}} + \sqrt{\frac{1}{N}\sum_{i=1}^{N}\hat{y}_i^{2}}}$
R²	Represents the proportion of variance in the observations explained by the model; values close to 1 indicate strong predictive agreement.	$R^{2} = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{N}(y_i - \bar{y})^{2}}$
ACC	Accuracy (ACC) is a widely used measure in machine learning for assessing the effectiveness of a classification model	$ACC = \frac{TP + TN}{TP + TN + FP + FN}$

Note: For each observation, $y_i$ is the true value, $\hat{y}_i$ is the value predicted by a given model, and $\bar{y}$ is the mean of the observed values. N indicates the number of forecasting samples, and the accuracy of model selection is assessed using the ACC metric.
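The point-forecast metrics of Table 2 map directly onto a few NumPy reductions, as the sketch below shows. `point_metrics` is a hypothetical helper. TIC is computed in Theil's U1 form (error RMS divided by the sum of the RMS of observations and forecasts); this form is an assumption, chosen because it keeps the value between 0 and 1. DA and ACC are omitted. MAPE and STDAPE require non-zero observations, which matters for wind speeds near calm conditions.

```python
import numpy as np


def point_metrics(y, y_hat) -> dict:
    """MAE, RMSE, MAPE (%), STDAPE (%), TIC (Theil U1 form) and R^2."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    err = y_hat - y
    ape = np.abs(err / y)                      # absolute percentage errors as fractions

    def rms(v: np.ndarray) -> float:
        return float(np.sqrt(np.mean(v ** 2)))

    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": rms(err),
        "MAPE": float(ape.mean() * 100.0),
        "STDAPE": float(ape.std(ddof=1) * 100.0),   # ddof=1 matches the 1/(N-1) factor
        "TIC": rms(err) / (rms(y) + rms(y_hat)),    # Theil U1 form (assumed)
        "R2": float(1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)),
    }


if __name__ == "__main__":
    print(point_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```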

Table 3. Initial Experiment Outcomes of Each Model on Various Wind Turbines.

NO	Model	MAE	RMSE	STDAPE	DA	TIC	MAPE	R²
Turbine #1	CNN-LSTM	0.1382	0.1936	17.30%	73.88%	0.0164	4.12%	0.9587
	Transformers	0.1515	0.2055	26.45%	73.86%	0.0177	4.97%	0.9487
	TCN-LSTM	0.1505	0.1877	17.61%	67.12%	0.0155	5.48%	0.9561
	CNN-GRU	0.1018	0.1554	21.51%	70.56%	0.0159	3.88%	0.9660
	LSTM-XGBoost	0.1077	0.1547	24.47%	76.17%	0.0151	4.20%	0.9652
	GNN-TCN	0.1112	0.1661	25.66%	76.73%	0.0168	4.55%	0.9602
Turbine #2	CNN-LSTM	0.1429	0.2100	9.95%	74.84%	0.0170	4.09%	0.9555
	Transformers	0.1554	0.2138	13.23%	74.37%	0.0171	4.70%	0.9498
	TCN-LSTM	0.1430	0.2004	20.89%	74.45%	0.0169	5.63%	0.9591
	CNN-GRU	0.1711	0.2384	16.19%	72.34%	0.0197	5.57%	0.9396
	LSTM-XGBoost	0.1128	0.1654	21.29%	74.67%	0.0145	4.82%	0.9703
	GNN-TCN	0.1309	0.1843	22.36%	74.20%	0.0173	5.73%	0.9590
Turbine #3	CNN-LSTM	0.1256	0.1776	9.18%	75.08%	0.0160	4.30%	0.9599
	Transformers	0.1380	0.1792	14.59%	72.47%	0.0150	4.72%	0.9586
	TCN-LSTM	0.1469	0.2064	11.85%	69.22%	0.0180	4.80%	0.9600
	CNN-GRU	0.1471	0.1897	21.17%	74.50%	0.0166	5.21%	0.9488
	LSTM-XGBoost	0.1315	0.1888	13.52%	76.71%	0.0166	4.47%	0.9590
	GNN-TCN	0.1336	0.1862	9.83%	75.81%	0.0155	4.56%	0.9571
Turbine #4	CNN-LSTM	0.1251	0.1811	14.52%	74.27%	0.0156	4.10%	0.9597
	Transformers	0.1277	0.1690	14.79%	72.50%	0.0149	4.09%	0.9616
	TCN-LSTM	0.1341	0.1756	20.08%	66.78%	0.0158	5.14%	0.9595
	CNN-GRU	0.1438	0.2080	24.13%	73.07%	0.0183	5.44%	0.9460
	LSTM-XGBoost	0.0966	0.1528	14.49%	70.23%	0.0148	3.94%	0.9683
	GNN-TCN	0.0925	0.1387	21.73%	74.66%	0.0139	4.53%	0.9721
Turbine #5	CNN-LSTM	0.1259	0.1937	18.93%	70.85%	0.0163	4.67%	0.9629
	Transformers	0.1268	0.2032	21.29%	67.52%	0.0159	4.75%	0.9592
	TCN-LSTM	0.1374	0.2037	22.50%	68.06%	0.0163	5.51%	0.9562
	CNN-GRU	0.1371	0.1749	23.44%	70.66%	0.0144	6.12%	0.9671
	LSTM-XGBoost	0.1394	0.1967	17.50%	74.43%	0.0160	4.06%	0.9582
	GNN-TCN	0.1347	0.2103	17.55%	74.79%	0.0169	4.12%	0.9553
Turbine #6	CNN-LSTM	0.1088	0.1673	17.63%	73.94%	0.0142	4.31%	0.9665
	Transformers	0.1239	0.1633	15.84%	74.17%	0.0161	4.13%	0.9616
	TCN-LSTM	0.1146	0.1556	23.83%	73.49%	0.0144	4.91%	0.9689
	CNN-GRU	0.1300	0.1844	26.65%	74.57%	0.0165	5.31%	0.9577
	LSTM-XGBoost	0.1422	0.1887	18.60%	68.14%	0.0159	4.75%	0.9595
	GNN-TCN	0.1350	0.1932	14.70%	73.80%	0.0169	5.15%	0.9571

Table 4. Model Selection Outcomes of Each Model.

NO	Model	MAE	RMSE	STDAPE	DA	TIC	MAPE	R²	OP	SOP	SA
Turbine #1	CNN-LSTM	0.0690	0.0948	8.31%	88.66%	0.0079	2.15%	0.9861	1925	1821	94.59%
	Transformers	0.0768	0.1053	13.13%	88.64%	0.0088	2.50%	0.9852	2737	2547	93.05%
	TCN-LSTM	0.0748	0.0976	9.23%	80.54%	0.0082	2.62%	0.9802	2638	2492	94.48%
	CNN-GRU	0.0523	0.0772	10.75%	84.67%	0.0076	2.04%	0.9887	2139	2017	94.29%
	LSTM-XGBoost	0.0530	0.0779	12.26%	91.40%	0.0076	2.10%	0.9887	2031	1831	90.16%
	GNN-TCN	0.0574	0.0830	12.75%	92.07%	0.0081	2.34%	0.9860	2930	2668	91.04%
	MSFS	0.0650	0.1052	5.32%	93.83%	0.0088	1.71%	0.9934	14,400	13,376	92.89%
Turbine #2	CNN-LSTM	0.0686	0.1006	5.10%	89.81%	0.0084	2.12%	0.9896	2430	2228	91.69%
	Transformers	0.0748	0.1066	6.67%	89.24%	0.0089	2.44%	0.9891	1807	1683	93.13%
	TCN-LSTM	0.0724	0.0965	10.45%	89.34%	0.0081	2.77%	0.9803	1948	1754	90.03%
	CNN-GRU	0.0841	0.1204	8.02%	86.80%	0.0100	2.79%	0.9798	2432	2239	92.05%
	LSTM-XGBoost	0.0570	0.0811	10.31%	89.61%	0.0071	2.46%	0.9855	2764	2556	92.48%
	GNN-TCN	0.0669	0.0950	10.81%	89.03%	0.0084	2.85%	0.9790	3019	2850	94.40%
	MSFS	0.0607	0.1030	7.22%	90.25%	0.0086	1.66%	0.9919	14,400	13,309	92.43%
Turbine #3	CNN-LSTM	0.0653	0.0904	4.71%	90.09%	0.0078	2.09%	0.9899	2717	2575	94.78%
	Transformers	0.0693	0.0904	6.97%	86.97%	0.0078	2.40%	0.9886	1768	1613	91.26%
	TCN-LSTM	0.0726	0.1019	5.90%	83.06%	0.0088	2.44%	0.9865	2064	1944	94.20%
	CNN-GRU	0.0722	0.0935	10.87%	89.40%	0.0082	2.73%	0.9800	2004	1895	94.54%
	LSTM-XGBoost	0.0661	0.0908	6.55%	92.05%	0.0079	2.17%	0.9890	2655	2450	92.28%
	GNN-TCN	0.0662	0.0918	4.99%	90.97%	0.0079	2.18%	0.9888	3192	2948	92.36%
	MSFS	0.0608	0.0999	4.59%	93.02%	0.0086	1.64%	0.9954	14,400	13,426	93.23%
Turbine #4	CNN-LSTM	0.0653	0.0901	7.21%	89.12%	0.0078	2.11%	0.9895	1892	1788	94.51%
	Transformers	0.0638	0.0879	7.73%	87.01%	0.0076	2.13%	0.9883	2435	2250	92.39%
	TCN-LSTM	0.0696	0.0902	10.11%	80.13%	0.0078	2.70%	0.9821	1808	1658	91.71%
	CNN-GRU	0.0754	0.1041	12.09%	87.68%	0.0090	2.79%	0.9816	2170	2020	93.10%
	LSTM-XGBoost	0.0495	0.0745	7.44%	84.28%	0.0074	1.97%	0.9945	2536	2335	92.07%
	GNN-TCN	0.0484	0.0697	10.70%	89.59%	0.0069	2.20%	0.9860	3559	3335	93.69%
	MSFS	0.0573	0.0954	9.08%	91.77%	0.0083	1.74%	0.9950	14,400	13,386	92.96%
Turbine #5	CNN-LSTM	0.0633	0.0934	9.15%	85.02%	0.0079	2.33%	0.9853	2687	2441	90.84%
	Transformers	0.0660	0.0983	10.50%	81.03%	0.0083	2.41%	0.9829	2002	1892	94.51%
	TCN-LSTM	0.0700	0.1015	11.26%	81.68%	0.0086	2.67%	0.9792	1936	1746	90.17%
	CNN-GRU	0.0665	0.0881	11.45%	84.80%	0.0075	2.93%	0.9762	2099	1989	94.74%
	LSTM-XGBoost	0.0700	0.0974	8.34%	89.32%	0.0079	2.13%	0.9882	2517	2324	92.33%
	GNN-TCN	0.0705	0.1007	8.82%	89.75%	0.0082	2.16%	0.9871	3159	2859	90.52%
	MSFS	0.0577	0.0965	6.35%	91.95%	0.0082	1.73%	0.9906	14,400	13,251	92.02%
Turbine #6	CNN-LSTM	0.0562	0.0797	8.56%	88.73%	0.0073	2.09%	0.9895	2635	2381	90.37%
	Transformers	0.0595	0.0854	8.01%	89.01%	0.0078	2.14%	0.9889	1874	1690	90.19%
	TCN-LSTM	0.0583	0.0768	11.70%	88.18%	0.0071	2.53%	0.9865	1767	1642	92.90%
	CNN-GRU	0.0634	0.0893	13.02%	89.48%	0.0082	2.57%	0.9816	2229	2056	92.25%
	LSTM-XGBoost	0.0700	0.0974	9.01%	81.77%	0.0082	2.45%	0.9877	3326	3090	92.90%
	GNN-TCN	0.0708	0.1007	7.60%	88.56%	0.0085	2.47%	0.9871	2569	2322	90.38%
	MSFS	0.0495	0.0837	9.63%	92.68%	0.0077	1.60%	0.9911	14,400	13,181	91.53%

Table 5. Forecasting Accuracy of the Proposed MSFSC for Category II Wind Turbine Applications.

NO	Model	MAE	RMSE	STDAPE	DA	TIC	MAPE	R²	OP	SOP	SA
Turbine #1	CNN-LSTM	0.1382	0.2116	1.57%	89.94%	0.0128	1.66%	0.9953	169	152	90.21%
	Transformers	0.1145	0.1788	1.89%	85.16%	0.0144	1.99%	0.9959	128	118	92.95%
	TCN-LSTM	0.1350	0.2145	3.87%	88.68%	0.0276	3.55%	0.9609	159	143	90.29%
	CNN-GRU	0.1101	0.1742	2.87%	90.70%	0.0162	2.63%	0.9923	172	154	90.00%
	LSTM-XGBoost	0.0636	0.1079	1.82%	85.96%	0.0120	1.56%	0.9937	178	164	92.42%
	GNN-TCN	0.1269	0.1880	3.05%	94.06%	0.0198	2.87%	0.9776	202	190	94.42%
	MSFS	0.1147	0.1834	2.96%	89.38%	0.0176	2.55%	0.9908	1008	921	91.37%
Turbine #2	CNN-LSTM	0.2286	0.3823	2.88%	89.68%	0.0224	2.53%	0.9802	126	115	91.34%
	Transformers	0.0724	0.1345	2.22%	94.12%	0.0130	1.66%	0.9953	170	161	94.97%
	TCN-LSTM	0.1258	0.1917	3.43%	91.88%	0.0244	3.44%	0.9817	160	149	93.23%
	CNN-GRU	0.1512	0.2272	3.89%	89.43%	0.0246	3.72%	0.9644	123	113	91.96%
	LSTM-XGBoost	0.1211	0.1834	3.35%	85.86%	0.0217	3.20%	0.9785	191	177	92.95%
	GNN-TCN	0.1444	0.2141	3.22%	92.44%	0.0244	3.38%	0.9772	238	216	90.94%
	MSFS	0.1394	0.2244	3.28%	90.67%	0.0225	3.15%	0.9850	1008	931	92.36%
Turbine #3	CNN-LSTM	0.1703	0.3118	2.51%	93.72%	0.0221	2.12%	0.9862	191	172	90.05%
	Transformers	0.1287	0.2448	5.28%	87.74%	0.0273	3.33%	0.9721	155	140	90.76%
	TCN-LSTM	0.1269	0.2341	3.96%	88.59%	0.0320	3.28%	0.9442	149	141	94.85%
	CNN-GRU	0.1521	0.2310	3.52%	91.13%	0.0229	3.42%	0.9759	124	114	92.56%
	LSTM-XGBoost	0.1184	0.1889	3.13%	90.76%	0.0229	2.87%	0.9789	184	173	94.52%
	GNN-TCN	0.1125	0.1709	2.99%	93.17%	0.0214	2.95%	0.9791	205	192	93.68%
	MSFS	0.1292	0.2212	3.50%	91.07%	0.0239	2.97%	0.9802	1008	932	92.46%
Turbine #4	CNN-LSTM	0.1302	0.2816	2.43%	90.45%	0.0234	1.82%	0.9887	199	184	92.56%
	Transformers	0.1904	0.2836	5.22%	90.43%	0.0322	4.78%	0.9575	188	174	92.64%
	TCN-LSTM	0.1105	0.1771	2.90%	90.40%	0.0184	2.55%	0.9898	198	178	90.40%
	CNN-GRU	0.1512	0.2556	4.43%	92.44%	0.0250	3.44%	0.9832	119	109	91.83%
	LSTM-XGBoost	0.1095	0.1868	3.69%	86.21%	0.0231	2.86%	0.9695	145	136	93.85%
	GNN-TCN	0.1540	0.2471	4.05%	86.16%	0.0359	4.01%	0.7544	159	148	93.71%
	MSFS	0.1419	0.2401	4.06%	89.38%	0.0261	3.34%	0.9767	1008	929	92.16%
Turbine #5	CNN-LSTM	0.1436	0.2565	2.12%	89.76%	0.0132	1.57%	0.9931	166	154	93.11%
	Transformers	0.1027	0.2091	3.12%	85.38%	0.0171	2.13%	0.9922	171	162	94.91%
	TCN-LSTM	0.1177	0.1944	3.57%	90.28%	0.0191	2.90%	0.9907	216	194	90.06%
	CNN-GRU	0.1535	0.2340	2.67%	85.85%	0.0208	2.93%	0.9854	106	95	90.26%
	LSTM-XGBoost	0.1059	0.1492	2.78%	93.22%	0.0182	2.86%	0.9856	177	164	92.74%
	GNN-TCN	0.1049	0.1503	3.06%	90.70%	0.0241	3.25%	0.8206	172	155	90.48%
	MSFS	0.1197	0.1932	2.99%	89.48%	0.0178	2.77%	0.9920	1008	924	91.67%
Turbine #6	CNN-LSTM	0.1464	0.2681	2.06%	93.87%	0.0170	1.63%	0.9889	163	153	94.36%
	Transformers	0.0808	0.1198	2.60%	91.95%	0.0123	2.34%	0.9950	174	162	93.34%
	TCN-LSTM	0.0760	0.1153	2.35%	93.21%	0.0106	1.89%	0.9960	162	146	90.18%
	CNN-GRU	0.1332	0.2111	3.83%	85.42%	0.0219	3.31%	0.9884	144	132	92.21%
	LSTM-XGBoost	0.0727	0.1157	2.97%	86.21%	0.0146	2.24%	0.9914	174	158	90.86%
	GNN-TCN	0.1218	0.1922	2.85%	90.58%	0.0212	2.97%	0.9872	191	172	90.15%
	MSFS	0.1052	0.1774	2.93%	90.28%	0.0174	2.51%	0.9912	1008	923	91.57%

Table 6. Analysis of Power Curve and Technical Parameters of the Wind Turbine.

Power Curve with Standard Air Density
Standard Power Curve	$P (v_{i}) = \frac{1}{2} ρ (T, H) A v_{i}^{3} C_{p} (v_{i})$
Power Coefficient	$C_{p} (v_{i}) = \frac{0.3801 v_{i}^{3} - 1.924 v_{i}^{2} + 33.27 v_{i} - 418.5}{v_{i}^{4} - 26.24 v_{i}^{3} + 267.1 v_{i}^{2} - 1274 v_{i} + 2949}$
Actual Power Curve for Wind Turbine
Air Density Correction Function	$ρ (T, H) = \frac{353.1 e^{\frac{- 342 H}{T}}}{T}$
Actual Power Coefficient Curve (Lower bound)	$C_{p} (v_{i}) = \frac{0.6313 v_{i}^{3} - 12.69 v_{i}^{2} + 131.4 v_{i} - 250.8}{v_{i}^{4} - 22.99 v_{i}^{3} + 187.3 v_{i}^{2} - 542.8 v_{i} + 624.1}$
Actual Power Coefficient Curve (Standard)	$C_{p} (v_{i}) = \frac{0.5466 v_{i}^{3} - 10.64 v_{i}^{2} + 121.43 v_{i} - 237.0}{v_{i}^{4} - 23.08 v_{i}^{3} + 191.2 v_{i}^{2} - 547.1 v_{i} + 629.4}$
Actual Power Coefficient Curve (Upper bound)	$C_{p} (v_{i}) = \frac{0.5689 v_{i}^{3} - 10.93 v_{i}^{2} + 124.8 v_{i} - 258.1}{v_{i}^{4} - 22.41 v_{i}^{3} + 176.9 v_{i}^{2} - 414.1 v_{i} + 326.5}$

Table 7. Parameters of the wind turbine.

Type	SL1500/82
Manufacturer	Sinovel
Rated Wind Speed	10.5 m/s
Rotor Diameter	82 m
Hub Height	70 m
Cut-In Wind Speed	3 m/s
Cut-Out Wind Speed	22 m/s
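As a minimal sketch of how a forecast wind speed can be converted into theoretical power using Tables 6 and 7, the Python code below evaluates the actual power coefficient curve (standard) together with the rotor geometry and operating limits of the SL1500/82 turbine. Several details are assumptions rather than values reported here: the 1500 kW rated power used to cap output is inferred from the model designation; temperature T is taken in kelvin and H in metres; and the air-density exponent uses the standard barometric coefficient of about 0.0342 K/m, since the units behind the printed value 342 are not stated. The function names are hypothetical, and this is not the authors' implementation.

```python
import math

# Turbine data from Table 7 (SL1500/82). RATED_POWER_W is an ASSUMPTION
# inferred from the "SL1500" designation; it is not listed in Table 7.
ROTOR_DIAMETER_M = 82.0
CUT_IN_MS = 3.0
CUT_OUT_MS = 22.0
RATED_POWER_W = 1.5e6

# "Actual Power Coefficient Curve (Standard)" from Table 6,
# coefficients ordered from the highest power of v downwards.
CP_NUM = (0.5466, -10.64, 121.43, -237.0)
CP_DEN = (1.0, -23.08, 191.2, -547.1, 629.4)


def _polyval(coeffs, v):
    """Evaluate a polynomial with Horner's rule (highest power first)."""
    result = 0.0
    for c in coeffs:
        result = result * v + c
    return result


def power_coefficient(v):
    """Rational-function C_p(v) from Table 6 (standard actual curve)."""
    return _polyval(CP_NUM, v) / _polyval(CP_DEN, v)


def air_density(temp_k, height_m, coeff_k_per_m=0.0342):
    """rho(T, H) = 353.1 * exp(-c * H / T) / T in kg/m^3.

    The coefficient c (K/m) is an ASSUMPTION: the standard barometric value.
    """
    return 353.1 * math.exp(-coeff_k_per_m * height_m / temp_k) / temp_k


def theoretical_power(v, rho=1.225):
    """Theoretical power in W: 0.5 * rho * A * v^3 * C_p(v).

    Returns 0 outside [cut-in, cut-out] and clips to [0, rated power].
    """
    if v < CUT_IN_MS or v > CUT_OUT_MS:
        return 0.0
    area = math.pi * (ROTOR_DIAMETER_M / 2.0) ** 2
    p = 0.5 * rho * area * v ** 3 * power_coefficient(v)
    return min(max(p, 0.0), RATED_POWER_W)


if __name__ == "__main__":
    rho_hub = air_density(288.15, 70.0)  # assumed 15 degC at 70 m
    for speed in (2.0, 5.0, 10.0, 15.0, 25.0):
        kw = theoretical_power(speed, rho_hub) / 1e3
        print(f"{speed:5.1f} m/s -> {kw:8.1f} kW")
```

Clipping to the assumed rated value is an illustrative design choice: the rational C_p expression is a fitted curve and should not be trusted outside the wind speed range it describes.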

Table 8. Assessment of Wind Conversion Efficiency by Each Model and the MSFSC.

NO	Model	NRMD	RMSE	R²	MPE	NO	Model	NRMD	RMSE	R²	MPE
Turbine #1	CNN-LSTM	4.778%	71.5522	0.9928	−4.760%	Turbine #4	CNN-LSTM	4.155%	61.2387	0.9958	−4.042%
	Transformers	4.973%	75.5793	0.9936	−5.434%		Transformers	4.561%	68.8929	0.9949	−4.371%
	TCN-LSTM	4.984%	78.0304	0.9943	−5.699%		TCN-LSTM	4.467%	66.3624	0.9968	−4.253%
	CNN-GRU	4.802%	72.2187	0.9920	−5.348%		CNN-GRU	4.400%	64.7776	0.9959	−4.186%
	LSTM-XGBoost	4.886%	75.4574	0.9937	−5.413%		LSTM-XGBoost	4.377%	64.5617	0.9952	−4.176%
	GNN-TCN	4.813%	72.8123	0.9930	−5.379%		GNN-TCN	4.361%	64.3825	0.9960	−4.056%
	MSFS	4.438%	67.6901	0.9957	−3.823%		MSFS	4.037%	58.9629	0.9973	−3.253%
Turbine #2	CNN-LSTM	4.820%	71.2050	0.9932	−4.725%	Turbine #5	CNN-LSTM	3.960%	57.1472	0.9975	−4.252%
	Transformers	4.973%	74.1480	0.9934	−5.530%		Transformers	4.006%	57.8198	0.9971	−4.276%
	TCN-LSTM	4.968%	73.3896	0.9936	−5.428%		TCN-LSTM	4.070%	58.4351	0.9973	−4.351%
	CNN-GRU	4.737%	68.7901	0.9938	−4.542%		CNN-GRU	3.823%	54.7854	0.9978	−4.194%
	LSTM-XGBoost	5.038%	74.9899	0.9940	−5.598%		LSTM-XGBoost	3.980%	57.3913	0.9973	−4.266%
	GNN-TCN	4.829%	71.5200	0.9948	−4.926%		GNN-TCN	3.928%	56.7533	0.9974	−4.228%
	MSFS	4.262%	64.6974	0.9961	−3.329%		MSFS	3.808%	54.6707	0.9979	−3.253%
Turbine #3	CNN-LSTM	4.186%	59.5443	0.9974	−4.649%	Turbine #6	CNN-LSTM	7.737%	115.4382	0.9836	−3.901%
	Transformers	4.228%	59.8509	0.9978	−4.708%		Transformers	7.901%	117.7301	0.9931	−5.837%
	TCN-LSTM	4.167%	59.3860	0.9974	−4.577%		TCN-LSTM	7.704%	114.5872	0.9828	−3.253%
	CNN-GRU	4.122%	58.6405	0.9976	−4.545%		CNN-GRU	6.092%	113.6973	0.9819	−2.887%
	LSTM-XGBoost	4.213%	59.6998	0.9976	−4.707%		LSTM-XGBoost	7.754%	115.7431	0.9903	−5.735%
	GNN-TCN	4.117%	58.6096	0.9975	−4.452%		GNN-TCN	7.613%	113.7532	0.9828	−2.643%
	MSFS	4.067%	57.8520	0.9979	−3.897%		MSFS	5.458%	83.5771	0.9828	−1.882%

Note:

$\mathrm{NRMD} = \frac{\frac{1}{n} \sum_{i = 1}^{n} |x_{i} - \bar{x}_{actual}|}{\frac{1}{n} \sum_{i = 1}^{n} x_{actual, i}}$

where n is the total number of forecast samples, $x_{i}$ and $x_{actual, i}$ are the i-th forecasted and actual values, and $\bar{x}_{actual}$ denotes the mean of the observed reference data. MPE does not take the absolute value of the errors, and this is its analytical advantage: its sign shows whether a model systematically overestimates or underestimates. However, because positive and negative errors cancel, MPE is generally unsuitable for assessing overall forecasting accuracy.
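The evaluation metrics in Table 8 can be sketched in plain Python as follows. NRMD follows the note's formula exactly as printed (deviations taken from the mean of the actual data), RMSE and R² use their textbook definitions, and the MPE sign convention, forecast minus actual divided by actual, is an assumption chosen so that negative values indicate underestimation. The function names are illustrative.

```python
import math


def nrmd(forecast, actual):
    """NRMD as printed in the note: mean |x_i - mean(actual)| / mean(actual)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    return (sum(abs(x - mean_actual) for x in forecast) / n) / mean_actual


def rmse(forecast, actual):
    """Root mean square error, in the units of the data."""
    n = len(actual)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(forecast, actual)) / n)


def r_squared(forecast, actual):
    """Coefficient of determination 1 - SS_res / SS_tot (actual must not be constant)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - x) ** 2 for x, y in zip(forecast, actual))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


def mpe(forecast, actual):
    """Mean percentage error in %, sign kept.

    ASSUMED convention: (forecast - actual) / actual, so a negative value
    means the forecasts are, on average, below the observations.
    """
    n = len(actual)
    return 100.0 * sum((x - y) / y for x, y in zip(forecast, actual)) / n


if __name__ == "__main__":
    actual = [100.0, 120.0]
    forecast = [90.0, 132.0]  # -10% and +10% errors
    print(mpe(forecast, actual))   # the two errors cancel
    print(rmse(forecast, actual))  # about 11.05
```

The usage pair illustrates the note's caveat: the -10% and +10% errors cancel to an MPE of zero, while RMSE still reports an error of roughly 11, which is why MPE is read as a bias indicator rather than an accuracy measure.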

Table 9. Model evaluation by the Kruskal–Wallis (KW) test and uncertainty analysis of wind speed forecasting.

NO	Comparison	A-	p-Value (KW)	p-Value (KS)	FICP	FINAW	AWD
Turbine #1	CNN-LSTM vs. Actual Data	−2.128	1.0000	0.9935	74.21%	0.02827	0.5032
	Transformers vs. Actual Data	−0.604	1.0000	0.9881	73.61%	0.03073	0.5470
	TCN-LSTM vs. Actual Data	0.062	1.0000	0.9969	67.06%	0.03385	0.6026
	CNN-GRU vs. Actual Data	−0.200	1.0000	0.9995	69.84%	0.03163	0.5631
	LSTM-XGBoost vs. Actual Data	−1.375	1.0000	0.9999	87.10%	0.01723	0.3066
	GNN-TCN vs. Actual Data	1.258	1.0000	1.0000	68.25%	0.03355	0.5971
	MSFS vs. Actual Data	−1.346	1.0000	0.9935	95.24%	0.01005	0.1788
Turbine #2	CNN-LSTM vs. Actual Data	0.882	1.0000	0.9987	61.90%	0.06732	1.3600
	Transformers vs. Actual Data	−0.764	1.0000	0.9125	69.94%	0.06432	1.2992
	TCN-LSTM vs. Actual Data	10.351	1.0000	0.9798	64.98%	0.07450	1.5048
	CNN-GRU vs. Actual Data	−4.923	1.0000	0.8580	62.10%	0.10678	2.1569
	LSTM-XGBoost vs. Actual Data	−2.086	1.0000	0.9684	66.07%	0.06840	1.3816
	GNN-TCN vs. Actual Data	−1.239	1.0000	0.9881	65.58%	0.07384	1.4916
	MSFS vs. Actual Data	9.611	1.0000	0.9535	93.45%	0.01725	0.3484
Turbine #3	CNN-LSTM vs. Actual Data	−0.030	1.0000	0.9987	87.60%	0.07548	1.6983
	Transformers vs. Actual Data	−9.782	1.0000	0.9125	56.15%	0.13604	3.0609
	TCN-LSTM vs. Actual Data	−0.765	1.0000	0.9798	58.04%	0.12665	2.8497
	CNN-GRU vs. Actual Data	8.904	1.0000	0.8580	52.98%	0.14186	3.1917
	LSTM-XGBoost vs. Actual Data	0.954	1.0000	0.8265	65.77%	0.12122	2.7274
	GNN-TCN vs. Actual Data	8.791	1.0000	0.9881	62.80%	0.12189	2.7424
	MSFS vs. Actual Data	−1.833	1.0000	0.9935	94.94%	0.04645	1.0450
Turbine #4	CNN-LSTM vs. Actual Data	−2.128	1.0000	0.9935	87.10%	0.01723	0.3066
	Transformers vs. Actual Data	−0.604	1.0000	0.9881	73.61%	0.03073	0.5470
	TCN-LSTM vs. Actual Data	0.062	1.0000	0.9969	74.21%	0.02827	0.5032
	CNN-GRU vs. Actual Data	−0.200	1.0000	0.9995	69.84%	0.03163	0.5631
	LSTM-XGBoost vs. Actual Data	−1.375	1.0000	0.9999	67.06%	0.03385	0.6026
	GNN-TCN vs. Actual Data	1.258	1.0000	1.0000	68.25%	0.03355	0.5971
	MSFS vs. Actual Data	−1.346	1.0000	0.9935	95.24%	0.01005	0.1788
Turbine #5	CNN-LSTM vs. Actual Data	0.990	1.0000	1.0000	86.11%	0.02955	0.5378
	Transformers vs. Actual Data	4.540	1.0000	0.9999	72.62%	0.05190	0.9446
	TCN-LSTM vs. Actual Data	4.511	1.0000	0.9987	66.57%	0.05419	0.9863
	CNN-GRU vs. Actual Data	3.547	1.0000	0.9999	65.38%	0.05611	1.0212
	LSTM-XGBoost vs. Actual Data	5.337	1.0000	1.0000	69.54%	0.05220	0.9500
	GNN-TCN vs. Actual Data	2.429	1.0000	0.9999	63.89%	0.06312	1.1487
	MSFS vs. Actual Data	4.853	1.0000	0.9995	94.35%	0.01510	0.2748
Turbine #6	CNN-LSTM vs. Actual Data	0.600	1.0000	1.0000	87.90%	0.02369	0.4595
	Transformers vs. Actual Data	5.788	1.0000	0.9995	69.15%	0.04155	0.8061
	TCN-LSTM vs. Actual Data	7.281	1.0000	0.9987	75.20%	0.04037	0.7833
	CNN-GRU vs. Actual Data	7.008	1.0000	0.9881	66.37%	0.04717	0.9151
	LSTM-XGBoost vs. Actual Data	0.817	1.0000	0.9935	71.83%	0.04124	0.8001
	GNN-TCN vs. Actual Data	6.554	1.0000	0.9881	68.65%	0.04514	0.8758
	MSFS vs. Actual Data	2.952	1.0000	0.9999	93.35%	0.01339	0.2598

Note: When the p-value exceeds 0.05, the null hypothesis that the forecasts and the observations come from the same distribution is not rejected at the 95% confidence level, and the model’s forecasts are regarded as statistically consistent with the verification set observations. Conversely, a p-value below 0.05 indicates a statistically significant discrepancy between the observed and forecasted values.
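The decision rule in the note can be illustrated with a two-sample Kruskal–Wallis test written in pure Python: ties receive average ranks, the statistic is tie-corrected, and with two groups it has one degree of freedom, so the p-value equals erfc(sqrt(H/2)). `scipy.stats.kruskal` computes the same statistic when SciPy is available. The FICP and FINAW helpers use common coverage-probability and range-normalised-width definitions, which are assumptions because this section does not define them; AWD and the A- column are not reproduced. All names are hypothetical.

```python
import math


def kruskal_wallis_two_groups(forecast, actual):
    """Return (H, p) for a tie-corrected Kruskal-Wallis test on two samples."""
    pooled = list(forecast) + list(actual)
    n_total = len(pooled)
    order = sorted(range(n_total), key=lambda i: pooled[i])
    ranks = [0.0] * n_total
    tie_term = 0.0
    i = 0
    while i < n_total:
        # Find the run of tied values occupying sorted positions i..j.
        j = i
        while j + 1 < n_total and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    if correction <= 0.0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    n1 = len(forecast)
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    h = (12.0 / (n_total * (n_total + 1))
         * (r1 ** 2 / n1 + r2 ** 2 / (n_total - n1)) - 3.0 * (n_total + 1))
    h = max(h / correction, 0.0)
    # Chi-square survival function with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(h / 2.0))
    return h, p_value


def is_consistent(p_value, alpha=0.05):
    """Decision rule from the note: p > alpha -> forecasts consistent with observations."""
    return p_value > alpha


def ficp(lower, upper, actual):
    """Forecast interval coverage probability (%): share of observations inside [lower, upper]."""
    hits = sum(1 for lo, hi, y in zip(lower, upper, actual) if lo <= y <= hi)
    return 100.0 * hits / len(actual)


def finaw(lower, upper, actual):
    """ASSUMED definition: mean interval width divided by the range of the observations."""
    span = max(actual) - min(actual)
    return sum(hi - lo for lo, hi in zip(lower, upper)) / (len(actual) * span)
```

For identical samples the sketch returns H = 0 and p = 1, mirroring the p-values of 1.0000 in Table 9, whereas two fully separated samples of five values each give p of about 0.009 and would be judged inconsistent at the 0.05 level.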

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zeng, M.; Jia, Q.; Wen, Z.; Mao, F.; Huang, H.; Pan, J. Research and Application of a Model Selection Forecasting System for Wind Speed and Theoretical Power Generation. Future Internet 2026, 18, 7. https://doi.org/10.3390/fi18010007


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
