1. Introduction
Since the turn of the 21st century, the global energy transition has accelerated, with wind power seeing extensive adoption due to its environmentally friendly, low-carbon attributes [1]. However, wind power generation is influenced by multiple factors, such as wind speed and temperature, and exhibits significant nonlinearity, non-stationarity, and randomness. The pronounced nonlinearity implies that the relationship between meteorological inputs and power output cannot be adequately captured by traditional linear frameworks, as minor perturbations in boundary conditions can induce disproportionately large fluctuations in generation [2]. Non-stationarity stems from the time-varying statistical properties of meteorological signals, such as seasonal cycles, diurnal fluctuations, and abrupt climate anomalies, which invalidate the stationarity assumptions of traditional time series methods such as ARIMA [3]. Randomness, meanwhile, originates from chaotic micro-scale meteorological phenomena and measurement uncertainties, introducing irreducible noise that complicates pattern extraction [4]. Together, these characteristics pose substantial challenges for achieving high forecast accuracy. Forecast precision directly affects grid dispatch, energy storage optimization, and market stability, and plays a crucial role in enhancing wind farm operational efficiency and reducing energy costs [5,6,7]. Therefore, developing accurate wind power forecasting models is essential for providing a theoretical basis for grid power balancing and renewable energy integration planning, thereby promoting a green, low-carbon energy transition [8,9].
Current methods for wind power forecasting fall into three main categories: physical methods [10], statistical methods [11], and artificial intelligence methods [12]. Physical approaches rely on numerical weather prediction (NWP) models to simulate meteorological parameters, such as wind speed and direction, using atmospheric dynamic equations, and then estimate the wind power output [13]. For instance, Al-Yahyai et al. [14] reviewed the application of NWP models in wind energy assessment, highlighting their ability to overcome the low resolution and site limitations of traditional meteorological station data, which makes them suitable for mid-to-long-term forecasting. However, the high computational complexity of NWP models and their limited capacity to resolve micro-scale phenomena fundamentally constrain their ability to capture transient fluctuations and nonlinear responses in wind power generation, resulting in suboptimal accuracy and practicality for high-value short-term forecasting scenarios [15]. Statistical methods, on the other hand, construct linear forecasting models, such as autoregressive moving average (ARMA) and seasonal ARIMA (SARIMA) models, from the time series characteristics of historical data. These models are relatively simple, computationally efficient, and perform well on stationary time series [16]. For instance, Wang et al. [17] applied an ARMA model to short-term wind power prediction and effectively reduced forecasting errors; however, the method's reliance on the assumption of data stationarity makes it less capable of handling nonlinear phenomena such as abrupt wind speed changes [18]. Traditional statistical models adapt poorly to evolving data distributions under the nonlinear effects caused by abrupt wind variations, extreme weather, or complex topography, which substantially increases prediction errors and diminishes robustness.
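To make the stationarity-bound linear baseline concrete, the sketch below fits a minimal autoregressive AR(p) model by ordinary least squares and produces a one-step forecast. This is a generic illustration of the model family discussed above, not a method from the paper; the data and order are purely illustrative.

```python
import numpy as np

def fit_ar(series, p):
    """Fit an AR(p) model y_t = c + a_1*y_{t-1} + ... + a_p*y_{t-p}
    by ordinary least squares on the lagged design matrix."""
    y = np.asarray(series, dtype=float)
    # Column k holds lag k+1 of the target samples y[p:].
    X = np.column_stack([y[p - k - 1 : len(y) - k - 1] for k in range(p)])
    X = np.column_stack([np.ones(len(y) - p), X])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, a_1, ..., a_p]

def forecast_ar(series, coef):
    """One-step-ahead forecast from the last p observations."""
    p = len(coef) - 1
    lags = np.asarray(series, dtype=float)[-p:][::-1]  # y_t, y_{t-1}, ...
    return coef[0] + coef[1:] @ lags
```

Such a model captures a fixed linear dependence on recent history; when the statistical properties of the wind series drift, the fitted coefficients no longer describe the data, which is exactly the stationarity limitation noted above.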
In recent years, artificial intelligence techniques have significantly improved the modeling of nonlinear relationships through machine learning and deep learning, surpassing the limitations inherent in physical and statistical approaches. For example, Guo et al. [19] employed a BP neural network to integrate multi-source meteorological data, achieving a 24 h wind power prediction accuracy of 85.6%. With the advancement of deep learning, long short-term memory (LSTM) networks and their variants have become prevalent. Shahid et al. [20] introduced a GA-optimized LSTM (GLSTM) network that adaptively adjusts hyperparameters, reducing prediction errors by 6–30% compared to conventional LSTM models. Convolutional neural networks (CNNs) have also shown remarkable performance in spatial feature extraction; Zhu et al. [21] demonstrated the feasibility of CNNs for wind power regression. However, single models exhibit inherent limitations in complex scenarios: LSTM struggles to capture spatial features, while CNNs fail to efficiently model long-term temporal dependencies. To address this, research has progressively shifted toward multi-model fusion strategies. For example, Chen et al. [22] combined a CNN with bidirectional LSTM (BiLSTM) and employed a feature weighting strategy, markedly enhancing time series data utilization and forecasting accuracy. Further, Zhang et al. [23] proposed a multi-task learning model integrating a Transformer and LSTM, leveraging dilated causal convolutional networks to extract multi-dimensional features for both deterministic and probabilistic wind power forecasting, reducing the Mean Absolute Error by up to 9.190 compared to 23 benchmark models. Zhao et al. [24] developed a hybrid VMD-CNN-GRU framework in which variational mode decomposition preprocesses wind speed sequences to mitigate volatility and a CNN collaborates with a GRU to extract spatial–temporal features, achieving an RMSE of 1.5651 and an R2 of 0.9964 in short-term prediction. These studies show that multi-model architectures overcome single-model constraints and enhance prediction performance in complex energy systems through structural integration, task sharing, or data decomposition.
The performance of these models largely depends on the selection of hyperparameters, making hyperparameter optimization a critical step in enhancing prediction accuracy [25]. Whether a model realizes its full potential is critically constrained by how well its hyperparameters are configured: inappropriate hyperparameters can directly cause underfitting or overfitting, even when the model itself is highly capable. Traditional methods such as grid search and random search, although simple to implement, suffer significant efficiency losses in high-dimensional parameter spaces and with complex models [26]. Deep neural networks such as LSTMs and CNNs often contain dozens of hyperparameters, and the cost of exhaustive search grows exponentially with dimensionality [27], so traditional methods can usually perform only a coarse exploration within a limited range and rarely find a global or near-global optimum. Consequently, intelligent optimization algorithms have become the mainstream approach, balancing global search and local exploitation by simulating natural evolution or swarm intelligence. Shahid et al. [20] used a genetic algorithm (GA) in their GLSTM model to optimize both the time window length and the number of hidden neurons, achieving a 6–30% reduction in forecasting errors across multiple scenarios in seven major European wind farms. Geng et al. [28] developed a hybrid framework combining Particle Swarm Optimization (PSO) with deep reinforcement learning (DRL), reducing the system's Mean Squared Error (MSE) by 23.8%. Gao et al. [29] used the sparrow search algorithm (SSA) to adaptively optimize variational mode decomposition (VMD), significantly enhancing LSTM's short-term prediction accuracy; Lu et al. [30] employed gray wolf optimization (GWO) to simultaneously optimize the kernel parameters of a multi-output support vector machine (MSVM), outperforming single-model forecasting frameworks; Li et al. [31] compared an improved dragonfly algorithm with traditional strategies, demonstrating that the former searches the SVM hyperparameter space more efficiently, reducing forecasting errors on French wind farm data by 23.6% compared to BP neural networks; similarly, for regional load forecasting, Barman et al. [32] showed that an SVM optimized by the grasshopper optimization algorithm (GOA), combined with an analysis of similar climate days, improved prediction accuracy by 12.6% and 8.3% relative to GA-SVM and PSO-SVM, respectively. Although intelligent optimization algorithms improve the efficiency and effectiveness of the hyperparameter search, they still have notable defects. Regarding local optimality and convergence, the GA tends to become trapped in local optima on multimodal functions [33], while PSO suffers premature convergence in high-dimensional spaces due to swarm collaboration constraints [34]. In terms of computational efficiency, Ant Colony Optimization (ACO) is inefficient on large-scale problems owing to its high-complexity pheromone update and path-searching mechanisms [35], and Simulated Annealing (SA) incurs exponentially increasing computational cost in high-dimensional scenarios due to the explosive expansion of the solution space [36]. These limitations indicate that intelligent optimization algorithms still require continuous improvement, both theoretically and practically, to adapt to complex and changeable application scenarios.
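To ground the swarm-intelligence discussion, the sketch below is a minimal textbook PSO minimizing a toy objective. It is not the OPESC or any cited algorithm; the inertia weight `w` and acceleration coefficients `c1`, `c2` are conventional illustrative settings. The velocity update shows the global-search/local-exploitation balance: each particle is pulled toward its personal best and the swarm's global best, which is also why the swarm can converge prematurely if diversity collapses.

```python
import numpy as np

def pso(objective, bounds, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal Particle Swarm Optimization over a box-constrained domain."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    x = rng.uniform(lb, ub, (n_particles, lb.size))   # positions
    v = np.zeros_like(x)                              # velocities
    pbest = x.copy()                                  # per-particle bests
    pbest_f = np.apply_along_axis(objective, 1, x)
    g = pbest[np.argmin(pbest_f)].copy()              # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, lb.size))
        # Inertia + cognitive pull (pbest) + social pull (g).
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lb, ub)
        f = np.apply_along_axis(objective, 1, x)
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, float(pbest_f.min())

# Toy usage: minimize the sphere function over [-5, 5]^3.
best_x, best_f = pso(lambda z: float(np.sum(z * z)),
                     ([-5, -5, -5], [5, 5, 5]))
```

In hyperparameter tuning, each particle position would encode a candidate configuration (e.g., learning rate, hidden units) and the objective would be a validation error rather than this analytic function.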
In summary, we propose an innovative wind power forecasting model that integrates an optimized escape algorithm (OPESC) with a convolutional neural network (CNN), a bidirectional long short-term memory (BiLSTM) network, and a self-attention mechanism. The CNN extracts local spatial features, the BiLSTM captures bidirectional dependencies in the time series, and the self-attention mechanism focuses on critical global information at key moments. The OPESC dynamically optimizes the model's hyperparameters, effectively mitigating issues such as local optima and overfitting that are common in single-model approaches. The primary contributions of this work are as follows:
- (1) A multi-strategy integrated optimized escape algorithm is introduced for hyperparameter tuning, thereby enhancing training efficiency.
- (2) A CNN-BiLSTM-SA-based wind power forecasting model is developed that integrates local feature extraction with global dependency capture, leading to significant improvements in prediction accuracy.
- (3) Comprehensive evaluations comparing the proposed model with ten other models confirm its superior predictive accuracy and generalization capability.
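The self-attention component named in contribution (2) can be illustrated with a plain scaled dot-product sketch over time steps. This is the generic mechanism, not the paper's exact layer: the projection matrices here are random stand-ins for learned weights, and the dimensions are illustrative.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence.
    X: (T, d) time-step features; Wq/Wk/Wv: (d, d_k) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])       # (T, T) pairwise relevance
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # softmax: each row sums to 1
    return A @ V, A                              # weighted values, weights

# Toy usage with random features and (untrained) random projections.
rng = np.random.default_rng(0)
T, d, dk = 8, 4, 4
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, dk)) for _ in range(3))
out, attn = self_attention(X, Wq, Wk, Wv)
```

Each row of the attention matrix weights every time step against every other, which is how the mechanism lets the model emphasize critical moments regardless of their distance in the sequence.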
6. Discussion and Conclusions
In the context of accelerating global energy transitions and the critical need for reliable renewable energy management, this study addresses the challenges posed by the nonlinear, non-stationary characteristics of wind dynamics, which complicate accurate wind power forecasting. A novel integrated framework, OPESC-CNN-BiLSTM-SA, is proposed, combining an optimized escape algorithm (OPESC) with a hybrid deep learning architecture to enhance forecasting precision and facilitate efficient grid integration. This research leverages optimization strategies and hierarchical neural network designs to overcome limitations in hyperparameter tuning, spatiotemporal feature extraction, and adaptive pattern modeling.
The principal conclusions are as follows:
The OPESC, using logarithmic spiral opposition-based initialization and adaptive particle swarm strategies, outperforms classical methods (DE, PSO, etc.) in benchmark tests. Across 10 CEC2017 functions, it consistently achieves lower optimal values, escaping local optima to improve hyperparameter tuning for complex models.
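The logarithmic-spiral variant used by the OPESC is not detailed in this section, but the underlying opposition-based learning idea can be sketched plainly: sample an initial population, mirror each candidate through the center of the search box, and keep the better half of the combined pool. The function name and parameters below are illustrative.

```python
import numpy as np

def opposition_init(objective, lb, ub, n, rng):
    """Plain opposition-based initialization (without the paper's
    logarithmic-spiral refinement): sample n points, form their
    opposites x_opp = lb + ub - x, keep the n best of the 2n pool."""
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    X = rng.uniform(lb, ub, (n, lb.size))
    X_opp = lb + ub - X                       # mirror through the box center
    pool = np.vstack([X, X_opp])
    f = np.apply_along_axis(objective, 1, pool)
    return pool[np.argsort(f)[:n]]            # best candidates first

# Toy usage on the sphere function over [-5, 5]^3.
rng = np.random.default_rng(1)
sphere = lambda z: float(np.sum(z * z))
pop = opposition_init(sphere, [-5.0] * 3, [5.0] * 3, 10, rng)
```

Starting from the better half of candidate/opposite pairs gives the optimizer a head start over purely random initialization, which is the motivation for opposition-based schemes.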
The OPESC-optimized CNN-BiLSTM-SA reduces the RMSE by 30.07%, the MAE by 34.51%, and the MAPE by 7.81% on real wind data. With an R2 of 97.06%, it robustly models intermittent wind dynamics, highlighting the role of intelligent optimization in deep learning.
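For reference, the error metrics reported above are computed as follows; the arrays in the usage line are toy values, not results from the study, and the MAPE form shown assumes no zero targets.

```python
import numpy as np

def metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (%), and R2 for scoring point forecasts."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    mape = float(np.mean(np.abs(err / y_true)) * 100)  # assumes y_true != 0
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot                         # coefficient of determination
    return rmse, mae, mape, r2

# Toy usage with illustrative values.
rmse, mae, mape, r2 = metrics([2, 4, 6, 8], [2, 5, 6, 7])
```

RMSE penalizes large deviations quadratically, MAE weights all errors equally, MAPE expresses error relative to the true magnitude, and R2 measures the fraction of variance explained, which is why the four are reported together.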
The hybrid model combines a CNN for the spatial feature extraction, BiLSTM for bidirectional temporal dependencies, and self-attention for dynamic weighting. This design captures meteorological correlations, historical/future trends, and critical time steps, outperforming standalone models in handling non-stationary data.
Validated in wind farms, the framework cuts key errors by 30–35%, enabling a reliable grid dispatch, cost-effective storage, and reduced supply–demand risks. It supports low-carbon transitions by improving renewable predictability, a vital factor for global energy integration.
The OPESC’s adaptability and the CNN-BiLSTM-SA model’s robust feature processing make the framework applicable to broader energy forecasting scenarios, including solar and hybrid energy systems, where similar challenges of non-stationarity and intermittency exist. This study not only advances the state of wind power forecasting but also provides a novel research paradigm for integrating intelligent optimization with deep learning in energy systems.
Despite its promising results, this study has several limitations warranting consideration. Firstly, the model’s validation relies primarily on data from a single wind farm, raising questions about its generalizability to geographically diverse sites with distinct meteorological profiles. Secondly, while effective, the OPESC exhibits significant computational complexity, particularly in extremely high-dimensional optimization spaces, posing challenges for large-scale applications. Finally, the deployment of real-time forecasting necessitates addressing this computational overhead and exploring its integration with edge computing platforms to enhance the operational feasibility. To address these gaps, future research will focus on three key directions: (1) extending the framework to multi-step-ahead forecasting and multi-wind farm collaborative modeling, leveraging graph neural networks to capture spatial dependencies across geographically distributed energy systems; (2) optimizing the OPESC for high-dimensional scenarios through dimensionality reduction techniques and edge computing integration, enabling real-time hyperparameter tuning with a reduced computational overhead; and (3) incorporating multi-source data fusion mechanisms and lightweight model architectures to improve the scalability and adaptability for large-scale, real-time energy management systems.