A MASPSO-Optimized CNN–GRU–Attention Hybrid Model for Short-Term Wind Speed Forecasting
Abstract
1. Introduction
- Most existing CNN–GRU–Attention models rely on a serial structure that separates spatial and temporal feature learning, limiting the effective characterization of dynamic spatiotemporal interactions and resulting in degraded performance under highly fluctuating wind speed conditions.
- Existing PSO variants are sensitive to initialization and lack adaptive search and diversity-preserving mechanisms, making them prone to premature convergence and unstable performance in high-dimensional hyperparameter optimization.
- Attention mechanisms that primarily emphasize temporal weighting often insufficiently capture spatial feature heterogeneity, which reduces robustness and leads to degraded accuracy under stochastic, fluctuating, and extreme wind conditions.
- A unified spatiotemporal learning architecture integrating CNN, GRU, and attention mechanisms is constructed to enhance the modeling of non-stationary and highly volatile wind speed sequences.
- A MASPSO algorithm is designed to make hyperparameter optimization more efficient, expand the effective search space, and reduce sensitivity to initialization.
- A MASPSO–CNN–GRU–Attention hybrid forecasting model is developed to jointly optimize spatiotemporal feature learning and adaptive parameter tuning.
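- Experiments on real wind farm data indicate better accuracy, robustness, and generalization than the CNN–GRU and CNN–GRU–Attention baselines in both low-fluctuation (<1 m/s) and high-fluctuation (>3 m/s) periods, with MAE values of 0.15 and 0.22 for the proposed model versus 0.43 and 0.55 for CNN–GRU.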
2. CNN–GRU–Attention Prediction Model
2.1. Model Prediction Workflow
- (1) Spatial feature extraction with CNN. Spatial correlations among multi-source inputs are extracted using stacked convolutional layers, followed by average pooling to reduce dimensionality and mitigate overfitting.
- (2) Temporal dynamic modeling with GRU. The extracted spatial features are fed into a GRU network to capture temporal dependencies across successive time steps through gated information flow.
- (3) Adaptive feature integration via attention. An attention mechanism is applied to assign adaptive weights to GRU hidden states, emphasizing informative temporal features.
- (4) Prediction generation. The attention-weighted features are passed through a fully connected layer to generate the final wind speed prediction.
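The four steps above can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' implementation: the kernel width, channel count, hidden size, bias-free GRU, dot-product scoring vector `v`, and random untrained weights are all assumptions chosen only to make the data flow (convolution, average pooling, GRU, attention, fully connected layer) concrete.

```python
import math
import random

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1d_relu(seq, kernels, width):
    """Valid 1-D convolution over time; each kernel sees `width` flattened steps."""
    out = []
    for t in range(len(seq) - width + 1):
        window = [x for step in seq[t:t + width] for x in step]
        out.append([max(0.0, a) for a in matvec(kernels, window)])
    return out

def avg_pool(seq, size=2):
    """Non-overlapping average pooling along the time axis."""
    return [[sum(col) / size for col in zip(*seq[t:t + size])]
            for t in range(0, len(seq) - size + 1, size)]

def gru_states(seq, wz, wr, wh, hidden):
    """Run a bias-free GRU and return the hidden state at every step."""
    h = [0.0] * hidden
    states = []
    for x in seq:
        z = [sigmoid(a) for a in matvec(wz, x + h)]   # update gate
        r = [sigmoid(a) for a in matvec(wr, x + h)]   # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a) for a in matvec(wh, x + gated)]
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
        states.append(h)
    return states

def attention_pool(states, v):
    """Softmax weights over hidden states, then a weighted sum."""
    scores = [sum(vi * hi for vi, hi in zip(v, h)) for h in states]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * h[j] for w, h in zip(weights, states)) for j in range(len(v))]
    return context, weights

def init_params(n_features, width=3, channels=4, hidden=5, seed=0):
    """Hypothetical sizes and random weights, for illustration only."""
    rng = random.Random(seed)
    return {
        "width": width, "hidden": hidden,
        "conv": rand_matrix(channels, width * n_features, rng),
        "wz": rand_matrix(hidden, channels + hidden, rng),
        "wr": rand_matrix(hidden, channels + hidden, rng),
        "wh": rand_matrix(hidden, channels + hidden, rng),
        "v": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
        "out": [rng.uniform(-0.1, 0.1) for _ in range(hidden)],
        "bias": 0.0,
    }

def forecast(window, params):
    """CNN -> average pooling -> GRU -> attention -> fully connected output."""
    feats = avg_pool(conv1d_relu(window, params["conv"], params["width"]))
    states = gru_states(feats, params["wz"], params["wr"], params["wh"], params["hidden"])
    context, weights = attention_pool(states, params["v"])
    y = sum(w * c for w, c in zip(params["out"], context)) + params["bias"]
    return y, weights

# Example: 12 time steps with 3 input variables each (synthetic values).
window = [[math.sin(0.3 * t), 0.5, 0.1 * t] for t in range(12)]
prediction, attn = forecast(window, init_params(n_features=3))
```

With a 12-step window, the width-3 convolution leaves 10 steps and the pooling leaves 5, so `attn` holds five attention weights that sum to one. In practice the weights would be learned, for example with a deep learning framework, rather than drawn at random.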
2.2. Model Architecture and Component Design
2.2.1. Convolutional Neural Network
2.2.2. Gated Recurrent Unit Network
2.2.3. Attention Mechanism
2.3. Comparative Architecture Analysis
3. MASPSO-Based Hybrid Wind Speed Prediction
3.1. MASPSO Algorithm
3.2. MASPSO–CNN–GRU–Attention Forecasting Workflow
- (1) Data preprocessing. Multi-source wind farm data are cleaned, normalized, and screened for relevant features. The samples are then partitioned into training, validation, and testing sets at a ratio of 7:2:1.
- (2) Model construction. The CNN module extracts spatial representations of the input variables, the GRU captures temporal dependencies, and the attention mechanism adaptively reweights salient features. A fully connected layer maps the fused spatiotemporal features to the prediction output.
- (3) Hyperparameter optimization. The validation-set root mean square error (RMSE) is used as the fitness function for the two-stage MASPSO search. The optimization variables and their corresponding search ranges include the number and size of convolutional kernels in the CNN submodule, the number of hidden units in the GRU submodule, and the weighting coefficients in the attention submodule.
- (4) Model training and validation. The model is trained using the Adam optimizer, while MASPSO determines the optimal hyperparameters. The validation and testing sets are used to assess forecasting accuracy and generalization.
- (5) Prediction generation. New input data are passed sequentially through the CNN, GRU, and attention modules to produce the final wind speed forecasts, supporting operational decision-making for wind power scheduling.
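Steps (1) and (3) can be sketched as follows. The 7:2:1 ratio and the use of validation RMSE as fitness come from the workflow above. The chronological (unshuffled) split, the min-max normalization fitted on the training set only, and the function names are assumptions, since the text does not specify them.

```python
import math

def chronological_split(samples):
    """Split in time order at 7:2:1 without shuffling (assumed)."""
    n = len(samples)
    n_train = n * 7 // 10
    n_val = n * 2 // 10
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

def fit_minmax(train_values):
    """Return a scaler fitted on the training values only (assumed scheme)."""
    lo, hi = min(train_values), max(train_values)
    span = (hi - lo) or 1.0  # guard against a constant series
    return lambda x: (x - lo) / span

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def validation_fitness(predict, val_inputs, val_targets):
    """Fitness for the hyperparameter search: RMSE on the validation set."""
    return rmse(val_targets, [predict(x) for x in val_inputs])

# Example with a synthetic series and a persistence "model" as a stand-in.
series = [5.0 + math.sin(0.1 * t) for t in range(100)]
train, val, test = chronological_split(series)
scale = fit_minmax(train)
val_scaled = [scale(v) for v in val]
fitness = validation_fitness(lambda x: x, val_scaled[:-1], val_scaled[1:])
```

Fitting the scaler on the training portion alone keeps validation and test statistics out of the preprocessing step. A lower `fitness` value marks a better hyperparameter candidate.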
4. Case Study
4.1. Experimental Environment and Parameter Settings
4.1.1. Hardware and Software Environment
4.1.2. Data and Feature Parameters
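The feature table in this paper reports a Pearson correlation coefficient (PCC) for each lagged input. A minimal sketch of how such a lagged PCC could be computed is given below. The helper names are hypothetical, the series is synthetic, and the IV score and the selection rule used by the authors are not reproduced because the text does not define them.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_pcc(feature, target, lag):
    """PCC between feature[t - lag] and target[t], for lag >= 1."""
    return pearson(feature[:-lag], target[lag:])

# Synthetic hourly series: correlation with the target decays as the lag grows.
wind = [math.sin(0.2 * t) for t in range(200)]
pcc_by_lag = {lag: lagged_pcc(wind, wind, lag) for lag in range(1, 7)}
```

On this synthetic series the PCC falls as the lag increases, which is the same qualitative pattern the table shows for wind speed at t − 1 h through t − 6 h.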
4.1.3. Model Architecture and Training Parameters
4.2. Performance Evaluation Metrics
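The point and interval metrics listed in the metrics table (MAE, RMSE, the coefficient of determination, PICP, and PINAW) can be computed as in the sketch below. It uses the standard textbook definitions, which are assumed to match the ones in the paper. The confidence interval of MAE over repeated runs is left out because its estimation procedure is not described.

```python
import math

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def r2(y, y_hat):
    """Coefficient of determination; 1 is ideal, values can be negative."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def picp(y, lower, upper):
    """Share of observations that fall inside the prediction interval."""
    hits = sum(1 for a, lo, hi in zip(y, lower, upper) if lo <= a <= hi)
    return hits / len(y)

def pinaw(y, lower, upper):
    """Mean interval width divided by the range of the observations."""
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(y)
    return mean_width / (max(y) - min(y))

# Small worked example with made-up numbers.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]
low = [0.0, 1.0, 2.0, 3.0]
high = [2.0, 3.0, 4.0, 3.5]
scores = (mae(obs, pred), rmse(obs, pred), r2(obs, pred),
          picp(obs, low, high), pinaw(obs, low, high))
```

For these numbers the MAE is 0.25, the RMSE is 0.5, the coefficient of determination is 0.8, and the PICP is 0.75 because the last observation lies above its interval.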
4.3. Model Performance Analysis
4.3.1. Wind Speed Forecasting Results
4.3.2. Prediction Performance Analysis
4.3.3. Multi-Step Forecasting Performance
4.3.4. Computational Time Performance
4.4. MASPSO Parameter Optimization Analysis
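To make the search problem concrete, the sketch below runs a plain global-best PSO over the discrete value ranges given in the hyperparameter table. It is standard PSO, not MASPSO: the two-stage and adaptive mechanisms of the proposed algorithm are not reproduced. The inertia and acceleration settings, the index-rounding scheme, the dictionary keys, and `toy_fitness` (a stand-in for the validation RMSE of a trained model) are all assumptions.

```python
import random

# Value ranges copied from the hyperparameter table; key names are hypothetical.
SEARCH_SPACE = {
    "kernel_size": [3, 5, 7],
    "input_feature_complexity": [32, 64, 128],
    "gru_hidden_units": [32, 64, 128, 256],
    "attention_weight_coefficient": [0.1, 0.5, 1.0],
}

def decode(position):
    """Map a continuous position to one listed value per hyperparameter."""
    config = {}
    for x, (name, values) in zip(position, SEARCH_SPACE.items()):
        idx = min(len(values) - 1, max(0, round(x)))
        config[name] = values[idx]
    return config

def pso(fitness, n_particles=8, n_iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain global-best PSO that minimizes `fitness` over SEARCH_SPACE."""
    rng = random.Random(seed)
    upper = [len(v) - 1 for v in SEARCH_SPACE.values()]
    pos = [[rng.uniform(0, u) for u in upper] for _ in range(n_particles)]
    vel = [[0.0] * len(upper) for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d, u in enumerate(upper):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the position to the valid index range.
                pos[i][d] = min(float(u), max(0.0, pos[i][d] + vel[i][d]))
            fit = fitness(decode(pos[i]))
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return decode(gbest), gbest_fit

def toy_fitness(config):
    """Placeholder for validation RMSE; NOT a real training run."""
    return (abs(config["kernel_size"] - 5) / 2
            + abs(config["gru_hidden_units"] - 128) / 128
            + abs(config["attention_weight_coefficient"] - 0.5))

best_config, best_score = pso(toy_fitness)
```

In a real run, `toy_fitness` would be replaced by a function that trains the CNN–GRU–Attention model with the decoded configuration and returns its validation RMSE, which makes each fitness evaluation the dominant cost of the search.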
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Components | Compared Models | Comparative Advantages |
|---|---|---|
| Spatial feature extraction | CNN vs. GCN | Lower computational cost than GCN |
| Temporal feature modeling | GRU vs. LSTM/Transformer | Fewer parameters than LSTM; better for short-term, small-sample series than Transformer |
| Feature weighting mechanism | Attention vs. SVM/RF | Dynamic temporal weighting vs. static feature weighting in SVM/RF |

| Features | PCC | IV | Selected |
|---|---|---|---|
| Wind speed (t − 1 h) | 0.94 | 0.88 | Yes |
| Wind speed (t − 2 h) | 0.89 | 0.82 | Yes |
| Wind speed (t − 3 h) | 0.83 | 0.76 | Yes |
| Wind speed (t − 4 h) | 0.76 | 0.69 | Yes |
| Wind speed (t − 5 h) | 0.68 | 0.61 | Yes |
| Wind speed (t − 6 h) | 0.60 | 0.54 | Yes |
| Air temperature (t − 1 h) | 0.32 | 0.48 | Yes |
| Air temperature (t − 2 h) | 0.28 | 0.42 | No |
| Air temperature (t − 3 h) | 0.25 | 0.38 | No |
| Atmospheric pressure (t − 1 h) | 0.29 | 0.45 | Yes |
| Atmospheric pressure (t − 2 h) | 0.24 | 0.39 | No |
| Atmospheric pressure (t − 3 h) | 0.21 | 0.35 | No |
| Wind direction (t − 1 h) | 0.18 | 0.30 | No |
| Wind direction (t − 2 h) | 0.15 | 0.26 | No |
| Wind direction variation rate (t − 1 h) | 0.25 | 0.68 | Yes |
| Wind direction variation rate (t − 2 h) | 0.17 | 0.62 | No |
| Temperature gradient (t − 1 h) | 0.32 | 0.52 | Yes |
| Temperature gradient (t − 2 h) | 0.09 | 0.46 | No |

| Parameters | Value Range |
|---|---|
| Convolution kernel size | [3, 5, 7] |
| Input feature complexity | [32, 64, 128] |
| GRU hidden unit size | [32, 64, 128, 256] |
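| Attention weight coefficient | [0.1, 0.5, 1.0] |

| Metrics | Formula | Description | Ideal Value |
|---|---|---|---|
| MAE | $\frac{1}{n}\sum_{i=1}^{n}\lvert y_i-\hat{y}_i\rvert$ | Mean absolute error between predictions and observations | 0 |
| RMSE | $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}$ | Dispersion of prediction errors, sensitive to large deviations | 0 |
| $R^2$ | $1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$ | Degree of variance explained by the model, in $(-\infty, 1]$ | 1 |
| 95% CI | | 95% confidence interval of MAE estimated from repeated runs | - |
| PICP | - | Coverage probability of the prediction interval | 0.95 |
| PINAW | - | Normalized average width of the prediction interval | →0 |

The formulas follow the standard definitions, where $y_i$ is the observed wind speed, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observations, and $n$ the number of samples.

| Forecasting Horizon | Wind-Speed Fluctuation Amplitude | CNN–GRU: MAE (95% CI) | CNN–GRU–Attention: MAE (95% CI) | MASPSO–CNN–GRU–Attention: MAE (95% CI) |
|---|---|---|---|---|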
| 08:00–10:00 | <1 m/s | 0.43 ([0.40, 0.46]) | 0.31 ([0.29, 0.33]) | 0.15 ([0.13, 0.17]) |
| 18:00–20:00 | >3 m/s | 0.55 ([0.52, 0.58]) | 0.41 ([0.38, 0.44]) | 0.22 ([0.20, 0.24]) |

| Module Combinations | Accuracy Improvement | Individual Component Contribution |
|---|---|---|
| GRU | - | - |
| CNN–GRU | 22.4% | CNN: 22.4% |
| CNN–GRU–Attention | 40% | Attention: 17.6% |
| CNN–GRU–MASPSO | 44.8% | MASPSO: 22.4% |
| MASPSO–CNN–GRU–Attention | 69% | Synergistic contribution: 24.2% |

| Models | Training Time (min) | Inference Time Per Sample (ms) | Batch Inference Time for 1000 Samples (ms) |
|---|---|---|---|
| SVR | 12.8 | 8.6 | 125.3 |
| LSTM | 28.5 | 3.2 | 45.7 |
| CNN–GRU | 35.2 | 3.0 | 42.1 |
| CNN–GRU–Attention | 42.3 | 2.8 | 38.9 |
| MASPSO–CNN–GRU–Attention | 56.7 | 2.5 | 32.1 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Du, H.; Sun, Y. A MASPSO-Optimized CNN–GRU–Attention Hybrid Model for Short-Term Wind Speed Forecasting. Sustainability 2026, 18, 583. https://doi.org/10.3390/su18020583

