Wind-Mambaformer: Ultra-Short-Term Wind Turbine Power Forecasting Based on Advanced Transformer and Mamba Models
Abstract
1. Introduction
- (1) A novel network architecture termed Wind-Mambaformer is introduced in this paper; it offers superior time-series processing capability compared with current models, thereby advancing wind power prediction.
- (2) A structure embedding Flow-Attention with Mamba is proposed to address the high computational complexity, weak time-series modeling, and poor model adaptability encountered in the task of wind power prediction.
- (3) A multi-head Flow-Attention mechanism is introduced into the proposed Wind-Mambaformer model; it extracts the correlations among the multiple variables related to wind power, effectively improving the feature extraction capacity.
2. Related Work
3. Methodology
3.1. Problem Formulation
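This subsection formalizes ultra-short-term forecasting as a sliding-window regression problem. The sketch below is a minimal illustration only: the 12-feature layout of the dataset (Section 4.1) with power as the last column, the 16-step lookback mirroring the "Timesteps" settings of Section 4, and the one-step horizon are all assumptions, not the authors' exact formulation.

```python
import numpy as np

def make_windows(series: np.ndarray, lookback: int, horizon: int = 1):
    """Slice a multivariate series of shape (T, F) into supervised pairs:
    X[i] holds `lookback` past steps of all features, and y[i] is the
    target power (assumed to be the last column) `horizon` steps ahead."""
    X, y = [], []
    for i in range(series.shape[0] - lookback - horizon + 1):
        X.append(series[i : i + lookback])                 # (lookback, F)
        y.append(series[i + lookback + horizon - 1, -1])   # scalar target
    return np.stack(X), np.asarray(y)

# Example with the Section 4.1 shapes: 70,176 records, 12 features.
data = np.random.default_rng(0).random((70_176, 12))
X, y = make_windows(data, lookback=16)
print(X.shape, y.shape)   # (70160, 16, 12) (70160,)
```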
3.2. Flow-Attention
During the processing of input sequences, the Transformer can focus on different positions of the sequence thanks to its attention mechanism, which serves as its core component. This capability gives the Transformer a significant advantage over traditional RNN and CNN models, whose computational complexity is typically $O(N)$, where $N$ denotes the length of the sequence. For the Transformer, although the attention mechanism greatly improves performance on long sequences, the model still suffers from a quadratic computational complexity problem, which manifests itself in the following three points (a minimal complexity sketch follows the list):
- (1) The self-attention mechanism, computed as in Equation (1), adjusts the representation of each element by calculating its similarity to all other elements in the input sequence. The similarity between the queries ($Q$) and the keys ($K$) is first measured by their dot product; the result is then scaled and normalized with softmax to obtain the weighted importance of each element with respect to the others; finally, these weights are applied to the values ($V$) in a weighted summation that generates a new representation for each element:
  $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V \quad (1)$$
  where $d_k$ is the dimension of the keys. Equation (1) shows that, in the Transformer's self-attention layer, the output at every position depends on the entire set of positions in the input sequence. This dependency gives rise to a computational complexity of $O(N^2)$ in the sequence length $N$: as the sequence grows, the cost of the self-attention layer grows quadratically.
- (2) High memory and computational demands for long sequences: the quadratic complexity rooted in the self-attention mechanism means that processing long sequences requires extensive memory and computational resources. When the sequence is very long, the model often exceeds the available memory or computational budget, making training or inference difficult or even infeasible.
- (3) The accuracy of long-range prediction needs further improvement: when the sequence is very long, each element must take into account the information of many other elements associated with it, so important information may be diluted in the attention distribution, especially when the attention mechanism has to select information from a large number of inputs. This dilution makes it challenging for the model to capture crucial long-distance dependencies, thereby reducing prediction accuracy. Moreover, as the sequence length increases, more elements compete for the limited “attention” budget when the weights are computed with the softmax function, which tends to spread most of the weights more evenly. Such an averaged attention distribution makes it difficult for the model to highlight the truly important parts of the sequence, further weakening its ability to capture long-distance dependencies.
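To make the complexity contrast concrete, the following minimal NumPy sketch (an illustration, not the paper's implementation) compares Equation (1) with a generic kernelized linear attention of the kind that Flow-Attention builds on; the feature map `phi` and all shapes are assumptions chosen for the example. Flow-Attention itself additionally introduces competition and allocation derived from flow conservation (Wu et al., 2022) on top of such a linearization.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def quadratic_attention(Q, K, V):
    """Standard self-attention, Equation (1): materializing the N x N
    score matrix makes time and memory grow as O(N^2) with length N."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # (N, N): the quadratic bottleneck
    return softmax(scores) @ V             # (N, d_v)

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized linear attention: with a non-negative feature map phi,
    the result is computed as phi(Q) [phi(K)^T V], so the small
    (d_k, d_v) product is formed first and the total cost is O(N)."""
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V                          # (d_k, d_v), independent of N
    normalizer = Qp @ Kp.sum(axis=0)       # (N,) row normalization
    return (Qp @ kv) / normalizer[:, None]

# Toy check: N = 1024 timesteps, d = 64 channels.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 1024, 64))
print(quadratic_attention(Q, K, V).shape)  # (1024, 64)
print(linear_attention(Q, K, V).shape)     # (1024, 64)
```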
3.3. Mamba Module
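As a complement to this subsection, the sketch below illustrates the selective state-space recurrence that Mamba (Gu and Dao, 2023) is built on: the step size and the input/output projections depend on the current input, and the scan costs $O(N)$ in the sequence length. The simplified discretization, shapes, and initializations are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def selective_ssm(x, A, W_B, W_C, W_delta):
    """Sketch of a selective state-space scan: the step size (delta) and
    the input/output projections (B, C) are computed from the current
    input, letting the recurrence decide per step what to keep in its
    state. The scan visits each timestep once, so the cost is O(N)."""
    N, d = x.shape                    # sequence length, channels
    n = A.shape[0]                    # state size; A is a negative diagonal
    h = np.zeros((d, n))              # hidden state, one row per channel
    y = np.empty_like(x)
    for t in range(N):
        delta = np.log1p(np.exp(x[t] @ W_delta))     # softplus keeps delta > 0
        B, C = x[t] @ W_B, x[t] @ W_C                # input-dependent projections
        A_bar = np.exp(delta[:, None] * A[None, :])  # zero-order-hold discretization
        B_bar = delta[:, None] * B[None, :]          # simple Euler approximation
        h = A_bar * h + B_bar * x[t][:, None]        # selective state update
        y[t] = h @ C                                 # input-dependent readout
    return y

# Toy run with assumed shapes: 96 steps, 8 channels, state size 16.
rng = np.random.default_rng(0)
N, d, n = 96, 8, 16
x = rng.normal(size=(N, d))
A = -np.exp(rng.normal(size=n))          # stable (decaying) dynamics
W_B, W_C = rng.normal(size=(2, d, n)) * 0.1
W_delta = rng.normal(size=(d, d)) * 0.1
print(selective_ssm(x, A, W_B, W_C, W_delta).shape)  # (96, 8)
```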
3.4. Implementation Details of the Model
4. Experiments and Results
4.1. Dataset
4.2. Data Preprocessing
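As a hedged illustration only (the interpolation and min-max scaling choices below are assumptions, not necessarily the authors' procedure), a typical pipeline for handling the missing values and outliers reported in Section 4.1 might look like:

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, power_col: str = "Power (MW)") -> pd.DataFrame:
    """Hypothetical cleaning pipeline: flag physically impossible power
    readings as missing, fill gaps by linear interpolation, then scale
    every (numeric) column to [0, 1] with min-max normalization."""
    df = df.copy()
    df.loc[df[power_col] < 0, power_col] = np.nan   # treat negative power as an outlier
    df = df.interpolate(method="linear", limit_direction="both")
    return (df - df.min()) / (df.max() - df.min())
```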
4.3. Baselines
4.4. Metrics
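The reported metrics (MAE, MSE, RMSE, MAPE) follow their standard definitions; a compact reference implementation is sketched below, where the `eps` guard in MAPE is an added assumption to avoid division by zero.

```python
import numpy as np

def metrics(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-8) -> dict:
    """Standard regression error metrics used in the experiments."""
    err = y_pred - y_true
    mae = float(np.mean(np.abs(err)))          # mean absolute error
    mse = float(np.mean(err ** 2))             # mean squared error
    rmse = float(np.sqrt(mse))                 # root mean squared error
    mape = float(np.mean(np.abs(err) / (np.abs(y_true) + eps)))  # mean abs. percentage error
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# Example usage on dummy normalized power values.
y_true = np.array([0.2, 0.5, 0.7, 0.4])
y_pred = np.array([0.25, 0.45, 0.65, 0.50])
print(metrics(y_true, y_pred))
```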
4.5. Results
4.6. Ablation Studies
5. Discussion
6. Conclusions
- (1) Improved prediction accuracy with Mamba: Wind-Mambaformer incorporates the Mamba module to capture long-term dependencies in wind power, significantly reducing MAE by 30% and MSE by 60% compared with the standard Transformer model.
- (2) Reduced computational complexity with Flow-Attention: the Flow-Attention mechanism lowers the computational burden while maintaining high efficiency, giving the model an edge in large-scale power prediction tasks.
- (3) Enhanced adaptability and generalization: the model performs well across diverse wind farm datasets, showing strong adaptability and accuracy in handling variable wind conditions and power fluctuations.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Xue, Y.; Lei, X.; Xue, F.; Yu, C.; Dong, Z.; Wen, F.; Ju, P. A review on the impact of wind power uncertainty on power systems. Chin. J. Electr. Eng. 2014, 34, 5029–5040.
- Qian, Z.; Pei, Y.; Zareipour, H.; Chen, N. A review of wind power prediction methods. High Volt. Technol. 2016, 42, 1047–1060.
- Stathopoulos, C.; Kaperoni, A.; Galanis, G.; Kallos, G. Wind power prediction based on numerical and statistical models. J. Wind Eng. Ind. Aerodyn. 2013, 112, 25–38.
- Awad, M.; Khanna, R. Support vector regression. In Efficient Learning Machines; Springer: Berlin/Heidelberg, Germany, 2015; pp. 67–80.
- Li, Z.; Ye, L.; Zhao, Y.; Song, X.; Teng, J.; Jin, J. Short-term wind power prediction based on extreme learning machine with error correction. Prot. Control Mod. Power Syst. 2016, 1, 1–8.
- Abhinav, R.; Pindoriya, N.M.; Wu, J.; Long, C. Short-term wind power forecasting using wavelet-based neural network. Energy Procedia 2017, 142, 455–460.
- Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep learning for computer vision: A brief review. Comput. Intell. Neurosci. 2018, 2018, 7068349.
- Medsker, L.R.; Jain, L. Recurrent neural networks. Des. Appl. 2001, 5, 64–67.
- Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30.
- Wang, A.; Pei, Y.; Qian, Z.; Zareipour, H.; Jing, B.; An, J. A two-stage anomaly decomposition scheme based on multi-variable correlation extraction for wind turbine fault detection and identification. Appl. Energy 2022, 321, 119373.
- Wang, A.; Pei, Y.; Zhu, Y.; Qian, Z. Wind turbine fault detection and identification through self-attention-based mechanism embedded with a multivariable query pattern. Renew. Energy 2023, 211, 918–937.
- Chen, H.; Wang, Y.; Guo, T.; Xu, C.; Deng, Y.; Liu, Z.; Ma, S.; Xu, C.; Xu, C.; Gao, W. Pre-trained image processing transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12299–12310.
- Hofstätter, S.; Zamani, H.; Mitra, B.; Craswell, N.; Hanbury, A. Local self-attention over long text for efficient document retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 25–30 July 2020; p. 2021.
- Wu, N.; Green, B.; Ben, X.; O’Banion, S. Deep transformer models for time series forecasting: The influenza prevalence case. arXiv 2020, arXiv:2001.08317.
- Putz, D.; Gumhalter, M.; Auer, H. A novel approach to multi-horizon wind power forecasting based on deep neural architecture. Renew. Energy 2021, 178, 494–505.
- Pan, X.; Wang, L.; Wang, Z.; Huang, C. Short-term wind speed forecasting based on spatial-temporal graph transformer networks. Energy 2022, 253, 124095.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv 2023, arXiv:2312.00752.
- Hannan, E.J.; Kavalieris, L. Regression, autoregression models. J. Time Ser. Anal. 1986, 7, 27–49.
- Hansun, S. A new approach of moving average method in time series analysis. In Proceedings of the 2013 Conference on New Media Studies (CoNMedia), Tangerang, Indonesia, 27–28 November 2013; IEEE: New York, NY, USA, 2013; pp. 1–4.
- Benjamin, M.A.; Rigby, R.A.; Stasinopoulos, D.M. Generalized autoregressive moving average models. J. Am. Stat. Assoc. 2003, 98, 214–223.
- Zhang, N.; Zhang, Y.; Lu, H. Seasonal autoregressive integrated moving average and support vector machine models: Prediction of short-term traffic flow on freeways. Transp. Res. Rec. 2011, 2215, 85–92.
- Shen, X.; Liu, X.; Hu, X.; Zhang, D.; Song, S. Contrastive learning of subject-invariant EEG representations for cross-subject emotion recognition. IEEE Trans. Affect. Comput. 2022, 14, 2496–2511.
- Fan, J.; Ma, X.; Wu, L.; Zhang, F.; Yu, X.; Zeng, W. Light gradient boosting machine: An efficient soft computing model for estimating daily reference evapotranspiration with local and external meteorological data. Agric. Water Manag. 2019, 225, 105758.
- Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T.; et al. xgboost: Extreme gradient boosting. R Package Version 0.4-2 2015, 1, 1–4.
- Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; IEEE: New York, NY, USA, 2017; pp. 1597–1600.
- Kiranyaz, S.; Avci, O.; Abdeljaber, O.; Ince, T.; Gabbouj, M.; Inman, D.J. 1D convolutional neural networks and applications: A survey. Mech. Syst. Signal Process. 2021, 151, 107398.
- Bommidi, B.S.; Teeparthi, K.; Kosana, V. Hybrid wind speed forecasting using ICEEMDAN and transformer model with novel loss function. Energy 2023, 265, 126383.
- Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The long-document transformer. arXiv 2020, arXiv:2004.05150.
- Kitaev, N.; Kaiser, Ł.; Levskaya, A. Reformer: The efficient transformer. arXiv 2020, arXiv:2001.04451.
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115.
- Wu, H.; Wu, J.; Xu, J.; Wang, J.; Long, M. Flowformer: Linearizing transformers with conservation flows. arXiv 2022, arXiv:2202.06258.
- Neshat, M.; Nezhad, M.M.; Mirjalili, S.; Piras, G.; Garcia, D.A. Quaternion convolutional long short-term memory neural model with an adaptive decomposition method for wind speed forecasting: North Aegean islands case studies. Energy Convers. Manag. 2022, 259, 115590.
- Nezhad, M.M.; Heydari, A.; Pirshayan, E.; Groppi, D.; Garcia, D.A. A novel forecasting model for wind speed assessment using Sentinel family satellites images and machine learning method. Renew. Energy 2021, 179, 2198–2211.
- Neshat, M.; Nezhad, M.M.; Abbasnejad, E.; Mirjalili, S.; Tjernberg, L.B.; Garcia, D.A.; Alexander, B.; Wagner, M. A deep learning-based evolutionary model for short-term wind speed forecasting: A case study of the Lillgrund offshore wind farm. Energy Convers. Manag. 2021, 236, 114002.
- Neshat, M.; Nezhad, M.M.; Sergiienko, N.Y.; Mirjalili, S.; Piras, G.; Garcia, D.A. Wave power forecasting using an effective decomposition-based convolutional bi-directional model with equilibrium Nelder-Mead optimiser. Energy 2022, 256, 124623.
- Neshat, M.; Nezhad, M.M.; Abbasnejad, E.; Mirjalili, S.; Groppi, D.; Heydari, A.; Tjernberg, L.B.; Garcia, D.A.; Alexander, B.; Shi, Q.; et al. Wind turbine power output prediction using a new hybrid neuro-evolutionary method. Energy 2021, 229, 120617.
- Le, T.M.C.; Le, X.C.; Huynh, N.N.P.; Doan, A.T.; Dinh, T.V.; Duong, M.Q. Optimal power flow solutions to power systems with wind energy using a highly effective meta-heuristic algorithm. Int. J. Renew. Energy Dev. 2023, 12, 467–477.
- Yang, M.; Jiang, Y.; Che, J.; Han, Z.; Lv, Q. Short-term forecasting of wind power based on error traceability and numerical weather prediction wind speed correction. Electronics 2024, 13, 1559.
- Kovalnogov, V.N.; Fedorov, R.V.; Chukalin, A.V.; Klyachkin, V.N.; Tabakov, V.P.; Demidov, D.A. Applied machine learning to study the movement of air masses in the wind farm area. Energies 2024, 17, 3961.
- Allen, D.M. Mean square error of prediction as a criterion for selecting variables. Technometrics 1971, 13, 469–475.
- De Myttenaere, A.; Golden, B.; Le Grand, B.; Rossi, F. Mean absolute percentage error for regression models. Neurocomputing 2016, 192, 38–48.
Index | Feature Name |
---|---|
1 | Wind speed at a height of 10 m (m/s) |
2 | Wind direction at a height of 10 m (°) |
3 | Wind speed at a height of 30 m (m/s) |
4 | Wind direction at a height of 30 m (°) |
5 | Wind speed at a height of 50 m (m/s) |
6 | Wind direction at a height of 50 m (°) |
7 | Wind speed at the height of the wheel hub (m/s) |
8 | Wind direction at the height of the wheel hub (°) |
9 | Air temperature (°C) |
10 | Atmospheric pressure (hPa) |
11 | Relative humidity (%) |
12 (target) | Power (MW) |
Wind Farm Site | Total Sample Size | Missing Data and Outlier Rate |
---|---|---|
1 | 70,176 | 1.58% |
2 | | 0.45% |
3 | | 1.39% |
4 | | 3.25% |
5 | | 0.27% |
Wind Farm Site | Timesteps | MAE | MSE | RMSE | MAPE |
---|---|---|---|---|---|
1 | 4 | 0.1969 | 0.0853 | 0.2921 | 1.2233 |
1 | 8 | 0.2459 | 0.1281 | 0.3580 | 1.8596 |
1 | 12 | 0.2052 | 0.0866 | 0.2944 | 1.4693 |
1 | 16 | 0.1921 | 0.0740 | 0.2721 | 1.2859 |
2 | 4 | 0.2000 | 0.0794 | 0.2818 | 1.6340 |
2 | 8 | 0.1973 | 0.0785 | 0.2801 | 0.9749 |
2 | 12 | 0.1974 | 0.0749 | 0.2737 | 1.4448 |
2 | 16 | 0.2189 | 0.0889 | 0.2981 | 1.7445 |
3 | 4 | 0.1584 | 0.0541 | 0.2325 | 1.0810 |
3 | 8 | 0.1854 | 0.0729 | 0.2700 | 1.4373 |
3 | 12 | 0.1770 | 0.0637 | 0.2525 | 1.1955 |
3 | 16 | 0.1654 | 0.0570 | 0.2386 | 1.3030 |
4 | 4 | 0.1155 | 0.0282 | 0.1680 | 0.4291 |
4 | 8 | 0.1415 | 0.0428 | 0.2069 | 0.7862 |
4 | 12 | 0.1180 | 0.0288 | 0.1698 | 0.5726 |
4 | 16 | 0.1328 | 0.0367 | 0.1915 | 0.6631 |
5 | 4 | 0.1448 | 0.0447 | 0.2114 | 0.8623 |
5 | 8 | 0.1326 | 0.0357 | 0.1890 | 1.1014 |
5 | 12 | 0.1069 | 0.0225 | 0.1500 | 0.7305 |
5 | 16 | 0.1061 | 0.0232 | 0.1522 | 0.7584 |
Ablation Study on Wind Farm Site 1 | Variant 1 | Variant 2 | Variant 3 | Full Model |
---|---|---|---|---|
Backbone (Wind-Mambaformer w/o Flow-Attention and Mamba) | √ | √ | √ | √ |
Flow-Attention | | √ | | √ |
Mamba | | | √ | √ |
MSE | 0.2548 | 0.2217 | 0.1305 | 0.0740 |
MAE | 0.3319 | 0.3128 | 0.2487 | 0.1921 |
RMSE | 0.5048 | 0.4649 | 0.3612 | 0.2721 |
MAPE | 2.3586 | 2.2059 | 1.7166 | 1.2859 |

Ablation Study on Wind Farm Site 2 | Variant 1 | Variant 2 | Variant 3 | Full Model |
---|---|---|---|---|
Backbone (Wind-Mambaformer w/o Flow-Attention and Mamba) | √ | √ | √ | √ |
Flow-Attention | | √ | | √ |
Mamba | | | √ | √ |
MSE | 0.2231 | 0.2140 | 0.1633 | 0.0889 |
MAE | 0.3281 | 0.3351 | 0.2895 | 0.2189 |
RMSE | 0.4723 | 0.4626 | 0.4042 | 0.2981 |
MAPE | 3.5596 | 1.4172 | 2.5721 | 1.7445 |

Ablation Study on Wind Farm Site 3 | Variant 1 | Variant 2 | Variant 3 | Full Model |
---|---|---|---|---|
Backbone (Wind-Mambaformer w/o Flow-Attention and Mamba) | √ | √ | √ | √ |
Flow-Attention | | √ | | √ |
Mamba | | | √ | √ |
MSE | 0.1498 | 0.1482 | 0.0809 | 0.0570 |
MAE | 0.2611 | 0.2554 | 0.1977 | 0.1654 |
RMSE | 0.3870 | 0.3849 | 0.2844 | 0.2386 |
MAPE | 1.6958 | 1.7366 | 1.1964 | 1.3030 |

Ablation Study on Wind Farm Site 4 | Variant 1 | Variant 2 | Variant 3 | Full Model |
---|---|---|---|---|
Backbone (Wind-Mambaformer w/o Flow-Attention and Mamba) | √ | √ | √ | √ |
Flow-Attention | | √ | | √ |
Mamba | | | √ | √ |
MSE | 0.0862 | 0.0821 | 0.0558 | 0.0367 |
MAE | 0.1954 | 0.1890 | 0.1633 | 0.1328 |
RMSE | 0.2936 | 0.2866 | 0.2362 | 0.1915 |
MAPE | 1.0913 | 1.1392 | 0.8351 | 0.6631 |

Ablation Study on Wind Farm Site 5 | Variant 1 | Variant 2 | Variant 3 | Full Model |
---|---|---|---|---|
Backbone (Wind-Mambaformer w/o Flow-Attention and Mamba) | √ | √ | √ | √ |
Flow-Attention | | √ | | √ |
Mamba | | | √ | √ |
MSE | 0.0533 | 0.0469 | 0.0344 | 0.0232 |
MAE | 0.1552 | 0.1456 | 0.1285 | 0.1061 |
RMSE | 0.2309 | 0.2167 | 0.1856 | 0.1522 |
MAPE | 1.1737 | 1.2284 | 0.9551 | 0.7584 |