A Hybrid GRU-MHSAM-ResNet Model for Short-Term Power Load Forecasting
Abstract
1. Introduction
- (1) A key limitation of existing hybrid models is that the importance of input features is not evaluated dynamically. As a result, features that matter for one dataset may be overlooked in another, which limits generalization across datasets and application conditions.
- (2) Higher model complexity may improve accuracy, but simply stacking modules rarely yields a stable or generalizable system, and many such models fail to balance accuracy with generalization. Designing a hybrid model that can effectively meet different forecasting needs therefore remains a challenge.
- (1) A deep temporal encoder, composed of stacked GRU layers, is employed to effectively capture the complex non-linear temporal dependencies inherent in the load data.
- (2) The vector produced by the GRU encoder from the historical load data is concatenated with the previous-hour load, the hour type, and the week type for forecast time t. The concatenated vector is then passed into a multi-head self-attention module (MHSAM), whose attention heads operate in parallel and focus on different parts of the input. This structure captures complex dependencies within the merged features better than simpler attention mechanisms, allowing the model to weight features differentially when constructing its representation of the input.
- (3) The skip connections of the ResNet block allow the network depth to be increased safely, ensuring stable convergence and enhancing the model's capacity to extract complex features, so that prediction accuracy improves even as the network deepens. A minimal sketch of the resulting architecture is shown below.
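The following is a minimal PyTorch sketch of the architecture outlined in contributions (1)–(3). Layer sizes, tensor names, and the way the concatenated feature vector is split into tokens for self-attention are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class GRUMHSAMResNet(nn.Module):
    def __init__(self, window_len=24, d_hidden=32, n_tokens=4, d_token=16, n_heads=2):
        super().__init__()
        # (1) Deep temporal encoder: stacked GRU layers over the sliding load windows.
        self.encoder = nn.GRU(input_size=window_len, hidden_size=d_hidden,
                              num_layers=2, batch_first=True)
        # (2) Multi-head self-attention over the concatenated feature vector.
        #     The flat vector is projected and reshaped into n_tokens tokens so the
        #     heads can weight different parts of the merged features (an assumption
        #     about how MHSAM is applied to a flat vector).
        self.to_tokens = nn.Linear(d_hidden + 3, n_tokens * d_token)
        self.attn = nn.MultiheadAttention(embed_dim=d_token, num_heads=n_heads,
                                          batch_first=True)
        # Fully connected trunk followed by (3) a ResNet block (FC layers + skip).
        self.fc = nn.Sequential(nn.Linear(n_tokens * d_token, 64), nn.ReLU(),
                                nn.Linear(64, 24), nn.ReLU())
        self.res_block = nn.Sequential(nn.Linear(24, 24), nn.ReLU(),
                                       nn.Linear(24, 24))
        self.out = nn.Linear(24, 1)

    def forward(self, load_windows, load_prev, hour_type, week_type):
        # load_windows: (batch, n_windows, window_len); the other inputs: (batch, 1).
        _, h_n = self.encoder(load_windows)            # h_n: (num_layers, batch, d_hidden)
        h = h_n[-1]                                    # final hidden state of the top layer
        x = torch.cat([h, load_prev, hour_type, week_type], dim=-1)
        tokens = self.to_tokens(x).view(x.size(0), -1, self.attn.embed_dim)
        tokens, _ = self.attn(tokens, tokens, tokens)  # heads re-weight feature groups
        x = self.fc(tokens.flatten(1))
        x = x + self.res_block(x)                      # skip connection of the ResNet block
        return self.out(x)                             # predicted load at hour t
```

On a batch of eight samples, this forward pass would take tensors of shapes (8, 10, 24), (8, 1), (8, 1), and (8, 1) and return an (8, 1) prediction.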
2. Method
2.1. Model Framework
- (1) Data preprocessing: Historical load data is processed using a sliding window method, and temporal information is extracted from it. This step prepares the input and output datasets required for model training.
- (2) Historical load feature extraction: The preprocessed historical load data is fed into a GRU encoder, which exploits the GRU's ability to capture temporal patterns (Section 2.2) to extract deeper features from the historical data. The encoder's final hidden state serves as its output.
- (3) Key information extraction: The features extracted by the GRU encoder are concatenated with the temporal information from the preprocessing stage and the load from the previous hour. The concatenated feature vector is then passed into the MHSAM (Section 2.3), which adaptively learns the internal relationships within the input.
- (4) Feature integration and prediction: The output from the MHSAM is passed into a fully connected network for further processing. At the end of this network, a ResNet block composed of fully connected layers and a skip connection is incorporated to generate the final prediction results.
2.2. GRU Model
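For reference, the GRU cell used in the temporal encoder follows the standard gating scheme below; the exact variant adopted here may differ.

$$
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right),\\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right),\\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right),\\
h_t &= z_t \odot h_{t-1} + \left(1 - z_t\right) \odot \tilde{h}_t,
\end{aligned}
$$

where $x_t$ is the input at step $t$, $h_t$ the hidden state, $z_t$ and $r_t$ the update and reset gates, $\sigma$ the sigmoid function, and $\odot$ element-wise multiplication (conventions differ on which of $z_t$ and $1-z_t$ multiplies the previous state).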
2.3. Multi-Head Self-Attention Module
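As background, a multi-head self-attention module builds on the standard scaled dot-product attention; the module's exact projections and dimensions may differ from this generic form.

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,
$$

$$
\mathrm{head}_i = \mathrm{Attention}\!\left(XW_i^{Q},\, XW_i^{K},\, XW_i^{V}\right), \qquad
\mathrm{MHSA}(X) = \mathrm{Concat}\!\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right)W^{O},
$$

where $X$ is the input representation, $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, and $W^{O}$ are learned projection matrices, $d_k$ is the key dimension, and $h$ is the number of heads.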
2.4. ResNet Block
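In its generic form, a residual block of fully connected layers computes (with a linear projection on the shortcut when the input and output dimensions differ; the block's exact layout here may vary)

$$
\mathbf{y} = \mathcal{F}(\mathbf{x}) + \mathbf{x} \quad \text{or} \quad \mathbf{y} = \mathcal{F}(\mathbf{x}) + W_s \mathbf{x},
$$

where $\mathcal{F}$ is the mapping learned by the stacked fully connected layers and $W_s$ is the shortcut projection. Because the shortcut passes gradients directly to earlier layers, deeper stacks can be trained without the degradation that otherwise accompanies added depth.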
2.5. Training Details
- (1) Offline learning phase: In the initial stage, the model is trained on historical data comprising both the training and validation sets. This phase lets the model learn typical load patterns from a large dataset and establishes a solid performance baseline for future predictions. Offline learning uses batch training to progressively optimize the model weights, so the model already has reasonable predictive ability when it encounters new data.
- (2) Online learning phase: After offline training is completed, the model enters the online learning phase. For each test sample, once a prediction has been made, its input and true output are added to the historical dataset to form an expanded training set. One epoch of training is then performed on this updated set to fine-tune the weights established during offline training, helping the model adapt to new data and track recent trends in power load.
- (3) Prediction and update cycle: After each prediction, the model weights are updated with the online learning method, and the updated model then forecasts the next points in the test set. This cycle repeats continuously, allowing prediction accuracy to improve gradually over time. A sketch of the combined offline/online procedure follows this list.
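A minimal sketch of the offline/online procedure described above, assuming a PyTorch model that maps a single feature tensor to a one-step forecast; the helper names, the single-input signature, and the optimizer settings are illustrative assumptions rather than the authors' exact training code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, loader, optimizer, loss_fn):
    # One pass over the current training pool with mini-batch gradient updates.
    model.train()
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()


def offline_then_online(model, train_x, train_y, test_x, test_y,
                        offline_epochs=500, batch_size=128):
    optimizer = torch.optim.Adam(model.parameters())
    loss_fn = nn.MSELoss()

    # Offline phase: batch training on the historical (training + validation) data.
    loader = DataLoader(TensorDataset(train_x, train_y),
                        batch_size=batch_size, shuffle=True)
    for _ in range(offline_epochs):
        run_epoch(model, loader, optimizer, loss_fn)

    # Online phase: predict one test sample, append its observed ground truth to
    # the training pool, then fine-tune for a single epoch before the next forecast.
    hist_x, hist_y = train_x, train_y
    preds = []
    for i in range(len(test_x)):
        model.eval()
        with torch.no_grad():
            preds.append(model(test_x[i:i + 1]))
        hist_x = torch.cat([hist_x, test_x[i:i + 1]])
        hist_y = torch.cat([hist_y, test_y[i:i + 1]])
        loader = DataLoader(TensorDataset(hist_x, hist_y),
                            batch_size=batch_size, shuffle=True)
        run_epoch(model, loader, optimizer, loss_fn)  # one epoch of fine-tuning
    return torch.cat(preds)
```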
3. Data Preprocessing and Evaluation Metrics
3.1. Data Preprocessing
- (1) Loads from the 33 h before hour t: The loads from the 33 h preceding the forecast hour t serve as the first part of the input. A sliding partition method is applied in which each row of load data covers a continuous 24 h span: the first row contains the 24 load values from 00:00 to 24:00 on 1 January, the second row the 24 values from 01:00 on 1 January to 01:00 on 2 January, and so forth. Ten consecutive rows constructed in this way, together covering the 33 h before hour t, are used as the first input to the prediction model (see the construction sketch after this list).
- (2) Load at hour t − 1: The load at the hour immediately before hour t serves as the second input to the prediction model.
- (3) Hour type and week type: The hour information and the week information corresponding to hour t serve as the third input to the prediction model.
- (4) Output: The load at hour t is defined as the model output.
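A sketch of this sample construction in NumPy; the array names, the hour/week encodings, and the exact indexing convention are illustrative assumptions.

```python
import numpy as np


def build_samples(load, hours_of_day, days_of_week, window=24, n_windows=10):
    """For each forecast hour t, collect:
       - 10 overlapping 24-h rows covering the 33 h before t (first input),
       - the load at t-1 (second input),
       - the hour type and week type of t (third input),
       - the load at t as the target output."""
    span = window + n_windows - 1            # 24 + 9 = 33 hours of history
    X_windows, X_prev, X_time, y = [], [], [], []
    for t in range(span, len(load)):
        # Rows slide by one hour; the last row ends at hour t-1.
        rows = [load[t - span + k: t - span + k + window] for k in range(n_windows)]
        X_windows.append(np.stack(rows))     # shape (10, 24)
        X_prev.append(load[t - 1])
        X_time.append((hours_of_day[t], days_of_week[t]))
        y.append(load[t])
    return (np.array(X_windows), np.array(X_prev),
            np.array(X_time), np.array(y))
```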
3.2. Data Normalization
3.3. Evaluation Metrics
4. Case Study
4.1. Experiment Environment and Training Hyperparameter Selection
4.2. Design of Comparison Models
4.3. Case 1: China System Analysis
4.4. Case 2: New South Wales, Australia System Analysis
4.5. Case 3: Malaysia System Analysis
4.6. Accuracy and Stability Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| Parameter | Value |
|---|---|
| Nodes in the GRU layers | 256/256/128/32 |
| Number of heads in MHSAM | 2 |
| Dimension per head in MHSAM | 16 |
| Nodes in fully connected layers | 200/100/24/1 |
| Nodes in the ResNet block | 20/24/1 |
| Epochs | 500 |
| Batch size | 128 |
| Optimizer | Adam |
| Loss | MSE |
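For concreteness, these settings could be collected into a configuration object such as the following; the key names are hypothetical and simply mirror the table above.

```python
# Hypothetical configuration mirroring the hyperparameter table; not the authors' code.
config = {
    "gru_layer_sizes": [256, 256, 128, 32],
    "mhsam_heads": 2,
    "mhsam_dim_per_head": 16,
    "fc_layer_sizes": [200, 100, 24, 1],
    "resnet_block_sizes": [20, 24, 1],
    "epochs": 500,
    "batch_size": 128,
    "optimizer": "Adam",
    "loss": "MSE",
}
```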
| Model | TEE (GWh) | MAPE (%) | MFE (%) |
|---|---|---|---|
| Group (1) | 168.9 | 4.74 | 6.68 |
| Group (2) | 77.83 | 2.06 | 2.73 |
| Group (3) | 59.41 | 1.65 | 2.23 |
| Model | TEE (kWh) | MAPE (%) | MFE (%) |
|---|---|---|---|
| Group (1) | 5.72 | 9.62 | 12.02 |
| Group (2) | 3.34 | 5.75 | 7.25 |
| Group (3) | 3.18 | 5.52 | 7.03 |
| Model | TEE (TWh) | MAPE (%) | MFE (%) |
|---|---|---|---|
| Group (1) | 2.80 | 4.28 | 9.03 |
| Group (2) | 1.14 | 1.83 | 2.87 |
| Group (3) | 0.99 | 1.57 | 2.37 |
| Model | Result of Train 1 | Result of Train 2 | Result of Train 3 | Result of Train 4 | Result of Train 5 | Mean Result |
|---|---|---|---|---|---|---|
| LSTM-ResNet | 1.76 | 1.88 | 1.78 | 1.72 | 2.02 | 1.83 |
| GRU-MHSAM-ResNet | 1.57 | 1.59 | 1.58 | 1.57 | 1.56 | 1.57 |