Federated Hybrid Graph Attention Network with Two-Step Optimization for Electricity Consumption Forecasting
Abstract
1. Introduction
- (1) The paper introduces FedHMGAT’s novel hybrid approach for local model training, combining (1) a numerical structure graph with Gaussian encoding to transform volatile peak fluctuations into stable covariance features (reducing noise sensitivity), and (2) a multi-scale attention mechanism that extracts periodic consumption patterns through hierarchical feature fusion. This dual-component architecture enables simultaneous modeling of both erratic peaks and periodic trends while preventing overfitting to local noise.
- (2) To address parameter heterogeneity in federated aggregation, FedHMGAT proposes a two-step strategy: (1) regularization during local training aligns model parameters across clients, while (2) server-side dynamic fusion customizes aggregation weights according to regional data characteristics. This preserves specialized feature representations while preventing dilution during global model updates, improving cross-regional generalization (a minimal formal sketch of this two-step objective follows this list).
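The contribution in item (2) can be summarized compactly. Below is a minimal formal sketch of the two-step objective under assumed notation (illustrative symbols, not necessarily the paper's): w^t denotes the global parameters at round t, L_k the local forecasting loss of client k, λ the alignment strength, and α_k^t the adaptive fusion weights.

```latex
% Step 1 (local training with regularization): client k fits its own data while
% staying close to the current global parameters w^t.
\min_{w_k}\; \mathcal{L}_k(w_k) \;+\; \frac{\lambda}{2}\,\lVert w_k - w^{t} \rVert_2^2

% Step 2 (server-side dynamic fusion): the next global model is a data-dependent
% weighted combination of the client models, with weights summing to one.
w^{t+1} \;=\; \sum_{k=1}^{K} \alpha_k^{t}\, w_k^{t+1}, \qquad \sum_{k=1}^{K} \alpha_k^{t} = 1
```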
2. Related Works
2.1. Traditional Electricity Consumption Prediction Methods
2.2. Machine Learning-Based Prediction Methods
2.3. Federated Learning Applications
3. Proposed Methodology
- (1) Local Model Architecture
  - A. Hybrid Feature Representation:
    - a. Constructs numerical structure graphs.
  - B. Hybrid Model Design:
    - a. Implements Gaussian encoder for dynamic peak fluctuation modeling, and generates covariance features to suppress noise-induced overfitting (see the illustrative encoder sketch after this list).
    - b. Incorporates multi-scale attention mechanism to capture periodic consumption patterns.
    - c. Performs adaptive feature fusion for robust predictions.
- (2) Two-step Global Optimization
  - A. Training-stage Regularization to enforce parameter consistency across local models.
  - B. Server-side Adaptive Fusion.
- (3) Integrated Framework Benefits:
  - A. Simultaneously optimizes local training and global aggregation.
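As referenced in item (1).B.a above, the following is an illustrative sketch of how a Gaussian encoder can turn a volatile window of consumption values into smoother second-order statistics (a mean embedding plus a covariance matrix) that downstream layers consume instead of raw spikes. The module name, layer sizes, and dimensions are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class GaussianEncoder(nn.Module):
    """Summarize a time window by the mean and covariance of learned embeddings."""

    def __init__(self, in_dim: int, hid_dim: int = 32):
        super().__init__()
        self.embed = nn.Linear(in_dim, hid_dim)

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (batch, steps, in_dim) -- a sliding window of consumption features
        h = torch.tanh(self.embed(window))                  # (batch, steps, hid)
        mu = h.mean(dim=1, keepdim=True)                    # per-window mean embedding
        centered = h - mu
        cov = centered.transpose(1, 2) @ centered / (h.size(1) - 1)  # (batch, hid, hid)
        # Concatenate the mean and the flattened covariance into one stable feature vector
        return torch.cat([mu.squeeze(1), cov.flatten(start_dim=1)], dim=-1)


if __name__ == "__main__":
    enc = GaussianEncoder(in_dim=7)        # e.g., seven numeric attributes per time step
    x = torch.randn(4, 24, 7)              # four windows of 24 time steps
    print(enc(x).shape)                    # torch.Size([4, 32 + 32 * 32])
```

Because the covariance is averaged over the whole window, isolated spikes perturb it far less than they perturb the raw sequence, which is the intuition behind the reduced noise sensitivity claimed above.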
3.1. Problem Formulation and Setup
3.2. Electricity Consumption Demand Forecasting Model Construction
3.2.1. Data Preprocessing and Numerical Structure Graph Construction
3.2.2. Hybrid Graph Attention Network for Electricity Consumption Demand Forecasting
- (1) For modeling peak fluctuations in numerical structure graphs:
  - ➢ First Hop: Numerical Correlation Information Filtering
  - ➢ Second Hop: Dependency Relationship Modeling
  - ➢ Third Hop: Higher-order Reasoning Integration as a parameterized Gaussian distribution
- (2) A multi-scale attention mechanism captures periodic consumption patterns (an illustrative sketch follows this list).
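As an illustration of item (2), the sketch below pools the sequence at several temporal scales, scores each pooled position with attention, and fuses the per-scale summaries. The scale set, dimensions, and fusion by averaging are assumptions rather than the paper's exact settings.

```python
import torch
import torch.nn as nn


class MultiScaleAttention(nn.Module):
    """Attention-weighted summaries of a sequence at several temporal scales."""

    def __init__(self, dim: int, scales=(1, 4, 12)):
        super().__init__()
        self.scales = scales
        self.score = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, steps, dim)
        summaries = []
        for s in self.scales:
            pooled = nn.functional.avg_pool1d(
                x.transpose(1, 2), kernel_size=s, stride=s
            ).transpose(1, 2)                               # (batch, steps // s, dim)
            w = torch.softmax(self.score(pooled), dim=1)    # attention over pooled steps
            summaries.append((w * pooled).sum(dim=1))       # (batch, dim) per scale
        return torch.stack(summaries, dim=1).mean(dim=1)    # fuse scales


if __name__ == "__main__":
    msa = MultiScaleAttention(dim=32)
    print(msa(torch.randn(4, 48, 32)).shape)                # torch.Size([4, 32])
```

Coarser scales emphasize slow periodic trends (for example, daily cycles), while the finest scale retains short-term variation, so their combination captures periodicity at multiple resolutions.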
3.2.3. Electricity Consumption Forecasting Based on All Components
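This subsection combines the covariance-based features and the multi-scale attention features into the final forecast. A sketch of one possible fusion head is given below: a learned sigmoid gate blends the two branches before a linear prediction layer. The class name, layer sizes, single-step output, and gating rule are assumptions for illustration.

```python
import torch
import torch.nn as nn


class AdaptiveFusion(nn.Module):
    """Gate-based blend of Gaussian-encoder features and multi-scale attention features."""

    def __init__(self, gauss_dim: int, attn_dim: int, out_dim: int = 64):
        super().__init__()
        self.proj_g = nn.Linear(gauss_dim, out_dim)
        self.proj_a = nn.Linear(attn_dim, out_dim)
        self.gate = nn.Linear(2 * out_dim, out_dim)
        self.head = nn.Linear(out_dim, 1)                   # one-step-ahead forecast

    def forward(self, f_gauss: torch.Tensor, f_attn: torch.Tensor) -> torch.Tensor:
        g = torch.relu(self.proj_g(f_gauss))
        a = torch.relu(self.proj_a(f_attn))
        alpha = torch.sigmoid(self.gate(torch.cat([g, a], dim=-1)))
        fused = alpha * g + (1.0 - alpha) * a               # element-wise adaptive blend
        return self.head(fused)
```

The gate lets each feature dimension lean on whichever branch (peak-oriented or periodicity-oriented) is more informative for a given sample.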
3.3. Two-Step Parameter Aggregation Strategy for Global Model
- (1) Training-Stage Regularization
- (2) Server-side adaptive dynamic fusion (a round-level sketch combining both steps follows this list)
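As referenced above, the self-contained sketch below ties the two steps into one federated round: each client trains with a proximal penalty toward the current global weights (training-stage regularization), and the server fuses the client models with adaptive weights. The cosine-similarity weighting, the SGD settings, and the toy data are illustrative assumptions, not the paper's exact fusion rule.

```python
import copy
import torch
import torch.nn as nn


def client_update(model, global_model, data, target, mu=0.01, lr=0.01, epochs=1):
    """Step 1: local training with a proximal term toward the global parameters."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(data), target)
        loss = loss + 0.5 * mu * sum(
            ((p - g.detach()) ** 2).sum()
            for p, g in zip(model.parameters(), global_model.parameters())
        )
        loss.backward()
        opt.step()
    return {k: v.detach().clone() for k, v in model.state_dict().items()}


def adaptive_fuse(global_model, client_states):
    """Step 2: weight each client by how well its update agrees with the mean update."""
    g_state = {k: v.detach() for k, v in global_model.state_dict().items()}
    deltas = [
        torch.cat([(cs[k] - g_state[k]).flatten() for k in g_state])
        for cs in client_states
    ]
    mean_delta = torch.stack(deltas).mean(dim=0)
    sims = torch.stack([torch.cosine_similarity(d, mean_delta, dim=0) for d in deltas])
    weights = torch.softmax(sims, dim=0)
    return {
        k: sum(weights[i] * client_states[i][k] for i in range(len(client_states)))
        for k in g_state
    }


# One illustrative round with three clients on random data
global_model = nn.Linear(24, 1)
client_states = []
for _ in range(3):
    local = copy.deepcopy(global_model)
    x, y = torch.randn(32, 24), torch.randn(32, 1)
    client_states.append(client_update(local, global_model, x, y))
global_model.load_state_dict(adaptive_fuse(global_model, client_states))
```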
4. Experiments
4.1. Dataset and Experiment Settings
- (1) The Individual Household Electric Power Consumption Dataset (Data_IHE) records a Parisian family’s electricity usage over nearly four years (December 2006–November 2010) at minute-level sampling. It contains nine attributes: date (dd/mm/yyyy format), time (hh:mm format), global_active_power (minute-averaged active power in kilowatts), global_reactive_power (minute-averaged reactive power in kilowatts), voltage (minute-averaged, in volts), global_intensity (minute-averaged current in amperes), and three sub-metering measurements—sub_metering_1 (kitchen appliances such as the dishwasher, oven, and microwave, in watt-hours), sub_metering_2 (laundry-room equipment including the washing machine, tumble dryer, refrigerator, and a light, in watt-hours), and sub_metering_3 (electric water heater and air conditioner, in watt-hours). Seven of these nine attributes are used as features; date and time are removed. Because the three sub-meters do not cover all circuits in the home, the total electric energy consumption is calculated from the remaining attributes (see the preprocessing sketch after this dataset list).
- (2) The electricity consumption dataset from Tetouan, Morocco (Kaggle: Electric Power Consumption [26]) comprises 52,416 observations recorded at 10 min intervals (Data_Kaggle), each characterized by nine features: a timestamp (10 min window), ambient temperature, relative humidity, wind speed, and general diffuse flows—a term describing low-temperature fluid emissions (<0.2 °C to ~100 °C) typically associated with geological formations such as sulfide mounds and bacterial mats. The dataset also includes power consumption for three designated zones (Zones 1–3), while the environmental variables (temperature, humidity, wind speed, and diffuse flows) serve as auxiliary attributes in the feature vector f. Zone 3 consumption trends are visualized in Figure 3, where the X-axis indicates the number of collected data points and the Y-axis represents normalized electricity consumption.
- (3) The second electricity consumption dataset (Data_Southern) is obtained from five regions under China Southern Power Grid’s operational jurisdiction (1 December–31 May 2025), spanning seven administrative divisions (https://pan.baidu.com/s/1F7diPhFfdCIZG01tmRIOeg?pwd=r2vy, accessed on 5 July 2025). It provides hourly consumption records for each region. The user structure (industry, residents, and others) varies across regions, resulting in distinct electricity consumption distributions. Figure 4 shows the consumption data of one region of the China Southern Power Grid, including the May Day holiday; the X-axis indicates the number of collected data points (this dataset uses an hourly window), and the Y-axis represents normalized electricity consumption.
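As referenced in the dataset descriptions above, the sketch below covers the three preprocessing steps mentioned in this subsection: deriving the energy not covered by the three sub-meters for Data_IHE (using the relation documented in the UCI dataset notes), min-max normalizing a consumption series as shown on the Y-axes of Figures 3 and 4, and building sliding windows for supervised forecasting. The file name, column names, and the 24-step lookback are assumptions that may need adjusting to the actual files.

```python
import numpy as np
import pandas as pd

# --- Data_IHE: drop Date/Time, keep the seven numeric attributes, and derive the
# per-minute energy (Wh) not covered by the three sub-meters; per the UCI notes,
# global_active_power is in kW while the sub-meters report Wh per minute.
ihe = pd.read_csv("household_power_consumption.txt", sep=";", na_values="?",
                  low_memory=False)
ihe = ihe.drop(columns=["Date", "Time"]).astype(float).dropna()
ihe["other_energy_wh"] = (ihe["Global_active_power"] * 1000.0 / 60.0
                          - ihe["Sub_metering_1"]
                          - ihe["Sub_metering_2"]
                          - ihe["Sub_metering_3"])


# --- Normalization (Data_Kaggle / Data_Southern): min-max scale a series to [0, 1],
# matching the normalized electricity consumption plotted in Figures 3 and 4.
def minmax(series: pd.Series) -> pd.Series:
    return (series - series.min()) / (series.max() - series.min())


# --- Sliding windows: turn a normalized series into (lookback, next-value) pairs
# for one-step-ahead forecasting.
def make_windows(series: np.ndarray, lookback: int = 24):
    x = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return x, y


consumption = minmax(ihe["Global_active_power"]).to_numpy()
x, y = make_windows(consumption, lookback=24)
print(x.shape, y.shape)
```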
4.2. Baselines
- (1) The first group combines common electricity consumption forecasting methods with typical federated learning frameworks, namely the federated averaging framework (AvgFL) and the correlation-based active client selection strategy (FedCor), forming models such as AvgFL-SVR, AvgFL-BiLSTM, AvgFL-MSattention, FedCor-SVR, FedCor-BiLSTM, and FedCor-MSattention. Here, SVR denotes the Support Vector Regression model, BiLSTM the Bidirectional LSTM model, and MSattention the multi-scale attention model.
- (2) CNN-LSTM FED [27]: a forecasting model built on the augmented Smart* dataset. In that work, synthetic electricity consumption data are generated with GANs (Generative Adversarial Networks) to alleviate client-side overfitting through data augmentation.
- (3) Adaptive Stacked LSTM [28]: an energy consumption forecasting framework that leverages adaptive learning, federated learning, and edge computing.
- (4) SparseMoE [29]: the expert network of this Mixture of Experts (MoE) architecture is implemented with a transformer-based deep learning model called Metaformer, which uses Exponential Moving Average (EMA) operations and pooling operators for prediction (a minimal EMA sketch follows this list).
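For context on the SparseMoE baseline, the snippet below is a minimal illustration of the exponential moving average (EMA) smoothing operation mentioned in item (4); the decay value is an assumption, and this is not the baseline's actual implementation.

```python
import numpy as np


def ema(series: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    """EMA smoothing: out[t] = alpha * x[t] + (1 - alpha) * out[t - 1]."""
    out = np.empty_like(series, dtype=float)
    out[0] = series[0]
    for t in range(1, len(series)):
        out[t] = alpha * series[t] + (1 - alpha) * out[t - 1]
    return out
```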
4.3. Discussion on FedHMGAT
4.3.1. Validation on Data_IHE
4.3.2. Validation on Data_Kaggle
4.3.3. Validation on Data_Southern
4.3.4. Discussion
4.4. Ablation Experiments for FedHMGAT
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. González, A.M.; Roque, A.S.; García-González, J. Modeling and forecasting electricity prices with input/output hidden Markov models. IEEE Trans. Power Syst. 2005, 20, 13–24.
2. Yildiz, B.; Bilbao, J.I.; Sproul, A.B. A review and analysis of regression and machine learning models on commercial building electricity load forecasting. Renew. Sustain. Energy Rev. 2017, 73, 1104–1122.
3. Bhattarai, B.P.; Paudyal, S.; Luo, Y.; Mohanpurkar, M.; Cheung, K.; Tonkoski, R.; Hovsapian, R.; Myers, K.S.; Zhang, R.; Zhao, P.; et al. Big data analytics in smart grids: State-of-the-art, challenges, opportunities, and future directions. IET Smart Grid 2019, 2, 141–154.
4. Hu, J.; Harmsen, R.; Crijns-Graus, W.; Worrell, E.; van den Broek, M. Identifying barriers to large-scale integration of variable renewable electricity into the electricity market: A literature review of market design. Renew. Sustain. Energy Rev. 2018, 81, 2181–2195.
5. Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 2021, 14, 1–210.
6. Bousbiat, H.; Bousselidj, R.; Himeur, Y.; Amira, A.; Bensaali, F.; Fadli, F.; Mansoor, W.; Elmenreich, W. Crossing roads of federated learning and smart grids: Overview, challenges, and perspectives. arXiv 2023, arXiv:2304.08602.
7. Silva, F.A.; Orang, O.; Erazo-Costa, F.J.; Silva, P.C.; Barros, P.H.; Ferreira, R.P.; Guimarães, F.G. Time Series Classification Using Federated Convolutional Neural Networks and Image-Based Representations. IEEE Access 2025, 13, 56180–56194.
8. Cheng, X.; Li, C.; Liu, X. A review of federated learning in energy systems. In Proceedings of the 2022 IEEE/IAS Industrial and Commercial Power System Asia (I&CPS Asia), Shanghai, China, 8–11 July 2022; pp. 2089–2095.
9. Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–19.
10. Mohd Nizam, M.A.; Sulaiman, S.A.; Ramli, N.A. Predictive Model for Electricity Consumption in Malaysia Using Support Vector Regression. In International Conference on Electrical, Control & Computer Engineering; Springer: Singapore, 2024.
11. Chen, H.Y.; Lee, C.H. Electricity consumption prediction for buildings using multiple adaptive network-based fuzzy inference system models and gray relational analysis. Energy Rep. 2019, 5, 1509–1524.
12. Gomez, W.; Wang, F.K.; Amogne, Z.E. Electricity Load and Price Forecasting Using a Hybrid Method Based Bidirectional Long Short-Term Memory with Attention Mechanism Model. Int. J. Energy Res. 2023, 2023, 3815063.
13. Wang, W.; Shimakawa, H.; Jie, B.; Sato, M.; Kumada, A. BE-LSTM: An LSTM-based framework for feature selection and building electricity consumption prediction on small dataset. J. Build. Eng. 2025, 102, 111910.
14. Imani, M. Electrical load-temperature CNN for residential load forecasting. Energy 2021, 227, 120480.
15. Wang, S.; Wang, X.; Wang, S.; Wang, D. Bi-directional long short-term memory method based on attention mechanism and rolling update for short-term load forecasting. Int. J. Electr. Power Energy Syst. 2019, 109, 470–479.
16. Lotfipoor, A.; Patidar, S.; Jenkins, D.P. Deep neural network with empirical mode decomposition and Bayesian optimisation for residential load forecasting. Expert Syst. Appl. 2024, 237, 121355.
17. Gashler, M.; Ashmore, S. Modeling time series data with deep Fourier neural networks. Neurocomputing 2016, 188, 3–11.
18. Liu, Y.; Guan, L.; Hou, C.; Han, H.; Liu, Z.; Sun, Y.; Zheng, M. Wind Power Short-Term Prediction Based on LSTM and Discrete Wavelet Transform. Appl. Sci. 2019, 9, 1108.
19. Liang, F.; Zhang, Z.; Lu, H.; Leung, V.; Guo, Y.; Hu, X. Communication-efficient large-scale distributed deep learning: A comprehensive survey. arXiv 2024, arXiv:2404.06114.
20. Huang, W.; Wang, D.; Ouyang, X.; Wan, J.; Liu, J.; Li, T. Multimodal federated learning: Concept, methods, applications and future directions. Inf. Fusion 2024, 112, 102576.
21. Jia, N.; Qu, Z.; Ye, B.; Wang, Y.; Hu, S.; Guo, S. A comprehensive survey on communication-efficient federated learning in mobile edge environments. IEEE Commun. Surv. Tutor. 2025. Early access.
22. Sun, T.; Li, D.; Wang, B. Decentralized federated averaging. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 4289–4301.
23. Geyer, R.C.; Klein, T.; Nabi, M. Differentially private federated learning: A client level perspective. arXiv 2017, arXiv:1712.07557.
24. Venkataramanan, V.; Kaza, S.; Annaswamy, A.M. DER forecast using privacy-preserving federated learning. IEEE Internet Things J. 2022, 10, 2046–2055.
25. Ahmadi, A.; Talaei, M.; Sadipour, M.; Amani, A.M.; Jalili, M. Deep federated learning-based privacy-preserving wind power forecasting. IEEE Access 2022, 11, 39521–39530.
26. Electric Power Consumption. Available online: https://www.kaggle.com/datasets/fedesoriano/electric-power-consumption (accessed on 1 March 2017).
27. de Moraes Sarmento, E.M.; Ribeiro, I.F.; Marciano, P.R.N.; Neris, Y.G.; de Oliveira Rocha, H.R.; Mota, V.F.S.; da Silva Villaça, R. Forecasting energy power consumption using federated learning in edge computing device. Internet Things 2024, 25, 101050.
28. Abdulla, N.; Demirci, M.; Ozdemir, S. Smart meter-based energy consumption forecasting for smart cities using adaptive federated learning. Sustain. Energy Grids Netw. 2024, 38, 101342.
29. Wang, R.; Bai, L.; Rayhana, R.; Liu, Z. Personalized federated learning for buildings energy consumption forecasting. Energy Build. 2024, 323, 114762.
Models | Region 1 | Region 2 | Region 3 | Region 4 | Region 5 | Region 6 | Region 7 | Region 8 |
---|---|---|---|---|---|---|---|---|
AvgFL-SVR | 0.188 | 0.181 | 0.207 | 0.237 | 0.212 | 0.205 | 0.250 | 0.213 |
AvgFL-BiLSTM | 0.182 | 0.176 | 0.201 | 0.231 | 0.198 | 0.199 | 0.233 | 0.199 |
AvgFL-MSattention | 0.177 | 0.173 | 0.198 | 0.227 | 0.194 | 0.196 | 0.231 | 0.197 |
FedCor-SVR | 0.179 | 0.176 | 0.202 | 0.235 | 0.209 | 0.204 | 0.247 | 0.211 |
FedCor-BiLSTM | 0.166 | 0.164 | 0.193 | 0.229 | 0.195 | 0.199 | 0.231 | 0.199 |
FedCor-MSattention | 0.162 | 0.161 | 0.189 | 0.223 | 0.191 | 0.192 | 0.227 | 0.195 |
CNN-LSTM FED | 0.158 | 0.155 | 0.185 | 0.216 | 0.186 | 0.183 | 0.211 | 0.189 |
Adaptive Stacked LSTM | 0.151 | 0.147 | 0.177 | 0.217 | 0.181 | 0.181 | 0.213 | 0.181 |
SparseMoE | 0.147 | 0.147 | 0.178 | 0.217 | 0.183 | 0.176 | 0.219 | 0.183 |
FedHMGAT | 0.141 | 0.139 | 0.172 | 0.210 | 0.178 | 0.173 | 0.209 | 0.180 |
Models | Region 1 | Region 2 | Region 3 | Region 4 | Region 5 | Region 6 | Region 7 | Region 8 |
---|---|---|---|---|---|---|---|---|
FedHMGAT-local | 0.156 | 0.154 | 0.183 | 0.226 | 0.183 | 0.186 | 0.222 | 0.196 |
FedHMGAT | 0.141 | 0.139 | 0.172 | 0.210 | 0.178 | 0.173 | 0.209 | 0.180 |
Models | Zone 1 | Zone 2 | Zone 3 | Zone 4 | Zone 5 | Zone 6 |
---|---|---|---|---|---|---|
AvgFL-SVR | 0.178 | 0.179 | 0.258 | 0.257 | 0.215 | 0.217 |
AvgFL-BiLSTM | 0.162 | 0.166 | 0.234 | 0.237 | 0.197 | 0.197 |
AvgFL-MSattention | 0.161 | 0.164 | 0.233 | 0.234 | 0.194 | 0.196 |
FedCor-SVR | 0.171 | 0.173 | 0.251 | 0.253 | 0.213 | 0.211 |
FedCor-BiLSTM | 0.157 | 0.159 | 0.229 | 0.229 | 0.202 | 0.201 |
FedCor-MSattention | 0.157 | 0.161 | 0.227 | 0.229 | 0.196 | 0.198 |
CNN-LSTM FED | 0.139 | 0.144 | 0.207 | 0.209 | 0.192 | 0.194 |
Adaptive Stacked LSTM | 0.137 | 0.140 | 0.215 | 0.216 | 0.188 | 0.191 |
SparseMoE | 0.131 | 0.133 | 0.209 | 0.208 | 0.189 | 0.190 |
FedHMGAT | 0.127 | 0.131 | 0.199 | 0.197 | 0.181 | 0.183 |
Models | Zone 1 | Zone 2 | Zone 3 | Zone 4 | Zone 5 | Zone 6 |
---|---|---|---|---|---|---|
FedHMGAT-local | 0.138 | 0.143 | 0.209 | 0.206 | 0.188 | 0.191 |
FedHMGAT | 0.127 | 0.131 | 0.199 | 0.197 | 0.181 | 0.183 |
Models | Zone 1 | Zone 2 | Zone 3 | Zone 4 | Zone 5 | Zone 6 | Zone 7 | Zone 8 | Zone 9 | Zone 10 |
---|---|---|---|---|---|---|---|---|---|---|
AvgFL-SVR | 0.164 | 0.163 | 0.193 | 0.191 | 0.185 | 0.181 | 0.199 | 0.200 | 0.229 | 0.222 |
AvgFL-BiLSTM | 0.152 | 0.149 | 0.183 | 0.182 | 0.176 | 0.178 | 0.189 | 0.190 | 0.217 | 0.213 |
AvgFL-MSattention | 0.149 | 0.149 | 0.179 | 0.178 | 0.168 | 0.170 | 0.189 | 0.188 | 0.213 | 0.209 |
FedCor-SVR | 0.155 | 0.153 | 0.190 | 0.188 | 0.182 | 0.181 | 0.198 | 0.197 | 0.222 | 0.217 |
FedCor-BiLSTM | 0.147 | 0.146 | 0.180 | 0.179 | 0.173 | 0.170 | 0.189 | 0.187 | 0.212 | 0.209 |
FedCor-MSattention | 0.144 | 0.145 | 0.177 | 0.176 | 0.167 | 0.165 | 0.186 | 0.187 | 0.209 | 0.205 |
CNN-LSTM FED | 0.139 | 0.138 | 0.171 | 0.169 | 0.162 | 0.164 | 0.179 | 0.180 | 0.194 | 0.192 |
Adaptive Stacked LSTM | 0.136 | 0.135 | 0.169 | 0.166 | 0.155 | 0.157 | 0.181 | 0.179 | 0.203 | 0.204 |
SparseMoE | 0.133 | 0.136 | 0.166 | 0.164 | 0.158 | 0.156 | 0.180 | 0.180 | 0.199 | 0.202 |
FedHMGAT | 0.126 | 0.125 | 0.161 | 0.159 | 0.147 | 0.149 | 0.171 | 0.169 | 0.197 | 0.196 |
Models | Zone 1 | Zone 2 | Zone 3 | Zone 4 | Zone 5 | Zone 6 | Zone 7 | Zone 8 | Zone 9 | Zone 10 |
---|---|---|---|---|---|---|---|---|---|---|
FedHMGAT-local | 0.146 | 0.149 | 0.179 | 0.176 | 0.170 | 0.167 | 0.189 | 0.188 | 0.211 | 0.205 |
FedHMGAT | 0.126 | 0.125 | 0.161 | 0.159 | 0.147 | 0.149 | 0.171 | 0.169 | 0.197 | 0.196 |
Models | Zone 1 | Zone 2 | Zone 3 | Zone 4 | Zone 5 | Zone 6 | Zone 7 | Zone 8 | Zone 9 | Zone 10 |
---|---|---|---|---|---|---|---|---|---|---|
FedHMGAT-temporal | 0.140 | 0.135 | 0.167 | 0.166 | 0.165 | 0.165 | 0.182 | 0.183 | 0.201 | 0.205 |
FedHMGAT-AE | 0.129 | 0.128 | 0.161 | 0.169 | 0.162 | 0.164 | 0.179 | 0.180 | 0.194 | 0.192 |
FedHMGAT-Avg | 0.131 | 0.130 | 0.166 | 0.163 | 0.153 | 0.153 | 0.175 | 0.172 | 0.200 | 0.202 |
FedHMGAT-AttAE | 0.127 | 0.129 | 0.163 | 0.161 | 0.148 | 0.149 | 0.173 | 0.170 | 0.199 | 0.199 |
FedHMGAT | 0.126 | 0.125 | 0.161 | 0.159 | 0.147 | 0.149 | 0.171 | 0.169 | 0.197 | 0.196 |