Heterogeneous Federated Learning via Knowledge Transfer Guided by Global Pseudo Proxy Data
Abstract
1. Introduction
- (1) Global-to-local knowledge transfer to mitigate data-heterogeneity-induced bias: We propose a global knowledge-guided local model optimization module that transfers knowledge from the global model to local models via global pseudo-data, thereby addressing the classification bias caused by data heterogeneity (see the sketch after this list).
- (2) Noise-filtered generation for robust pseudo-data construction: We design an optimization and filtering mechanism for pseudo-data generation that mitigates the negative impact of noisy samples and ensures the fidelity of the transferred knowledge.
- (3) Extensive empirical validation under heterogeneous settings: We validate the proposed approach on widely used benchmark datasets and demonstrate superior federated classification accuracy compared with state-of-the-art methods under non-IID data distributions.
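The core of contribution (1) is aligning each local model with the global model on server-generated pseudo proxy samples. The following is a minimal PyTorch-style sketch of such an alignment term; the function and argument names, the temperature, and the assumption that pseudo samples arrive as a ready-made batch are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def global_to_local_transfer_loss(local_model, global_model, pseudo_x, temperature=2.0):
    """Align the local (student) model with the global (teacher) model on pseudo proxy samples.

    `pseudo_x` stands in for a batch produced by the server-side generator; the
    temperature and all names here are illustrative assumptions, not reported settings.
    """
    with torch.no_grad():
        teacher_logits = global_model(pseudo_x)  # global-model predictions, no gradient
    student_logits = local_model(pseudo_x)       # local-model predictions
    # KL divergence between temperature-softened distributions (standard distillation form).
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
```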
2. Related Work
2.1. Heterogeneous Federated Learning
2.2. Federated Learning with Knowledge Distillation
3. Methodology
3.1. Problem Definition
3.2. Localized Personalized Knowledge Transfer
3.3. Local Model Optimization Guided by Global Knowledge
3.4. Global Aggregation of Local Models
4. Experiments
4.1. Dataset Description
- (1) Benchmark Datasets
- (2) Dirichlet-based Non-IID Datasets (a partitioning sketch follows this list)
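Dirichlet-based non-IID splits are commonly built by drawing per-class client proportions from a Dirichlet distribution; the sketch below illustrates this common construction under assumed defaults (the α value, seed, and function name are not taken from the paper).

```python
import numpy as np


def dirichlet_partition(labels, num_clients=20, alpha=0.5, seed=0):
    """Assign sample indices to clients with a Dirichlet(alpha) label skew.

    Smaller alpha yields more heterogeneous (non-IID) client label distributions;
    alpha=0.5 and the function name are illustrative, not the paper's exact protocol.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx_c = rng.permutation(np.where(labels == c)[0])
        # Fraction of class-c samples assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cut_points = (np.cumsum(proportions)[:-1] * len(idx_c)).astype(int)
        for client_id, shard in enumerate(np.split(idx_c, cut_points)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices


# Example with toy labels for 10 classes split across 20 clients:
# parts = dirichlet_partition(np.repeat(np.arange(10), 100), num_clients=20, alpha=0.1)
```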
4.2. Experimental Setting and Evaluation Metrics
4.3. Comparative Analysis of Experimental Results
4.4. Analysis of Module Effectiveness
4.5. Ablation Study
- (1)
- Train the local model solely based on the prediction loss between the local model’s output and the true labels from a single batch of user data.
- (2)
- Building on experiment (1), a global generator is trained on the server side. The final total loss for training the local model includes both the cross-entropy loss between the local model’s predictions and all true user labels, and a latent loss measuring the discrepancy between the local model’s predictions on real user data and those on pseudo-samples generated by the global generator. No pseudo-sample filter is applied here; the global generator attempts to produce pseudo-samples that approximate the user’s real data as closely as possible.
- (3)
- Based on experiment (2), a pseudo-sample filter is added to assist the training of the global generator. This filter removes noisy information from the generated samples, resulting in more realistic pseudo-samples.
- (4)
- Building on experiment (3), knowledge distillation from the global model to the local model is incorporated during local training. Specifically, the KL divergence between the local model’s and global model’s predictions is added as a distillation loss with a weighting factor of 0.01, enabling knowledge transfer between the server and clients.
- (5)
- Extending experiment (4), a local generator is randomly assigned to one user for training, analogous to the global generator. A pseudo-sample filter is also applied to the local generator to remove noise. During local model training, an attention mechanism is introduced to focus the generator’s output on key global information. Additionally, a KL divergence loss between the local model’s and global model’s predictions is constructed as a generative distillation loss with a weight of 0.01. This enables the powerful global model to guide the training of the local model, with knowledge distillation from experiment (4) serving as auxiliary supervision.
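As a rough guide to how the ablated components combine, the sketch below assembles the terms described in experiments (1)-(5) into one local objective. The 0.01 weights follow the description above; the concrete form of the latent loss, the absence of a distillation temperature, and all names are illustrative assumptions, and the pseudo batches are assumed to have already passed the pseudo-sample filter of experiment (3).

```python
import torch
import torch.nn.functional as F


def local_training_loss(local_model, global_model, x_real, y_real,
                        pseudo_global, pseudo_local,
                        kd_weight=0.01, gen_kd_weight=0.01):
    """Composite local objective mirroring ablation experiments (1)-(5)."""
    # (1) Supervised cross-entropy on the client's real batch.
    logits_real = local_model(x_real)
    ce = F.cross_entropy(logits_real, y_real)

    # (2) Latent loss: discrepancy between predictions on real data and on
    #     (filtered) pseudo-samples from the global generator.
    logits_pseudo = local_model(pseudo_global)
    latent = F.mse_loss(logits_pseudo.mean(dim=0), logits_real.detach().mean(dim=0))

    # (4) Global-to-local distillation on the real batch, weight 0.01.
    with torch.no_grad():
        global_logits_real = global_model(x_real)
    kd = F.kl_div(F.log_softmax(logits_real, dim=1),
                  F.softmax(global_logits_real, dim=1), reduction="batchmean")

    # (5) Generative distillation on (filtered) local-generator pseudo-samples, weight 0.01.
    with torch.no_grad():
        global_logits_pseudo = global_model(pseudo_local)
    gen_kd = F.kl_div(F.log_softmax(local_model(pseudo_local), dim=1),
                      F.softmax(global_logits_pseudo, dim=1), reduction="batchmean")

    return ce + latent + kd_weight * kd + gen_kd_weight * gen_kd
```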
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Shrivastava, A. Privacy-Centric AI: Navigating the Landscape with Federated Learning. Int. J. Res. Appl. Sci. Eng. Technol. (IJRASET) 2024, 12, 357–363.
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 20–22 April 2017; PMLR; pp. 1273–1282.
- Smith, V.; Chiang, C.K.; Sanjabi, M.; Talwalkar, A.S. Federated multi-task learning. Adv. Neural Inf. Process. Syst. 2017, 30.
- Lim, W.Y.B.; Luong, N.C.; Hoang, D.T.; Jiao, Y.; Liang, Y.C.; Yang, Q.; Niyato, D.; Miao, C. Federated learning in mobile edge networks: A comprehensive survey. IEEE Commun. Surv. Tutor. 2020, 22, 2031–2063.
- Wang, L. Heterogeneous data and big data analytics. Autom. Control Inf. Sci. 2017, 3, 8–15.
- Li, H.; Reynolds, J.F. On definition and quantification of heterogeneity. Oikos 1995, 73, 280–284.
- Huang, W.; Ye, M.; Du, B. Learn from others and be yourself in heterogeneous federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10143–10153.
- Chen, H.; Wang, Y.; Xu, C.; Yang, Z.; Liu, C.; Shi, B.; Xu, C.; Xu, C.; Tian, Q. Data-free learning of student networks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 3514–3522.
- Pan, H.; Wang, C.; Qiu, M.; Zhang, Y.; Li, Y.; Huang, J. Meta-KD: A meta knowledge distillation framework for language model compression across domains. arXiv 2020, arXiv:2012.01266.
- Zhu, Z.; Hong, J.; Zhou, J. Data-free knowledge distillation for heterogeneous federated learning. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; PMLR; pp. 12878–12889.
- Yurochkin, M.; Agarwal, M.; Ghosh, S.; Greenewald, K.; Hoang, N.; Khazaeni, Y. Bayesian nonparametric federated learning of neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; PMLR; pp. 7252–7261.
- Wang, H.; Yurochkin, M.; Sun, Y.; Papailiopoulos, D.; Khazaeni, Y. Federated learning with matched averaging. arXiv 2020, arXiv:2002.06440.
- Chen, H.Y.; Chao, W.L. Fedbe: Making bayesian model ensemble applicable to federated learning. arXiv 2020, arXiv:2009.01974.
- Yu, F.; Zhang, W.; Qin, Z.; Xu, Z.; Wang, D.; Liu, C.; Tian, Z.; Chen, X. Fed2: Feature-aligned federated learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021; pp. 2066–2074.
- Corinzia, L.; Beuret, A.; Buhmann, J.M. Variational federated multi-task learning. arXiv 2019, arXiv:1906.06268.
- Jiang, Y.; Konečný, J.; Rush, K.; Kannan, S. Improving federated learning personalization via model agnostic meta learning. arXiv 2019, arXiv:1909.12488.
- Duan, M.; Liu, D.; Chen, X.; Tan, Y.; Ren, J.; Qiao, L.; Liang, L. Astraea: Self-balancing federated learning for improving classification accuracy of mobile deep learning applications. In Proceedings of the 2019 IEEE 37th International Conference on Computer Design (ICCD), Abu Dhabi, United Arab Emirates, 17–20 November 2019; pp. 246–254.
- Alansary, S.A.; Ayyad, S.M.; Talaat, F.M.; Saafan, M.M. Emerging AI threats in cybercrime: A review of zero-day attacks via machine, deep, and federated learning. Knowl. Inf. Syst. 2025, 67, 10951–10987.
- Yang, L.; Miao, Y.; Liu, Z.; Liu, Z.; Li, X.; Kuang, D.; Li, H.; Deng, R.H. Enhanced model poisoning attack and multi-strategy defense in federated learning. IEEE Trans. Inf. Forensics Secur. 2025, 20, 3877–3892.
- Acar, D.A.E.; Zhao, Y.; Zhu, R.; Matas, R.; Mattina, M.; Whatmough, P.; Saligrama, V. Debiasing model updates for improving personalized federated training. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; PMLR; pp. 21–31.
- Shoham, N.; Avidor, T.; Keren, A.; Israel, N.; Benditkis, D.; Mor-Yosef, L.; Zeitak, I. Overcoming forgetting in federated learning on non-iid data. arXiv 2019, arXiv:1910.07796.
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450.
- T Dinh, C.; Tran, N.; Nguyen, J. Personalized federated learning with moreau envelopes. Adv. Neural Inf. Process. Syst. 2020, 33, 21394–21405.
- Li, Q.; He, B.; Song, D. Model-contrastive federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10713–10722.
- Mu, X.; Shen, Y.; Cheng, K.; Geng, X.; Fu, J.; Zhang, T.; Zhang, Z. Fedproc: Prototypical contrastive federated learning on non-iid data. Future Gener. Comput. Syst. 2023, 143, 93–104.
- Yoon, T.; Shin, S.; Hwang, S.J.; Yang, E. Fedmix: Approximation of mixup under mean augmented federated learning. arXiv 2021, arXiv:2107.00233.
- Xu, X.; Li, H.; Li, Z.; Zhou, X. Safe: Synergic data filtering for federated learning in cloud-edge computing. IEEE Trans. Ind. Inform. 2022, 19, 1655–1665.
- Liu, L.; Zhang, J.; Song, S.H.; Letaief, K.B. Communication-efficient federated distillation with active data sampling. In Proceedings of the ICC 2022-IEEE International Conference on Communications, Seoul, Republic of Korea, 16–20 May 2022; pp. 201–206.
- He, C.; Annavaram, M.; Avestimehr, S. Group knowledge transfer: Federated learning of large cnns at the edge. Adv. Neural Inf. Process. Syst. 2020, 33, 14068–14080.
- Fang, X.; Ye, M. Robust federated learning with noisy and heterogeneous clients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10072–10081.
- Huang, W.; Ye, M.; Du, B.; Gao, X. Few-shot model agnostic federated learning. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 7309–7316.
- Li, D.; Wang, J. Fedmd: Heterogenous federated learning via model distillation. arXiv 2019, arXiv:1910.03581.
- Chang, H.; Shejwalkar, V.; Shokri, R.; Houmansadr, A. Cronus: Robust and heterogeneous collaborative learning with black-box knowledge transfer. arXiv 2019, arXiv:1912.11279.
- Ozkara, K.; Singh, N.; Data, D.; Diggavi, S. Quped: Quantized personalization via distillation with applications to federated learning. Adv. Neural Inf. Process. Syst. 2021, 34, 3622–3634.
- Lin, T.; Kong, L.; Stich, S.U.; Jaggi, M. Ensemble distillation for robust model fusion in federated learning. Adv. Neural Inf. Process. Syst. 2020, 33, 2351–2363.
- Zhang, L.; Shen, L.; Ding, L.; Tao, D.; Duan, L.Y. Fine-tuning global model via data-free knowledge distillation for non-iid federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10174–10183.
- Zhang, J.; Chen, C.; Li, B.; Lyu, L.; Wu, S.; Ding, S.; Shen, C.; Wu, C. Dense: Data-free one-shot federated learning. Adv. Neural Inf. Process. Syst. 2022, 35, 21414–21428.
- Heinbaugh, C.E.; Luz-Ricca, E.; Shao, H. Data-free one-shot federated learning under very high statistical heterogeneity. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
- Wu, Z.; Sun, S.; Wang, Y.; Liu, M.; Pan, Q.; Zhang, J.; Li, Z.; Liu, Q. Exploring the distributed knowledge congruence in proxy-data-free federated distillation. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–34.
- Zhang, J.; Chen, C.; Zhuang, W.; Lyu, L. Target: Federated class-continual learning via exemplar-free distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 4782–4793.
- Zhu, Y.; Li, X.; Wu, Z.; Wu, D.; Hu, M.; Li, R.H. FedTAD: Topology-aware Data-free Knowledge Distillation for Subgraph Federated Learning. arXiv 2024, arXiv:2404.14061.
- Wang, S.; Fu, Y.; Li, X.; Lan, Y.; Gao, M. DFRD: Data-Free Robustness Distillation for Heterogeneous Federated Learning. Adv. Neural Inf. Process. Syst. 2023, 36, 17854–17866.
- Zhao, S.; Liao, T.; Fu, L.; Chen, C.; Bian, J.; Zheng, Z. Data-free knowledge distillation via generator-free data generation for Non-IID federated learning. Neural Netw. 2024, 179, 106627.

| Parameter Name | Value |
|---|---|
| Number of Clients | 20 |
| Global Training Rounds | 200 |
| Local Training Rounds | 20 |
| Local Training Batch Size | 32 |
| Local Training Learning Rate | 0.01 |
| Generator Batch Size | 32 |
| Backbone Network | ResNet-18 |
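For reference, the settings in the table above can be collected into a single configuration object; a minimal sketch follows (the dictionary keys are illustrative, not taken from the authors' code).

```python
# Hyperparameters mirroring the parameter table; key names are illustrative.
CONFIG = {
    "num_clients": 20,            # Number of Clients
    "global_rounds": 200,         # Global Training Rounds
    "local_rounds": 20,           # Local Training Rounds
    "local_batch_size": 32,       # Local Training Batch Size
    "local_learning_rate": 0.01,  # Local Training Learning Rate
    "generator_batch_size": 32,   # Generator Batch Size
    "backbone": "ResNet-18",      # Backbone Network
}
```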
| Method | |||
|---|---|---|---|
| FedAvg | 88.43% | 90.31% | 94.73% |
| FedProx | 87.94% | 90.21% | 94.55% |
| FedDistill | 60.29% | 60.89% | 80.06% |
| FedEnsemble | 89.35% | 91.50% | 94.70% |
| FedGen | 94.52% | 95.52% | 97.29% |
| FedKDG | 97.08% | 98.83% | 99.27% |
| Method | |||
|---|---|---|---|
| FedAvg | 66.89% | 70.46% | 78.53% |
| FedProx | 65.93% | 69.67% | 77.71% |
| FedDistill | 40.52% | 45.15% | 60.26% |
| FedEnsemble | 67.33% | 70.69% | 78.65% |
| FedGen | 75.06% | 78.53% | 84.20% |
| FedKDG | 88.47% | 93.00% | 94.61% |
| Method | |||
|---|---|---|---|
| FedAvg | 33.29% | 39.85% | 48.46% |
| FedProx | 33.20% | 39.38% | 48.05% |
| FedDistill | 41.79% | 33.19% | 28.84% |
| FedEnsemble | 36.81% | 42.13% | 49.30% |
| FedGen | 40.89% | 45.29% | 53.42% |
| FedKDG | 45.85% | 51.21% | 64.48% |
| Method | |||
|---|---|---|---|
| FedAvg | 14.35% | 17.56% | 21.07% |
| FedProx | 14.31% | 17.35% | 20.89% |
| FedDistill | 18.01% | 14.62% | 12.54% |
| FedEnsemble | 15.87% | 18.56% | 21.43% |
| FedGen | 17.63% | 19.95% | 23.23% |
| FedKDG | 19.76% | 22.42% | 28.04% |
| Ex | Global Generator | Filter | Global Distillation | Local Distillation | Accuracy |
|---|---|---|---|---|---|
| (1) | ✗ | ✗ | ✗ | ✗ | 33.29% |
| (2) | ✓ | ✗ | ✗ | ✗ | 41.64% |
| (3) | ✓ | ✓ | ✗ | ✗ | 41.88% |
| (4) | ✓ | ✓ | ✓ | ✗ | 41.50% |
| (5) | ✓ | ✓ | ✓ | ✓ | 45.85% |