pFedZKD: A One-Shot Personalized Federated Learning Framework via Evolutionary Architecture Search and Data-Free Distillation
Abstract
1. Introduction
- We propose pFedZKD, a data-free one-shot federated learning framework tailored for structurally heterogeneous scenarios. The proposed framework breaks away from the prevailing reliance on predefined homogeneous model architectures by introducing a decouple-and-reconstruct paradigm. Without requiring structural alignment across clients, pFedZKD enables efficient personalized collaboration while preserving data privacy.
- We design PSO-FedNAS, an adaptive federated neural architecture search algorithm based on particle swarm optimization. PSO-FedNAS empowers each client to autonomously evolve a customized convolutional neural network architecture according to its local data distribution. This design enables architectural heterogeneity and personalization without relying on cloud-side computational resources.
- We develop a structure-agnostic multi-teacher zero-shot knowledge distillation (Multi-ZSKD) mechanism for global knowledge aggregation. Model heterogeneity makes client parameters incompatible, so the proposed mechanism moves aggregation from the parameter space to the semantic space through generative pseudo-samples. By exploiting the ensemble knowledge of heterogeneous teachers, Multi-ZSKD enables robust data-free knowledge transfer within a single communication round.
- We conduct extensive empirical evaluations on multiple non-IID benchmark datasets. Experimental results demonstrate that, under simultaneous data and model heterogeneity, pFedZKD outperforms state-of-the-art methods in personalized accuracy, communication efficiency, and global generalization capability in most of the evaluated settings, while remaining competitive on SVHN.
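To make the two mechanisms named in these contributions more concrete, the following is a minimal Python sketch written under stated assumptions. It is not the authors' implementation. `pso_search` runs an inertia-weighted particle swarm over a hypothetical integer encoding of a CNN (for example, number of convolutional blocks and number of filters) and rounds continuous particle positions to discrete genes. Its fitness function is a placeholder for whatever local criterion PSO-FedNAS actually uses. `multi_teacher_kd_loss` assumes that Multi-ZSKD builds a soft label as the unweighted mean of temperature-softened teacher probabilities on one pseudo-sample, and then trains the student with a T²-scaled KL divergence in the style of Hinton et al. The teacher weighting, temperature, and swarm coefficients used in the paper are not given in this section, so all defaults below are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

# ---- Part 1: PSO over a hypothetical integer architecture encoding ----

def pso_search(fitness: Callable[[List[int]], float],
               lower: Sequence[int], upper: Sequence[int],
               n_particles: int = 8, n_iters: int = 30,
               inertia: float = 0.7, c1: float = 1.5, c2: float = 1.5,
               seed: int = 0) -> List[int]:
    """Maximise `fitness` over integer vectors x with lower[d] <= x[d] <= upper[d]."""
    rng = random.Random(seed)
    dim = len(lower)

    def clip(v: float, d: int) -> float:
        return max(lower[d], min(upper[d], v))

    def decode(p: List[float]) -> List[int]:
        # Continuous particle position -> discrete architecture genes.
        return [int(round(clip(p[d], d))) for d in range(dim)]

    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(decode(p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia-weighted velocity update toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = clip(pos[i][d] + vel[i][d], d)
            f = fitness(decode(pos[i]))
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest)

# ---- Part 2: multi-teacher soft labels and a KD loss (assumed form) ----

def softmax(logits: Sequence[float], T: float = 1.0) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_labels(teacher_logits: Sequence[Sequence[float]], T: float) -> List[float]:
    # Assumption: plain (unweighted) mean of temperature-softened teacher probabilities.
    probs = [softmax(logits, T) for logits in teacher_logits]
    return [sum(p[c] for p in probs) / len(probs) for c in range(len(probs[0]))]

def multi_teacher_kd_loss(student_logits: Sequence[float],
                          teacher_logits: Sequence[Sequence[float]],
                          T: float = 4.0) -> float:
    """T^2-scaled KL(ensemble || student) on a single pseudo-sample."""
    target = ensemble_soft_labels(teacher_logits, T)
    student = softmax(student_logits, T)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(target, student) if t > 0)
    return T * T * kl

# Toy usage: a hypothetical fitness preferring 3 conv blocks with 64 filters,
# and two heterogeneous "teachers" that only need to expose logits.
best = pso_search(lambda x: -((x[0] - 3) ** 2 + ((x[1] - 64) / 16) ** 2),
                  lower=[1, 16], upper=[5, 128])
loss = multi_teacher_kd_loss([2.0, 0.5, -1.0], [[1.5, 0.2, -0.8], [2.5, 1.0, -1.2]])
print(best, round(loss, 4))
```

The KD loss consumes only logits, so teachers with entirely different architectures can contribute to the same soft label. This is the property that lets aggregation happen in the semantic space rather than the parameter space.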
2. Related Work and Preliminaries
2.1. Personalized Federated Learning
2.2. Federated Distillation
2.3. Preliminaries
2.3.1. Conventional Federated Learning
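| Algorithm 1 Federated Averaging (FedAvg) |
|---|
| 1: Input: private training datasets $\{D_k\}_{k=1}^{K}$, where client $k$ holds $n_k$ samples; initial global model $w^0$; number of clients $K$; communication rounds $R$; local epochs $E$; batch size $B$; client fraction $C$; learning rate $\eta$ |
| 2: Output: final global model $w^R$ |
| 3: Server-Side Execution: |
| 4: for $r = 0$ to $R-1$ do |
| 5: Sample a client subset $S_r$ with $\lvert S_r \rvert = \max(\lfloor C \cdot K \rfloor, 1)$ |
| 6: for all clients $k \in S_r$ in parallel do |
| 7: $w_k^{r+1} \leftarrow \mathrm{ClientUpdate}(k, w^r)$ |
| 8: end for |
| 9: $w^{r+1} \leftarrow \sum_{k \in S_r} \frac{n_k}{n} w_k^{r+1}$, where $n = \sum_{k \in S_r} n_k$ |
| 10: end for |
| 11: Function ClientUpdate($k$, $w$): |
| 12: $\mathcal{B} \leftarrow$ split $D_k$ into mini-batches of size $B$ |
| 13: for $e = 1$ to $E$ do |
| 14: for all mini-batches $b \in \mathcal{B}$ do |
| 15: $w \leftarrow w - \eta \nabla \ell(w; b)$, where $\ell$ is the local training loss |
| 16: end for |
| 17: end for |
| 18: return $w$ |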
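Algorithm 1 (FedAvg) can be illustrated with a minimal pure-Python sketch. It assumes a toy one-parameter linear model $y \approx w x$ trained with the squared loss. It simulates the "in parallel" client loop sequentially. The names `client_update` and `fedavg` are illustrative and do not come from the original. The server weights each returned local model by $n_k/n$, with $n$ summed over the sampled clients only.

```python
import random
from typing import List, Sequence, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs for a toy 1-D model y ~ w * x


def client_update(w: float, data: Dataset, epochs: int, batch_size: int,
                  lr: float, rng: random.Random) -> float:
    """ClientUpdate: E epochs of mini-batch SGD on the loss 0.5 * (w*x - y)^2."""
    data = list(data)  # local copy so shuffling does not touch the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            grad = sum((w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w


def fedavg(datasets: Sequence[Dataset], w0: float, rounds: int, epochs: int,
           batch_size: int, frac: float, lr: float, seed: int = 0) -> float:
    """Server loop: sample max(floor(C*K), 1) clients, then average by n_k / n."""
    rng = random.Random(seed)
    num_clients = len(datasets)
    m = max(int(frac * num_clients), 1)
    w = w0
    for _ in range(rounds):
        selected = rng.sample(range(num_clients), m)
        local_models = [client_update(w, datasets[k], epochs, batch_size, lr, rng)
                        for k in selected]
        n = sum(len(datasets[k]) for k in selected)  # samples held by sampled clients
        w = sum(len(datasets[k]) / n * w_k for k, w_k in zip(selected, local_models))
    return w


# Toy usage: three clients of different sizes whose noise-free data follow y = 2x.
clients = [
    [(x, 2.0 * x) for x in (0.2, 0.4, 0.6)],
    [(x, 2.0 * x) for x in (0.5, 0.7, 0.9, 1.0)],
    [(x, 2.0 * x) for x in (0.3, 0.8)],
]
w_final = fedavg(clients, w0=0.0, rounds=50, epochs=2, batch_size=2, frac=0.7, lr=0.5)
print(f"global w after FedAvg: {w_final:.4f}")
```

Every toy client's data satisfy y = 2x exactly. Each local SGD step therefore shrinks the distance between w and 2, and the weighted average preserves that contraction, so the global parameter approaches 2.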
2.3.2. Knowledge Distillation
2.3.3. Particle Swarm Optimization
3. Proposed Method: pFedZKD Framework
3.1. Overview of the Decouple-and-Reconstruct Paradigm
3.2. Client-Side Decoupling: Personalized Architecture Search via PSO-FedNAS
3.2.1. Search Space Definition and Particle Encoding
3.2.2. Fitness Evaluation and Local Optimization Strategy
| Algorithm 2 PSO-FedNAS: Client-Side Architecture Autonomy |
3.2.3. Handling Model Heterogeneity via Architecture Autonomy
3.3. Server-Side Reconstruction: Structure-Agnostic Knowledge Aggregation
3.3.1. Model-Specific Data Inversion Without Real Data
| Algorithm 3 Server-Side Model-Specific Pseudo-Image Inversion |
3.3.2. Soft Label Construction from Heterogeneous Teachers
3.3.3. Multi-Teacher Zero-Shot Knowledge Distillation
3.4. Communication and Computational Analysis
| Algorithm 4 pFedZKD: Personalized One-Shot Federated Learning via PSO-FedNAS and Multi-ZSKD |
4. Experiments, Results and Analysis
4.1. Experimental Setup
4.1.1. Datasets and Non-IID Partitioning
- MNIST and Fashion-MNIST: MNIST is a foundational benchmark for handwritten digit recognition, consisting of standardized 28 × 28 grayscale images. To introduce a more challenging task at the same spatial resolution, we additionally employ Fashion-MNIST, which replaces the digit classes with ten categories of clothing items (e.g., coats and footwear). These two datasets are primarily used to assess the learning capability and stability of the models in scenarios with low visual complexity and highly structured patterns.
- SVHN: The Street View House Numbers (SVHN) dataset represents a more realistic visual scenario with increased complexity. Whereas MNIST digits appear on clean backgrounds, the over 600,000 color digit images of SVHN (including its extra training split) are strongly affected by variations in illumination, motion blur, and background clutter. This dataset is employed to evaluate the robustness of the proposed framework on complex natural scenes with substantial noise interference.
- CIFAR-10: As a mainstream benchmark for generic object recognition, CIFAR-10 consists of ten mutually exclusive object categories (e.g., airplanes, automobiles, birds, and cats). Its high intra-class variability and complex background textures pose substantial challenges to the feature representation learning of personalized models.
4.1.2. Hyperparameter Settings and Baselines
4.2. Performance Comparison
4.2.1. Personalized Model Performance on Heterogeneous Clients
4.2.2. Server-Side Comparison of pFedZKD with Baseline and SOTA Methods
4.2.3. Robustness Under Extreme Data Heterogeneity
4.2.4. Visualization of Generated Pseudo Images
4.3. Ablation Study
4.3.1. Impact of Architecture Search (PSO-FedNAS)
- Fixed Homogeneous Architecture (pFedZKD-Hom): the client-side architecture search mechanism is removed, and all clients are forced to deploy an identical model architecture, thereby simulating a conventional homogeneous federated learning setting;
- Random Heterogeneous Architecture (pFedZKD-RandPool): the search optimization process is removed, and each client is randomly assigned a network architecture from a predefined heterogeneous model pool, simulating a heterogeneous federated learning scenario without optimization guidance.

Apart from the strategy used to generate client models, these variants share the same data partitioning scheme, training hyperparameters, and distillation procedure as the complete pFedZKD framework. The predefined heterogeneous model pool used by pFedZKD-RandPool comprises the following architectures:

- CNN1 & CNN2: lightweight convolutional networks. CNN1 comprises two convolutional layers (32–64 channels) followed by a single fully connected layer, making it suitable for extremely resource-constrained devices; CNN2 strengthens feature representation through deeper convolutional stacking and a wider fully connected layer (256 units);
- MobileNetV2: an efficient architecture based on depthwise separable convolutions, which achieves strong classification performance while maintaining low computational complexity, representing a typical mobile-oriented network design;
- VGG11: a relatively high-capacity model built on deep convolutional stacking, used to emulate edge nodes with comparatively ample computational resources;
- ResNet18: a structurally more complex model that incorporates residual connections, with a larger parameter scale and a more complex optimization landscape, used to represent nodes with stronger computational capabilities.

Randomly assigning these architectures to clients emulates the device heterogeneity of realistic edge environments and provides a reasonable baseline for evaluating the optimization capability of PSO-FedNAS under heterogeneous resource constraints.
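The two ablation assignment strategies can be illustrated with a short, hypothetical Python sketch. `MODEL_POOL` lists the five architectures named above as plain labels, and `assign_architectures` is an illustrative helper. The choice of `CNN2` as the shared architecture for pFedZKD-Hom is an assumption, because the section does not state which architecture the homogeneous variant deploys. A seeded generator keeps the random draw reproducible across runs.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence

# The five architectures listed above, as plain labels (hypothetical representation).
MODEL_POOL: List[str] = ["CNN1", "CNN2", "MobileNetV2", "VGG11", "ResNet18"]


def assign_architectures(num_clients: int, strategy: str,
                         pool: Sequence[str] = MODEL_POOL,
                         hom_arch: str = "CNN2", seed: int = 0) -> Dict[int, str]:
    """Map client id -> architecture label for the two ablation variants.

    "hom":      every client deploys the same architecture (pFedZKD-Hom).
    "randpool": each client draws uniformly from the pool (pFedZKD-RandPool).
    hom_arch is an assumption; the section does not name the shared model.
    """
    if strategy == "hom":
        return {k: hom_arch for k in range(num_clients)}
    if strategy == "randpool":
        rng = random.Random(seed)  # seeded so the assignment is reproducible
        return {k: rng.choice(pool) for k in range(num_clients)}
    raise ValueError(f"unknown strategy: {strategy!r}")


# Usage: how many of 10 clients receive each architecture under RandPool.
print(Counter(assign_architectures(10, "randpool", seed=42).values()))
```

Neither strategy consults the client's data, which is exactly what separates both variants from the full framework, where PSO-FedNAS searches each client's architecture against its local distribution.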
4.3.2. Impact of Multi-Teacher Zero-Shot Knowledge Distillation (ZSKD)
5. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and Open Problems in Federated Learning. Found. Trends Mach. Learn. 2021, 14, 1–210.
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Agüera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; PMLR: Cambridge, MA, USA, 2017; Volume 54, pp. 1273–1282.
- Chaddad, A.; Wu, Y.; Desrosiers, C. Federated Learning for Healthcare Applications. IEEE Internet Things J. 2024, 11, 7339–7358.
- Wang, X.; Li, Z.; Jin, S.; Zhang, J. Achieving Linear Speedup in Asynchronous Federated Learning with Heterogeneous Clients. IEEE Trans. Mob. Comput. 2025, 24, 435–448.
- Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated Learning: Challenges, Methods, and Future Directions. IEEE Signal Process. Mag. 2020, 37, 50–60.
- Tan, A.Z.; Yu, H.; Cui, L.; Yang, Q. Towards Personalized Federated Learning. IEEE Trans. Neural Networks Learn. Syst. 2023, 34, 9587–9603.
- Dinh, C.T.; Tran, N.H.; Nguyen, T.D. Personalized Federated Learning with Moreau Envelopes. arXiv 2022, arXiv:2006.08848.
- Collins, L.; Hassani, H.; Mokhtari, A.; Shakkottai, S. Exploiting Shared Representations for Personalized Federated Learning. arXiv 2021, arXiv:2102.07078.
- Fallah, A.; Mokhtari, A.; Ozdaglar, A. Personalized Federated Learning: A Meta-Learning Approach. arXiv 2020, arXiv:2002.07948.
- Zhan, Z.H.; Li, J.Y.; Zhang, J. Evolutionary deep learning: A survey. Neurocomputing 2022, 483, 42–58.
- Liu, Y.; Sun, Y.; Xue, B.; Zhang, M.; Yen, G.G.; Tan, K.C. A Survey on Evolutionary Neural Architecture Search. IEEE Trans. Neural Networks Learn. Syst. 2023, 34, 550–570.
- Liu, H.; Simonyan, K.; Yang, Y. DARTS: Differentiable Architecture Search. arXiv 2019, arXiv:1806.09055.
- Akimoto, Y.; Shirakawa, S.; Yoshinari, N.; Uchida, K.; Saito, S.; Nishida, K. Adaptive Stochastic Natural Gradient Method for One-Shot Neural Architecture Search. arXiv 2019, arXiv:1905.08537.
- Li, L.; Khodak, M.; Balcan, M.F.; Talwalkar, A. Geometry-Aware Gradient Algorithms for Neural Architecture Search. arXiv 2021, arXiv:2004.07802.
- Lu, Z.; Whalen, I.; Boddeti, V.; Dhebar, Y.; Deb, K.; Goodman, E.; Banzhaf, W. NSGA-Net: Neural Architecture Search using Multi-Objective Genetic Algorithm. arXiv 2019, arXiv:1810.03522.
- Lu, Z.; Deb, K.; Goodman, E.; Banzhaf, W.; Boddeti, V.N. NSGANetV2: Evolutionary Multi-Objective Surrogate-Assisted Neural Architecture Search. arXiv 2020, arXiv:2007.10396.
- Sinha, N.; Chen, K.W. Evolving Neural Architecture Using One Shot Model. arXiv 2020, arXiv:2012.12540.
- Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95—International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948.
- Sun, Y.; Xue, B.; Zhang, M.; Yen, G.G. A Particle Swarm Optimization-Based Flexible Convolutional Autoencoder for Image Classification. IEEE Trans. Neural Networks Learn. Syst. 2019, 30, 2295–2309.
- Wang, B.; Xue, B.; Zhang, M. Surrogate-Assisted Particle Swarm Optimization for Evolving Variable-Length Transferable Blocks for Image Classification. IEEE Trans. Neural Networks Learn. Syst. 2022, 33, 3727–3740.
- Fernandes, F.E., Jr.; Yen, G.G. Particle swarm optimization of deep neural networks architectures for image classification. Swarm Evol. Comput. 2019, 49, 62–74.
- He, C.; Annavaram, M.; Avestimehr, S. Towards Non-I.I.D. and Invisible Data with FedNAS: Federated Deep Learning via Neural Architecture Search. arXiv 2021, arXiv:2004.08546.
- Dudziak, L.; Laskaridis, S.; Fernandez-Marques, J. FedorAS: Federated Architecture Search under system heterogeneity. arXiv 2022, arXiv:2206.11239.
- Zhou, Y.; Pu, G.; Ma, X.; Li, X.; Wu, D. Distilled One-Shot Federated Learning. arXiv 2021, arXiv:2009.07999.
- Zhang, J.; Chen, C.; Li, B.; Lyu, L.; Wu, S.; Ding, S.; Shen, C.; Wu, C. DENSE: Data-Free One-Shot Federated Learning. arXiv 2022, arXiv:2112.12371.
- Heinbaugh, C.E.; Luz-Ricca, E.; Shao, H. Data-Free One-Shot Federated Learning Under Very High Statistical Heterogeneity. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
- Li, X.; Jiang, M.; Zhang, X.; Kamp, M.; Dou, Q. FedBN: Federated Learning on Non-IID Features via Local Batch Normalization. arXiv 2021, arXiv:2102.07623.
- Li, T.; Hu, S.; Beirami, A.; Smith, V. Ditto: Fair and Robust Federated Learning Through Personalization. arXiv 2021, arXiv:2012.04221.
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated Optimization in Heterogeneous Networks. arXiv 2020, arXiv:1812.06127.
- Liang, P.P.; Liu, T.; Ziyin, L.; Allen, N.B.; Auerbach, R.P.; Brent, D.; Salakhutdinov, R.; Morency, L.P. Think Locally, Act Globally: Federated Learning with Local and Global Representations. arXiv 2020, arXiv:2001.01523.
- Arivazhagan, M.G.; Aggarwal, V.; Singh, A.K.; Choudhary, S. Federated Learning with Personalization Layers. arXiv 2019, arXiv:1912.00818.
- Smith, V.; Chiang, C.K.; Sanjabi, M.; Talwalkar, A. Federated Multi-Task Learning. arXiv 2018, arXiv:1705.10467.
- Sattler, F.; Müller, K.R.; Samek, W. Clustered Federated Learning: Model-Agnostic Distributed Multitask Optimization Under Privacy Constraints. IEEE Trans. Neural Networks Learn. Syst. 2021, 32, 3710–3722.
- Ghosh, A.; Chung, J.; Yin, D.; Ramchandran, K. An Efficient Framework for Clustered Federated Learning. IEEE Trans. Inf. Theory 2022, 68, 8076–8091.
- Huang, Y.; Chu, L.; Zhou, Z.; Wang, L.; Liu, J.; Pei, J.; Zhang, Y. Personalized Cross-Silo Federated Learning on Non-IID Data. arXiv 2021, arXiv:2007.03797.
- Li, D.; Wang, J. FedMD: Heterogenous Federated Learning via Model Distillation. arXiv 2019, arXiv:1910.03581.
- Lin, T.; Kong, L.; Stich, S.U.; Jaggi, M. Ensemble Distillation for Robust Model Fusion in Federated Learning. arXiv 2021, arXiv:2006.07242.
- Li, M.; Zhang, X.; Wang, Q.; Liu, T.; Wu, R.; Wang, W.; Zhuang, F.; Xiong, H.; Yu, D. Resource-Aware Federated Self-Supervised Learning with Global Class Representations. In Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024.
- Zhu, Z.; Hong, J.; Zhou, J. Data-Free Knowledge Distillation for Heterogeneous Federated Learning. arXiv 2021, arXiv:2105.10056.
- Yang, Q.; Chen, J.; Yin, X.; Xie, J.; Wen, Q. FedMMD: Heterogenous Federated Learning based on Multi-teacher and Multi-feature Distillation. In Proceedings of the 2022 7th International Conference on Computer and Communication Systems (ICCCS), Wuhan, China, 22–25 April 2022; pp. 897–902.
- Yao, D.; Shi, Y.; Liu, T.; Xu, Z. FedMHO: Heterogeneous One-Shot Federated Learning Towards Resource-Constrained Edge Devices. arXiv 2025, arXiv:2502.08518.
- Sang, T.; Chu, Z.; Xuan, J.; Zhang, X.; Li, X. Personalized Federated Learning in One-Shot: A Method for Heterogeneous Data Scenarios. IEEE Internet Things J. 2025, 12, 40415–40425.
- Liu, X.; Liu, L.; Ye, F.; Shen, Y.; Li, X.; Jiang, L.; Li, J. FedLPA: Personalized One-shot Federated Learning with Layer-Wise Posterior Aggregation. Adv. Neural Inf. Process. Syst. 2024, 37, 81510–81548.
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
- Nayak, G.K.; Mopuri, K.R.; Shaj, V.; Babu, R.V.; Chakraborty, A. Zero-Shot Knowledge Distillation in Deep Networks. arXiv 2019, arXiv:1905.08114.





| Dataset | Input Size | Channels | Classes | Train/Test Samples | Complexity |
|---|---|---|---|---|---|
| MNIST | 28 × 28 | 1 | 10 | 60,000/10,000 | Low |
| Fashion-MNIST | 28 × 28 | 1 | 10 | 60,000/10,000 | Medium |
| SVHN | 32 × 32 | 3 | 10 | 73,257/26,032 | High |
| CIFAR-10 | 32 × 32 | 3 | 10 | 50,000/10,000 | High |
| Dataset | MNIST | | | SVHN | | | Fashion-MNIST | | | CIFAR-10 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Data Partition () | 0.1 | 0.3 | 0.5 | 0.1 | 0.3 | 0.5 | 0.1 | 0.3 | 0.5 | 0.1 | 0.3 | 0.5 |
| FedAvg | 51.62 | 74.92 | 86.91 | 37.49 | 46.24 | 55.83 | 48.48 | 58.24 | 66.05 | 24.65 | 36.88 | 31.58 |
| FedProx | 68.80 | 77.81 | 88.97 | 45.93 | 49.17 | 55.89 | 51.52 | 61.98 | 68.19 | 31.27 | 35.45 | 45.78 |
| FedDF | 65.34 | 78.12 | 89.23 | 52.85 | 71.29 | 72.51 | 45.78 | 62.56 | 67.85 | 35.54 | 41.57 | 50.36 |
| DENSE | 75.92 | 85.54 | 92.54 | 56.47 | 69.89 | 78.53 | 58.00 | 69.13 | 74.02 | 38.78 | 47.64 | 59.40 |
| FedMHO | 87.49 | 92.85 | 94.00 | 75.42 | 79.21 | 81.34 | 62.14 | 72.36 | 75.36 | – | – | – |
| FedOM | 85.54 | 90.14 | 92.85 | 71.67 | 79.59 | 86.33 | 65.64 | 71.48 | 80.70 | 46.57 | 56.15 | 62.68 |
| FedLPA | 77.43 | 85.77 | 88.73 | 39.77 | 52.23 | 54.27 | 55.33 | 68.20 | 73.33 | 19.97 | 26.6 | 24.2 |
| pFedZKD (Ours) | 87.58 | 94.54 | 95.82 | 67.51 | 78.45 | 80.81 | 70.52 | 77.05 | 83.98 | 49.12 | 56.49 | 64.42 |
| Method | Strong Non-IID | Extreme Non-IID | Drop ↓ |
|---|---|---|---|
| FedAvg | 24.65 | 11.35 | 13.30 |
| FedProx | 31.27 | 13.37 | 17.90 |
| DENSE | 38.78 | 20.47 | 18.31 |
| FedLPA | 19.97 | 16.17 | 3.80 |
| pFedZKD (Ours) | 49.12 | 37.96 | 11.16 |
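The Drop ↓ column is the absolute difference between the Strong Non-IID and Extreme Non-IID accuracies, in percentage points. For pFedZKD, 49.12 − 37.96 = 11.16. The Strong Non-IID values coincide with the CIFAR-10 column at partition 0.1 in the main comparison table. The short Python check below recomputes the column from the reported accuracies. `RESULTS`, `REPORTED_DROP`, and `accuracy_drop` are illustrative names, and the numbers are copied from the table above.

```python
import math

# Accuracies (%) copied from the robustness table: (Strong Non-IID, Extreme Non-IID).
RESULTS = {
    "FedAvg": (24.65, 11.35),
    "FedProx": (31.27, 13.37),
    "DENSE": (38.78, 20.47),
    "FedLPA": (19.97, 16.17),
    "pFedZKD": (49.12, 37.96),
}
# The reported "Drop" column, for comparison.
REPORTED_DROP = {"FedAvg": 13.30, "FedProx": 17.90, "DENSE": 18.31,
                 "FedLPA": 3.80, "pFedZKD": 11.16}


def accuracy_drop(strong: float, extreme: float) -> float:
    """Absolute accuracy drop in percentage points, rounded to two decimals."""
    return round(strong - extreme, 2)


for name, (strong, extreme) in RESULTS.items():
    drop = accuracy_drop(strong, extreme)
    consistent = math.isclose(drop, REPORTED_DROP[name], abs_tol=1e-9)
    print(f"{name:8s} drop = {drop:5.2f} pp  matches table: {consistent}")
```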
| Dataset | MNIST | | | SVHN | | | Fashion-MNIST | | | CIFAR-10 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Data Partition () | 0.1 | 0.3 | 0.5 | 0.1 | 0.3 | 0.5 | 0.1 | 0.3 | 0.5 | 0.1 | 0.3 | 0.5 |
| pFedZKD-Hom | 75.09 | 80.29 | 86.91 | 37.66 | 43.53 | 59.83 | 41.90 | 59.86 | 64.53 | 33.16 | 38.14 | 46.57 |
| pFedZKD-RandPool | 58.20 | 62.30 | 64.36 | 20.89 | 26.64 | 31.89 | 25.09 | 41.15 | 53.51 | 20.25 | 21.24 | 35.22 |
| pFedZKD (Full) | 87.08 | 94.54 | 95.82 | 67.51 | 78.45 | 80.81 | 70.52 | 77.05 | 83.98 | 49.12 | 56.49 | 64.42 |
| Dataset | MNIST | | | SVHN | | | Fashion-MNIST | | | CIFAR-10 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Data Partition () | 0.1 | 0.3 | 0.5 | 0.1 | 0.3 | 0.5 | 0.1 | 0.3 | 0.5 | 0.1 | 0.3 | 0.5 |
| pFedZKD (w/o ZSKD) | 80.27 | 86.95 | 91.24 | 47.81 | 53.31 | 67.63 | 65.73 | 72.89 | 80.21 | 37.02 | 43.18 | 56.76 |
| pFedZKD (Full) | 87.08 | 94.54 | 95.82 | 67.51 | 78.45 | 80.81 | 70.52 | 77.05 | 83.98 | 49.12 | 56.49 | 64.42 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yan, J.; Yang, X.; Wang, D.; Xu, Y.; Hua, G. pFedZKD: A One-Shot Personalized Federated Learning Framework via Evolutionary Architecture Search and Data-Free Distillation. Appl. Sci. 2026, 16, 3878. https://doi.org/10.3390/app16083878

