Distributed Collaborative Learning with Representative Knowledge Sharing
Abstract
1. Introduction
2. Challenges and Advances in Distributed Collaborative Learning
2.1. Federated Learning (FL)
2.2. Personalized Federated Learning (PFL)
2.3. Knowledge Distillation and Representative Datasets
2.4. Distributed Collaborative Learning (DCL)
3. Algorithm
3.1. Formulation of Objective
3.1.1. Prediction Task Distributions and Local Training Distributions
- The feature space of the training data for node k;
- The index set of the local dataset on node k;
- The number of data points at node k;
- The total number of observations across all nodes.
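For concreteness, a minimal formalization of this setup, using our own shorthand ($D_k$ for the local dataset, $\mathcal{X}_k$ for its feature space, $\mathcal{I}_k$ for its index set, $n_k$ and $N$ for the local and total sample counts, and $K$ for the number of nodes) rather than the paper's exact symbols:

$$D_k = \{(x_i, y_i) : i \in \mathcal{I}_k\}, \qquad x_i \in \mathcal{X}_k, \qquad n_k = |\mathcal{I}_k|, \qquad N = \sum_{k=1}^{K} n_k.$$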
3.1.2. Core Idea of CTL
3.1.3. Collaborative Objective
- The first term is the loss on the node's local dataset.
- The second is the distillation loss, which measures the discrepancy between the predictions of guest node j and those of the local model at node k on the shared representative dataset; it is typically computed with the Kullback–Leibler (KL) divergence.
- Each distillation term is scaled by an adaptive distillation weight, which dynamically adjusts the contribution of guest node j to the learning process at node k.
- The tuning parameter balances the importance of local optimization and collaborative knowledge transfer.
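Assembling the terms above, the per-node objective plausibly takes the form below; the notation ($\ell_k$ for the local loss, $R$ for the shared representative dataset, $f_k$ and $f_j$ for the local and guest predictive models, $\alpha_{jk}$ for the adaptive weights, and $\lambda$ for the tuning parameter) is our illustration, and the authors' exact formula may differ:

$$\mathcal{L}_k(\theta_k) \;=\; \ell_k(\theta_k) \;+\; \lambda \sum_{j \neq k} \alpha_{jk}\, \mathrm{KL}\!\big( f_j(R) \,\big\|\, f_k(R; \theta_k) \big).$$

Minimizing $\mathcal{L}_k$ therefore fits the local data while pulling the local predictive distribution on $R$ toward those of the most relevant guest nodes.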
3.2. Local Learning
3.3. Knowledge Distillation via Representatives
- The centroid for class 0, i.e., the mean of the local feature vectors labeled 0;
- The centroid for class 1, i.e., the mean of the local feature vectors labeled 1.
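As an illustration of how class-wise representatives might be formed from a local binary-classification dataset, here is a minimal numpy sketch; it assumes the representative for each class is simply its centroid (the per-class feature mean), matching the two bullets above but omitting any refinement the authors may apply on top of that.

```python
import numpy as np

def class_centroids(X: np.ndarray, y: np.ndarray) -> dict:
    """Return one representative per class: the centroid (mean feature
    vector) of the local points carrying that label."""
    return {int(c): X[y == c].mean(axis=0) for c in np.unique(y)}

# Toy usage: 2-D features, binary labels.
X = np.array([[0.0, 1.0], [1.0, 1.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([0, 0, 1, 1])
reps = class_centroids(X, y)   # {0: array([0.5, 1.0]), 1: array([4.5, 4.5])}
```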
3.4. Distillation Weights
3.4.1. Energy Coefficients with Feature Importance
3.4.2. Class-Wise Energy Coefficients
3.4.3. Distillation Weight Formula
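The precise formulas of Sections 3.4.1–3.4.3 are not reproduced here, so the following numpy sketch only illustrates the general recipe: the sample energy distance between two sets of points, a class-wise average of it as an energy coefficient H, and a normalization of the coefficients into distillation weights. The softmax over $-H/\tau$ is an assumed choice, and the feature-importance weighting of Section 3.4.1 is omitted.

```python
import numpy as np

def energy_distance(X: np.ndarray, Y: np.ndarray) -> float:
    """Sample energy distance between two samples (rows are observations)."""
    d_xy = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1).mean()
    d_xx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1).mean()
    d_yy = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1).mean()
    return 2.0 * d_xy - d_xx - d_yy

def class_wise_energy(Xa, ya, Xb, yb) -> float:
    """Class-wise energy coefficient: energy distance computed within each
    shared class, then averaged (assumed form of Section 3.4.2)."""
    classes = np.intersect1d(np.unique(ya), np.unique(yb))
    return float(np.mean([energy_distance(Xa[ya == c], Xb[yb == c]) for c in classes]))

def distillation_weights(H, tau: float = 1.0) -> np.ndarray:
    """Map guest coefficients H (lower = more relevant) to weights summing
    to one; a softmax over -H/tau is one natural, assumed choice."""
    H = np.asarray(H, dtype=float)
    z = np.exp(-(H - H.min()) / tau)   # shift by the minimum for numerical stability
    return z / z.sum()
```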
3.5. Complete Algorithm for Collaborative Transfer Learning
Algorithm 1: Collaborative Transfer Learning (CTL)
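A minimal PyTorch sketch of how the components of Section 3 could be combined into one local update pass of CTL: local supervised loss plus weighted KL distillation from each guest's frozen predictions on the shared representatives. The function signature, the use of softmax probabilities, and the KL direction are our assumptions, not the authors' exact Algorithm 1.

```python
import torch
import torch.nn.functional as F

def ctl_local_pass(local_model, guest_models, loader, reps, weights, lam, optimizer):
    """One pass over the local data for node k under a CTL-style objective."""
    local_model.train()
    for X, y in loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(local_model(X), y)            # local supervised loss
        log_p_local = F.log_softmax(local_model(reps), dim=1)
        for w, guest in zip(weights, guest_models):
            with torch.no_grad():                             # guests only teach
                p_guest = F.softmax(guest(reps), dim=1)
            loss = loss + lam * w * F.kl_div(log_p_local, p_guest, reduction="batchmean")
        loss.backward()
        optimizer.step()
```

In a full run, one would expect each node to refresh its representatives and recompute the energy-based distillation weights between communication rounds before calling such a pass.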
4. Simulation Studies
4.1. Simulation: Performance of Collaborative Learning Strategies
4.1.1. Node Creation
4.1.2. Comparison of Guest Node Selection Strategies
- All: All other nodes are used as guest nodes.
- Best: Select the two nodes with the lowest Energy Coefficients H relative to each prediction task.
- Random: Select two random guest nodes.
- Worst: Select the two nodes with the highest H relative to each prediction task.
- Local: No guest nodes are used (local-only training).
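A small numpy helper illustrating these five strategies; it assumes an energy-coefficient matrix H with H[k, j] scoring node j's data against prediction task k (lower meaning closer), which is our reading of the bullets above rather than the paper's exact implementation.

```python
import numpy as np

def select_guests(H: np.ndarray, k: int, strategy: str = "best",
                  n_guests: int = 2, rng=None) -> list:
    """Choose guest nodes for prediction task k per one of the strategies
    compared in Section 4.1.2."""
    rng = np.random.default_rng() if rng is None else rng
    others = [j for j in range(H.shape[1]) if j != k]
    if strategy == "local":                        # no collaboration
        return []
    if strategy == "all":                          # every other node
        return others
    if strategy == "random":                       # arbitrary guests
        return list(rng.choice(others, size=n_guests, replace=False))
    ranked = sorted(others, key=lambda j: H[k, j])  # ascending energy coefficient
    return ranked[:n_guests] if strategy == "best" else ranked[-n_guests:]
```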
4.1.3. Results
4.2. Simulation: Effect of Spread and Dispersion on Collaboration
4.2.1. Data Generation
4.2.2. Results
5. Discussion and Future Work
5.1. Convergence
5.2. Computational Efficiency and Scalability
5.3. Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
DCL | Distributed Collaborative Learning
CTL | Collaborative Transfer Learning
DML | Distributed Machine Learning
MTL | Multi-Task Learning
FL | Federated Learning
PFL | Personalized Federated Learning
Appendix A. Additional Figures
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Casey, J.; Chen, Q.; Fan, M.; Geng, B.; Shterenberg, R.; Chen, Z.; Li, K. Distributed Collaborative Learning with Representative Knowledge Sharing. Mathematics 2025, 13, 1004. https://doi.org/10.3390/math13061004