Federated Learning for Extreme Label Noise: Enhanced Knowledge Distillation and Particle Swarm Optimization
Abstract
1. Introduction
- The application of the particle swarm optimization (PSO) algorithm to aggregation-weight adjustment is investigated. PSO performs a population-based global search over the weight space: using the predictions of a provisional aggregation model on an auxiliary dataset as its fitness criterion, it iteratively updates particle positions and velocities to converge on the optimal weights (a minimal PSO sketch follows this list).
- The FedDPSO approach is proposed. The server dynamically identifies extremely noisy clients from the prediction uncertainty of their local models, then uses PSO to adjust the aggregation weights for global model aggregation. Extremely noisy clients train their local models with an interpolation of a pseudo-label loss and a knowledge distillation loss (a sketch of this loss also follows the list).
- Experimental results on multiple datasets demonstrate that FedDPSO mitigates the impact of heavily polluted data from extremely noisy clients on the global model and effectively improves the robustness of model training.
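To make the weight search concrete, below is a minimal PSO sketch in the spirit of the description above. It is illustrative rather than the paper's implementation: the names (`pso_aggregation_weights`, `fitness`) and defaults (`num_particles`, `iters`) are assumptions, and `fitness(weights)` stands in for evaluating a provisional aggregation model on the auxiliary dataset.

```python
import numpy as np

def pso_aggregation_weights(fitness, num_clients, num_particles=20, iters=30,
                            w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    """Search for client aggregation weights with a basic PSO loop.

    `fitness(weights)` is assumed to return the accuracy of the model
    aggregated with `weights` on an auxiliary dataset (higher is better).
    """
    rng = np.random.default_rng(seed)
    # Each particle is a candidate weight vector over the clients.
    pos = rng.random((num_particles, num_clients))
    pos /= pos.sum(axis=1, keepdims=True)          # weights sum to 1
    vel = np.zeros_like(pos)

    pbest = pos.copy()
    pbest_fit = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    gbest_fit = pbest_fit.max()

    for t in range(iters):
        # Linearly decaying inertia weight, a standard PSO schedule.
        w = w_max - (w_max - w_min) * t / iters
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 1e-6, None)
        pos /= pos.sum(axis=1, keepdims=True)      # re-project onto the simplex

        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        if fit.max() > gbest_fit:
            gbest, gbest_fit = pos[fit.argmax()].copy(), fit.max()
    return gbest
```

In FedDPSO's setting, `fitness` would aggregate the uploaded local models with the candidate weight vector, evaluate the resulting provisional global model on the auxiliary dataset, and return the score that PSO maximizes.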
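The interpolation loss itself is Equation (6) of the paper, which is not reproduced in this outline; the sketch below shows one plausible shape for it under standard conventions: a convex combination of a cross-entropy term on pseudo-labels (taken from the global teacher's predictions) and a Hinton-style temperature-scaled distillation term. `alpha` and `tau` are assumed hyperparameters, not values from the paper.

```python
import torch
import torch.nn.functional as F

def interpolation_loss(student_logits, teacher_logits, alpha=0.5, tau=2.0):
    """Blend a pseudo-label loss with a KD loss (a sketch of Equation (6)).

    Pseudo-labels come from the (global) teacher's hard predictions,
    replacing the client's unreliable ground-truth labels; the KD term
    matches the student's softened distribution to the teacher's.
    """
    # Hard pseudo-labels from the teacher (the global model's predictions).
    pseudo_labels = teacher_logits.argmax(dim=1)
    ce_loss = F.cross_entropy(student_logits, pseudo_labels)

    # Standard temperature-scaled distillation term.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau ** 2

    # Interpolate the two terms.
    return alpha * ce_loss + (1 - alpha) * kd_loss
```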
2. Related Works
2.1. Federated Learning with Label Noise
2.2. Knowledge Distillation
2.3. Particle Swarm Optimization Algorithm
3. Methods
3.1. Problem Definition
3.2. Framework Overview
3.3. Identification of Extremely Noisy Clients
3.4. Model Training for Extremely Noisy Clients
3.5. Global Model Aggregation
Algorithm 1 Enhanced Knowledge Distillation and Particle Swarm Optimization for Federated Learning (FedDPSO)
Input: the global round $T$; the uncertainty threshold for identifying extremely noisy clients; the distillation temperature $\tau$; the total number of clients $N$; the inertia weight $w$, with maximum and minimum values $w_{\max}$ and $w_{\min}$; the acceleration coefficients $c_1$ and $c_2$.
Output: the global model $\theta^{T}$
1: Initialize the global model $\theta^{0}$ and send it to each client
2: for $t = 0$ to $T-1$ do
3:   // Server executes:
4:   Randomly select a set of clients $S^{t}$
5:   for each client $k \in S^{t}$ do
6:     $\theta_{k}^{t+1}$ ← LocalUpdate($k$, $\theta^{t}$)
7:   end for
8:   Dynamically identify the extremely noisy clients and the remaining clients with Equations (1) and (2)
9:   Calculate the global model aggregation weights with Equations (9) and (13)
10:  Generate the global model $\theta^{t+1}$ with Equation (14)
11:
12:  // Client executes:
13:  function LocalUpdate($k$, $\theta^{t}$)
14:    if client $k$ is identified as extremely noisy then
15:      Update the local model with the interpolation loss in Equation (6)
16:    else
17:      Update the local model with the standard local training loss
18:    end if
19:    return the updated local model $\theta_{k}^{t+1}$
20:  end function
21: end for
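Equations (1) and (2), which perform the identification in step 8, are not reproduced in this outline. One common way to realize "predicting uncertainty with local models" is Monte Carlo dropout: keep dropout active at inference, sample several forward passes, and treat high mean predictive entropy as high uncertainty. The sketch below assumes that reading; `threshold`, `n_samples`, and the entropy criterion are illustrative rather than the paper's exact rule.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def flag_noisy_clients(local_models, aux_loader, threshold=0.6, n_samples=10):
    """Return indices of clients whose uploaded models look extremely noisy.

    Uncertainty is estimated with MC dropout: each model is kept in train
    mode so dropout stays stochastic, predictions are sampled `n_samples`
    times, and the mean predictive entropy (normalized to [0, 1] by
    log(num_classes)) is compared against `threshold`.
    """
    noisy = []
    for k, model in enumerate(local_models):
        model.train()  # keep dropout layers active during inference
        entropies = []
        for x, _ in aux_loader:
            probs = torch.stack(
                [F.softmax(model(x), dim=1) for _ in range(n_samples)]
            ).mean(dim=0)                      # MC-averaged predictive dist.
            ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
            ent /= torch.log(torch.tensor(float(probs.shape[1])))  # normalize
            entropies.append(ent.mean())
        if torch.stack(entropies).mean() > threshold:
            noisy.append(k)
    return noisy
```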
4. Experiments
4.1. Experimental Setup
4.2. Implementation Details
4.3. Noise Ratio Effect
4.4. Method Comparison Analysis
4.5. Ablation Study
5. Discussion
5.1. Communication and Computation Costs
5.2. Stability and Robustness
5.3. Limitations and Scalability
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Accuracy of each method on CIFAR10 and CIFAR100 under three Beta noise settings.

| Method | CIFAR10, Beta (0.1, 0.1) | CIFAR10, Beta (0.1, 0.3) | CIFAR10, Beta (0.3, 0.5) | CIFAR100, Beta (0.1, 0.1) | CIFAR100, Beta (0.1, 0.3) | CIFAR100, Beta (0.3, 0.5) |
|---|---|---|---|---|---|---|
| FedAvg | 55.29% | 72.82% | 63.62% | 43.70% | 54.68% | 46.49% |
| FedProx | 55.04% | 72.77% | 63.20% | 43.02% | 54.41% | 46.02% |
| Mixup | 56.65% | 70.81% | 63.10% | 45.19% | 55.19% | 47.93% |
| RoFL | 57.42% | 71.28% | 64.29% | 42.63% | 53.67% | 45.74% |
| FOCUS | 58.93% | 75.12% | 66.46% | 46.87% | 56.89% | 48.70% |
| FedDPSO | 70.29% | 78.58% | 70.74% | 51.96% | 55.77% | 49.28% |
FedDPSO accuracy as a hyperparameter varies from 0.5 to 0.8 under each Beta setting.

| Beta | Dataset | 0.5 | 0.6 | 0.7 | 0.8 |
|---|---|---|---|---|---|
| (0.1, 0.1) | CIFAR10 | 69.60% | 70.29% | 69.40% | 70.50% |
| (0.1, 0.1) | CIFAR100 | 49.39% | 51.96% | 49.28% | 49.07% |
| (0.1, 0.3) | CIFAR10 | 78.89% | 78.58% | 78.39% | 77.90% |
| (0.1, 0.3) | CIFAR100 | 54.57% | 55.77% | 55.23% | 49.86% |
| (0.3, 0.5) | CIFAR10 | 70.57% | 70.74% | 70.96% | 70.78% |
| (0.3, 0.5) | CIFAR100 | 43.39% | 49.28% | 49.74% | 49.43% |
Ablation results: accuracy of FedAvg, FedPSO, FedD, and FedDPSO on CIFAR10 and CIFAR100.

| Dataset | FedAvg | FedPSO | FedD | FedDPSO |
|---|---|---|---|---|
| CIFAR10 | 55.29% | 61.06% | 59.96% | 70.29% |
| CIFAR100 | 43.70% | 50.89% | 47.27% | 51.96% |