Generalized Federated Learning via Gradient Norm-Aware Minimization and Control Variables
Abstract
1. Introduction
- We address one of the most troublesome FL challenges, i.e., client drift. To enhance the generalization capability of the global model, we first propose FedGAM, which shifts the objective of client model training to simultaneously minimize the loss value and first-order flatness, aiming to find flat minima.
- To achieve direct smoothing of the global model, we propose FedGAM-CV based on FedGAM. FedGAM-CV leverages control variable techniques to better align the updates from each client, guiding them towards a common global flat region.
- We conduct extensive experiments to evaluate FedGAM and FedGAM-CV. The results show that both algorithms not only achieve stronger generalization performance than baseline algorithms but also remain robust against the causes of client drift across a variety of settings.
2. Related Work
2.1. Regularization of the Local Target
2.2. Model Aggregation
2.3. Flatness-Based Federated Learning
3. Preliminaries
3.1. General Federated Learning
3.2. Analysis of the Causes of Client Drift
- Local minima: For each client $i$, the local minimum $w_i^*$ satisfies the stationarity condition $\nabla L_i(w_i^*) = 0$.
- Global minima: The global minimum $w^*$ satisfies $\nabla L(w^*) = \sum_{i=1}^{N} p_i \nabla L_i(w^*) = 0$, where $p_i$ denotes the aggregation weight of client $i$.
- The non-IID nature of client data distributions: Different clients have different data distributions, so each client's loss function has a different minimum $w_i^*$, which leads to inconsistent local update directions (made precise in the note after this list).
- Multiple local training steps: Before each round of global aggregation, clients perform multiple local training steps, causing the local model $w_i$ to overfit the local data and drift away from the global optimal solution $w^*$.
- Partial client participation in training: Only a subset of clients participates in each training round, leading to a lack of representativeness in the global model updates, which further exacerbates client drift.
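A compact way to see why these factors cause drift (a sketch built from the stationarity conditions above and the aggregation weights $p_i$): the global optimum is generally not a stationary point of any single client's loss, so prolonged local training pulls each client toward its own minimizer, and averaging those minimizers does not recover the global one.

```latex
% Global stationarity holds only in aggregate:
\sum_{i=1}^{N} p_i \,\nabla L_i(w^{*}) = 0,
\qquad\text{while in general}\qquad
\nabla L_i(w^{*}) \neq 0 .
% Hence w_i^{*} \neq w^{*} for non-IID clients, and
\sum_{i=1}^{N} p_i\, w_i^{*} \;\neq\; w^{*} \quad\text{in general.}
```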
3.3. Zeroth-Order Flatness and First-Order Flatness
3.3.1. Zeroth-Order Flatness
3.3.2. First-Order Flatness
3.3.3. Comparison with Zeroth-Order Flatness
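For reference, the two flatness notions compared in this section can be written as follows. This is a sketch using the definitions from the GAM paper (Zhang et al., cited in the references); the exact scaling constants used in this article may differ.

```latex
% Zeroth-order flatness (SAM-style sharpness): worst-case loss increase within a radius-\rho ball
R^{(0)}_{\rho}(w) \;=\; \max_{\|\delta\|\le\rho} L(w+\delta) \;-\; L(w)

% First-order flatness (GAM): scaled worst-case gradient norm within the same ball
R^{(1)}_{\rho}(w) \;=\; \rho \cdot \max_{\|\delta\|\le\rho} \big\| \nabla L(w+\delta) \big\|
```

By the mean value theorem, the worst-case loss increase over the ball is bounded by $\rho$ times the worst-case gradient norm, so controlling the first-order quantity also controls the zeroth-order sharpness over the same neighborhood.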
4. The Proposed Algorithms
4.1. FedGAM: Federated Learning Based on Gradient Norm-Aware Minimization
Algorithm 1 FedGAM
Input: Initial server model w; learning rate; perturbation radius; trade-off coefficient.
Output: Updated global model w.
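The FedGAM local objective adds a first-order flatness penalty to the usual empirical loss. Algorithm 1 is not reproduced in full here; the snippet below is a minimal PyTorch-style sketch that optimizes an equivalent penalized objective, $L_i(w) + \lambda\,\rho\,\|\nabla L_i(w)\|$, by exact double backpropagation. The paper's algorithm instead approximates the penalty gradient with extra perturbed forward/backward passes in the spirit of GAM, which is cheaper; the function and argument names here (`local_gam_step`, `lam`, `rho`) are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

def local_gam_step(model, batch, lr=0.01, lam=0.1, rho=0.05):
    """One flatness-aware local step (sketch): descend on
    L(w) + lam * rho * ||grad L(w)|| using double backpropagation."""
    x, y = batch
    params = [p for p in model.parameters() if p.requires_grad]
    loss = nn.CrossEntropyLoss()(model(x), y)

    # Keep the first-order gradients in the autograd graph so the
    # gradient-norm penalty can itself be differentiated.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))

    penalized = loss + lam * rho * grad_norm  # loss + first-order flatness term
    updates = torch.autograd.grad(penalized, params)

    with torch.no_grad():
        for p, u in zip(params, updates):
            p -= lr * u
    return loss.item(), grad_norm.item()
```

In a full round, each selected client would run this step for E local epochs starting from the server model, and the server would then aggregate the returned models with the clients' aggregation weights, as in standard FedAvg-style training.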
4.2. FedGAM-CV: Federated Learning Based on Gradient Norm-Aware Minimization and Control Variables
4.2.1. Method
Algorithm 2 FedGAM-CV
Input: Initial server model w; learning rate; perturbation radius; trade-off coefficient; initial control variables c, c_i.
Output: Updated global model w.
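FedGAM-CV corrects each local step with control variables so that the flatness-aware updates of different clients point toward a common global flat region. Algorithm 2 itself is not reproduced here; the sketch below assumes the SCAFFOLD-style correction rule (descent direction g - c_i + c and the "option II" refresh of the client control variable). The helper names and the exact refresh formula are assumptions for illustration, not quoted from the paper.

```python
import torch

def cv_corrected_step(params, grads, c_global, c_local, lr):
    """Descend on the corrected direction  g - c_i + c  (SCAFFOLD-style)."""
    with torch.no_grad():
        for p, g, c, ci in zip(params, grads, c_global, c_local):
            p -= lr * (g - ci + c)

def refresh_client_control(c_global, c_local, w_server, w_client, lr, num_local_steps):
    """SCAFFOLD-style 'option II' refresh of the client control variable:
    c_i <- c_i - c + (w_server - w_client) / (num_local_steps * lr)."""
    return [ci - c + (ws - wc) / (num_local_steps * lr)
            for c, ci, ws, wc in zip(c_global, c_local, w_server, w_client)]
```

In this sketch, `grads` would be the flatness-aware gradients produced by the FedGAM step above, and the server would update the global control variable c from the (weighted) average of the clients' control-variable changes.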
4.2.2. Validity Analyses of Control Variables
5. Experiments
5.1. Experimental Setup
5.2. Overall Performance Comparison
5.3. In-Depth Experiments
5.3.1. Robustness to Client Drift
5.3.2. Loss Surface Visualization
5.3.3. Hyperparameter Sensitivity
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Saranya, T.; Deisy, C.; Sridevi, S.; Anbananthen, K.S.M. A comparative study of deep learning and Internet of Things for precision agriculture. Eng. Appl. Artif. Intell. 2023, 122, 106034.
- Subhashini, R.; Khang, A. The role of Internet of Things (IoT) in smart city framework. In Smart Cities; CRC Press: Boca Raton, FL, USA, 2023; pp. 31–56.
- Aminizadeh, S.; Heidari, A.; Toumaj, S.; Darbandi, M.; Navimipour, N.J.; Rezaei, M.; Talebi, S.; Azad, P.; Unal, M. The applications of machine learning techniques in medical data processing based on distributed computing and the Internet of Things. Comput. Methods Programs Biomed. 2023, 241, 107745.
- Rajak, P.; Ganguly, A.; Adhikary, S.; Bhattacharya, S. Internet of Things and smart sensors in agriculture: Scopes and challenges. J. Agric. Food Res. 2023, 14, 100776.
- Ng, D.T.K.; Lee, M.; Tan, R.J.Y.; Hu, X.; Downie, J.S.; Chu, S.K.W. A review of AI teaching and learning from 2000 to 2020. Educ. Inf. Technol. 2023, 28, 8445–8501.
- Cetinic, E.; She, J. Understanding and creating art with AI: Review and outlook. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2022, 18, 1–22.
- Cao, L. AI in finance: Challenges, techniques, and opportunities. ACM Comput. Surv. (CSUR) 2022, 55, 1–38.
- Wu, X.; Xiao, L.; Sun, Y.; Zhang, J.; Ma, T.; He, L. A survey of human-in-the-loop for machine learning. Future Gener. Comput. Syst. 2022, 135, 364–381.
- Yang, P.; Xiong, N.; Ren, J. Data security and privacy protection for cloud storage: A survey. IEEE Access 2020, 8, 131723–131740.
- Li, Q.; Wen, Z.; Wu, Z.; Hu, S.; Wang, N.; Li, Y.; Liu, X.; He, B. A survey on federated learning systems: Vision, hype and reality for data privacy and protection. IEEE Trans. Knowl. Data Eng. 2021, 35, 3347–3366.
- McMahan, H.B.; Moore, E.; Ramage, D.; y Arcas, B.A. Federated learning of deep networks using model averaging. arXiv 2016, arXiv:1602.05629.
- Guendouzi, B.S.; Ouchani, S.; Assaad, H.E.; Zaher, M.E. A systematic review of federated learning: Challenges, aggregation methods, and development tools. J. Netw. Comput. Appl. 2023, 220, 103714.
- Zhang, C.; Xie, Y.; Bai, H.; Yu, B.; Li, W.; Gao, Y. A survey on federated learning. Knowl.-Based Syst. 2021, 216, 106775.
- Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 2021, 14, 1–210.
- Li, Q.; Diao, Y.; Chen, Q.; He, B. Federated learning on non-IID data silos: An experimental study. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022; pp. 965–978.
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450.
- Zhu, F.; Zhang, J.; Liu, S.; Wang, X. DRAG: Divergence-based adaptive aggregation in federated learning on non-IID data. arXiv 2023, arXiv:2309.01779.
- Acar, D.A.E.; Zhao, Y.; Navarro, R.M.; Mattina, M.; Whatmough, P.N.; Saligrama, V. Federated learning based on dynamic regularization. arXiv 2021, arXiv:2111.04263.
- Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. SCAFFOLD: Stochastic controlled averaging for federated learning. In Proceedings of the International Conference on Machine Learning, Online, 13–18 July 2020; pp. 5132–5143.
- Li, Q.; He, B.; Song, D. Model-contrastive federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10713–10722.
- Li, Z.; Lin, T.; Shang, X.; Wu, C. Revisiting weighted aggregation in federated learning with neural networks. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 19767–19788.
- Zhang, J.; Li, A.; Tang, M.; Sun, J.; Chen, X.; Zhang, F.; Chen, C.; Chen, Y.; Li, H. Fed-CBS: A heterogeneity-aware client sampling mechanism for federated learning via class-imbalance reduction. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 41354–41381.
- Chen, H.; Chao, W. FedBE: Making Bayesian model ensemble applicable to federated learning. arXiv 2020, arXiv:2009.01974.
- Park, S.; Suh, Y.; Lee, J. FedPSO: Federated learning using particle swarm optimization to reduce communication costs. Sensors 2021, 21, 600.
- Wang, H.; Yurochkin, M.; Sun, Y.; Papailiopoulos, D.; Khazaeni, Y. Federated learning with matched averaging. arXiv 2020, arXiv:2002.06440.
- Wang, J.; Liu, Q.; Liang, H.; Joshi, G.; Poor, H.V. Tackling the objective inconsistency problem in heterogeneous federated optimization. Adv. Neural Inf. Process. Syst. 2020, 33, 7611–7623.
- Chaudhari, P.; Choromanska, A.; Soatto, S.; LeCun, Y.; Baldassi, C.; Borgs, C.; Chayes, J.; Sagun, L.; Zecchina, R. Entropy-SGD: Biasing gradient descent into wide valleys. J. Stat. Mech. Theory Exp. 2019, 2019, 124018.
- Foret, P.; Kleiner, A.; Mobahi, H.; Neyshabur, B. Sharpness-aware minimization for efficiently improving generalization. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021.
- Keskar, N.S.; Socher, R. Improving generalization performance by switching from Adam to SGD. arXiv 2017, arXiv:1712.07628.
- Dziugaite, G.K.; Roy, D.M. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data. arXiv 2017, arXiv:1703.11008.
- Jiang, Y.; Neyshabur, B.; Mobahi, H.; Krishnan, D.; Bengio, S. Fantastic generalization measures and where to find them. arXiv 2019, arXiv:1912.02178.
- Zhang, X.; Xu, R.; Yu, H.; Zou, H.; Cui, P. Gradient norm aware minimization seeks first-order flatness and improves generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 20247–20257.
- Jia, Z.; Su, H. Information-theoretic local minima characterization and regularization. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 4773–4783.
- Kaur, S.; Cohen, J.; Lipton, Z.C. On the maximum Hessian eigenvalue and generalization. In Proceedings of the “I Can’t Believe It’s Not Better!—Understanding Deep Learning Through Empirical Falsification” Workshop at NeurIPS 2022, New Orleans, LA, USA, 3 December 2022; pp. 51–65.
- Keskar, N.S.; Mudigere, D.; Nocedal, J.; Smelyanskiy, M.; Tang, P.T.P. On large-batch training for deep learning: Generalization gap and sharp minima. arXiv 2016, arXiv:1609.04836.
- Zhuang, J.; Gong, B.; Yuan, L.; Cui, Y.; Adam, H.; Dvornek, N.; Tatikonda, S.; Duncan, J.; Liu, T. Surrogate gap minimization improves sharpness-aware training. arXiv 2022, arXiv:2203.08065.
- Qu, Z.; Li, X.; Duan, R.; Liu, Y.; Tang, B.; Lu, Z. Generalized federated learning via sharpness aware minimization. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 18250–18280.
- Caldarola, D.; Caputo, B.; Ciccone, M. Improving generalization in federated learning by seeking flat minima. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 654–672.
- Sun, Y.; Shen, L.; Huang, T.; Ding, L.; Tao, D. FedSpeed: Larger local interval, less communication round, and higher generalization accuracy. arXiv 2023, arXiv:2302.10429.
- Sun, Y.; Shen, L.; Chen, S.; Ding, L.; Tao, D. Dynamic regularized sharpness aware minimization in federated learning: Approaching global consistency and smooth landscape. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 32991–33013.
- Panchal, K.; Choudhary, S.; Mitra, S.; Mukherjee, K.; Sarkhel, S.; Mitra, S.; Guan, H. Flash: Concept drift adaptation in federated learning. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 26931–26962.
- Li, Q.; He, B.; Song, D. Adversarial collaborative learning on non-IID features. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 19504–19526.
- Xie, L.; Liu, J.; Lu, S.; Chang, T.H.; Shi, Q. An efficient learning framework for federated XGBoost using secret sharing and distributed optimization. ACM Trans. Intell. Syst. Technol. (TIST) 2022, 13, 1–28.
- Dai, R.; Yang, X.; Sun, Y.; Shen, L.; Tian, X.; Wang, M.; Zhang, Y. FedGAMMA: Federated learning with global sharpness-aware minimization. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–14.
- Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv 2017, arXiv:1708.07747.
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report; University of Toronto: Toronto, ON, Canada, 2009.
- Deng, L. The MNIST database of handwritten digit images for machine learning research [Best of the Web]. IEEE Signal Process. Mag. 2012, 29, 141–142.
- Hsu, T.M.H.; Qi, H.; Brown, M. Measuring the effects of non-identical data distribution for federated visual classification. arXiv 2019, arXiv:1909.06335.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Li, H.; Xu, Z.; Taylor, G.; Studer, C.; Goldstein, T. Visualizing the loss landscape of neural nets. arXiv 2018, arXiv:1712.09913.
| Symbol | Description | Symbol | Description |
|---|---|---|---|
| N | Total number of clients | w, w_i | Global model, local model of client i |
| T | Communication rounds | E | Local epochs |
|  | Loss function | ∇ | Gradient operator |
|  | Trade-off coefficient |  | Perturbation radius |
|  | Aggregation weight of client i |  | Learning rate |
| K | Number of clients participating in training | c, c_i | Global control variable, control variable of client i |
|  | Set of clients participating in the t-th communication round |  | Size of the random sample taken from client i's training data |
| Algorithm | Major Contributions |
|---|---|
| FedAvg [11] | First proposes the federated averaging framework with partial client participation and multiple local training steps. |
| FedProx [16] | Constrains the local objectives with a proximal regularization term so that local models stay close to the global model. |
| FedNova [26] | Normalizes and scales each client's local updates according to its number of local steps before updating the global model. |
| SCAFFOLD [19] | Employs the control variate technique to mitigate client drift. |
| MOON [20] | Introduces a contrastive loss that reduces the distance between the representations learned by the local model and those of the global model, improving generalization. |
| FedSAM [37] | Incorporates zeroth-order flatness into the client objective to improve generalization by seeking flat minima. |
| FedSMOO [40] | Builds on FedSAM and employs a dynamic regularizer to align the local objectives with the global objective. |
| Dataset | Algorithm | Dir(0.3) Test | Dir(0.3) Train | Dir(0.7) Test | Dir(0.7) Train | IID Test | IID Train |
|---|---|---|---|---|---|---|---|
| MNIST | FedAvg | 95.83 | 96.82 | 97.19 | 97.84 | 98.00 | 98.69 |
|  | FedProx | 95.65 | 96.40 | 97.13 | 97.99 | 97.95 | 98.17 |
|  | MOON | 98.56 | 98.77 | 98.57 | 99.18 | 98.96 | 99.29 |
|  | SCAFFOLD | 97.62 | 98.13 | 97.84 | 98.67 | 98.31 | 98.68 |
|  | FedNova | 96.95 | 97.86 | 97.21 | 97.93 | 98.07 | 98.93 |
|  | FedSAM | 96.20 | 97.56 | 97.20 | 97.98 | 98.07 | 98.61 |
|  | FedSMOO | 98.89 | 99.42 | 98.93 | 99.57 | 98.90 | 99.60 |
|  | FedGAM | 98.69 | 98.85 | 98.74 | 98.96 | 98.89 | 99.13 |
|  | FedGAM-CV | 98.98 | 99.48 | 99.11 | 99.51 | 99.15 | 99.53 |
| FashionMNIST | FedAvg | 83.05 | 85.03 | 84.09 | 85.26 | 86.02 | 87.09 |
|  | FedProx | 77.86 | 79.90 | 80.41 | 82.22 | 83.67 | 85.47 |
|  | MOON | 87.17 | 89.34 | 87.81 | 90.76 | 88.24 | 92.16 |
|  | SCAFFOLD | 83.72 | 84.48 | 85.03 | 86.12 | 86.54 | 87.68 |
|  | FedNova | 83.47 | 85.19 | 84.15 | 86.28 | 85.87 | 88.65 |
|  | FedSAM | 83.40 | 85.22 | 84.54 | 86.32 | 86.04 | 88.22 |
|  | FedSMOO | 87.62 | 89.32 | 88.66 | 90.46 | 89.25 | 91.57 |
|  | FedGAM | 87.81 | 88.55 | 88.30 | 89.94 | 88.33 | 90.12 |
|  | FedGAM-CV | 88.61 | 89.22 | 89.20 | 90.32 | 89.66 | 90.58 |
| CIFAR-10 | FedAvg | 72.98 | 74.55 | 76.57 | 78.89 | 79.61 | 83.40 |
|  | FedProx | 72.50 | 74.06 | 76.11 | 78.51 | 79.15 | 82.94 |
|  | MOON | 83.12 | 85.54 | 86.28 | 89.67 | 87.33 | 92.87 |
|  | SCAFFOLD | 74.96 | 76.41 | 79.64 | 80.90 | 84.36 | 87.37 |
|  | FedNova | 73.93 | 75.32 | 76.82 | 79.22 | 79.47 | 83.18 |
|  | FedSAM | 76.53 | 79.19 | 76.96 | 79.31 | 80.17 | 83.79 |
|  | FedSMOO | 78.21 | 81.85 | 79.52 | 82.62 | 81.13 | 83.22 |
|  | FedGAM | 84.88 | 86.07 | 86.62 | 88.88 | 87.53 | 89.35 |
|  | FedGAM-CV | 88.20 | 89.24 | 90.05 | 91.70 | 92.40 | 93.83 |
| Algorithm | E = 5 | E = 10 | E = 15 | E = 20 |
|---|---|---|---|---|
| FedAvg | 73.80 | 80.42 | 83.05 | 84.05 |
| FedProx | 71.56 | 75.98 | 77.86 | 79.11 |
| FedSAM | 74.19 | 81.00 | 83.40 | 84.46 |
| SCAFFOLD | 75.29 | 82.02 | 83.72 | 84.75 |
| FedNova | 74.32 | 81.23 | 83.47 | 81.40 |
| MOON | 84.41 | 87.13 | 87.17 | 86.00 |
| FedSMOO | 84.43 | 86.97 | 87.62 | 88.39 |
| FedGAM | 84.40 | 86.99 | 87.81 | 88.17 |
| FedGAM-CV | 85.90 | 88.20 | 88.61 | 88.91 |
| Algorithm | C = 0.2 | C = 0.4 | C = 0.6 | C = 0.8 | C = 1.0 |
|---|---|---|---|---|---|
| FedAvg | 80.43 | 81.62 | 82.02 | 82.80 | 83.05 |
| FedProx | 75.04 | 76.19 | 76.60 | 76.73 | 77.86 |
| FedSAM | 80.70 | 83.25 | 83.28 | 83.36 | 83.40 |
| SCAFFOLD | 81.90 | 82.95 | 83.10 | 83.64 | 83.72 |
| FedNova | 81.87 | 82.73 | 83.26 | 83.34 | 83.47 |
| MOON | 85.22 | 86.38 | 86.48 | 86.94 | 87.17 |
| FedSMOO | 84.85 | 85.16 | 86.02 | 86.95 | 87.62 |
| FedGAM | 86.22 | 87.09 | 87.38 | 87.73 | 87.81 |
| FedGAM-CV | 87.29 | 88.38 | 88.43 | 88.54 | 88.61 |