# A High-Performance Federated Learning Aggregation Algorithm Based on Learning Rate Adjustment and Client Sampling


## Abstract


## 1. Introduction

- Traditional learning rate strategies lack adaptability: a fixed rate cannot respond to dynamic changes during training. The proposed cyclic adaptive learning rate adjustment algorithm replaces the traditional fixed client-side learning rate. Experimental results on several datasets show that it improves the training of local models and enhances the performance of the global model;
- To address the slow aggregation caused by traditional random client sampling, this paper introduces a sampling strategy that balances each client's sampling frequency against its contribution, effectively improving the efficiency of model training in federated learning. The proposed weighted client sampling method removes the disproportionate influence that a single randomly selected client can exert on the global weights, addressing a known weakness of existing client sampling algorithms;
- This paper evaluates the algorithm on two representative datasets. Compared to baseline algorithms, the enhanced algorithm reaches the same test accuracy with an average reduction of 27.65% in training rounds on the MNIST dataset and 27.75% on the CIFAR-10 dataset.

## 2. Related Work

## 3. Theoretical Knowledge

#### 3.1. Federated Learning

- There are two or more participants who aim to cooperatively build a consensus model that can be shared;
- During the federated learning training process, each participant’s local dataset is strictly kept on their device;
- Model-related information of federated learning participants is transmitted and exchanged in an encrypted manner, ensuring that no participant can infer the local dataset of other participants based on their model-related information;
- The performance of the joint model obtained through federated learning should closely approximate the performance of traditional centralized training machine learning models.

- At the beginning of the training process, the central server sends initial parameters to the local clients;
- Each local client uses the received model parameters to update its own model and then performs local model training. After local training, each client obtains its local model parameters, which are then encrypted using techniques such as homomorphic encryption or differential privacy;
- All clients send their encrypted data to the central server;
- The server receives the encrypted data without decrypting them. It uses secure federated learning aggregation algorithms to aggregate the parameters uploaded by the participants.
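The four steps above can be condensed into a single simplified communication round in plain Python. This is an illustrative sketch, not the paper's implementation: encryption of the uploads is omitted, and the `local_train` callback and the list-of-floats parameter format are assumptions made for brevity.

```python
import random

def fedavg_round(global_params, clients, local_train, sample_ratio=0.5):
    """One simplified communication round: broadcast the global parameters,
    let each selected client train locally, then average the returned
    parameters on the server. (Encryption of the uploads is omitted.)"""
    k = max(1, round(sample_ratio * len(clients)))
    selected = random.sample(clients, k)
    # Each selected client trains locally, starting from the global parameters.
    updates = [local_train(client, list(global_params)) for client in selected]
    # The server aggregates the uploaded parameters by element-wise averaging.
    num_params = len(global_params)
    return [sum(u[j] for u in updates) / len(updates) for j in range(num_params)]
```

The averaging step corresponds to FedAvg-style aggregation; in the full protocol, the server would operate on encrypted uploads via a secure aggregation scheme instead of plaintext lists.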

#### 3.2. MLP and ResNet

## 4. High-Performance Aggregation Mechanism

#### 4.1. Cyclic Adaptive Learning Rate Strategy

Algorithm 1 Learning Rate Adjustment

function ADJUST_LEARNING_RATE(communication round number $RoundNum$, current loss $\mathrm{loss}[i]$, historical loss $\mathrm{loss}[i-1]$, loss ratio threshold $threshold$)

Calculate loss ratio:

$loss\_r=\frac{loss[i]}{loss[i-1]}$

Calculate rate of change:

$ChangeRate={|loss\_r-1|}^{2}$

if $ChangeRate<1$ then

$ChangeRate=ChangeRate+1$

end if

Calculate learning rate adjustment factor:

$\upsilon =\frac{1}{ChangeRate^{\sqrt{RoundNum}}}$

if $RoundNum\ \%\ 100=0$ then

Set new learning rate: ${\eta}_{i+1}=0.001$

else if $RoundNum\ \%\ 100\ne 0$ and $threshold>|loss\_r|$ then

${\eta}_{i+1}={\eta}_{i}\times (1-\upsilon )$

else if $RoundNum\ \%\ 100\ne 0$ and ($loss\_r>\mathrm{max}$ or $loss\_r<\mathrm{min}$) then

${\eta}_{i+1}={\eta}_{i}\times (1+\upsilon )$

else

Leave the learning rate unchanged: ${\eta}_{i+1}={\eta}_{i}$

end if

end function
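Algorithm 1 can be rendered as a short Python function. This is a hedged reimplementation of the pseudocode: the `threshold` and the ratio bounds (the pseudocode's `max`/`min`, here `ratio_max`/`ratio_min`) are left configurable, since the pseudocode does not fix their values; the defaults below are illustrative only.

```python
import math

def adjust_learning_rate(round_num, loss_cur, loss_prev, lr,
                         threshold=0.01, ratio_max=1.05, ratio_min=0.95):
    """Cyclic adaptive learning-rate update following Algorithm 1.
    The bound defaults are assumptions; the paper leaves them configurable."""
    loss_r = loss_cur / loss_prev                    # loss ratio
    change_rate = abs(loss_r - 1) ** 2               # rate of change
    if change_rate < 1:
        change_rate += 1                             # keep the base at least 1
    v = 1.0 / (change_rate ** math.sqrt(round_num))  # adjustment factor
    if round_num % 100 == 0:
        return 0.001                                 # periodic reset (the "cycle")
    if abs(loss_r) < threshold:
        return lr * (1 - v)                          # decay the learning rate
    if loss_r > ratio_max or loss_r < ratio_min:
        return lr * (1 + v)                          # grow the learning rate
    return lr                                        # otherwise unchanged
```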

#### 4.2. Weighted Random Sampling Strategy

- Initialization: For each client, its sampling weight is determined based on the number of times it has been sampled. Therefore, a global list $count[]$ is defined to record the sampling count for each client. The sampling count for all clients in the $count[]$ list is initially set to 0;
- Initial Weight Assignment: ${\omega}_{id}$ represents the weight of random sampling for each client. The initial sampling weights for each client are defined as equal, with the initial unnormalized value for each client being $\overline{{\omega}_{(id,0)}}=1.0$;
- Sampling Round Update: The sampling count in the $count[]$ list is increased by 1 for the selected clients in each sampling round. ${\omega}_{(id,i)}$ represents the weight of the client during the ith interaction between the client and the central server;
- Weight Adjustment: Based on the number of times each client has been sampled, the client’s sampling weight is adjusted. Typically, clients with more sampling will receive lower weights to balance the sampling results. The client weight adjustment formula is as follows:$$\overline{{\omega}_{(id,i)}}=\frac{\overline{{\omega}_{(id,i-1)}}}{count[id]},$$
- Weight Normalization: To ensure that the sampling weights of all clients sum to 1, the weights are normalized. The normalization formula is as follows:$${\omega}_{(id,i)}=\frac{\overline{{\omega}_{(id,i)}}}{{\displaystyle \sum _{id=0}^{{\mathrm{num}}_{clients}-1}\overline{{\omega}_{(id,i)}}}},$$
- Random Sampling: Weighted random sampling is performed using the sampling weights ${\omega}_{(id,i)}$. The sampling ratio $k$ determines how many clients are selected in each round, while the weights determine each client's probability of being selected;
- Updating Sampling Counts: The sampling count for the selected clients is increased by 1 to reflect their participation;
- Returning Sampling Results: The finally selected clients are assembled into a list and returned to the aggregation algorithm for further parameter aggregation.
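As a small worked example of the weight adjustment and normalization formulas above, assume three clients whose unnormalized weights are all $1.0$ and whose sampling counts after some round are $count=[2,1,1]$. The two formulas then give

$$\overline{{\omega}_{(0,i)}}=\frac{1.0}{2}=0.5,\qquad \overline{{\omega}_{(1,i)}}=\overline{{\omega}_{(2,i)}}=\frac{1.0}{1}=1.0,$$

$$\omega_{(0,i)}=\frac{0.5}{2.5}=0.2,\qquad \omega_{(1,i)}=\omega_{(2,i)}=\frac{1.0}{2.5}=0.4,$$

so the twice-sampled client's selection probability drops from $1/3$ to $0.2$, which is exactly the balancing effect the strategy is designed to achieve.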

Algorithm 2 Sample Clients

function SAMPLE_CLIENTS(sample ratio $sample\_ratio$)

if sampler = None then

sampler ← RandomSampler($num\_clients$)

end if

Define $count[]$: this list records the number of times each client has been sampled.

When a client has been selected, update its unnormalized weight using $count[]$:

$\overline{{\omega}_{(id,i)}}=\frac{\overline{{\omega}_{(id,i-1)}}}{count[id]}$

Then normalize the weights of all clients:

${\omega}_{(id,i)}=\frac{\overline{{\omega}_{(id,i)}}}{{\displaystyle \sum _{id=0}^{{\mathrm{num}}_{clients}-1}\overline{{\omega}_{(id,i)}}}}$

Perform weighted random sampling over the $num\_clients$ clients with weights ${\omega}_{(id,i)}$, drawing the number of clients per round implied by $sample\_ratio$:

sampled ← random.choices(range($num\_clients$), weights $={\omega}_{(id,i)}$, k $=num\_clients\_per\_round$)

for $id$ in sampled do

$count[id]=count[id]+1$

end for

assert $num\_clients\_per\_round=\mathrm{len}(sampled)$

return sorted(sampled)

end function
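Algorithm 2 can be sketched in Python roughly as follows. This is an illustrative reimplementation, with two stated assumptions: the per-round client count is derived as `round(sample_ratio * num_clients)`, and, matching the pseudocode's `random.choices` call, sampling is performed with replacement, so a client can appear more than once in a round.

```python
import random

def sample_clients(num_clients, sample_ratio, weights, counts):
    """Weighted client sampling following Algorithm 2. `weights` holds the
    unnormalized per-client weights and `counts` the cumulative per-client
    sampling counts; both are carried over and updated across rounds."""
    # Penalize frequently sampled clients by dividing by their count...
    for cid in range(num_clients):
        if counts[cid] > 0:
            weights[cid] = weights[cid] / counts[cid]
    # ...then normalize the weights into selection probabilities.
    total = sum(weights)
    probs = [w / total for w in weights]
    # Number of clients per round implied by the sampling ratio (assumption).
    k = max(1, round(sample_ratio * num_clients))
    sampled = random.choices(range(num_clients), weights=probs, k=k)
    for cid in sampled:
        counts[cid] += 1
    assert len(sampled) == k
    return sorted(sampled)
```

A sampling strategy without replacement (e.g. successive weighted draws) would guarantee `k` distinct clients per round, at the cost of slightly more bookkeeping.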

#### 4.3. Complexity Analysis of CALR-WRS Algorithm

- Analysis of Client-Side Computational Complexity:

- Analysis of Server-Side Aggregation Complexity:

- Analysis of Communication Cost:

## 5. Experiment and Performance Evaluation

#### 5.1. Cyclic Adaptive Learning Rate Algorithm

#### 5.2. Weighted Random Sampling Based on Sampling Times

#### 5.3. High-Performance Federated Learning Aggregation Algorithm

## 6. Discussion

## 7. Evaluation

## 8. Conclusions

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## References

1. Harika, J.; Baleeshwar, P.; Navya, K.; Shanmugasundaram, H. A Review on Artificial Intelligence with Deep Human Reasoning. In Proceedings of the 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC), Salem, India, 9–11 May 2022; IEEE Press: Piscataway, NJ, USA, 2022; pp. 81–84.
2. Wang, J.Z.; Kong, L.W.; Huang, Z.; Chen, L.; Liu, Y.; He, A.; Xiao, J. Research Review of Federated Learning Algorithms. Big Data **2020**, 6, 64–82.
3. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv **2017**, arXiv:1602.05629.
4. Zhang, X.; Li, X.; Tang, W.; Hao, Y.; Xue, J. A Verifiable Privacy-Preserving Cross-Domain Federated Learning Scheme for Cloud-Edge Fusion. Comput. Eng. **2023**, 1–11.
5. Cao, Z.; Shao, L.; Zhao, W. Federated Optimization Algorithm for Heterogeneous Networks. Ind. Control Comput. **2023**, 36, 10–12.
6. Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.; Segal, A.; Seth, K. Practical Secure Aggregation for Privacy-Preserving Machine Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1175–1191.
7. Xie, Y. Privacy-Preserving Federated Learning Method Based on Local Differential Privacy. Inf. Technol. Inform. **2023**, 2023, 160–163.
8. McMahan, H.B.; Ramage, D.; Talwar, K.; Zhang, L. Learning Differentially Private Recurrent Language Models. arXiv **2017**, arXiv:1710.06963.
9. Li, Y.; Long, C.; Wei, J.; Li, J.; Yang, F.; Li, J. Privacy-Preserving Face Recognition Method Based on Homomorphic Encryption. Inf. Secur. Res. **2023**, 9, 843–850.
10. Cheng, K.; Fan, T.; Jin, Y.; Liu, Y.; Chen, T.; Papadopoulos, D.; Yang, Q. SecureBoost: A Lossless Federated Learning Framework. IEEE Intell. Syst. **2021**, 36, 87–98.
11. Briggs, C.; Fan, Z.; Andras, P. Federated Learning with Hierarchical Clustering of Local Updates to Improve Training on Non-IID Data. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; IEEE: Piscataway, NJ, USA; pp. 1–9.
12. Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 5132–5143.
13. Ye, J.; Wei, T.; Hu, L.; Luo, S.; Li, X. An Efficient Federated Learning Algorithm for the Internet of Intelligent Things. Comput. Eng. **2023**, 1–11.
14. Geyer, R.C.; Klein, T.; Nabi, M. Differentially Private Federated Learning: A Client Level Perspective. arXiv **2017**, arXiv:1712.07557.
15. Chen, Y.; Sun, X.; Jin, Y. Communication-Efficient Federated Deep Learning with Layerwise Asynchronous Model Update and Temporally Weighted Aggregation. IEEE Trans. Neural Netw. Learn. Syst. **2020**, 31, 4229–4238.
16. Haddadpour, F.; Kamani, M.M.; Mokhtari, A.; Mahdavi, M. Federated Learning with Compression: Unified Analysis and Sharp Guarantees. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, San Diego, CA, USA, 13–15 April 2021; pp. 2350–2358.
17. Zhang, W.; Li, X.; Ma, H.; Luo, Z.; Li, X. Federated Learning for Machinery Fault Diagnosis with Dynamic Validation and Self-Supervision. Knowl.-Based Syst. **2021**, 213, 106679.
18. Meng, X.; Liu, T.; Xie, R. A Privacy-Preserving Scheme of Learning Rate Clipping Gradient Optimization for Federated Learning. J. Beijing Electron. Sci. Technol. Inst. **2023**, 31, 45–53.
19. Mercier, Q.; Poirion, F.; Désidéri, J.A. A Stochastic Multiple Gradient Descent Algorithm. Eur. J. Oper. Res. **2018**, 271, 808–817.
20. Zhou, Y.; Zhang, M.; Zhu, J.; Zheng, R.; Wu, Q. A Randomized Block-Coordinate Adam Online Learning Optimization Algorithm. Neural Comput. Appl. **2020**, 32, 12671–12684.
21. Shi, H. Research on Multi-Factor Short-Term Load Forecasting Based on AdaBelief Optimized Deep Learning Models. Ph.D. Thesis, Shaanxi University of Technology, Hanzhong, China, 2023.
22. Rodríguez-Barroso, N.; Jiménez-López, D.; Luzón, M.V.; Herrera, F.; Martínez-Cámara, E. Survey on Federated Learning Threats: Concepts, Taxonomy on Attacks and Defences, Experimental Study and Challenges. Inf. Fusion **2023**, 90, 148–173.
23. Liu, Y.; Kang, Y.; Xing, C.; Chen, T.; Yang, Q. A Secure Federated Transfer Learning Framework. IEEE Intell. Syst. **2020**, 35, 70–82.
24. Lee, S.; Sahu, A.K.; He, C.; Avestimehr, S. Partial Model Averaging in Federated Learning: Performance Guarantees and Benefits. arXiv **2022**, arXiv:2201.03789.
25. Kim, Y. Convolutional Neural Networks for Sentence Classification. arXiv **2014**, arXiv:1408.5882.
26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
27. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
28. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE **1998**, 86, 2278–2324.
29. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: https://www.researchgate.net/publication/306218037_Learning_multiple_layers_of_features_from_tiny_images (accessed on 12 September 2023).

Model | Parameters | Network Structure | Suitable Datasets
---|---|---|---
MLP | Few | Input layer—hidden layers (multiple)—output layer | Small datasets
AlexNet [27] | Large | Convolutional layers (multiple)—fully connected layers (multiple)—output layer | Large image datasets
ResNet | Moderate | Convolutional layers (multiple)—residual blocks (multiple)—fully connected layers | Large image datasets
CNN | Moderate | Convolutional layers (multiple)—pooling layers (multiple)—fully connected layers | Image datasets

Algorithm | Time Complexity | Communication Cost
---|---|---
Proposed Algorithm | $O(QW)+O(EM)+O(C)$ | $O(QW+CW)$
FedAvg | $O(QW)+O(EM)$ | $O(QW+CW)$
FedProx | $O(QW)+O(EM)$ | $O(QW+CW)$

Equipment | Parameter
---|---
Operating system | Windows 11
CPU | AMD Ryzen 7 5700X 8-Core Processor @ 3.40 GHz, China
Memory | 16 GB
Hard disk | 1 TB SSD
GPU | NVIDIA GeForce RTX 4080, USA
Torch | 11.8
FedLab | 1.3.0

Dataset | Metric | Fixed Learning Rate and Random Client Sampling | Cyclic Learning Rate and Random Client Sampling | CALR-WRS Algorithm
---|---|---|---|---
MNIST | Communication Rounds | 1709 Rounds | 1046 Rounds | 677 Rounds
MNIST | Accuracy Change Rate | Slow | Medium | Fast
MNIST | Loss Reduction Stability | — | Moderate Stability | Moderate Stability
CIFAR-10 | Communication Rounds | 384 Rounds | 462 Rounds | 261 Rounds
CIFAR-10 | Accuracy Change Rate | Medium | Slow | Fast
CIFAR-10 | Loss Reduction Stability | — | Moderate Stability | Moderate Stability


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Gao, Y.; Lu, G.; Gao, J.; Li, J.
A High-Performance Federated Learning Aggregation Algorithm Based on Learning Rate Adjustment and Client Sampling. *Mathematics* **2023**, *11*, 4344.
https://doi.org/10.3390/math11204344
