DRL-Based Adaptive Time Threshold Client Selection FL
Abstract
1. Introduction
- Asymmetric data contribution: Devices upload varying amounts of data at different times. In IoT settings, for example, a smart thermostat uploads a large amount of data and remains active for extended periods, whereas a smartwatch uploads only a small amount of data, mostly during the daytime, with few updates during sleep hours [14].
- Automated federated strategy: Traditional strategies have limited ability to adapt to the environment. To cope with dynamic changes in device conditions over time, an automated strategy is a suitable approach.
- Adaptive number of client selections: Most existing federated client selection schemes select a fixed number of clients. To adapt to dynamic changes in the FL environment, however, the number of selected clients should be adjusted dynamically based on real-time conditions.
- Fairness of selection: Most existing approaches select devices based on resources, computing capacity, and data contribution, which raises a fairness issue: such approaches prioritize devices with strong capacity or high contribution to training. As a result, devices with limited capacity may be excluded from participation throughout the training process.
- Automatic deep reinforcement learning model: We propose a DRL-based algorithm that adaptively determines the optimal time threshold for each round. Our model learns to recognize dynamic changes in the environment and provides time thresholds that adjust to the FL environment. The server then identifies and selects the subset of clients whose expected training time is below the threshold.
- Adaptive number of selected clients: Our model dynamically adjusts the number of selected clients in each round based on the time threshold provided by the DRL agent, enabling efficient adaptation to resource and data variability.
- ATCS-FL design for efficiency and low latency without long aggregation waits: The DRL-optimized time threshold and the selection of a client subset enable effective time management, reflected in a reward function that prioritizes client subsets requiring less time in every communication round. In addition, because the maximum threshold is set to the average round time of a random-selection approach in the same environment, latency is reduced by avoiding long waiting times: only clients that finish training within the time threshold are chosen.
- Our experiments show that the proposed approach reduces latency and improves accuracy on non-IID datasets. Specifically, ATCS-FL reduces latency by approximately 75% in dynamic situations compared with baseline algorithms and state-of-the-art solutions. As the level of non-IID data increases, our approach continues to learn adaptively in the non-IID environment.
2. Related Works
2.1. Time-Based Client Selection FL
2.2. DRL-Based Client Selection
3. System Model and Problem Formulation
3.1. Basic Setup of FL
3.2. System Heterogeneity
- Local computing time: During each iteration of gradient-descent training, the required time is estimated from client-specific resources: CPU frequency $f_i^t$, number of cores $n_i$, CPU cycles required to process one bit $c_i$, and the size of the local data in bits $D_i^t$. The local computing time is then given by $T_i^{cmp,t} = \frac{c_i D_i^t}{f_i^t n_i}$.
- Transmission time: When communicating with the central server, each client takes time to transfer its local model parameters. The transmission time is calculated from the bandwidth $B_i^t$ of client $i$ at round $t$ and the size $s$ of the local model, and is defined as $T_i^{com,t} = \frac{s}{B_i^t}$.
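To make the latency model concrete, the following is a minimal Python sketch (not the authors' code) that estimates a client's round latency $T_i^t = T_i^{cmp,t} + T_i^{com,t}$ from the quantities above; all variable names and the numbers in the example are illustrative assumptions.

```python
def estimate_latency(cycles_per_bit, data_bits, cpu_freq_hz, num_cores,
                     model_bits, bandwidth_bps):
    """Estimate a client's round latency from the compute and transmission models."""
    # Local computing time: total CPU cycles divided by aggregate CPU speed.
    t_compute = (cycles_per_bit * data_bits) / (cpu_freq_hz * num_cores)
    # Transmission time: local model size divided by available bandwidth.
    t_transmit = model_bits / bandwidth_bps
    return t_compute + t_transmit

# Example (hypothetical values): 20 cycles/bit, 8 Mbit of local data,
# a 1 GHz CPU with 4 cores, a 4 Mbit model, and a 2 Mbit/s uplink.
latency = estimate_latency(20, 8e6, 1e9, 4, 4e6, 2e6)
print(f"expected round latency: {latency:.2f} s")  # ~2.04 s
```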
3.3. Problem Formulation
4. Proposed ATCS-FL Approach
4.1. Adaptive Time Threshold Client Selection for Federated Learning
Algorithm 1 Adaptive time threshold client selection federated learning

Central server:
1. Initialization: global parameter $w^0$, threshold time $T_{th}^1$
2. For $t = 1, 2, \ldots, T$ do
3.   Compute the expected training time $T_i^t$ of each client $i$
4.   If client $i$ has $T_i^t \le T_{th}^t$, consider $i$ as selected; $i$ is a selected client in the current round
5.   $K_t \leftarrow K_t \cup \{i\}$; $K_t$ is the subset of selected clients
6.   End if
7.   Send the global parameter $w^{t-1}$ to $K_t$
8.   Wait for all clients in $K_t$ to update
9.   Receive all local parameter updates $w_i^t$
10.  Perform global model aggregation $w^t = \sum_{i \in K_t} p_i w_i^t$
11.  Update $w^t$ for the next round
12.  Update $T_{th}^{t+1}$
End for

Client:
1. Initialization
2. Receive the global parameter $w^{t-1}$ from the server
3. For each local iteration do
4.   Perform local training $w_i^t \leftarrow w_i^t - \eta \nabla F_i(w_i^t)$
5.   Send the local update $w_i^t$ to the server
6. End for
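The following is a minimal Python sketch of the server side of Algorithm 1 under simple assumptions: each client object exposes hypothetical helpers `expected_latency()`, `num_samples`, and `local_train()`, and model parameters are dicts of arrays. It is an illustration of the selection-then-aggregation flow, not the authors' implementation.

```python
import copy

def server_round(global_params, clients, time_threshold):
    """One ATCS-FL communication round: threshold-based selection, then weighted aggregation."""
    # Select clients whose expected latency is within the current threshold (Alg. 1, lines 3-6).
    selected = [c for c in clients if c.expected_latency() <= time_threshold]
    if not selected:
        return global_params, selected  # no client fits the threshold; keep the global model

    # Broadcast the global model and wait for all selected clients' updates (lines 7-9).
    updates = [(c.num_samples, c.local_train(copy.deepcopy(global_params)))
               for c in selected]

    # FedAvg-style aggregation weighted by each client's sample count (line 10).
    total = sum(n for n, _ in updates)
    aggregated = {key: sum(n * params[key] for n, params in updates) / total
                  for key in updates[0][1]}
    return aggregated, selected
```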
4.2. DRL-Based Adaptive Time Threshold
1. DDQN training: This stage determines the optimal time threshold. The agent interacts with the environment by observing the current state as input and receiving a reward as feedback on the quality of its actions. The Q-network, a deep neural network (DNN), takes the state as input and estimates Q-values used to determine the optimal action. A separate target network, periodically updated with the parameters of the Q-network, provides more stable target Q-values to guide learning and improve convergence (the double-Q target is written out after this list). By learning to maximize the expected cumulative reward, the agent ultimately selects the optimal action (i.e., the best time threshold), improving training performance in dynamic environments.
2. Client selection: After receiving the optimal time threshold from the DDQN agent, the server estimates the training time of each client. Clients expected to finish training within the threshold are selected to participate in the training round.
3. Global model distribution: After identifying the selected clients, the server sends the global model parameters to them for local model computation.
4. Local training: After receiving the global model parameters, participants perform local training and send their local model updates to the server.
5. Global aggregation: After receiving local model parameter updates from all selected clients, the server aggregates them into the global model for the subsequent communication round.
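For reference, the standard double DQN target used in stage (1), with $\theta$ the main-network parameters, $\psi$ the target-network parameters, and $\gamma$ the discount factor, is

$$y_t = r_t + \gamma\, Q_{\psi}\big(s_{t+1}, \arg\max_{a} Q_{\theta}(s_{t+1}, a)\big).$$

The main network chooses the next action while the target network evaluates it, which reduces the over-estimation bias of vanilla DQN.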
- State: We use the vector $s_t = (t, w_i^t, D_i^t, (n_i, c_i, f_i^t), B_i^t)$ to denote the state at round $t$. Here $t$ is the training round index, $w_i^t$ is the local parameter of client $i$ at round $t$, and $D_i^t$ denotes the data amount of client $i$ at round $t$. $n_i$, $c_i$, and $f_i^t$ denote the number of cores, CPU cycles required per bit, and CPU frequency of client $i$ at round $t$, and $B_i^t$ is the bandwidth of client $i$ at round $t$.
- Action: At each round $t$, the agent determines a threshold time $T_{th}^t$, used as the time limit for aggregation; this is the action $a_t$. Given the current state, the DDQN agent chooses an action based on its policy, expressed as a probability distribution over the entire action space. We use a neural network to represent the policy $\pi_\theta$, mapping from state to action; under the $\epsilon$-greedy policy, $\epsilon$ is the probability of taking a random action at state $s_t$.
- Reward: When action $a_t$ is applied at round $t$, the agent receives a reward combining the threshold time and the change in loss: $r_t = -T_{th}^t - \big(F(w^t) - F(w^{t-1})\big)$. The negative time-threshold term favors client subsets that spend less training time in every communication round, and the negative loss-difference term ensures that model performance keeps improving, resulting in lower total training time together with accuracy gains.
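A minimal sketch of the state and reward computations described above; the function names are illustrative, and summarizing the local parameters by their norm (to keep the state vector compact) is our assumption, not a detail from the paper.

```python
import numpy as np

def build_state(round_idx, local_param_norm, data_amount, num_cores,
                cycles_per_bit, cpu_freq, bandwidth):
    """Pack the per-round observation into the DRL state vector s_t."""
    return np.array([round_idx, local_param_norm, data_amount, num_cores,
                     cycles_per_bit, cpu_freq, bandwidth], dtype=np.float32)

def compute_reward(time_threshold, loss_now, loss_prev):
    """r_t = -(time threshold) - (change in global loss)."""
    return -time_threshold - (loss_now - loss_prev)
```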
Algorithm 2 DRL-based adaptive time threshold

Input: main network, target network, discount factor $\gamma$
Output: time threshold $T_{th}^t$
1. Initialization: experience replay buffer $\mathcal{B}$, mini-batch size $b$, main network model $Q$ with parameters $\theta$, target network model $\hat{Q}$ with parameters $\psi$
2. Set the probability $\epsilon$ in the $\epsilon$-greedy policy and the target updating rate
3. For $t = 1, 2, \ldots, T$ do
4.   Obtain the current state $s_t$ and the action space based on the environment
5.   Select a random action $a_t$ with probability $\epsilon$, or select the current optimal action $a_t = \arg\max_a Q(s_t, a; \theta)$ according to the model
6.   Perform action $a_t$ to obtain $s_{t+1}$ and reward $r_t$
7.   Deposit the experience $(s_t, a_t, r_t, s_{t+1})$ into $\mathcal{B}$
8.   While $t$ exceeds the maximum number of iterations or $\mathcal{B}$ is full do
9.     Sample a mini-batch of experiences from $\mathcal{B}$
10.    Calculate the target Q-value $y = r + \gamma \hat{Q}\big(s', \arg\max_a Q(s', a; \theta); \psi\big)$
11.    Perform gradient descent on the loss function and update the main network parameters $\theta$
12.    Update the target network parameters $\psi \leftarrow \theta$ every $C$ rounds
13.  End while
14. End for
15. Output the time threshold $T_{th}^{t+1}$ to line 12 of Algorithm 1 in every round
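The following is a minimal PyTorch sketch of the inner update loop of Algorithm 2 (steps 9-12), assuming a small MLP Q-network over a discretized set of candidate thresholds. All architecture choices, hyperparameters, and names are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping a state vector to Q-values over candidate thresholds."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 64), nn.ReLU(),
                                 nn.Linear(64, num_actions))

    def forward(self, x):
        return self.net(x)

def ddqn_update(main, target, optimizer, replay, batch_size=32, gamma=0.99):
    """One double-DQN gradient step (Algorithm 2, steps 9-11)."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)  # step 9: sample a mini-batch
    s, a, r, s2 = (torch.stack([torch.as_tensor(x[i], dtype=torch.float32)
                                for x in batch]) for i in range(4))
    a = a.long()
    # Step 10: double-Q target -- the main network picks the next action,
    # the target network evaluates it.
    with torch.no_grad():
        next_a = main(s2).argmax(dim=1, keepdim=True)
        y = r + gamma * target(s2).gather(1, next_a).squeeze(1)
    # Step 11: gradient descent on the TD loss, updating theta.
    q = main(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Illustrative wiring (step 12 is the periodic hard target update):
# main = QNet(state_dim=7, num_actions=10)
# target = QNet(state_dim=7, num_actions=10); target.load_state_dict(main.state_dict())
# optimizer = torch.optim.Adam(main.parameters(), lr=1e-3)
# replay = deque(maxlen=10_000)  # experience buffer B
# every C rounds: target.load_state_dict(main.state_dict())
```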
5. Simulation Evaluation
5.1. Evaluation Setting
5.2. Simulation Result
- Latency
- Accuracy
- Non-IID level
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
- Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated learning: Challenges, methods, and future directions. IEEE Signal Process. Mag. 2020, 37, 50–60.
- Wang, D.; Shi, S.; Zhu, Y.; Han, Z. Federated analytics: Opportunities and challenges. IEEE Netw. 2021, 36, 151–158.
- Berkani, M.R.; Chouchane, A.; Himeur, Y.; Ouamane, A.; Miniaoui, S.; Atalla, S.; Mansoor, W.; Al-Ahmad, H. Advances in federated learning: Applications and challenges in smart building environments and beyond. Computers 2025, 14, 124.
- Tang, Z.; Zhang, Y.; Shi, S.; He, X.; Han, B.; Chu, X. Virtual homogeneity learning: Defending against data heterogeneity in federated learning. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 21111–21132.
- Deng, Y.; Lyu, F.; Ren, J.; Wu, H.; Zhou, Y.; Zhang, Y.; Shen, X. AUCTION: Automated and quality-aware client selection framework for efficient federated learning. IEEE Trans. Parallel Distrib. Syst. 2021, 33, 1996–2009.
- Lim, W.Y.; Luong, N.C.; Hoang, D.T.; Jiao, Y.; Liang, Y.C.; Yang, Q.; Niyato, D.; Miao, C. Federated learning in mobile edge networks: A comprehensive survey. IEEE Commun. Surv. Tutor. 2020, 22, 2031–2063.
- Ye, M.; Fang, X.; Du, B.; Yuen, P.C.; Tao, D. Heterogeneous federated learning: State-of-the-art and research challenges. ACM Comput. Surv. 2023, 56, 1–44.
- Li, J.; Chen, T.; Teng, S. A comprehensive survey on client selection strategies in federated learning. Comput. Netw. 2024, 251, 110663.
- Fu, L.; Zhang, H.; Gao, G.; Zhang, M.; Liu, X. Client selection in federated learning: Principles, challenges, and opportunities. IEEE Internet Things J. 2023, 10, 21811–21819.
- Mayhoub, S.; Shami, T.M. A review of client selection methods in federated learning. Arch. Comput. Methods Eng. 2024, 31, 1129–1152.
- Bouaziz, S.; Benmeziane, H.; Imine, Y.; Hamdad, L.; Niar, S.; Ouarnoughi, H. FLASH-RL: Federated learning addressing system and static heterogeneity using reinforcement learning. In Proceedings of the 2023 IEEE 41st International Conference on Computer Design (ICCD), Washington, DC, USA, 6–8 November 2023; pp. 444–447.
- Nishio, T.; Yonetani, R. Client selection for federated learning with heterogeneous resources in mobile edge. In Proceedings of the ICC 2019—2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20 May 2019; pp. 1–7.
- Liu, X.; Chen, T.; Qian, F.; Guo, Z.; Lin, F.X.; Wang, X.; Chen, K. Characterizing smartwatch usage in the wild. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, Niagara Falls, NY, USA, 16 June 2017; pp. 385–398.
- Zhang, T.; Gao, L.; He, C.; Zhang, M.; Krishnamachari, B.; Avestimehr, A.S. Federated learning for the internet of things: Applications, challenges, and opportunities. IEEE Internet Things Mag. 2022, 5, 24–29.
- Saputra, Y.M.; Nguyen, D.N.; Hoang, D.T.; Pham, Q.V.; Dutkiewicz, E.; Hwang, W.J. Federated learning framework with straggling mitigation and privacy-awareness for AI-based mobile application services. IEEE Trans. Mob. Comput. 2022, 22, 5296–5312.
- Hard, A.; Girgis, A.M.; Amid, E.; Augenstein, S.; McConnaughey, L.; Mathews, R.; Anil, R. Learning from straggler clients in federated learning. arXiv 2024, arXiv:2403.09086.
- Lang, N.; Cohen, A.; Shlezinger, N. Stragglers-aware low-latency synchronous federated learning via layer-wise model updates. IEEE Trans. Commun. 2024, 73, 3333–3346.
- Xia, W.; Quek, T.Q.; Guo, K.; Wen, W.; Yang, H.H.; Zhu, H. Multi-armed bandit-based client scheduling for federated learning. IEEE Trans. Wirel. Commun. 2020, 19, 7108–7123.
- Lee, J.; Ko, H.; Pack, S. Adaptive deadline determination for mobile device selection in federated learning. IEEE Trans. Veh. Technol. 2021, 71, 3367–3371.
- Liu, W.; Cui, T.; Shen, B.; Huang, X.; Chen, Q. Adaptive waiting time asynchronous federated learning in edge computing. In Proceedings of the 2023 International Conference on Wireless Communications and Signal Processing (WCSP), Hangzhou, China, 2–4 November 2023; pp. 540–545.
- Zhai, S.; Jin, X.; Wei, L.; Luo, H.; Cao, M. Dynamic federated learning for GMEC with time-varying wireless link. IEEE Access 2021, 9, 10400–10412.
- Chen, B.; Ivanov, N.; Wang, G.; Yan, Q. DynamicFL: Balancing communication dynamics and client manipulation for federated learning. In Proceedings of the 2023 20th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), Madrid, Spain, 11–14 September 2023; pp. 312–320.
- Zhang, P.; Wang, C.; Jiang, C.; Han, Z. Deep reinforcement learning assisted federated learning algorithm for data management of IIoT. IEEE Trans. Ind. Inform. 2021, 17, 8475–8484.
- Zhao, Z.; Li, A.; Li, R.; Yang, L.; Xu, X. FedPPO: Reinforcement learning-based client selection for federated learning with heterogeneous data. IEEE Trans. Cogn. Commun. Netw. 2025, 1.
- Liu, J.; Xu, H.; Wang, L.; Xu, Y.; Qian, C.; Huang, J.; Huang, H. Adaptive asynchronous federated learning in resource-constrained edge computing. IEEE Trans. Mob. Comput. 2021, 22, 674–690.
- Shi, Y.; Liu, Z.; Shi, Z.; Yu, H. Fairness-aware client selection for federated learning. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 10–14 July 2023; pp. 324–329.
- Zhang, M.; Zhao, H.; Ebron, S.; Xie, R.; Yang, K. Ensuring fairness in federated learning services: Innovative approaches to client selection, scheduling, and rewards. In Proceedings of the 2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS), Jersey City, NJ, USA, 23–26 July 2024; pp. 762–773.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Zhao, Y.; Li, M.; Lai, L.; Suda, N.; Civin, D.; Chandra, V. Federated learning with non-IID data. arXiv 2018, arXiv:1806.00582.
- O'Shea, K.; Nash, R. An introduction to convolutional neural networks. arXiv 2015, arXiv:1511.08458.
- Yurochkin, M.; Agarwal, M.; Ghosh, S.; Greenewald, K.; Hoang, N.; Khazaeni, Y. Bayesian nonparametric federated learning of neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7252–7261.
- Wang, H.; Yurochkin, M.; Sun, Y.; Papailiopoulos, D.; Khazaeni, Y. Federated learning with matched averaging. arXiv 2020, arXiv:2002.06440.
- Steiner, B.; DeVito, Z.; Chintala, S.; Gross, S.; Paszke, A.; Massa, F.; Lerer, A.; Chanan, G.; Lin, Z.; Yang, E.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019.
- Zeng, D.; Liang, S.; Hu, X.; Wang, H.; Xu, Z. FedLab: A flexible federated learning framework. J. Mach. Learn. Res. 2023, 24, 1–7.
| Notation | Description |
|---|---|
| $N$ | Total number of clients |
| $K_t$ | Number of clients selected at round $t$ |
| $D$ | Total number of samples |
| $\mathcal{D}_i$ | Local dataset with sample size $D_i$ |
| $D_i$ | Number of sample data |
| $F_i$ | Local loss function |
| $F$ | Global loss function |
| $p_i$ | Aggregation weight of client $i$, $p_i = D_i / D$ |
| $w^t$ | Global parameter at round $t$ |
| $\eta$ | Learning rate |
| $w_i^t$ | Local parameter of client $i$ at round $t$ |
| $D_i^t$ | Number of samples of client $i$ at round $t$ |
| $n_i$ | Number of processor cores of client $i$ |
| $c_i$ | CPU cycles required to process one bit of data of client $i$ |
| $f_i^t$ | CPU frequency of client $i$ at round $t$ |
| $B_i^t$ | Bandwidth of client $i$ at round $t$ |
| $T_{th}^t$ | Time threshold at round $t$ |
| $T_i^t$ | Latency of client $i$ at round $t$ |
| $T_i^{cmp}$ | Local computing time |
| $T_i^{com}$ | Transmission time for transferring the local model parameters |
| $s_t$ | State $(t, w_i^t, D_i^t, (n_i, c_i, f_i^t), B_i^t)$ |