FedDCS: Semi-Asynchronous Federated Learning Optimization Based on Dynamic Client Selection
Abstract
1. Introduction
- We propose an adaptive method for predicting client completion times to optimize aggregation scheduling. The approach uses a refined exponential smoothing prediction model whose key innovations are outlier filtering and change-point detection. This combined design yields predictions that are robust to outlier interference while remaining continuously adaptive to dynamic environmental changes (an illustrative sketch follows this list).
- We propose a two-stage waiting mechanism grounded in the prediction of client completion times. Building on the adaptive predictions, the server orchestrates the aggregation process dynamically, adjusting the waiting time across two distinct stages to optimize the trade-off between the number of aggregated client updates and the overall aggregation duration. This design allows FedDCS to autonomously balance time efficiency and model quality in heterogeneous environments.
- We provide a convergence analysis for FedDCS in the smooth non-convex setting and evaluate FedDCS across a variety of highly diverse and challenging environments. The experimental results show that FedDCS consistently outperforms leading synchronous, asynchronous, and semi-asynchronous baselines in both convergence speed and final model accuracy. It also demonstrates exceptional robustness under extreme conditions with severe client staleness or highly skewed data distributions.
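To make the first contribution concrete, the following minimal Python sketch combines per-client exponential smoothing with residual-based outlier filtering and CUSUM change-point detection (cf. the CUSUM sensitivity parameter in the notation table). All parameter names and default values (`alpha`, `k_sigma`, `cusum_k`, `cusum_h`) are our illustrative assumptions, not the paper's exact formulation.

```python
class CompletionTimePredictor:
    """Per-client exponential smoothing with residual-based outlier
    filtering and CUSUM change-point detection. Parameter names and
    defaults are illustrative assumptions, not the paper's values."""

    def __init__(self, alpha=0.3, k_sigma=3.0, cusum_k=0.5, cusum_h=5.0):
        self.alpha = alpha              # exponential smoothing coefficient
        self.k_sigma = k_sigma          # outlier cutoff, in residual std devs
        self.cusum_k = cusum_k          # CUSUM slack (sensitivity) parameter
        self.cusum_h = cusum_h          # CUSUM decision threshold
        self.pred = None                # current smoothed prediction
        self.residuals = []             # history of prediction residuals
        self.g_pos = self.g_neg = 0.0   # one-sided CUSUM statistics

    def update(self, actual_time):
        """Fold one observed round duration into the prediction."""
        if self.pred is None:           # first observation bootstraps
            self.pred = actual_time
            return self.pred

        e = actual_time - self.pred     # prediction residual
        self.residuals.append(e)
        mu = sum(self.residuals) / len(self.residuals)
        var = (sum((r - mu) ** 2 for r in self.residuals)
               / max(len(self.residuals) - 1, 1))
        sd = var ** 0.5

        # Change-point detection: two-sided CUSUM on standardized residuals.
        z = (e - mu) / sd if sd > 0 else 0.0
        self.g_pos = max(0.0, self.g_pos + z - self.cusum_k)
        self.g_neg = max(0.0, self.g_neg - z - self.cusum_k)
        if max(self.g_pos, self.g_neg) > self.cusum_h:
            self.pred = actual_time     # regime shift: re-anchor quickly
            self.residuals.clear()
            self.g_pos = self.g_neg = 0.0
            return self.pred

        # Outlier filtering: isolated extreme residuals do not update.
        if sd > 0 and abs(e - mu) > self.k_sigma * sd:
            return self.pred

        # Ordinary exponential smoothing step.
        self.pred = self.alpha * actual_time + (1 - self.alpha) * self.pred
        return self.pred
```

The intent is that isolated outliers leave the smoothed estimate untouched, while a sustained shift (flagged by CUSUM) re-anchors it, matching the robustness-plus-adaptability goal stated above.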
2. Related Work
3. Methodology
3.1. Client Training Time Prediction
3.2. Early-Batch Segmentation for Dynamic Buffer Sizing
| Algorithm 1 Early-Batch Segmentation Algorithm |
| Input: Predicted training completion times of the clients, end_training_time; segmentation threshold hyperparameter. Output: Number of clients in the fastest batch (the dynamic buffer size); maximum waiting time for the batch, $T_1$. |
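The exact segmentation rule is not reproduced here; the sketch below shows one plausible realization of Algorithm 1 under the assumption that the batch boundary is the first gap in the sorted predicted times exceeding the threshold coefficient times the mean gap. The function name and the `gamma` parameter are ours.

```python
def early_batch_segmentation(pred_times, gamma=2.0):
    """Split clients into a fast batch at the first large gap in their
    sorted predicted completion times. The gap rule (a gap exceeding
    gamma times the mean gap) is an illustrative assumption.

    Returns (M, T1): fast-batch size and its maximum predicted time.
    """
    times = sorted(pred_times)
    n = len(times)
    if n < 2:
        return n, times[-1] if times else 0.0

    gaps = [times[i + 1] - times[i] for i in range(n - 1)]
    mean_gap = sum(gaps) / len(gaps)

    # The first gap much larger than the average marks the batch boundary.
    for i, g in enumerate(gaps):
        if mean_gap > 0 and g > gamma * mean_gap:
            return i + 1, times[i]      # clients 0..i form the fast batch

    return n, times[-1]                 # no clear gap: take everyone

# Example: three fast clients, then a straggler cluster.
M, T1 = early_batch_segmentation([4.1, 4.3, 4.6, 11.8, 12.0], gamma=2.0)
print(M, T1)  # -> 3, 4.6
```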
3.3. The Two-Stage Waiting Mechanism
- First-Stage Waiting Mechanism: The first-stage wait $T_1$ is the primary period for client model collection, during which the server actively receives client updates. The value of $T_1$ is the output of Algorithm 1, set to the maximum predicted completion time among the clients in the identified fastest batch. As detailed in Algorithm 2, the server starts a countdown timer for $T_1$ and begins receiving incoming client updates. The stage is adaptive through three core rules: (1) upon receiving an update, the remaining wait time is reduced by a decay factor, dynamically shortening the wait as client updates arrive; (2) the wait terminates immediately once the buffer reaches its target capacity, preventing unnecessary delay; (3) if the timer expires before the buffer is full, the stage ends forcefully to avoid indefinite stalling caused by extremely slow or unresponsive clients. Together, these rules ensure that the system is neither idle nor stalled, proactively advancing the training process.
- Second-Stage Waiting Mechanism: Empirical observation reveals that a cluster of clients may complete training shortly after the buffer is full. To harness these “following” clients, the second-stage wait is triggered upon completion of the first stage. $T_2$ is implemented as a short, resettable timer. Its core logic is to dynamically extend the collection window: whenever a new client update arrives within the current $T_2$ window, the timer is reset. This creates a rolling window that persists as long as clients complete in close temporal succession, effectively capturing a naturally occurring cohort of clients. The stage concludes once a full duration $T_2$ elapses without any new arrival, signaling that the trailing cluster has been fully collected. The setting of $T_2$ is critical: an optimal value must balance the benefit of collecting additional clients against the cost of added latency. A minimal simulation of the combined two-stage procedure is sketched after Algorithm 2.
| Algorithm 2 Two-Stage Waiting Algorithm for Client Selection |
| Input: First-stage waiting time $T_1$; second-stage waiting time $T_2$; buffer size; time decay coefficient. Output: Collected client set and the number of clients $n$. |
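The sketch below simulates the two stages on a precomputed list of client arrival times; Algorithm 2 itself runs event-driven on a live server. The decay coefficient name `lam` and the buffer-size name `M` are ours. Note that the paper tunes $T_2$ via a Monte Carlo procedure (cf. the Monte Carlo reward weight in the notation table and the Monte Carlo simulation cost in the time-cost table), which we do not reproduce; here $T_2$ is taken as given.

```python
def two_stage_wait(arrivals, T1, T2, M, lam=0.9):
    """Simulate the two-stage waiting mechanism on sorted client
    arrival times (seconds from round start). Returns the collected
    arrival times and the time at which collection closed."""
    arrivals = sorted(arrivals)
    collected = []
    deadline = T1
    i = 0

    # Stage 1: decaying countdown, capped by the buffer capacity M.
    while i < len(arrivals) and arrivals[i] <= deadline and len(collected) < M:
        t = arrivals[i]
        collected.append(t)
        # Rule (1): each arrival shrinks the remaining wait by factor lam.
        deadline = t + lam * (deadline - t)
        i += 1
    # Rule (2) ends the stage when the buffer fills; Rule (3) on expiry.
    stage1_end = collected[-1] if len(collected) >= M else deadline

    # Stage 2: rolling window that resets on every new arrival.
    window_end = stage1_end + T2
    while i < len(arrivals) and arrivals[i] <= window_end:
        collected.append(arrivals[i])
        window_end = arrivals[i] + T2   # reset the resettable timer
        i += 1                          # exits once T2 passes with no arrival
    return collected, window_end

# Toy run: three fast clients, one "following" client, one straggler.
print(two_stage_wait([3.0, 3.5, 4.0, 6.5, 40.0], T1=10, T2=3, M=3))
# -> ([3.0, 3.5, 4.0, 6.5], 9.5): the straggler at t=40 is excluded.
```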
3.4. Asynchronous Aggregation
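The notation table lists per-client aggregation weights, a global-model aggregation weight, and a staleness decay coefficient for a POLY-rule function, which suggests staleness-weighted aggregation in the spirit of FedAsync [6]. The following is a minimal sketch under that assumption; the exact polynomial form and the mixing weight `eta` are assumptions, not the paper's stated rule.

```python
import numpy as np

def poly_staleness(current_round, client_round, a=0.5):
    """FedAsync-style polynomial staleness decay (t - tau + 1)^(-a).
    Using this exact form is an assumption consistent with the
    POLY-rule coefficient in the paper's notation table."""
    return (current_round - client_round + 1) ** (-a)

def aggregate(global_model, updates, current_round, eta=0.5, a=0.5):
    """Fold buffered client models into the global model.
    `updates` is a list of (client_model, client_round) pairs;
    models are flat numpy arrays for simplicity."""
    weights = np.array([poly_staleness(current_round, r, a)
                        for _, r in updates])
    weights /= weights.sum()                 # normalize client weights
    client_avg = sum(w * m for w, (m, _) in zip(weights, updates))
    # Mix the staleness-weighted client average into the global model.
    return (1.0 - eta) * global_model + eta * client_avg
```

Under this rule, a client whose model was trained on an older global version contributes less, which damps the drift that stale updates would otherwise inject.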
3.5. Convergence Analysis
4. Experimental Results and Discussion
4.1. Evaluation
4.1.1. Simulation of Heterogeneity
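The paper cites Hsu et al. [39] and lists a data heterogeneity coefficient in its notation table, which points to Dirichlet label partitioning as the non-IID simulation scheme. The sketch below shows that standard scheme; that FedDCS's simulation follows it exactly, and the parameter name `beta`, are our assumptions.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, beta=0.5, seed=0):
    """Partition sample indices among clients with per-class proportions
    drawn from Dir(beta); smaller beta means more label skew."""
    rng = np.random.default_rng(seed)
    num_classes = int(labels.max()) + 1
    client_indices = [[] for _ in range(num_clients)]
    for c in range(num_classes):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Split this class's samples according to a Dirichlet draw.
        props = rng.dirichlet([beta] * num_clients)
        cuts = (np.cumsum(props) * len(idx)).astype(int)[:-1]
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Example: 10 clients over a toy 10-class label vector.
labels = np.random.randint(0, 10, size=1000)
parts = dirichlet_partition(labels, num_clients=10, beta=0.1)
```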
4.1.2. Datasets
4.1.3. Baselines
- FedAvg [1]: The foundational synchronous FL method.
- FedAnp [18]: An adaptive synchronous method that employs dynamic client sampling to improve efficiency against system heterogeneity.
- FedAsync [6]: A canonical asynchronous method where the server aggregates the global model immediately upon receiving any client update.
- FedFa [23]: An enhanced asynchronous method that maintains a buffer, enabling “single-client triggering, multi-client aggregation”.
- FedBuff [10]: A semi-asynchronous method that aggregates a subset of faster clients per round.
4.1.4. Evaluation Metrics
4.2. Experimental Setup
4.2.1. Implementation Setting
4.2.2. Model and Training Configuration
4.3. Results and Analysis
4.3.1. Convergence Performance
4.3.2. Accuracy and F1 Score
4.3.3. Time Efficiency
4.3.4. Ablation Study
- FedDCS: The full proposed strategy with the complete two-stage adaptive waiting mechanism.
- FedDCS-T1: A variant retaining only the first-stage waiting mechanism. This effectively reduces the method to a FedBuff variant with a dynamically sized buffer.
- FedDCS-T2: A variant retaining only the second-stage waiting mechanism, with a fixed buffer size of 20. This is equivalent to augmenting the standard FedBuff with the second-stage waiting mechanism as an optimization patch.
- FedBuff: The standard semi-asynchronous baseline, which employs a fixed buffer size and contains neither of the proposed waiting mechanisms.
- The first-stage waiting mechanism ensures baseline time efficiency. On simpler tasks such as Fashion-MNIST, FedDCS-T1 achieves higher early-stage efficiency than FedBuff and FedDCS-T2. However, on the complex CIFAR-100 task, it underperforms relative to FedDCS-T2 and the full FedDCS. This indicates that dynamically adjusting the buffer size based on client progress is effective at adapting to common performance variations and avoiding the inefficiencies of a fixed buffer. Nonetheless, its standalone contribution is limited, particularly in handling dynamic environmental changes.
- The second-stage waiting mechanism enhances efficiency on complex tasks. For the more challenging CIFAR-10 and CIFAR-100 datasets, FedDCS-T2 demonstrates significantly higher early-stage time efficiency than FedBuff and slightly outperforms FedDCS-T1. This validates that the second-stage waiting mechanism, by purposefully incorporating updates from “following” clients, enriches the information content of each communication round, thereby accelerating early-phase convergence. However, its fixed buffer size can still cause excessive waiting when clients slow down abruptly.
- Synergistic effect for optimal performance. The complete FedDCS strategy achieves the best overall performance across all three datasets. It leads comprehensively in both time efficiency and final accuracy on Fashion-MNIST and CIFAR-100. On CIFAR-10, it achieves the highest convergence rate while reaching peak accuracy. These results demonstrate a clear synergistic effect: the first-stage waiting mechanism establishes an efficient, adaptive scheduling foundation, while the second-stage waiting mechanism performs refined optimization on this basis. This combination is crucial for robust performance in environments with compounded device and data heterogeneity.
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Fort Lauderdale, FL, USA, 9–11 May 2017; JMLR: Norfolk, MA, USA, 2017; Volume 54.
- Asad, M.; Moustafa, A.; Ito, T. FedOpt: Towards Communication Efficiency and Privacy Preservation in Federated Learning. Appl. Sci. 2020, 10, 2864.
- Carrillo, J.A.; Trillos, N.G.; Li, S.; Zhu, Y. FedCBO: Reaching Group Consensus in Clustered Federated Learning through Consensus-Based Optimization. J. Mach. Learn. Res. 2024, 25, 1–51.
- Yang, Y.; Hui, B.; Yuan, H.; Gong, N.; Cao, Y. PrivateFL: Accurate, Differentially Private Federated Learning via Personalized Data Transformation. In Proceedings of the 32nd USENIX Security Symposium, Anaheim, CA, USA, 9–11 August 2023; pp. 1595–1612.
- Chen, J.; Tang, H.; Cheng, J.; Yan, M.; Zhang, J.; Xu, M.; Nie, L. Breaking Barriers of System Heterogeneity: Straggler-Tolerant Multimodal Federated Learning via Knowledge Distillation. In Proceedings of the International Joint Conference on Artificial Intelligence, Jeju Island, Republic of Korea, 3–9 August 2024.
- Xie, C.; Koyejo, S.; Gupta, I. Asynchronous Federated Optimization. arXiv 2019, arXiv:1903.03934.
- Li, Y.; Yang, S.; Ren, X.; Shi, L.; Zhao, C. Multi-Stage Asynchronous Federated Learning with Adaptive Differential Privacy. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 1243–1256.
- Huang, P.; Li, D.; Yan, Z. Wireless Federated Learning with Asynchronous and Quantized Updates. IEEE Commun. Lett. 2023, 27, 2393–2397.
- Li, W.; Lv, T.; Ni, W.; Zhao, J.; Hossain, E.; Poor, H.V. Route-and-Aggregate Decentralized Federated Learning Under Communication Errors. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 16675–16691.
- Nguyen, J.; Malik, K.; Zhan, H.; Yousefpour, A.; Rabbat, M.; Malek, M.; Huba, D. Federated Learning with Buffered Asynchronous Aggregation. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Virtual Conference, 28–30 March 2022; pp. 3581–3607.
- Wang, Y.; Cao, Y.; Wu, J.; Chen, R.; Chen, J. Tackling the Data Heterogeneity in Asynchronous Federated Learning with Cached Update Calibration. In Proceedings of the 12th International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024.
- Chen, M.; Mao, B.; Ma, T. FedSA: A Staleness-Aware Asynchronous Federated Learning Algorithm with Non-IID Data. Future Gener. Comput. Syst. 2021, 120, 1–12.
- Su, N.; Li, B. How Asynchronous Can Federated Learning Be? In Proceedings of the 2022 IEEE/ACM 30th International Symposium on Quality of Service, Oslo, Norway, 10–12 June 2022; pp. 1–11.
- Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated Learning: Challenges, Methods, and Future Directions. IEEE Signal Process. Mag. 2020, 37, 50–60.
- Zhao, Y.; Li, M.; Lai, L.; Suda, N.; Civin, D.; Chandra, V. Federated Learning with Non-IID Data. arXiv 2018, arXiv:1806.00582.
- Liu, B.; Lv, N.; Guo, Y.; Li, Y. Recent Advances on Federated Learning: A Systematic Survey. Neurocomputing 2024, 597, 128019.
- Chai, Z.; Ali, A.; Zawad, S.; Truex, S.; Anwar, A.; Baracaldo, N.; Cheng, Y. TIFL: A Tier-Based Federated Learning System. In Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, Virtual Conference, 23–26 June 2020; pp. 125–136.
- Reisizadeh, A.; Tziotis, I.; Hassani, H.; Mokhtari, A.; Pedarsani, R. Straggler-Resilient Federated Learning: Leveraging the Interplay Between Statistical Accuracy and System Heterogeneity. IEEE J. Sel. Areas Inf. Theory 2022, 3, 197–205.
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated Optimization in Heterogeneous Networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450.
- Liu, R.; Wu, F.; Wu, C.; Wang, Y.; Lyu, L.; Chen, H.; Xie, X. No One Left Behind: Inclusive Federated Learning over Heterogeneous Devices. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 3398–3406.
- Hao, H.; Xu, C.; Zhang, W.; Chen, X.; Yang, S.; Muntean, G.M. Reliability-Aware Optimization of Task Offloading for UAV-Assisted Edge Computing. IEEE Trans. Comput. 2025, 74, 3832–3848.
- Chen, Y.; Sun, X.; Jin, Y. Communication-Efficient Federated Deep Learning with Layerwise Asynchronous Model Update and Temporally Weighted Aggregation. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 4229–4238.
- Xu, H.; Zhang, Z.; Di, S.; Liu, B.; Alharthi, K.A.; Cao, J. FedFa: A Fully Asynchronous Training Paradigm for Federated Learning. In Proceedings of the 33rd International Joint Conference on Artificial Intelligence, Jeju, Republic of Korea, 3–9 August 2024; pp. 5281–5288.
- Zhou, C.; Tian, H.; Zhang, H.; Zhang, J.; Dong, M.; Jia, J. TEA-Fed: Time-Efficient Asynchronous Federated Learning for Edge Computing. In Proceedings of the 18th ACM International Conference on Computing Frontiers, Virtual Event, Catania, Italy, 11–13 May 2021; pp. 30–37.
- Leconte, L.; Nguyen, V.M.; Moulines, E. FAVANO: Federated Averaging with Asynchronous Nodes. In Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, Republic of Korea, 14–19 April 2024; pp. 5665–5669.
- Yang, J.; Liu, Y.; Chen, F.; Chen, W.; Li, C. Asynchronous Wireless Federated Learning with Probabilistic Client Selection. IEEE Trans. Wirel. Commun. 2024, 23, 7144–7158.
- Hao, J.; Zhao, Y.; Zhang, J. Time Efficient Federated Learning with Semi-asynchronous Communication. In Proceedings of the 2020 IEEE 26th International Conference on Parallel and Distributed Systems, Hong Kong, China, 2–4 December 2020; pp. 156–163.
- Shi, G.; Li, L.; Wang, J.; Chen, W.; Ye, K.; Xu, C. HySync: Hybrid Federated Learning with Effective Synchronization. In Proceedings of the 2020 IEEE 22nd International Conference on High Performance Computing and Communications; IEEE 18th International Conference on Smart City; IEEE 6th International Conference on Data Science and Systems, Yanuca Island, Cuvu, Fiji, 14–16 December 2020; pp. 628–633.
- Chai, Z.; Chen, Y.; Anwar, A.; Zhao, L.; Cheng, Y.; Rangwala, H. FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, USA, 14–19 November 2021; pp. 1–16.
- Liu, J.; Xu, H.; Xu, Y.; Ma, Z.; Wang, Z.; Qian, C.; Huang, H. Communication-Efficient Asynchronous Federated Learning in Resource-Constrained Edge Computing. Comput. Netw. 2021, 199, 108429.
- Chen, Z.; Yi, W.; Shin, H.; Nallanathan, A. Adaptive Semi-Asynchronous Federated Learning Over Wireless Networks. IEEE Trans. Commun. 2025, 73, 394–409.
- Yu, J.; Zhou, R.; Chen, C.; Li, B.; Dong, F. ASFL: Adaptive Semi-Asynchronous Federated Learning for Balancing Model Accuracy and Total Latency in Mobile Edge Networks. In Proceedings of the 52nd International Conference on Parallel Processing, Salt Lake City, UT, USA, 7–10 August 2023; pp. 443–451.
- Singh, N.; Adhikari, M. A Hybrid Semi-Asynchronous Federated Learning and Split Learning Strategy in Edge Networks. IEEE Trans. Netw. Sci. Eng. 2025, 12, 1429–1439.
- Gupta, M.; Gao, J.; Aggarwal, C.C.; Han, J. Outlier Detection for Temporal Data: A Survey. IEEE Trans. Knowl. Data Eng. 2014, 26, 2250–2267.
- Hawkins, D.M.; Olwell, D.H. Cumulative Sum Control Charts and Charting for Quality Improvement; Springer: New York, NY, USA, 1998; pp. 1–247.
- MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 21 June–18 July 1965 and 27 December 1965–7 January 1966.
- Metropolis, N.; Ulam, S. The Monte Carlo Method. J. Am. Stat. Assoc. 1949, 44, 335–341.
- Mania, H.; Pan, X.; Papailiopoulos, D.; Recht, B.; Ramchandran, K.; Jordan, M.I. Perturbed Iterate Analysis for Asynchronous Stochastic Optimization. SIAM J. Optim. 2017, 27, 2202–2229.
- Hsu, T.M.H.; Qi, H.; Brown, M. Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification. arXiv 2019, arXiv:1909.06335.
- Xiao, H.; Rasul, K.; Vollgraf, R. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv 2017, arXiv:1708.07747.
- Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Master’s Thesis, University of Toronto, Toronto, ON, Canada, 2009.
| Notation | Description |
|---|---|
| $n$ | The number of clients participating in aggregation |
| $t$ | Round of training |
|  | The current version of the global model |
|  | The local model version of the client |
|  | The aggregation weight of the client |
|  | The global model’s aggregation weight |
|  | Data heterogeneity coefficient |
|  | Predicted completion time of client $i$ in round $t$ |
|  | Actual completion time of client $i$ in round $t$ |
|  | Exponential smoothing coefficient |
|  | The prediction residual of client $i$ in round $t$ |
|  | Cumulative sum (CUSUM) sensitivity parameter |
|  | The mean of the prediction residuals |
|  | The standard deviation of the prediction residuals |
|  | Dynamic buffer size |
|  | The dynamic segmentation threshold coefficient |
| $T_1$ | Waiting time of the first stage |
| $T_2$ | Waiting time of the second stage |
|  | The decay factor of $T_1$ |
|  | The Monte Carlo reward weight of the second-stage waiting mechanism |
|  | The staleness decay coefficient of the POLY-rule function |
| Dataset Title | Dataset Description | Data Scale |
|---|---|---|
| Fashion-MNIST [40] | Fashion-MNIST (FMNIST) is a widely used benchmark dataset for image classification tasks, often regarded as an alternative to the traditional MNIST handwritten digit dataset. | 70,000 28 × 28-pixel images across 10 categories. |
| CIFAR-10 [41] | CIFAR-10 is a benchmark dataset that encompasses a wide variety of scenes and objects, ranging from natural landscapes to man-made objects, offering high diversity. | 60,000 32 × 32-pixel color natural images across 10 categories. |
| CIFAR-100 [41] | CIFAR-100 is an extension of the CIFAR-10 dataset and is frequently used to evaluate algorithm performance on highly heterogeneous data that require strong generalization capabilities. It features a more granular and complex classification structure. | 60,000 32 × 32-pixel color images across 100 categories. |
| Dataset | Metrics | FedAvg | FedAnp | FedAsync | FedFa | FedBuff | FedDCS |
|---|---|---|---|---|---|---|---|
| FMNIST | Acc (%) | 88.7 ± 0.2 | 88.0 ± 0.5 | 84.0 ± 0.5 | 83.1 ± 0.4 | 88.7 ± 0.2 | 89.0 ± 0.3 |
| | F1 (%) | 88.6 ± 0.2 | 87.9 ± 0.5 | 83.7 ± 0.5 | 82.9 ± 0.5 | 88.7 ± 0.2 | 88.9 ± 0.3 |
| CIFAR10 (heterogeneity setting 1) | Acc (%) | 70.9 ± 0.4 | 69.3 ± 0.4 | 54.5 ± 0.7 | 52.8 ± 1.8 | 68.9 ± 0.2 | 70.3 ± 0.3 |
| | F1 (%) | 70.8 ± 0.5 | 69.1 ± 0.6 | 54.3 ± 0.9 | 52.4 ± 2.0 | 68.8 ± 0.3 | 70.2 ± 0.3 |
| CIFAR10 (heterogeneity setting 2) | Acc (%) | 69.5 ± 0.4 | 67.7 ± 1.5 | 50.3 ± 0.6 | 49.7 ± 1.6 | 68.2 ± 0.5 | 69.7 ± 0.5 |
| | F1 (%) | 69.4 ± 0.4 | 67.3 ± 1.5 | 49.9 ± 0.7 | 48.3 ± 1.6 | 68.0 ± 0.4 | 69.7 ± 0.5 |
| CIFAR10 (heterogeneity setting 3) | Acc (%) | 64.4 ± 2.5 | 58.5 ± 7.7 | 34.5 ± 2.1 | 34.0 ± 2.8 | 59.6 ± 3.2 | 65.0 ± 1.5 |
| | F1 (%) | 63.9 ± 2.4 | 57.6 ± 8.0 | 29.6 ± 3.5 | 28.6 ± 4.1 | 59.2 ± 3.8 | 64.9 ± 1.5 |
| CIFAR100 | Acc (%) | 42.8 ± 0.8 | 39.1 ± 0.6 | 35.9 ± 0.4 | 36.2 ± 0.2 | 40.5 ± 0.4 | 42.9 ± 0.6 |
| | F1 (%) | 43.6 ± 0.7 | 40.1 ± 0.6 | 35.9 ± 0.5 | 36.1 ± 0.3 | 41.2 ± 0.4 | 43.7 ± 0.6 |
| Dataset | Metric | Training Time Prediction | Early-Batch Segmentation | Monte Carlo Simulation | Total Round Time |
|---|---|---|---|---|---|
| FMNIST | Time (s) | 0.189 | 0.037 | 46.743 | 2175.929 |
| | Proportion | 0.009% | 0.002% | 2.148% | 100% |
| CIFAR10 | Time (s) | 0.264 | 0.052 | 134.338 | 3753.548 |
| | Proportion | 0.007% | 0.001% | 3.579% | 100% |
| CIFAR100 | Time (s) | 0.366 | 0.150 | 179.304 | 5998.491 |
| | Proportion | 0.006% | 0.003% | 2.989% | 100% |