Federated Learning for Human Pose Estimation on Non-IID Data via Gradient Coordination
Abstract
1. Introduction
- We propose FedGH, a gradient-coordination aggregation strategy designed for keypoint-regression tasks in human pose estimation, and demonstrate its effectiveness within a federated learning framework. FedGH applies a gradient-discrepancy metric and the PCGrad projection-correction mechanism inside the FedAW aggregation process, then reconstructs parameters by weighted averaging, which resolves conflicting gradient directions and improves the global update. We further integrate FedGH into the PoseResNet (ResNet50 backbone) training pipeline, augmented with asynchronous updates and dynamic client selection, to address network heterogeneity and communication delays and thereby improve overall system robustness and efficiency.
- We present a new single-arm robotic rehabilitation pose dataset consisting of 2060 high-resolution images of a single subject, captured from four fixed viewpoints (anterior, posterior, left lateral, right lateral). The images cover both upright standing and arm-extended postures, with a single-arm manipulator placed over the limb so that it covers the wrist, elbow and shoulder regions. Acquisition was performed under consistent lighting, background and camera settings to balance stability and scene variation. Three keypoints (shoulder, elbow, wrist) were annotated using a combined visual-marker and manual-verification workflow to ensure high spatial accuracy. This dataset enables rigorous assessment of FedGH in practical robot-assisted rehabilitation contexts.
- We conduct extensive experiments on both the self-constructed single-arm robotic pose dataset and the MPII benchmark using the average PCK metric. We systematically compare FedAvg, FedBN, FedProx, FedDyn, FedAW, and FedGH to demonstrate the effectiveness and generalization of FedGH across heterogeneous scenarios.
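The projection-correction and reconstruction steps named in the first contribution can be illustrated with a minimal sketch. It assumes that each client's pseudo-gradient is the difference between the global and local weight vectors, that conflicts are detected by a negative inner product as in PCGrad [Yu et al.], and that the corrected gradients are combined with caller-supplied weights. The function names (`pcgrad`, `aggregate`) and the flat-list weight representation are hypothetical, and the FedAW weighting and the gradient-discrepancy metric are not reproduced here.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad(grads, seed=0):
    """PCGrad-style correction: remove from each gradient its component
    along every other gradient it conflicts with (negative inner product)."""
    rng = random.Random(seed)
    corrected = []
    for i, g in enumerate(grads):
        g = list(g)
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)  # PCGrad visits the other gradients in random order
        for j in others:
            h = grads[j]  # projections use the original, uncorrected gradient
            inner = dot(g, h)
            if inner < 0:  # conflicting directions
                scale = inner / dot(h, h)
                g = [x - scale * y for x, y in zip(g, h)]
        corrected.append(g)
    return corrected


def aggregate(global_w, client_ws, weights):
    """Assumed reconstruction step: weighted average of corrected
    pseudo-gradients, subtracted from the global weights."""
    # Assumption: pseudo-gradient = global weights - local weights.
    grads = [[gw - cw for gw, cw in zip(global_w, w)] for w in client_ws]
    corrected = pcgrad(grads)
    total = sum(weights)
    step = [
        sum(wt * g[k] for g, wt in zip(corrected, weights)) / total
        for k in range(len(global_w))
    ]
    return [gw - s for gw, s in zip(global_w, step)]


if __name__ == "__main__":
    # Two clients whose pseudo-gradients [1, 0] and [-1, 1] conflict.
    print(pcgrad([[1.0, 0.0], [-1.0, 1.0]]))
    print(aggregate([0.0, 0.0], [[-1.0, 0.0], [1.0, -1.0]], [1.0, 1.0]))
```

In the two-client example, each corrected gradient is orthogonal to the other client's original gradient, so neither client's update works directly against the other before averaging.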
2. Related Work
2.1. Federated Learning
2.2. Human Posture Estimation
3. Methodology
3.1. Framework Overview
3.2. Non-IID Data Partitioning
3.3. FedGH Aggregation Strategy
3.3.1. Gradient Computation
3.3.2. Gradient-Projection Correction
3.3.3. Parameter Reconstruction and Global Aggregation
Algorithm 1: FedGH aggregation algorithm.
3.4. Federated Learning Framework for Human Pose Estimation
4. Experiment
4.1. Dataset
4.2. Experiment Setup
- Train using FedAvg, FedProx, FedDyn, FedBN, and FedAW on the self-constructed dataset;
- Train using FedAvg, FedProx, FedDyn, FedBN, and FedAW on the MPII dataset;
- Based on FedAW, apply the improved FedGH strategy to train on the self-constructed dataset;
- Based on FedAW, apply the improved FedGH strategy to train on the MPII dataset.
4.3. Results
4.3.1. Effectiveness of FedGH
4.3.2. Human Pose Estimation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep learning for computer vision: A brief review. Comput. Intell. Neurosci. 2018, 2018, 7068349.
- Srinivasu, P.N.; Lakshmi, G.J.; Narahari, S.C.; Shafi, J.; Choi, J.; Ijaz, M.F. Enhancing medical image classification via federated learning and pre-trained model. Egypt. Inform. J. 2024, 27, 100530.
- Liu, Y.; Huang, A.; Luo, Y.; Huang, H.; Liu, Y.; Chen, Y.; Feng, L.; Chen, T.; Yu, H.; Yang, Q. Fedvision: An online visual object detection platform powered by federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13172–13179.
- Wang, J.; Deng, H.; Wang, Y.; Xie, J.; Zhang, H.; Li, Y.; Guo, S. Multi-sensor fusion federated learning method of human posture recognition for dual-arm nursing robots. Inf. Fusion 2024, 107, 102320.
- Zhuang, W.; Xu, J.; Chen, C.; Li, J.; Lyu, L. Coala: A practical and vision-centric federated learning platform. arXiv 2024, arXiv:2407.16560.
- He, C.; Shah, A.D.; Tang, Z.; Sivashunmugam, D.N.; Bhogaraju, K.; Shimpi, M.; Shen, L.; Chu, X.; Soltanolkotabi, M. Fedcv: A federated learning framework for diverse computer vision tasks. arXiv 2021, arXiv:2111.11066.
- Li, M.; Gjoreski, M.; Barbiero, P.; Slapničar, G.; Luštrek, M.; Lane, N.D.; Langheinrich, M. A Survey on Federated Learning in Human Sensing. arXiv 2025, arXiv:2501.04000.
- Zhang, X.; Sun, W.; Chen, Y. Tackling the non-iid issue in heterogeneous federated learning by gradient harmonization. IEEE Signal Process. Lett. 2024, 31, 2595–2599.
- Efthymiadis, F.; Karras, A.; Karras, C.; Sioutas, S. Advanced Optimization Techniques for Federated Learning on Non-IID Data. Future Internet 2024, 16, 370.
- McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
- Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. In Proceedings of the Machine Learning and Systems, Austin, TX, USA, 2–4 March 2020; Volume 2, pp. 429–450.
- Li, X.; Jiang, M.; Zhang, X.; Kamp, M.; Dou, Q. Fedbn: Federated learning on non-iid features via local batch normalization. arXiv 2021, arXiv:2102.07623.
- Tang, Y. Adapted weighted aggregation in federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024; Volume 38, pp. 23763–23765.
- Acar, D.A.E.; Zhao, Y.; Navarro, R.M.; Mattina, M.; Whatmough, P.N.; Saligrama, V. Federated learning based on dynamic regularization. arXiv 2021, arXiv:2111.04263.
- Andriluka, M.; Pishchulin, L.; Gehler, P.; Schiele, B. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 3686–3693.
- Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and Open Problems in Federated Learning; Foundations and Trends® in Machine Learning: Berkeley, CA, USA, 2021; Volume 14, pp. 1–210.
- Karimireddy, S.P.; Kale, S.; Mohri, M. Scaffold: Stochastic controlled averaging for federated learning. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 5132–5143.
- Yu, T.; Kumar, S.; Gupta, A.; Levine, S.; Hausman, K.; Finn, C. Gradient surgery for multi-task learning. Adv. Neural Inf. Process. Syst. 2020, 33, 5824–5836.
- Liu, W.; Bao, Q.; Sun, Y.; Mei, T. Recent advances of monocular 2D and 3D human pose estimation: A deep learning perspective. ACM Comput. Surv. 2022, 55, 1–41.
- Toshev, A.; Szegedy, C. Deeppose: Human pose estimation via deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1653–1660.
- Newell, A.; Yang, K.; Deng, J. Stacked hourglass networks for human pose estimation. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part VIII 14. pp. 483–499.
- Wei, S.E.; Ramakrishna, V.; Kanade, T.; Sheikh, Y. Convolutional pose machines. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4724–4732.
- Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.E.; Sheikh, Y. Openpose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 43, 172–186.
- Xu, Y.; Zhang, J.; Zhang, Q.; Tao, D. Vitpose: Simple vision transformer baselines for human pose estimation. Adv. Neural Inf. Process. Syst. 2022, 35, 38571–38584.
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364.
- Xiao, B.; Wu, H.; Wei, Y. Simple baselines for human pose estimation and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 466–481.
- Brasó, G.; Kister, N.; Leal-Taixé, L. The center of attention: Center-keypoint grouping via attention for multi-person pose estimation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 11853–11863.
- Qi, P.; Chiaro, D.; Piccialli, F. FL-FD: Federated learning-based fall detection with multimodal data fusion. Inf. Fusion 2023, 99, 101890.
- Pishchulin, L.; Andriluka, M.; Gehler, P.; Schiele, B. Strong appearance and expressive spatial models for human pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 3487–3494.
- Tompson, J.J.; Jain, A.; LeCun, Y.; Bregler, C. Joint training of a convolutional network and a graphical model for human pose estimation. In Proceedings of the NIPS’14: 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014.
- Ren, F. Distilling token-pruned pose transformer for 2D human pose estimation. arXiv 2023, arXiv:2304.05548.
Method | Avg | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle |
---|---|---|---|---|---|---|---|---|
Pishchulin et al. [29] | 44.1 | 74.3 | 49.0 | 40.8 | 34.1 | 36.5 | 34.4 | 35.2 |
Tompson et al. [30] | 79.6 | 95.8 | 90.3 | 80.5 | 74.3 | 77.6 | 69.7 | 62.8 |
Ren [31] | 87.9 | 96.4 | 94.9 | 88.3 | 81.8 | 88.2 | 83.0 | 78.3 |
FedGH | 66.3 | 89.1 | 82.8 | 63.7 | 51.4 | 69.9 | 51.1 | 46.1 |
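The per-joint scores in the table above are presumably PCK percentages, since the experiments are reported with the average PCK metric. The following minimal sketch shows how such per-joint and average scores can be computed. A keypoint counts as correct when its predicted position lies within `alpha` times a per-image reference length of the ground truth; the function name, the default `alpha = 0.5`, and the choice of reference length are assumptions, not the paper's stated settings. On MPII the head-normalized variant (PCKh) is the common choice of reference length.

```python
import math


def pck_per_joint(preds, gts, ref_lens, alpha=0.5):
    """Per-joint PCK in percent.

    preds, gts: one list of (x, y) keypoints per image, same joint order.
    ref_lens:   per-image reference length (assumed, e.g. head or torso size).
    """
    n_joints = len(gts[0])
    hits = [0] * n_joints
    for pred, gt, ref in zip(preds, gts, ref_lens):
        for j in range(n_joints):
            # Correct if the Euclidean error is within alpha * reference length.
            if math.dist(pred[j], gt[j]) <= alpha * ref:
                hits[j] += 1
    return [100.0 * h / len(gts) for h in hits]


if __name__ == "__main__":
    # Hypothetical data: two images, three joints (shoulder, elbow, wrist).
    gts = [[(0, 0), (10, 0), (20, 0)], [(0, 0), (10, 0), (20, 0)]]
    preds = [[(1, 0), (10, 6), (20, 1)], [(0, 2), (10, 3), (26, 0)]]
    per_joint = pck_per_joint(preds, gts, ref_lens=[10, 10])
    print(per_joint, sum(per_joint) / len(per_joint))
```

The average column then corresponds to the mean of the per-joint values, under the assumption that all joints are weighted equally.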
Method | t-Value | p-Value |
---|---|---|
FedGH vs. FedAvg | 3.07 | |
FedGH vs. FedBN | 6.95 | <0.001 |
FedGH vs. FedProx | 2.63 | |
FedGH vs. FedDyn | 2.73 | <0.001 |
FedGH vs. FedAW | 4.23 | <0.001 |
Method | t-Value | p-Value |
---|---|---|
FedGH vs. FedAvg | 25.44 | <0.001 |
FedGH vs. FedBN | 24.58 | <0.001 |
FedGH vs. FedProx | 37.51 | <0.001 |
FedGH vs. FedDyn | 41.49 | <0.001 |
FedGH vs. FedAW | 12.93 | <0.001 |
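For orientation, a t-value of the kind tabulated above can be obtained from repeated runs of two methods. The sketch below computes a paired t-statistic over per-run scores; whether the reported comparisons used a paired or an independent-samples test is not stated here, and the scores in the example are hypothetical placeholders, not results from the paper.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Paired t-statistic and degrees of freedom for matched runs."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Hypothetical average-PCK scores from five matched runs of two methods.
    method_a = [66.0, 67.0, 65.5, 66.5, 66.0]
    method_b = [64.0, 65.5, 64.0, 64.0, 65.0]
    t, df = paired_t(method_a, method_b)
    print(f"t = {t:.2f}, df = {df}")
```

The p-value follows from the t-distribution with the returned degrees of freedom; it is omitted from the sketch to keep it free of third-party dependencies.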
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ni, P.; Xiang, D.; Jiang, D.; Sun, J.; Cui, J. Federated Learning for Human Pose Estimation on Non-IID Data via Gradient Coordination. Sensors 2025, 25, 4372. https://doi.org/10.3390/s25144372