F-DRL: Federated Dynamics Representation Learning for Robust Multi-Task Reinforcement Learning
Abstract
1. Introduction
Contributions
- A federated dynamics-aware representation learning framework that learns shared state and action-conditioned embeddings across multiple robotic manipulation tasks without sharing raw data or policy parameters, explicitly targeting stability under task heterogeneity.
- An extension of robotics-prior-constrained representation learning that incorporates action-conditioned system dynamics, yielding embeddings that capture task-relevant transition structure rather than static state features.
- A similarity-weighted federated aggregation strategy based on second-order latent geometry, which selectively aligns clients according to dynamics similarity and mitigates the negative transfer induced by uniform averaging (a minimal sketch follows this list).
- An extensive empirical evaluation on heterogeneous MetaWorld manipulation tasks demonstrating that the learned representations act as a stabilizing auxiliary signal for downstream model-free reinforcement learning, substantially reducing variance across random seeds while achieving performance comparable to centralized training.
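As a concrete illustration of the similarity-weighted aggregation contribution, the minimal sketch below assumes that each client summarizes its latent space by the empirical covariance of its embeddings (the second-order geometry), that pairwise similarity is the cosine similarity between flattened covariances, and that aggregation weights are a softmax over those similarities. The helper names (`latent_covariance`, `similarity_weights`, `aggregate`) and the temperature `tau` are illustrative; the paper's exact similarity measure and weighting scheme are defined in Section 3.4.2.

```python
import numpy as np

def latent_covariance(z: np.ndarray) -> np.ndarray:
    """Second-order summary of a client's latent space.

    z: (num_samples, latent_dim) array of state embeddings.
    """
    zc = z - z.mean(axis=0, keepdims=True)
    return zc.T @ zc / max(len(z) - 1, 1)

def similarity_weights(covs, i, tau=1.0):
    """Softmax over cosine similarities between client i's latent
    covariance and every client's covariance (including its own)."""
    vi = covs[i].ravel()
    sims = np.array([
        vi @ c.ravel() / (np.linalg.norm(vi) * np.linalg.norm(c.ravel()) + 1e-8)
        for c in covs
    ])
    w = np.exp(sims / tau)
    return w / w.sum()

def aggregate(params, weights):
    """Weighted average of (flattened) encoder parameter vectors."""
    return sum(w * p for w, p in zip(weights, params))
```

As `tau` grows, the weights approach the uniform average used by FedAvg; smaller values concentrate aggregation on clients whose latent dynamics are most similar.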
2. Related Work
2.1. Deep Reinforcement Learning and Sample Inefficiency
2.2. Representation Learning for Reinforcement Learning
2.2.1. Robotics Priors and Physical Constraints
2.2.2. Dynamics Modelling via Self-Prediction
2.3. Federated Reinforcement Learning
3. Framework and Methodology
3.1. Design Rationale and Framework Overview
3.2. Federated Learning Setting and Assumptions
3.3. Overview of the Proposed Framework
3.4. Federated Representation Learning
3.4.1. Architecture of the Local Client Module
3.4.2. Similarity-Weighted Federated Aggregation
Algorithm 1: Federated Dynamics-Aware Representation Learning. Require: clients, initial encoder parameters.
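A minimal sketch of the round structure that Algorithm 1 describes, under stated assumptions: each client trains its local state and state–action encoders on its own trajectories, only encoder parameters and a second-order latent summary leave the client, and the server returns a client-specific similarity-weighted mixture. `local_train` is a hypothetical client method; `latent_covariance`, `similarity_weights`, and `aggregate` are the helpers sketched after the Contributions list.

```python
def federated_round(clients, params, tau=1.0):
    """One federated round (sketch). Each client trains its encoders locally,
    then receives a client-specific, similarity-weighted parameter mixture.

    clients: objects exposing local_train(params) -> (updated_params, latents)
    params:  per-client encoder parameter vectors (np.ndarray, same shape)
    """
    updated, covs = [], []
    for client, p in zip(clients, params):
        new_p, z = client.local_train(p)       # local epochs on own trajectories
        updated.append(new_p)
        covs.append(latent_covariance(z))      # only latent summaries are shared
    return [aggregate(updated, similarity_weights(covs, i, tau))
            for i in range(len(clients))]
```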
3.5. Downstream Policy Optimization with Learned Representations
Algorithm 2: TD3 Policy Learning with Frozen Dynamics-Aware Representations. Require: frozen encoders with parameters, actor, critics.
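A minimal sketch of how Algorithm 2 can plug a frozen encoder into a standard TD3 actor, assuming the actor consumes the latent state embedding in place of the raw observation (whether the paper instead concatenates the embedding with the raw state is an assumption left open here). `LatentTD3Actor` and the hidden sizes are illustrative; the critics would analogously consume the frozen state–action embedding, and only actor/critic parameters are updated.

```python
import torch
import torch.nn as nn

class LatentTD3Actor(nn.Module):
    """TD3 actor on top of a frozen dynamics-aware state encoder (sketch)."""

    def __init__(self, state_encoder: nn.Module, latent_dim: int,
                 action_dim: int, max_action: float):
        super().__init__()
        self.encoder = state_encoder
        for p in self.encoder.parameters():
            p.requires_grad = False          # representation stays frozen
        self.max_action = max_action
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                # no gradient flows into the encoder
            z = self.encoder(state)
        return self.max_action * self.net(z)
```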
3.6. Scope and Extensions
4. Experimental Evaluation and Discussion
4.1. Experimental Setup
4.2. Representation Networks
4.3. Downstream Reinforcement Learning Networks
4.4. Communication Overhead and Scalability
4.5. Baselines and Evaluation Protocol
- Centralized: Offline trajectories from all tasks are pooled to train a single state encoder and state–action encoder without privacy constraints. Training proceeds for 20 epochs per round over 100 rounds and serves as an upper bound on representation sharing.
- FedAvg: Client state encoder parameters are aggregated with equal weights for each client at every federated round, following the standard FedAvg procedure (see the sketch after this list). All other training components are identical to the proposed method.
- Local End-to-End Training: Representation learning and policy optimization are trained jointly for each task in an end-to-end manner, without federated aggregation.
- Raw States Only: Policies are trained directly on raw state observations, without any learned representation or auxiliary embedding signals.
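For reference, the FedAvg baseline's equal-weight aggregation reduces to an elementwise mean of the client encoder parameters, as in this minimal sketch (standard FedAvg weights clients by local sample count; the baseline above uses equal weights per client):

```python
import numpy as np

def fedavg_equal(client_params) -> np.ndarray:
    """Equal-weight FedAvg: elementwise mean of per-client parameter vectors."""
    return np.mean(np.stack(client_params), axis=0)
```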
4.6. Downstream Reinforcement Learning Performance
4.7. Stability and Effect of Task Heterogeneity
4.7.1. Mean Seed Standard Deviation
4.7.2. Integrated Dispersion (AUC of Seed Standard Deviation)
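Both dispersion metrics can be computed from per-seed evaluation curves. The sketch below assumes success rates are logged for each seed on a common grid of evaluation steps: the mean seed standard deviation (Section 4.7.1) averages the across-seed standard deviation over checkpoints, and the integrated dispersion (Section 4.7.2) takes the area under that standard-deviation curve with the trapezoidal rule. The function names and the logging grid are assumptions.

```python
import numpy as np

def seed_std_curve(curves: np.ndarray) -> np.ndarray:
    """curves: (num_seeds, num_checkpoints) success rates, one row per seed."""
    return curves.std(axis=0)

def mean_seed_std(curves: np.ndarray) -> float:
    """Section 4.7.1: across-seed std, averaged over evaluation checkpoints."""
    return float(seed_std_curve(curves).mean())

def integrated_dispersion(curves: np.ndarray, steps: np.ndarray) -> float:
    """Section 4.7.2: area under the seed-std curve (trapezoidal rule)."""
    return float(np.trapz(seed_std_curve(curves), steps))
```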
4.8. Analysis of Similarity-Weighted Federated Aggregation
4.9. Discussion
5. Conclusions and Future Work
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Aspect | Existing FRL | F-DRL (Ours) |
|---|---|---|
| Federated object | Policy/value networks | Representation encoder |
| Shared signal | Control parameters | Latent dynamics embeddings |
| Aggregation strategy | Uniform/proximal averaging | Similarity-weighted aggregation |
| Sensitivity to task heterogeneity | High | Reduced |
| Training stability focus | Implicit | Explicit |
| Local control optimization | Federated | Fully local |
| Privacy exposure | Policy gradients | Latent summaries only |
| Extensibility | Limited | Modular |
| Task | Method | Final Success (%) | Mean Seed Std ↓ | AUC of Seed Std ↓ |
|---|---|---|---|---|
| Drawer-Open | Centralized | 100.00 | 7.57 | 211.83 |
| | FedAvg | 99.79 | 20.11 | 562.94 |
| | Local | 79.43 | 18.93 | 530.10 |
| | Ours | 98.65 | 7.72 | 216.12 |
| Window-Open | Centralized | 100.00 | 10.11 | 283.04 |
| | FedAvg | 100.00 | 19.06 | 533.67 |
| | Local | 100.00 | 3.97 | 111.10 |
| | Ours | 100.00 | 6.49 | 181.79 |