A Dual Digital Twin Framework for Reinforcement Learning: Bridging Webots and MuJoCo with Generative AI and Alignment Strategies
Abstract
1. Introduction
2. Related Work
3. Process of Digital Twin Creation
- Requirement Definition. The Production Company specifies the robot’s tasks, constraints, and performance targets. These requirements include robot dynamics, environment layouts, operational data, task objectives, and communication interfaces (for an example, see https://github.com/aalgirdas/roboGen-LLM/tree/main/pdf_sources, accessed on 26 October 2025).
- Model Construction. The Simulation Engineer uses this information to build both the robot model and its environment inside the simulator. The result is a high-fidelity digital twin that reflects the robot’s geometry, sensors, actuators, and workspace.
- Simulation and Analysis. The Production Company can then run simulations to evaluate layouts, identify bottlenecks, and collect performance data without disrupting physical production.
- Training and Optimization. The RL Engineer defines reward functions and algorithms, and trains policies directly in the digital twin. Performance metrics are fed back to the company to assess readiness.
- Deployment. Once trained, the robot model can be transferred to the physical robot, with optional real-time synchronization to refine behavior after deployment.
4. Design Patterns for Dual-Twin Reinforcement Learning
4.1. Pattern 1: The Dual Digital Twin for Accelerated Training and Synchronization
- WebotsRobot: A high-fidelity twin representing the robot in a visually and physically realistic environment like Webots, ideal for validation and fine-tuning.
- MuJoCoRobot: A computationally efficient twin optimized for rapid simulation, allowing for massively parallel training on GPU-accelerated hardware.
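To make the two twins interchangeable from the learning algorithm’s point of view, both can be wrapped behind a common environment interface. The Python sketch below illustrates this idea, assuming a Gymnasium-style reset/step loop; the TwinBackend name and the evaluate helper are illustrative and not taken from the released code.

```python
from abc import ABC, abstractmethod
import numpy as np


class TwinBackend(ABC):
    """Interface shared by both digital twins (illustrative name)."""

    @abstractmethod
    def reset(self) -> np.ndarray:
        """Reset the simulation and return the initial observation."""

    @abstractmethod
    def step(self, action: np.ndarray) -> tuple[np.ndarray, float, bool]:
        """Apply an action and return (observation, reward, done)."""


def evaluate(policy, env: TwinBackend, episodes: int = 10) -> float:
    """Run a trained policy on either twin and return the mean episode return."""
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        returns.append(total)
    return float(np.mean(returns))

# Intended use: train the policy against a MuJoCoRobot backend (fast, headless,
# parallelizable), then call evaluate(policy, webots_robot) on the high-fidelity
# Webots twin for validation and fine-tuning.
```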
4.2. Pattern 2: Supervised Data Collection for Learned Perception
- The Supervisor provides the RLModel with ground-truth information about the environment (e.g., object positions, velocities) that is inaccessible to the SlaveRobot’s sensors.
- The SlaveRobot captures video footage from its camera, time-stamping each frame and pairing it with the corresponding ground-truth data from the Supervisor.
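A minimal sketch of this collection loop is shown below, written as a single Webots Supervisor controller for brevity; in the pattern described above the camera may sit on a separate robot, with the two streams paired by timestamp. The DEF name SLAVE_ROBOT, the device name "camera", and the CSV layout are assumptions for illustration.

```python
# Webots Supervisor controller: pairs time-stamped camera frames with
# ground-truth state. DEF/device names and the CSV layout are illustrative.
import csv
from controller import Supervisor

supervisor = Supervisor()
timestep = int(supervisor.getBasicTimeStep())

slave_node = supervisor.getFromDef("SLAVE_ROBOT")  # ground-truth source
camera = supervisor.getDevice("camera")            # camera attached to this controller
camera.enable(timestep)

with open("ground_truth.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "image_file", "x", "y", "z", "vx", "vy", "vz"])
    while supervisor.step(timestep) != -1:
        t = supervisor.getTime()
        image_file = f"frames/{t:.3f}.png"
        camera.saveImage(image_file, 100)          # time-stamped frame on disk
        position = slave_node.getPosition()        # absolute position (supervisor-only)
        velocity = slave_node.getVelocity()[:3]    # linear velocity components
        writer.writerow([t, image_file, *position, *velocity])
```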
4.3. Pattern 3: Supervisor-Free Deployment with an AI-Enhanced Robot Server
4.4. Automating the Workflow with Generative AI
- The Second Design Pattern: generating the camera setup, data logging, and synchronization logic.
- The Third Design Pattern: generating the RobotServer, its API endpoints, and the integration of the Supervised Learning Model.
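As a concrete illustration of the third pattern, the sketch below wraps a supervised perception model behind a single HTTP endpoint so the deployed robot can query it without a Webots Supervisor. Flask, the /predict route, and the PerceptionModel interface are illustrative choices, not the paper’s released implementation.

```python
# Minimal RobotServer sketch for supervisor-free deployment (Pattern 3).
# Flask, the /predict route, and the model interface are illustrative.
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)


class PerceptionModel:
    """Stand-in for the supervised model trained on Supervisor-labelled frames."""

    def predict(self, image: np.ndarray) -> dict:
        # A real deployment would run the trained network on the frame here.
        return {"object_position": [0.0, 0.0, 0.0]}


model = PerceptionModel()


@app.route("/predict", methods=["POST"])
def predict():
    # The robot posts a camera frame; the server returns the state estimate
    # that the Supervisor used to provide directly during training.
    payload = request.get_json()
    image = np.asarray(payload["image"], dtype=np.uint8)
    return jsonify(model.predict(image))


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```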
5. Physics-Based Alignment and Validation Framework
5.1. Framework Architecture and Workflow
- Global Parameters: Simulation settings like timestep and failure conditions that are common across all tests.
- Individual Scenarios: A list of discrete tests, each with a specific name, duration, and a set of initial conditions (e.g., initial_angle for the pole) and prescribed actions (e.g., wheel_velocity for the cart). This structured approach allows engineers (or a generative AI) to design targeted tests that probe specific aspects of the robot’s dynamics, such as step responses, impulse disturbances, or stability under constant actuation; a minimal example of such a scenario file is sketched after this list.
- The TestRobot script is the robot’s main controller. It reads the test_scenarios.json file, applies the prescribed actuator commands (e.g., setting the wheel velocity), and logs data from its on-board sensors, such as the pole’s position sensor.
- The TestOrchestrator script runs as a Webots Supervisor, a “privileged” entity that can observe and manipulate the entire simulation. Its role is to provide “ground truth” data that the robot cannot sense itself, such as its absolute position in the world. It also reads the scenario file to set the initial state of the environment (e.g., setting the pole’s starting angle) and logs its observations to a separate data file.
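The listing below sketches what such a unified scenario file might contain and how both controllers could load it. Only the fields named above (timestep, failure conditions, initial_angle, wheel_velocity) come from the text; the remaining key names and values are assumptions for illustration.

```python
# Illustrative test_scenarios.json content shared by TestRobot and TestOrchestrator.
# Only initial_angle, wheel_velocity, timestep and failure conditions follow the
# description above; the remaining key names and values are assumptions.
import json

scenarios = {
    "global_settings": {
        "timestep_ms": 16,
        "failure_conditions": {"max_pole_angle_rad": 0.8},
    },
    "scenarios": [
        {   # step response: start upright, apply a constant wheel velocity
            "name": "step_response",
            "duration_s": 5.0,
            "initial_conditions": {"initial_angle": 0.0},
            "actions": {"wheel_velocity": 2.0},
        },
        {   # free response from a perturbed pole, no actuation
            "name": "impulse_disturbance",
            "duration_s": 5.0,
            "initial_conditions": {"initial_angle": 0.15},
            "actions": {"wheel_velocity": 0.0},
        },
    ],
}

with open("test_scenarios.json", "w") as f:
    json.dump(scenarios, f, indent=2)

# Both controllers read the same file, so prescribed actions and initial
# conditions stay consistent across the Webots and MuJoCo runs.
with open("test_scenarios.json") as f:
    config = json.load(f)
for scenario in config["scenarios"]:
    print(scenario["name"], scenario["actions"]["wheel_velocity"])
```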
5.2. Component Design and Process Flow
- ScenarioConfiguration: This class loads the unified test definitions from the JSON file. It holds the global_settings and a list of individual Scenario objects, each detailing conditions and actions for a specific test.
- DataLogger: A utility class used by the controllers to record time-stamped data to a log file during simulation.
- TestRobot: This class acts as the robot’s main controller. It reads its position_sensor and applies prescribed actions by controlling its wheels (e.g., via set_robot_speed).
- TestOrchestrator: This class represents a privileged “supervisor” entity. It has access to the entire simulation (robot_node, pole_node) and is responsible for setting the environment’s initial state (via set_initial_state) to match the scenario’s conditions.
- Quantifying Divergence: The agent computes statistical measures of divergence, such as the Mean Squared Error (MSE) or Dynamic Time Warping (DTW) distance, between the time-series data from the two simulators (a minimal computation is sketched after this list).
- Identifying Root Causes: By correlating divergence with specific test scenarios, the agent can hypothesize the cause of the mismatch (e.g., “divergence is highest in high-velocity tests, suggesting a discrepancy in friction coefficients”).
- Suggesting Corrections: The agent proposes specific changes to the robot’s model files (e.g., URDF or Webots .proto files), such as adjusting mass, inertia, joint damping, or calculating proportional coefficients to scale actuator signals.
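A minimal version of the divergence computation is sketched below, assuming each simulator logs one scenario to a CSV file whose second column is the pole angle; the file names and column layout are illustrative.

```python
# Divergence metrics between Webots and MuJoCo logs (illustrative log format).
import numpy as np


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two equally sampled trajectories."""
    n = min(len(a), len(b))                      # guard against length mismatch
    return float(np.mean((a[:n] - b[:n]) ** 2))


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    cost = np.full((len(a) + 1, len(b) + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[-1, -1])


# Example: compare the pole-angle column logged by each simulator for one scenario.
webots_angle = np.loadtxt("webots_step_response.csv", delimiter=",", skiprows=1, usecols=1)
mujoco_angle = np.loadtxt("mujoco_step_response.csv", delimiter=",", skiprows=1, usecols=1)
print("MSE:", mse(webots_angle, mujoco_angle))
print("DTW:", dtw_distance(webots_angle, mujoco_angle))
```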
6. Results and Discussion
Preliminary Results on LLM-Based Model Generation
- Contextual grounding is crucial. The availability of valid MuJoCo syntax examples was decisive for the success of the CartPole case, confirming that carefully selected context substantially enhances reliability.
- Iterative refinement is essential. Multiple feedback cycles were necessary to achieve convergence, supporting the principle that LLM-guided model synthesis benefits from structured, iterative alignment rather than one-shot generation.
- Complexity scaling remains a challenge. For robots with multiple degrees of freedom or compound geometries, the LLM’s reasoning accuracy decreases when limited to textual cues. Nevertheless, the generation of syntactically valid models provides a valuable initialization for downstream physics-based optimization.
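This iterative refinement can be organized as a generate, validate, repair loop: the LLM proposes MuJoCo XML, the candidate is compiled, and any compiler error is fed back as context for the next attempt. In the sketch below, generate_xml is a placeholder for the LLM call; only the MuJoCo loading step uses the real library.

```python
# Generate-validate-repair loop for LLM-produced MuJoCo models (sketch).
# generate_xml() is a placeholder for an LLM request; mujoco loading is real.
import mujoco


def generate_xml(prompt: str) -> str:
    """Placeholder for the LLM call that returns candidate MJCF XML."""
    raise NotImplementedError("wire this to your LLM provider")


def synthesize_model(task_description: str, max_iters: int = 5) -> mujoco.MjModel:
    prompt = task_description
    for _ in range(max_iters):
        xml = generate_xml(prompt)
        try:
            # Compilation is the cheap, first validity check; the physics-based
            # alignment of Section 5 follows once the model loads at all.
            return mujoco.MjModel.from_xml_string(xml)
        except Exception as error:
            # Feed the compiler error back so the next attempt can repair it.
            prompt = (f"{task_description}\n\nThe previous MJCF failed to compile "
                      f"with: {error}\nPlease return corrected XML only.")
    raise RuntimeError("no valid model produced after iterative refinement")
```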
7. Conclusions
- Automated Model Generation: We demonstrated that LLMs can automate the creation of a secondary digital twin, successfully translating a Webots CartPole model into a functional MuJoCo equivalent through an iterative, guided process.
- Workflow Acceleration: We proposed a GAI-assisted workflow where an AI agent, given a simple implementation of our first design pattern, can automatically generate the more complex code for data collection and supervisor-free deployment patterns.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ayala, A.; Cruz, F.; Campos, D.; Rubio, R.; Fernandes, B.; Dazeley, R. A comparison of humanoid robot simulators: A quantitative approach. In Proceedings of the Joint IEEE 10th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), Valparaiso, Chile, 26–30 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6. [Google Scholar]
- Yadav, P.; Mishra, A.; Kim, S. A comprehensive survey on multi-agent reinforcement learning for connected and automated vehicles. Sensors 2023, 23, 4710. [Google Scholar] [CrossRef] [PubMed]
- Sivamayil, K.; Rajasekar, E.; Aljafari, B.; Nikolovski, S.; Vairavasundaram, S.; Vairavasundaram, I. A systematic study on reinforcement learning based applications. Energies 2023, 16, 1512. [Google Scholar] [CrossRef]
- Qian, C.; Ren, H. Deep reinforcement learning in surgical robotics: Enhancing the automation level. In Handbook of Robotic Surgery; Academic Press: Cambridge, MA, USA, 2025; pp. 89–102. [Google Scholar]
- Liu, W.; Wu, M.; Wan, G.; Xu, M. Digital twin of space environment: Development, challenges, applications, and future outlook. Remote Sens. 2024, 16, 3023. [Google Scholar] [CrossRef]
- Laukaitis, A.; Šareiko, A.; Mažeika, D. Facilitating Robot Learning in Virtual Environments: A Deep Reinforcement Learning Framework. Appl. Sci. 2025, 15, 5016. [Google Scholar] [CrossRef]
- Todorov, E.; Erez, T.; Tassa, Y. Mujoco: A physics engine for model-based control. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 5026–5033. [Google Scholar]
- Michel, O.; Cyberbotics Ltd. Webots: Professional mobile robot simulation. Int. J. Adv. Robot. Syst. 2004, 1, 5. [Google Scholar]
- Šareiko, A.; Mažeika, D.; Laukaitis, A. Framework for deep reinforcement learning in Webots virtual environments. New Trends Comput. Sci. 2025, 3, 49–63. [Google Scholar] [CrossRef]
- Chen, S.; Lopes, P.V.; Marti, S.; Rajashekarappa, M.; Bandaru, S.; Windmark, C.; Skoogh, A. Enhancing Digital Twins with Deep Reinforcement Learning: A Use Case in Maintenance Prioritization. In Proceedings of the 2024 Winter Simulation Conference (WSC), Orlando, FL, USA, 15–18 December 2024; pp. 1611–1622. [Google Scholar]
- Mazumder, A.; Sahed, M.F.; Tasneem, Z.; Das, P.; Badal, F.R.; Ali, M.F.; Islam, M.R. Towards next generation digital twin in robotics: Trends, scopes, challenges, and future. Heliyon 2023, 9, e13359. [Google Scholar] [CrossRef]
- Mu, Y.; Chen, T.; Chen, Z.; Peng, S.; Lan, Z.; Gao, Z.; Liang, Z.; Yu, Q.; Zou, Y.; Xu, M.; et al. Robotwin: Dual-arm robot benchmark with generative digital twins. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 27649–27660. [Google Scholar]
- Matulis, M.; Harvey, C. A robot arm digital twin utilising reinforcement learning. Comput. Graph. 2021, 95, 106–114. [Google Scholar] [CrossRef]
- Malik, A.A.; Brem, A. Digital twins for collaborative robots: A case study in human-robot interaction. Robot. Comput.-Integr. Manuf. 2021, 68, 102092. [Google Scholar] [CrossRef]
- Lei, Z.; Zhou, Z.; Yin, S.; Chen, Y.; Xu, Q.; Li, W.; Wang, Y.; Tang, B.; Jing, W.; Chen, S. PolySim: Bridging the Sim-to-Real Gap for Humanoid Control via Multi-Simulator Dynamics Randomization. arXiv 2025, arXiv:2510.01708. [Google Scholar]
- Memmel, M.; Wagenmaker, A.; Zhu, C.; Yin, P.; Fox, D.; Gupta, A. Asid: Active exploration for system identification in robotic manipulation. arXiv 2024, arXiv:2404.12308. [Google Scholar] [CrossRef]
- Gu, X.; Wang, Y.J.; Chen, J. Humanoid-gym: Reinforcement learning for humanoid robot with zero-shot sim2real transfer. arXiv 2024, arXiv:2404.05695. [Google Scholar]
- Korada, L. Role of generative AI in the digital twin landscape and how it accelerates adoption. J. Artif. Intell. Mach. Learn. Data Sci. 2024, 2, 902–906. [Google Scholar] [CrossRef]
- Mikołajewska, E.; Mikołajewski, D.; Mikołajczyk, T.; Paczkowski, T. Generative AI in AI-based digital twins for fault diagnosis for predictive maintenance in Industry 4.0/5.0. Appl. Sci. 2025, 15, 3166. [Google Scholar] [CrossRef]
- Hajdu, C.; Hegyi, N. Modeling Kinematic and Dynamic Structures with Hypergraph-Based Formalism. Appl. Mech. 2025, 6, 74. [Google Scholar] [CrossRef]
- Holt, S.; Luyten, M.R.; Berthon, A.; van der Schaar, M. G-sim: Generative simulations with large language models and gradient-free calibration. arXiv 2025, arXiv:2506.09272. [Google Scholar]
- Lin, Y.Z.; Shi, Q.; Yang, Z.; Latibari, B.S.; Satam, S.; Shao, S.; Salehi, S.; Satam, P. Ddd-gendt: Dynamic data-driven generative digital twin framework. IEEE Trans. Artif. Intell. 2025; Early Access. [Google Scholar] [CrossRef]
- Liu, W.; Fu, Y.; Wang, Y.G.F.L.; Sun, W.; Zhang, Y. Two-timescale synchronization and migration for digital twin networks: A multi-agent deep reinforcement learning approach. IEEE Trans. Wirel. Commun. 2024, 23, 17294–17309. [Google Scholar] [CrossRef]
- Yang, L.; Luo, S.; Cheng, X.; Yu, L. Leveraging Large Language Models for Enhanced Digital Twin Modeling: Trends, Methods, and Challenges. arXiv 2025, arXiv:2503.02167. [Google Scholar] [CrossRef]
- Deng, M.; Fu, B.; Li, L.; Wang, X. Integrating LLMs and Digital Twins for Adaptive Multi-Robot Task Allocation in Construction. arXiv 2025, arXiv:2506.18178. [Google Scholar] [CrossRef]
- Ravik, O.E. Integrating Large Language Models with Digital Twins for Autonomous Control. Master’s Thesis, Norwegian University of Science and Technology, Trondheim, Norway, 2025. [Google Scholar]
- Li, N.; Ma, Z.; Yu, R.; Li, L. LSDTs: LLM-Augmented Semantic Digital Twins for Adaptive Knowledge-Intensive Infrastructure Planning. arXiv 2025, arXiv:2508.06799. [Google Scholar]
| Episode | MuJoCo Reward | Webots Reward (Transferred, Before Alignment) | Webots Reward (After Alignment) |
|---|---|---|---|
| 1 | 1000 | 673 | 914 |
| 2 | 1000 | 519 | 877 |
| 3 | 993 | 462 | 856 |
| 4 | 1000 | 593 | 848 |
| 5 | 987 | 577 | 879 |
| 6 | 1010 | 632 | 845 |
| 7 | 963 | 560 | 873 |
| 8 | 971 | 631 | 921 |
| 9 | 1000 | 472 | 866 |
| 10 | 995 | 495 | 818 |
| Robot | Manual Modeling Time (h) | LLM-Assisted Modeling Time (h) | Reduction (fraction) |
|---|---|---|---|
| CartPole | 3.4 | 1.2 | 0.647 |
| Pioneer 3-AT | 7.4 | 2.9 | 0.608 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Laukaitis, A.; Šareiko, A.; Mažeika, D. A Dual Digital Twin Framework for Reinforcement Learning: Bridging Webots and MuJoCo with Generative AI and Alignment Strategies. Electronics 2025, 14, 4806. https://doi.org/10.3390/electronics14244806
Laukaitis A, Šareiko A, Mažeika D. A Dual Digital Twin Framework for Reinforcement Learning: Bridging Webots and MuJoCo with Generative AI and Alignment Strategies. Electronics. 2025; 14(24):4806. https://doi.org/10.3390/electronics14244806
Chicago/Turabian Style: Laukaitis, Algirdas, Andrej Šareiko, and Dalius Mažeika. 2025. "A Dual Digital Twin Framework for Reinforcement Learning: Bridging Webots and MuJoCo with Generative AI and Alignment Strategies" Electronics 14, no. 24: 4806. https://doi.org/10.3390/electronics14244806
APA Style: Laukaitis, A., Šareiko, A., & Mažeika, D. (2025). A Dual Digital Twin Framework for Reinforcement Learning: Bridging Webots and MuJoCo with Generative AI and Alignment Strategies. Electronics, 14(24), 4806. https://doi.org/10.3390/electronics14244806

