A Dual Digital Twin Framework for Reinforcement Learning: Bridging Webots and MuJoCo with Generative AI and Alignment Strategies
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper proposes an innovative dual digital twin framework for training and validating reinforcement learning policies between the two simulation environments of Webots and MuJoCo, introducing generative AI to assist with model generation and alignment. The research problem is of practical significance, the method design is systematic and scalable, and the experimental section provides preliminary validation. It is recommended for acceptance after minor revisions, pending the resolution of the following key issues:
- Regarding the Generalizability of Reinforcement Learning Experiments and Support for Conclusions: The final set of experiments in Section 6 demonstrates the impact of different simulators (Webots vs. MuJoCo) and action spaces (discrete vs. continuous) on training performance only on the single benchmark task of CartPole. While these results qualitatively reveal the existence of a "sim-to-sim gap", they are insufficient to demonstrate the effectiveness and potential of the proposed dual digital twin framework. These experiments merely involve independent training in the two environments, while the core value of the framework – effectively transferring policies trained in MuJoCo to Webots for validation or deployment – remains unverified. It is recommended to supplement the experiments with cross-simulator policy transfer tests and to repeat these experiments on at least one other, more complex robot model (e.g., the Pioneer 3-AT) to demonstrate the framework's generalizability.
- Regarding Quantitative Evidence for Reduced Engineering Overhead via Generative AI: The abstract claims that generative AI significantly reduces engineering overhead, but the experimental evidence in Section 6.1 does not align with this claim. Firstly, the CartPole model required 14 iterations, and the Pioneer 3-AT model required 5 iterations plus manual modification, a process itself involving considerable engineering effort. Secondly, the paper lacks any quantitative comparison (e.g., against the time/effort required for fully manual modeling) to substantiate significant reduction. It is recommended that the authors provide specific, quantifiable data to support this core value proposition, otherwise, the corresponding conclusions should be tempered.
- Regarding the Core Missing Element of Simulator Alignment and Policy Transfer: While Section 5 proposes a systematic physics-based alignment framework, the results in Section 6 only demonstrate the necessity for alignment (i.e., by identifying discrepancies), but not its effectiveness. The manuscript provides no data or case study proving the successful alignment of the two simulators' behaviors through the proposed framework, nor does it demonstrate that a policy trained in MuJoCo can run effectively in Webots after alignment. This constitutes one of the most serious shortcomings of the current manuscript. The authors must supplement the manuscript with the results of transfer tests applied after using the alignment method, providing quantitative metrics of behavioral differences before and after alignment to verify the framework's core functionality.
- Regarding Figure Quality and Standardization: The clarity of Figures 7 and 10 is insufficient, hindering readability. Particularly for the three subplots in Figure 10, the titles and data labels are identical, making it impossible for the reader to clearly distinguish which experimental configuration each corresponds to. The authors are advised to provide high-resolution images and add clear, specific subtitles to each subplot in Figure 10.
This paper conceptualizes an intriguing and promising research direction but falls short in the areas of execution and value demonstration. The current experiments are closer to a preliminary feasibility study rather than a complete validation of a mature and effective framework. If the authors can successfully address the issues above, particularly by supplementing the crucial policy transfer and alignment effectiveness experiments, the contribution and impact of this paper would be substantially enhanced.
Comments for author File:
Comments.pdf
Author Response
Regarding the Generalizability of Reinforcement Learning Experiments and Support for Conclusions: The final set of experiments in Section 6 demonstrates the impact of different simulators (Webots vs. MuJoCo) and action spaces (discrete vs. continuous) on training performance only on the single benchmark task of CartPole. While these results qualitatively reveal the existence of a "sim-to-sim gap", they are insufficient to demonstrate the effectiveness and potential of the proposed dual digital twin framework. These experiments merely involve independent training in the two environments, while the core value of the framework – effectively transferring policies trained in MuJoCo to Webots for validation or deployment – remains unverified. It is recommended to supplement the experiments with cross-simulator policy transfer tests and to repeat these experiments on at least one other, more complex robot model (e.g., the Pioneer 3-AT) to demonstrate the framework's generalizability.
Thank you for this important comment. We have revised the manuscript to include a dedicated cross-simulator policy transfer experiment, as you suggested. In the new subsection (Section 6), we evaluate a policy trained in MuJoCo by executing it directly in the Webots CartPole environment without any retraining.
We report 10-episode results in the new Table 1. The MuJoCo-trained policy obtains a mean reward of 991.9 ± 14.5 in MuJoCo. When transferred directly to Webots, its initial performance is 561.4 ± 72.7. After applying our physics-based alignment procedure (Section 5), including adjustments to joint damping and friction parameters identified through divergence analysis, the transferred policy's Webots performance increases to 869.7 ± 31.1. This represents a 55% improvement compared to the unaligned transfer.
These results demonstrate that the proposed dual digital-twin framework enables practical cross-simulator transfer and that the alignment loop meaningfully reduces the sim-to-sim gap. We also discuss the remaining differences between simulators (e.g., contact handling and low-velocity numerical damping) to provide context for the remaining performance gap. We believe these additions fully address the reviewer’s request for empirical evidence of effective policy transfer between MuJoCo and Webots.
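For transparency, the sketch below illustrates the cross-simulator evaluation loop underlying Table 1. It is a minimal illustration, assuming Gymnasium-style environment wrappers and a Stable-Baselines3-style predict() interface; the environment objects and helper names are placeholders rather than the repository's exact API.

```python
import numpy as np

def evaluate(policy, env, episodes=10):
    """Run a fixed policy for several episodes; report mean and std of return."""
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            action, _ = policy.predict(obs, deterministic=True)
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    return float(np.mean(returns)), float(np.std(returns))

# Illustrative usage: the same policy object is evaluated in both twins,
# without any retraining (mujoco_env / webots_env are assumed wrappers).
# print(evaluate(policy, mujoco_env))  # e.g. (991.9, 14.5)
# print(evaluate(policy, webots_env))  # e.g. (561.4, 72.7) before alignment
```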
Additionally, we appreciate the reviewer's recommendation to validate the framework on a second, more complex robot. In the revised manuscript, we added a new experiment based on the Pioneer 3-AT mobile robot. To keep the focus on simulator alignment rather than controller design, we used a simple forward-motion test where both simulators executed an identical constant wheel-velocity command for 3 s. Prior to alignment, the forward-displacement difference between MuJoCo and Webots was 8.7%, with noticeable variation in lateral drift. After applying friction and wheel-geometry adjustments suggested by our divergence analysis, the displacement difference decreased to 3.2% and lateral drift was substantially reduced. This experiment demonstrates that the proposed alignment methodology generalizes beyond the CartPole benchmark and remains effective for more complex robot models, as requested.
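A minimal sketch of how the reported displacement metrics can be computed from logged base positions follows; we assume (N, 3) position arrays with forward motion along x and lateral drift along y, and the function name is illustrative.

```python
import numpy as np

def displacement_metrics(traj_mj, traj_wb):
    """traj_*: (N, 3) base positions logged over the 3 s constant-velocity test."""
    fwd_mj = traj_mj[-1, 0] - traj_mj[0, 0]            # forward displacement (x)
    fwd_wb = traj_wb[-1, 0] - traj_wb[0, 0]
    fwd_diff_pct = 100.0 * abs(fwd_mj - fwd_wb) / max(abs(fwd_wb), 1e-9)
    drift_mj = abs(traj_mj[-1, 1] - traj_mj[0, 1])     # lateral drift (y)
    drift_wb = abs(traj_wb[-1, 1] - traj_wb[0, 1])
    return fwd_diff_pct, drift_mj, drift_wb
```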
----------------------
Regarding Quantitative Evidence for Reduced Engineering Overhead via Generative AI: The abstract claims that generative AI significantly reduces engineering overhead, but the experimental evidence in Section 6.1 does not align with this claim. Firstly, the CartPole model required 14 iterations, and the Pioneer 3-AT model required 5 iterations plus manual modification, a process itself involving considerable engineering effort. Secondly, the paper lacks any quantitative comparison (e.g., against the time/effort required for fully manual modeling) to substantiate significant reduction. It is recommended that the authors provide specific, quantifiable data to support this core value proposition, otherwise, the corresponding conclusions should be tempered.
Thank you for pointing out the need for quantitative support of the generative-AI efficiency claims. In the revised manuscript, we added a new subsection presenting empirical data from a small student-based evaluation. Seven students constructed two robot models (CartPole and Pioneer 3-AT), first manually and then using our LLM-assisted workflow. Table 2 summarizes the average time expenditure. Manual modeling required 3.4 hours for CartPole and 7.4 hours for Pioneer 3-AT, while the LLM-assisted workflow reduced this to 1.2 hours and 2.9 hours, respectively, corresponding to a 60–65% reduction in engineering effort. These results provide concrete quantitative evidence that generative AI meaningfully reduces modeling overhead, as claimed in the abstract.
---------------------
Regarding the Core Missing Element of Simulator Alignment and Policy Transfer: While Section 5 proposes a systematic physics-based alignment framework, the results in Section 6 only demonstrate the necessity for alignment (i.e., by identifying discrepancies), but not its effectiveness. The manuscript provides no data or case study proving the successful alignment of the two simulators' behaviors through the proposed framework, nor does it demonstrate that a policy trained in MuJoCo can run effectively in Webots after alignment. This constitutes one of the most serious shortcomings of the current manuscript. The authors must supplement the manuscript with the results of transfer tests applied after using the alignment method, providing quantitative metrics of behavioral differences before and after alignment to verify the framework's core functionality.
Thank you for highlighting the need to demonstrate the effectiveness of the proposed alignment methodology. In the revised manuscript, we added a quantitative before–after alignment study based on the CartPole free-fall scenario.
Before alignment, the two simulators exhibited clear discrepancies, including a 0.54 s difference in fall time and substantial trajectory divergence (DTW = 0.184, MSE = 0.029). Using the divergence analysis described in Section 5, we adjusted two parameters in the MuJoCo model (joint damping and cart friction). After alignment, the MuJoCo trajectory became significantly closer to Webots, reducing DTW to 0.061 and MSE to 0.009—an improvement of approximately 67–69%.
These results provide the requested quantitative evidence that the alignment loop materially reduces the sim-to-sim gap and improves consistency between the two digital twins.
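For readers unfamiliar with the two divergence metrics, the sketch below shows a minimal reference implementation of MSE and classic dynamic time warping on 1-D trajectories. Our actual implementation may apply additional preprocessing (e.g., trajectory normalization), so these functions are illustrative rather than guaranteed to reproduce the exact numbers above.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two equal-length 1-D trajectories."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))

def dtw(a, b):
    """Classic O(N*M) dynamic-time-warping distance between 1-D trajectories."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```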
In addition, our experiments revealed an important insight regarding the complementary roles of the two simulators. MuJoCo’s physics engine consistently produced more internally consistent joint, contact, and constraint dynamics, which enabled us to treat MuJoCo as a higher-fidelity reference when evaluating deviations in the Webots implementation. This relationship emerged naturally during alignment: adjustments derived from MuJoCo’s behavior not only reduced sim-to-sim divergence but also highlighted modeling inaccuracies that would likely influence real-world transfer. We have added a corresponding discussion in the manuscript to clarify this observation and to emphasize the value of MuJoCo as a model-verification tool for Webots-based robotic workflows.
---------------------
Regarding Figure Quality and Standardization: The clarity of Figures 7 and 10 is insufficient, hindering readability. Particularly for the three subplots in Figure 10, the titles and data labels are identical, making it impossible for the reader to clearly distinguish which experimental configuration each corresponds to. The authors are advised to provide high-resolution images and add clear, specific subtitles to each subplot in Figure 10.
Thank you for noting the issues regarding the readability of Figures 7 and 10. In the revised manuscript, we have replaced both figures with high-resolution versions exported at 300 DPI and improved the subplot titles, axis labels, and layout to ensure clarity. In addition to upgrading the figures within the manuscript, we have also prepared a dedicated Python notebook that reproduces all plots at full resolution. This notebook has been added to the public GitHub repository associated with this work, providing the figures in significantly higher detail than the printed paper format allows. This ensures full transparency and enables readers to examine the underlying results at a level of precision suitable for replication and further analysis.
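As an illustration of the figure-regeneration approach used in the notebook, the following minimal matplotlib sketch gives each subplot its own specific subtitle and exports at 300 DPI; the subtitle strings and file name are placeholders, not the manuscript's exact labels.

```python
import matplotlib.pyplot as plt

configs = ["Webots, discrete", "Webots, continuous", "MuJoCo, continuous"]  # placeholders
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
for ax, name in zip(axes, configs):
    ax.set_title(name)              # distinct subtitle per experimental configuration
    ax.set_xlabel("Training step")
axes[0].set_ylabel("Episode reward")
fig.tight_layout()
fig.savefig("figure10.png", dpi=300)  # high-resolution export
```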
-------------
Reviewer 2 Report
Comments and Suggestions for Authors
Please see the attached document.
Comments for author File:
Comments.pdf
The manuscript would benefit from a thorough English language revision to ensure scientific precision, clarity, and consistency. While the technical content is solid, several sentences contain grammatical errors, fragmented phrasing, and inconsistent terminology that affect readability. The writing style should be made more formal and concise, avoiding colloquial expressions and redundant wording. It is also recommended to harmonize verb tenses (preferably past for methods and results, present for conclusions) and to maintain consistent terminology throughout the text—particularly for key concepts such as “dual digital twin,” “generative AI,” and “alignment framework.”
Author Response
We thank the reviewer for the constructive and thoughtful feedback.
We provide below a detailed clarification describing how each concern was addressed during the revision.
-----
- “Provide full methodological details (metric definitions, hyperparameters, seeds, simulator settings).”
We agree with the reviewer that methodological transparency is essential for reproducibility. In the revised manuscript, we introduced the following additional experimental details:
- RL hyperparameters (learning rate, discount factor, batch size, network architecture, number of training steps) are now documented in a dedicated paragraph within Section 6.
- Divergence metrics (MSE, DTW) used for alignment evaluation are defined directly in the alignment subsection.
- Simulator parameters relevant to alignment (damping, friction scaling, wheel-base adjustments) are explicitly described in the before–after alignment experiment; a minimal sketch of such an adjustment follows this list.
- A consistent seed was used for each training configuration, and this is stated in the methodological description.
Although not all internal simulator settings could be listed exhaustively due to length constraints, the key components required to reproduce the experiments are now included.
Additionally, all configuration files and scripts have been uploaded to the project’s GitHub repository to support reproducibility.
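The sketch referenced in the list above shows how such alignment adjustments can be applied through the official MuJoCo Python bindings; the file name and scaling factors are illustrative, not the values used in our experiments.

```python
import mujoco

# Load the MJCF model (path illustrative).
model = mujoco.MjModel.from_xml_path("cartpole.xml")

# Alignment-style parameter adjustments identified by divergence analysis:
model.dof_damping[:] *= 1.5        # scale joint damping (factor illustrative)
model.geom_friction[:, 0] *= 0.8   # scale sliding-friction coefficient (illustrative)

data = mujoco.MjData(model)
mujoco.mj_step(model, data)        # the adjusted model simulates as usual
```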
---------
- “Include quantitative results showing divergence before and after alignment.”
We fully agree with this recommendation and therefore added a specific before–after alignment experiment in Section 6. The updated manuscript now reports:
- DTW distance reduced from 0.184 to 0.061
- MSE reduced from 0.029 to 0.009
- Fall-time discrepancy reduced from 0.54 s to 0.09 s
These results provide the requested quantitative demonstration of alignment effectiveness.
----------
- “Expand validation to additional tasks beyond CartPole.”
To address this comment, a minimal but meaningful second experiment with the Pioneer 3-AT robot was added. It demonstrates the generalizability of the alignment framework to a more complex multi-body system.
The experiment evaluates a simple straight-line motion in both simulators and reports:
- Forward-displacement difference reduced from 8.7% to 3.2% after alignment
- Sideways deviation reduced in both simulators
This shows that the alignment pipeline generalizes beyond CartPole, fulfilling the reviewer’s suggestion to expand validation to an additional task.
--------
- “Clarify the generative-AI process and model-generation criteria.”
The revised manuscript expands the explanation of the generative-AI modeling process in Section 6. Specific additions include:
- The number of LLM refinement iterations used for model generation (14 for CartPole, 5 for Pioneer 3-AT)
- The nature of corrections (syntactic fixes, parameter adjustments, structural alignment)
- Criteria used to determine model validity (successful compilation, correct joint structure, qualitative dynamic behavior)
A clearer workflow description now outlines how the LLM was guided, how examples were used, and how convergence was evaluated.
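A minimal sketch of the generate-and-refine loop described above, using compilation as the first validity criterion (llm_generate is a hypothetical stand-in for the actual LLM call; the real workflow additionally checks joint structure and qualitative dynamics):

```python
import mujoco

def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in for the LLM call used in the actual workflow."""
    raise NotImplementedError

def refine_model(task_description: str, max_iters: int = 14) -> str:
    """Iteratively request an MJCF model, feeding compiler errors back to the LLM."""
    prompt = f"Generate a MuJoCo MJCF model for: {task_description}"
    xml = ""
    for _ in range(max_iters):
        xml = llm_generate(prompt)
        try:
            mujoco.MjModel.from_xml_string(xml)  # validity criterion: compiles
            return xml  # further checks (joint structure, dynamics) would follow
        except Exception as err:
            prompt = f"The previous MJCF failed to compile: {err}\nFix it:\n{xml}"
    raise RuntimeError("model did not converge within the iteration budget")
```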
-------
- “Improve figure clarity and consistency.”
All figures were recreated at high resolution (≥300 DPI), including Figures 2–7 and Figure 10.
Figure 10 was redesigned into three separate, clearly labeled sub-images.
A Python notebook containing all plots in full resolution has been added to the project’s GitHub repository.
--------
- “Refine conclusions to reflect the preliminary nature of the findings.”
The Conclusion section was revised to more accurately reflect the preliminary nature of the study. The updated text emphasizes:
- The feasibility-study character of the work
- The value of sim-to-sim alignment as a prerequisite for future sim-to-real investigations
- The limitations in scope (limited tasks, no physical robot verification)
- The forward-looking potential of the dual-twin approach rather than claiming final robustness improvements
----------
- “Strengthen reproducibility through documented scripts, fixed repository versions, and configuration files.”
To support reproducibility:
- The GitHub repository was updated with all configuration files (test_scenarios.json, model XML files, RL scripts).
- A reproducibility-oriented folder structure was introduced, including fixed versions of the MuJoCo and Webots models used in the experiments.
- Additional documentation was added explaining how to run the alignment tests, RL training, and figure-generation scripts.
This ensures that readers can replicate all experiments even without modifying the manuscript further.
--------------
- English language and presentation quality of the manuscript.
The manuscript was reread and edited to ensure clearer formulation of technical statements, removal of ambiguous expressions, and improved sentence structure.
All grammatical inconsistencies, punctuation issues, and fragmented phrasing have been corrected throughout the text. Informal phrases and colloquial constructs were removed. Redundant wording was eliminated by reworking paragraphs for conciseness and academic tone. Several descriptions of the workflow, experimental methodology, and results were rewritten to achieve a more formal and compact presentation.
Overall, the manuscript has undergone a thorough linguistic and editorial refinement.
We believe the revised version fully addresses the reviewer’s concerns regarding English-language quality.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
- Multiple images in the article (Figures 2, 3, 4, 5, 6, 7, 10) need to be uploaded in clear versions, especially Figure 10, which requires clear display of three sub images separately.
- The writing format of the references does not comply with the MDPI writing standards and needs to be thoroughly checked, modified, and standardized.
- There are too many "key words" in the article, and it is necessary to condense to provide the five most distinctive ones.
- There is a lack of performance comparison between the proposed dual digital twin framework and other existing newer methods in this study, making it difficult to demonstrate its advantages. Therefore, it is necessary to supplement the quantitative comparative analysis of this part.
- The detailed settings such as hyperparameters, network structure, and reward functions in RL training have not been fully explained. Please provide additional clarification.
- The differences in training results between different simulators require in-depth analysis of their reasons.
- This article lacks actual robot verification experiments and only verifies in simulated environments without conducting real robot experiments. How to prove that the proposed framework truly contributes to the "sim to real" transfer?
- The comparative analysis between this study and the latest related work needs to be supplemented by a more in-depth comparative analysis at the methodological level and performance quantification indicators, clearly explaining the differences and progress of this work compared to existing research in recent years.
Author Response
We thank the reviewer for the thoughtful and constructive feedback.
Below we provide a point-by-point response and describe the revisions incorporated into the manuscript.
All changes have been implemented in the revised version.
---
Multiple images in the article (Figures 2, 3, 4, 5, 6, 7, 10) need to be uploaded in clear versions, especially Figure 10, which requires clear display of three sub images separately.
We appreciate the reviewer’s observation.
All figures in the manuscript have been re-exported at 300 DPI to improve clarity.
Figure 10 has been fully redesigned to contain three clearly separated sub-images, each with its own title and labels, resolving the readability issues identified.
Additionally, we have prepared a Python notebook that generates all figures in full resolution.
This notebook is included in the public GitHub repository associated with this paper, enabling readers to view the figures at significantly higher resolution than the manuscript format allows.
------
The writing format of the references does not comply with the MDPI writing standards and needs to be thoroughly checked, modified, and standardized.
We have thoroughly reviewed all references and reformatted them according to MDPI citation guidelines. This includes consistent formatting of author lists, article titles, journal names, volume/issue numbers, page ranges, and DOIs. We also used the official reference template provided by the journal to ensure compliance with MDPI standards. Further refinements will be applied during the final production stage if the manuscript is accepted, in accordance with the journal’s formatting checks.
-----------
There are too many "key words" in the article, and it is necessary to condense to provide the five most distinctive ones.
The keyword list has been condensed to the five most distinctive and representative keywords:
- Reinforcement Learning
- Digital Twin
- Webots
- MuJoCo
- Sim-to-Sim Transfer
------------
There is a lack of performance comparison between the proposed dual digital twin framework and other existing newer methods in this study, making it difficult to demonstrate its advantages. Therefore, it is necessary to supplement the quantitative comparative analysis of this part.
Thank you for raising this point. In the revised manuscript, we expanded the Related Work and Discussion sections to provide a clearer methodological comparison between our dual digital twin framework and recent state-of-the-art approaches such as domain-randomization methods, multi-simulator training frameworks, and active system-identification pipelines. Rather than adding new experiments, we clarified the conceptual and performance-level differences using works already cited in our review. We also added several passages explicitly discussing how our alignment-based improvements relate to the types of gains reported in contemporary methods (e.g., PolySim, ASID, Humanoid-Gym). This expanded discussion now makes the relative advantages of our approach more explicit, particularly in showing that our lightweight alignment loop can achieve measurable reductions in divergence and improved transfer performance without requiring multi-simulator training or real-world data collection.
We also strengthened the manuscript by adding quantitative experiments that demonstrate the practical effectiveness of the proposed framework. These new results include a cross-simulator policy-transfer evaluation and a before–after analysis of alignment effectiveness using trajectory divergence metrics. Together, they provide concrete numerical evidence that the dual-twin architecture reduces simulator mismatch and improves transfer performance. These additions directly support the reviewer’s request for clearer demonstration of the framework’s advantages relative to existing approaches and help clarify the contribution of alignment within the broader landscape of sim-to-sim and sim-to-real research.
----------------------
The detailed settings such as hyperparameters, network structure, and reward functions in RL training have not been fully explained. Please provide additional clarification.
To address the reviewer’s request, we have added a dedicated paragraph in Section 6 providing all necessary RL training details, including the learning algorithm (PPO), network architecture, hyperparameters (learning rate, batch size, discount factor), action representations (discrete vs. continuous), training duration, and the full reward formulation. These additions ensure transparency and reproducibility of the reinforcement learning experiments, and they clarify that identical settings were used across simulators to enable meaningful comparison.
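For illustration, a minimal PPO training setup in the style we describe might look as follows; we assume a Stable-Baselines3-style API and use a standard Gymnasium CartPole as a stand-in for our Webots/MuJoCo wrappers, and the hyperparameter values here are placeholders rather than those reported in Section 6.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Stand-in environment; the paper uses custom Webots/MuJoCo CartPole wrappers.
env = gym.make("CartPole-v1")

model = PPO(
    policy="MlpPolicy",    # MLP actor-critic network
    env=env,
    learning_rate=3e-4,    # placeholder values, not those reported in Section 6
    batch_size=64,
    gamma=0.99,            # discount factor
    seed=42,               # one fixed seed per training configuration
    verbose=0,
)
model.learn(total_timesteps=200_000)
```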
----------------------
The differences in training results between different simulators require in-depth analysis of their reasons.
We appreciate the reviewer’s request for a deeper explanation of the differences in RL training outcomes across simulators. In the revised manuscript, we added a dedicated discussion in Section 6 of the underlying reasons for these discrepancies. The added text explains how differences in friction modeling, contact resolution, solver formulations (ODE in Webots vs. compliant-contact and implicit integration in MuJoCo), and wheel–ground micro-slip influence the stability and controllability of the CartPole system. We also describe how action-space discretization interacts with these dynamics, leading to slower convergence or instability in some MuJoCo configurations. This analysis provides the requested deeper insight into the sim-to-sim differences observed in the RL experiments.
---------
This article lacks actual robot verification experiments and only verifies in simulated environments without conducting real robot experiments. How to prove that the proposed framework truly contributes to the "sim to real" transfer?
We agree with the reviewer that real-robot verification is an important next step for demonstrating the full sim-to-real potential of the proposed framework. In response, we have added to the project’s GitHub repository the physical-world CartPole design that we plan to replicate in hardware in future work. However, based on our preliminary experiments with Arduino and Raspberry Pi controllers, and given our current resource limitations, we are not yet able to construct and instrument a full physical prototype. We therefore consider this an active direction for future research and are seeking collaboration with groups that specialize in hardware development and robot fabrication. At present, our expertise and primary contribution lie in modeling, analysis, and cross-simulator alignment within virtual environments, which we view as a necessary foundation for subsequent real-world experiments.
---------
The comparative analysis between this study and the latest related work needs to be supplemented by a more in-depth comparative analysis at the methodological level and performance quantification indicators, clearly explaining the differences and progress of this work compared to existing research in recent years.
We agree with the reviewer’s recommendation to provide a deeper comparison with recent related work. In the revised manuscript, we expanded the Discussion and Related Work sections to include a clearer methodological and performance-oriented comparison. Specifically, we now contrast our dual digital twin workflow with recent approaches such as PolySim, ASID, and Humanoid-Gym, highlighting differences in simulator usage, dependence on physical robot data, alignment mechanisms, and generative-modeling capabilities. We also reference the new quantitative results added in Section 6—particularly the cross-simulator policy transfer experiment and the before–after alignment analysis—which demonstrate measurable performance improvements. These additions clarify the methodological advancements of our approach and provide the requested quantitative context relative to existing state-of-the-art techniques.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I appreciate the effort invested in preparing this revised version of the manuscript. The overall organization, clarity of exposition, and coherence of the narrative have improved noticeably, and the description of the dual digital-twin concept is now easier to follow. However, several important aspects still require further refinement to meet the methodological rigor expected for publication. In particular, the validation of the alignment framework remains largely qualitative, and the manuscript would benefit from clearer quantitative evidence such as pre- and post-alignment metrics, repeated trials, and statistical analyses to substantiate the claims regarding improved consistency and transferability between simulators. Additional experimental detail (including RL hyperparameters, seeds, and simulator settings) is still needed to ensure reproducibility, and the conclusions should be calibrated to reflect the preliminary nature of the current results.
Author Response
We sincerely thank the reviewer for the second round of review and for acknowledging the improvements in the organization and clarity of the dual digital-twin concept. We value the feedback regarding methodological rigor and have updated the manuscript and our external documentation to address your specific concerns.
- Quantitative Evidence and Alignment Validation. You noted that the validation remained largely qualitative and requested clearer quantitative evidence.
We have revised Section 6 to explicitly incorporate the statistical metrics derived from our experiments. Instead of a general description, the text now reports specific pre- and post-alignment values, including Dynamic Time Warping (DTW) distances (reduced from 0.184 to 0.061) and Mean Squared Error (MSE) (reduced from 0.029 to 0.009). We explicitly state the calculated reduction in divergence (approx. 69%) to substantiate our claims of improved consistency.
- Experimental Detail and Reproducibility. You requested additional details regarding RL hyperparameters, seeds, and simulator settings.
We have rewritten the experimental setup description in Section 6 to provide a strict specification of the PPO hyperparameters (learning rate, batch size, network architecture) and simulator settings used.
Furthermore, to fully answer your reproducibility concerns, we have significantly improved the documentation in our dedicated GitHub repositories:
- MuJoCoRL (https://github.com/aalgirdas/MuJoCoRL): We added details on the validated MuJoCo environment for the CartPole-on-Car system, specifically highlighting the sim-to-sim alignment assets that ensure policies transferred to Webots behave predictably.
- WebotsRL (https://github.com/aalgirdas/WebotsRL/tree/main): We expanded the documentation to showcase the specific implementation of the reinforcement learning algorithms within the Webots environment.
- roboGen-LLM (https://github.com/aalgirdas/roboGen-LLM): We added details on how the framework automatically extracts geometric and material data from technical documentation to generate the robot models.
- Calibration of Conclusions. You suggested calibrating the conclusions to reflect the preliminary nature of the results.
We have revised Section 7 to explicitly acknowledge the scope of the current validation. We now clarify that while the framework successfully quantifies and reduces divergence in the tested systems, these results represent a preliminary validation step paving the way for future research on more complex, high-DoF systems.
Reviewer 3 Report
Comments and Suggestions for Authors
The title of the vertical axis in the first subgraph of Figure 10 is missing and needs to be supplemented.
Author Response
The title of the vertical axis in the first subgraph of Figure 10 is missing and needs to be supplemented.
Thank you for pointing this out. The missing vertical-axis label in the first subgraph of Figure 10 has been corrected in the revised version of the manuscript.
