Review Reports - Measurement of Cognitive and Kinematic Adaptation in Exoskeleton-Assisted Locomotion: Validation of an XR-Based Framework

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper explores the use of a mixed-reality headset with eye-tracking sensors to quantify cognitive load during adaptation to an exoskeleton. The method has been tested on 30 healthy participants. The paper is interesting and well written in general. Limitations of the study are mentioned. The authors may wish to consider the following before proceeding with publication.

(1) Some statements in the Introduction should be supported by a citation. For example, on line 76, the source of the statistic "only a small fraction of cognitive assessment studies (approximately 20%) focus on exoskeletons compared to prostheses" must be provided. If this statistic was obtained from a review paper, the paper should be cited; otherwise, the parameters of the literature review that was performed (e.g., search terms, date range) should be provided.

(2) Information about the participants (lines 327-328) is typically provided as mean and standard deviation, sometimes accompanied by range if pertinent. It is unclear why only the upper bound of body mass ("not exceeding 80 kg") has been provided.

(3) In Figure 8, the data have been plotted as if the horizontal axis is a continuous independent variable, but it seems that these are buckets (as in Figures 4-7). Using continuous lines for this plot is misleading. Perhaps the group mean and confidence interval could be shown as boxes, with dashed lines showing individual participant trends.

(4) The authors may wish to consider whether participants who might be expected to adapt more quickly (e.g., athletes) partially explain the inter-individual differences (e.g., Figure 8). These differences may also be due to individual participants' height or mass being better or worse suited to the specific exoskeleton. If these data are available, a correlation could be checked.

(5) The Acknowledgments section states that Technaid supplied the exoskeleton and provided "valuable technical support throughout the experimental phase." This creates the potential for a perceived conflict of interest. Because of Technaid's involvement during the study, the authors should explicitly state (in the Acknowledgments or Conflict of Interest statement) that the company had no role in designing the study, analyzing the data, writing the paper, or the decision to publish (if true).

Minor comments:

(6) The paper contains some typos and inaccuracies. For example, "Multiple Correlation Coefficient" on line 11 would make more sense as "coefficient of multiple correlation" if the initialism "CMC" is used. On line 17, "a measurements protocol" should be "a measurement protocol." On line 63, "SWAVF" stands for "Standing and Walking Attention Visual Field." Line 205 should not be indented. On line 225, "et Al." should be "et al." On line 275, "Defined" should be "Defining." On line 328, "weight" should be "mass." On line 422, "analyzes" should be "analyses." On line 507, the word "external" in "external assistive device" does not seem to add any meaning. On line 526, the sentence "At the same time, walking results unstable." seems to be missing a word.

(7) The keyword "measurements" on line 19 does not seem to add value as a search term.

(8) In Figure 8, the thin lines are very difficult to see in this color against the white background. A different color can be used to make these data visible.

Comments on the Quality of English Language

Please see comment (6) above.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper presents an interesting experimental protocol combining kinematic and cognitive measurements for the assessment of human-exoskeleton integration. Overall, the paper is well written and presented, and the topic is of interest for the research community. However, there are some issues that the authors should consider to improve the scientific soundness and clarity of their work:

Which control paradigm of the exoskeleton was used? Is it position control with full assistance along a predefined trajectory? Is it torque-based assistance following the user's natural movement? How was the exoskeleton's assistance activated? This information is extremely relevant to understanding the effect of the human-robot interaction and how the results can be translated to other robotic platforms.
Where were the 17 IMUs positioned?
According to the experimental protocol, did you ask the participants to find and say the green number as quickly as possible? If so, since voice activity was recorded synchronously with the other data, it would also be interesting to analyze response speed to determine whether it changes with training.
Section 2.3.3: Is the procedure used to identify HS and TO from shank angular velocity a standard one? If yes, please provide the reference to this method. Otherwise, a more detailed explanation of why this procedure was selected is necessary.
Section 2.3.4: How is wc identified in equation 4? This value is critical for a correct measurement of the SPARC metric.
Section 2.3.4: Why was only the shank angular velocity considered for the SPARC and CMC, and not the velocities of other joints? Unless properly motivated, this choice seems quite arbitrary.
Section 2.6: Which post-hoc test was used after the ANOVA?
Section 3.1.1: If I understood correctly, you found statistically significant differences between all DT sessions and ST-1 and ST-3, but no significant differences between the DT conditions and ST-2 or ST-4. If that is the case, although an overall increase in pupil dilation is visually apparent in the DT conditions, the major contribution to pupil dilation seems to come from the concurrent walking activity combined with the cognitive task, rather than from the presence of the exoskeleton. Please comment further on this aspect.
Section 4.1: The authors mention that, after the initial familiarization, SPARC and SLV worsened once the dual task was introduced. However, no comparison between DT-0 and the kinematic data recorded during familiarization is reported to support this claim. Considering the clear session effect visible in the kinematic measurements, it seems to me that the initially lower SPARC and higher SLV are more related to the "novelty" of the task for the user than to a clear dual-task effect. Indeed, right after the first session, the statistically significant differences between DT and ST disappear, suggesting a rapid habituation effect. In this sense, a comparison with the kinematic data recorded during familiarization would be critical to quantify the drop in kinematic performance in DT-0 relative to standard use of the exoskeleton system.
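On how ω_c in Equation 4 can be identified: in the SPARC formulation of Balasubramanian et al. (2015), the cutoff is adaptive, taken as the smaller of a fixed upper bound and the highest frequency at which the peak-normalized magnitude spectrum still reaches an amplitude threshold. The minimal pure-Python sketch below follows that formulation under its commonly cited defaults (10 Hz upper bound, threshold 0.05, zero-padding); the function name and defaults are illustrative assumptions, not the manuscript's implementation.

```python
import cmath
import math


def sparc(speed, fs, fc_max=10.0, amp_th=0.05, pad_level=4):
    """Spectral arc length of a sampled speed profile (more negative = less smooth).

    speed: list of samples; fs: sampling rate in Hz; fc_max: fixed upper
    bound of the cutoff in Hz; amp_th: threshold on the normalized spectrum;
    pad_level: extra doublings of the zero-padded transform length.
    """
    n = len(speed)
    nfft = 2 ** (math.ceil(math.log2(n)) + pad_level)
    half = nfft // 2 + 1
    # Magnitude spectrum on bins 0..nfft/2 by direct DFT; zero padding only
    # refines the frequency grid, so only the n real samples are summed.
    mags = []
    for k in range(half):
        step = cmath.exp(-2j * math.pi * k / nfft)
        mags.append(abs(sum(x * step ** i for i, x in enumerate(speed))))
    peak = max(mags)
    # Normalize by the spectral peak and keep bins up to fc_max.
    spec = [(k * fs / nfft, m / peak) for k, m in enumerate(mags)
            if k * fs / nfft <= fc_max]
    # Adaptive cutoff: span from the first to the last bin at or above amp_th.
    above = [i for i, (_, m) in enumerate(spec) if m >= amp_th]
    if not above or above[0] == above[-1]:
        raise ValueError("spectrum too narrow to compute SPARC")
    spec = spec[above[0]:above[-1] + 1]
    f_span = spec[-1][0] - spec[0][0]
    # Arc length of the normalized spectrum, frequency axis scaled to [0, 1].
    arc = sum(math.hypot((f1 - f0) / f_span, m1 - m0)
              for (f0, m0), (f1, m1) in zip(spec, spec[1:]))
    return -arc
```

For a single bell-shaped profile sampled at 100 Hz the sketch returns a value between -2 and -1, and a two-peaked profile of the same duration returns a more negative value. Lowering `amp_th` or raising `fc_max` extends the integration range toward the noise floor and typically makes SPARC more negative, which is why both settings are worth reporting.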

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript presents an interesting and timely study on the measurement of motor and cognitive adaptation during exoskeleton-assisted walking by integrating IMU-based kinematics, eye-tracking, and an XR-based dual-task paradigm. The topic is relevant for rehabilitation engineering and human–robot interaction, and the proposed framework is original in that it combines overground exoskeleton use, mixed reality, visuospatial attention stimulation, and concurrent kinematic/pupillometric monitoring in a single experimental architecture. The study is conducted on 30 healthy participants and reports significant session effects for SLV, SPARC, CMC, and TEPR across repeated sessions.

Overall, the manuscript has merit and the experimental platform is promising. However, in its current form, I do not think the paper yet provides a sufficiently rigorous validation of the proposed measurement framework. At present, the study mainly demonstrates feasibility and sensitivity to session-related adaptation, rather than true metrological validation of the sensing architecture. For this reason, I recommend major revision before the manuscript can be considered for publication.

Major comments

The central claim of “validation” is currently stronger than what the manuscript demonstrates.
The paper describes gaze calibration via a Kabsch-based spatial alignment and synchronization via cross-correlation plus optimization, but it does not report quantitative validation metrics for these procedures, such as calibration error, XR spatial registration accuracy, synchronization residual error, repeatability, drift across sessions, or percentage of unusable data. Since the manuscript is framed as a validation study of a measurement framework, these quantitative system-level performance indicators are essential. Otherwise, the paper is better described as a feasibility or sensitivity study.
Pupillometry is highly sensitive to luminance, yet the protocol was conducted without dedicated light control.
The authors explicitly state that the experiments were performed in a standard corridor under ecological conditions and without special brightness-control equipment. This is a major methodological concern because both tonic pupil diameter and phasic pupillary responses can be influenced by ambient illumination and headset display conditions. The manuscript should report how luminance variability was minimized or monitored, whether virtual stimulus brightness was fixed, and how the authors excluded illumination as a confound. At minimum, this limitation should be discussed much more explicitly.
The manuscript does not report behavioral performance in the cognitive task, although those data appear to have been collected.
The system logs target/distractor timing, gaze coordinates, head position, and continuous audio responses, yet the Results focus almost exclusively on pupil-based inference for the cognitive component. Accuracy, omission rate, response latency, fixation-to-target confirmation, or target identification success would substantially strengthen the interpretation of “cognitive adaptation.” Without behavioral task outcomes, the claim that users adapted cognitively to the visuospatial task remains only partially supported.
The use of English as a non-native verbal response adds an uncontrolled source of inter-subject variability.
The manuscript states that participants had to report the target number in English specifically to increase cognitive demand. This is an interesting idea, but it also introduces a confound related to language proficiency, familiarity, and stress. The authors should report participants’ English proficiency level or explain why this factor is unlikely to bias the results. Otherwise, part of the measured pupillary response may reflect linguistic variability rather than only visuospatial-cognitive load.
The experimental design does not fully disentangle learning, order, and fatigue effects.
The sequence is fixed: initial ST sessions, donning/familiarization, DT-0, then alternating TR/DT blocks, followed by final ST sessions. Because the order is not randomized or counterbalanced, improvements across sessions may reflect general familiarization, decreasing anxiety, headset adaptation, or fatigue-related compensation, not only true human–exoskeleton adaptation. The authors partially acknowledge fatigue through rest periods, but this issue should be discussed more critically, and additional analysis would be beneficial if possible.
Some interpretations in the Discussion are overstated relative to the presented evidence.
Statements about a “mechanical ceiling,” “overshoot of baseline motor performance,” and the system “proving” readiness for future clinical applications go beyond what can be concluded from a healthy-subject study with one exoskeleton model and short repeated sessions. These interpretations should be softened and more clearly separated from what was directly demonstrated by the data.
The limitations are acknowledged, but the title and conclusions should reflect them more conservatively.
The study involves 30 healthy subjects and one commercial exoskeleton model, which the authors themselves recognize as a limitation. This is acceptable for a preliminary framework study, but the conclusions should more carefully restrict the scope of generalization to healthy users and to this specific platform unless broader validation data are provided.
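The first major comment asks for quantitative calibration and synchronization errors. As a minimal sketch of how such indicators could be computed, assuming paired 3-D points (e.g., gaze-derived fixation positions and the known positions of XR calibration targets) and two streams that share a common event signal, the code below fits a Kabsch rigid alignment and reports its RMS residual, and estimates an integer-sample lag by normalized cross-correlation. The function names, inputs, and use of NumPy are assumptions for illustration; the manuscript's pipeline, which also includes an optimization step, may differ.

```python
import numpy as np


def kabsch(P, Q):
    """Best-fit rotation R and translation t such that Q ≈ P @ R.T + t.

    P, Q: (n, 3) arrays of corresponding points, n >= 3.
    """
    p0, q0 = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p0).T @ (Q - q0)                 # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # avoid returning a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q0 - R @ p0
    return R, t


def registration_rmse(P, Q, R, t):
    """Root-mean-square residual distance after alignment (units of Q)."""
    residuals = Q - (P @ R.T + t)
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))


def estimate_lag(ref, other, max_lag):
    """Integer lag (samples) maximizing normalized cross-correlation.

    A positive lag means `other` is delayed relative to `ref`.
    """
    a = (ref - ref.mean()) / ref.std()
    b = (other - other.mean()) / other.std()
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # pair a[j] with b[j + lag] over the overlapping samples
        x, y = (a, b[lag:]) if lag >= 0 else (a[-lag:], b)
        n = min(len(x), len(y))
        r = float(np.mean(x[:n] * y[:n]))
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

Repeating both computations for each session would yield the repeatability and drift indicators the comment refers to.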

Minor comments

There is an inconsistency in the naming of the dual-task sessions in the cognitive results section. The protocol and figures use DT-0 to DT-3, but the text on pupil dilation refers to DT-1 to DT-4. This should be corrected for consistency.
Section title “4. Discussions” should be changed to “4. Discussion.”
The manuscript would benefit from careful English editing. There are several awkward or grammatically weak expressions throughout the text.
The statistical reporting should be made more uniform. In particular, it would help to provide confidence intervals and degrees of freedom where appropriate, and to report effect sizes consistently across analyses.
More practical detail would improve reproducibility: exact rest durations, how often manual refinement of gait events was needed, how blink-contaminated trials were handled, and whether any participants or trials were excluded.
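On how blink-contaminated trials are handled: one way to make these choices explicit and reportable is to interpolate over invalid samples, apply a subtractive baseline correction, and exclude trials whose fraction of invalid samples exceeds a stated limit. The sketch below assumes invalid samples are encoded as `None`, NaN, or non-positive values; the 0.5 s baseline, 2 s response window, and 25% exclusion limit are illustrative placeholders, not values from the manuscript.

```python
import bisect
import math


def interpolate_blinks(pupil):
    """Linearly interpolate pupil samples marked invalid (None, NaN, or <= 0).

    Returns (cleaned_trace, fraction_invalid). Invalid samples at the edges
    are filled with the nearest valid value.
    """
    valid_idx = [i for i, p in enumerate(pupil)
                 if p is not None and not math.isnan(p) and p > 0]
    if not valid_idx:
        raise ValueError("no valid pupil samples in segment")
    cleaned = list(pupil)
    for i in range(len(pupil)):
        pos = bisect.bisect_left(valid_idx, i)
        if pos < len(valid_idx) and valid_idx[pos] == i:
            continue  # sample is valid
        left = valid_idx[pos - 1] if pos > 0 else None
        right = valid_idx[pos] if pos < len(valid_idx) else None
        if left is None:
            cleaned[i] = pupil[right]
        elif right is None:
            cleaned[i] = pupil[left]
        else:
            w = (i - left) / (right - left)
            cleaned[i] = (1 - w) * pupil[left] + w * pupil[right]
    return cleaned, 1 - len(valid_idx) / len(pupil)


def tepr(pupil, fs, onset, baseline_s=0.5, window_s=2.0, max_invalid=0.25):
    """Subtractive-baseline TEPR for one stimulus; None if the trial is excluded.

    onset is the stimulus onset as a sample index into pupil.
    """
    start = onset - round(baseline_s * fs)
    stop = onset + round(window_s * fs)
    if start < 0 or stop > len(pupil):
        raise ValueError("analysis window exceeds the recording")
    segment, invalid = interpolate_blinks(pupil[start:stop])
    if invalid > max_invalid:
        return None  # too many blink/dropout samples: exclude trial
    split = onset - start
    baseline = sum(segment[:split]) / split
    response = sum(segment[split:]) / (len(segment) - split)
    return response - baseline
```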

Comments on the Quality of English Language

The manuscript is generally understandable, but the English should be carefully revised by a fluent speaker or professional editing service, as several sentences are grammatically incorrect or phrased in a non-native and stylistically awkward manner.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

This study presents a novel framework to assess both kinematic performance and cognitive workload during exoskeleton-assisted walking by integrating inertial motion capture (Xsens), eye-tracking (Pupil Neon), and mixed reality (Meta Quest 3) into a unified experimental platform. The authors developed a dual-task paradigm where participants walk with an exoskeleton while performing a visuospatial attention task involving holographic stimuli. The framework was validated on 30 healthy individuals across repeated training sessions. Results showed significant improvements in gait stability and smoothness (e.g., reduced step length variability and improved SPARC/CMC metrics), alongside a decrease in task-evoked pupillary response (TEPR), indicating reduced cognitive demand over time. Overall, the study demonstrates that the proposed system can effectively capture motor adaptation and cognitive load simultaneously, providing a promising tool for evaluating human–robot interaction in rehabilitation contexts.

Major Comments

The integration of XR, eye-tracking, and kinematics in a dual-task paradigm is a strong and meaningful contribution. However:

The manuscript would benefit from a clearer positioning against existing frameworks, especially EUROBENCH and similar multimodal systems.
The authors claim novelty in combining all components, but a more structured comparison table with prior work would strengthen this claim.
The study is limited to healthy participants (n = 30).
While appropriate for validation, the conclusions occasionally imply clinical applicability.
The authors should:

Clarify that results do not directly translate to clinical populations (e.g., stroke, SCI).
Discuss expected differences in adaptation (slower learning, higher variability).

The protocol is well designed (Figure 3 shows a clear sequence of ST and DT conditions on page 9). However:

The duration of training (only 3 cycles) may be insufficient to fully capture adaptation dynamics.
The plateau observed after DT-1 may be protocol-limited rather than physiological.

Discuss whether longer training would reveal additional phases of adaptation.

The cognitive task (identifying numbers in a non-native language) is interesting but raises concerns:

It introduces language-related cognitive load, not purely visuospatial.
This may confound interpretation of cognitive demand.

How much of TEPR reduction reflects task learning vs. language familiarity vs. motor adaptation?

The choice of SLV, SPARC, and CMC is appropriate and justified.
However:

These metrics mainly capture variability and smoothness, but not energetics or coordination strategies.

Discuss why other metrics (e.g., COM variability, phase coordination, or symmetry indices) were excluded.
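Weighing CMC against the alternatives listed above is easier when the definitions are explicit. The sketch below assumes the within-day CMC of Kadaba et al. (1989), computed over N gait cycles time-normalized to T samples, together with an absolute-value variant of the Robinson et al. (1987) symmetry index as one example of an alternative metric; it is illustrative only, and the manuscript's equations may differ.

```python
import math


def cmc(cycles):
    """Coefficient of multiple correlation for N cycles of T samples each.

    Assumes the within-day form of Kadaba et al. (1989); cycles must be
    time-normalized to equal length. Returns NaN when the variance ratio
    exceeds 1, where the CMC is undefined.
    """
    n, t = len(cycles), len(cycles[0])
    grand_mean = sum(sum(c) for c in cycles) / (n * t)
    mean_at = [sum(c[j] for c in cycles) / n for j in range(t)]
    # variability between cycles at each normalized time point
    within = sum((c[j] - mean_at[j]) ** 2
                 for c in cycles for j in range(t)) / (t * (n - 1))
    # total variability about the grand mean
    total = sum((c[j] - grand_mean) ** 2
                for c in cycles for j in range(t)) / (t * n - 1)
    ratio = within / total
    return math.sqrt(1.0 - ratio) if ratio <= 1.0 else float("nan")


def symmetry_index(right, left):
    """Absolute symmetry index in percent (0 means perfect symmetry)."""
    return 200.0 * abs(right - left) / (right + left)
```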

The use of LMM is appropriate. However:

The manuscript lacks effect size interpretation beyond reporting values.
No discussion of statistical power or sample size justification.
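On effect sizes and sample-size justification: a full power analysis for the LMM would usually require simulation, but a quick a-priori check for a single paired contrast (e.g., DT-0 versus DT-3 on one metric) can use Cohen's d_z and the normal approximation to the two-sided paired t-test, n ≈ ((z_{1−α/2} + z_{1−β}) / d)² + z²_{1−α/2} / 2, where the last term is a common small-sample correction. The minimal sketch below uses only the Python standard library; the function names are hypothetical, and the result approximates, rather than replaces, a power analysis of the mixed model.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_dz(before, after):
    """Cohen's d_z for paired samples: mean difference / SD of differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / stdev(diffs)


def paired_sample_size(d, alpha=0.05, power=0.80):
    """Approximate number of pairs for a two-sided paired t-test."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    return math.ceil(((z_a + z_b) / d) ** 2 + z_a ** 2 / 2)
```

For d = 0.5, α = 0.05, and 80% power, `paired_sample_size` returns 34 pairs; for d = 0.8 it returns 15.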

The explanation of a “posture-first strategy” is insightful. However:

The claim that dual-task performance eventually outperforms the training condition is strong.
This may be due to:

learning,
familiarity,
or experimental bias.

Tone down causal claims or provide stronger justification.

Some sections are overly long and dense, especially:

Synchronization methodology (Section 2.3.2)
Event detection description
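Reviewer 4 finds the event-detection description long, and Reviewer 2 (Section 2.3.3) asks whether the shank-angular-velocity procedure is standard. A commonly used gyroscope heuristic can be stated in a few lines. The sketch below assumes that swing produces a large positive peak in sagittal shank angular velocity, and takes toe-off (TO) as the nearest local minimum before that peak and heel strike (HS) as the nearest local minimum after it. The function name, peak threshold, and 0.6 s refractory period are hypothetical; real recordings would normally be low-pass filtered first, and this is not presented as the authors' algorithm.

```python
def detect_gait_events(omega, fs, peak_thresh, min_stride_s=0.6):
    """Heel-strike (HS) and toe-off (TO) sample indices from shank angular velocity.

    omega: sagittal shank angular velocity samples (swing assumed positive);
    fs: sampling rate in Hz; peak_thresh: minimum mid-swing peak height.
    """
    n = len(omega)
    refractory = int(min_stride_s * fs)

    def is_local_min(i):
        return omega[i] < omega[i - 1] and omega[i] <= omega[i + 1]

    # 1) Mid-swing peaks: local maxima above threshold, one refractory
    #    period apart (the first peak in each period is kept).
    peaks = []
    for i in range(1, n - 1):
        if omega[i] > peak_thresh and omega[i] >= omega[i - 1] and omega[i] > omega[i + 1]:
            if not peaks or i - peaks[-1] >= refractory:
                peaks.append(i)

    hs, to = [], []
    for p in peaks:
        # 2) TO: nearest local minimum before the mid-swing peak.
        before = next((i for i in range(p - 1, 0, -1) if is_local_min(i)), None)
        # 3) HS: nearest local minimum after the mid-swing peak.
        after = next((i for i in range(p + 1, n - 1) if is_local_min(i)), None)
        if before is not None:
            to.append(before)
        if after is not None:
            hs.append(after)
    return hs, to
```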

Figures (e.g., Figures 4–6 on pages 11–12) are clear, but:

Axis labels and units could be enlarged for readability.

The explanation of SPARC could be simplified for broader readership.
How do you expect this framework to perform in neurological populations (e.g., stroke or SCI)?
What modifications would be required?
Why was non-native language reporting selected?
Did you control for participants’ English proficiency?
Can you distinguish whether improvements are due to:

motor learning,
cognitive habituation,
or exoskeleton constraint adaptation?

Does wearing the XR headset itself alter gait biomechanics?
Was there a control for headset-only walking beyond baseline?
How feasible is this setup in clinical environments, considering:

equipment complexity,
calibration requirements,
synchronization pipeline?

Could this framework be used for real-time feedback or adaptive control of exoskeletons?

Quality of English

Overall, the English quality is good and professional, and the manuscript is clearly understandable. However, there are some areas for improvement:

Strengths:

Technical terminology is used appropriately.
Sentences are generally clear and well-structured.
The narrative flow in the Introduction and Discussion is strong.

Issues:

Occasional grammatical inconsistencies:

Example: “has lead” → should be “has led”

Some sentences are overly long and complex, particularly in the Methods section.
Minor punctuation issues and spacing inconsistencies appear throughout.
Redundancy in phrasing (e.g., repeated explanations of concepts like cognitive load).

Suggestions:

Perform light professional proofreading to:

correct minor grammar errors,
simplify long sentences,
improve conciseness.

Comments on the Quality of English Language
