Article
Peer-Review Record

Measuring Assistive Technology Outcomes via AI-Based Kinematic Modeling of Individualized Routine Learning in Elite Boccia Athletes with Severe Cerebral Palsy: A Longitudinal Case Series

Bioengineering 2026, 13(3), 261; https://doi.org/10.3390/bioengineering13030261
by Se-Won Park 1 and Young-Kyun Ha 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5:
Submission received: 1 January 2026 / Revised: 8 February 2026 / Accepted: 14 February 2026 / Published: 25 February 2026

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This manuscript presents a valuable and innovative application of an AI-enabled markerless motion-capture routine-learning system for monitoring assistive technology (AT) outcomes in elite Boccia athletes with severe cerebral palsy. The longitudinal design and high-volume trial collection are strengths, particularly given the rarity of this population. However, several points should be addressed to improve methodological clarity and strengthen the scientific contribution.

The generalizability of the findings is inherently limited by the single-case series design (n = 3), and the manuscript relies primarily on descriptive summaries (early vs. late trials, learning curves, variability inspection). While this approach is understandable for a rare elite population, the authors should strengthen the within-athlete analytical framework by adding at least one established single-case statistical method (e.g., Tau-U, randomization tests, or other effect estimation methods suitable for time-series data) to quantify the magnitude and consistency of changes across repeated trials beyond visual inspection.
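The single-case methods named above can be illustrated with a minimal sketch. It assumes a simple early-versus-late phase comparison and computes a basic Tau non-overlap index (the A-versus-B component only, without the baseline-trend correction of full Tau-U) together with a label-shuffling randomization test. The function names and the scores are hypothetical and are not taken from the manuscript. Shuffling labels also treats trials as exchangeable, which serially dependent practice data may violate.

```python
import random

def tau_nonoverlap(early, late):
    """Tau (A vs. B): share of improving minus worsening early/late pairs."""
    pos = sum(1 for a in early for b in late if b > a)
    neg = sum(1 for a in early for b in late if b < a)
    return (pos - neg) / (len(early) * len(late))

def randomization_p(early, late, n_perm=5000, seed=0):
    """One-sided p-value for mean(late) - mean(early) under label shuffling."""
    rng = random.Random(seed)
    n_early = len(early)
    observed = sum(late) / len(late) - sum(early) / n_early
    pooled = list(early) + list(late)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        e, l = pooled[:n_early], pooled[n_early:]
        if sum(l) / len(l) - sum(e) / len(e) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical expert scores (0-10 scale) for one athlete and one task.
early_scores = [4.0, 4.5, 5.0, 4.0, 5.5]
late_scores = [6.0, 6.5, 5.5, 7.0, 6.5]

tau = tau_nonoverlap(early_scores, late_scores)
p_value = randomization_p(early_scores, late_scores)
print(f"Tau = {tau:.2f}, randomization p = {p_value:.3f}")
```

A Tau close to 1 would mean that nearly every late trial outscored nearly every early trial, which is the kind of magnitude-and-consistency statement that visual inspection alone cannot provide.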

The manuscript introduces two outcome domains (expert-rated performance scores and objective kinematic variability indicators), but the relationship between these domains is not sufficiently quantified or discussed. In particular, it remains unclear whether reduced variability consistently coincides with improved expert scores, or whether stabilization is an independent signal of learning, especially in the most severely impaired athlete. The authors should include a clearer interpretation of concordance or divergence between subjective performance scoring and objective variability metrics, as this is central to the validity of the system as an AT outcome measurement tool.

The practical deployment aspects of the proposed system require further clarification. Although the manuscript describes a multi-camera environment including Azure Kinect (with setup illustrated in Figures 1–2), the reproducibility and scalability of this setup in real-world disability sport environments are not fully addressed. The authors should provide additional details on the minimum technical requirements, sensor placement constraints, lighting sensitivity, calibration needs, feasibility in smaller facilities, and the level of expertise required to operate the system, since these factors directly affect translation into accessible AT practice.

The reference list is strong in Boccia performance science and adapted physical education, but it would benefit from reinforcement with methodological studies focusing on AI evaluation in rare neurological populations with limited and heterogeneous datasets. In particular, the authors are encouraged to cite Trabassi et al. ('Optimizing rare disease gait classification through data balancing and generative AI: insights from hereditary cerebellar ataxia.' Sensors. 2024), which discusses optimizing motion-based clinical classification through data balancing and generative AI in a rare neurodegenerative context. Although the disease and task differ, the methodological relevance is high, as both studies address motion analytics in rare neurological cohorts, high intra-subject variability, and the need for robust AI-supported interpretation under small-sample constraints. This citation would strengthen the methodological framing of the paper in terms of how AI-based AT systems can remain reliable and meaningful in rare and severe disability contexts.

Finally, the manuscript would benefit from a clearer articulation of how this AI-based routine-learning system supports actionable individualized coaching and educational planning. The authors correctly interpret different learning patterns (maintenance, improvement, stabilization-first trajectories), but the practical implications could be made more concrete by explicitly stating how coaches should adjust training intensity, task difficulty, or feedback strategies based on the observed learning curves and variability patterns. This would improve the translational value of the work for elite disability sport settings and accessibility-oriented outcome monitoring.

Author Response

Dear Reviewer,

Thank you very much for your thoughtful and constructive comments on our manuscript. We truly appreciate the time and effort you dedicated to providing feedback that has significantly enhanced the methodological clarity and technical rigor of our study. We have carefully revised the manuscript to address each of your points, and the changes are detailed below.

Regarding your suggestion to strengthen the within-athlete analytical framework, we have incorporated more established statistical measures to complement our visual inspection. In the revised Section 2.6 and Table 3, we now report Cohen’s d effect sizes to quantify the magnitude of change between early and late practice phases. Furthermore, in Section 3.4, we have added linear slope estimations and R² values to provide a statistically grounded representation of the improvement trends. While our current study utilizes these indices to support educational interpretation, we have also explicitly noted in the "Limitations and Future Directions" (Section 4.4) that future research will integrate advanced single-case statistics such as Tau-U and randomization tests to further enhance analytical robustness.
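For readers unfamiliar with these indices, the sketch below shows one common way to compute them: Cohen's d with a pooled standard deviation, and an ordinary least-squares slope with its R² over trial order. Whether the authors used exactly these formulas is not stated in this response, so the function names, the early/late split, and the scores are illustrative assumptions.

```python
from statistics import mean, stdev

def cohens_d(early, late):
    """Cohen's d for late vs. early trials, using the pooled sample SD."""
    n1, n2 = len(early), len(late)
    pooled_var = ((n1 - 1) * stdev(early) ** 2
                  + (n2 - 1) * stdev(late) ** 2) / (n1 + n2 - 2)
    return (mean(late) - mean(early)) / pooled_var ** 0.5

def slope_and_r2(scores):
    """OLS slope of score on trial number (1, 2, ...) and the fit's R^2.

    Assumes the scores are not all identical (otherwise R^2 is undefined).
    """
    trials = list(range(1, len(scores) + 1))
    mx, my = mean(trials), mean(scores)
    sxx = sum((x - mx) ** 2 for x in trials)
    syy = sum((y - my) ** 2 for y in scores)
    sxy = sum((x - mx) * (y - my) for x, y in zip(trials, scores))
    return sxy / sxx, sxy ** 2 / (sxx * syy)

# Hypothetical expert scores for one athlete-task learning curve.
scores = [4.0, 4.5, 5.0, 4.0, 5.5, 6.0, 6.5, 5.5, 7.0, 6.5]
half = len(scores) // 2  # assumed split: first half early, second half late

d = cohens_d(scores[:half], scores[half:])
slope, r2 = slope_and_r2(scores)
print(f"d = {d:.2f}, slope = {slope:.3f} points/trial, R^2 = {r2:.2f}")
```

Both indices treat trials as independent observations, so they describe the size and direction of change rather than test it.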

To address the relationship between expert-rated scores and objective kinematic variability, we have significantly expanded our discussion in Section 4.1. We now provide a detailed interpretation of how these two domains relate across different learning stages. Specifically, we discuss the observed divergence in the most severely impaired athlete (P3), where a reduction in kinematic variance served as an early, independent signal of learning within the cognitive stage, even when absolute performance scores remained stable. This revision clarifies the system’s unique diagnostic value in detecting subtle motor refinements that traditional scoring systems might overlook.

In response to your request for clarification on practical deployment and scalability, we have added specific technical details in Section 2.2. The revised text now outlines the minimum hardware requirements (such as a PC with an NVIDIA RTX 30-series GPU), environmental constraints regarding lighting and floor space (4 m × 4 m), and the calibration procedures necessary for session-to-session consistency. We have also specified that the system features a user-friendly GUI designed for coaches with basic computer literacy, requiring only a 30-minute orientation to operate, which directly addresses the feasibility of its use in real-world disability sport environments.

We also sincerely appreciate your recommendation of the methodological study by Trabassi et al. (2024). We have incorporated this citation as Reference [40] in both the Introduction and Discussion (Section 4.4). This addition has been instrumental in strengthening our methodological framing, particularly in terms of justifying our approach to motion analytics and data interpretation within rare neurological populations characterized by high intra-subject variability and limited datasets.

Finally, to improve the translational value of our work, we have articulated clearer coaching implications in Section 4.2. We have added concrete strategies for how practitioners can adjust training based on the observed learning trajectories. For instance, we suggest that for "stabilization-first" patterns, coaches should focus on maintaining consistency through task simplification, whereas for "plateau" patterns, task difficulty should be increased to challenge the athlete's automated routines. These additions ensure that our AI-based system provides actionable insights for individualized coaching and individualized education program (IEP) planning.

We believe that these comprehensive revisions have addressed all your concerns and have resulted in a much stronger manuscript. Thank you once again for your valuable guidance.

Reviewer 2 Report

Comments and Suggestions for Authors

I have studied the submitted work. The use of AI and marker-free motion capture for elite athletes with cerebral palsy in Boccia is a rather poorly studied niche, so the research has value for adaptive sports. The structure of the article is logical.

After reading it, the following questions arose about the content.

1. The article focuses on the possibilities of measuring the results of an AI-based system, but clearly omits the technical validation of the AI model itself. The authors state that the accuracy of forecasting is reserved for future technical validation. However, I consider this to be a disadvantage. If an AI system is the main tool for generating objective tracking of results, its reliability, accuracy, and error should be established within the framework of this article.
2. The selection also raises questions. The authors noted the rarity of elite athletes with cerebral palsy, but the study design lacks a control mechanism. The observed improvements in the overall scores of P1 and P2 are attributed to the AI-based routine training program, but without a control group, it is impossible to separate the effect of AI intervention from the natural effects of regular practice volume. It seems to me that all the conclusions should be confirmed using various statistical tools capable of processing data on such a small scale.
3. The study uses a group of experts to evaluate the attempts, and these estimates serve as a benchmark for training the model and the main measure of the result. If the AI provides feedback based on a model trained to mimic these trainers, and then the trainers evaluate the performance, there is a risk of a closed feedback loop. Moreover, although the authors mention that the AI model provides automatic estimates, for this article they decided to analyze only expert estimates (if I'm not mistaken). The authors should have analyzed the correlation between the internal AI assessment and the human assessment in order to demonstrate the consistency of the system.
4. The manuscript describes feedback as visual augmented feedback... includes visualization of the trajectory... and quantitative information. However, a specific visual interface is not shown. The article lacks a figure showing the actual feedback screen that athletes see. Adding one would aid understanding.

Author Response

Dear Reviewer,

We sincerely appreciate your constructive and critical review of our manuscript. Your comments regarding the technical validation of the AI model, the study design, and the necessity of visual interface documentation have been invaluable in improving the quality and transparency of our work. We have carefully addressed each of your points in the revised manuscript.

  1. Regarding technical validation (accuracy, error, and reliability) of the AI model:

We fully agree that technical validation is a cornerstone of any AI-based measurement system. While our primary focus in this study was on the pedagogical "outcomes" of the system as an assistive technology, we recognize that the underlying model's reliability must be established. In the revised Section 2.5, we have provided a much more detailed description of the Bidirectional LSTM architecture and the 29 kinematic features used for regression. To address your concern about the lack of error metrics, we have explicitly stated in the "Limitations and Future Directions" (Section 4.4) that while this study prioritizes pedagogical feasibility, follow-up technical validation will report standardized engineering metrics such as Mean Squared Error (MSE) and R-squared (R²) for the regression model. We believe that providing the detailed architecture in this version serves as the first step toward the technical rigor you rightly requested.

  2. Regarding the lack of a control group and small-scale statistical tools:

The challenge of establishing a control group is particularly acute when working with an elite population of GMFCS level IV athletes, as their rarity makes large-sample recruitment practically impossible. To mitigate this limitation and separate the AI's effect from general practice, we have strengthened our statistical approach in Section 2.6 and Section 3.4. We have added Cohen’s d effect sizes for early-vs-late comparisons and linear slope estimation for each learning curve. These tools are specifically designed to quantify trends and magnitudes of change in small-scale, longitudinal data. Furthermore, we have expanded our Discussion (Section 4.1) to interpret these gains through the lens of motor learning stages, providing a theoretical framework to support the observed impact.

  3. Regarding the "closed feedback loop" and the correlation between AI and human assessment:

This is a very insightful observation. To avoid the risk of a closed feedback loop, the AI was designed to provide external focus feedback (e.g., ball trajectory), which is distinct from the internal kinematic components rated by the experts. To address the need for consistency, we have clarified in Section 2.3 that our expert ratings were derived from a consensus-based procedure involving two specialists, which served as the "Gold Standard" for the AI's supervised learning. In response to your suggestion, we have added a commitment in Section 4.4 to perform explicit agreement analysis (e.g., Intraclass Correlation Coefficients) between AI-generated scores and expert ratings in our next phase of research, which will further demonstrate the system’s diagnostic consistency.

  4. Regarding the visual interface of the feedback system:

We agree that documenting the visual interface is important for clarity. However, as the real-time interface involves dynamic overlays that are difficult to capture in a static, high-quality image, we have instead significantly enriched the textual description in Section 2.2. The revised text now details the specific components—such as the projected ball trajectory and numerical velocity deviations—that athletes use for self-regulation. This provides the necessary clarity on how the "visual augmented feedback" was operationalized without compromising the visual quality of the manuscript.
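The agreement analysis between AI-generated scores and expert ratings that the authors commit to above (Intraclass Correlation Coefficients) could take the following form. This is a minimal sketch of ICC(2,1), the two-way random-effects, absolute-agreement, single-measurement form in the Shrout and Fleiss formulation. The choice of ICC form, the function name, and the paired scores are assumptions, since the response does not specify them.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per trial and one column per rater.

    Here the two "raters" would be the expert consensus score and the AI score.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: trials (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical paired scores per trial: [expert consensus, AI-generated].
paired_scores = [[6.0, 6.5], [7.0, 7.0], [5.0, 5.5], [8.0, 7.5], [6.5, 7.0]]
print(f"ICC(2,1) = {icc_2_1(paired_scores):.3f}")
```

Because this form penalizes systematic offsets between the two columns, an AI model that tracks the experts' ranking but scores consistently higher or lower would receive a value below 1.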

We believe these revisions have significantly strengthened the manuscript’s technical and methodological foundation. Thank you once again for your professional guidance.

Reviewer 3 Report

Comments and Suggestions for Authors

You claim this study focuses on "educational application" and "pedagogical" goals rather than rigorous ML engineering validation. However, your manuscript title explicitly emphasizes "Measuring Assistive Technology Outcomes," and you submitted to a special issue titled "Measuring Outcomes and Impact Related to Assistive Technology and Accessibility for Disability." This framing is inconsistent. If you are measuring outcomes, the measurement tool itself must be validated.

You trained a BiLSTM neural network on 694 throws, yet you provide no information about your data-splitting strategy. You do not report any train, validation, or test set partitioning. You do not report standard regression metrics such as MSE, RMSE, or R² on held-out data. To train any neural network, you must split data in some form, yet you are silent on this matter. This is a fundamental omission.

With only three patients, proper patient-level splitting becomes nearly impossible. If you split within patients, temporally adjacent throws from the same athlete share similar kinematic patterns, introducing data leakage. If you use leave-one-patient-out cross-validation, you train on only two patients and test on one. You do not address this limitation adequately.
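The splitting trade-off described above can be made concrete with a short sketch of leave-one-subject-out fold construction. The helper name and the throw-level athlete labels are hypothetical; the example only shows how folds are formed so that no athlete appears on both sides of a split, and it does not reproduce any procedure from the manuscript.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), holding out one athlete per fold."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx

# Hypothetical athlete label for each recorded throw, in recording order.
throw_subjects = ["P1"] * 4 + ["P2"] * 3 + ["P3"] * 3

folds = list(leave_one_subject_out(throw_subjects))
for held_out, train_idx, test_idx in folds:
    train_subjects = {throw_subjects[i] for i in train_idx}
    # No athlete contributes throws to both the training and the test side.
    assert held_out not in train_subjects
    print(f"test on {held_out}: {len(test_idx)} throws, "
          f"train on {sorted(train_subjects)}: {len(train_idx)} throws")
```

With three athletes this yields exactly three folds, each trained on two athletes, which is the limitation raised above: subject-level leakage is avoided, but every fold must generalize from very few individuals.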

The learning curves you present in Figures 5 through 7 show athlete performance trajectories across practice trials. These are not ML model learning curves. For proper ML validation, you should show training accuracy/loss versus validation accuracy/loss across training epochs. You should show two lines that both evolve, plateau together, with the training curve slightly above the validation curve. You show neither.

You appear to conflate athlete performance trajectories with evidence of system validity. Showing that an athlete's scores changed over practice sessions does not demonstrate that your AI model can accurately predict performance or generalize to unseen data. These are fundamentally different questions requiring different types of evidence. The athlete curves address whether learning occurred; the model curves address whether your measurement tool works. You provide the former but not the latter, yet your title and stated aims concern outcome measurement, which depends entirely on the validity of the measurement tool itself.

Regarding the athlete performance curves you do present, even these provide weak evidence for your claims. A healthy learning curve should demonstrate evolution from a starting point, rise to a new level, and then plateau. Out of your seven player-task combinations, only one curve (P2 T3) demonstrates this healthy pattern. P1 T2 and P2 T2 are flat from the start due to ceiling effects. P3 T2 and P3 T3 remain flat at low levels throughout with no demonstrated learning. P1 T3 shows improvement but never plateaus within your study window. If you claim to evaluate a "routine learning system," the learning curves should actually demonstrate learning. Having only one of seven curves show a healthy pattern undermines your claims.

You reframe P3's flat curves as "stabilization" rather than improvement, but this appears to be a post hoc reinterpretation. For a study submitted to a special issue on measuring outcomes, the outcome evidence is weak for most of your player-task combinations.

Your results cannot be considered generalizable. At the model level, you provide no evidence that your AI model generalizes to unseen data because you report no splitting strategy and no validation metrics. At the outcome level, only one of seven learning curves demonstrates the healthy learning pattern that would support your claims about the system facilitating skill acquisition. The combination of these issues severely compromises any claims about generalizability or external validity.

You acknowledge in your limitations that "future work should separate dedicated test sets for performance evaluation." This acknowledgment confirms that you did not implement proper splitting in this study. This should not be deferred to future work when the current manuscript claims to measure outcomes using an AI-enabled system.

Author Response

Dear Reviewer,

We would like to express our deepest gratitude for the exceptionally rigorous and constructive review of our manuscript. Your insightful critique regarding the technical validation of our AI model and the consistency of our research framing has been invaluable in enhancing the scientific integrity of our work. We have carefully addressed each of your concerns through extensive revisions, ensuring that our response is both transparent and grounded in the pedagogical context of our study.

Regarding the consistency of our framing, we agree that the distinction between the "Assistive Technology (AT) intervention" and the "Outcome Measurement Tool" required greater clarity. In this study, the AI-based system itself is the AT being evaluated, whereas the primary outcome measurement tool is the performance rating scale administered by our expert panel. To ensure the highest level of technical credibility for these measurements, as detailed in the revised Section 2.3, the evaluation was conducted by a panel consisting of a national-level head coach with over 10 years of professional coaching experience and a certified assistant coach. By utilizing a consensus-based procedure where these two experts aligned their scores based on standardized operational definitions, we established a robust "Gold Standard" to substantiate the behavioral changes in the athletes.

In response to your concerns about the missing machine learning engineering metrics, such as data splitting and regression errors, we acknowledge that these are fundamental requirements for a purely engineering-focused manuscript. However, since the primary objective of this study is to analyze the longitudinal pedagogical outcomes for athletes with severe disabilities, we prioritized the analysis of human-expert scores. To provide greater technical transparency, we have clarified in Section 2.5 that during the internal development of our model, we utilized Leave-One-Subject-Out (LOSO) cross-validation to mitigate data leakage risks despite our small sample size. While we have focused this manuscript on the educational significance, we have added a commitment in Section 4.4 to publish a separate technical validation report including Mean Squared Error (MSE) and training loss curves. Furthermore, to clarify the system's feedback mechanism without the clutter of a static software interface, we have enriched the textual description in Section 2.2, detailing the specific parameters—such as the projected ball trajectory and numerical velocity deviation—that athletes use for self-regulation.

Additionally, we appreciate your correction regarding the learning curves presented in Figures 5 through 7. We recognize that these figures represent athlete performance trajectories across practice trials rather than the optimization logs of the AI model. To avoid any further ambiguity, we have updated the captions and the corresponding text in Section 3.4 to clearly label these as "Athlete Performance Learning Curves." We have also included a discussion in Section 4.4 to explicitly distinguish between the behavioral improvement of the athletes and the technical optimization of the model.

Furthermore, we would like to address your observation regarding the heterogeneous patterns of the learning curves. We believe that the presence of ceiling effects in elite performers (P1-T2, P2-T2) and the flat trajectories in novice athletes (P3) reflect the high ecological validity of our study. For the most severely impaired athlete, we have strengthened our theoretical defense in Section 4.1 using Bernstein’s degrees of freedom problem. We argue that the reduction in kinematic variance, which was detected by the AI system’s objective data, serves as an essential "silent signal" of learning within the cognitive stage, even before it manifests as improved performance scores.

Finally, we humbly acknowledge the inherent limitations in generalizability due to our small sample size. We have tempered our claims throughout the revised manuscript, framing this study as an exploratory case series intended to demonstrate the feasibility of AI-enabled monitoring. We have also explicitly noted in our limitations that the current analysis did not utilize a separate held-out test set for the AI scores, which necessitates a cautious interpretation. We believe that these clarifications and the detailed justification of our expert-led measurement process have significantly improved the rigor of our work.

Reviewer 4 Report

Comments and Suggestions for Authors

Review: bioengineering-4104718-peer-review-v1

The manuscript entitled “Measuring Assistive Technology Outcomes of an AI-Enabled Markerless Motion-Capture Individualized Routine-Learning System in Elite Boccia Athletes with Severe Cerebral Palsy: A Longitudinal Case Series” aims to introduce an innovative outcome-measurement framework for disability sport by integrating expert-based performance scores with longitudinal motor variability indicators. Overall, the topic is timely and relevant, and the manuscript is clearly written and engaging. The study addresses an important methodological gap in elite disability sport; however, several aspects require clarification and refinement to strengthen the paper.

First, although the system is described as AI-enabled, the manuscript would benefit from a clearer and more detailed explanation of the AI’s functional role. Specifically, the authors should clarify which components of the system are learned or adapted through AI, how temporal movement data are modeled and processed across repeated trials, and whether the underlying model is intended primarily for classification, regression, or pattern-recognition purposes. Improving transparency in this area would enhance both the technical rigor and the reproducibility of the study.

Second, the credibility of the performance outcomes would be strengthened by providing additional details about the expert evaluation process. Information regarding the number of raters, their level of expertise, and procedures used to ensure scoring consistency (e.g., inter-rater agreement) is currently limited. Even brief reporting of reliability metrics would substantially improve methodological robustness and confidence in the subjective performance scores.

Third, while the use of motor learning stage terminology (e.g., cognitive and autonomous stages) is appropriate and theoretically grounded, its application could be more consistent across the results and discussion sections. Clearer alignment between learning-stage interpretations and reported outcomes would improve conceptual coherence.

In addition, the discussion could be expanded slightly to consider how the proposed framework might be applied beyond elite athletes, for example in non-elite or youth disability sport contexts. Such reflection would help clarify the broader applicability and practical relevance of the approach.

Finally, the authors appropriately acknowledge the limited generalizability of the findings due to the small sample size. Given the rarity of the population studied, this limitation is unavoidable and does not detract from the study’s methodological contribution. Framing the contribution in terms of transferability of the measurement framework, rather than population-level generalization, is an effective and appropriate strategy.

In summary, this is a well-conceived and promising study that makes a meaningful contribution to assistive technology and disability sport research. Addressing the points outlined above would further strengthen the manuscript and support its publication.

 

Author Response

Dear Reviewer,

Thank you very much for your encouraging and constructive review of our manuscript. We appreciate your recognition of the timely nature and relevance of our study in the field of disability sport. We have carefully addressed each of your suggestions to enhance the methodological clarity and conceptual coherence of the paper. Our point-by-point responses are detailed below.

Regarding the functional role of the AI system, we have provided a more detailed explanation in the revised Section 2.5. We clarify that the core analytic engine utilizes a Bidirectional Long Short-Term Memory (Bi-LSTM) network architecture specifically chosen to model complex temporal movement dependencies from both forward and backward contexts. We explicitly state that the model is intended for regression to predict continuous performance scores (on a 0–10 scale) across eight core indicators, ensuring that the AI-generated scores match the nuanced standards of elite coaching. The 29 kinematic features extracted (joint distances, velocities, and accelerations) are also now listed to improve technical transparency.

To strengthen the credibility of the performance outcomes, we have added specific details about the expert evaluation process in Section 2.3. The evaluation was conducted by a panel of two experts: a national-level head coach with over 10 years of professional coaching experience and a certified assistant coach. To ensure scoring consistency and minimize subjective bias, these raters performed a joint consensus-based assessment procedure for every trial, discussing and aligning their scores based on standardized operational definitions for each item. Furthermore, internal consistency metrics (Cronbach’s α) are reported in Section 3.3.2 to substantiate the reliability of the ratings.
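As a reference for the internal consistency metric mentioned above, Cronbach's α can be computed from a trials-by-items score table as in the sketch below. The function name and the example scores are hypothetical, and only three items are shown for brevity. α summarizes how consistently the rated items vary together across trials; it does not by itself quantify agreement between raters.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per rated trial)."""
    k = len(rows[0])  # number of items (indicators)
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical consensus scores (0-10) on three indicators for five trials.
trial_scores = [
    [6.0, 5.5, 6.5],
    [7.0, 6.5, 7.0],
    [5.0, 5.0, 5.5],
    [8.0, 7.5, 8.0],
    [6.5, 6.0, 7.0],
]
print(f"Cronbach's alpha = {cronbach_alpha(trial_scores):.3f}")
```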

In response to your suggestion on motor learning stage terminology, we have systematically aligned our interpretations with the stages of cognitive, associative, and autonomous learning throughout the Results and Discussion sections. We have ensured that the observed performance trajectories and kinematic variability patterns are interpreted through this theoretical lens to improve the overall conceptual coherence of the manuscript.

Regarding the broader applicability of the framework, we have expanded our Discussion in Section 4.2. We now reflect on how the system can be applied beyond elite contexts, specifically highlighting its value as a diagnostic and monitoring tool for students in general adapted physical education settings. We have also explicitly connected the framework to Individualized Education Program (IEP) planning and the adoption of assistive technologies in school contexts to promote inclusive excellence across different skill levels.

Finally, we appreciate your positive feedback on our strategy for framing generalizability. As suggested, we have emphasized the transferability of the measurement framework rather than population-level generalization, positioning our AI-enabled approach as a replicable model for objective outcome monitoring in other rare and severe disability contexts.

We believe these revisions have significantly strengthened the manuscript’s scientific contribution and technical rigor. Thank you once again for your valuable guidance.

Reviewer 5 Report

Comments and Suggestions for Authors

The authors developed an AI-based markerless motion-capture system to monitor performance and motor variability in elite Boccia athletes with severe cerebral palsy. Three athletes performed repeated throwing tasks over eight weeks, receiving real-time augmented feedback and being evaluated through expert performance scores. Longitudinal analysis showed improvements in intermediate-difficulty tasks and reduced variability in the most impaired athlete. The results demonstrate the feasibility of AI-based assistive technology for individualized outcome measurement in disability sports.

The authors should improve some aspects of their work before this reviewer recommends accepting the paper.
- In the abstract, the authors said "precise tracking," but I assume they meant accurate.
- In the text, the authors use the term "precise" several times. The authors should verify in which cases they are referring to repeatability and in which to accuracy.
- In paragraphs 116-127, I recommend that the authors extend their literature analysis by also considering recent reviews on the use of AI in robot-mediated physical rehabilitation systems (DOI: 10.1145/3779302).
- The CP acronym, like all the others, should be defined only upon first occurrence.
- I suggest that the authors add an image captured during the acquisitions, showing the skeleton reconstructed on the subject.
- This reviewer is curious to understand why the authors chose to use a BiLSTM. This choice should be better justified. Furthermore, I wonder what the authors' ultimate goal is in this work. A BiLSTM network needs to have the entire time series available to make inferences. How, then, can such a system be applied? Do the authors intend the metric to be used only as an output of training analysis, rather than live or in real time during motion analysis? The authors need to better explain these aspects.
- Is the label that the network needs to learn the score given by the expert trainer (lines 334-335)?
- The quality of Figure 3 isn't very good. I suggest increasing the quality and clarifying all the text. It should be made explicit in the text and caption.
- Figure 4 appears to be AI-generated... the authors should create original images for this work.

Author Response

Dear Reviewer,

Thank you very much for your detailed and constructive review. We are honored by your recognition of our work’s value for adaptive sports and appreciate the specific technical guidance provided to improve the manuscript. We have carefully addressed each of your points to enhance the clarity and scientific rigor of our study.

Regarding the terminology of "precise" versus "accurate," we appreciate your technical correction. We have reviewed the manuscript and updated the terminology: we now use "accurate" when referring to the system’s agreement with expert ratings and "precise" or "consistent" when discussing repeatability. This change has been reflected in the Abstract and throughout the Results section to ensure technical correctness.

In response to your suggestion to extend the literature analysis, we have incorporated a discussion of recent reviews on AI in robot-mediated physical rehabilitation. Specifically, in the Introduction (Section 1), we now cite the work related to intelligent rehabilitation frameworks [Reference 41; DOI 10.1145/3779302], which provides a broader context for AI-based assistive technologies in disability rehabilitation. This addition strengthens the theoretical link between sports-specific monitoring and clinical rehabilitation.

To address the consistency of acronyms, we have ensured that "Cerebral Palsy (CP)" is defined only upon its first occurrence in the Introduction. Subsequent mentions have been standardized to "CP" throughout the text.

We also appreciate your suggestion to include a raw acquisition image. We have updated Figure 2 to include a representative image of the actual measurement environment with the 3D skeleton reconstructed by the Azure Kinect SDK overlaying the athlete. This provides the necessary visual evidence of the system's tracking capabilities during real-world data collection.

Regarding the choice of the Bi-LSTM network and its real-time application, we have added a clarification in Section 2.5. The Bi-LSTM architecture was chosen because throwing a Boccia ball is a highly coordinated movement where the "release point" is influenced by both the preceding wind-up and the subsequent follow-through. While Bi-LSTM requires the entire sequence for inference, our pedagogical goal is to provide augmented feedback immediately after each trial rather than live feedback during the movement. This "post-trial" feedback allows the athlete to reflect on the holistic quality of the throw, making Bi-LSTM an ideal choice for modeling the temporal dependencies of the complete action.
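The dependence on the full sequence can be illustrated with a toy sketch. It is not the Bi-LSTM from the manuscript: the exponential moving average stands in for a recurrent cell, and the wrist-speed traces, the `alpha` value, and all function names are hypothetical. It only shows that the backward half of a bidirectional pass makes the representation at the release frame depend on the follow-through, so scoring has to wait until the trial is complete.

```python
def ema(values, alpha):
    """Causal recurrence: out[t] depends only on values[0..t]."""
    out, state = [], values[0]
    for v in values:
        state = alpha * v + (1 - alpha) * state
        out.append(state)
    return out


def bidirectional_context(values, alpha=0.5):
    """Pair a forward pass with a backward pass over the reversed sequence."""
    forward = ema(values, alpha)
    backward = ema(values[::-1], alpha)[::-1]
    return list(zip(forward, backward))


# Hypothetical wrist-speed traces: same wind-up and release, different follow-through.
wind_up_and_release = [0.1, 0.4, 0.9]
smooth_follow_through = [0.6, 0.3, 0.1]
jerky_follow_through = [0.1, 0.8, 0.0]
release_idx = len(wind_up_and_release) - 1

smooth_ctx = bidirectional_context(wind_up_and_release + smooth_follow_through)[release_idx]
jerky_ctx = bidirectional_context(wind_up_and_release + jerky_follow_through)[release_idx]

# The forward component at release is identical; the backward component differs,
# because it has "seen" the frames that come after the release.
print("forward  at release:", smooth_ctx[0], jerky_ctx[0])
print("backward at release:", smooth_ctx[1], jerky_ctx[1])
```

A purely causal model would correspond to using only the forward component, which is what live, in-movement feedback would require.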

We have also explicitly confirmed in Section 2.5 that the network’s labels are the consensus-based scores provided by the expert panel (head and assistant coaches). This ensures transparency regarding the supervised learning process.

Furthermore, we have improved the quality of Figure 3 by providing a higher-resolution version with clarified text for all architectural components. The caption has also been updated to explicitly state the source and adaptations made.

Finally, we sincerely apologize for the appearance of Figure 4. We have replaced it with a manually created original flowchart that accurately depicts the system's end-to-end workflow, from data loading to checkpoint saving. This ensures that all visual materials in the manuscript are original and professional in quality.

We believe these revisions address all your concerns and have significantly improved the technical clarity of our manuscript. Thank you again for your valuable time and expertise.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I’m satisfied with the responses I received in the first round of revisions. Now the work is more solid and publishable.

Author Response

Dear Reviewer,

We would like to express our sincere gratitude for your positive evaluation and for the encouraging comments regarding our first revision. Your insightful guidance during the initial stage was instrumental in enhancing the quality and solidity of this manuscript.

We are pleased to inform you that we have further refined the paper to address the collective feedback from the review panel. In this second revision, we have integrated rigorous technical validation metrics—including Mean Squared Error (MSE), Mean Absolute Error (MAE), and model convergence analysis—and substantiated our interpretation of "movement stabilization" with empirical kinematic variance data (specifically for Player 3). These additions were made to ensure that the AI system's role as a measurement tool is supported by objective engineering evidence, a direction that aligns with making the work "more solid and publishable" as you noted.

All modifications, including the newly added technical sections and updated figures, are highlighted in red in the revised manuscript. We truly appreciate your professional support throughout this process and hope that the final version meets your full satisfaction.

Best regards,

The Authors

Reviewer 2 Report

Comments and Suggestions for Authors

The authors adequately responded to the methodological comments and strengthened the statistical analysis, while leaving some engineering issues until the next stage of their research.
Although the availability of error metrics would significantly enhance the work, the value of the unique longitudinal data (694 throws by Paralympians) and the learning curve analysis outweighs this disadvantage. Thus, the manuscript has been improved and can be recommended for publication.

Author Response

Dear Reviewer,

We sincerely appreciate your thoughtful review and your positive recommendation for the publication of our manuscript. We are particularly grateful for your recognition of the value inherent in our unique longitudinal dataset of 694 throws by elite Paralympians.

In our previous revision, we had focused primarily on pedagogical outcomes. However, taking your constructive suggestion to heart—that "the availability of error metrics would significantly enhance the work"—we have now fully integrated these technical engineering indicators into this second revision. We have included comprehensive validation metrics, such as Mean Squared Error (MSE) and Mean Absolute Error (MAE), along with the model’s training and validation loss history.

By adding these technical benchmarks, we have ensured that the manuscript no longer leaves "engineering issues until the next stage," but instead provides a complete, validated measurement framework within this study. These additions, along with further substantiation of our kinematic variability analysis, are highlighted in red throughout the text.

We believe these improvements, prompted by your insightful feedback, have made the work significantly more robust. Thank you once again for your professional guidance.

Best regards,

The Authors

Reviewer 3 Report

Comments and Suggestions for Authors

I must express serious concern about the fundamental disconnect between what your manuscript claims and what you have actually demonstrated. You are asking this journal to accept that your AI-based system works as a measurement tool based solely on trust, without providing the technical evidence required to substantiate that claim.


Your manuscript title explicitly states "Measuring Assistive Technology Outcomes via AI-Based Kinematic Modeling." Your abstract describes "an innovative outcome measurement approach." Your introduction frames the AI system as the assistive technology being evaluated. Yet when I requested the basic validation metrics required to demonstrate that this measurement tool actually measures what it claims to measure, you responded by reframing your study as pedagogical rather than engineering focused. This is intellectually inconsistent. You cannot claim the benefits of an AI-based measurement system in your title and framing while simultaneously arguing that validating that system is beyond the scope of your work.


The most troubling aspect of your revision is found in lines 358 to 362. You state that you "utilized Leave-One-Subject-Out (LOSO) cross-validation to mitigate data leakage risks" during model development, and you commit to publishing validation metrics "in a separate technical validation report." This response reveals that you performed the validation I requested but have chosen not to report the results in this manuscript. This is not a matter of scope or focus. If you trained a neural network, you necessarily generated training and validation metrics during that process. You are asking us to trust that these metrics support your claims without actually showing them to us. This is the definition of a trust-me-bro argument.


You cannot defer machine learning validation to future work when your current manuscript claims to measure outcomes using an AI-enabled system. The validation of a measurement tool must precede any claims about what that tool measures. This is not an optional nicety for engineering papers. It is a fundamental requirement for any study claiming to measure anything. Your lines 893 to 896 continue to frame proper validation as future work rather than current necessity. I explicitly criticized this deferral in my original review, yet you have not changed your approach. You have simply acknowledged the limitation while maintaining the same unsupported claims.


Your response letter states that because your primary objective is analyzing pedagogical outcomes, you "prioritized the analysis of human-expert scores." This statement fundamentally misunderstands the critique. The expert scores show that athletes changed over time. That is valuable information. But your title does not claim to measure outcomes via expert scoring. It claims to measure outcomes via AI-based kinematic modeling. For that claim to be credible, you must demonstrate that your AI model can accurately predict expert scores and generalize to unseen data. You have provided expert ratings. You have not provided evidence that your AI system can model those ratings with acceptable accuracy. These are two entirely different forms of evidence, and you have conflated them throughout your revision.


The contradiction is particularly stark when examining your defense strategy. When seeking publication, you emphasize the AI innovation and position the system as the assistive technology being evaluated. When facing technical criticism, you retreat to "this is really just about pedagogy and expert ratings." You cannot occupy both positions simultaneously. If the AI system is merely generating internal feedback that you do not analyze or validate, then remove it from your title and reframe your contribution accordingly. If the AI system is central to your measurement approach, then validate it properly before making measurement claims.


Your handling of the weak learning curves in Player 3 illustrates this same pattern of evasion rather than acknowledgment. In my original review, I noted that only one of seven player-task combinations showed a healthy learning pattern, and I specifically called out your post-hoc reinterpretation of P3's flat curves as stabilization rather than acknowledging the lack of improvement. Rather than address this criticism directly, you doubled down in your revision with elaborate theoretical justifications using Bernstein's degrees of freedom problem and introduced the concept of "silent signals" of learning. This is sophisticated rationalization, not empirical accountability. When an outcome measure shows no change, you cannot simply redefine that lack of change as a different kind of positive outcome without independent evidence. You would need to demonstrate through your kinematic data that meaningful stabilization occurred even when scores did not improve. You gesture toward this in your methods but provide no actual analysis showing that kinematic variance decreased in P3 in ways that would support your stabilization claims.


Your response letter does not explain why you failed to address my comments. It does not justify why validation metrics you apparently possess cannot be included in this manuscript. It does not acknowledge the contradiction between your measurement claims and your pedagogical defense. Instead, it employs a selective reframing strategy. You maintain the high-impact framing of AI-enabled measurement while avoiding the validation burden that framing requires. You acknowledge limitations in ways that suggest awareness of the problems while making no substantive changes to address them.


The hard technical work I requested is not beyond your capabilities. You state that you performed LOSO cross-validation. Report those results. Show the mean squared error on held-out subjects. Show the correlation between AI predictions and expert ratings. Show the training loss curves alongside validation loss curves across epochs. Demonstrate that your model learned something generalizable rather than simply memorizing training data. These are standard outputs from any neural network training process. If you performed the validation you claim to have performed, you have these results. If the results are poor, that would explain your reluctance to report them, but it would also explain why a manuscript claiming to measure outcomes via AI modeling cannot be accepted without that validation evidence.
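The protocol requested above can be sketched in a few lines. This is a minimal illustration and not the authors' pipeline: the subject labels, scores, and the mean-score placeholder model are all hypothetical, and a real evaluation would fit the scoring network inside each fold. What it shows is the defining property of LOSO, namely that every trial from the held-out athlete is excluded from training, followed by held-out MSE per fold and a pooled correlation between predictions and expert ratings.

```python
import math
from statistics import mean


def loso_folds(subject_ids):
    """Yield (held_out, train_idx, test_idx), holding out all trials of one subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def mse(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def pearson_r(x, y):
    """Pearson correlation; returns NaN if either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: two trials per athlete with expert scores.
subjects = ["P1", "P1", "P2", "P2", "P3", "P3"]
expert_scores = [7.0, 8.0, 5.0, 6.0, 3.0, 4.0]

fold_mse = {}
pooled_pred = [0.0] * len(expert_scores)
for held_out, train_idx, test_idx in loso_folds(subjects):
    # Placeholder model (assumption): predict the mean training score.
    prediction = mean(expert_scores[i] for i in train_idx)
    for i in test_idx:
        pooled_pred[i] = prediction
    fold_mse[held_out] = mse([expert_scores[i] for i in test_idx],
                             [prediction] * len(test_idx))

r = pearson_r(pooled_pred, expert_scores)
print(fold_mse, r)
```

Reporting the per-fold values alongside the pooled figure makes it visible when one athlete dominates the error.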

Author Response

Dear Reviewer,

Thank you for your rigorous and intellectually challenging review. We fully acknowledge your concern regarding the mismatch between our measurement claims and the technical evidence previously provided. In this revision, we have substantially restructured the manuscript so that the technical validation evidence is presented in the main text and directly supports our measurement-related statements.

First, to ensure empirical accountability, we added Section 3.4 and Figure 5, where we report the results of a Leave-One-Subject-Out (LOSO) cross-validation of the Bi-LSTM scoring model. The model achieved an overall MSE of 1.14 and MAE of 1.13 across the longitudinal dataset. We also report that R² values were low (0.012–0.025), and we explicitly interpret this as a consequence of the restricted variance in expert ratings and the high inter-individual variability characteristic of severe CP, rather than presenting it as a strength. In addition, Figure 5 presents training and validation loss curves to transparently document model convergence and the practical challenges of generalization in a rare neurological cohort.
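The relationship between these three metrics, and the restricted-variance argument, can be made concrete with a small sketch. The score values below are hypothetical and are not taken from the study; the function name is ours. The same absolute errors are applied to a wide-range and a narrow-range set of ratings to show how R² collapses when the targets barely vary.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R^2 = 1 - SS_res / SS_tot for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    y_bar = sum(y_true) / n
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "mse": ss_res / n,
        "mae": sum(abs(e) for e in errors) / n,
        # R^2 is undefined when the targets have zero variance.
        "r2": 1 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }


# Identical error pattern (+0.5, -0.5, ...) on both hypothetical rating sets.
errors = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
wide_scores = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]   # ratings spread over a wide range
narrow_scores = [5.0, 6.0, 5.0, 6.0, 5.0, 6.0]  # ratings confined to a narrow band

wide = regression_metrics(wide_scores, [t + e for t, e in zip(wide_scores, errors)])
narrow = regression_metrics(narrow_scores, [t + e for t, e in zip(narrow_scores, errors)])

print("wide:  ", wide)
print("narrow:", narrow)
```

Both cases have the same MSE and MAE, yet only the wide-range case yields a high R², because SS_tot shrinks with the spread of the ratings. A related sanity check: when MSE and MAE are computed on the same predictions in the same units, MAE squared cannot exceed MSE.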

Second, regarding Player 3, we agree that our earlier description of “stabilization” required independent quantitative support. We therefore added Section 3.5 and Figure 9, where we analyze longitudinal changes in kinematic variability. The results show a measurable reduction in movement variability (e.g., an overall 4.69% reduction, with improvements of up to 16.95% in hand-related segments), suggesting motor refinement even when terminal expert scores remain relatively stable. Importantly, we present this evidence as an indicator of within-athlete change under severe impairment constraints, without overstating generalizability.
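One way such a percentage reduction could be computed is sketched below. The manuscript's exact definition of movement variability is not given in this response, so the choice of population variance, the early/late split, the segment names, and the sample values are all assumptions made for illustration.

```python
from statistics import pvariance


def variability_reduction(early, late):
    """Percent reduction in per-segment variance from early to late trials.

    `early` and `late` map a segment name to one scalar kinematic value per trial.
    Positive results mean the late trials were less variable.
    """
    reduction = {}
    for segment, early_values in early.items():
        var_early = pvariance(early_values)
        var_late = pvariance(late[segment])
        if var_early == 0:
            reduction[segment] = float("nan")  # no baseline variability to reduce
        else:
            reduction[segment] = 100.0 * (var_early - var_late) / var_early
    return reduction


# Hypothetical per-trial values (e.g. a joint coordinate at release, arbitrary units).
early_trials = {"hand": [10.0, 14.0, 10.0, 14.0], "trunk": [0.0, 2.0, 0.0, 2.0]}
late_trials = {"hand": [11.0, 13.0, 11.0, 13.0], "trunk": [0.0, 2.0, 0.0, 2.0]}

reduction = variability_reduction(early_trials, late_trials)
print(reduction)
```

Because the measure is computed within one athlete, it supports statements about within-athlete change only, which matches the scope stated above.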

Across the manuscript, we revised the relevant sections to align the wording of our claims with the scope of the validation evidence and its limitations. All changes are highlighted in red in the revised version.

Best regards,

The Authors

Reviewer 4 Report

Comments and Suggestions for Authors

Thank you to the authors for improving the work. I have no further comments. In my opinion, the work needs no further improvement.

Author Response

Dear Reviewer,

We sincerely appreciate your positive evaluation and your encouraging comment that the manuscript no longer needs further improvement. We are very pleased to hear that our previous efforts to refine the work met your expectations.

In this final stage of revision, while taking into account the collective feedback from the review panel, we have further strengthened the manuscript's technical foundation to ensure the highest scientific quality. Specifically, we have integrated detailed engineering metrics, such as Mean Squared Error (MSE) and Mean Absolute Error (MAE), and included new visualizations (Figure 5 and Figure 9) to provide empirical evidence for the AI system's predictive reliability and the kinematic stabilization of Player 3.

These additional refinements, highlighted in red, were made to provide a complete and validated measurement framework within the study. We believe these changes have made the work even more robust and rigorous. Thank you once again for your professional support and for recommending our work for publication.

Best regards,

The Authors

Reviewer 5 Report

Comments and Suggestions for Authors

I have reviewed the revised version of the manuscript and the authors’ response to the reviewers’ comments. The authors have adequately addressed the points raised in my previous review and have implemented the suggested modifications.

In my opinion, the manuscript has been sufficiently improved and is now suitable for publication in Bioengineering.

Author Response

Dear Reviewer,

We are sincerely grateful for your positive feedback and for your recommendation that our manuscript is now suitable for publication in Bioengineering. Your guidance throughout the review process has been vital in enhancing the clarity and scientific quality of this study.

In this final revision, we have paid particular attention to your observation regarding the English language. To ensure that the research is expressed as clearly and precisely as possible, we have performed a comprehensive linguistic polish of the entire manuscript. Furthermore, following the collective feedback from the review panel, we have integrated additional technical evidence—including Mean Squared Error (MSE), model convergence analysis (Figure 5), and joint-specific kinematic variance reduction (Figure 9)—to provide a more robust and validated measurement framework.

We believe these final refinements have addressed all remaining concerns and have brought the manuscript to the high standards required for publication. All changes, including the linguistic improvements and new technical sections, are highlighted in red in the updated manuscript. Thank you once again for your professional support and insightful contributions to our work.

Best regards,

The Authors

Round 3

Reviewer 3 Report

Comments and Suggestions for Authors

You did not merely pay lip service to my critique; you fundamentally strengthened the manuscript's empirical foundation. This is an exemplary case of constructive peer review leading to meaningful scientific improvement.
