Proof-of-Concept of IMU-Based Detection of ICU-Relevant Agitation Motion Patterns in Healthy Volunteers
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The aim of the study was to evaluate methods for detecting agitated behavior that could be applied in real-world clinical settings, with a particular focus on the use of inertial measurement units (IMUs). The results of the study have practical implications for ensuring the safety of unconscious hospital patients. The authors modeled several movements that are observed in patients and lead to disruption of their life support: pulling out catheters, probes, unconscious movements, etc. The different phases of movement execution were analyzed using an IMU.
The study has a clearly structured design. Correct execution of all study stages allowed for reliable and valid results. The results of this pilot study show that attaching IMU sensors to the limbs and lumbar region enables realistic detection of pre-defined arousal-related movements.
The authors themselves identified limitations of the study, such as the small sample size. A more important limitation, in my opinion, is the standardized movement, which differs significantly in parameters from real movement. However, further research in this area could expand the capabilities of high-precision movement recognition.
The article is suitable for publication in Bioengineering.
Author Response
Please see the attachment.
Author Response File:
Author Response.docx
Reviewer 2 Report
Comments and Suggestions for Authors
I have the following concerns, which are offered for the authors’ consideration to guide substantive revisions.
- The study is conducted exclusively on 15 healthy young volunteers performing simulated movements under controlled conditions; could the authors explicitly clarify how the trained CNN model is expected to generalize to ICU patients with delirium or agitation, whose movement characteristics, sensor stability, and noise conditions are fundamentally different, and whether any validation strategy or discussion addressing this domain shift can be added?
- Each 8s trial is segmented into eight non-overlapping 1s windows that are treated as independent samples during training and evaluation; could the authors justify this assumption of independence and clarify whether correlations between adjacent windows from the same participant and trial may bias performance estimates, particularly under the leave-one-subject-out cross-validation framework?
- Several key implementation details required for reproducibility are insufficiently specified. Could the authors provide a precise description of signal preprocessing and normalization, optimizer type, learning rate, batch size, loss function, regularisation techniques, and any class balancing strategy used during CNN training?
- The manuscript reports results from a single representative model selected as the median accuracy model among 135 experiments; could the authors clarify why this selection is methodologically preferable to reporting aggregated subject-level performance statistics or confidence intervals across all experiments, and discuss how sensitive the conclusions are to this choice?
- The system relies on seven IMU sensors and proprietary sensor fusion outputs from the BNO055 device. Could the authors quantitatively analyse the contribution of individual sensor locations and signal modalities to classification performance, and discuss how reliance on a closed-source fusion algorithm affects interpretability, reproducibility, and potential clinical deployment?
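The aggregate reporting asked for in the fourth point could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `summarize_runs` and the accuracy values are hypothetical placeholders, not the authors' code or results, and the middle-95% range of the runs is a descriptive spread, not a formal confidence interval.

```python
import statistics


def summarize_runs(accuracies):
    """Summarize per-run accuracies: mean, sample SD, median, middle-95% range."""
    ordered = sorted(accuracies)
    if len(ordered) < 2:
        raise ValueError("need at least two runs")
    # With n=40 the cut points fall at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(ordered, n=40, method="inclusive")
    return {
        "n_runs": len(ordered),
        "mean": statistics.mean(ordered),
        "sd": statistics.stdev(ordered),
        "median": statistics.median(ordered),
        "p2_5": cuts[0],
        "p97_5": cuts[-1],
    }


# Synthetic placeholder values, NOT results from the manuscript.
example_accuracies = [0.80 + 0.001 * (i % 50) for i in range(135)]
summary = summarize_runs(example_accuracies)
print(summary)
```

With 135 runs the median is the 68th value in sorted order, so a summary of this kind reports the same central value as the single representative model while also showing the spread around it.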
Author Response
Please see the attachment.
Author Response File:
Author Response.docx
Reviewer 3 Report
Comments and Suggestions for Authors
This manuscript presents a proof-of-concept system for detecting agitation-related movements relevant to ICU adverse events using multi-site IMU sensors and a CNN-based classifier trained on data from healthy volunteers. While the general idea of leveraging wearable inertial sensing to monitor patient movement in critical care settings is clinically relevant, the current study suffers from fundamental limitations in experimental design, validation strategy, and interpretive scope that significantly undermine the strength and reliability of its conclusions.
***Major Concerns***
- Most critically, the entire evaluation is performed on a small cohort of healthy participants executing scripted motions under controlled laboratory conditions. These motions differ substantially from real ICU agitation, which is often subtle, irregular, medication-modulated, and confounded by restraints, tubes, and sensor displacement. As a result, the presented data cannot meaningfully support claims about ICU disturbance detection beyond a highly preliminary feasibility demonstration. Even as a proof-of-concept, the manuscript does not adequately delimit its scope, and several statements implicitly overreach toward clinical applicability and robustness.
- From a methodological perspective, the segmentation strategy introduces serious concerns. Each 8-second trial is divided into eight 1-second windows that are treated as independent samples, despite their strong temporal correlation. Although leave-one-subject-out cross-validation is used, this window-level treatment risks inflating apparent performance by allowing the model to learn protocol-specific temporal structure rather than clinically realistic agitation patterns. This issue is not sufficiently acknowledged or mitigated, and no trial-level or event-level performance analysis is provided.
- The model evaluation and reporting strategy further weakens interpretability. The selection of a single “representative” model (the 68th out of 135 runs) is unconventional and obscures the true variability of model performance. While the authors state that this avoids cherry-picking, it does not replace standard reporting of aggregate results (e.g., distribution of fold-wise metrics). The confusion matrix shown appears to be based on a very limited test set, yet the manuscript does not clearly specify the number of windows or trials contributing to it, making it difficult to assess statistical reliability. Claims such as “absence of severe overfitting” are not convincingly supported given the data dependence structure and limited sample size.
- Several presentation and reporting issues also detract from the manuscript’s rigor. Figures and tables contain inconsistencies (e.g., duplicated captions, mismatched statistical notation), and the discussion of performance metrics does not sufficiently consider class imbalance or the clinical cost of false positives and false negatives in an ICU alarm context. While the citation set is broadly appropriate, the manuscript lacks a clear and critical comparison to existing ICU agitation or delirium monitoring approaches, leaving the reader uncertain about the true incremental contribution.
- The current manuscript does not provide sufficiently rigorous experimental validation, realistic modeling assumptions, or transparent performance reporting to justify publication, even as a proof-of-concept. Addressing these issues would require substantial redesign of the evaluation framework, clearer limitation of claims, and stronger justification of methodological choices.
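One possible form of the trial-level analysis mentioned in the second concern is a majority vote over the windows of each trial. The sketch below is a minimal illustration, not the authors' pipeline: the record layout, the helper names, the class labels (`pull_tube`, `rest`, `leg_kick`), and the predictions are all hypothetical.

```python
from collections import Counter, defaultdict


def trial_level_accuracy(records):
    """records: (subject_id, trial_id, true_label, predicted_label), one per window."""
    votes = defaultdict(list)
    truth = {}
    for subject, trial, true_label, pred in records:
        key = (subject, trial)
        votes[key].append(pred)
        truth[key] = true_label
    correct = 0
    for key, preds in votes.items():
        # Majority vote; ties resolve to the label seen first in the trial.
        majority = Counter(preds).most_common(1)[0][0]
        correct += majority == truth[key]
    return correct / len(votes)


def make_windows(subject, trial, true_label, preds):
    """Build one record per 1 s window of a hypothetical 8 s trial."""
    return [(subject, trial, true_label, p) for p in preds]


# Hypothetical predictions for three trials of eight windows each.
records = (
    make_windows("S01", 1, "pull_tube", ["pull_tube"] * 5 + ["rest"] * 3)
    + make_windows("S01", 2, "rest", ["rest"] * 6 + ["pull_tube"] * 2)
    + make_windows("S01", 3, "leg_kick", ["leg_kick"] * 2 + ["rest"] * 6)
)

window_acc = sum(t == p for _, _, t, p in records) / len(records)
trial_acc = trial_level_accuracy(records)
print(f"window-level: {window_acc:.3f}, trial-level: {trial_acc:.3f}")
```

Reporting both numbers side by side makes the unit of evaluation explicit: the trial-level figure counts each 8-second trial once, so correlated windows from the same trial are not treated as independent test samples.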
Author Response
Please see the attachment.
Author Response File:
Author Response.docx
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I am willing to accept the paper in its current form.
