Open Access Systematic Review

Peer-Review Record

A Systematic Review of Wearable Sensors in Rett Syndrome—What Physiological Markers Are Informative for Monitoring Disease States?

Sensors 2025, 25(21), 6697; https://doi.org/10.3390/s25216697

by Jatinder Singh^1,2,3,*, Georgina Wilkins^1,2,3,†, Athina Manginas^1,2,3,†, Samiya Chishti^1,2,3, Federico Fiori^1,2,3, Girish D. Sharma^4, Jay Shetty^5 and Paramala Santosh^1,2,3

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 7 August 2025 / Revised: 16 September 2025 / Accepted: 28 October 2025 / Published: 2 November 2025

(This article belongs to the Section Wearables)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

General comments

Overall, I believe the manuscript assembles a niche and timely literature set but presently mixes (i) descriptive review of small, heterogeneous wearable studies, (ii) assertive claims about “proxy biomarkers” of disease progression, and (iii) extended background/methodological/clinical guidance content that is not directly supported by the included RTT studies. The biggest issues I see are methodological transparency (search reproducibility, screening reliability, PRISMA completeness), internal consistency (Table 1 cohort details; device labeling), and evidentiary alignment between claims and included data (especially for “disease progression” and seizure-related inferences). I suggest the authors clarify eligibility, tighten the boundary between background vs. findings, define “proxy biomarker” explicitly, and verify all numerical tallies and table entries so that the synthesis rests on unambiguous, reproducible ground. Please find below my specific comments:

Specific comments

Title: The title promises identification of “relevant” physiological markers for monitoring disease progression. I feel that, as currently written, this claim overreaches what a 12-study synthesis with heterogeneous devices/designs can substantiate. Please either justify that “relevance” with explicit criteria used in the review, or temper the language to match the evidence actually shown (e.g., pilot/feasibility studies and small ML datasets).

Abstract

This section asserts that “sustained high EDA, %HRmax and HR/LF ratio emerged as proxy biomarkers for monitoring disease progression.” Please explain what operational definition you used for “proxy biomarker,” what level of evidence qualifies a metric as such in this review, and whether any thresholds/effect sizes were consistent across studies. As written, I believe this reads as a conclusion rather than a neutral summary of findings from small, heterogeneous studies.

Machine-learning claims (“enabled the prediction of different sleep patterns and clinical severity”) should include basic performance framing to avoid optimism bias (e.g., cross-validation vs. hold-out, class balance, and per-class metrics rather than overall accuracy alone). Please elaborate in the abstract or ensure these guardrails are clearly stated in the main text.

You report “Twelve (12) articles were included” from “226 records.” Please align these counts with the PRISMA flow’s numbers and provide reasons for full-text exclusions in the abstract only if journal style requires, but ensure exact tallies match the main text/figure.

Introduction

The problem framing around COAs being “unable to capture disease progression in real time” is strong language. Please elaborate on whether this is specific to RTT instruments (RSBQ/RTT-CGI) and provide citations or qualifiers already in the manuscript text where you make this assertion.

There is a pivot from general limitations of COAs/sleep diaries to the promise of wearables. I suggest that the authors clearly separate feasibility (tolerability/compliance) from validity (agreement with gold standards) and clinical utility (decision impact). Right now these concepts are blended, which makes it difficult to follow what the review is evaluating.

The Introduction sets up “disease progression” but later much of the evidence involves cross-sectional correlates or short recording windows (e.g., sleep staging, EDA spikes). Please explain how the review distinguishes progression markers from state markers.

Methods

PRISMA adherence is claimed, but I don’t see a mention of protocol registration (e.g., PROSPERO) or availability of a PRISMA 2020 checklist. Please clarify whether a protocol was registered a priori and whether a checklist is available.

“No date restriction or other filters were applied,” yet the Eligibility Criteria exclude non-English records. Please reconcile this contradiction by explicitly acknowledging language restrictions in the Search Strategy. As written, these statements conflict.

The search terms are given in brief. I recommend the authors provide database-specific strategies (full strings, field tags, truncation/wildcards, controlled vocabulary where applicable) and the exact search dates (you note “July 2025,” but not the day or range) so the search is reproducible. Please also explain how generic tokens like “Rett” were handled to avoid surname noise.

Screening: Rayyan use is noted, and “independent and blinded” screening is claimed, but there are no inter-rater agreement statistics (e.g., κ) or a description of conflict resolution beyond “consensus.” Please elaborate.
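
To illustrate the kind of agreement statistic requested here, the following minimal sketch computes Cohen's κ for two screeners. The function name `cohen_kappa` and the include/exclude decisions are hypothetical and are not taken from the manuscript.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same records."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of records")
    n = len(rater_a)
    # Observed agreement: share of records where both raters gave the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical title/abstract decisions for 10 records (illustration only).
screener_1 = ["include", "exclude", "exclude", "include", "exclude",
              "exclude", "include", "exclude", "exclude", "exclude"]
screener_2 = ["include", "exclude", "include", "include", "exclude",
              "exclude", "exclude", "exclude", "exclude", "exclude"]

kappa = cohen_kappa(screener_1, screener_2)
print(f"Cohen's kappa = {kappa:.2f}")
```

In this made-up example the raw agreement is 80%, yet κ is noticeably lower because most records are excluded by both screeners, which is why a chance-corrected statistic is more informative than percent agreement alone.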

Study selection flow: You report 226 identified; 184 after deduplication; 37 screened as eligible; 10 met criteria; 2 via snowball = 12. Please provide counts/reasons for full-text exclusions (PRISMA asks for this) and verify that these numbers are consistent across Abstract, Results, and Figure 1.
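
As a simple consistency check on the counts quoted above, the per-stage exclusions can be derived by subtraction. This is only a sketch: it assumes that the 37 records "screened as eligible" are the ones assessed at full text, and the variable names are illustrative.

```python
# Counts as quoted in this comment from the manuscript's study-selection text.
identified = 226
after_deduplication = 184
full_text_assessed = 37      # assumption: "screened as eligible" = full-text stage
met_criteria = 10
added_by_snowballing = 2
reported_included = 12

# Derived numbers that a PRISMA 2020 flow diagram would be expected to show.
duplicates_removed = identified - after_deduplication
excluded_at_screening = after_deduplication - full_text_assessed
excluded_at_full_text = full_text_assessed - met_criteria
included = met_criteria + added_by_snowballing

print("duplicates removed:", duplicates_removed)
print("excluded at title/abstract:", excluded_at_screening)
print("excluded at full text (reasons needed):", excluded_at_full_text)
print("included matches reported total:", included == reported_included)
```

Under that assumption, the full-text exclusions are the group for which PRISMA expects reasons with counts, and the derived totals should match the Abstract, Results, and Figure 1.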

Data extraction: Per Methods, extraction was done by one author then “reviewed” by two others. Please explain whether dual independent extraction was performed or whether spot-checks were used; this matters for risk of bias in the review process.

Quality appraisal: You use the JBI checklist, but it’s unclear whether you piloted the tool, how disagreements were handled, and whether any scoring/weighting informed synthesis. Please elaborate on how JBI judgments fed into the evidence synthesis rather than just being tabulated.

Results

Study characteristics & Table 1

Please verify the sex distribution reported for study [30]: “45 subjects (44 male and 1 female)” for a Rett cohort is highly atypical and suggests either a transcription error, inclusion of a non-RTT cohort, or a disorder mix-up. I believe this requires immediate verification because it affects the credibility of downstream summaries.

Device heterogeneity is high (Empatica E4, ActiWatch 2, ActivPAL, StepWatch, BioStamp, Hexoskin, LifeShirt, YouCare). Please explain how you handled between-device variability in sampling rate, sensor modality (PPG vs. ECG vs. RIP vs. EDA), and placement when synthesizing results—especially when computing proportions like “E4 used in 33% of studies” and when deriving claims about biomarkers.

Machine-learning studies: For the sleep-staging study ([27]) you report “85.1% accuracy” vs. PSG. Please add per-stage performance (e.g., Awake/NREM/REM sensitivity/specificity/F1) and clarify whether accuracy refers to epoch-level classification with balanced classes. Similarly, for severity classification ([36]), please report sample sizes per class, cross-validation schema, and whether any external validation existed. As presented, I worry readers may over-interpret single-study metrics.
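
To make the concern about accuracy concrete, the sketch below derives per-class sensitivity, specificity and F1 from a confusion matrix. The matrix is entirely hypothetical (it is not data from study [27] or [36]), and `per_class_metrics` is an illustrative helper rather than any study's method.

```python
def per_class_metrics(confusion, labels):
    """Sensitivity, specificity and F1 per class from a square confusion matrix.

    confusion[i][j] is the number of epochs whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp  # predicted k, true class otherwise
        tn = total - tp - fn - fp
        metrics[label] = {
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "f1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0,
        }
    return metrics


# Hypothetical, class-imbalanced sleep-staging result (rows: true, columns: predicted).
labels = ["Awake", "NREM", "REM"]
confusion = [
    [80, 15, 5],
    [10, 780, 10],
    [10, 50, 40],
]

correct = sum(confusion[i][i] for i in range(len(labels)))
accuracy = correct / sum(sum(row) for row in confusion)
metrics = per_class_metrics(confusion, labels)

print(f"overall accuracy = {accuracy:.3f}")
for label in labels:
    m = metrics[label]
    print(f"{label}: sensitivity={m['sensitivity']:.3f}, "
          f"specificity={m['specificity']:.3f}, F1={m['f1']:.3f}")
```

In this invented example the majority class dominates the overall accuracy while the minority class is detected far less reliably, which is the pattern per-stage reporting is meant to expose.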

Ethnicity: You state only 4/12 studies reported ethnicity. Please double-check that those are indeed [29, 34, 36, 37] and specify the denominators in Table 1 to avoid ambiguity.

Terminology is inconsistent across rows (e.g., “Bio-stamp®” vs. “BioStamp® nPoint”; “ActiGraph wGT3XBT”; “SAM”). Please standardize brand/trademark capitalization and device model names.

Several rows in Table 1 include interpretive statements (e.g., feasibility conclusions) that would be better constrained to objective outcomes. Please ensure each “Relevant Findings” entry is strictly derived from the study’s reported results and is not mixed with your interpretation.

Quality appraisal (Table 2)

The JBI table mixes “Yes/No/Unclear/N/A,” but there’s no summary of overall risk of bias across domains, nor is there an explanation of how these judgments influenced narrative weighting. Please explain how you translated item-level judgments into confidence in the aggregated findings.

Some judgments appear permissive (e.g., “No power calculation” deemed acceptable without downgrading), while others highlight underpowering. Please provide consistent criteria for what counts as “sufficient to power the study,” especially for pilot/exploratory designs.

Conflict-of-interest coding: Table 2 marks some studies as “No” for COI, but it’s unclear if this means “authors declared no conflicts” or “no COI statement reported.” Please clarify and align with PRISMA’s requirement to report funding/COI of included studies.

Subsection “Proxy Biomarkers of Disease Progression” (within Results)

This subsection contains interpretive language (e.g., “suggest,” “provide additional insights”) that reads like Discussion. Please clarify whether statements about “proxy biomarkers” are strictly findings from included studies or your synthesis/interpretation; if the latter, they should be clearly framed as such and supported with explicit criteria.

The linkage of EDA and HRV ratios to “disease progression” appears to be inferred from cross-sectional associations and short-term recordings. Please explain the longitudinal evidence (if any) that justifies the progression framing.

Discussion

There are extended excursions into topics (e.g., FDA-cleared seizure detection with Embrace; SUDEP precursors; SHAP/SMOTE methodological frameworks) that are only tangentially connected to the included RTT studies. I suggest that the authors clearly separate background context from inferences supported by the included dataset, and avoid implying that external findings generalize to RTT without confirmatory evidence in this population.

The claim that “sustained high EDA, HRmax% and HR/LF ratio could serve as proxy biomarkers” needs a firmer evidentiary basis here. Please define what constitutes “sustained,” specify time windows and thresholds where possible, and discuss measurement error/skin conductance artefacts (placement, sweat, ambient temperature)—especially given the heterogeneity of sensor platforms summarized earlier.

The paragraph on ethnicity/PPG accuracy raises an important equity question but mixes conflicting literature signals. Please be explicit about what in your included RTT studies speaks to this (you note only 33% reported ethnicity) versus what is extrapolated from general wearable validation literature. As written, the reader may conflate external evidence with findings of this review.

The seizure-related EDA discourse cites external evidence, but the current review’s included studies largely did not investigate seizures; one study had unclear EEG-HRV relations. Please center the Discussion on what your included RTT studies show, and mark external context as such.

Limitations

This section is present and helpful, but I feel several key items are missing: (i) absence of protocol registration; (ii) potential publication bias (no assessment described); (iii) language restriction; and (iv) lack of dual independent extraction with agreement statistics. Please elaborate so the Limitations are proportionate to the scope and methods actually used.

Conclusions

This section again presents EDA/HR metrics as proxy biomarkers. Given the small, heterogeneous evidence base described, please ensure the modal verbs reflect uncertainty (e.g., “may be associated with…”) and avoid implying clinical readiness without replication and external validation in RTT. The forward-looking mention of the VIBRANT study is fine, but please avoid presupposing outcomes.

Figures & Tables

Figure 1 (PRISMA): I recommend that the authors ensure full compliance with PRISMA 2020: show reasons for full-text exclusions with counts. The text states counts at each stage; the figure should mirror those exactly.

Table 1: Please check internal consistency (e.g., sex distribution in [30]; consistent device naming; consistent reporting of whether MECP2 was genetically confirmed). Also, the “Relevant Findings” column sometimes includes interpretive or feasibility language without clear linkage to pre-specified outcomes.

Table 2: Define each JBI item in a footnote and explain how “N/A” was adjudicated. Right now readers cannot tell why some criteria are “N/A” for certain designs, nor how that affects overall confidence.

Figure 2 and extensive clinical notes: Large portions read like clinical management guidance (pharmacologic options for sleep, agitation, anxiety; GERD management; seizure PRN commentary). I feel this goes beyond the scope of a systematic review of wearable sensors in RTT unless every medication statement is explicitly linked to included studies or to a separate, clearly demarcated clinical guidance section. Please explain the rationale for including these recommendations in a wearables review and ensure all such statements are appropriately sourced and limited to evidence within scope.

Other line-item clarifications I’d explicitly ask for

Please reconcile “No date restriction or other filters” with exclusion of non-English studies in Eligibility Criteria. As a reader, I can’t tell whether non-English records were searched then excluded at screening, or filtered out at the query level.

Please verify the [30] cohort sex distribution (44 male / 1 female). This appears inconsistent with RTT epidemiology and raises concern about study eligibility or a transcription error.

Please provide database-specific search strings (PubMed, PsycINFO, Embase, Web of Science), exact dates searched in July 2025, and any de-duplication method used before Rayyan import.

Please report inter-rater agreement for title/abstract and full-text screening, and describe conflict resolution beyond “consensus.”

Please define “proxy biomarker” a priori and state how a metric earned that label in this review (number of studies, effect consistency, validation level).

For the ML sleep and severity papers: please add per-class metrics, dataset sizes per class, cross-validation approach, and whether any external/test set validation was used. Accuracy alone is insufficient for readers to judge robustness.

Please clearly separate what is shown by the included RTT studies from broader background claims (e.g., seizure detection algorithms, SUDEP markers, SHAP/SMOTE methodological commentary). As written, the boundary is blurry.

Please explain how JBI appraisal influenced the narrative synthesis (any sensitivity analyses by excluding low-quality studies? any weighting?).

Please clarify whether any assessment of publication bias or small-study effects was attempted (even qualitatively), given the very small N per study.

Author Response

Please see attached responses for Reviewer 1

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The writing and structure of this paper are fine, but the more important issue is the small sample of articles: only 12 articles were selected for the analysis.

After the first round of searching, 226 papers were found, but only 12 were finally analysed. This suggests that many similar review papers have already been written in this research field, so what is the significance of this paper?
In the concluding Figure 2, three kinds of remote signals are tracked for further analysis: exercise (general physical health), HRV (emotion tracking) and EDA (also emotion tracking). That is fine, but these are all auxiliary intervention measures, and no further technical or measurement analysis is given. The figure addresses only the motivation and lacks a discussion of valuable results.
Why do the final conclusions cover other conditions, such as seizures and GERD? As shown in Section 2.2 (Search Terms), the first search condition is “Rett Syndrome”, so the discussion should focus only on Rett syndrome.
The Discussion section reads like a Related Work section; the goals need to be highlighted.
Many technical terms are given only as abbreviations, without their full names.

Author Response

Please see attached responses for Reviewer 2

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The submitted manuscript is interesting, but it does not quite fit the keywords of the journal, and the style of writing makes this impression even stronger. The manuscript is written as a narrowly oriented work for specialists in practical psychiatry, clinical psychology, pediatrics or neurology. However, the journal is focused on the science and technology of sensors, so submitted works must be accessible to the general Sensors audience, place particular focus on sensor implementation or technology, and explain in a simple and comprehensive way all details of the medical implementation of the devices used. For example, the abstract contains the abbreviation PRISMA, which presumably corresponds to “Preferred Reporting Items for Systematic Reviews and Meta-Analyses”. Apart from the general rule that all abbreviations must be defined at their first appearance, this is especially important for specialist abbreviations from a field remote from the journal's own.

Rett syndrome is a genetic disorder that typically becomes apparent after 6–18 months of age, manifesting as impairments in language, coordination, etc. It is lethal in males, with very rare exceptions, i.e., the studies concern female patients almost exclusively. To justify the selected age range, one could also mention that the life expectancy of RTT patients extends to middle age. There is not a single word about this in the abstract, which contains part of the Introduction, exceeds the usual length, and has no clear formulation of the goal of the study in relation to the science and technology of sensors.

The Introduction must include at least a minimal description of the types of wearable sensors adaptable to the studies under consideration and a formulation of the general requirements for applying these sensors. For example, the group of Prof. K. Mohri proposed the use of a magnetic impedance sensor for monitoring the state of sleeping patients or obtaining a magnetocardiogram. Is this type of sensor useful for evaluating the state of RTT patients?

The design of the tables is inconvenient for the journal: remove the circle bullets at each sub-paragraph and use single spacing. Table 2 has a very mixed design; the numbers 1, 2, 3, etc. are not necessary, and the circle bullets must be removed.

The Conclusions should not include figures or discussion; move them to the Discussion. Again, the Discussion and Conclusions must contain some analysis (or at least a mention) of the types of, and requirements for, sensors adaptable to this field of medical activity.

Author Response

Please see attached responses for Reviewer 3

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

For a review paper, a more detailed and more scientific analysis is more important than a mere summary of opinions.

Author Response

Please find attached my responses to Reviewer 2.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The previous version of the submitted manuscript did not quite fit the keywords of the journal. This is different from the statement “the current manuscript is beyond the scope of the journal Sensors”. Interdisciplinary work must be adjusted to the style of the particular journal in order to be accessible to its audience, namely an audience focused on the science and technology of sensors, and it is necessary to explain in a simple and comprehensive way all details of the medical implementation of the devices used. The revised version is better; however, it is still necessary to provide a minimal analysis of the types of wearable sensors under consideration (“However, the clinical impact of wearable sensors remains uncertain”).

Please add supporting arguments to the text “While genetics provide supporting evidence for a diagnosis, RTT remains a clinical diagnosis based on consensus clinical criteria. Based on clinical experience and our published work in this area, we feel that the time epoch for a seemingly ‘typical’ period up to 6 months in RTT needs to be reframed.”

From my point of view, this table design is inconvenient for readers. Previous work may have been published in this format, but the journal is interested in progress and in growing its audience. A more convenient and compact format helps readers and saves journal space. Here is an example of how a table cell can be made more compact:

Mean ± SD age of individuals was 18.3±9.4 years (range: 4.7 to 35.5 years)
Individuals had a pathogenic MECP2 gene and had diagnostic criteria for typical RTT.

Mean ± SD age of individuals was 18.3±9.4 years (range: 4.7 to 35.5 years)

Individuals had a pathogenic MECP2 gene and had diagnostic criteria for typical RTT.

Author Response

Please find attached my responses to Reviewer 3.

Author Response File: Author Response.pdf
