Article
Peer-Review Record

Prediction of Adherence to an Online Wellness Program for People with Mobility Limitations: A Machine Learning Approach

Healthcare 2026, 14(6), 781; https://doi.org/10.3390/healthcare14060781
by Salma Aly 1,*, Hui-Ju Young 2,3,4, James H. Rimmer 2,3,4,5 and Tapan Mehta 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 2 February 2026 / Revised: 15 March 2026 / Accepted: 18 March 2026 / Published: 19 March 2026

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This article aims to predict the level of participation in an online wellness program for individuals with mobility limitations using machine learning models. Data from 1,218 participants were used, and 13 regression algorithms were compared. The study addresses a current clinical problem, the dataset is relatively large, and the model interpretability (SHAP + synergy analysis) is present. However, significant improvements are needed in terms of methodological transparency, model performance interpretations, and claims of scientific contribution.

 

  1. Add cross-validation results.
  2. Provide confidence intervals.
  3. Perform feature importance stability analysis.
  4. Add external validation.
  5. Revise the model selection criteria.
  6. Add temporal usage data.

Author Response

* Comment 1: Add cross-validation results.

Thank you for this helpful suggestion. We have now implemented 5-fold cross-validation to evaluate the performance of the machine learning models.

The reported performance metrics represent the mean values obtained across the five validation folds. The description of the cross-validation (CV) procedure has been added to the Methods section (Section 2.7: Model Validation, lines 207-213). In addition, the values presented in Table 4 (line 266) have been updated from single estimates to the mean values obtained from the 5-fold CV procedure.
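As an illustration of the procedure described in this response, the following is a minimal sketch of 5-fold cross-validation that reports the mean of the per-fold MAE values. It uses only the Python standard library and a one-predictor least-squares line as a stand-in model; the function names, the toy data, and the choice of model are assumptions made for illustration and are not taken from the manuscript, which does not name its software library in this record.

```python
import random
from statistics import mean

def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope

def cross_validated_mae(xs, ys, k=5, seed=0):
    """Return the list of per-fold MAE values and their mean."""
    fold_mae = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_mae.append(mean(abs(ys[i] - (b0 + b1 * xs[i])) for i in held_out))
    return fold_mae, mean(fold_mae)

# Toy data (not from the study): y = 2x + 1 exactly.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
fold_mae, mean_mae = cross_validated_mae(xs, ys)
```

Each participant appears in exactly one held-out fold, so every observation contributes once to the reported mean.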

* Comment 2: Provide confidence intervals.

Thank you for raising this point. We now report 95% confidence intervals (CIs) for all performance metrics. The confidence intervals were calculated from the variability of the metrics across the five cross-validation folds.

Accordingly, the performance metrics presented in the Results section in Table 4 (line 266) now include mean values with their corresponding 95% CI.
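One common way to turn five fold-level metrics into a 95% CI is a Student's t interval around the fold mean. The sketch below assumes that approach; the response does not state which formula the authors used, so the interval type, the function name, and the toy values are illustrative assumptions. Because cross-validation folds share training data, such an interval is approximate.

```python
from math import sqrt
from statistics import mean, stdev

# 97.5th percentile of Student's t with 4 degrees of freedom (5 folds - 1).
T_CRIT_DF4 = 2.776

def fold_ci(values, t_crit=T_CRIT_DF4):
    """Return (mean, lower, upper) for a t-based 95% CI across fold metrics."""
    m = mean(values)
    half_width = t_crit * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width

# Toy fold metrics (not from the study).
mean_value, lower, upper = fold_ci([1.0, 2.0, 3.0, 4.0, 5.0])
```

The critical value must match the number of folds; 2.776 applies only when there are five values.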

* Comment 3: Perform feature importance stability analysis.

Thank you for this suggestion. We have now included a feature importance stability analysis in the revised manuscript. A description of the method has been added to the Methods section under “2.9. Feature Importance Stability” (lines 223–229). The results of this analysis are summarized in Table 5 (line 277), which presents the top predictors ranked by their mean coefficients across the 5-fold cross-validation procedure and their variability across folds.
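A stability summary of this kind can be sketched as follows: for each predictor, collect its coefficient from every fold, then report the mean, the standard deviation, and how often the sign agrees with the mean. The feature names, the coefficient values, and the sign-agreement column are hypothetical and added for illustration; the record only states that Table 5 reports mean coefficients and their variability across folds.

```python
from statistics import mean, stdev

def coefficient_stability(fold_coefs):
    """fold_coefs: list of dicts (one per fold) mapping feature name -> coefficient."""
    rows = []
    for name in fold_coefs[0]:
        values = [fold[name] for fold in fold_coefs]
        m = mean(values)
        # Share of folds whose coefficient has the same sign as the mean.
        sign_agreement = sum(1 for v in values if v * m > 0) / len(values)
        rows.append({"feature": name, "mean": m, "sd": stdev(values),
                     "sign_agreement": sign_agreement})
    # Rank predictors by the magnitude of their mean coefficient.
    return sorted(rows, key=lambda r: abs(r["mean"]), reverse=True)

# Hypothetical coefficients from five folds (not from the study).
fold_coefs = [
    {"feature_a": 0.5, "feature_b": -0.10},
    {"feature_a": 0.6, "feature_b": 0.10},
    {"feature_a": 0.4, "feature_b": -0.20},
    {"feature_a": 0.5, "feature_b": 0.05},
    {"feature_a": 0.5, "feature_b": -0.10},
]
table = coefficient_stability(fold_coefs)
```

A predictor whose sign flips between folds, like `feature_b` here, would be a weaker basis for interpretation than one with a consistent sign.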

* Comment 4: Add external validation.

Thank you for this important suggestion. External validation using an independent dataset would indeed provide additional evidence regarding the generalizability of the proposed model. However, an independent dataset suitable for external validation is not currently available within the scope of the present study. The MENTOR program is ongoing, and additional patient cohorts will continue to be enrolled beyond the current study period. Once these future cohorts become available, we plan to evaluate the performance of the developed model using these newly collected data, which will allow temporal external validation.

Furthermore, future work will also explore incorporating additional predictors to improve model performance and generalizability. This point has been clarified in the Discussion section (lines 420-436).

* Comment 5: Revise the model selection criteria.

Thank you for this comment. We have revised the manuscript to clarify the model selection criteria. Specifically, we now explicitly state that the best-performing model was selected based on the lowest mean absolute error (MAE) obtained from the cross-validation procedure. We also added a brief justification for using MAE as the primary model selection metric. These clarifications have been incorporated in the Methods section under “2.8. Evaluation Criteria” (lines 214–222).
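The selection rule described here can be sketched in a few lines: compute the mean MAE across folds for each candidate and keep the candidate with the lowest value. The model names and fold values below are placeholders, not results from the study.

```python
from statistics import mean

def mae(y_true, y_pred):
    """Mean absolute error, expressed in the same units as the outcome."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def select_by_mean_mae(fold_mae_by_model):
    """Return the model name with the lowest mean MAE and all mean MAEs."""
    means = {name: mean(values) for name, values in fold_mae_by_model.items()}
    best = min(means, key=means.get)
    return best, means

# Placeholder per-fold MAE values for two hypothetical candidates.
candidates = {
    "model_a": [10.0, 11.0, 9.0, 10.0, 10.0],
    "model_b": [9.0, 9.5, 8.5, 9.0, 9.0],
}
best_model, mean_maes = select_by_mean_mae(candidates)
example_mae = mae([1.0, 2.0, 3.0], [2.0, 2.0, 1.0])
```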

* Comment 6: Add temporal usage data.

Thank you for this insightful suggestion. The prediction models in the present study were intentionally designed using baseline demographic, social, and psychosocial characteristics collected prior to the start of the intervention in order to enable early identification of participants who may be at risk of low adherence before program participation begins. However, we agree that incorporating temporal usage information may further improve predictive performance.

To address this point, we have added a statement in the Discussion (lines 420–425) outlining our planned follow-up study. Specifically, once additional data from the ongoing MENTOR program become available, we will extend the current modeling framework to incorporate longitudinal usage variables (e.g., weekly attendance) to evaluate whether combining baseline characteristics with temporal engagement patterns improves adherence prediction.

* Overall recommendations: English proofreading and figure quality

Thank you for this valuable comment. The manuscript has been carefully revised to improve clarity and language throughout. The entire document was reviewed and proofread by a native English-speaking researcher to ensure grammatical accuracy, readability, and overall language quality.

In addition, the figures were regenerated using Python with improved visualization settings to enhance clarity and resolution. Specifically, the plots were exported using higher-resolution settings (≥600 DPI), larger figure dimensions, and improved font scaling.

 

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript addresses a relevant and timely topic: predicting adherence to a telewellness program among people with mobility limitations using machine learning. The study leverages a relatively large real-world dataset and combines predictive modeling with SHAP-based interpretability, enhancing its clinical relevance. The article is generally well-structured, and the results are clearly presented. The work has practical implications for personalized engagement strategies.

However, several points should be clarified to strengthen methodological rigor and interpretability.

First, the predictive performance is modest (R² ≈ 0.12). While acknowledging this, the framework should emphasize that baseline-only variables can inherently limit predictive accuracy and that the model is better suited to risk stratification than precise prediction.

Second, more details on preprocessing are needed to rule out potential data leakage. Please clarify whether imputation, encoding, and feature selection were performed only within the training folds. Additionally, using a single 80/20 split could produce unstable estimates; cross-validation or repeated splits would improve robustness.

Third, the rationale behind feature selection and dimensionality reduction should be better justified, as the reduction in predictors appears minimal. Reporting hyperparameter ranges, proportions of missing data, and confidence intervals for performance metrics would also improve transparency.

Fourth, the interpretation of sensitive predictors such as race and socioeconomic indicators should be more carefully discussed, incorporating ethical and equity considerations when implementing such models.

Finally, the data (particularly the SHAP plots) could be clearer, and the manuscript would benefit from a light English revision for grammar and conciseness.

Overall, this is a solid and clinically meaningful study that would benefit from minor methodological clarifications and presentation improvements.

Comments on the Quality of English Language

The manuscript is generally understandable, but the English would benefit from minor editing for grammar, clarity, and conciseness. Several sentences are overly long or awkwardly phrased, and there are occasional grammatical inconsistencies and typographical errors. A careful language revision or professional proofreading is recommended to improve readability and precision.

Author Response

* Comment 1: the predictive performance is modest (R² ≈ 0.12). While acknowledging this, the framework should emphasize that baseline-only variables can inherently limit predictive accuracy and that the model is better suited to risk stratification than precise prediction.

Thank you for this helpful suggestion. We agree that models relying solely on baseline characteristics are inherently limited in predictive accuracy because adherence behaviors are influenced by multiple dynamic factors during program participation. To clarify this interpretation, we added a sentence in the Discussion emphasizing that the current framework is better suited to risk stratification (identifying participants who may be at increased risk of low adherence) than to precise individual-level prediction (lines 333-337).
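To make the distinction concrete, a risk-stratification use of the model would rely only on the ordering of predicted adherence rather than on the predicted values themselves. The sketch below groups participants into rank-based tiers; the number of tiers, the participant identifiers, and the predicted values are hypothetical and are not taken from the manuscript.

```python
def risk_tiers(predicted, n_tiers=3):
    """Map participant id -> tier, where tier 0 holds the lowest predicted adherence."""
    ordered = sorted(predicted, key=predicted.get)
    n = len(ordered)
    return {pid: min(rank * n_tiers // n, n_tiers - 1)
            for rank, pid in enumerate(ordered)}

# Hypothetical predicted adherence scores (not from the study).
predicted = {"p1": 2.0, "p2": 9.5, "p3": 5.0, "p4": 1.0, "p5": 7.0, "p6": 4.0}
tiers = risk_tiers(predicted)
```

Participants in tier 0 would be the candidates for additional engagement support, even if their exact predicted values carry substantial error.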

* Comment 2: more details on preprocessing are needed to rule out potential data leakage. Please clarify whether imputation, encoding, and feature selection were performed only within the training folds. Additionally, using a single 80/20 split could produce unstable estimates; cross-validation or repeated splits would improve robustness.

Thank you for this helpful comment. We have clarified the preprocessing workflow in the Methods section. Specifically, the dataset was first divided into training and test sets, and all preprocessing steps—including imputation, encoding, and feature selection—were performed using the training data only and subsequently applied to the test data to prevent potential information leakage (Section 2.4. Data preprocessing, lines 168-171). In addition, model performance was evaluated using a 5-fold cross-validation procedure, which provides more robust performance estimates than a single train–test split. These clarifications have been incorporated in the Methods section (Section 2.7: Model Validation, lines 207-213).
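The leakage-prevention principle described above can be illustrated with mean imputation: the fill value is estimated from the training data alone and then reused unchanged on the test data. This is a minimal sketch with toy values; the function names are hypothetical, and the manuscript's actual imputation method is not specified in this record. The same pattern would apply inside each cross-validation fold.

```python
from statistics import mean

def fit_mean_imputer(train_column):
    """Estimate the fill value from observed training values only."""
    observed = [v for v in train_column if v is not None]
    return mean(observed)

def apply_imputer(column, fill_value):
    """Replace missing entries with a fill value learned elsewhere."""
    return [fill_value if v is None else v for v in column]

# Toy columns with missing entries marked as None (not from the study).
train = [1.0, None, 3.0, 5.0]
test = [None, 100.0]

fill = fit_mean_imputer(train)            # learned from training data only
train_filled = apply_imputer(train, fill)
test_filled = apply_imputer(test, fill)   # test values never influence `fill`
```

Had the fill value been computed on the pooled data, the extreme test value would have shifted it, which is exactly the information leakage the response refers to.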

* Comment 3: the rationale behind feature selection and dimensionality reduction should be better justified, as the reduction in predictors appears minimal. Reporting hyperparameter ranges, proportions of missing data, and confidence intervals for performance metrics would also improve transparency.

Thank you for this helpful suggestion. We expanded the Methods section to clarify the rationale for feature selection and dimensionality reduction, noting that these procedures were implemented to reduce redundancy among predictors, mitigate potential multicollinearity, and improve model stability and interpretability, even though the numerical reduction in predictors was modest (Methods section 2.5. Feature Selection and Dimensionality Reduction: lines 179-183).

We also clarified the model optimization procedure. Hyperparameters were tuned using grid search combined with 5-fold cross-validation for applicable models. The results of these analyses are summarized in Table 4 (line 266), where model performance metrics represent the mean values and 95% confidence intervals obtained across the cross-validation folds. These revisions have been incorporated in the Methods and Results sections (Methods section 2.6. Regression Models, lines 197–206; Results: Table 4, line 266).
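Grid search combined with 5-fold cross-validation can be sketched as follows: each candidate hyperparameter value is scored by its mean cross-validated MAE, and the value with the lowest score is kept. The example uses a one-predictor ridge regression with penalty `alpha` as a stand-in; the model, the grid, and the toy data are assumptions for illustration, and the record does not specify the ranges searched or whether tuning was nested inside the evaluation folds.

```python
import random
from statistics import mean

def ridge_fit(xs, ys, alpha):
    """One-predictor ridge regression; alpha shrinks the slope toward zero."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / (sxx + alpha)
    return my - slope * mx, slope

def cv_mae(xs, ys, alpha, k=5, seed=0):
    """Mean MAE across k folds for a given alpha."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(k):
        held_out = idx[f::k]
        held = set(held_out)
        train = [i for i in idx if i not in held]
        b0, b1 = ridge_fit([xs[i] for i in train], [ys[i] for i in train], alpha)
        scores.append(mean(abs(ys[i] - (b0 + b1 * xs[i])) for i in held_out))
    return mean(scores)

def grid_search(xs, ys, alphas, k=5):
    """Score every alpha in the grid and return the best one with all scores."""
    scores = {a: cv_mae(xs, ys, a, k) for a in alphas}
    return min(scores, key=scores.get), scores

# Toy data (not from the study): an exact line, so no shrinkage is optimal.
xs = [float(i) for i in range(30)]
ys = [3.0 * x - 2.0 for x in xs]
best_alpha, scores = grid_search(xs, ys, alphas=[0.0, 1.0, 10.0, 100.0])
```

Because the toy data are noise-free, the search selects `alpha = 0.0`; with noisy data a nonzero penalty may score better.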

In addition, the proportion of missing data across variables has now been reported in the Methods section (Section 2.4. Data preprocessing, lines 162–166).

* Comment 4: the interpretation of sensitive predictors such as race and socioeconomic indicators should be more carefully discussed, incorporating ethical and equity considerations when implementing such models.

Thank you for this important suggestion. We agree that predictors such as race and socioeconomic indicators require careful interpretation, particularly when used in ML models. To address this point, we have expanded the Discussion section to acknowledge the ethical considerations associated with using sensitive sociodemographic variables. The revised text clarifies that these predictors were interpreted primarily as indicators of structural barriers that may influence engagement with digital interventions rather than intrinsic determinants of adherence. These revisions have been added to the Discussion section (page 13, lines 358–370).

* Comment 5: the data (particularly the SHAP plots) could be clearer, and the manuscript would benefit from a light English revision for grammar and conciseness.

Thank you for this valuable comment. The manuscript has been carefully revised to improve clarity and language throughout. The entire document was reviewed and proofread by a native English-speaking researcher to ensure grammatical accuracy, readability, and overall language quality.

In addition, the figures were regenerated using Python with improved visualization settings to enhance clarity and resolution. Specifically, the plots were exported using higher-resolution settings (≥600 DPI), larger figure dimensions, and improved font scaling.
