Mapping Mental Trajectories to Physical Risk: An AI Framework for Predicting Sarcopenia from Dynamic Depression Patterns in Public Health
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Comments:
1) Nice and well-presented manuscript with organised figures and diagrams.
2) In the abstract: provide better terminology for dynamic depression trajectories.
3) Were there any other studies that assessed other AI- or ML-enabled models for sarcopenia risk?
4) Provide references for the sarcopenia assessment text.
5) Provide a figure to make the methodology easier for the reader to understand.
6) Excellent presentation of results and data; nice heatmaps and bar charts.
7) Extend the part on the clinical implications of your model, e.g., a system with cooperation between psychologists, dieticians, etc.
Author Response
1. Comment 1: Nice and well-presented manuscript with organised figures and diagrams.
Response 1: We sincerely thank the reviewer for their positive and encouraging comments on our manuscript. We are delighted that they found the manuscript well-presented and the figures well-organized. Their favorable feedback is highly encouraging.
2. Comment 2: In the abstract: provide better terminology for dynamic depression trajectories.
Response 2: We sincerely thank the reviewer for this insightful suggestion. We agree that a more precise terminology would enhance the clarity and academic rigor of our abstract. As suggested, we have replaced "dynamic depression trajectories" with the more standard and descriptive term "longitudinal depressive symptom trajectories" throughout the abstract (and have ensured consistency in the main text where appropriate). We believe this change more accurately reflects the methodological approach of Group-Based Trajectory Modeling and the nature of our findings.
3. Comment 3: Were there any other studies that assessed other AI- or ML-enabled models for sarcopenia risk?
Response 3: We thank the reviewer for this insightful comment. We have now revised the manuscript to discuss existing AI/ML models for sarcopenia risk prediction. A new paragraph has been added to the Introduction (Page 2) and a comparative discussion has been incorporated into the Discussion section (Page 18) to better situate our work within the current research landscape and highlight our methodological innovation of using longitudinal depression trajectories.
4. Comment 4: Provide references for the sarcopenia assessment text.
Response 4: We thank the reviewer for pointing this out. We have now added the recommended reference to the Asian Working Group for Sarcopenia (AWGS) criteria in the 'Sarcopenia Assessment' subsection (Page 3) to properly cite the foundational definition used in our study.
5. Comment 5: Provide a figure to make the methodology easier for the reader to understand.
Response 5: We thank the reviewer for the suggestion to enhance methodological clarity. We have now created a graphical abstract that visually summarizes our entire AI framework, from data processing through trajectory analysis and model development to the final risk prediction output. We believe this addition makes our study design significantly more accessible to readers.
6. Comment 6: Excellent presentation of results and data; nice heatmaps and bar charts.
Response 6: We thank the reviewer for this encouraging comment.
7. Comment 7: Extend the part on the clinical implications of your model, e.g., a system with cooperation between psychologists, dieticians, etc.
Response 7: We sincerely thank the reviewer for this insightful and constructive suggestion. We fully agree that elucidating the practical, multidisciplinary clinical pathway is crucial for translating our AI model's predictive capability into tangible health outcomes. In direct response to this valuable comment, we have now substantially expanded the Discussion section to explicitly outline a proposed multidisciplinary intervention framework.
Specifically, we have added a new paragraph in the Discussion that details how the identification of high-risk individuals by our model can trigger a collaborative care process. This new text explicitly describes the distinct yet synergistic roles of psychologists (addressing the core depressive symptoms), nutritionists (providing targeted nutritional support for muscle health, e.g., protein and vitamin D), and exercise physiologists (prescribing tailored exercise regimens).
Reviewer 2 Report
Comments and Suggestions for Authors
In this paper, the authors evaluate seven machine learning algorithms for the development of predictive models for sarcopenia risk based on dynamic depression trajectories.
The study is well detailed and well discussed in the body of the paper, the theoretical framework is sufficient and adequate. Considering the presentation of the results and their contribution to a pattern recognition model for predicting sarcopenia, I recommend its publication after corrections.
1- The conclusion is too superficial, considering the results presented, and should be rewritten.
a) The conclusion should include which method(s) are most suitable for predicting sarcopenia;
b) Include a comparison of the results obtained with results from other similar studies;
c) Include proposals for future work.
d) The best accuracy was 0.8265; is this result good enough? It should be justified by comparison with results from similar studies.
Author Response
1. Comment 1: The conclusion should include which method(s) are most suitable for predicting sarcopenia.
Response 1: We agree with the reviewer. We have now explicitly stated in the first paragraph of the Conclusion that Random Forest and XGBoost are the most suitable and robust methods for this prediction task, highlighting their superior and stable performance.
2. Comments 2 and 4: Include a comparison of the results obtained with results from other similar studies. The best accuracy was 0.8265; is this result good enough? It should be justified by comparison with results from similar studies.
Response 2 and 4: Thank you for these critical points. We have now added a detailed comparison with three key previous studies.
The predictive accuracy of our best model (RF, 0.8265) is highly competitive with the current state of the art. For instance, Kim et al. [49] reported a top test accuracy of 0.848 using physical factors, while Seok et al. [50] achieved 78.8% accuracy with socioeconomic data. Although Ozgur et al. [51] reported higher accuracy (RF: 89.4%), their model was developed on a selective sample of female participants from a university hospital, which may limit its generalizability to community-dwelling populations and to both genders. In contrast, our model achieved robust performance on a large, nationally representative cohort of both men and women, using a novel, dynamic predictor: longitudinal depressive symptom trajectories. This approach not only delivers competitive accuracy but also captures the temporal evolution of a core, modifiable risk factor, offering a more holistic and clinically informative tool for population-level screening than static assessments.
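Because accuracy depends on class prevalence, figures such as 0.8265, 0.848, and 78.8% obtained in different cohorts are not directly comparable. A within-dataset reference point often reported alongside accuracy is the no-information rate (NIR), the accuracy of always predicting the most frequent class, together with a one-sided exact binomial test of whether accuracy exceeds it. The Python sketch below is illustrative only and is not taken from the manuscript; the test-set size, class counts, and number of correct predictions in the usage line are hypothetical.

```python
from math import exp, lgamma, log, log1p


def no_information_rate(class_counts):
    """Accuracy of a classifier that always predicts the majority class."""
    return max(class_counts) / sum(class_counts)


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed in log space so that
    large test sets do not overflow the binomial coefficients."""
    if p <= 0.0:
        return 1.0 if k <= 0 else 0.0
    if p >= 1.0:
        return 1.0 if k <= n else 0.0
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_pmf)
    return min(total, 1.0)


def accuracy_vs_nir(n_correct, class_counts):
    """Return (accuracy, NIR, one-sided p-value for accuracy > NIR)."""
    n_total = sum(class_counts)
    nir = no_information_rate(class_counts)
    return n_correct / n_total, nir, binomial_upper_tail(n_correct, n_total, nir)


# Hypothetical test set: 1,000 participants in three outcome classes.
acc, nir, p_value = accuracy_vs_nir(826, [600, 250, 150])
print(f"accuracy={acc:.3f}  NIR={nir:.3f}  p={p_value:.2e}")
```

A small p-value here only shows that a model beats the majority-class baseline on its own test set; it does not by itself establish superiority over models evaluated on other cohorts.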
3. Comment 3: Include proposals for future work.
Response 3: We have added a new paragraph at the end of the Conclusion section outlining specific proposals for future work.
Reviewer 3 Report
Comments and Suggestions for Authors
The paper is written in high-quality language, in a formal academic tone suitable for research papers. The writing style is concise; only some minor punctuation issues arise, and some overly long sentences may need to be shortened, especially those describing the methodology. The methodology, which combines AI techniques or machine learning models, trajectory modeling of depressive symptoms, and sarcopenia risk prediction, is well aligned with other papers in the field that use frameworks to capture the evolution of trends in public health-related calculations, although using dynamic depression trajectories to predict sarcopenia is a novel idea. Since standard machine learning techniques are well known, the contribution of the paper lies in a new approach that combines depression trajectory analysis with sarcopenia prediction.
In the introductory part of the paper, the authors give a broad overview of current sarcopenia and depression research outcomes, stressing the need for prediction to keep public health above the baseline condition. Multiple factors are taken into account in the predictive model to support the early identification of high-risk individuals. The authors recognize recent advances in machine learning applied to this issue and clearly identify a gap in the integration of mental health with sarcopenia risk.
Thus, the core contribution is clearly novel and promising. The authors not only develop the prediction system but also connect depressive symptoms with their predictor. The statistical and model validation parts rely on typical machine learning benchmarks: 10-fold cross-validation, sensitivity to class imbalance, and hypothesis testing against random chance. The improvement over conventional approaches is clearly visible, giving useful insight into the connection between temporal depression patterns and sarcopenia risk. Of course, the question is whether the authors can provide any explainability text connected to their method, with support from medical doctors.
As a result, the F1 score is above 81%, which shows that new patterns can be captured effectively. The paper is therefore innovative, with rigorous benchmarks and validations included. Is there any chance that the selection of individuals introduces bias in the research? Could you obtain the same results when the data are imbalanced, and what overfitting could appear in this case? In future work, is it possible to enable data sharing and model generalization, avoiding imbalance, via federated learning and privacy-preserving AI techniques? Do you plan to allow external evaluation of your results, such as in clinical testing?
Author Response
1. Comment 1: The paper is written in high-quality language... some overly long sentences may need to be shortened, especially those describing the methodology.
Response 1: We thank the reviewer for this suggestion. We have carefully reviewed the manuscript, particularly the Methods section (2.6 and 2.7), and have simplified several overly long sentences to improve readability and flow. For example, we broke down complex sentences describing the trajectory modeling and model evaluation procedures into shorter, more direct statements.
2. Comment 2: ...the question is whether the authors can provide any explainability text connected to their method, with support from medical doctors.
Response 2: We agree that enhancing the clinical explainability is crucial. We have significantly strengthened the Discussion section to better translate our AI findings into clinical insights:
-
In the Discussion section, we now explicitly state that our findings warrant viewing persistent depression as an independent risk factor for sarcopenia and call for proactive muscle health screening in such individuals.
-
In the Discussion section, we have expanded the description of the multidisciplinary care approach, specifying the roles of psychologists, dietitians, and physiotherapists and linking these interventions directly to the pathways identified by our model.
3. Comment 3: ...is there any chance that the selection of individuals introduces bias in the research? Could you obtain the same results when the data are imbalanced, and what overfitting could appear in this case?
Response 3: We appreciate the reviewer's focus on these critical methodological aspects.
Selection Bias: We have added a new paragraph in the Limitations section to explicitly discuss the potential for selection bias due to our complete-case requirement, its likely direction (underestimating risk in the frailest), and the need for external validation in more inclusive cohorts.
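Comparisons of included and excluded participants, such as the baseline comparison reported in the response to Reviewer 4, are commonly summarized as standardized mean differences (SMDs), with |SMD| > 0.1 often used as a conventional flag for meaningful imbalance. A minimal Python sketch, assuming plain lists of baseline values; the 0.1 threshold is a common rule of thumb rather than a value stated in the manuscript, and the ages in the usage lines are made up.

```python
from math import sqrt
from statistics import mean, variance


def smd_continuous(group_a, group_b):
    """Standardized mean difference using the average of the two sample variances."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    return 0.0 if pooled_sd == 0 else (mean(group_a) - mean(group_b)) / pooled_sd


def smd_binary(p_a, p_b):
    """Standardized difference for two proportions (e.g., share of women)."""
    pooled_sd = sqrt((p_a * (1 - p_a) + p_b * (1 - p_b)) / 2)
    return 0.0 if pooled_sd == 0 else (p_a - p_b) / pooled_sd


def is_imbalanced(smd, threshold=0.1):
    """Flag a covariate whose absolute SMD exceeds the rule-of-thumb threshold."""
    return abs(smd) > threshold


# Hypothetical baseline ages for included vs. excluded participants.
included_age = [58, 61, 63, 66, 70]
excluded_age = [62, 65, 68, 71, 75]
d = smd_continuous(included_age, excluded_age)
print(round(d, 2), is_imbalanced(d))
```

Unlike a p-value, an SMD does not shrink toward significance as the sample grows, which is why it is often preferred when thousands of participants are compared.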
Class Imbalance & Overfitting: We have strengthened the Methods section to articulate our strategy more clearly. We emphasize that SMOTETomek was applied only to the training data within the cross-validation folds and that the held-out test set remained untouched. Together with the robustness of tree-based ensembles, this design helps ensure that our reported performance is a reliable, generalizable estimate and that overfitting was mitigated.
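The fold-wise protocol described above can be sketched as follows: resampling is applied inside each training split only, and every validation fold (like the 80/20 hold-out test set) is scored on original, unresampled data. In practice the imbalanced-learn library provides `SMOTETomek` (in `imblearn.combine`) and an `imblearn.pipeline.Pipeline` that applies samplers only while fitting; to keep this sketch dependency-free, a naive random oversampler is swapped in as a hypothetical stand-in for SMOTETomek, folds are not stratified, and the data and threshold "model" are toys.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices once and deal them into k (unstratified) folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def random_oversample(X, y, seed=0):
    """Hypothetical stand-in for SMOTETomek: duplicate randomly chosen rows of
    each smaller class until every class matches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            j = rng.choice(members)
            X_out.append(X[j])
            y_out.append(y[j])
    return X_out, y_out


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Per-fold accuracy; the resampler only ever sees the training folds."""
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        X_tr, y_tr = random_oversample([X[j] for j in train_idx],
                                       [y[j] for j in train_idx], seed)
        model = fit(X_tr, y_tr)
        hits = sum(predict(model, X[j]) == y[j] for j in val_idx)
        scores.append(hits / len(val_idx))
    return scores


def fit_threshold(X_tr, y_tr):
    """Toy model: midpoint between the per-class means of the single feature."""
    means = {}
    for label in (0, 1):
        vals = [x[0] for x, lab in zip(X_tr, y_tr) if lab == label]
        means[label] = sum(vals) / len(vals) if vals else 0.0
    return (means[0] + means[1]) / 2


# Toy imbalanced data: 40 negatives, 10 positives at the high end of one feature.
X = [[float(i)] for i in range(50)]
y = [0] * 40 + [1] * 10
print(cross_validate(X, y, fit_threshold, lambda t, x: int(x[0] > t), k=5))
```

Resampling before splitting would let synthetic or duplicated copies of validation rows leak into training, which is the optimistic bias this ordering avoids.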
4. Comment 4: ...is it possible to enable data sharing and model generalization, avoiding imbalance, via federated learning and privacy-preserving AI techniques? Do you plan to allow external evaluation of your results, such as in clinical testing?
Response 4: We thank the reviewer for these excellent and forward-looking suggestions. We have fully incorporated them into the "Future Work" section of our Conclusions.
We now explicitly state that external validation in independent and clinical cohorts is our next priority.
We also outline plans to explore federated learning as a key strategy to build more generalizable models across institutions while preserving data privacy and inherently addressing site-specific data imbalances.
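As a rough illustration of the federated learning direction mentioned above: in federated averaging (FedAvg, McMahan et al., 2017), each site trains locally and shares only model parameters, which a coordinating server averages with weights proportional to local sample size. The Python sketch below shows only that aggregation step; the site parameter vectors and sample sizes are hypothetical, and no particular federated framework is implied.

```python
def federated_average(client_updates):
    """Sample-size-weighted mean of client parameter vectors.

    client_updates: list of (parameters, n_samples) tuples, where parameters is
    a list of floats with the same layout for every client. Raw participant
    records never enter this function; only fitted parameters and counts do.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total sample size must be positive")
    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for params, n in client_updates:
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter layout")
        for i, value in enumerate(params):
            averaged[i] += value * n / total
    return averaged


# Hypothetical round with three sites of different sizes.
sites = [([0.20, -1.10], 1200), ([0.35, -0.90], 3000), ([0.10, -1.40], 800)]
print(federated_average(sites))
```

This form of averaging assumes models that share a parametric form (for example, logistic regression or neural networks); tree ensembles such as Random Forest or XGBoost need different federated protocols, so the sketch is not a drop-in for the models in the study.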
Reviewer 4 Report
Comments and Suggestions for Authors
I congratulate the authors on their study, which, with the necessary additions proposed below, could make an important contribution to research.
1. Could the authors provide more details on multiple imputation by indicating which algorithm, the variables included in the imputation model, the number of iterations, diagnostics, and sensitivity analysis comparing the results of complete cases?
2. Could the authors add external validation or at least a robust internal-external validation strategy, in addition to calibration metrics, decision curve analysis, or net benefit metrics, and report the confidence interval for all primary metrics?
3. The study proposes a 3-class GBTM solution. How sensitive are the downstream ML results to the choice of 2, 4, or 5 classes, or to the use of continuous trajectory features instead of discrete classes? In this regard, I suggest integrating the study doi:10.3390/computers14090344 into the trajectory analysis part of the Methods section, because it describes how hybrid modelling choices affect downstream predictive tasks and robustness.
4. Could the authors acknowledge and quantify the potential bias introduced by ASM estimation from anthropometry compared to DXA?
5. Could the authors elaborate on why GBTM was chosen over alternative longitudinal approaches and present sensitivity analyses?
6. The authors should clarify that the importance of characteristics (from RF/XGBoost) indicates a predictive association, not causality.
7. Update references to recent studies to give the study more scientific robustness.
Author Response
1. Comment 1: Could the authors provide more details on multiple imputation by indicating which algorithm, the variables included in the imputation model, the number of iterations, diagnostics, and sensitivity analysis comparing the results of complete cases?
Response 1: We sincerely thank the reviewer for this valuable comment. In response, we have now provided a comprehensive description of the multiple imputation procedure in Section 2.7 of the revised manuscript. We conducted a sensitivity analysis comparing the baseline characteristics of the included participants (n=6,125) with those excluded due to missing data (n=5,481). The results are presented in Supplementary Table S1.
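The response does not state how estimates were combined across imputed datasets. When an analysis is repeated on each of m imputed datasets, the standard pooling approach is Rubin's rules; the Python sketch below assumes one scalar estimate (for example, a regression coefficient) and its squared standard error per imputation, with hypothetical input values.

```python
from math import inf, sqrt
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Combine m per-imputation estimates with Rubin's rules.

    Returns (pooled estimate, total standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need >= 2 imputations and one variance per estimate")
    q_bar = mean(estimates)              # pooled point estimate
    u_bar = mean(within_variances)       # average within-imputation variance
    b = variance(estimates)              # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b          # total variance
    # Rubin's (1987) degrees of freedom; infinite when all imputations agree.
    df = inf if b == 0 else (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, sqrt(t), df


# Hypothetical coefficient and squared SE from five imputed datasets.
print(pool_rubin([0.42, 0.47, 0.39, 0.45, 0.44],
                 [0.010, 0.011, 0.009, 0.010, 0.012]))
```

The (1 + 1/m) factor inflates the between-imputation component to account for using a finite number of imputations, so the pooled standard error is never smaller than the average within-imputation one.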
2. Comment 2: Could the authors add external validation or at least a robust internal-external validation strategy, in addition to calibration metrics, decision curve analysis, or net benefit metrics, and report the confidence interval for all primary metrics?
Response 2:
We sincerely thank the reviewer for this insightful and constructive suggestion, which has significantly strengthened the validity and clinical relevance of our study. In direct response to this comment, we have implemented the following major revisions:
-
Implementation of Decision Curve Analysis (DCA) and Net Benefit Metrics: As suggested, we have now conducted a comprehensive Decision Curve Analysis (DCA) to evaluate the clinical utility of our predictive models beyond conventional performance metrics. A new paragraph detailing the DCA results has been added to the Results section (Section 3.3), and the corresponding figures are included in the Supplementary Materials (Supplementary Figures S3-S13). The analysis demonstrates that our top-performing models (Random Forest and XGBoost) provide a superior net benefit compared to the "Treat All" or "Treat None" strategies across a wide range of clinically reasonable threshold probabilities. This confirms the practical value of our models for clinical decision-making. We have also integrated the interpretation of these findings into the Discussion section to highlight their clinical implications.
-
Reporting of Confidence Intervals for All Primary Metrics: In strict adherence to the reviewer's request, we have now reported the 95% confidence intervals for all primary performance metrics (Accuracy and Weighted F1-score) on the independent test set. These are now clearly presented in the revised Table 4, providing a precise estimate of the uncertainty associated with our model performance (e.g., Random Forest Accuracy: 0.8745 ± 0.0105).
-
Addressing External Validation and Future Work: We fully acknowledge the importance of external validation for establishing model generalizability. In the revised Conclusions section (Section 5), we have explicitly stated that "external validation in independent and diverse populations... is essential to confirm the generalizability and clinical utility of our model" and have framed this as a paramount direction for our future research. We also wish to clarify that while a temporal split for internal-external validation was considered, the core predictor in our study—the longitudinal depressive symptom trajectories—requires data from all four waves (2011-2018) for its construction, making a strict hold-out of the final wave methodologically challenging for this specific analysis. Nevertheless, we are confident that our rigorous internal validation protocol—including an 80/20 hold-out test set, 10-fold cross-validation, and the now-reported confidence intervals—provides a robust and reliable assessment of model performance.
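For readers less familiar with DCA, the net benefit behind the decision curves described in this response is conventionally defined, at threshold probability p_t, as NB = TP/n - (FP/n) x p_t/(1 - p_t), and is compared with "Treat All" (everyone flagged) and "Treat None" (NB = 0). The Python sketch below assumes a binary outcome with predicted probabilities; how the manuscript handled any multi-class outcome is not specified here, and the labels and probabilities in the usage lines are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'Treat All' strategy at the same threshold."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(y_true, y_prob, thresholds):
    """Rows of (threshold, model NB, Treat All NB, Treat None NB)."""
    return [(t, net_benefit(y_true, y_prob, t), net_benefit_treat_all(y_true, t), 0.0)
            for t in thresholds]


# Hypothetical test-set labels (1 = sarcopenia) and predicted probabilities.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.1, 0.7, 0.3, 0.2, 0.05]
for row in decision_curve(y_true, y_prob, [0.1, 0.2, 0.3, 0.4]):
    print("pt={:.1f}  model={:+.3f}  all={:+.3f}  none={:+.3f}".format(*row))
```

The factor p_t/(1 - p_t) converts false positives into the same units as true positives: choosing a 20% threshold implies accepting up to four unnecessary referrals per correctly identified case.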
We believe that these comprehensive revisions have thoroughly addressed the reviewer's valuable suggestions and have substantially enhanced the methodological rigor and translational potential of our work.
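The response does not specify how the 95% confidence intervals were obtained (the ± notation could denote a CI half-width or a standard deviation). One common choice is a nonparametric percentile bootstrap over test-set participants, sketched below in Python for accuracy and a support-weighted F1-score (per-class F1 averaged with weights equal to each class's share of the true labels). The resample count, seed, and toy labels are illustrative assumptions.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    score = 0.0
    for label, n_label in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n_label - tp
        # n_label >= 1, so the denominator is always positive.
        score += (2 * tp / (2 * tp + fp + fn)) * n_label / len(y_true)
    return score


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile-bootstrap (1 - alpha) interval over cases."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)]
    return metric(y_true, y_pred), lo, hi


# Hypothetical three-class test-set labels and predictions.
rng = random.Random(42)
y_true = [rng.choice([0, 0, 0, 1, 2]) for _ in range(300)]
y_pred = [t if rng.random() < 0.85 else rng.choice([0, 1, 2]) for t in y_true]
for name, metric_fn in [("accuracy", accuracy), ("weighted F1", weighted_f1)]:
    est, lo, hi = bootstrap_ci(y_true, y_pred, metric_fn, n_boot=500)
    print(f"{name}: {est:.4f} (95% CI {lo:.4f} to {hi:.4f})")
```

Resampling whole participants keeps each true label paired with its prediction, so the interval reflects sampling variability of the test set rather than of model training.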
3. Comment 3: The study proposes a 3-class GBTM solution. How sensitive are the downstream ML results to the choice of 2, 4, or 5 classes, or to the use of continuous trajectory features instead of discrete classes? In this regard, I suggest integrating the study doi:10.3390/computers14090344 into the trajectory analysis part of the Methods section, because it describes how hybrid modelling choices affect downstream predictive tasks and robustness.
Response 3: We thank the reviewer for this insightful comment regarding the sensitivity of our machine learning results to the GBTM class specification. To address this, we conducted a sensitivity analysis comparing 2-, 4-, and 5-class models. The results confirmed that predictive performance was robust across different numbers of classes. However, the 3-class solution was retained because it provides the optimal balance between statistical fit and, most importantly, clinical interpretability and utility, yielding distinct and actionable trajectory groups ('Persistently Low', 'Moderate', and 'High'). This principle of aligning model complexity with clinical actionability is well supported in the hybrid modelling literature (doi:10.3390/computers14090344). For transparency, the trajectories and model performance for the 5-class solution are provided in the Supplementary Materials (Figures S1-S2).
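The statistical-fit side of the class-number comparison can be sketched as an enumeration over candidate solutions using the sample-size-adjusted BIC (SABIC, lower is better), the index also cited in the response on LCGM versus GMM. The 5% minimum class-size rule is a common heuristic rather than a criterion stated in the manuscript, and the log-likelihoods and parameter counts below are invented for illustration; clinical interpretability, which the response weighs most heavily, is not captured by this calculation.

```python
from math import log


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return -2 * log_likelihood + n_params * log(n_obs)


def sabic(log_likelihood, n_params, n_obs):
    """Sample-size-adjusted BIC: n is replaced by (n + 2) / 24 in the penalty."""
    return -2 * log_likelihood + n_params * log((n_obs + 2) / 24)


def select_n_classes(fits, n_obs, min_class_share=0.05):
    """Lowest-SABIC class count among solutions whose smallest class holds at
    least `min_class_share` of the sample.

    fits: {k: (log_likelihood, n_params, smallest_class_share)}
    """
    scores = {k: sabic(ll, p, n_obs)
              for k, (ll, p, share) in fits.items() if share >= min_class_share}
    if not scores:
        raise ValueError("no solution meets the minimum class-size rule")
    return min(scores, key=scores.get), scores


# Hypothetical fits for 2- to 5-class solutions on 6,125 participants.
fits = {2: (-41250.0, 7, 0.31), 3: (-40980.0, 10, 0.12),
        4: (-40960.0, 13, 0.04), 5: (-40955.0, 16, 0.02)}
best_k, scores = select_n_classes(fits, n_obs=6125)
print(best_k, {k: round(v, 1) for k, v in scores.items()})
```

Filtering on minimum class size before comparing SABIC values keeps very small classes, which are rarely actionable clinically, from winning on fit alone.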
4. Comment 4: Could the authors acknowledge and quantify the potential bias introduced by ASM estimation from anthropometry compared to DXA?
Response 4: We thank the reviewer for this important comment. We have now explicitly acknowledged this limitation in the Discussion section (Limitations). We state that the use of anthropometric equations, instead of DXA, may introduce non-differential misclassification bias. We further clarify that such bias would likely lead to an underestimation of both the association between depression trajectories and sarcopenia and the performance of our predictive models. Therefore, this limitation does not invalidate our findings but suggests that the true relationships may be even stronger than reported.
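The direction of bias described above can be checked with a small calculation. If sarcopenia status is misclassified with the same sensitivity and specificity in every depression-trajectory group (non-differential misclassification of a binary outcome), the observed odds ratio is typically pulled toward 1. The risks, sensitivity, and specificity below are hypothetical and are not estimates of how anthropometric ASM equations perform against DXA; attenuation is the typical result rather than a guarantee, and it relies on misclassification being independent of trajectory group.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected share classified as sarcopenic given imperfect classification."""
    return true_risk * sensitivity + (1 - true_risk) * (1 - specificity)


def odds_ratio(risk_exposed, risk_unexposed):
    return (risk_exposed / (1 - risk_exposed)) / (risk_unexposed / (1 - risk_unexposed))


# Hypothetical true risks in a high vs. a low depressive-symptom trajectory group.
true_high, true_low = 0.30, 0.15
se, sp = 0.80, 0.90  # identical in both groups, i.e. non-differential

or_true = odds_ratio(true_high, true_low)
or_obs = odds_ratio(observed_risk(true_high, se, sp), observed_risk(true_low, se, sp))
print(f"true OR = {or_true:.2f}, observed OR = {or_obs:.2f}")
# prints: true OR = 2.43, observed OR = 1.74
```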
5. Comment 5: Could the authors elaborate on why GBTM was chosen over alternative longitudinal approaches and present sensitivity analyses?
Response 5: We thank the reviewer for raising this important methodological point. In our study, the term "GBTM" refers to its specific implementation via the Latent Class Growth Model (LCGM), which is a standard practice in the field. We have now clarified this in the revised Methods section (2.6. Trajectory Analysis).
We further elaborate that we chose the parsimonious LCGM for its clinical interpretability. Crucially, we present a sensitivity analysis (now in Supplementary Table S1) where we directly compared the LCGM with a more flexible alternative, the Growth Mixture Model (GMM). The results showed that the GMM did not provide a better fit for our data (identical SABIC value for the 3-class model), thus supporting our use of the simpler LCGM. We believe this directly addresses the reviewer's concern by demonstrating a reasoned methodological choice and providing the requested sensitivity analysis.
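The LCGM/GMM distinction drawn above can be written out for a linear trajectory, where y_it is the depressive symptom score of participant i at wave t, k indexes the latent class, and π_k is the class share. Linear growth and a common residual variance are simplifying assumptions for illustration; the manuscript's polynomial order and variance constraints are not stated in this response. LCGM is the special case of GMM with Ψ_k = 0, meaning no within-class variation around the class trajectory, which is why it has fewer parameters and is the more parsimonious option.

```latex
\begin{aligned}
\text{LCGM:}\quad & y_{it} = \beta_{0k} + \beta_{1k}\,t_{it} + \varepsilon_{it},
  && \varepsilon_{it} \sim \mathcal{N}(0, \sigma^{2}) \\
\text{GMM:}\quad & y_{it} = (\beta_{0k} + u_{0i}) + (\beta_{1k} + u_{1i})\,t_{it} + \varepsilon_{it},
  && (u_{0i}, u_{1i})^{\top} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Psi}_{k}) \\
\text{both:}\quad & f(\mathbf{y}_{i}) = \sum_{k=1}^{K} \pi_{k}\, f_{k}(\mathbf{y}_{i}),
  && \sum_{k=1}^{K} \pi_{k} = 1
\end{aligned}
```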
6. Comment 6: The authors should clarify that the importance of characteristics (from RF/XGBoost) indicates a predictive association, not causality.
Response 6: We thank the reviewer for this crucial comment. We fully agree that it is essential to distinguish between predictive association and causality when interpreting machine learning models. As requested, we have now clarified this point in the manuscript to avoid any potential misinterpretation.
We have added a statement in the Results section (3.4. Feature Importance) explicitly noting that the feature importance analysis reveals predictive associations and not necessarily causal relationships.
We have revised the language in the Discussion section to reflect this distinction, replacing potentially causal phrasing (e.g., "determinant") with more accurate terms like "predictive marker" and "association". We also added a sentence to caution against causal interpretation due to potential confounding.
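A model-agnostic illustration of why importance scores are predictive rather than causal is permutation importance: it measures how much a fitted model's score drops when one feature is shuffled, that is, how much the model relies on that feature, not whether the feature causes the outcome. The RF/XGBoost importances in the manuscript may be computed differently (for example, impurity- or gain-based), which this Python sketch does not reproduce; the toy model and data are hypothetical.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric=accuracy, n_repeats=10, seed=0):
    """Mean drop in `metric` when each feature column is shuffled in turn.

    predict: callable mapping one feature row (a list) to a predicted label.
    A large drop means the fitted model relies on that feature for prediction;
    it says nothing about whether the feature causes the outcome.
    """
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - metric(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the "model" thresholds feature 0 and ignores feature 1.
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [int(row[0] > 0.5) for row in X]
print(permutation_importance(lambda row: int(row[0] > 0.5), X, y))
```

If feature 1 were instead a close proxy of feature 0 and the model had been fitted on it, the proxy would receive the importance even though it plays no causal role, which is the confounding concern the revised Discussion cautions against.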
7. Comment 7: Update references to recent studies to give the study more scientific robustness.
Response 7: We sincerely thank the reviewer for this valuable suggestion. We agree that incorporating the most recent literature enhances the context and robustness of our study. In accordance with this comment, we have now updated the references throughout the manuscript.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors took my suggestions into account, and the manuscript is improved. Well done.
Reviewer 4 Report
Comments and Suggestions for Authors
I thank the authors for their replies to the comments. I have no further comments to make.
