Review Reports - A Machine Learning Approach to Identifying Risk Factors for Long COVID-19

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

An interesting and worthwhile contribution, well articulated. The random forest method and ROC analyses are appropriate choices.

More discussion is needed on why the small sample size is acceptable. Given the number and breadth of factors included, a far larger sample would be expected. Alternatively, further sensitivity analysis could be included, such as leave-out methods.
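As an illustration of what a leave-out sensitivity check can look like, the following is a minimal sketch of leave-one-out cross-validation. It makes several assumptions: the data are synthetic and unrelated to the study, and a simple nearest-centroid classifier stands in for the random forest so that the example needs only the Python standard library. The function names `nearest_centroid_predict` and `leave_one_out_accuracy` are hypothetical.

```python
import random


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (stand-in model)."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, fit on the rest, and average the hits."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)


# Synthetic, well-separated two-class data (an assumption, not study data).
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)] + \
    [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(20)]
y = [0] * 20 + [1] * 20

loo_accuracy = leave_one_out_accuracy(X, y)
print(f"Leave-one-out accuracy: {loo_accuracy:.2f}")
```

With a small sample, every observation serves as a test case exactly once, so the estimate uses all of the data while never scoring a sample the model was fitted on.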

It is strongly recommended that the above aspect be addressed. Nevertheless, this contribution has value as a demonstration of method and as a pilot for results.

Perhaps some speculation could also be included on whether a similar approach might have value for other risk analyses (e.g., severity, recovery rate, comorbidities, natural immunity, vaccine effectiveness).

Author Response

Please see attached file.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes a machine learning algorithm aimed at identifying key risk factors associated with Long COVID-19, and it conducts a broad and detailed analysis of potential health-related and sociodemographic factors. However, the paper still faces the following issues:

1. The study excludes comorbidity variables with an incidence rate below 50. Could this decision lead to the omission of some important risk factors, thereby impacting the comprehensiveness of the model?

2. How can it be ensured that the random forest is the optimal machine learning method? Has it been compared with other methods?

3. For variables not identified as important features, you mentioned that these variables might influence model performance through interaction effects. Is it necessary to conduct a multilevel modeling analysis to capture these complex relationships?

4. You stated that one of the advantages of the random forest model is its ability to handle multiple predictor variables; however, this model is often regarded as having poor interpretability. What strategies do you plan to implement to enhance the interpretability of this model?

5. You mentioned that the definition of Long COVID is continuously evolving. Could you elaborate on how these changes affect data interpretation and result analysis?
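Regarding point 4, one commonly used model-agnostic interpretability technique is permutation importance: shuffle one predictor at a time and measure how much the model's score drops. The sketch below is only an illustration under stated assumptions. The data are synthetic, a nearest-centroid classifier stands in for the random forest, the helper names are hypothetical, and the score is computed on the training data for brevity (a held-out set would normally be preferable).

```python
import random


def fit_centroids(X, y):
    """Fit a stand-in model: one centroid per class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [r for r, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the label of the centroid nearest to x."""
    def sq_dist(c):
        return sum((ci - xi) ** 2 for ci, xi in zip(c, x))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == lab for x, lab in zip(X, y)) / len(X)


def permutation_importance(centroids, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data (an assumption): feature 0 is informative, feature 1 is noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)] + \
    [[rng.gauss(5, 1), rng.gauss(0, 1)] for _ in range(20)]
y = [0] * 20 + [1] * 20

model = fit_centroids(X, y)
importances = permutation_importance(model, X, y)
print("Permutation importances:", importances)
```

Because the procedure only needs predictions and a score, the same loop applies unchanged to any fitted classifier, including a random forest.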

Author Response

Please see attached file.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

I have no further suggestions.