Review Reports - Predicting Disease Activity Score in Rheumatoid Arthritis Patients Treated with Biologic Disease-Modifying Antirheumatic Drugs Using Machine Learning Models

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study presents a machine learning approach for predicting 12-month disease activity in RA patients initiating bDMARD therapy, with notable strengths including rigorous nested cross-validation and meaningful external validation using a real-world registry. In general, some revisions are needed to strengthen the manuscript:

The introduction should more explicitly position this work within existing literature on RA treatment response prediction. Highlight what distinguishes this approach from prior models (e.g., continuous outcome prediction vs. binary classification, specific feature engineering, or generalizability focus). A concise comparison paragraph contrasting key methodological/performance differences with 2-3 recent similar studies would strengthen the novelty claim.

While the BioReg validation is commendable, deeper analysis of performance differences between cohorts is needed. Discuss potential factors driving the accuracy decline (e.g., heterogeneity in patient characteristics, treatment protocols, or data collection practices across sites). Feature importance comparison between cohorts could offer valuable insights into model stability.
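Permutation importance is one common way to compare feature importance between cohorts. The manuscript's own importance method may differ. The sketch below makes several assumptions: the fitted model is exposed as a `predict(row)` callable, and each cohort is a list of feature rows. The function names, feature names and toy values are hypothetical.

```python
import random


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MAE when each feature column is shuffled.

    predict: hypothetical callable mapping one feature row (list) to a prediction.
    X: list of feature rows (lists); y: observed outcomes.
    """
    rng = random.Random(seed)
    baseline = mae(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mae(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def ranking(importances, names):
    """Feature names ordered from most to least important."""
    return [name for _, name in sorted(zip(importances, names), reverse=True)]


if __name__ == "__main__":
    names = ["baseline_das28", "age"]  # hypothetical feature names
    model = lambda row: 0.8 * row[0] + 0.5  # toy model that ignores age
    cohort_a = [[5.1, 60], [4.2, 55], [3.3, 70], [6.0, 48], [2.9, 66]]
    cohort_b = [[4.8, 52], [3.9, 61], [5.5, 59], [3.1, 72], [4.4, 45]]
    for label, X in (("cohort A", cohort_a), ("cohort B", cohort_b)):
        y = [0.8 * r[0] + 0.5 for r in X]  # toy outcomes, not study data
        imp = permutation_importance(model, X, y)
        print(label, ranking(imp, names), [round(v, 3) for v in imp])
```

Running the same function on the development and external cohorts gives two importance rankings. Any features whose rank changes substantially would be natural candidates for the stability discussion.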

The current results section is fragmented. Integrate internal and external validation findings cohesively. Provide a consolidated table comparing all models' key metrics (MAE, R², accuracy, F1) for both datasets. Ensure all referenced figures (Fig 1-4) are included and properly contextualized. The abrupt ending in Section 6 requires completion.
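The consolidated table requested above could be generated from a single function, so that both cohorts are scored identically. The sketch below assumes remission is defined as DAS28-CRP < 2.6 for the accuracy and F1 columns. The helper names and the (model, cohort) keys are illustrative. Any numbers it prints come from whatever predictions are passed in, not from the study.

```python
def regression_metrics(y_true, y_pred):
    """MAE and R^2 for continuous DAS28-CRP predictions."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mae, r2


def binary_metrics(y_true, y_pred, threshold=2.6):
    """Accuracy and F1 for the remission class (score < threshold)."""
    pairs = [(t < threshold, p < threshold) for t, p in zip(y_true, y_pred)]
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if p and not t)
    fn = sum(1 for t, p in pairs if t and not p)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return accuracy, f1


def metrics_table(results):
    """results maps (model, cohort) -> (y_true, y_pred); returns a text table."""
    lines = [f"{'Model':<10}{'Cohort':<10}{'MAE':>7}{'R2':>7}{'Acc':>7}{'F1':>7}"]
    for (model, cohort), (y_true, y_pred) in results.items():
        mae, r2 = regression_metrics(y_true, y_pred)
        acc, f1 = binary_metrics(y_true, y_pred)
        lines.append(f"{model:<10}{cohort:<10}{mae:>7.2f}{r2:>7.2f}{acc:>7.2f}{f1:>7.2f}")
    return "\n".join(lines)
```

Passing entries such as `("Ridge", "Erlangen")` and `("Ridge", "BioReg")` produces one row per model and cohort. This places internal and external results side by side.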

Expand the discussion on the clinical utility of continuous DAS28-CRP prediction versus binary remission. Provide specific examples of how individualized thresholds could inform treatment adjustments. Address potential limitations of using baseline-only features versus incorporating early treatment response indicators.
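One way to illustrate the clinical utility of a continuous prediction: a single predicted 12-month DAS28-CRP can be mapped both to a disease activity category and, together with the baseline score, to a EULAR response category. The sketch below assumes the conventional DAS28 cut-offs (remission < 2.6, low ≤ 3.2, moderate ≤ 5.1). These cut-offs were defined for DAS28-ESR and are commonly applied to DAS28-CRP. The `cutoffs` argument is a hypothetical hook for individualized thresholds.

```python
def activity_category(das28, cutoffs=(2.6, 3.2, 5.1)):
    """Disease activity category; cutoffs = (remission, low, moderate) upper bounds."""
    remission, low, moderate = cutoffs
    if das28 < remission:
        return "remission"
    if das28 <= low:
        return "low"
    if das28 <= moderate:
        return "moderate"
    return "high"


def eular_response(baseline, predicted):
    """EULAR response from the improvement and the (predicted) follow-up score."""
    improvement = baseline - predicted
    if improvement > 1.2:
        return "good" if predicted <= 3.2 else "moderate"
    if improvement > 0.6:
        return "moderate" if predicted <= 5.1 else "none"
    return "none"
```

A binary remission classifier yields only the first of these outputs at one fixed threshold. The continuous score supports both views and any patient-specific target passed in through `cutoffs`.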

Resolve minor inconsistencies (e.g., "Mean Absolute Error (MAE)" duplicated in metrics listing). Justify the choice of MinMax scaling over alternatives. Clarify why Ridge outperformed other regularized methods (e.g., Lasso). The patents section should be removed unless substantive content is added.
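The choice of scaler matters mainly for the external cohort. Whichever scaler is used should be fitted on the development data only and then applied unchanged to the validation data. The sketch below contrasts MinMax scaling with z-score standardization. The function names are hypothetical, and rows are plain lists of numbers.

```python
import statistics


def fit_minmax(X_train):
    """Per-feature (min, max) learned from the development cohort only."""
    return [(min(col), max(col)) for col in zip(*X_train)]


def apply_minmax(X, params):
    """Map features to [0, 1] using development-cohort bounds.
    External values outside those bounds land outside [0, 1]."""
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, (lo, hi) in zip(row, params)] for row in X]


def fit_standard(X_train):
    """Per-feature (mean, population SD) from the development cohort."""
    return [(statistics.mean(col), statistics.pstdev(col)) for col in zip(*X_train)]


def apply_standard(X, params):
    """Z-scores relative to the development-cohort mean and SD."""
    return [[(x - mu) / sd if sd > 0 else 0.0
             for x, (mu, sd) in zip(row, params)] for row in X]
```

MinMax bounds depend on the two most extreme development-cohort values, so a more heterogeneous external cohort can produce scaled values well above 1. Ridge and Lasso penalties are scale-dependent, so the scaling choice also interacts with how strongly each coefficient is regularized.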

Acknowledge sample size constraints, potential unmeasured confounders, and the need for prospective validation. Discuss implications of bDMARD class heterogeneity (e.g., TNFi vs. JAK inhibitors) not addressed in the model.
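A percentile bootstrap confidence interval around the MAE makes the sample size constraint concrete. It shows how much the estimate could move in cohorts of n = 154 or n = 88. The sketch below assumes paired lists of observed and predicted DAS28-CRP values. The function name and defaults are illustrative.

```python
import random


def bootstrap_mae_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Point MAE and percentile bootstrap (1 - alpha) interval.

    Resampling per-patient absolute errors is equivalent to resampling
    patients, because the MAE depends only on those errors.
    """
    rng = random.Random(seed)
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(errors)
    stats = []
    for _ in range(n_boot):
        sample = [errors[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(sample) / n)
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return sum(errors) / n, lower, upper
```

The same function can be run separately within each bDMARD class. With only a few dozen patients per class, the intervals will typically be wide. This is one reason class-specific conclusions would need larger cohorts.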

Comments on the Quality of English Language

The English could be improved to more clearly express the research.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

I present my review as numbered points below. Each should be addressed in the response to the reviewer and, where possible, used to improve the text of the article.
1. No publication selection criteria or search strategy is provided. The document does not meet the standards of a systematic review.
2. The introduction lacks information introducing the reader to AI in medicine. Please read and cite the following article: DOI: 10.3390/diagnostics13152582.
3. Only the advantages of the methods (SWE, ATI, SMI) are presented. An analysis of their limitations, such as operator dependence or technical difficulties, is missing. Please also review similar ultrasonography (USG) methods.
4. Cutoffs, AUC values and other data are scattered. Comparative tables are needed to facilitate interpretation.
5. Most studies date from before 2020. Newer studies and current guidelines are lacking.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This study investigated the use of machine learning models to predict the disease activity score in rheumatoid arthritis patients. I think the paper is interesting, but it can be improved in several respects, mainly by clarifying important points.

Comments (answers to the questions can/should be used to improve the manuscript):
1. 'bDMARDs' is not a well known abbreviation. So, I think it should not be used in the title.
2. The outcome (target variable) is not clear in the abstract. Consider "12-month disease activity, measured by DAS28-CRP".
3. Use abbreviations properly. For example, in 'we developed a machine learning framework', ML has already been defined, so use it. Check the whole text; there are many misuses (or missing uses) of abbreviations.
4. Avoid short paragraphs and paragraphs with only 1 or 2 short sentences.
5. Check 'booktabs array' - line 98.
6. BioReg (n = 88) has a smaller sample size than Erlangen (n = 154). Why do the authors refer in different parts of the paper to 'broader clinical settings and patient variability'? Broader in which sense? Greater patient variability is not reflected in the standard deviation values in Table 1.
7. Both Erlangen (n = 154) and BioReg (n = 88) have a small sample size. Do you agree?
a) When were both datasets collected?
b) Was the purpose of external validation geographical or temporal?
8. The information 'This dataset includes a more diverse and heterogeneous population of rheumatoid arthritis patients, collected from multiple clinical sites across Austria.' is repetitive.
9. For figures 2, 3 and 4:
a) Provide labels/values in bars
b) Legend should not be over the bars
c) Title is not required, since figures have captions.
10. To better understand MSE and MAE, provide the range of DAS28-CRP.
11. What is the distribution of classes in the BioReg dataset? If it is unbalanced, could there be any relationship with the difference in performance when analyzing precision and recall metrics?
12. How was the hyperparameter fine-tuning process carried out? Grid search? Declare this in the methods section. Also, provide the best hyperparameters for each model in the results.
13. Model utility: What is the utility of the model? On a timeline, when should it be used? 'Baseline data' should be better explained in the manuscript; this sentence is not clear: 'Baseline data were collected at the time of bDMARD initiation, and follow-up assessments were conducted throughout the treatment period.' Also, what kinds of interventions can be carried out based on the model output?
14. Fairness analysis: Demographic variables (age and gender) are sensitive attributes and were used as predictors. How biased are the models with respect to these variables? Do the models predict equally well for men, women, younger and older patients?
15. 'Furthermore, while the model performed well in retrospective evaluation, prospective studies will be needed to assess its clinical utility in real-world decision-making.' is not a limitation; it is a future work.
16. A conclusion section is welcome in the manuscript.
17. Finally, but very importantly, related work discussion (paragraph in lines 44-48) is poor. Unpack it to explicitly argue the study contributions.
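For comment 10, the DAS28-CRP scale can be made explicit through its formula. The sketch below uses the commonly published four-variable DAS28-CRP formula: TJC28, SJC28, CRP in mg/L, and patient global assessment on a 0-100 mm VAS. It should be checked against the version used in the manuscript. The score has a floor of 0.96 when all inputs are zero. It has no fixed ceiling, because the CRP term is unbounded. With all 28 joints tender and swollen and a VAS of 100, it is about 6.8 before any CRP contribution. The helper `range_normalized_mae` is hypothetical.

```python
import math


def das28_crp(tjc28, sjc28, crp_mg_l, ptga_mm):
    """DAS28-CRP from 28-joint tender/swollen counts, CRP (mg/L)
    and patient global assessment on a 0-100 mm VAS."""
    return (0.56 * math.sqrt(tjc28) + 0.28 * math.sqrt(sjc28)
            + 0.36 * math.log(crp_mg_l + 1) + 0.014 * ptga_mm + 0.96)


def range_normalized_mae(mae, observed_scores):
    """Express an MAE as a fraction of the observed DAS28-CRP range."""
    return mae / (max(observed_scores) - min(observed_scores))
```

Reporting the observed range alongside the MAE helps readers judge an error's size. An MAE of 0.5, for example, is comparable to the full width of the low disease activity band (2.6 to 3.2).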
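Comment 11 concerns class imbalance, and prevalence alone can shift precision. The sketch below illustrates this by holding sensitivity and specificity fixed and varying prevalence. It assumes a binary label per patient, for example remission versus active disease. The function names are illustrative.

```python
from collections import Counter


def class_distribution(labels):
    """Share of each class label in a cohort."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def expected_precision(sensitivity, specificity, prevalence):
    """Precision (PPV) implied by fixed sensitivity/specificity at a given prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)
```

Take sensitivity and specificity both at 0.8. Precision is 0.8 at 50% prevalence but only about 0.5 at 20% prevalence, while recall (sensitivity) stays at 0.8. A gap between precision and recall in BioReg could therefore partly reflect a different class distribution rather than a weaker model.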
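For comment 12, the sketch below shows a grid search nested inside cross-validation. It uses a toy one-predictor ridge regression with a closed-form fit as a stand-in for the manuscript's models. The fold counts, the alpha grid and all function names are assumptions, not the authors' settings.

```python
import random


def fit_ridge_1d(x, y, alpha):
    """Closed-form ridge for one predictor with an unpenalized intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    w = sxy / (sxx + alpha)  # alpha shrinks the slope towards zero
    return w, my - w * mx


def folds(n, k, seed):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def split(x, y, test_idx):
    """Training part of (x, y) after removing the test indices."""
    test = set(test_idx)
    train = [i for i in range(len(x)) if i not in test]
    return [x[i] for i in train], [y[i] for i in train]


def holdout_mae(w, b, x, y, test_idx):
    return sum(abs(y[i] - (w * x[i] + b)) for i in test_idx) / len(test_idx)


def cv_mae(x, y, alpha, k, seed):
    """Inner-loop score: mean MAE of one alpha over k folds."""
    scores = []
    for test_idx in folds(len(x), k, seed):
        w, b = fit_ridge_1d(*split(x, y, test_idx), alpha)
        scores.append(holdout_mae(w, b, x, y, test_idx))
    return sum(scores) / len(scores)


def nested_cv(x, y, alphas, outer_k=5, inner_k=3, seed=0):
    """Outer loop estimates performance; inner loop picks alpha by grid search."""
    outer_scores, chosen = [], []
    for test_idx in folds(len(x), outer_k, seed):
        x_train, y_train = split(x, y, test_idx)
        # The grid search sees only the outer-training data, so the
        # outer test fold is never used for tuning.
        best = min(alphas, key=lambda a: cv_mae(x_train, y_train, a, inner_k, seed))
        w, b = fit_ridge_1d(x_train, y_train, best)
        outer_scores.append(holdout_mae(w, b, x, y, test_idx))
        chosen.append(best)
    return sum(outer_scores) / len(outer_scores), chosen
```

The list of selected alphas per outer fold is exactly what comment 12 asks to be reported. If the selected values differ across folds, that instability is itself worth stating.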
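The subgroup question in comment 14 can be answered by computing the error separately per sex and age group. The sketch below assumes per-patient subgroup labels aligned with the predictions. The age cut-off of 60 and the function names are illustrative assumptions.

```python
from collections import defaultdict


def subgroup_mae(y_true, y_pred, groups):
    """MAE computed separately for each subgroup label."""
    errors = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        errors[g].append(abs(t - p))
    return {g: sum(e) / len(e) for g, e in errors.items()}


def age_group(age, cut=60):
    """Hypothetical age split; the cut-off is an illustrative assumption."""
    return f"<{cut}" if age < cut else f">={cut}"


def max_gap(per_group):
    """Largest difference in MAE between any two subgroups."""
    return max(per_group.values()) - min(per_group.values())
```

Labels can be combined, for example `f"{sex}/{age_group(age)}"`, to check intersections. With cohorts of this size, some intersections will be small, so subgroup MAEs should be read together with their uncertainty.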

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The values in Table 1 are presented as mean (standard deviation); they can be presented this way only after checking the normality of the distribution, e.g., using the Shapiro-Wilk test. If the distribution is not normal, the minimum, maximum and quartile values (including the median) should be reported instead.
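The reporting rule above can be applied variable by variable. The sketch below uses `scipy.stats.shapiro` for the normality test, assuming SciPy is installed. It also accepts any callable returning (statistic, p-value), which makes it easy to test. The function name and the 0.05 level are illustrative.

```python
import statistics


def describe(values, alpha=0.05, normality_test=None):
    """Table 1 style summary: mean (SD) if normality is not rejected,
    otherwise median [Q1, Q3] with min and max."""
    if normality_test is None:
        from scipy.stats import shapiro  # assumes SciPy is installed
        normality_test = shapiro
    _, p_value = normality_test(values)
    if p_value >= alpha:
        return f"{statistics.mean(values):.2f} ({statistics.stdev(values):.2f})"
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (f"{statistics.median(values):.2f} [{q1:.2f}, {q3:.2f}] "
            f"(min {min(values):.2f}, max {max(values):.2f})")
```

A non-significant Shapiro-Wilk result does not prove normality, and the test has little power in small samples. Quartile conventions also differ between software packages (Python's default here is the "exclusive" method), so the method used should be stated in the table footnote.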

The discussion should place more emphasis on comparing the results with those of other researchers, on the limitations of the authors' own research, and on key directions for further research.

Conclusions regarding the aim of the work are missing.

Patents section should be removed.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have complied with most of the reviewer's comments. They corrected the text of the article and thus contributed to its substantive value.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors answered my questions, and addressed my concerns.

Reviewer 4 Report

Comments and Suggestions for Authors

The authors have taken into account all my comments in the revised version of the article.