Prediction of Clinical Outcomes with Explainable Artificial Intelligence in Patients with Chronic Lymphocytic Leukemia
Round 1
Reviewer 1 Report (Previous Reviewer 2)
The authors responded and modified the manuscript according to the reviewers' suggestions. It is suitable for publication in the journal.
Reviewer 2 Report (Previous Reviewer 3)
The authors should be congratulated for their efforts to improve this manuscript's quality. All my concerns were adequately addressed and discussed, and I have no further concerns. I recommend publication of this very interesting study in its present form.
This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.
Round 1
Reviewer 1 Report
In the manuscript, Hoffmann and colleagues apply a novel AI method for the analysis of flow cytometry data to CLL and identify a set of populations predictive of inferior outcome in terms of time to first treatment.
The work is certainly interesting and provides some novel data by high-dimensional analysis, highlighting aspects often overlooked by cytometrists. The association between some of the identified signatures and defined markers (e.g. the CD4+ population) helps in understanding the overall reach of the proposed method. However, some of the data analysis raises concerns or is vaguely described. Some suggestions are offered below.
- The main issues concern the cohort itself. First, the imputation of missing values as half a point in the IPI score is improper, dangerous and wrong. Samples with missing variables should be eliminated from the analysis upfront, as they are a significant source of bias. The only exception could be made if the addition of the missing parameter would not change the overall risk group (e.g. the basic score is 4 and age is missing; since age scores 1, an age >65 y would still not change the final classification, as high risk includes scores 4-5-6).
- Second, 157 patients are a really small group for any ML/AI learning model, even if supervised, given the number of analyzed features, which equals 17. Explaining the variability of each feature would therefore be very hard. Could the fact that some populations (e.g. T2C0018, Figure 3B) include both CLL and T cells be explained by this issue?
Also, Authors should be aware that the dataset is highly unbalanced in the number of censored events (115 vs. 42, roughly a 3:1 ratio), and this strongly affects the risk of overfitting in binary classification.
- Authors have employed an unpaired t-test (line 160) to detect predictive populations; does the non-parametric U-test work as well? The use of the t-test comes with the usual assumptions, which are often violated; Authors should show that the distribution of values satisfies these assumptions (possibly a boxplot showing how the values are distributed would suffice) or, better, rely on non-parametric tests.
- The frequency of CD38-positive cells was used as a proxy for a “bad prognosis” population; although this historical marker is still in use in many laboratories, its power has been re-evaluated in favor of other markers such as CD49d (cf. Bulian et al, JCO), which is unfortunately not included in the panel. To keep the comparison with a known unfavorable marker, IGHV status as a sole variable (outside IPI) may be used and could strengthen the Authors’ conclusions.
- One of the issues that we are facing with complex AI-derived populations is understanding the biological sense underlying each of them. The identification of Th CD4+ cells as the main contributor to T1C0016 is indeed very interesting. Would it be possible to run some kind of centroid analysis (or similar)? The heatmap provided in Figure 4 partially addresses the issue, but is limited to CLL markers.
- Following on from this, it is unclear which data are shown in the heatmap; is it possibly the mean or the median (much preferable) of each marker’s value across all samples? A diverging color scale, centered on CLL expression and Z-scaled on the columns, may also help in highlighting subgroup differences.
- The analysis of the T1C0011 population is also interesting. Do Authors have data on lymphocyte count? Is there any correlation between the number of apoptotic cells and the number of total lymphocytes (that is, higher disease burden)?
- Regarding Table 3, I find the change in significance of T1C0023 between the four-factor model and the two-factor model without IPI striking, especially with a CI of 0.15-0.81, which is basically the whole range of data. It seems as if this parameter is now capturing the information carried by the IPI. Could Authors provide a chi-squared test between the IPI levels and their signatures?
- The phrase “Multiple logistic regression analysis in repeated 10 bootstrap trials was done for the combination of more than one independent variable to predict dichotomous groups” (lines 91-93) is unclear and needs to be clarified.
- Also, a few Kaplan-Meier curves are warranted to better visualize the data.
- Gene names should be reported in italics
- Please check the spelling at lines 41 and 132.
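To illustrate the non-parametric alternative raised in the comment on the unpaired t-test, the following is a minimal sketch of a two-sided Mann-Whitney U test in plain Python, using the normal approximation with tie correction and no continuity correction. The function names and the example frequencies are hypothetical and are not taken from the manuscript; for the actual analysis an established statistics package would be preferable, particularly for small groups where an exact p-value is more appropriate.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U of sample x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)

    # U statistic for the first sample, from its rank sum.
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Variance of U under the null hypothesis, corrected for ties.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all values identical: no evidence of a difference

    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_two_sided


# Hypothetical cluster frequencies (%) in two outcome groups.
group_treated = [12.1, 15.3, 9.8, 20.4]
group_untreated = [5.2, 7.7, 6.1, 8.9, 4.3]
u_stat, p_value = mann_whitney_u(group_treated, group_untreated)
print(f"U = {u_stat}, two-sided p = {p_value:.4f}")
```

Because the test works on ranks only, it does not rely on the normality assumption of the t-test, which is the point of the comparison requested above.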
Author Response
Please see the attachment
Reviewer 2 Report
The authors used an AI system to evaluate prognostic factors in CLL and showed positive results in predicting the outcome.
1. The IPI system still has prognostic value in this study; the authors should place more emphasis on the advantages of AI over traditional methods.
2. What about other cytogenetic/molecular factors, such as ZAP-70? Could the authors include these parameters in the AI model for a more comprehensive analysis?
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 3 Report
I have read the article by Hoffmann et al. with great interest, as the application of data mining techniques in oncohematological studies is a potentially intriguing topic. The authors present the implementation of their novel algorithm, ALPODS, for identifying cell types in the multidimensional flow cytometry data of newly diagnosed CLL patients. The identified variables were then assessed as novel prognostic factors. Overall, the manuscript is intriguing and may yield material with clinical application potential. However, I have concerns regarding the methodology, data analysis, and preparation of the manuscript.
1. One of the most significant limitations of the study is the lack of information regarding the administered treatment. There are currently numerous therapeutic options available for CLL, including novel therapies such as BTK inhibitors or venetoclax. Patients were diagnosed with CLL between 2014 and 2020, so earlier therapies, such as immunochemotherapy regimens, could have been given at that time. This information should be added to Table 1 and, where treatment methods differed, also taken into account in the multivariate analyses, as it is obvious that patients receiving novel/better treatment options will achieve a longer TTF. Additionally, some newly diagnosed CLL patients do not require treatment at diagnosis, so information indicating the frequency of untreated patients should also be included.
2. I am dubious about the validity of the study's endpoint (TTF). As noted above, it is unknown how many patients required therapy at the time of diagnosis. Moreover, TTF will vary depending on the administered medication. Importantly, there is a large difference in the median follow-up period between these groups, suggesting that the TTF 0 group was followed for too short a time to observe disease progression. Instead of TTF, time to first therapy may be a more relevant measure.
3. Why did the authors not utilize standard univariate and multivariate Cox regression models to determine the prognostic importance of established variables, given that they had information on the time of diagnosis and progression?
4. Lines 182-187. Comparisons of the ROC curves of the XAI-based models and IPI should be provided to determine the statistical significance of the difference between the areas under the ROC curves, rather than only the p-values from the separate ROC analyses of the models and IPI.
5. Table titles should be provided.
6. Table 3. The presentation of odds ratios <1 for inferior outcomes is counterintuitive; the ratios should be inverted for clarity.
7. The discussion is too brief in relation to the entire article and should be expanded.
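Regarding point 6, inverting an odds ratio only requires taking reciprocals, with the lower and upper confidence limits swapping places. A minimal sketch, in which the function name and the numbers are hypothetical and are not values from Table 3:

```python
def invert_odds_ratio(odds_ratio, ci_low, ci_high):
    """Invert an odds ratio and its confidence interval.

    Taking reciprocals reverses the ordering of the limits, so the
    reciprocal of the old upper limit becomes the new lower limit.
    """
    if min(odds_ratio, ci_low, ci_high) <= 0:
        raise ValueError("odds ratios and confidence limits must be positive")
    return 1 / odds_ratio, 1 / ci_high, 1 / ci_low


# Hypothetical example: OR 0.5 (95% CI 0.25-0.8) for the reference coding.
inverted_or, inverted_low, inverted_high = invert_odds_ratio(0.5, 0.25, 0.8)
print(f"OR {inverted_or:.2f} (95% CI {inverted_low:.2f}-{inverted_high:.2f})")
```

The inverted ratio describes the same association with the reference category switched, so significance is unchanged; only the direction of reading changes.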
Author Response
Please see the attachment
Author Response File: Author Response.pdf