An Expectation-Maximization Algorithm for Including Oncological COVID-19 Deaths in Survival Analysis
Round 1
Reviewer 1 Report
Referee's report on "An Expectation Maximization..."
by Felice, Mazzoni, and Moriconi
This is a good paper, carefully done, and concerning an
interesting topic. It deserves publication as is. Here are
a few remarks and suggestions for the authors' future work,
none of which are required for the final version here.
* Particularly good points are the emphasis that covid deaths
cannot be treated as censoring variables for KM survival
computations, and the extended version of Greenwood's formula
taking that into account.
* Concentrating on expected lifetimes makes things hard, not
just because of "last observation" difficulties - in practice,
the lack of information near the end of the study period
could lead to unstable estimation. While not as neat,
the ideas could be applied to medians instead of expectations.
* Efron's 1967 "redistribute to the right" interpretation of KM
estimates has something to do with the "ehat" computations in
the CoDMI algorithm.
* Parametric survival methods, as in Efron JASA '88, could be
applied here, giving less variable estimates.
* What about covariates, à la Cox regression?
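
For readers unfamiliar with the "redistribute to the right" construction mentioned above, here is a minimal Python sketch of it. It is an illustration only, not code from the paper or from the CoDMI implementation; the function name and the tie-breaking rule (events before censorings at equal times) are assumptions. Each observation starts with mass 1/n, and every censored observation passes its mass equally to all observations to its right; the resulting masses on event times are the jumps of the Kaplan-Meier curve.

```python
def redistribute_to_the_right(times, events):
    """Redistribute-to-the-right view of the Kaplan-Meier estimator.

    times  : observed times
    events : 1 = event observed, 0 = censored
    Returns (sorted_times, sorted_events, mass), where mass[i] is the
    probability mass left on the i-th sorted observation.
    """
    n = len(times)
    # Sort by time; at ties, events come before censorings (assumed convention).
    order = sorted(range(n), key=lambda i: (times[i], -events[i]))
    t = [times[i] for i in order]
    d = [events[i] for i in order]
    mass = [1.0 / n] * n
    for i in range(n):
        # A censored point hands its mass, in equal shares, to every point
        # to its right. If the last observation is censored, its mass has
        # nowhere to go and stays in place.
        if d[i] == 0 and i < n - 1:
            share = mass[i] / (n - 1 - i)
            for j in range(i + 1, n):
                mass[j] += share
            mass[i] = 0.0
    return t, d, mass


t, d, mass = redistribute_to_the_right([1, 2, 3, 4], [1, 0, 1, 1])
print(list(zip(t, mass)))
```

In this toy example the censored observation at time 2 gives half of its mass to each of the two later events, which matches the Kaplan-Meier jumps at times 3 and 4.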
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
This is nicely done. Reduce the number of acronyms for readability.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
An iterative procedure is used to compute virtual event times in censored data with spurious events. Virtual events are injected into the data and Kaplan-Meier estimates are recalculated until convergence. A unique contribution is the calculation of a correction of the confidence intervals consequent to the injection of estimated samples as additional samples. The proposed method produces a complete dataset, amenable to processing with standard statistical procedures. The authors have made the R code available to the community. The virtual samples added are ordinarily seen as events of interest. Some of them could also correspond to censorings, and a procedure is provided for this case, with leeway for expert opinion in reference to the specifics of any given study. Experiments include simulations and an analysis on a real dataset where artificial Covid events have been injected.
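
The iterative scheme summarized above can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the authors' R code: the status coding (1 = event of interest, 0 = censored, 2 = spurious death), the function names, the choice of the conditional mean E[T | T > c] as the virtual event time, and the placement of any leftover mass at the last event time are all assumptions made for the sketch.

```python
def km_steps(times, events):
    """Kaplan-Meier estimate as a list of (event_time, S(t)) steps."""
    steps, s = [], 1.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        s *= 1.0 - deaths / at_risk
        steps.append((t, s))
    return steps


def expected_time_beyond(steps, c):
    """E[T | T > c] under the KM step function.

    Assumption: mass remaining after the last event time is placed at
    that time (a restricted mean). Returns c if nothing can be estimated.
    """
    s_c = 1.0
    for t, s in steps:
        if t <= c:
            s_c = s
    later = [(t, s) for t, s in steps if t > c]
    if not later or s_c <= 0.0:
        return c
    total, prev = 0.0, s_c
    for t, s in later:
        total += t * (prev - s)  # time of the jump times its size
        prev = s
    total += later[-1][0] * prev  # leftover mass at the last event time
    return total / s_c


def impute_virtual_events(times, status, tol=1e-8, max_iter=200):
    """Iteratively impute virtual event times for spurious deaths (status 2)."""
    spurious = [i for i, s in enumerate(status) if s == 2]
    base_events = [1 if s == 1 else 0 for s in status]
    # Start by treating spurious deaths as censored.
    steps = km_steps(times, base_events)
    virtual = [expected_time_beyond(steps, times[i]) for i in spurious]
    for _ in range(max_iter):
        aug_times, aug_events = list(times), list(base_events)
        for i, v in zip(spurious, virtual):
            aug_times[i], aug_events[i] = v, 1  # inject the virtual event
        steps = km_steps(aug_times, aug_events)
        new_virtual = [expected_time_beyond(steps, times[i]) for i in spurious]
        change = max((abs(a - b) for a, b in zip(new_virtual, virtual)), default=0.0)
        virtual = new_virtual
        if change < tol:
            break
    return virtual


print(impute_virtual_events([1, 2, 3, 4], [1, 2, 1, 1]))
```

The sketch omits the corrected confidence intervals and the option of treating some virtual samples as censorings; `max_iter` caps the loop because convergence is not guaranteed in this simplified form.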
Directions for future research are outlined very briefly. The authors might wish to expand them, perhaps also discussing the effect of the proportion and distribution of Covid events on convergence.
Could the proposed data augmentation technique be used in conjunction with some of the methods designed to handle the related problem of competing risks, to improve their performance?
The manuscript is written very well. However, the usual recommendation for a proofread applies.
Author Response
Please see the attachment.
Author Response File: Author Response.docx