Incorporating a New Summary Statistic into the Min–Max Approach: A Min–Max–Median, Min–Max–IQR Combination of Biomarkers for Maximising the Youden Index
Round 1
Reviewer 1 Report
See attached.
Comments for author File: Comments.pdf
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 2 Report
Other comments and recommendations are found in the attached file.
Comments for author File: Comments.pdf
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 3 Report
The paper extends the min-max approach of Liu et al. [19] to min-max-median and min-max-IQR approaches. In the simulation study and the real data application, the performance of the proposed approaches is demonstrated alongside the min-max approach and logistic regression for comparison.
In section 2.3, the authors could explain the algorithm more explicitly.
The advantage of the proposed approach over the min-max approach and logistic regression is not clear.
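To illustrate the kind of explicit description requested for section 2.3, a minimal sketch follows. It is not the authors' algorithm, which this record does not reproduce. It assumes the combination takes the form max + a·min + b·median and that the coefficients are chosen by a plain grid search; the function names `youden_index` and `min_max_median_fit` and the grid are hypothetical.

```python
import numpy as np


def youden_index(scores, labels):
    """Empirical Youden index: max over cutoffs c of sensitivity(c) + specificity(c) - 1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = 0.0  # value of the trivial rule that classifies everyone the same way
    for c in np.unique(scores):
        sens = np.mean(scores[labels] > c)    # diseased subjects scored above the cutoff
        spec = np.mean(scores[~labels] <= c)  # healthy subjects scored at or below it
        best = max(best, sens + spec - 1.0)
    return best


def min_max_median_fit(X, labels, grid=None):
    """Grid search over (a, b) in the ASSUMED combination max + a*min + b*median.

    X has one row per subject and one column per biomarker.
    Returns (best Youden index, a, b).
    """
    X = np.asarray(X, dtype=float)
    if grid is None:
        grid = np.linspace(-1.0, 1.0, 21)  # hypothetical search grid
    x_max = X.max(axis=1)
    x_min = X.min(axis=1)
    x_med = np.median(X, axis=1)
    best = (-np.inf, None, None)
    for a in grid:
        for b in grid:
            j = youden_index(x_max + a * x_min + b * x_med, labels)
            if j > best[0]:
                best = (j, a, b)
    return best
```

Whatever the number of biomarkers, such a search involves only two coefficients, which is presumably the source of the computational simplicity the authors claim.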
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 4 Report
Thank you for the opportunity to read this paper. It is an interesting idea, and the paper is well written and a good read. However, unfortunately I don't think it is suitable for publication, at least not in this journal and in this form.
In my opinion, since this is a mathematical journal and the section is "Probability and Statistics Theory", there should be strong theoretical arguments, while this paper has no theory to speak of. There are some heuristic arguments (which are, admittedly, interesting) and a couple of simulations which prove very little. The authors claim that this solves problems with a high computational burden; however, in the simulations there are at most 10 parameters and a sample size of 1000, which is not considered a very large sample nowadays. Moreover, even in these simulations, the only scenarios where LR is not superior to all other methods are those with (highly) correlated covariates. I hope that the authors are not being disingenuous, because it is well known that linear regression "fails" with correlated predictors, and that there are several ways to fix that (variable selection, PCA, lasso etc.).
To sum up, should the authors publish the paper in a mathematical journal, I would expect stronger mathematical proof of the soundness of their proposed method. Barring that (and if the editors believe that simulation is proof enough), the simulations should include "state of the art" solutions for different scenarios and show that their proposed method is an improvement on those. There should also be a stronger argument on why those scenarios are sufficient, and not just hand-picked for illustration.
Minor comments and mistakes:
l. 50: appears -> appear
l. 85 (and elsewhere): "whatever the number.." -> "regardless of the number..."
l. 121: R software should be cited
eq. (1): left side, i.e. the dependent variable, is missing
l. 142: "Let $X_ijk$ the biomarker..." -> BE/DENOTE the biomarker...
eq. 2: I believe that max should not be in italic (if the paper is written in LaTeX, use \max)
eq. 6: \forall j\neq k =1, 2, 3 -> I don't understand this. Why j \neq k, and isn't k only 1 or 2? Perhaps there is a typo somewhere
l. 217 and elsewhere: Do not use * for multiplication, use a dot (\cdot in LaTeX)
Tables: The captions should be clearer. First, it should say something like "mean Youden indices..." and then describe the scenarios.
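The notational points above (l. 142, eq. 2, l. 217) can be illustrated with a short LaTeX fragment. The expressions are hypothetical and are not taken from the manuscript; only the typesetting conventions matter here.

```latex
% Hypothetical expressions for illustration only; not equations from the manuscript.
\[
  \max_{1 \le j \le p} X_{ijk}   % upright operator via \max, not italic "max"
\]
\[
  \alpha \cdot X_{ijk}           % multiplication as \cdot, not *
\]
% Multi-character subscripts need braces: X_{ijk}, not X_ijk.
```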
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 4 Report
Thank you for the thoughtful reply. The authors have persuaded me in many regards, but I still have two main concerns:
- LR is still the main method to which the others are compared, even when it is clear that it is not the right option in some cases. I expect the authors to compare their methods to the current best method for all scenarios and show that their method improves on those. In cases where LR fails, I expect at least some kind of explanation as to why they didn't use an LR modification that suits the scenario better. If no such modifications exist (i.e. if LR is the current best method), a source or an explanation of that is needed.
- Given that the main draw of the method is (or should be) computational simplicity, the authors still do not handle large data sets. They have now included more biomarkers (p=100), but then only on a small sample - why? For small data sets, more complex methods are also feasible.
Minor corrections:
l. 145: the authors misunderstood my comment, I meant "Let X_ijk be the jth biomarker..."
l. 342: "This not only has the advantage that it is always computationally feasible. Furthermore, in scenarios..." - if you start a sentence with "this not only...", there should also be a "..., but...", in this case "but also in scenarios...". In other words, this should either be one sentence, or (better) remove "not only"
l. 346: "provide a non-overfitting" is an unusual phrase, I suggest "avoid overfitting"
Author Response
Please see the attachment.
Author Response File: Author Response.docx