Evaluating Uncertainty Quantification in Medical Image Segmentation: A Multi-Dataset, Multi-Algorithm Study
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper has the following issues:
1. The sentence "[7] Estimation uncertainty for polyps in colorectal cancer dataset." uses a reference number as the sentence subject, which should be avoided. It is recommended that the authors follow the journal's guidelines for citing references.
2. The dataset descriptions in Sections 3.1.1-3.1.4 are too detailed and take up considerable space. It is recommended that they be streamlined.
3. We suggest providing a network architecture diagram in the Architecture Description (Section 3.2) to help readers understand the model.
4. Ref is italicized in Formula (1) but set in regular type in Formula (2). The same variable must be formatted consistently throughout the text; it is recommended that the authors unify the notation.
Comments on the Quality of English Language
The English could be improved to more clearly express the research.
Author Response
Comments 1: The sentence "[7] Estimation uncertainty for polyps in colorectal cancer dataset." uses a reference number as the sentence subject, which should be avoided. It is recommended that the authors follow the journal's guidelines for citing references.
Response 1: Thank you for pointing this out. We agree with this comment. We have revised the manuscript so that reference numbers are no longer used as sentence subjects, correcting this in all the places where it occurred.
Comments 2: The dataset descriptions in Sections 3.1.1-3.1.4 are too detailed and take up considerable space. It is recommended that they be streamlined.
Response 2: Thank you for pointing this out. We agree with this comment. We have streamlined all dataset descriptions in Sections 3.1.1-3.1.4.
Comments 3: We suggest providing a network architecture diagram in the Architecture Description (Section 3.2) to help readers understand the model.
Response 3: Thank you for pointing this out. We agree with this comment. A network architecture diagram has been added as Figure 5.
Comments 4: Ref is italicized in Formula (1) but set in regular type in Formula (2). The same variable must be formatted consistently throughout the text; it is recommended that the authors unify the notation.
Response 4: Thank you for pointing this out. We agree with this comment. The notation has been made consistent across all equations.
In addition, the Discussion section has been rewritten to present the results more clearly, and the entire text has been reviewed to correct language issues.
Reviewer 2 Report
Comments and Suggestions for Authors
- The paper is written in a clear way, and the content is of interest to the medical research community.
- The authors need to highlight clearly, in a separate paragraph, the key takeaway that readers should draw from this paper. What is the message they want to communicate? It was not completely clear to me.
- The authors did not compare their results with other state-of-the-art methods. They include a table comparing which components other papers did and did not include, but they do not compare their results with any of those methods. The authors need to do this to show where their method stands relative to other state-of-the-art models. This is an important point.
- The authors need to include, in the Discussion section, examples where their algorithms failed to segment the desired region, and explain why they believe the algorithm failed. Since they used diverse datasets, such illustrations would add to the value of the paper.
- The figures are clear and well presented, which adds to the quality of the paper.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper, titled “Evaluating Uncertainty Quantification in Medical Image Segmentation: A Multi-Dataset, Multi-Algorithm Study”, explores the impact of uncertainty quantification methods on medical image segmentation across multiple datasets and algorithms. The authors utilize various uncertainty estimation techniques, including Monte Carlo (MC) dropout, ensemble models, and regression models, to evaluate segmentation performance. The study assesses how different dropout rates and multiple annotation approaches affect the model's uncertainty estimation, comparing models trained on four public medical datasets: Knee, SIJ, Lung, and Heart. A novel gray-level Dice coefficient (GDC) metric is introduced to measure segmentation performance.
1. The paper evaluates models on four datasets (Knee, SIJ, Lung, and Heart), but it does not provide a clear rationale for why these specific datasets were chosen. While the study aims to analyze the effect of multiple annotations, the datasets vary significantly in sample size and imaging modalities, which could affect the generalizability of the findings. A more detailed explanation of how these datasets represent a broad range of segmentation challenges would enhance the credibility of the results. Additionally, including datasets with more diversity in imaging modalities (e.g., ultrasound or PET) could strengthen the conclusions.
2. The paper explores the effect of multiple annotations on segmentation performance, but the analysis is somewhat limited. Although the authors report that multiple annotations led to performance improvements in certain datasets, such as SIJ, they do not thoroughly discuss why this improvement is observed in some cases but not others (e.g., the Knee dataset). The potential reasons behind this variation should be further explored, such as differences in dataset size, annotation quality, or the complexity of the anatomical structures being segmented. Additionally, the paper could benefit from a more detailed examination of how the averaging of multiple annotations impacts segmentation accuracy, as it may introduce noise or bias.
3. The authors investigate the effect of varying dropout rates (20%, 40%, 60%, 80%) in Monte Carlo models but conclude that the dropout rate has little to no impact on model performance except at the extreme dropout levels. While this observation is interesting, the paper does not provide a detailed explanation for why the dropout rate has minimal effect on the uncertainty estimates. A more comprehensive analysis, possibly including theoretical or empirical justifications, would help clarify this result. Additionally, exploring alternative methods for uncertainty estimation, such as variational inference, could provide a more robust comparison.
4. The paper introduces several technical terms, such as "gray-level Dice coefficient" (GDC) and "certainty maps," without adequately explaining them upfront. Providing clearer definitions for these terms when they are first introduced would make the paper more accessible to a wider audience.
5. While the paper discusses model performance, it does not address the computational complexity of the different uncertainty estimation methods. Given that Monte Carlo dropout and ensemble models can be computationally expensive, it would be valuable to include an analysis of the time and resources required to train and evaluate the models, especially for large-scale medical datasets.
6. The authors introduce GDC as a novel metric, but they do not explain how this metric compares to more commonly used metrics in the field of medical imaging. A more detailed analysis of how GDC differs from traditional Dice coefficients, and whether it offers significant advantages, would help validate the use of this new metric.
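To make point 3 concrete, the following is a toy sketch of how Monte Carlo dropout produces an uncertainty estimate: dropout is kept active at inference time, the same input is passed through the network repeatedly, and the variance of the stochastic outputs serves as the uncertainty. This sketch uses a single hypothetical linear layer with inverted-dropout scaling; the weights, input, and sample counts are illustrative assumptions, not values from the paper, and the behavior of a full segmentation network is more complex.

```python
import random
import statistics

def mc_forward(x, weights, p, rng):
    """One stochastic forward pass: drop each weight with probability p."""
    total = 0.0
    for w in weights:
        if rng.random() >= p:             # unit kept this pass
            total += (w * x) / (1.0 - p)  # inverted-dropout rescaling
    return total

def mc_predict(x, weights, p, n_samples=500, seed=0):
    """Run n_samples stochastic passes; return predictive mean and variance.

    The variance across passes is the MC-dropout uncertainty estimate.
    """
    rng = random.Random(seed)
    samples = [mc_forward(x, weights, p, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.pvariance(samples)

if __name__ == "__main__":
    w = [0.5, -0.3, 0.8]  # hypothetical weights
    for p in (0.2, 0.4, 0.6, 0.8):  # the dropout rates studied in the paper
        mean, var = mc_predict(2.0, w, p)
        print(f"p={p:.1f}  mean={mean:+.3f}  variance={var:.3f}")
```

In this simplified linear case the predictive variance grows roughly as p/(1-p) while the mean stays near the deterministic output, so the most pronounced changes appear at the extreme dropout rates, which is loosely consistent with the reviewer's observation; whether that explanation transfers to the authors' deep segmentation models would need empirical verification.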
Comments on the Quality of English Language
N/A.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 3 Report
Comments and Suggestions for Authors
Thanks for the revision.
I suggest for acceptance.