Benchmarking Anomaly Detection Methods for Extracardiac Findings in Cardiac MRI
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This study presents a timely and clinically relevant investigation into automated detection of extracardiac findings (ECFs) using CMR imaging. The novelty lies in addressing a critical gap in cardiac imaging workflows, where incidental ECFs are frequently overlooked despite their potential clinical significance. By benchmarking 20 state-of-the-art (SOTA) anomaly detection (AD) methods across unsupervised, semi-supervised and supervised paradigms, the authors offer a comprehensive evaluation of their applicability to anatomically complex CMR data. The clinical value is substantial, as automated ECF detection could enhance diagnostic accuracy and prompt timely referrals. The inclusion of diffusion models and contrastive learning approaches reflects alignment with cutting-edge trends in medical AI. However, the suboptimal performance of most AD methods compared to supervised baselines underscores the unique challenges posed by CMR data and highlights the need for domain-specific innovations. The focus on ECFs addresses an understudied yet impactful area in cardiology, with the potential to improve patient outcomes. Nevertheless, some issues still need to be addressed:
- Relatively limited sample size and lack of diversity: the dataset lacks demographic diversity (single-center, retrospective) and may not fully represent the whole patient population or other rare pathologies.
- Despite the balanced validation/test subsets, the low prevalence of ECFs (10.41% at the image level) risks biasing models toward normality, especially in unsupervised methods.
- 3D context ignored: the exclusive use of 2D slices disregards spatial continuity between slices, which is critical for contextualizing abnormalities (e.g., distinguishing liver-lung transitions from true lesions).
- Hyperparameter sensitivity: while hyperparameters were tuned, the ad hoc adjustments (Table A1) lack a unified framework, raising concerns about reproducibility. The authors should explain this point in more detail.
- The supervised baselines (SupIC/SupIS) are oversimplified. For example, SupIC uses a basic ResNet-18 without leveraging pretrained weights or advanced architectures (e.g., transformers), potentially underestimating supervised learning’s ceiling.
- Reliance on pixel-AP and sample-AUROC might not fully capture clinical utility. Metrics like sensitivity/specificity at operationally relevant thresholds (e.g., for triage) are missing.
- OS-DDPM’s use of simplex noise, while innovative, limits its ability to detect high-frequency anomalies (e.g., small nodules).
- Attention-based methods (e.g., expVAE, AMCons) failed to localize anomalies meaningfully (Figure 5); the reasons these architectures underperformed should be discussed.
- Expand on why attention-based models failed and propose architectural modifications for CMR-specific challenges in the Discussion section.
Some other minor suggestions:
1) It would be better to include demographic statistics and ECF subtype distributions.
2) Standardize anomaly map color scales and add ground-truth (GT) overlays in figures. Simplify Table 2 by aggregating view-specific results or moving them to supplementary tables.
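To illustrate the comment above on sensitivity/specificity at operationally relevant thresholds, the following is a minimal sketch of how sensitivity at a fixed specificity could be reported from sample-level anomaly scores. It is not taken from the manuscript: the function name `sensitivity_at_specificity`, the convention that a sample is flagged when its score exceeds the threshold, and the default target specificity are assumptions made for illustration only.

```python
import numpy as np


def sensitivity_at_specificity(y_true, scores, target_specificity=0.9):
    """Return (threshold, sensitivity, specificity) at the lowest threshold
    whose specificity reaches the target.

    Hypothetical helper for illustration. Assumed convention: a sample is
    flagged as abnormal when score > threshold; y_true is 1 for abnormal
    and 0 for normal.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    neg = scores[~y_true]  # scores of normal samples
    pos = scores[y_true]   # scores of abnormal samples
    if neg.size == 0 or pos.size == 0:
        raise ValueError("both normal and abnormal samples are required")

    # Specificity is non-decreasing in the threshold, so the first (lowest)
    # threshold that meets the target also gives the highest sensitivity.
    for t in np.unique(scores):  # np.unique returns sorted values
        specificity = float(np.mean(neg <= t))
        if specificity >= target_specificity:
            sensitivity = float(np.mean(pos > t))
            return float(t), sensitivity, specificity

    # Unreachable for targets <= 1: the maximum score yields specificity 1.
    raise ValueError("target specificity must not exceed 1")


if __name__ == "__main__":
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    anomaly_scores = [0.1, 0.2, 0.3, 0.7, 0.4, 0.6, 0.8, 0.9]
    print(sensitivity_at_specificity(labels, anomaly_scores, 0.75))
```

Reporting such an operating point alongside sample-AUROC would show how many ECF cases are caught at a false-alarm rate that a triage workflow could tolerate; the appropriate target specificity is a clinical choice and is not specified here.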
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This study presents the first systematic evaluation of 20 anomaly detection methodologies for identifying extracardiac findings in cardiac MRI examinations. Through the establishment of a comprehensive multi-category dataset containing 35,000 images from 691 patients, the research demonstrates that fully-supervised approaches achieve optimal performance when sufficient data is available, though their effectiveness depends significantly on the quantity of abnormal samples. Conversely, unsupervised techniques exhibit superior adaptability in data-constrained scenarios. The investigation reveals critical challenges in current anomaly detection paradigms, particularly in processing the complex cardiac anatomical structures inherent in MRI data. In summary, this paper has a clear structure and logical coherence. However, the article still needs some minor revisions:
- The related work section may be overly complex for readers. It is recommended that the authors include more illustrative figures in the related work section to help readers understand the distinctions and interconnections between the various algorithms.
- In the Introduction section, the innovation points of the paper are briefly outlined. However, we suggest adding and elaborating on innovations at the technical level, for instance, research on detecting heart rate based on an improved algorithm.
- In Section 3.2, the study was uniformly conducted at a low resolution of 128×128. It would be worth retaining key regions (such as the lungs) at 256×256 and comparing the detection performance on small lesions.
- In section 3.3, the authors mentioned two fully supervised baseline algorithms, ResNet18 and UNet. Please clarify the rationale for selecting these two networks. Additionally, the authors should provide detailed training configurations for these networks, such as learning rate, number of epochs, and other critical hyperparameters.
- In the metrics section, the authors employed AP (Average Precision), pixel-level AP, Dice coefficient, and IoU (Intersection over Union) to evaluate algorithm performance. However, the definitions of these metrics were not introduced. Therefore, the authors should add the definitions of the aforementioned metrics in the metrics section, and provide formulas if necessary for clarity.
- In Table 2, the authors presented a large amount of data. Therefore, the authors should add explanations in Section 4 to guide readers on how to read and understand this table. For example, clarify whether higher or lower values of these metrics indicate better performance. Additionally, while the qualitative analysis in Section 4 is valuable, it is insufficient. A detailed quantitative analysis of Table 2 should also be provided to strengthen the discussion.
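As an illustration of the metric definitions requested above, the following sketch computes the Dice coefficient, IoU, and average precision (AP) from their standard definitions: Dice = 2|A∩B| / (|A| + |B|), IoU = |A∩B| / |A∪B|, and AP as the mean of the precision values at the rank of each positive sample. It is not the authors' implementation; the function names are hypothetical, and returning 1.0 when both masks are empty is an assumed convention.

```python
import numpy as np


def dice_and_iou(pred_mask, gt_mask):
    """Dice and IoU between two binary masks (hypothetical helper).

    Assumption: if both masks are empty, both metrics are defined as 1.0.
    """
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    if total == 0:
        return 1.0, 1.0
    dice = 2.0 * intersection / total
    iou = intersection / union
    return float(dice), float(iou)


def average_precision(y_true, scores):
    """AP as the mean precision at the rank of each positive sample.

    Tied scores are ranked in input order (stable sort), which is a
    simplification compared with threshold-based implementations.
    """
    y = np.asarray(y_true, dtype=int)
    s = np.asarray(scores, dtype=float)
    if y.sum() == 0:
        raise ValueError("at least one positive sample is required")
    order = np.argsort(-s, kind="stable")  # highest score first
    y_sorted = y[order]
    true_positives = np.cumsum(y_sorted)
    precision = true_positives / np.arange(1, len(y) + 1)
    return float(precision[y_sorted == 1].mean())


if __name__ == "__main__":
    print(dice_and_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]]))
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

For all three metrics, higher values indicate better performance. Pixel-level AP applies the same AP definition to per-pixel anomaly scores and ground-truth pixel labels instead of per-image scores and labels.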
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The manuscript presents a benchmarking study on various anomaly detection (AD) methods for identifying extracardiac findings (ECFs) in cardiac MRI. The authors compare 20 state-of-the-art (SOTA) unsupervised, semi-supervised, and open-set supervised methods against fully supervised baselines. The study highlights the limitations of AD methods in handling the complex anatomical structures present in cardiac MRI and suggests that further research is needed in this area.
- While the paper highlights the poor performance of AD methods in detecting ECFs, it does not explore potential improvements or alternatives. Could you discuss ways to enhance AD models, such as integrating domain-specific feature learning or hybrid approaches combining supervised and unsupervised techniques?
- The study is based on an in-house dataset, which limits generalizability. There is no discussion on how well the findings might translate to other CMR datasets or different clinical environments. Could you explicitly address dataset biases and potential domain adaptation techniques?
- The manuscript briefly mentions previous supervised ECF detection approaches but lacks an in-depth comparison. Including a detailed discussion on how the benchmarked methods compare to prior supervised ECF detection models would strengthen the study.
- While hyperparameter tuning is mentioned, detailed information on optimization procedures is missing. A supplementary table summarizing hyperparameter search strategies for each model would improve transparency.
- The dataset is not publicly available, and access is restricted on a case-by-case basis. Could you clarify under what conditions researchers can access the dataset to reproduce findings?
- It would be better to include references to recent advancements in anomaly detection for medical imaging (if available).
- Some sections introduce too many technical terms in a single sentence, making comprehension difficult for non-specialist readers. Thorough proofreading or the use of professional language editing software is recommended.
The manuscript presents a well-executed benchmarking study with significant contributions to anomaly detection in cardiac MRI. However, improvements are needed in discussing dataset limitations, potential methodological enhancements, and comparison with prior work.
Comments on the Quality of English Language
Some sections introduce too many technical terms in a single sentence, making comprehension difficult for non-specialist readers. Thorough proofreading or the use of professional language editing software is recommended.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
I have no further comments.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have incorporated almost all of the reviewers' suggestions and answered all of the questions. The current version of the manuscript meets the standards of the Applied Sciences journal.