4.1. TCGA Dataset Results
For each slide, the predicted tissue mask P is compared to the ground-truth mask G after binarizing both to {0, 1} and, if necessary, resizing the prediction to match the ground-truth resolution. Two overlap metrics are computed: mean Intersection over Union (mIoU), which penalizes both over- and under-segmentation, and mean Dice Class Consistency (mDCC), which assigns twice the weight to overlap regions to better handle class imbalance.
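The per-slide comparison described above can be sketched as follows. This is a minimal illustration rather than the evaluation code used in this study: the function names, the "greater than zero" binarization rule, the nearest-neighbour resizing, and the convention for two empty masks are all assumptions. With P and G as binary masks, IoU = |P ∩ G| / |P ∪ G| and Dice = 2|P ∩ G| / (|P| + |G|).

```python
import numpy as np

def binarize(mask, threshold=0):
    """Map a mask to {0, 1}; values above `threshold` count as tissue (assumed rule)."""
    return (np.asarray(mask) > threshold).astype(np.uint8)

def resize_nearest(mask, shape):
    """Nearest-neighbour resize of a 2-D mask to `shape` = (rows, cols)."""
    rows = (np.arange(shape[0]) * mask.shape[0]) // shape[0]
    cols = (np.arange(shape[1]) * mask.shape[1]) // shape[1]
    return mask[np.ix_(rows, cols)]

def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for one predicted mask P and one ground-truth mask G."""
    p = binarize(pred)
    g = binarize(truth)
    if p.shape != g.shape:
        # Bring the prediction to the ground-truth resolution before comparing.
        p = resize_nearest(p, g.shape)
    inter = int(np.logical_and(p, g).sum())
    p_sum, g_sum = int(p.sum()), int(g.sum())
    union = p_sum + g_sum - inter
    if union == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0, 1.0
    return inter / union, 2 * inter / (p_sum + g_sum)
```

For a single pair of masks the two scores are related by Dice = 2·IoU / (1 + IoU), so they rank predictions for that slide identically. Their means over many slides can still differ in how strongly poorly segmented slides pull the average down.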
All slides in the evaluation set were down-sampled to 10 μm resolution, and masks were generated accordingly. We report mIoU, mDCC, and average per-slide inference time (in seconds). The classical methods (Otsu, K-Means, and Double-Pass) were executed on a 12-core CPU, while GrandQC was run on both CPU and an RTX-4090 GPU. GrandQC’s performance serves as an in-domain upper bound, given its training on similar TCGA data.
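As an illustration of how an average per-slide inference time can be measured, the sketch below times an arbitrary segmentation callable with a wall-clock timer. The helper name `mean_inference_time` and the decision to time only the call itself (excluding slide loading) are assumptions, not a description of the benchmarking harness used here.

```python
import time

def mean_inference_time(segment, slides):
    """Average wall-clock seconds per slide for a callable `segment` (hypothetical helper)."""
    durations = []
    for slide in slides:
        start = time.perf_counter()
        segment(slide)  # only the segmentation call is timed
        durations.append(time.perf_counter() - start)
    if not durations:
        raise ValueError("no slides to time")
    return sum(durations) / len(durations)
```

Timings obtained this way are only comparable across methods when every method receives the same pre-loaded inputs on the same hardware.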
For each slide, we computed IoU and Dice against the ground-truth mask. Study-level mIoU/mDCC in Table 2 are simple arithmetic means over all 3322 slides. Inference time is the per-slide runtime averaged over the same 3322 slides.
The aggregate results, which are equivalent to cohort means weighted by cohort slide counts, show GrandQC leading in mIoU and mDCC at moderate GPU inference time. Our proposed Double-Pass, an annotation-free hybrid method, follows closely in accuracy while achieving the lowest CPU inference times among the high-performing methods; it is substantially more efficient than K-Means and more accurate than Otsu.
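A simple mean over all slides and a mean of cohort means weighted by slide counts are the same quantity, whereas an unweighted mean of cohort means is not. The sketch below shows this with made-up scores for two hypothetical cohorts; the numbers are purely illustrative and are not results from this study.

```python
from statistics import fmean

def pooled_mean(per_cohort_scores):
    """Simple arithmetic mean over every slide, ignoring cohort boundaries."""
    return fmean(s for scores in per_cohort_scores.values() for s in scores)

def count_weighted_mean(per_cohort_scores):
    """Mean of cohort means, each weighted by its number of slides."""
    total = sum(len(scores) for scores in per_cohort_scores.values())
    weighted = sum(len(scores) * fmean(scores) for scores in per_cohort_scores.values())
    return weighted / total

def unweighted_cohort_mean(per_cohort_scores):
    """Mean of cohort means with equal weight per cohort (a different quantity)."""
    return fmean(fmean(scores) for scores in per_cohort_scores.values())

# Illustrative per-slide scores only; cohort names are placeholders.
example = {"cohort_a": [0.5, 0.7], "cohort_b": [0.9]}
```

Here `pooled_mean(example)` and `count_weighted_mean(example)` agree, while `unweighted_cohort_mean(example)` gives the smaller cohort more influence per slide.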
Table 3 provides a breakdown by cancer type, revealing performance variations across cohorts. GrandQC consistently achieves the highest mIoU and mDCC in all cohorts, attributable to its supervised training on annotated TCGA slides, which enables it to learn cohort-specific characteristics such as staining variations and tissue morphologies. In contrast, the annotation-free classical methods, particularly our Double-Pass, rely solely on unsupervised image statistics and heuristics, yet attain competitive scores. For example, in LIHC, Double-Pass trails GrandQC closely in both mIoU and mDCC, while K-Means slightly edges it out in accuracy but at a much higher computational cost. Otsu, while generally the fastest of the stand-alone classical methods, exhibits lower accuracy, particularly in complex cohorts like BRCA and GBM, where under-segmentation is more pronounced.
These per-cohort results underscore GrandQC’s superior performance due to its in-domain training, which equips it to handle diverse cancer-specific challenges, such as necrotic areas in GBM or sparse tissues in BRCA. However, the classical methods offer compelling trade-offs: they require no annotations or GPU resources, making them accessible for preliminary processing in computationally constrained settings. Notably, our Double-Pass offers a favorable balance, delivering mIoU and mDCC values very close to GrandQC in several cohorts, such as CHOL and HNSC, at consistently sub-0.3-second CPU inference times. It is faster than Otsu in many cases while surpassing it in accuracy across most cohorts, and it is dramatically quicker than K-Means without sacrificing much precision. This speed stems from its hybrid design, which fuses complementary passes (color-based artifact rejection and efficient downsampled clustering) to mitigate the limitations of single classical approaches, avoiding the full-pixel clustering overhead seen in K-Means.
The efficiency of Double-Pass is particularly useful in digital pathology, where processing thousands of gigapixel WSIs from large cohorts like TCGA can become a bottleneck. Rapid inference on standard CPUs enables scalable preprocessing in high-volume labs, reducing overall analysis time, minimizing computational costs, and allowing pathologists and AI models to focus on clinically relevant regions without delays. This is especially important in resource-limited settings or during real-time applications, where GPU access may be unavailable, making Double-Pass a practical, high-impact solution for advancing cancer research and diagnostics.
Overall, the results highlight cohort-dependent variability, likely driven by differences in tissue density, staining intensity, and artifact prevalence. For instance, hepatobiliary cohorts (CHOL and LIHC) yield higher scores across all methods, plausibly due to denser, well-stained tissues, whereas central nervous system (GBM) and breast (BRCA) cohorts present greater challenges with sparse or heterogeneous regions. While GrandQC sets the accuracy benchmark, Double-Pass’s annotation-free nature, speed, and near-comparable segmentation quality position it as a strong alternative for scalability, reducing preprocessing bottlenecks and improving workflow efficiency.
4.2. Qualitative Results
Qualitative comparisons (Figure 2, Figure 3 and Figure 4) illustrate Double-Pass’s smoother masks in cancer slides, minimizing missed regions.
These visualizations reveal that Double-Pass produces more contiguous masks with fewer false negatives compared to Otsu, particularly in heterogeneous tissues like ACC, where it achieves near-GrandQC quality without annotations. In non-cancer examples like sputum, methods show robustness to different stains, but cancer-specific findings indicate Double-Pass’s strength in avoiding over-inclusion of artifacts, enhancing downstream AI reliability. Limitations include potential misses in very faint areas; however, the hybrid approach mitigates this better than pure classical methods.
Double-Pass delivers accuracy comparable to deep learning while remaining lightweight and annotation-free, a key advantage in pathology diagnosis, where expert time is limited.
In cancer cohorts, performance varied, with higher scores in cohorts such as LIHC and lower scores in others such as GBM. This highlights Double-Pass’s potential in diverse tissues. In BRCA, Double-Pass achieved competitive performance. A further limitation is that the thumbnail resolution can miss micro-details; future work could integrate multi-scale approaches.