4.3.1. Quantitative Comparison and Evaluation
The quantitative evaluation examines the effectiveness and generalization ability of the proposed method, GDCA-Net. We used Equations (9)–(12) to calculate the performance metrics: accuracy, recall, mAP50, and mAP50-95. Because the datasets contain diverse instances, with varying lesion sizes, shapes, and textures as well as differences in endoscopic image quality, we systematically tested multiple deep learning models.
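To make the metric definitions concrete, the following is a minimal sketch rather than the authors' evaluation code. It assumes binary masks stored as NumPy arrays, and the helper names `mask_iou` and `precision_recall` are hypothetical; Equations (9)–(12) remain the authoritative definitions. The threshold grid follows the common COCO-style convention, in which mAP50 uses a single IoU threshold of 0.50 and mAP50-95 averages over thresholds from 0.50 to 0.95 in steps of 0.05.

```python
import numpy as np


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks with the same shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty: treat the overlap as zero by convention here.
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# IoU thresholds assumed for mAP50-95 (0.50, 0.55, ..., 0.95); mAP50 uses only the first.
IOU_THRESHOLDS = np.linspace(0.50, 0.95, 10)

# Toy masks (hypothetical data): prediction covers rows 0-1, ground truth rows 1-2.
pred_mask = np.zeros((4, 4), dtype=np.uint8)
pred_mask[0:2, :] = 1
gt_mask = np.zeros((4, 4), dtype=np.uint8)
gt_mask[1:3, :] = 1

iou = mask_iou(pred_mask, gt_mask)  # 4 shared pixels / 12 pixels in the union
matched_at_50 = iou >= IOU_THRESHOLDS[0]
```

In this toy case the IoU is 1/3, so the prediction would not count as a true positive at the 0.50 threshold.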
This paper focuses on deep learning models for polyp segmentation, with an emphasis on early, precise segmentation to assist clinical diagnosis. After evaluating the dataset, we selected YOLOv12-seg as the primary framework because of its performance and efficiency in rapidly segmenting polyps of varying sizes and shapes. The segmentation model built on YOLOv12-seg showed significant improvements across multiple performance metrics.
To systematically assess the effectiveness of the proposed method, we conducted a comparative analysis of a series of deep learning-based segmentation techniques and their improvements [43]. The comparison models include YOLOv6-seg [54], YOLOv8-seg, YOLOv8-p2-seg [55], YOLOv10n-seg [56], YOLOv11-seg [57], YOLOv12-seg [58], EfficientNetv2-seg [59], vanillanet-seg [60], and ADNet-seg. Additionally, to assess the model’s generalization ability across diverse scenarios and requirements, we ran experiments on multiple datasets: PolypDB, Kvasir-SEG, and CVC-ClinicDB. The results for each metric and dataset are compiled in Table 7 and Figure 6, which show the strengths and weaknesses of the proposed GDCA-Net model relative to the other models.
As shown in Table 7, the proposed GDCA-Net model performs well on most core metrics. On the challenging PolypDB dataset, GDCA-Net achieved the best mAP50 and mAP50-95, at 85.9% and 46.9%, respectively. This indicates that the model is robust to difficult data with low image quality, uneven lighting, and blurred polyp boundaries. On the high-quality Kvasir-SEG dataset, GDCA-Net ranked first with an F1 score of 94.9% and achieved 97.0% and 74.1% on mAP50 and mAP50-95, respectively. Other advanced models, such as YOLOv11-seg and YOLOv8-seg, also scored very highly on this dataset, which suggests that it is comparatively less challenging; GDCA-Net nevertheless maintains its lead there. On the CVC-ClinicDB dataset, GDCA-Net achieved an mAP50 of 98.5% and an mAP50-95 of 82.9%, comparable to advanced models such as ADNet-seg and YOLOv8-p2-seg.
Although GDCA-Net does not achieve the highest precision or recall on every dataset when compared with certain models (e.g., YOLOv8-seg and ADNet-seg), it performs consistently well across all metrics and datasets, indicating strong overall generalization. Models with marginally higher precision often show a corresponding drop in recall, reflecting the inherent trade-off between detection sensitivity and false-positive suppression. In contrast, GDCA-Net achieves a more balanced performance profile, which matters in clinical applications where both high sensitivity and high specificity are critical.
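The value of a balanced precision and recall profile can be illustrated with the F1 score, which is the harmonic mean of the two. The sketch below uses hypothetical precision and recall values, not results from Table 7, and the function name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values: both pairs have the same arithmetic mean (0.85).
balanced = f1_score(0.85, 0.85)  # precision and recall are equal
skewed = f1_score(0.95, 0.75)    # higher precision bought with lower recall
```

Because the harmonic mean penalizes imbalance, the skewed pair scores a lower F1 than the balanced pair even though their arithmetic means match.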
These experimental results support the effectiveness and robustness of GDCA-Net on datasets of different styles and levels of difficulty. Note that EfficientNetv2-seg and vanillanet-seg, which place strict requirements on the dataset, underfit and therefore performed poorly on the PolypDB dataset.
4.3.2. Ablation Experiments
To clarify the contribution of each component in the GDCA-Net model, we conducted a series of ablation experiments. Specifically, eight ablation experiments (models ➀ to ➇) were performed on the PolypDB dataset, covering the YOLOv12-seg baseline, its partial variants, and the full GDCA-Net. The experimental results are detailed in Table 8. By systematically adding the improved modules to the YOLOv12-seg baseline, we evaluated the roles of the GD mechanism, AKConv, CAF, ContMix, and Wise-IoU. The results show that the model’s performance is not accidental but results from the combined effect of multiple components.
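The structure of such an ablation study can be sketched as an enumeration of module combinations. The grouping below, in which GD with AKConv, CAF with ContMix, and Wise-IoU are each toggled as a unit, is an assumption inferred from the combinations discussed in this section; the enumeration order is not meant to match the model numbering in Table 8, and `ablation_configs` is a hypothetical helper.

```python
from itertools import combinations

# Assumed module groups, each switched on or off as a unit.
MODULE_GROUPS = ("GD + AKConv", "CAF + ContMix", "Wise-IoU")


def ablation_configs(groups=MODULE_GROUPS):
    """Return every subset of the module groups, from none (baseline) to all."""
    configs = []
    for k in range(len(groups) + 1):
        for subset in combinations(groups, k):
            configs.append(subset)
    return configs


configs = ablation_configs()
for cfg in configs:
    # The empty subset corresponds to the unmodified YOLOv12-seg baseline.
    print(" + ".join(cfg) if cfg else "baseline (YOLOv12-seg)")
```

Three on/off groups give 2^3 = 8 configurations, which is consistent with the eight experiments reported.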
Among the individual components, model ➃ achieved the largest performance improvement. Its mAP50 rose from 83.7% for the baseline model to 86.4%, and its F1 score rose from 84.8% to 86.7%. This improvement shows that Wise-IoU is highly effective at optimizing boundary regression and handling complex, uneven segmentation samples, making it the main driver of the model’s performance gain. Meanwhile, model ➁ improved mAP50 by 1.1 percentage points and mAP50-95 by 2.1 percentage points, demonstrating the effectiveness of the GD mechanism in enhancing multi-scale feature fusion. Model ➂ had a positive impact on recall and mAP50-95, indicating that CAF and ContMix enhance the model’s ability to capture contextual information and irregular morphological features. By integrating global contextual attention and dynamic convolutions that adapt to irregular shapes, the model can more accurately distinguish polyps from complex backgrounds and precisely segment their diverse, irregular shapes.
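As a worked check of the figures quoted above for model ➃, the sketch below computes the gains over the baseline both in absolute percentage points and as relative change. Only the four values stated in the text are used; the dictionary and function names are hypothetical.

```python
# Values quoted in the text for the YOLOv12-seg baseline and model 4 (Wise-IoU), in percent.
BASELINE = {"mAP50": 83.7, "F1": 84.8}
MODEL_4 = {"mAP50": 86.4, "F1": 86.7}


def gain_points(new: dict, old: dict) -> dict:
    """Absolute gain in percentage points, rounded to one decimal place."""
    return {k: round(new[k] - old[k], 1) for k in old}


def gain_relative(new: dict, old: dict) -> dict:
    """Relative gain in percent of the baseline value, rounded to one decimal place."""
    return {k: round(100 * (new[k] - old[k]) / old[k], 1) for k in old}


points = gain_points(MODEL_4, BASELINE)
relative = gain_relative(MODEL_4, BASELINE)
```

The absolute gains are 2.7 points for mAP50 and 1.9 points for F1; the relative gains are slightly larger because they are expressed as a fraction of the baseline value.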
Further experiments reveal synergistic effects between components. When the GD mechanism and AKConv are combined with CAF and ContMix (Model ➄), the model’s mAP50 and F1 score improve further. Notably, the combination of CAF and ContMix with Wise-IoU (Model ➅) performs well, achieving an mAP50 of 85.5% and an F1 score of 85.9%. The combination of the GD mechanism with AKConv and Wise-IoU (Model ➆) also performed strongly, with an mAP50 of 85.2% and an mAP50-95 of 47.5%.
Finally, integrating all improved components (the GD mechanism with AKConv, CAF with ContMix, and Wise-IoU) into GDCA-Net yields the best overall performance among all combinations. Although its F1 score of 85.5% is slightly lower than the peak value achieved with Wise-IoU alone, its mAP50 and mAP50-95 reach 85.9% and 46.9%, respectively, close to the best values on all evaluation metrics. This shows that integrating all components gives GDCA-Net a well-balanced profile across performance dimensions, making it a robust model for a variety of complex scenarios.
Note that Models ➃, ➆, and ➇ achieved comparable results because they share key components (e.g., Wise-IoU and GD + AKConv) that significantly enhance segmentation performance. However, Model ➇ is more stable across all metrics, indicating that integrating the modules yields more robust and balanced improvements than any individual component alone.
4.3.3. Qualitative Analysis
To evaluate the segmentation performance of the GDCA-Net model more fully, we complemented the quantitative analysis above with a qualitative analysis. First, we randomly selected 12 images from the PolypDB dataset [51], as shown in Figure 7a. GDCA-Net demonstrated strong robustness and adaptability under challenging conditions such as low image quality, uneven lighting, complex backgrounds, and blurred polyp boundaries. The model reliably segments polyps of various shapes and handles blurred boundaries precisely, with predictions highly consistent with the ground-truth labels (Figure 7b).
Second, to assess the model’s generalization ability, we randomly selected 12 images from the Kvasir-SEG dataset [52], as shown in Figure 7c. The images in this dataset are of relatively high quality, and polyp boundaries are typically clear. GDCA-Net also segmented these images accurately, with predictions matching the ground-truth labels in Figure 7d. This shows that GDCA-Net can exploit the rich information in high-quality images to achieve high-precision segmentation and can generalize to datasets of different styles.
In addition, to evaluate the segmentation performance of GDCA-Net across different clinical scenarios, we randomly selected 12 images from the CVC-ClinicDB dataset, as shown in Figure 7e. Polyps in this dataset are often characterized by uneven illumination, mucus interference, and complex surrounding tissue textures. Under these challenging conditions, GDCA-Net still segments well, accurately capturing both the overall structure and the subtle contours of polyps. Its predictions are highly consistent with the ground-truth labels in Figure 7f; even under slight occlusion, the model's segmentation remains stable.
Overall, the GDCA-Net model proposed in this study performs well across different image qualities and polyp characteristics, covering blurry, complex, clear, and simple scenarios. On the core PolypDB dataset, GDCA-Net accurately segments various types of polyps, thereby providing auxiliary support for clinicians in early diagnosis.
4.3.4. Failure Cases Analysis
Although GDCA-Net demonstrates strong segmentation performance across various datasets, we also observed a few typical failure cases during qualitative analysis, as shown in Figure 8. Analyzing these cases is essential for understanding the current limitations of the proposed method and guiding future improvements. GDCA-Net occasionally fails to detect polyps with extremely low contrast, smooth texture, or unclear boundaries, particularly when they are small or lie flat against the surrounding mucosa. In these cases, the model struggles to differentiate subtle intensity variations, leading to incomplete or missed segmentation regions. In addition, the model sometimes misclassifies bright reflections caused by endoscopic illumination as polyp regions. These false positives are likely due to the similar intensity distributions of specular highlights and actual lesions, which confuse the feature extraction process.
These failure cases highlight potential areas for improvement. Future work will focus on incorporating illumination-invariant feature representations and context-aware refinement mechanisms to better handle challenging imaging conditions and reduce misclassification.