5.3. Segmentation Results
This section provides a comprehensive visual comparison of segmentation results across the DRIVE, CHASE, and HRF datasets obtained using both the baseline and proposed models. Additionally, a detailed analysis of each figure highlights the qualitative differences between the methods. Finally, an ablation study is included to validate the impact of the proposed improvements and confirm the optimal model configuration.
Figure 12 displays the segmented fundus images for the DRIVE dataset.
The first column (Figure 12a,e,i) contains the input images: preprocessed images that underwent data augmentation on the green channel, followed by gamma correction and CLAHE. The second column (Figure 12b,f,j) shows the segmentations produced by the baseline U-Net model, and the third column (Figure 12c,g,k) those produced by the improved lightweight U-Net model. The last column (Figure 12d,h,l) shows the ground-truth images.
Figure 13 illustrates the models’ performance on the DRIVE dataset using boxplots, providing insight into the repeatability of the experiment, which was repeated five times; a 5-fold cross-validation approach was likewise used. The metrics displayed are the Dice Similarity Coefficient, mean Intersection over Union, Accuracy, Sensitivity, and Specificity, abbreviated as DSC, mIoU, Acc, Sen, and Spec, respectively.
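For reference, all five metrics can be derived from the pixel-wise confusion matrix. The minimal sketch below is illustrative only (not the study’s evaluation code) and assumes binary masks flattened to 0/1 lists whose denominators are nonzero.

```python
def segmentation_metrics(pred, truth):
    """Pixel-wise metrics for binary vessel masks (flattened 0/1 lists)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return {
        "DSC":  2 * tp / (2 * tp + fp + fn),      # overlap-based Dice coefficient
        "IoU":  tp / (tp + fp + fn),              # intersection over union
        "Acc":  (tp + tn) / (tp + fp + fn + tn),  # fraction of correctly classified pixels
        "Sen":  tp / (tp + fn),                   # recall on vessel pixels
        "Spec": tn / (tn + fp),                   # recall on background pixels
    }

m = segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0])
print(m)  # DSC = 0.5, IoU ≈ 0.333, Acc = Sen = Spec = 0.5
```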
Figure 13a presents the box plot for the modified U-Net architecture, whereas Figure 13b depicts the baseline U-Net. The results indicate that the modified model in Figure 13a outperforms the baseline on most of the evaluated metrics.
The comparison between the modified U-Net model and the baseline U-Net model demonstrates notable improvements across several key performance metrics, indicating the positive impact of the modifications. The modified U-Net exhibits a substantial increase in the DSC, with a 95% confidence interval ranging from 0.771 to 0.775, compared to the baseline model’s lower DSC interval of 0.737 to 0.741. Similarly, the mIoU for the modified model ranges from 0.627 to 0.633, while the baseline exhibits a lower range of 0.583 to 0.589, reflecting the modified U-Net’s enhanced ability to generate accurate segmentations.
The sensitivity of the modified model, with a confidence interval of 0.797–0.821, is lower than that of the baseline, which ranges from 0.866 to 0.885, indicating that the baseline model detects more true positives. However, the specificity of the modified U-Net, with a confidence interval of 0.974 to 0.978, surpasses the baseline’s range of 0.952 to 0.955, indicating better performance in correctly identifying true negatives. Additionally, the accuracy of the modified model, with a confidence interval of 0.911 to 0.913, is slightly higher than the baseline’s range of 0.894 to 0.895, indicating improved overall performance.
This improvement is particularly pronounced in the mean Intersection over Union (mIoU), which measures the overlap between the predicted segmentation and the ground-truth mask, whereas Sensitivity, defined as the proportion of correctly detected vessel pixels, remains slightly below the baseline.
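Confidence intervals of this form can be reproduced from the per-run scores with a Student-t computation. The sketch below uses hypothetical run scores chosen for illustration, not the study’s raw data.

```python
import math
import statistics

def confidence_interval_95(scores):
    """Two-sided 95% CI for the mean of a small sample of run scores,
    using the Student-t critical value for n - 1 degrees of freedom."""
    T_CRIT = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}  # t_{0.975, df}
    n = len(scores)
    mean = statistics.mean(scores)
    half_width = T_CRIT[n - 1] * statistics.stdev(scores) / math.sqrt(n)
    return mean - half_width, mean + half_width

# Hypothetical DSC values from five repeated runs:
low, high = confidence_interval_95([0.772, 0.774, 0.771, 0.775, 0.773])
print(round(low, 3), round(high, 3))  # -> 0.771 0.775
```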
Table 5 compares retinal vessel segmentation methods on the DRIVE dataset. Although some studies omit one or more metrics, each approach demonstrates distinct strengths, such as the high Sensitivity of Li et al. [49] or the high DSC of Kande et al. [27]. In this light, the improved lightweight U-Net stands out with balanced performance: a DSC of 0.7871, an mIoU of 0.6318, a Sensitivity of 0.7421, a Specificity of 0.9837, and an Accuracy of 0.9113, demonstrating its robustness in correctly identifying vessel pixels while minimizing false positives. These findings underscore the significance of multi-metric assessments in comprehensively evaluating segmentation quality.
Figure 14 shows the images segmented from the CHASE dataset using the baseline U-Net and the lightweight U-Net. The images within the initial column are the input images that have undergone preprocessing. The subsequent columns present the images that have been segmented by the lightweight U-Net and the baseline U-Net, respectively. The final column displays the images that serve as the ground truth.
Figure 15 presents box plots of the per-model metrics obtained with 5-fold cross-validation. Figure 15a shows relatively tight clustering, with DSC and mIoU values ranging from 0.79 to 0.80 and from 0.65 to 0.67, respectively, along with uniformly high Specificity (0.98 to 0.99) and Accuracy (0.97 to 0.98). This consistent performance suggests that the lightweight U-Net achieves a favorable trade-off between precision (0.78–0.80) and Sensitivity (0.79–0.83) without over-segmentation. In contrast, Figure 15b exhibits a marginally lower DSC (0.76–0.78) and mIoU (0.62–0.64), accompanied by markedly lower Specificity (0.76–0.78) and Accuracy (0.68–0.72), although the baseline U-Net retains high Sensitivity (0.97–0.98) and competitive precision (0.85–0.87). These patterns highlight the fundamental trade-off in segmentation tasks: the baseline U-Net’s high recall can lead to over-segmentation (and consequently lower Specificity), whereas the lightweight U-Net attains a more balanced and stable performance across all metrics.
The comparison between the modified U-Net model and the baseline U-Net model in Figure 15 reveals significant improvements in key performance metrics, highlighting the effectiveness of the implemented modifications. The modified U-Net exhibits a marked increase in the DSC, with a 95% confidence interval ranging from 0.791 to 0.801, indicating superior segmentation accuracy compared to the baseline’s lower DSC interval of 0.625 to 0.635. The mIoU for the modified model ranges from 0.654 to 0.668, whereas the baseline reports a higher mIoU range of 0.693 to 0.714; nevertheless, the modified model’s substantially higher DSC, Specificity, and Accuracy indicate better overall segmentation quality.
The sensitivity of the modified model, with a confidence interval between 0.799 and 0.818, is lower than that of the baseline (ranging from 0.976 to 0.979), suggesting that the baseline model is more sensitive in detecting true positives. However, the Specificity and Accuracy of the modified model show remarkable improvement, with Specificity ranging from 0.985 to 0.986 and Accuracy from 0.974 to 0.975, compared to the baseline’s Specificity (between 0.970 and 0.971) and Accuracy (between 0.769 and 0.777). These results demonstrate that the modifications have significantly enhanced the model’s overall robustness, resulting in improved generalization and performance in segmentation tasks.
Table 6 shows that the proposed improved lightweight U-Net model achieves the highest mIoU (0.6910) and Specificity (0.9843) compared to the baseline U-Net and the methods of Saha Tchinda et al. [21], Liu et al. [23], and Ding et al. [24]. Despite slightly lower Sensitivity (0.8220) and Accuracy (0.9718) than the baseline U-Net, the proposed framework demonstrates performance competitive with contemporary state-of-the-art methods.
Figure 16 presents the images from the HRF dataset. The first column shows the preprocessed images, and the subsequent columns depict the segmentations produced by the enhanced lightweight U-Net model, those produced by the baseline U-Net model, and the corresponding ground-truth masks.
Figure 17 shows the boxplots of DSC, mIoU, Sensitivity, Specificity, and Accuracy, highlighting the enhanced and more consistent performance of the lightweight U-Net (LU-Net) compared to the baseline U-Net on the HRF dataset. LU-Net demonstrates higher median values and narrower interquartile ranges, signifying stronger repeatability and robustness. The elevated DSC and mIoU indices demonstrate superior overlap with the ground-truth masks, while increased Sensitivity and Specificity indices reveal the effective capture of fine vessel structures alongside a lower rate of false positives. In addition, LU-Net’s Accuracy exceeds that of the baseline U-Net, indicating a greater proportion of correctly classified pixels overall. Thus, the findings demonstrate that LU-Net achieves not only superior average performance but also reduced variability, making it particularly suitable for clinical applications demanding consistent and reliable segmentation results.
The comparison between the modified U-Net model and the baseline U-Net model in Figure 17 reveals significant improvements in various performance metrics, particularly in segmentation accuracy and Sensitivity. The modified U-Net demonstrates an improvement in the DSC, with a 95% confidence interval ranging from 0.516 to 0.521, compared to the baseline’s lower DSC interval of 0.467 to 0.473. Similarly, the mIoU for the modified model ranges from 0.665 to 0.677, whereas the baseline has a lower range of 0.505 to 0.514, indicating that the modified U-Net is more effective at achieving precise segmentations.
The sensitivity of the modified model, with a confidence interval of 0.979–0.980, is higher than that of the baseline, which ranges from 0.946 to 0.951, indicating an enhanced ability of the modified model to detect true positives accurately. However, the Specificity of the modified U-Net, with a 95% confidence interval between 0.867 and 0.872, is only slightly higher than that of the baseline, which ranges from 0.855 to 0.867, indicating a comparable performance in correctly identifying true negatives. Finally, the Accuracy of the modified model, with a confidence interval ranging from 0.681 to 0.685, surpasses the baseline’s Accuracy range of 0.637 to 0.642, indicating a stronger overall performance.
Table 7 offers a comparative analysis of the proposed LU-Net’s performance against other methods for retinal vessel segmentation on the HRF dataset. While LU-Net achieved a higher Dice Similarity Coefficient (0.6902 vs. 0.6417) and mean Intersection over Union (0.5270 vs. 0.4725), there was a marginal decline in Sensitivity (0.8161 vs. 0.8559) and Accuracy (0.8437 vs. 0.8710).
However, LU-Net exhibited an enhanced Specificity (0.9707 vs. 0.9531). Several studies have reported sensitivities ranging from 0.7840 to 0.8612, with accuracies frequently exceeding 0.96; direct comparisons, however, are hindered by inconsistencies in the reported metrics (for example, DSC and mIoU are often omitted). Nevertheless, LU-Net’s gains in Dice and mIoU indicate that it produces more spatially coherent segmentation masks, suggesting that its balance of Specificity and Sensitivity may be advantageous in clinical applications where minimizing false positives is prioritized while maintaining effective vascular detection.
Table 8 presents the results of an ablation study on the DRIVE, CHASE, and HRF datasets. The table compares each module’s impact on the reported metrics. Changing the loss function from BCE to Dice loss initially resulted in improved mIoU. Dice loss optimizes the overlap between predicted and ground-truth regions, making it particularly effective in handling class imbalance and improving segmentation performance on small or sparse structures.
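As a sketch of the idea (not the paper’s exact implementation), a soft Dice loss over flattened predicted probabilities can be written as:

```python
def dice_loss(pred_probs, target, smooth=1.0):
    """Soft Dice loss over flattened probability and 0/1 ground-truth lists.
    The smoothing term keeps the loss defined when both masks are empty and
    stabilizes training on very sparse (thin-vessel) targets."""
    intersection = sum(p * t for p, t in zip(pred_probs, target))
    union = sum(pred_probs) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (union + smooth)

print(dice_loss([1.0, 1.0, 0.0], [1, 1, 0]))  # perfect overlap -> 0.0
```

Because the loss depends only on the overlap and the mask sizes, abundant background pixels do not dominate it, which is why it handles class imbalance better than plain BCE.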
Furthermore, the table compares a hybrid function combining the BCE and Dice loss functions, each weighted at 0.5. Integrating reverse attention (RA) yielded notable gains on two of the three evaluated datasets: on DRIVE, RA raised the Dice score by 6.8 percentage points and the IoU by 8.3 points, while on CHASE_DB1 the proposed method improved these metrics by 2.7 and 4.3 points, respectively. No significant benefit was observed on the HRF dataset.
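The reverse-attention mechanism can be sketched generically as weighting decoder features by one minus the sigmoid of a coarse prediction, so that confidently segmented regions are suppressed and the network refines what it missed. This is an illustrative formulation of the general technique, not the paper’s exact layer.

```python
import math

def reverse_attention(features, coarse_logits):
    """Generic reverse-attention step: attenuate features where the coarse
    prediction is already confident, leaving uncertain regions (such as
    thin vessel boundaries) for the decoder to refine."""
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))
    return [f * (1.0 - sigmoid(z)) for f, z in zip(features, coarse_logits)]

# A confidently predicted pixel (logit 8) is nearly zeroed out, while an
# uncertain one (logit 0) keeps half of its feature magnitude.
print(reverse_attention([1.0, 1.0], [8.0, 0.0]))
```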
Similarly, the experimental findings demonstrate that LU-Net outperforms the baseline on both the CHASE and HRF datasets, although each dataset responds differently to training choices. For CHASE, the optimal segmentation is achieved by pairing AdamW with Dice loss, as evidenced by Dice = 0.7946 and mIoU = 0.6598, underscoring the efficacy of weight-decay regularization and synthetic variability in recovering thin vessels. On HRF, whose images exhibit larger and more uniform vessels, performance is predominantly driven by AdamW and Dice loss, with minimal gains from reverse attention; the optimal configuration achieves Dice = 0.7756 and mIoU = 0.6342. Across both datasets, AdamW consistently enhances overlap metrics; Dice loss improves Specificity at a negligible cost to Sensitivity; and the sub-two-million-parameter architecture provides competitive performance suitable for real-time, resource-constrained retinal imaging.
Table 9 shows that the LU-Net model demonstrates remarkable computational efficiency, requiring only 1.94 million parameters and 12.21 GFLOPs. A comparison with the baseline U-Net reveals a reduction in parameters and a decrease in floating-point operations while maintaining or exceeding the performance of the MSMA Net and TP-UNET models. These savings make the proposed architecture well-suited for deployment on resource-limited hardware and in large-scale clinical workflows where inference speed and memory footprint are critical.