1. Introduction
Retinal vascular image segmentation [1, 2] aims to distinguish vascular pixels from background pixels in color fundus images, and its results can be used for the classification and auxiliary diagnosis of ophthalmic diseases. Unlike general medical images, fundus images have low vascular contrast, and their vessels take a wide range of structural forms, from simple to complex. Traditional retinal vessel segmentation methods [3] mainly include thresholding, tracking, and filtering methods, which are directly based on the pixel intensity or morphological features of blood vessels and rely heavily on domain knowledge. For example, the method of Hanbay [4] struggles to characterize the contextual information of blood vessels accurately because it depends on features such as color and shape. For this reason, many studies use methods based on deep convolutional neural networks, such as combining multiple U-Net networks [5], to improve the performance of retinal vessel segmentation. In addition, a combination of a coarse segmentation network and a fine segmentation network [6] has been used to predict the retinal probability map from the input patch and then refine the predicted map. However, cascading multiple deep networks in this way increases the computational cost and reduces the overall efficiency of retinal vessel segmentation.
Some individual deep networks focus on obtaining richer feature maps. For instance, Guo et al. [7] introduced an attention map based on the spatial attention mechanism into the U-Net network and calculated the weights of vascular and non-vascular regions to enhance the accuracy of vascular segmentation. To generate complete prediction results and accurately obtain vessels of different sizes and shapes, Wu et al. [8] proposed a scale-aware feature aggregation module to dynamically adjust the receptive field for extracting multi-scale features and designed an adaptive feature fusion module to guide the effective fusion of features between adjacent layers and capture more semantic information. Zhong et al. [9] proposed multi-layer multi-scale dilated convolution to capture sufficient global information under different receptive fields through a cascading mode. These methods have greatly promoted the application of deep convolutional neural networks in retinal vessel segmentation tasks. However, due to the loss of some spatial information during the downsampling process in the classic U-shaped network, the segmentation effect of retinal capillaries, regions around lesions, and fine vessels in low-contrast areas is poor. To address these issues, Li et al. [10] fused edge and spatial features in a dual-encoder to provide enhanced edge information for the decoder and improve the model’s accuracy. Subsequently, Li et al. [11] proposed a combination of interaction fusion blocks, cross-layer fusion blocks, and scale feature fusion blocks to effectively fuse features of the same scale, adjacent scales, and full scales, achieving better segmentation performance. To distinguish vessels from the background, Tan et al. [12] introduced contrastive loss to improve the segmentation of vessels in lesion areas, but the segmentation accuracy of fine vessels was not high due to the lack of spatial information.
The aforementioned methods have achieved remarkable success in retinal vessel segmentation, primarily attributable to the powerful feature learning capabilities of fully supervised deep convolutional neural networks (DCNNs) and the availability of large-scale annotated datasets. However, medical image annotation involves substantial human and material resources. To address this challenge, various weakly supervised and semi-supervised learning approaches have emerged [13, 14, 15, 16, 17]: weakly supervised methods train models using only image-level labels or bounding boxes without requiring pixel-level annotations, whereas semi-supervised methods achieve segmentation with minimal annotated samples. Both paradigms significantly reduce labeling costs, offering viable solutions for medical image analysis.
Motivated by the semi-supervised learning framework, this study aims to minimize pixel-level annotation effort while maintaining segmentation performance comparable to or exceeding that of fully supervised approaches. We propose a semi-supervised method for retinal vessel segmentation based on a pseudo-label refinement strategy. The method integrates multi-criteria filtering mechanisms to selectively preserve high-confidence pixels in pseudo-labels while eliminating low-reliability regions, without compromising the integrity of spatial information. Experimental results on public datasets demonstrate the effectiveness and superiority of the proposed algorithm. An overview of the proposed framework is shown in Figure 1.
To sum up, the main contributions of this paper are:
(1) This paper proposes a symmetry-aware pixel-level refinement-based semi-supervised learning framework for vessel segmentation, which leverages both limited labeled and abundant unlabeled images to train deep neural networks. By incorporating vascular symmetry constraints into the pseudo-label generation process, our approach effectively reduces the labor and resource consumption associated with manual data annotation through its capacity to exploit unlabeled data while preserving anatomical symmetry patterns in retinal vascular structures.
(2) We present a symmetry-preserving pixel-level pseudo-label refinement strategy that systematically eliminates low-confidence pixels in pseudo-label maps, thereby minimizing the interference from erroneous predictions inherent in pseudo-labeling. The proposed refinement mechanism combines multiple screening criteria, including probability-based thresholding, edge detection, image filtering, morphological symmetry enhancement operations, and adaptive thresholding strategies. Each component contributes a distinct function: probabilistic screening ensures confidence calibration, edge detection preserves structural continuity, morphological operations enhance vascular connectivity while maintaining bilateral symmetry, and adaptive thresholding adapts to the intensity changes of different imaging modes.
3. Method
3.1. Framework of the Proposed Method
Our proposed semi-supervised vessel segmentation method operates with limited labeled samples in the training dataset. Inspired by semi-supervised learning principles, the framework leverages multiple pseudo-label refinement strategies to iteratively update the training dataset and optimize the segmentation network.
Figure 1 illustrates the overall framework architecture, which systematically integrates pseudo-label generation, refinement, and model retraining across successive iterations.
We first apply image preprocessing and data augmentation to the images in the existing dataset. The preprocessing step is designed to enhance feature compatibility for subsequent neural network training. The dataset is then partitioned into labeled and unlabeled training subsets. After constructing the semantic segmentation network, the model is trained on the labeled subset to obtain updated network weights. These weights are then used to generate pseudo-labels for the unlabeled training data through forward prediction. The core innovation lies in our pixel-level pseudo-label refinement strategy, which systematically filters unreliable predictions from the pseudo-label maps through multi-criteria screening. The high-quality pseudo-labels obtained through this refinement process are merged with the original labeled dataset to form an augmented training set for network retraining. This iterative cycle (pseudo-label generation → refinement → dataset augmentation → network retraining) is repeated until the predetermined termination criteria are met. Finally, the trained network is evaluated on the independent test set to validate segmentation performance.
3.2. Image Preprocessing
This section focuses on two essential preprocessing stages: image enhancement and data augmentation. The primary objective of image preprocessing is to improve image quality by enhancing clarity and contrast, thereby facilitating vascular feature extraction. Data augmentation techniques are implemented to expand the dataset size, as larger dataset sizes enable neural networks to achieve more precise data fitting and improve model robustness.
The preprocessing and data augmentation process in this section mainly includes the following steps:
(1) Each retinal vessel image was randomly cropped into 400 patches of 96 × 96 pixels, as shown in Figure 2.
(2) Grayscale conversion of retinal vessel images is performed using the following equation:
where grey(m, n) denotes the pixel value at the m-th row and n-th column of the grayscale image after conversion, and red(m, n), green(m, n), and blue(m, n) respectively represent the pixel values of the corresponding red, green, and blue channels in the original color image.
(3) The gray-level histogram of each retinal image is equalized to make the histogram conform to a normal distribution between 0 and 255. The gray-level values of pixels in each image are determined by the following formula:
where IM denotes the original grayscale image, $\mu$ represents the mean of all pixel values in image IM, $\sigma$ denotes the standard deviation of pixel values in IM, IN follows a standard normal distribution with zero mean and unit variance, and IE represents the result of IM after histogram equalization and standardization.
(4) The Contrast-Limited Adaptive Histogram Equalization (CLAHE) algorithm [55] is applied to sub-blocks in the image for image enhancement. Each sub-block has a size of 10 × 10 pixels.
(5) Gamma transformation is applied to all images to enhance image brightness, as defined in Equation (4).
where b denotes the grayscale value of a single pixel in the original image, a represents the grayscale value of a single pixel in the image after Gamma transformation, and $\gamma$ is set to 1.2.
(6) The grayscale values of all pixels in all images are divided by 255, resulting in transformed values between 0 and 1, thereby completing the normalization.
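The preprocessing steps above can be sketched in NumPy as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the grayscale weights (the common ITU-R BT.601 values), the min-max rescaling to [0, 255] after standardization, and the exponent convention 1/γ in the Gamma step are all assumptions, since the corresponding equations are not reproduced in the text. The CLAHE step (4) is omitted to keep the sketch dependency-free, and all function names are hypothetical.

```python
import numpy as np

def random_patches(image, n_patches=400, size=96, rng=None):
    """Step (1): randomly crop n_patches square patches of side `size`."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    tops = rng.integers(0, h - size + 1, n_patches)
    lefts = rng.integers(0, w - size + 1, n_patches)
    return np.stack([image[t:t + size, l:l + size] for t, l in zip(tops, lefts)])

def to_gray(rgb, weights=(0.299, 0.587, 0.114)):
    """Step (2): weighted sum of the R, G, B channels (weights are an ASSUMPTION)."""
    return rgb[..., :3].astype(np.float64) @ np.asarray(weights)

def standardize_to_255(gray):
    """Step (3): zero-mean/unit-variance standardization, then an ASSUMED
    min-max rescaling back to the range [0, 255]."""
    z = (gray - gray.mean()) / (gray.std() + 1e-8)
    return (z - z.min()) / (z.max() - z.min() + 1e-8) * 255.0

def gamma_transform(gray_255, gamma=1.2):
    """Step (5): power-law mapping; using exponent 1/gamma (which brightens
    the image for gamma > 1) is an ASSUMPTION."""
    return 255.0 * (gray_255 / 255.0) ** (1.0 / gamma)

def preprocess(rgb):
    """Steps (2), (3), (5), (6); step (4), CLAHE, is intentionally omitted here."""
    g = to_gray(rgb)
    g = standardize_to_255(g)
    g = gamma_transform(g)
    return g / 255.0  # Step (6): normalize to [0, 1]
```

In practice the patches would be cropped from the image and its label at the same coordinates so that each patch keeps its matching annotation.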
3.3. U-Net Segmentation Model
Current mainstream segmentation networks include FCN [56], U-Net [36], U-Net++ [38], and DeepLab [57]. We conducted fully supervised training with each of these networks, using the same data operations and environment settings for all of them.
Table 1 presents the final test accuracy. As shown in the table, U-Net achieves the highest accuracy among the compared models, and thus U-Net is selected as the baseline network for the proposed method.
We modified the employed U-Net architecture with a symmetry-aware design philosophy. The detailed structure is shown in Figure 3. The encoder path of the symmetric encoder-decoder framework contains six convolutional layers with 3 × 3 kernels and two 2 × 2 max-pooling layers. The decoder comprises four 3 × 3 convolutional layers and two 2 × 2 upsampling layers, maintaining spatial symmetry coherence through mirrored channel expansion and contraction patterns. After each 3 × 3 convolutional layer, a Rectified Linear Unit (ReLU) and a dropout layer are added to prevent overfitting. During downsampling, the number of feature channels is doubled while the spatial dimensions (height and width) are halved to extract deep features, preserving bilateral symmetry in the feature hierarchy. In the upsampling process, the number of channels remains unchanged, and the spatial dimensions are doubled to reconstruct anatomical symmetry patterns. Additionally, a 1 × 1 convolutional layer is appended at the end of the network to map the feature channels to the class outputs. The symmetry-guided skip connections concatenate shallow feature maps directly with their corresponding deep feature maps, integrating shallow and deep information through cross-scale symmetry preservation. This is crucial for recovering fine-grained details while maintaining topological symmetry in vascular structures, further improving segmentation accuracy. The symmetric architecture ensures consistent information flow in both the encoding and decoding phases, enhancing the model’s ability to capture and reproduce the anatomical symmetry patterns inherent in retinal vessel networks.
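To make the layer counts above concrete, the following pure-Python sketch traces the spatial size and channel count through the described architecture for a 96 × 96 input patch. It is a shape calculation only, not a trainable model. The base width of 32 channels and "same" padding are assumptions; the text fixes only the layer counts, kernel sizes, and the doubling and halving pattern.

```python
def unet_shape_trace(size=96, base_channels=32, n_classes=2):
    """Return (stage, height/width, channels) tuples for the described U-Net.

    base_channels=32 and 'same' padding are ASSUMPTIONS for illustration.
    """
    trace, skips = [], []
    s, c = size, base_channels
    # Encoder: three levels of two 3x3 convolutions (6 in total), two poolings.
    for level in range(3):
        trace.append((f"enc{level}: 2 x conv3x3", s, c))
        if level < 2:
            skips.append(c)   # feature map kept for the skip connection
            s //= 2           # 2x2 max-pooling halves height and width
            c *= 2            # the next pair of convolutions doubles the channels
    # Decoder: two upsamplings, each followed by concatenation and two convs (4 in total).
    for level in (1, 0):
        s *= 2                        # 2x2 upsampling, channel count unchanged
        concat = c + skips[level]     # skip connection concatenates encoder features
        c = skips[level]              # the two convolutions mirror the encoder width
        trace.append((f"dec{level}: up + concat({concat}) + 2 x conv3x3", s, c))
    trace.append(("conv1x1", s, n_classes))  # map features to vessel/background outputs
    return trace

for stage in unet_shape_trace():
    print(stage)
```

Under these assumptions the bottleneck works on 24 × 24 maps, and the output has the same 96 × 96 resolution as the input with two class channels.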
3.4. Pseudo-Label Generation and Filtering
Feeding unlabeled data into the trained network architecture and weights yields a two-channel probability matrix, corresponding to vessel and background channels respectively.
Here we denote $x$ as the input pixel value and $y$ as the output class label. The training dataset consists of labeled and unlabeled images. The labeled dataset is $D_L = \{(x_n, y_n)\}_{n=1}^{N}$, and the unlabeled dataset is $D_U = \{x_m\}_{m=1}^{M}$, where $y_n \in \{0, 1\}$ represents the labels for the labeled dataset, $y_n = 1$ denotes vessel structures, $y_n = 0$ denotes background, $N$ is the number of pixels in the labeled dataset, $x_m$ represents the pixel values in the input image, $\hat{y}_m$ is the label for the unlabeled dataset, and $M$ is the number of pixels in the unlabeled dataset.
Our designed pseudo-label pixel-level filtering strategy comprises four components.
3.4.1. Filtering Strategy Based on Output Pixel Probabilities
Let $p_m$ denote the predicted vessel probability of the $m$-th unlabeled pixel $x_m$. If $p_m > T$, $x_m$ is considered more likely to belong to the vessel class; if $p_m < T$, $x_m$ is considered more likely to belong to the background class, where $T$ denotes the threshold. Let $N_v$ be the number of pixels satisfying $p_m > T$ and $N_b$ the number of pixels satisfying $p_m < T$. Let $I$ represent the total number of iterations and $i$ the current iteration count. All probabilities $p_m$ are sorted. The highest-probability pixels, in a number set from $N_v$, $i$, and $I$, are assigned the vessel label (deemed reliable vessel pixels), and the lowest-probability pixels, in a number set from $N_b$, $i$, and $I$, are assigned the background label (deemed reliable background pixels). The remaining pixels are marked as having low reliability and are excluded from subsequent training.
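A minimal NumPy sketch of this probability-based selection is given below. The exact number of pixels kept in each iteration is not recoverable from the text, so the sketch assumes that a fraction i/I of the above-threshold and below-threshold pixels is kept in iteration i. The marker value -1 for discarded pixels and the function name are likewise hypothetical.

```python
import numpy as np

UNRELIABLE = -1  # hypothetical marker for pixels excluded from training

def probability_filter(p, T=0.5, i=1, I=5):
    """Keep the most confident vessel/background pixels of a probability map.

    ASSUMPTION: in iteration i of I, the top (i/I)*N_v pixels become vessel (1)
    and the bottom (i/I)*N_b pixels become background (0).
    """
    p = np.asarray(p, dtype=float)
    flat = p.ravel()
    n_vessel = int((flat > T).sum())        # pixels leaning towards vessel
    n_background = int((flat < T).sum())    # pixels leaning towards background
    k_v = int(n_vessel * i / I)
    k_b = int(n_background * i / I)
    order = np.argsort(flat, kind="stable")  # ascending by probability
    labels = np.full(flat.shape, UNRELIABLE, dtype=int)
    if k_b > 0:
        labels[order[:k_b]] = 0    # lowest probabilities: reliable background
    if k_v > 0:
        labels[order[-k_v:]] = 1   # highest probabilities: reliable vessel
    return labels.reshape(p.shape)
```

With this schedule, early iterations admit only the most confident pixels, and the last iteration admits every pixel on either side of the threshold.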
Because background pixels greatly outnumber vessel pixels during training, the learner’s response to the background gradually intensifies as the number of iterations increases. Consequently, pixels that should belong to vessels are increasingly mispredicted as background, and the predicted vessel probability $p_m$ of these pixels decreases continuously until it falls below the threshold $T$. To address this, we designed a dynamic threshold $T$ through experiments [58], which decreases as $p_m$ of these pixels diminishes. The threshold $T$ is determined by the following formula:
Here, z denotes the number of labeled images in the original unprocessed dataset. (We apply Formula (5) to both the DRIVE and STARE datasets; the only difference is that the value of z is set from the actual number of training samples in each dataset.) The threshold strategy was determined empirically through experiments. It effectively suppresses the gradual decrease of predicted values for certain vessel pixels as the number of iterations increases, thereby improving segmentation accuracy.
3.4.2. Filtering Strategy Based on Edge Detection
In the filtering strategy based on output pixel probabilities, pixels at the boundary between vessels and background often have ambiguous classifications, as their probabilities tend to cluster around the threshold $T$, causing them to be filtered out during the selection process. This leads to the loss of edge information in vessel segmentation, which ultimately compromises the segmentation accuracy.
Therefore, we employ the Sobel operator to perform edge detection on the network’s output probability maps during the filtering process. Let the edge detection result be $e_m$. A pixel is considered an edge pixel if $e_m$ exceeds a preset edge threshold. Let $N_e$ be the number of edge pixels. All values $e_m$ are sorted, and the pixels with the highest $e_m$ are selected. For these pixels, if $p_m > T$, the pseudo-label is set to vessel; if $p_m < T$, it is set to background, where $p_m$ is the predicted vessel probability. For the remaining pixels, the values obtained from the filtering strategy based on output pixel probabilities are retained.
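The edge-based step can be sketched in NumPy as follows. The edge threshold value and the number of top-ranked edge pixels are not recoverable from the text, so `edge_threshold` and `keep_fraction` are hypothetical parameters; the Sobel operator is written out directly to keep the sketch self-contained.

```python
import numpy as np

def sobel_magnitude(p):
    """Gradient magnitude of a 2-D map using the 3x3 Sobel kernels."""
    q = np.pad(np.asarray(p, dtype=float), 1, mode="edge")
    tl, tc, tr = q[:-2, :-2], q[:-2, 1:-1], q[:-2, 2:]
    ml, mr = q[1:-1, :-2], q[1:-1, 2:]
    bl, bc, br = q[2:, :-2], q[2:, 1:-1], q[2:, 2:]
    gx = (tr + 2 * mr + br) - (tl + 2 * ml + bl)   # horizontal gradient
    gy = (bl + 2 * bc + br) - (tl + 2 * tc + tr)   # vertical gradient
    return np.hypot(gx, gy)

def edge_filter(p, labels, T=0.5, edge_threshold=0.5, keep_fraction=1.0):
    """Re-label the strongest edge pixels from their probabilities.

    ASSUMPTIONS: edge_threshold and keep_fraction are placeholders for values
    that are not given in the text; other pixels keep their previous label.
    """
    p = np.asarray(p, dtype=float)
    e = sobel_magnitude(p)
    flat_out = np.array(labels).ravel().copy()
    n_edge = int((e > edge_threshold).sum())       # number of edge pixels
    k = int(n_edge * keep_fraction)
    if k == 0:
        return flat_out.reshape(p.shape)
    idx = np.argsort(e.ravel())[-k:]               # top-k pixels by edge strength
    sel_p = p.ravel()[idx]
    flat_out[idx[sel_p > T]] = 1                   # edge pixel on the vessel side
    flat_out[idx[sel_p < T]] = 0                   # edge pixel on the background side
    return flat_out.reshape(p.shape)
```

The effect is that boundary pixels discarded by the probability-based strategy are restored on both sides of a vessel edge.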
3.4.3. Filtering Strategy Based on Median Filtering
Due to noise in the images, some isolated background pixels are erroneously predicted as vessels in the segmentation results. Therefore, we apply median filtering to the pixel-wise probabilities $p_m$. Let the filtered pixel value be $\tilde{p}_m$. A pixel whose filtered value meets the noise criterion is considered a potential noise point and is excluded from training in subsequent stages. (The parameter of this criterion was set to 0.10, following the work of Gour et al. [59].) The pseudo-labels of the remaining pixels are retained as previously defined.
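One way to realize this step is sketched below in NumPy. The noise criterion itself is not recoverable from the text, so the sketch assumes that a pixel is treated as noise when its probability differs from the 3 × 3 median-filtered value by more than the parameter 0.10. The window size, the criterion, and the marker value -1 are all assumptions.

```python
import numpy as np

def median3x3(p):
    """3x3 median filter with edge replication (window size is an ASSUMPTION)."""
    p = np.asarray(p, dtype=float)
    q = np.pad(p, 1, mode="edge")
    h, w = p.shape
    windows = np.stack([q[r:r + h, c:c + w] for r in range(3) for c in range(3)])
    return np.median(windows, axis=0)

def median_filter_mask(p, labels, delta=0.10, unreliable=-1):
    """Discard pixels whose probability deviates from the local median.

    ASSUMPTION: the criterion |p - median(p)| > delta stands in for the
    condition that is missing from the text.
    """
    p = np.asarray(p, dtype=float)
    out = np.array(labels, copy=True)
    noisy = np.abs(p - median3x3(p)) > delta
    out[noisy] = unreliable   # excluded from subsequent training
    return out
```

An isolated high-probability pixel surrounded by background is flagged, while pixels inside a connected vessel segment are mostly left untouched because their local median stays close to their own value.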
We define a variable $w_m$, where $w_m = 1$ indicates that the pseudo-label of the $m$-th pixel is of high quality and should be retained for subsequent training, while $w_m = 0$ denotes that the pseudo-label is unreliable and should be discarded. Specifically:
Figure 4 displays the retained (white regions, i.e., $w_m = 1$) and discarded (black regions, i.e., $w_m = 0$) pixels obtained through the three filtering strategies across five iterations. The fourth row represents the combined retained or discarded pixels from the first three strategies. The final row shows the pseudo-labels obtained by integrating all three filtering processes, where white regions denote retained vessel pseudo-labels, black regions denote retained background pseudo-labels, and gray regions represent eliminated pseudo-labels during iterations.
3.4.4. Filtering Strategy Based on Erosion
For unlabeled training data, because of the sharp boundary between the circular retinal region in the original image and the peripheral region without vessels, pixels at this boundary are often erroneously predicted as vessels. If these mispredictions are repeatedly reinforced during iterative training, errors accumulate and severely degrade learning performance and model accuracy. To address this issue, we apply symmetry-aware morphological erosion to the provided masks to shrink the central circular region, setting pixels in the central region to 1 and pixels in the peripheral region to 0. (The erosion is performed with the OpenCV function cv2.erode in Python: erosion = cv2.erode(image, kernel, iterations = 1). In this code snippet, image is the input mask image from our dataset and kernel is the structuring element used during erosion; we set the kernel size to 3 × 3 and the number of iterations to 1. Figure 5 shows the comparison images generated by the erosion process; from left to right, they are the predicted result after processing, the mask before processing, and the mask after processing.) This symmetry-preserving mask is then multiplied with the original image and its labels, zeroing out pixels in the peripheral and boundary regions of the original image and labels. Subsequently, the aforementioned filtering strategies are applied with these boundary and peripheral regions excluded from training (i.e., marked as discarded).
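The mask erosion and multiplication can be illustrated with the NumPy stand-in below, which is intended to mimic a 3 × 3, single-iteration erosion of a binary mask without requiring OpenCV. It is a sketch rather than the authors' code; the function names are hypothetical, and padding the border with ones (so that the image border itself does not erode the mask) is an assumption about the desired border handling.

```python
import numpy as np

def erode3x3(mask, iterations=1):
    """Binary erosion with a 3x3 all-ones structuring element."""
    m = (np.asarray(mask) > 0).astype(np.uint8)
    h, w = m.shape
    for _ in range(iterations):
        # Pad with ones so that only zeros inside the mask cause erosion.
        q = np.pad(m, 1, mode="constant", constant_values=1)
        windows = np.stack([q[r:r + h, c:c + w] for r in range(3) for c in range(3)])
        m = windows.min(axis=0)   # a pixel survives only if its whole window is 1
    return m

def apply_fov_mask(image, label, fov_mask):
    """Zero out peripheral and boundary pixels in an image and its label map."""
    eroded = erode3x3(fov_mask)
    return image * eroded, label * eroded, eroded
```

The returned eroded mask can also serve directly as a retention map, so that pixels outside it never contribute to training.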
In summary, we implemented seven distinct retinal image vessel segmentation methods based on different combinations of the pseudo-label filtering strategies described in this section, namely:
(1) Fully supervised vessel segmentation method (SS);
(2) Semi-supervised vessel segmentation method based on conventional self-training (SSS);
(3) Fixed-threshold semi-supervised vessel segmentation method incorporating the pseudo-label filtering strategy based on output pixel probabilities (SSS1);
(4) Fixed-threshold semi-supervised vessel segmentation method combining the pseudo-label filtering strategy based on output pixel probabilities and the erosion-based filtering strategy (SSS2);
(5) Fixed-threshold semi-supervised vessel segmentation method integrating the pseudo-label filtering strategy based on output pixel probabilities, the erosion-based filtering strategy, and the edge detection-based filtering strategy (SSS3);
(6) Fixed-threshold semi-supervised vessel segmentation method combining the pseudo-label filtering strategy based on output pixel probabilities, the erosion-based filtering strategy, the edge detection-based filtering strategy, and the median filtering-based filtering strategy (SSS4);
(7) Dynamic-threshold semi-supervised vessel segmentation method integrating the pseudo-label filtering strategy based on output pixel probabilities, the erosion-based filtering strategy, the edge detection-based filtering strategy, and the median filtering-based filtering strategy (SSS5), which is the improved semi-supervised vessel segmentation method proposed in this paper.
Additionally, because background pixels overwhelmingly outnumber vessel pixels in the images, we balance the training between background and vessels and reduce the dominance of background pixels by directly excluding images with insufficient vessel pixels in their pseudo-labels (images with fewer than 100 predicted vessel pixels) from pixel filtering and subsequent training. As shown in Figure 6, the first row displays pseudo-labels, the second row shows the corresponding ground-truth labels, and white squares indicate excluded pseudo-label images. It can be observed that images with insufficient vessels are indeed excluded, while images with sufficient vessels are retained as pseudo-labels.
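The exclusion rule above amounts to a simple count per pseudo-label image. A minimal sketch, assuming that vessel pixels are encoded as 1 in the pseudo-label map (the function names are hypothetical):

```python
import numpy as np

def keep_patch(pseudo_label, min_vessel_pixels=100):
    """True if the pseudo-label has at least `min_vessel_pixels` vessel pixels."""
    return int((np.asarray(pseudo_label) == 1).sum()) >= min_vessel_pixels

def select_patches(images, pseudo_labels, min_vessel_pixels=100):
    """Drop image/pseudo-label pairs with fewer than 100 predicted vessel pixels."""
    kept = [k for k, lab in enumerate(pseudo_labels)
            if keep_patch(lab, min_vessel_pixels)]
    return [images[k] for k in kept], [pseudo_labels[k] for k in kept], kept
```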
3.5. Update and Retraining of the Labeled Training Set
We merge the aforementioned pseudo-labels and their corresponding original images with the original labeled training set to form a new labeled training set.
Before training the network, we input the original image and its corresponding label into the network. Since our vessel segmentation task is a binary classification (vessel and background), the labels input into the network should be two one-hot encoded matrices: one representing vessels and the other representing the background. The loss function is:
Here, $y_i^{v}$ denotes the $i$-th element value in the one-hot encoded vector corresponding to the vessel label, and $y_i^{b}$ denotes the $i$-th element value in the one-hot encoded vector corresponding to the background label. If filtering is not applied, then exactly one of the two must be 1 and the other must be 0, and the loss function becomes the traditional cross-entropy loss, as shown in Equation (8).
After incorporating the filtering strategy, the designed loss function takes the form shown in Equation (9):
At this point, if a pixel is retained, the loss function remains equivalent to Equation (8). If a pixel is discarded, the loss value becomes 0. According to the principle of backpropagation, such pixels will not contribute to weight updates during neural network training. This achieves dynamic filtering of unreliable pixels during iterative training.
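The behaviour described above (retained pixels contribute the usual cross-entropy, discarded pixels contribute zero) can be illustrated with the NumPy sketch below. It is not a transcription of Equation (9): the retention map `keep` and the averaging over retained pixels are assumptions made for illustration.

```python
import numpy as np

def masked_cross_entropy(probs, onehot, keep, eps=1e-7):
    """Cross-entropy over two-channel one-hot labels with a retention map.

    probs, onehot: arrays of shape (..., 2) (vessel and background channels).
    keep: array of shape (...), 1 for retained pixels and 0 for discarded ones.
    ASSUMPTION: the loss is averaged over the retained pixels.
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    per_pixel = -(np.asarray(onehot, dtype=float) * np.log(probs)).sum(axis=-1)
    keep = np.asarray(keep, dtype=float)
    return float((keep * per_pixel).sum() / max(keep.sum(), 1.0))
```

Because a discarded pixel is multiplied by zero, its prediction has no influence on the loss value and therefore produces no gradient. The same effect is obtained if both one-hot entries of a discarded pixel are set to 0.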
Here, the parameter is derived from Equation (10), and the Adaptive Moment Estimation (Adam) optimizer is employed for parameter updates.
For a more intuitive understanding of the proposed workflow, we summarize the method as pseudocode. Algorithm 1 describes the implementation steps of the proposed method in detail.
Algorithm 1 Improved Semi-Supervised Vessel Semantic Segmentation Method Based on Pixel-Level Filtering Strategy
Require:
    $D_L$: Labeled dataset $\{(x_n, y_n)\}_{n=1}^{N}$, where $N$ is the number of pixels in the labeled dataset;
    $D_U$: Unlabeled dataset $\{x_m\}_{m=1}^{M}$, where $M$ is the number of pixels in the unlabeled dataset;
    $x_n$: The value of the $n$-th pixel in the labeled image;
    $y_n$: The label of the $n$-th pixel in the labeled image;
    $x_m$: The value of the $m$-th pixel in the unlabeled image;
Ensure:
    $\theta$: The final network weight parameters obtained after training;
procedure Initialization
    Preprocess the training datasets $D_L$ and $D_U$; initialize the U-Net network structure; randomly initialize the U-Net network weights $\theta$; initialize the training settings: epoch, batch size, learning rate, dropout ratio, number of iterations;
end procedure
procedure Step 2
    Train the U-Net network using the labeled dataset $D_L$ and update the U-Net network weights $\theta$;
end procedure
procedure Step 3
    Use the updated network weights $\theta$ to predict the unlabeled dataset $D_U$ and obtain the pseudo-labeled dataset;
end procedure
procedure Step 4
    Apply the screening strategy to the pseudo-labeled dataset to filter out unreliable predictions, splitting it into the retained pixels with their pseudo-labels and the discarded pixels with their pseudo-labels;
end procedure
procedure Step 5
    Merge the filtered pseudo-labeled dataset with the original labeled dataset $D_L$ to form the new labeled dataset. Retrain the U-Net network using the new labeled dataset and update the U-Net network weights $\theta$. During training, use the loss function in Equation (9) to exclude the discarded pixels from the training process;
end procedure
procedure Step 6
    while termination condition not met do
        Go back to Step 3;
    end while
end procedure
procedure Step 7
    Obtain the final network model parameters $\theta$.
end procedure
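The control flow of Algorithm 1 can be sketched in Python as a loop over caller-supplied functions. This is a structural sketch only: `train_fn`, `predict_fn`, and `filter_fn` are hypothetical placeholders for the U-Net training, forward prediction, and pixel-level filtering steps. A fixed number of cycles is assumed as the termination condition, and whether retraining continues from the previous weights or starts from scratch is left to `train_fn`.

```python
def self_training(train_fn, predict_fn, filter_fn, labeled, unlabeled, n_iterations=5):
    """Iterative pseudo-label training loop following the steps of Algorithm 1.

    labeled:   list of (image, label, keep_mask) triples.
    unlabeled: list of images without labels.
    train_fn(data, weights) -> weights   (hypothetical training routine)
    predict_fn(weights, image) -> probability map
    filter_fn(prob_map, i, n_iterations) -> (pseudo_label, keep_mask)
    """
    weights = train_fn(labeled, None)                 # Step 2: train on labeled data
    for i in range(1, n_iterations + 1):              # Step 6: repeat until termination
        pseudo = []
        for image in unlabeled:
            prob_map = predict_fn(weights, image)     # Step 3: predict pseudo-labels
            pseudo_label, keep = filter_fn(prob_map, i, n_iterations)  # Step 4: filter
            pseudo.append((image, pseudo_label, keep))
        # Step 5: merge with the labeled set and retrain; discarded pixels are
        # expected to be ignored by the loss through their keep masks.
        weights = train_fn(labeled + pseudo, weights)
    return weights                                    # Step 7: final parameters
```

The pseudo-labels are regenerated from scratch in every cycle, so errors admitted in an early cycle can be corrected later once the network has improved.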
4. Experimental Results and Analysis
4.1. Dataset and Evaluation Metrics
We validate and analyze the proposed method on two public retinal vessel datasets: DRIVE and STARE. The DRIVE dataset [60] contains 40 color retinal images, with 20 images allocated for training and 20 for testing. The STARE dataset [61] consists of 20 original retinal images, equally split between healthy and pathological cases. In our experiments, the first 5 images from each of the healthy and pathological subsets were selected as the training set, while the remaining images formed the testing set. The study used publicly available, de-identified datasets; no additional IRB approval was required.
We employ the following five metrics to evaluate model performance: Accuracy (Acc), Sensitivity (Sen), Specificity (Spe), Dice, and the Area Under the Curve (AUC).
Accuracy (Acc): The proportion of correctly classified pixels among all pixels, which reflects the neural network’s ability to correctly identify vessels and background. This metric is defined in Equation (11):
$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{11}$$
where TP (true positive) represents the positive cases judged to be positive, FN (false negative) represents the positive cases judged to be negative, TN (true negative) represents the negative cases judged to be negative, and FP (false positive) represents the negative cases judged to be positive.
Sensitivity (Sen): The proportion of correctly classified pixels among all actual vessel pixels, which reflects the neural network’s ability to detect vessels. This metric is defined in Equation (12):
$$\mathrm{Sen} = \frac{TP}{TP + FN} \tag{12}$$
Specificity (Spe): The proportion of correctly classified pixels among all actual background pixels, which reflects the neural network’s ability to identify background. This metric is defined in Equation (13):
$$\mathrm{Spe} = \frac{TN}{TN + FP} \tag{13}$$
AUC Metric: Since neural networks output probabilities rather than discrete classification results, different thresholds affect the values of variables in the confusion matrix. Therefore, we generate a set of confusion matrices using varying thresholds and plot the Receiver Operating Characteristic (ROC) curve by taking False Positive Rate (FPR) as the x-axis and True Positive Rate (TPR) as the y-axis. The Area Under the Curve (AUC) is computed to measure classifier quality. A higher AUC (closer to 1) indicates better performance, while AUC = 0.5 corresponds to random guessing. An effective iterative training framework should aim to maximize AUC toward 1.
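For reference, these metrics can be computed from binary masks and probability maps as in the NumPy sketch below. The confusion-matrix metrics follow their standard definitions; the AUC is computed with the rank-sum (Mann-Whitney) identity, which equals the area under the ROC curve, rather than by sweeping thresholds explicitly. The function names are hypothetical, and the sketch assumes that both classes are present.

```python
import numpy as np

def segmentation_metrics(y_true, y_pred):
    """Acc, Sen, Spe and Dice from binary ground truth and binary prediction."""
    t = np.asarray(y_true).astype(bool).ravel()
    p = np.asarray(y_pred).astype(bool).ravel()
    tp = int((t & p).sum())
    tn = int((~t & ~p).sum())
    fp = int((~t & p).sum())
    fn = int((t & ~p).sum())
    return {
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "Sen": tp / (tp + fn),               # fraction of true vessel pixels found
        "Spe": tn / (tn + fp),               # fraction of true background pixels found
        "Dice": 2 * tp / (2 * tp + fp + fn),
    }

def auc_score(y_true, scores):
    """Area under the ROC curve via average ranks (ties share their mean rank)."""
    y = np.asarray(y_true).astype(bool).ravel()
    s = np.asarray(scores, dtype=float).ravel()
    order = np.argsort(s, kind="mergesort")
    sorted_s = s[order]
    ranks = np.empty(len(s))
    i = 0
    while i < len(s):
        j = i
        while j + 1 < len(s) and sorted_s[j + 1] == sorted_s[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1   # mean 1-based rank of the tie group
        i = j + 1
    n_pos, n_neg = int(y.sum()), int((~y).sum())
    return float((ranks[y].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))
```

The pure-Python tie loop is kept for clarity; for full-resolution images a vectorized or library implementation would be preferable.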
4.2. Parameter Settings
All experiments were conducted on a single NVIDIA GeForce RTX 2080 Ti GPU. (NVIDIA is a multinational technology company headquartered in Santa Clara, CA, USA and it designs and develops the GeForce RTX series of GPUs, including the RTX 2080 Ti.) The code was implemented in Python 3.12 and trained using the TensorFlow framework.
In this experiment, we randomly initialized the U-Net model without any pretraining. The Adaptive Moment Estimation (Adam) optimizer was employed for model optimization. Unlike plain stochastic gradient descent (SGD), which applies a single global learning rate to all parameters, Adam adapts the step size for each parameter during training. Each iteration involved training on the full dataset for 60 epochs, with a batch size of 32 images. To prevent overfitting, a dropout rate of 20% was applied between consecutive convolutional layers using 3 × 3 kernels. Additionally, 10% of the input images were allocated as the validation set. All input images underwent preprocessing and data augmentation. For the DRIVE dataset, each training image was randomly cropped into 400 patches of size 96 × 96 pixels [62]. The 20 test images from DRIVE were reserved for final evaluation. A total of 5 iterative cycles were performed, with 20% of unlabeled pixels and their corresponding pseudo-labels incorporated into the training process in each cycle.
4.3. Quantitative Experimental Results and Analysis
4.3.1. Combined Filtering Strategies for Vessel Segmentation Methods Comparison
The primary objective of this experiment is to evaluate vessel segmentation methods based on combinations of pseudo-label filtering strategies on the DRIVE dataset. These methods include the seven approaches introduced in Section 3.4 (SS, SSS, SSS1, SSS2, SSS3, SSS4, SSS5). The DRIVE dataset used in this study contains 20 training and 20 test images. In all semi-supervised methods, the training set is partitioned into a labeled subset (n images) and an unlabeled subset (20 − n images). In the SS method experiments, only the n labeled images were used as the training set for fully supervised training. The four evaluation metrics (Acc, Sen, Spe, AUC) are presented in Table 2, where two experimental results are reported: (1) n = 1, labeled image ratio = 5%; (2) n = 2, labeled image ratio = 10%.
From Table 2, the following conclusions can be drawn:
(1) Semi-supervised methods outperform fully supervised approaches;
(2) When only output probability-based filtering is applied, overall performance fails to improve due to the lack of vessel boundary information, misclassification of circular region boundaries, and accumulated noise during iterative training;
(3) The integration of erosion-based, edge detection-based, and median filtering-based strategies leads to significant performance gains, validating the effectiveness of these filtering mechanisms;
(4) The incorporation of dynamic thresholding further enhances performance, demonstrating the importance of adaptive thresholding and confirming the validity of the proposed semi-supervised framework with multi-strategy filtering.
To further evaluate the segmentation performance of each filtering algorithm, we also conducted tests on the DRIVE dataset. These methods include the four filtering strategies introduced in the paper and their various combinations. The DRIVE dataset used in this study consists of 20 training images and 20 test images. In all semi-supervised methods, the training set is divided into a labeled training set (n images) and an unlabeled training set (20 − n images). In our experiments, only 2 labeled images (10% of the training images) are used as the training set for the fully supervised method. We present the comparative experimental results in Table 3. It can be observed from Table 3 that there is a significant gap in the Acc, Spe, and Dice metrics between the individual filtering strategies and the combined Strategy 1 + 2 + 3 + 4. Furthermore, as more strategies are combined, the overall segmentation performance continues to improve. Strategies 1, 2, 3, and 4 correspond, in order, to the four strategies described in the paper: probability, edge, median, and erosion.
4.3.2. Comparison of Segmentation Performance Across Varying Numbers of Labeled Samples
In this experiment, we compared the improved semi-supervised segmentation method, the semi-supervised segmentation method, and the typical fully supervised segmentation method. For fair comparison, we used the same U-Net architecture, loss function, data preprocessing, and postprocessing procedures across all supervised, semi-supervised, and improved semi-supervised segmentation methods. In the DRIVE training set, we randomly selected n retinal images as labeled samples and the remaining 20-n images as unlabeled samples. In fully supervised learning, only the n labeled retinal images were used to train the U-Net network. In semi-supervised and improved semi-supervised learning, both n labeled and 20-n unlabeled retinal images were utilized for training.
We report accuracy (Acc), sensitivity (Sen), specificity (Spe), and AUC for n = 1 to 18. For each n, ten distinct experiments were conducted; in each experiment, n retinal images were randomly selected as labeled samples and the remaining 20-n images served as unlabeled samples. Table 4 summarizes the average performance of the SS, SSS, and SSS5 methods across these ten experiments. Table 4 shows that the SSS5 method outperforms the SSS method, and the SSS method outperforms the SS method, validating the effectiveness of the improved semi-supervised learning framework for retinal vessel segmentation.
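The sampling protocol described above can be sketched as follows. This is a minimal illustration, assuming the 20 DRIVE training images are indexed 0 to 19; the function names and the seeding scheme are hypothetical and only serve to make the ten repetitions per n reproducible.

```python
import random


def split_labeled_unlabeled(image_ids, n_labeled, seed):
    """Randomly pick n_labeled images as labeled; the rest are unlabeled."""
    rng = random.Random(seed)
    labeled = sorted(rng.sample(image_ids, n_labeled))
    labeled_set = set(labeled)
    unlabeled = [i for i in image_ids if i not in labeled_set]
    return labeled, unlabeled


def make_protocol(num_train=20, n_values=range(1, 19), repeats=10):
    """Build `repeats` random splits for every n in n_values."""
    image_ids = list(range(num_train))
    return {
        n: [
            # Hypothetical seeding scheme: one distinct seed per (n, repeat).
            split_labeled_unlabeled(image_ids, n, seed=1000 * n + r)
            for r in range(repeats)
        ]
        for n in n_values
    }
```

Under this sketch, the fully supervised baseline would train only on the `labeled` list of each split, while the semi-supervised variants would also use the `unlabeled` list.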
For clarity, the experimental results in Table 4 are visualized as boxplots in Figure 7. The Y-axis represents the evaluation metrics, while the X-axis denotes the number of labeled images (n = 1 to 18). Each box summarizes the distribution of the ten experimental results obtained for one value of n. From top to bottom, each box shows the upper extreme, upper quartile, median, mean, lower quartile, and lower extreme.
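The six statistics shown in each box can be computed from the ten per-run scores as in the sketch below. It assumes that the upper and lower extremes are the maximum and minimum of the runs and that quartiles use inclusive linear interpolation; the plotting tool used for Figure 7 may follow a different whisker or quartile convention. The function name is hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Return the statistics of one box, listed from top to bottom."""
    # Inclusive quartiles treat the data as the whole population of runs.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "upper_extreme": max(values),
        "upper_quartile": q3,
        "median": med,
        "mean": statistics.mean(values),
        "lower_quartile": q1,
        "lower_extreme": min(values),
    }
```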
The boxplots lead to the following conclusions:
(1) The SSS5 method consistently outperforms the SSS method across all n values, and the SSS method consistently outperforms the SS method, which validates the superior performance and generalization capability of the improved semi-supervised learning framework.
(2) When n is small, our improved semi-supervised learning framework (SSS5) significantly enhances vessel segmentation performance in retinal images. Thus, even with minimal labeled samples, the method achieves high performance. In our semi-supervised framework, when n exceeds 8, segmentation performance plateaus without substantial improvement as n increases. Therefore, labeling only 8 of the 20 training images yields satisfactory segmentation results, reducing labeling costs by 60%.
4.3.3. Performance Evolution of the Proposed Semi-Supervised Algorithm with Increasing Iterations
In this experiment, our objective is to analyze how the performance of the SSS5 method evolves with increasing training iterations. We recorded the experimental results of SSS5 under the condition of only one labeled image (n = 1). Ten random experiments were conducted, and the boxplot in Figure 8 illustrates the distribution of these ten results.
In the boxplot, the vertical axis represents the evaluation metrics, while the horizontal axis denotes the training iterations, where “S” denotes supervised learning and the numbers 1–5 indicate specific iteration stages. The experimental results show that the mean values of accuracy (Acc) and sensitivity (Sen), the two most critical metrics, gradually increase with the number of iterations. This leads to the conclusion that our improved semi-supervised learning framework enhances segmentation performance as iterations progress, indicating its convergence and stability.
4.3.4. Qualitative Experimental Analysis
To further evaluate the proposed SSS5 improved semi-supervised algorithm, we conducted qualitative comparisons between the supervised method and several improved semi-supervised algorithms, as shown in Figure 9. The qualitative comparison results demonstrate the role of each filtering strategy. In this experiment, we set the labeled training set to 1 image and the unlabeled training set to 19 images. The qualitative comparison is divided into five steps. After each step, we provide the qualitative test results and zoomed-in views. On the left is an original retinal image from the test set, along with its ground-truth annotation and segmentation mask.
Step 1: Test results after supervised training using only the labeled data.
Step 2: Test results after training with the improved semi-supervised method SSS1. As shown in the figure, compared to the supervised method, SSS1 predicts a more complete vascular tree, connecting the fragmented segments predicted by the fully supervised method.
Step 3: Test results after training with the improved semi-supervised method SSS2. As shown in the figure, this method effectively eliminates the circular region boundary mispredictions present in SSS1.
Step 4: Test results after training with the improved semi-supervised method SSS3. Because SSS2 lacks edge information, it erroneously predicts thin vessels as thick vessels. After an edge-aware filtering strategy is incorporated in SSS3, the network learns rich vascular boundary details, resolving the overestimation of thin vessel thickness.
Step 5: Test results after training with the improved semi-supervised method SSS4. During the SSS3 iterations, noise accumulates progressively, leading to increased artifacts in testing. Adding median filtering in SSS4 effectively mitigates this issue.
The qualitative analysis demonstrates that the four filtering strategies we introduced are meaningful and each contributes uniquely to improving prediction results. After applying each filtering strategy, distinct improvements in segmentation accuracy are observed. The best prediction performance is achieved when all four filtering strategies are combined.
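To illustrate how median filtering can suppress the isolated noise described in Step 5, the following is a minimal sketch that applies a 3×3 median filter to a binary pseudo-label mask. The window size, the border handling, and the function name are assumptions made for illustration; this section does not state the settings actually used.

```python
import numpy as np


def median_filter_3x3(mask):
    """Apply a 3x3 median filter to a 2-D binary mask (borders replicated)."""
    padded = np.pad(mask, 1, mode="edge")
    h, w = mask.shape
    # Stack the nine shifted views that make up each pixel's neighbourhood,
    # then take the per-pixel median across them.
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    return np.median(windows, axis=0).astype(mask.dtype)
```

In this sketch an isolated false-positive pixel is removed because fewer than five of its nine neighbours are set. The same rule also removes one-pixel-wide lines, so a filter of this kind has to be balanced against the preservation of thin vessels.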
4.3.5. Comparison with Existing Methods
To further validate the overall performance of the proposed pseudo-label filtering semi-supervised method, we conducted comparative experiments on multiple public datasets against state-of-the-art vessel segmentation models.
In the experiments, we used the results with 50% labeled training samples as the baseline for comparative analysis.
Figure 10 shows partial retinal vessel segmentation results on the DRIVE dataset. The first column displays the input images. The second column presents the segmentation results of our method, the third column shows results based on U-Net++, and the fourth column illustrates results from the DeepLab method. Figure 10 shows that our semi-supervised vessel segmentation method achieves superior vessel segmentation, particularly in distinguishing thin vessels, compared to the classical fully supervised U-Net++ method. This improvement arises from the adoption of a pseudo-label filtering training strategy, which avoids generating low-quality pseudo-labels during model training. Additionally, the slicing operation expands the training samples, enabling the network to accurately identify fine vessels.
Figure 11 presents partial retinal vessel segmentation results on the STARE dataset. The first column displays the input images. The second column presents the segmentation results of our method, the third column shows results based on U-Net++, and the fourth column illustrates results from the DeepLab method. As shown in Figure 11, our method and the fully supervised U-Net++ method produce comparable segmentation outcomes, and both segment the vessels correctly.
For some of the more severely affected retinal images in the STARE dataset, we have also included segmentation results in Figure 12. In this figure, the first row shows three grayscale retinal input images, the second row presents the corresponding ground truths, and the third row displays the segmentation results achieved by our proposed method. For these severely affected retinal images, where the vessel pixels are distorted by interference from lesion areas, our method can still segment the vessels. However, the overall segmentation performance on these diseased images is not as good as on healthy images.
Table 5 and Table 6 quantitatively compare the proposed semi-supervised method with existing vessel segmentation approaches on the DRIVE and STARE datasets, respectively. The “Category” column in the tables specifies the category of each method: “S” for fully supervised, “U” for unsupervised, and “SS” for semi-supervised methods. As Table 5 shows, our semi-supervised method achieves the highest Accuracy (Acc), Specificity (Spe), and AUC among the semi-supervised and unsupervised methods. It also surpasses most fully supervised methods in Acc but underperforms CE-Net and Park et al.’s method in Sen. CE-Net introduces complex modules with dilated convolutions to expand the receptive field, enhancing real vessel detection and achieving a Sen of 83.09%. Park et al. construct an attention mechanism based on GANs, improving thin vessel recognition rates. As shown in Table 6, our method again attains the top Accuracy (Acc) and Specificity (Spe) in the semi-supervised and unsupervised categories and approaches the performance of fully supervised methods. Overall, our method, which is trained with only partially labeled datasets and employs a pseudo-label filtering strategy for retinal vessel segmentation, outperforms most fully supervised approaches in Acc on DRIVE.
4.4. Discussion
Based on the above experimental results, the following conclusions can be drawn:
(1) From the experiments on different filtering strategies: the segmentation performance of the semi-supervised method SS surpasses that of the fully supervised method S. Using only pixel-level filtering yields suboptimal results. However, combining all pseudo-label filtering strategies leads to the best vascular segmentation performance (SS5), outperforming the traditional self-training semi-supervised method SS. These experiments validate the effectiveness of the proposed filtering strategies.
(2) From the experiments with varying training samples: the proposed pseudo-label filtering semi-supervised vessel segmentation method (SS5) demonstrates excellent generalization. When trained with limited pixel-level labels, it significantly improves segmentation accuracy. With more than 8 labeled samples, SS5 achieves performance comparable to fully supervised methods, reducing labor and resource costs in pixel-level annotation by 60%.
(3) From qualitative experiments: each pseudo-label filtering strategy contributes meaningfully. Different strategies improve pseudo-label quality to varying degrees, while the combination of multiple strategies (SS5) achieves the best vascular segmentation results.
(4) From comparisons with existing methods: the proposed pseudo-label filtering semi-supervised retinal vessel segmentation method (SS5) achieves strong performance with limited labeled samples. It surpasses most fully supervised methods in Accuracy (Acc) and other metrics on the DRIVE and STARE datasets. These results demonstrate its robustness and generalization, enabling SS5 to assist physicians in vascular segmentation research.
(5) Our method fully utilizes unlabeled samples. While slight performance gaps remain compared to some advanced fully supervised methods, SS5 significantly reduces the labor and resource costs caused by the need for large-scale labeled datasets in traditional supervised approaches.
4.5. Limitations
The proposed method achieves good results, but it still has some limitations. To obtain effective pseudo-labels, we add multiple kinds of label information to a single image, which indirectly increases the code execution time. In the future, we will focus on reducing this label information, enabling the model to achieve satisfactory performance even with only a single-label image or image-level weak labels. Additionally, the segmentation performance varies across different datasets. We aim to improve the model’s generalization ability so that it can achieve good results not only on retinal images but also on industrial and natural scene images.