Semi-Supervised Learning Method of U-Net Deep Learning Network for Blood Vessel Segmentation in Retinal Images

Abstract: Blood vessel segmentation methods based on deep neural networks have achieved satisfactory results. However, these methods are usually supervised learning methods, which require large numbers of retinal images with high-quality pixel-level ground-truth labels. In practice, the task of labeling these retinal images is very costly, financially and in human effort. To deal with these problems, we propose a semi-supervised learning method which can be used for blood vessel segmentation with limited labeled data. In this method, we use an improved U-Net deep learning network to segment the blood vessel tree. On this basis, we implement a U-Net network-based training dataset updating strategy. A large number of experiments are presented to analyze the segmentation performance of the proposed semi-supervised learning method. The experimental results demonstrate that the proposed methodology is able to avoid the problem of insufficient hand-labels and achieve satisfactory performance.


Introduction
The analysis of retinal images is helpful for detecting many diseases including ophthalmopathy, diabetes and cardiovascular disease [1]. Automating this analysis process reduces the costs associated with human effort and avoids the problems caused by manual work. In these retinal image analysis tasks, the automated segmentation of blood vessels is the crucial part. The task of segmenting the blood vessels aims to obtain the complete blood vessel tree in the retinal image. The blood vessel tree can be used to achieve many tasks, including retinal blood vessel quantification [1], retinal image registration [2], retinal mosaic [3] and biometric identification [4].
The automated segmentation of blood vessels is a challenging task in the intelligent analysis of retinal images because the size and shape of blood vessels vary hugely in different locations [5,6]. An increasing number of scholars have proposed improved blood vessel segmentation methods. So far, these methods can be divided into three categories: image-processing based, machine-learning based and deep learning based blood vessel segmentation methods.
The image-processing based blood vessel segmentation methods usually design many segmentation rules according to the features of blood vessels, and then realize the automated segmentation of the blood vessels according to the designed rules. These methods can be divided into many sub-classes, such as matched filter response, blood vessel tracking, morphological approaches, and vessel model based approaches. The matched filtering approaches calculate the matched filter response (MFR) of the retinal image, and then segment the blood vessel tree by using the corresponding

Overview
Deep learning methods have achieved satisfactory performance in medical image segmentation. However, these methods mainly depend on the number and quality of the labeled medical images. If the labeled medical images are insufficient or of low quality, it is difficult to achieve a satisfactory segmentation result. In practice, unlike natural scene images, labeling these medical images is very costly in terms of finance and human effort. To solve this problem, we propose a novel semi-supervised learning method for segmenting the blood vessel tree. The main purpose of this method is to achieve satisfactory performance using a limited number of labeled retinal images.
The framework of our proposed semi-supervised learning method is shown in Figure 1. It includes two parts: parameter learning and dataset updating. In the parameter learning part, we first use a small number of labeled blood vessel data to train the improved U-Net deep neural network. Then, we use the designed dataset updating strategy to update the old dataset. These steps are repeated until the training accuracy meets the requirement. In this way, we use a small amount of labeled data and a large number of unlabeled data to achieve the task of blood vessel tree segmentation. In the next section, the proposed method is described in detail.
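The alternation between parameter learning and dataset updating described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `train_unet`, `pseudo_label`, and the stopping threshold `target_acc` are hypothetical stand-ins for the routines detailed in the following sections.

```python
def fit(labeled, unlabeled, train_unet, pseudo_label,
        target_acc=0.95, max_rounds=10):
    """Alternate parameter learning and dataset updating until the
    training accuracy meets the requirement (or max_rounds is hit)."""
    dataset = list(labeled)
    model, acc = train_unet(dataset)       # parameter learning
    rounds = 0
    while acc < target_acc and rounds < max_rounds:
        # Dataset updating: pseudo-label the unlabeled images
        # with the current model and merge with the labeled set.
        extra = [(x, pseudo_label(model, x)) for x in unlabeled]
        dataset = list(labeled) + extra
        model, acc = train_unet(dataset)   # re-learn the parameters
        rounds += 1
    return model
```

Here `train_unet` is assumed to return both the fitted model and its training accuracy, so the loop can stop as soon as the accuracy requirement is met.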



Parameter Learning Strategy
Blood vessel segmentation is the key to realizing an automated retinal image analysis system. In order to achieve satisfactory segmentation results, we often need to perform complex feature engineering and select an appropriate classifier. This task is very costly in terms of both time and human effort. To solve this problem, we used the U-Net deep network to segment the blood vessel tree, which was able to learn the features of the given retinal image automatically and avoid the problems caused by complex feature engineering.
The U-Net deep network contains numerous unknown parameters. The parameter learning process aims to estimate these unknown parameters by using the given training dataset. A typical U-Net consists of two parts: an encoder architecture and a decoder architecture. In our paper, we used an improved U-Net to segment the retinal blood vessel tree. The detailed network structure was as follows. In the encoder part, we extracted latent features from the given retinal images and down-sampled the resolution by using the pooling technique. The encoder part included two 3 × 3 convolutional layers and one 2 × 2 max pooling layer. Each convolutional layer was followed by a rectified linear unit (ReLU). In the down-sampling step, the number of feature channels was doubled.
In the decoder part, we reconstructed the feature map by using the up-sampling technique, and recovered the detailed information of the object by using skip connections. The decoder part consisted of one 2 × 2 up-sampling layer, two 3 × 3 convolutional layers and one 1 × 1 convolutional layer. The nearest-neighbor interpolation technique was used in the 2 × 2 up-sampling layer. Each 3 × 3 convolutional layer was followed by a ReLU. The 1 × 1 convolutional layer mapped each 32-component feature vector to the target category.
In addition, in order to increase the accuracy of the parameter learning method, we used skip connections and the dropout regularization technique [72] in the training process. Skip connections help preserve the details of the retinal image, and dropout helps avoid the over-fitting problem.
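The shape bookkeeping implied by the layers above can be sketched as follows. This is an illustrative sketch, not the network itself; the `pad` default assumes "same" padding for the 3 × 3 convolutions, which the text does not state explicitly.

```python
def conv3x3(h, w, pad=1):
    """3x3 convolution, stride 1: each spatial dim shrinks by 2 - 2*pad."""
    return h + 2 * pad - 2, w + 2 * pad - 2

def maxpool2x2(h, w):
    """2x2 max pooling, stride 2: halves each spatial dimension."""
    return h // 2, w // 2

def upsample2x2(h, w):
    """2x2 nearest-neighbour up-sampling: doubles each spatial dimension."""
    return h * 2, w * 2

def encoder_block(h, w, c):
    """One encoder block as described in the text: two 3x3 convs,
    one 2x2 max pool, and the channel count doubled."""
    h, w = conv3x3(*conv3x3(h, w))
    h, w = maxpool2x2(h, w)
    return h, w, 2 * c
```

For example, a 64 × 64 feature map with 32 channels would leave this block as a 32 × 32 map with 64 channels.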
Let x_i represent the i-th input retinal image. The loss function for training the U-Net is the standard binary cross-entropy loss [35], defined as

L(θ) = −(1/N) Σ_{i=1}^{N} [ y_i log f_θ(x_i) + (1 − y_i) log(1 − f_θ(x_i)) ],    (1)

where f_θ(x_i) is the predicted label of the i-th input image and y_i is the ground truth. The U-Net model parameters θ can be estimated by solving the following optimization problem:

θ* = arg min_θ L(θ).    (2)

The Adaptive Moment Estimation (Adam) optimization algorithm [73] was used to solve this optimization problem.
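As a concrete illustration, the binary cross-entropy loss described above can be computed for a flat list of per-pixel predictions. This is a minimal sketch, not the training code; the clamping constant `eps` is an assumption added for numerical stability.

```python
import math

def bce_loss(preds, labels, eps=1e-7):
    """Mean binary cross-entropy over predictions p_i = f_theta(x_i)
    and ground-truth labels y_i in {0, 1}."""
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(preds)
```

For instance, predicting 0.5 for every pixel yields a loss of log 2 regardless of the labels.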

Dataset Updating Strategy
The aim of the semi-supervised learning method proposed in this paper is to obtain a satisfactory blood vessel tree by using a small amount of labeled data. In practice, we can easily obtain a large number of unlabeled retinal images. In order to make full use of these images, we implement a dataset updating strategy, as shown in Figure 1. The dataset updating strategy aims to label the unlabeled images automatically and increase the number of labeled images, which is helpful for estimating the network parameters. The specific implementation process is as follows.
Suppose that the given training dataset is D = {(x_n, y_n), x*_m ; n = 1:N, m = 1:M} and D = D_1 ∪ D_2. The subset D_1 = {(x_n, y_n); n = 1:N} is the given labeled dataset and the subset D_2 = {x*_m ; m = 1:M} is the given unlabeled dataset. Here, x_n represents the n-th retinal image and y_n represents the n-th ground-truth label. x*_m is the m-th retinal image without ground truth. For convenience, we let f_θ represent the U-Net model, where the parameter θ is estimated by solving Equation (2). Let y*_m represent the predicted label of x*_m. Then, we can obtain y*_m by

y*_m = f_θ(x*_m).    (3)

Next, we can update the training dataset D by

D* = D_1 ∪ {(x*_m, y*_m); m = 1:M}.    (4)

The updated dataset D* will be used as input to recompute the U-Net model parameters θ. To summarize, the pseudo-code of our proposed semi-supervised learning method is shown in Algorithm 1:

Algorithm 1 Semi-supervised learning method for blood vessel tree segmentation
Input: Training dataset D = {(x_n, y_n), x*_m ; n = 1:N, m = 1:M}, retinal image x.
Output: The blood vessel tree image y.
STEP 1: Initialization:
• Learning method: epoch, batch size, learning rate, weight decay rate.
• Initial training set D.
• Initial U-Net model.
STEP 2: Update the U-Net model parameters θ by solving Equation (2).
STEP 3: Predict the pseudo-labels y*_m using Equation (3).
STEP 4: Update the training dataset to D* using Equation (4).
STEP 5:
• while a stopping criterion is not met do
•   Update the U-Net model parameters θ using the updated training dataset D*;
•   Return to STEP 3;
• end while
STEP 6: Take x as input and compute y using Equation (3).
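The dataset updating step (Equations (3) and (4)) can be sketched in isolation. This is an illustrative stand-in, assuming `f_theta` is a callable that maps an unlabeled image to its pseudo-label:

```python
def update_dataset(f_theta, labeled, unlabeled):
    """Pseudo-label every unlabeled image x*_m with the current model
    (Equation (3)) and merge the pairs into the labeled set (Equation (4))."""
    pseudo = [(x, f_theta(x)) for x in unlabeled]  # y*_m = f_theta(x*_m)
    return list(labeled) + pseudo                  # D* = D_1 ∪ {(x*_m, y*_m)}
```

The returned dataset D* is then fed back into the parameter learning step of Algorithm 1.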

Datasets
In the experiments, we use the well-known public DRIVE dataset, which consists of 40 retinal images and their corresponding ground-truth labels and ground-truth masks. The size of each fundus image is 565 × 584. We use 20 fundus images as the training dataset to train the improved U-Net, and the remaining 20 fundus images as the test dataset to evaluate the performance of the trained U-Net. As shown in Figure 2, (a) is a given fundus image, (b) is the ground-truth label, and (c) is the ground-truth mask.




Implementation Details
In the training process of U-Net, we used the Adam (Adaptive Moment Estimation) algorithm to optimize the network parameters, with a fixed learning rate γ = 0.001, a first-order moment exponential decay rate β_1 = 0.9, and a second-order moment exponential decay rate β_2 = 0.999. The training process stops when the number of epochs exceeds 100. The network parameters were randomly initialized. The dropout technique was used in the improved U-Net, with a dropout rate of 0.2.
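For reference, a single Adam update with these hyper-parameter values can be sketched for one scalar parameter. This is a generic illustration of the optimizer [73], not the authors' code; `eps` is the usual numerical-stability constant.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter theta at step t >= 1."""
    m = b1 * m + (1 - b1) * grad        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2   # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)           # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

With these settings the very first step moves the parameter by approximately the learning rate, since the bias-corrected moments cancel the gradient's magnitude.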

Evaluation Metrics
In experiments, we use the accuracy (ACC), which has been widely used in the literature, to evaluate our proposed method f_θ. Let D = {(I_n, y_n)} represent the test dataset. The ACC over the K pixels of a segmentation result can be calculated by the following formula:

ACC = (1/K) Σ_{i=1}^{K} 1(m_i = m^t_i),

where 1(·) is the indicator function, m_i is the predicted result for the i-th pixel and m^t_i is the corresponding ground truth.
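The metric can be sketched as a simple pixel-wise comparison over flattened label lists. This is an illustrative implementation, not the authors' evaluation code:

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label m_i equals the
    ground truth m_i^t (the indicator function summed and averaged)."""
    assert len(pred) == len(truth)
    correct = sum(1 for p, t in zip(pred, truth) if p == t)
    return correct / len(pred)
```

For example, a prediction agreeing with the ground truth on 3 of 4 pixels scores an ACC of 0.75.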

Quantitative Performance Comparison
In these experiments, we aimed to analyze the segmentation performance of the proposed method. In order to verify the effectiveness of the proposed semi-supervised method, we compared it with the supervised method. The ACC comparison results between the supervised and semi-supervised methods are shown in Table 1. In this experiment, we randomly divided the public DRIVE training dataset (20 fundus images) into an A-group and a B-group. The A-group included n labeled fundus images, and the B-group included 20−n unlabeled fundus images. As shown in the first column of Table 1, the parameter n = 4:19. In the supervised experiment, we used the n labeled fundus images (A-group) to train the deep neural network and assessed the performance of the trained model. In the semi-supervised experiment, we used the n labeled (A-group) and 20−n unlabeled (B-group) fundus images to train the deep neural network and then assessed the performance of the trained model. In addition, to ensure reliable experimental results, we repeated each experiment ten times. In Table 1, the first row indicates the experiment number, each column gives the experimental results for the different numbers of labeled images, and the last column gives the mean ACC over the ten experiments. From the table, we can observe: (1) as the parameter n increased, the segmentation accuracy of both the supervised and semi-supervised methods increased, which is consistent with the fact that more labeled images help improve the accuracy of parameter estimation in a deep neural network model; (2) when n = 5:17, the accuracy of the semi-supervised method was better than that of the supervised method. These experimental results verified the effectiveness of our proposed semi-supervised method: a satisfactory segmentation result could be obtained by labeling a small number of fundus images.
For easy observation, the box plot of the experiment results obtained by the proposed semi-supervised method is shown in Figure 3, from which the distribution of the experimental results can be observed. From top to bottom, the observed statistical characteristics are the upper extreme, upper quartile, median, lower quartile and lower extreme. In Figure 3, the Y-axis is the accuracy (ACC) and the X-axis is the number of labeled images (n = 4:19). The circular signs represent outliers, the red line represents the median, and the green curve shows the ACC as a function of the number of labeled images.
From the figure, we can observe: (1) with the increase in the number of labeled images, the segmentation accuracy increased obviously; (2) when there were more than ten labeled images, the increase in segmentation accuracy was not obvious. Therefore, we can conclude that satisfactory segmentation accuracy can be obtained by labeling ten fundus images. The proposed semi-supervised method can reduce the cost of labeling retinal images in terms of both finance and human effort.
In order to further verify the effectiveness of this algorithm, we compared our proposed semi-supervised method with state-of-the-art methods. The ACC and AUC (area under curve) comparison results on the DRIVE dataset are shown in Table 2. In the table, we listed 46 state-of-the-art methods, including their type, year, and evaluation metrics. In the second column, the symbol * represented the image-processing based methods, the symbol + represented the machine learning based methods, and the symbol ♦ represented the deep learning based methods. In the second row, we listed the accuracy result obtained by a human observer. In the last row, we listed the accuracy result obtained by our proposed semi-supervised method using eleven labeled fundus images; the number in brackets is the number of labeled images.

Table 2. ACC and AUC comparison with state-of-the-art methods on the DRIVE dataset. The symbol * represents the image-processing based method, the symbol + represents the machine learning based method, and the symbol ♦ represents the deep learning based method.

From the table, we can observe: (1) The ACC and AUC results obtained by our proposed semi-supervised method outperformed the results obtained by the human observer and many state-of-the-art methods.
(2) The proposed semi-supervised method only used eleven labeled fundus images, which meant that our method could save half of the cost of labeling the fundus images in terms of both finance and human effort. Therefore, we can conclude that the proposed semi-supervised method can obtain a state-of-the-art performance and avoid the problems of insufficient hand-labels.

Qualitative Performance Comparison
To further evaluate the proposed semi-supervised method, Figure 4 shows the qualitative performance comparison between the semi-supervised and the supervised method. The original fundus image is shown in Figure 4a and its ground truth is shown in Figure 4b. The segmentation results of the proposed semi-supervised method using 8, 12 and 15 labeled images are shown in Figure 4c-e, and the segmentation results of the supervised method using 8, 12 and 15 labeled images are shown in Figure 4g-i. For convenient observation, the locally enlarged segmentation maps obtained by the semi-supervised and supervised methods are shown in Figure 4f,j. From Figure 4f,j, we can see that the proposed semi-supervised method obtained the complete retinal blood vessel tree, while the supervised method did not. Therefore, we can conclude that the proposed semi-supervised method can achieve satisfactory segmentation performance.


Convergence Analysis
This experiment aims to verify the convergence of the proposed semi-supervised method. In this experiment, we carried out ten experiments. Based on the results of these ten experiments, we drew the box plot of the loss curve with time for the proposed semi-supervised method, as shown in Figure 5. The box plot was used to reflect the statistical characteristics of the loss curve. In the figure, the Y-axis was the loss and the X-axis was the number of epochs. Similar to Figure 3, the circular sign represented outliers and the red line represented the median. The green curve represented the loss curve with time. From the figure, we can observe: (1) With the increase of the epoch, the loss of the proposed semi-supervised method decayed rapidly.
(2) When the number of epochs was more than ten, the loss curve tended to be stable. This verified the convergence of the proposed semi-supervised segmentation method.

Figure 4. Qualitative performance comparison between the semi-supervised and supervised methods: (a) original image; (b) ground truth; semi-supervised method using (c) 8, (d) 12 and (e) 15 labeled images; supervised method using (g) 8, (h) 12 and (i) 15 labeled images; (f) the local enlarged map of (e); (j) the local enlarged map of (i).



Conclusions
In this paper, we propose a new deep learning based semi-supervised learning method for blood vessel tree segmentation in retinal images. The proposed method combines a deep neural network with a semi-supervised learning strategy, which addresses the high cost, in terms of both finance and human effort, of labeling retinal images. On the DRIVE dataset, the proposed semi-supervised method uses only eleven labeled fundus images and outperforms the results obtained by the human observer and many state-of-the-art methods, which means that our method can save about half of the cost of labeling the fundus images. In addition, a large number of experimental results show that the proposed semi-supervised method can effectively accomplish the blood vessel segmentation task and has good convergence. In the future, we plan to investigate more advanced training methods for semi-supervised learning. We are also interested in using semantic priors of the blood vessels for further improvement.