Improvement of Retinal Vessel Segmentation Method Based on U-Net

Abstract: Retinal vessel segmentation remains a challenging task: the morphology of the retinal vessels reflects a person's health, which is essential for clinical diagnosis. Accurate segmentation of the retinal vessels can therefore reveal a patient's condition in a timely manner and help prevent blindness. Traditional retinal vessel segmentation is performed manually, which is time-consuming and laborious. With the development of convolutional neural networks, U-shaped networks (U-Nets) and their variants have shown good performance in image segmentation. However, U-Net is prone to feature loss in the encoder's convolutional layers, and its skip connections mismatch contextual information features. We therefore propose an improved retinal vessel segmentation method based on U-Net to segment retinal vessels accurately. To extract more features in the encoder, we replace its convolutional layers with the ResNest network structure, which enhances image feature extraction. In addition, a Depthwise FCA Block (DFB) module is proposed to address the mismatched processing of local contextual features by the skip connections. Experiments on two public retinal vessel segmentation datasets, DRIVE and CHASE_DB1, comparing our method with a large number of networks, confirm the effectiveness of the proposed method. Our method outperforms most segmentation networks, demonstrating its significant clinical value.


Introduction
Retinal vessels play an important role in many fields, especially clinical medicine, where the observed retinal vessels are easy to examine and particularly important for diagnosing certain diseases. Retinal morphology and density give doctors timely feedback on conditions such as diabetes and cardiovascular disease [1,2], which can cause visual impairment in mild cases and blindness in severe ones, so precise segmentation of retinal vessels is crucial. In the early years, retinal vessel segmentation was performed manually, a relatively backward approach that is not only time-consuming and labor-intensive, resulting in low efficiency, but also yields unsatisfactory segmentation results. With the development of medical imaging, more and more image segmentation methods have been applied to retinal vessels and have achieved relatively good results. Accurate retinal vessel segmentation is therefore particularly important for the prevention of blindness [3].
In the early stage of medical image segmentation, the development of Convolutional Neural Networks (CNNs) [4] achieved good results. CNNs automatically learn features through multi-layer structures and can learn features at multiple levels. Long et al. [5] proposed the FCN architecture, which classifies images at the pixel level: FCN replaces the fully connected layers of a CNN with convolutional layers, solving segmentation at the semantic level by classifying each pixel. On this basis, Ronneberger et al. [6] proposed a U-shaped encoder-decoder network called U-Net. Between the encoder and decoder, skip connections are used to obtain better contextual information and global features, and four pooling layers enable multi-scale recognition of image features. Because of U-Net's excellent performance, most medical image segmentation networks are improvements of U-Net. Zhou et al. [7] proposed U-Net++, which addresses the inefficiency of U-Net caused by extensive experimentation and the limitation of fusing features only at the same scale; it improves the optimal depth of supervised learning, flexibly aggregates features at more scales in the skip connections, and improves efficiency through pruning to achieve a better segmentation effect. Alom et al. [8] proposed the R2U-Net model, adding recurrent residual convolutional units to U-Net, which deepens the network while avoiding vanishing gradients, captures features better, and improves segmentation accuracy. The Attention U-Net proposed by Oktay et al. [9] incorporates an attention mechanism into the network to make the skip connections more selective, improving attention to the segmented region and performance without increasing the computational cost. To fuse features better, Huang et al. [10] proposed UNet 3+, a multi-scale deeply supervised structure that improves location awareness and boundary segmentation while using fewer parameters. The TransUNet structure proposed by Chen et al. [11] applies the Transformer [12] to the U-Net encoder; the Transformer extracts global information better, compensating for CNNs' weakness at capturing global context and thus extracting better features.
In retinal vessel image segmentation, U-Net and related networks have also seen many improvements. Jin et al. [13] proposed the DUNet structure, which replaces the original convolutional layers with deformable convolution, combines low-level with high-level features, and achieves accurate segmentation adapted to the size and shape of the vessels. Hu et al. [14] proposed S-UNet to avoid the loss of feature information caused by downsampling in U-Net; by cascading Mi-U-Net networks, it also prevents the overfitting caused by a small data volume. Wang et al. [15] proposed FRNet, which introduces an FAF structure to efficiently combine features of different depths and reduce information loss. Yang et al. [16] improved U-Net by adding a second decoder, with the two decoders responsible for segmenting thin and thick vessels, respectively, and a fusion network merging their outputs to achieve accurate retinal vessel segmentation. Dong et al. [17] proposed CRAUnet, a cascaded residual attention U-Net for retinal vessel segmentation; it uses DropBlock-style regularization to greatly reduce overfitting, and an MFCA module to explore and merge useful information instead of a direct skip connection. Yang et al. [18] proposed DCUnet, which builds its feature extraction module from deformable convolution and uses a residual channel attention module to improve transfer efficiency. Yang et al. [19] proposed RADCU-Net, based on residual attention and a dual-supervised cascaded U-Net, which improves efficiency as well as the accuracy of retinal vessel segmentation.
Combining the above analysis, an improved retinal vessel segmentation method based on the U-Net structure is proposed and shown to be effective on both the DRIVE and CHASE_DB1 datasets. The main contributions of this paper are as follows: (1) A new network structure based on U-Net is proposed that detects retinal blood vessels accurately and efficiently. (2) To address the partial feature loss caused by repeated convolution, ResNest replaces the encoder's original convolutional layers as the backbone, enhancing feature extraction so that image feature information is captured better. (3) A novel DFB network structure is proposed to solve the mismatch between high- and low-level features caused by the skip connections, achieving better fusion of low-level and high-level image features for accurate vessel segmentation.

Materials and Methods
Aiming at the problem of retinal vessel segmentation, an improved segmentation network based on U-Net is proposed. This section describes in detail the framework and modules of the proposed network.

Network Structure
Figure 1 shows the overall structure of the proposed improved segmentation network based on U-Net. It is composed of two parts: the main U-shaped network and the multi-scale fusion block. To reduce the loss of image features from encoding to decoding, a DFB structure is proposed within the encoder-decoder framework. To address the loss of local feature information caused by applying convolution repeatedly during feature extraction, the original convolution module is replaced by the ResNest Block, so that the encoder extracts image feature information better. To address the loss of image information caused by the skip connections, the DFB structure optimizes the original skip connections and achieves an effective multi-scale feature representation.

Feature Extraction
Accurately extracting the characteristics of the retinal blood vessels is essential if we want to understand a patient's condition through vessel morphology. We consider that the loss of image feature information in the U-Net structure is mainly caused by the convolutions in the encoder, so we mainly modify the encoder. Since deep convolutional neural networks have achieved good results in image processing, the ResNest [20] module replaces the encoder's convolutional block for better extraction of image features.
ResNet [21], one of the most successful CNN architectures, is widely used in computer vision. From ResNet to ResNeXt [22] and then to ResNest, the most successful improvement of ResNet, it performs well on downstream computer vision tasks. Its computational efficiency matches ResNet's while its speed-accuracy trade-off is better. ResNest also performs well relative to other networks of similar model complexity without introducing additional computational cost and can serve as a backbone for other tasks. The network structure is shown in Figure 2.
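The core idea behind ResNest's split-attention, weighting parallel feature splits by a per-channel softmax before summing them, can be illustrated with a minimal, framework-free sketch. All names here are ours, and the real block also involves grouped convolutions, pooling, and fully connected layers that this toy version omits:

```python
import math

def split_attention(splits):
    """Toy split-attention over r feature splits (ResNest-style idea).

    Each split is a list of C channel activations (already spatially pooled).
    Attention weights are a per-channel softmax across the r splits, and the
    output is the attention-weighted sum of the splits.
    """
    r = len(splits)
    channels = len(splits[0])
    out = []
    for c in range(channels):
        logits = [splits[k][c] for k in range(r)]
        m = max(logits)  # stabilise the softmax
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append(sum(w * splits[k][c] for k, w in enumerate(weights)))
    return out

# Two splits, three channels each: the fused output leans toward the
# split with the larger activation in each channel.
fused = split_attention([[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]])
```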

Depthwise FCA Block
Because of the semantic gap in the U-Net skip connections, contextual feature information is not processed properly, causing a mismatch when low-level and high-level image features are fused. The proposed structure (see Figure 3) therefore optimizes the existing skip connections and realizes multi-scale feature fusion.
In the proposed structure, depthwise separable convolution [23], a lightweight operation, is used; compared with conventional convolution, its parameter count and computational cost are relatively low. Four depthwise separable convolutions are connected in parallel. Because an overly large convolution kernel wastes resources and computation, the kernel sizes of the four parallel branches are chosen as 1, 5, 9, and 13. After the parallel convolutions, stitching and clipping operations are performed, and the result finally enters the FCA Block [24] structure (see Figure 4), which better processes the feature information of low-level images. The FCA Block is a frequency channel attention network that highlights the important channels in a multi-channel feature map and expresses feature information better. It also compensates for the insufficient feature information in existing channel attention methods: the global average pooling over each channel is generalized to a more general two-dimensional discrete cosine transform (DCT) form. In this way, more frequency components are introduced to make full use of the information, and the image feature information processed by the encoder is fused into the decoder better.
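The parameter savings that motivate the depthwise separable branches can be checked with simple arithmetic. The sketch below is our own illustration (channel counts are assumed, and bias terms are ignored); it counts weights for a standard k × k convolution versus a depthwise-plus-pointwise pair, for the four kernel sizes used in the DFB:

```python
def standard_conv_params(c_in, c_out, k):
    # one k×k kernel per (input channel, output channel) pair
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # depthwise: one k×k kernel per input channel;
    # pointwise: a 1×1 convolution mixing channels.
    # Note: for k = 1 the pointwise step dominates, so the
    # saving only appears for k > 1.
    return c_in * k * k + c_in * c_out

# Parameter cost of the four parallel branches (kernel sizes used by the
# DFB), assuming 64 input and 64 output channels.
branches = {k: depthwise_separable_params(64, 64, k) for k in (1, 5, 9, 13)}
```

For the 5 × 5 branch, for example, a standard convolution would need 64 × 64 × 25 = 102,400 weights, while the separable version needs only 5,696.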

Datasets
To fully reflect the performance of the proposed network structure, the method is evaluated on two common datasets, DRIVE [25] and CHASE_DB1 [26] (see Table 1).
DRIVE: There are 40 fundus photographs in total, 20 used as a training set and 20 as a test set. Each image is 584 × 565 pixels with 3 channels, and each image has a circular 45° field of view (FOV) for performance evaluation.
CHASE_DB1: There are 28 fundus photographs of 14 children in total, 20 used as a training set and 8 as a test set. Each image is 999 × 960 pixels with 3 channels, and each image has a circular 30° field of view (FOV) for performance evaluation.
Figure 5 is a partial sample image of both datasets.

Image Preprocessing
To train the proposed model better, image preprocessing is an important step. For the retinal blood vessel images, normalization is used. Since the DRIVE and CHASE_DB1 datasets have different image sizes, the input images are standardized to 480 × 480 pixels.
Each channel of the retinal vessel image is then normalized: the channel mean is subtracted and the result is divided by the channel standard deviation. On this basis, random flips and random cropping are used for data augmentation to train the proposed model better (see Figure 6). To show that the preprocessing is effective, a quantitative comparison was performed (see Table 2); the models used in the later comparison experiments are listed in Table 3.
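The per-channel step above, subtracting the channel mean and dividing by the channel standard deviation, can be sketched without any framework (in practice torchvision transforms would do this; the function below is our own minimal illustration on a flattened channel):

```python
import math

def normalize_channel(values):
    """Zero-mean, unit-variance normalisation of one image channel."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var) or 1.0  # guard against a constant channel
    return [(v - mean) / std for v in values]

channel = [10.0, 20.0, 30.0, 40.0]
normed = normalize_channel(channel)
```

After normalization the channel has (up to floating-point error) zero mean and unit variance, which puts all channels on a comparable scale for training.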

Results
To demonstrate the effect of the proposed U-Net-based improved network on retinal blood vessel segmentation, comparative experiments and ablation experiments are used to prove its effectiveness.
This section first describes the relevant evaluation indicators, then tests the baseline segmentation networks and compares them with the proposed method.

Evaluation Indicators
To highlight the effectiveness of the proposed model, several evaluation indicators, including accuracy, F1-score, sensitivity, specificity, and precision, were used to evaluate the segmentation of the retinal vessel images.
Acc (Accuracy): the proportion of correctly classified vessel and background pixels among all pixels (see Equation (1)):

Acc = (TP + TN) / (TP + TN + FP + FN)   (1)

where TP and TN denote correctly segmented retinal vessel and background pixels, respectively, and FP and FN denote incorrectly segmented retinal vessel and background pixels, respectively.
F1-score: measures the accuracy of a binary model, taking both precision and recall into account; it is the harmonic mean of the two (see Equation (2)):

F1 = 2 × TP / (2 × TP + FP + FN)   (2)
Se (Sensitivity): also known as the true positive rate (TPR), the proportion of retinal vessel pixels correctly identified (see Equation (3)):

Se = TP / (TP + FN)   (3)

Sp (Specificity): also known as the true negative rate (TNR), the proportion of correctly classified background pixels among all background pixels (see Equation (4)):

Sp = TN / (TN + FP)   (4)

Pr (Precision): the proportion of correctly segmented retinal vessel pixels among all pixels segmented as vessel (see Equation (5)):

Pr = TP / (TP + FP)   (5)
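Equations (1)-(5) can all be computed directly from the four confusion counts. The helper below is a straightforward sketch (the function name and example counts are ours); the F1 line uses the harmonic-mean form, which is algebraically equal to Equation (2):

```python
def segmentation_metrics(tp, tn, fp, fn):
    """Pixel-level metrics from the confusion counts of Equations (1)-(5)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    se = tp / (tp + fn)            # sensitivity / recall
    sp = tn / (tn + fp)            # specificity
    pr = tp / (tp + fp)            # precision
    f1 = 2 * pr * se / (pr + se)   # harmonic mean of precision and recall
    return {"Acc": acc, "Se": se, "Sp": sp, "Pr": pr, "F1": f1}

# Hypothetical confusion counts for a small pixel map.
m = segmentation_metrics(tp=80, tn=900, fp=20, fn=10)
```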

Experimental Setup
The proposed model is implemented in the PyTorch framework. The model was trained for 200 epochs with the SGD optimizer, using a learning rate of 1 × 10⁻², a momentum of 0.9, and a weight decay of 1 × 10⁻⁴. The batch size was set to 4. In addition, to speed up training and testing, an Nvidia GeForce RTX5000 TI card was used for the experiments (see Figure 7).
For the loss function, cross-entropy loss plus Dice loss was selected. Cross-entropy loss, a common loss function in semantic segmentation, not only captures the difference in prediction probability but also measures the performance of different classifiers in more detail.

Cross-Entropy Loss
Cross-entropy is an important concept in information theory whose main purpose is to measure the difference between two probability distributions. For image segmentation, the cross-entropy loss is the average cross-entropy over all pixels. Let Ω denote the pixel region, of height a, width b, and K classes, with ground truth y ∈ {0, 1}^(a×b×K) and prediction ŷ ∈ [0, 1]^(a×b×K). The cross-entropy loss is then (see Equation (6)):

L_CE = −(1 / (a × b)) Σ_{i∈Ω} Σ_{k=1}^{K} y_{i,k} log(ŷ_{i,k})   (6)
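For the binary (K = 2) vessel-versus-background case, Equation (6) reduces to the familiar per-pixel binary cross-entropy. A minimal framework-free sketch (our own; deep learning frameworks supply this as a built-in loss, and the clamping constant is our practical choice):

```python
import math

def pixel_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a flattened pixel map.

    y_true: ground-truth labels in {0, 1};
    y_pred: predicted vessel probabilities in [0, 1].
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Four pixels: two vessel pixels predicted at 0.9 and 0.8,
# two background pixels predicted at 0.1 and 0.2.
loss = pixel_cross_entropy([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])
```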

Dice Loss
Dice loss measures the overlap between the predicted and ground-truth segmentations (see Equation (7)):

L_Dice = 1 − 2|X ∩ Y| / (|X| + |Y|)   (7)

where X denotes the pixel set of the true segmented retinal vessels, and Y denotes the pixel set of the model-predicted retinal vessel segmentation.
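Equation (7) can be sketched on flattened binary masks. The smoothing constant below is a common practical addition (our choice, not from the paper) that avoids division by zero on empty masks:

```python
def dice_loss(x_mask, y_mask, smooth=1.0):
    """Dice loss between a ground-truth mask X and a predicted mask Y:
    1 - 2|X ∩ Y| / (|X| + |Y|), with a smoothing term for empty masks."""
    intersection = sum(a * b for a, b in zip(x_mask, y_mask))
    return 1.0 - (2.0 * intersection + smooth) / (sum(x_mask) + sum(y_mask) + smooth)

# A perfect prediction gives a loss of 0; a fully disjoint one approaches 1.
perfect = dice_loss([1, 1, 0, 0], [1, 1, 0, 0])
disjoint = dice_loss([1, 1, 0, 0], [0, 0, 1, 1])
```

Unlike per-pixel cross-entropy, the Dice term scores region overlap directly, which is why the two are commonly summed for thin-structure segmentation such as vessels.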

Ablation Experiments
To prove the effectiveness of each component, ablation experiments study the contribution of each module to the segmented images. First, the basic U-Net network is tested; then, the ResNest Block and DFB structures are added for performance analysis. The results are shown in Table 4. The ablation results show that, compared with the basic network structure, the proposed model improves on all indicators, especially the F1-score and Pr.
A √ in the table means the module was added, and × means it was not.
On the DRIVE dataset, the F1-score improved from 0.8278 to 0.8687, and Pr improved from 0.7287 to 0.7863. When the DFB module was added, the F1-score, Sp, and Pr indicators all improved, which demonstrates the effectiveness of the proposed DFB module.
On the CHASE_DB1 dataset, the performance indicators likewise show that the proposed model is effective. The key indicators F1-score and Pr improve the most, and the other indicators also outperform the basic network model.
The ablation experiments on the two datasets show that, except for Acc and Se, all performance indicators improve when the DFB module is added. Although Acc and Se decrease slightly after adding the DFB module, the overall performance of the proposed model improves, which shows its validity for retinal image segmentation.

Comparison Test with Other Models
To highlight the effect of the proposed model, we conducted a series of experiments on the two public datasets, showing the retinal vessel segmentation results and the corresponding indicators. The results were compared with more advanced models through quantitative and qualitative analyses of the relevant indicators.
As shown in Table 5, on the DRIVE dataset the proposed improved network based on U-Net reaches an F1-score of 0.8687, an Sp of 0.9703, and a Pr of 0.7863, all higher than the scores of the other models. Compared with the best competing results, Acc differs from the best score by 0.0080 and Se by 0.0842. On the CHASE_DB1 dataset, as shown in Table 6, the proposed network still obtains relatively good results: the F1-score reaches 0.8358, Sp reaches 0.9720, and Pr reaches 0.7330, again the highest among the compared models. Compared with the best competing results, Acc differs from the best score by 0.0060 and Se by 0.1207. The network thus has an obvious segmentation effect on both datasets. In the comparison experiments, the F1-score, Sp, and Pr of the improved model are the highest on both the DRIVE and CHASE_DB1 datasets, while the effect on Acc and Se is less pronounced; since Acc is not significantly reduced compared with other advanced models, the improvement is effective on the whole.
The Se results are not ideal, and even decrease slightly, in the ablation and comparison experiments above, which may be caused by an excessive proportion of background noise pixels. When the background pixel noise is too high, background pixels are mis-segmented, increasing FN in Equation (3) and thus decreasing Se. In addition, other noise introduced during acquisition may be identified as background pixels, reducing the measured proportion of correctly identified retinal vessel pixels, which may also lead to a slight decrease in Se.
To verify the inference that background noise causes the decrease in Se observed in the above experiments, we apply morphological opening and closing operations to the images to remove background noise and analyze the influence of noise on the Se indicator.
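Opening (erosion then dilation) removes isolated bright noise pixels, while closing (dilation then erosion) fills small gaps. A library-free sketch with a 3 × 3 structuring element, purely to illustrate the operation (in practice a library such as OpenCV would be used):

```python
def _inside(img, r, c):
    return 0 <= r < len(img) and 0 <= c < len(img[0])

def _neighborhood(img, r, c):
    # 3x3 neighborhood values, with out-of-bounds treated as 0
    return [img[r + dr][c + dc] if _inside(img, r + dr, c + dc) else 0
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def erode(img):
    """Binary erosion: a pixel survives only if its whole 3x3 patch is 1."""
    return [[int(all(_neighborhood(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]

def dilate(img):
    """Binary dilation: a pixel turns on if any pixel in its 3x3 patch is 1."""
    return [[int(any(_neighborhood(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]

def opening(img):
    # erosion followed by dilation: removes isolated noise pixels
    return dilate(erode(img))

def closing(img):
    # dilation followed by erosion: fills small holes
    return erode(dilate(img))

# A lone noise pixel in an empty background is removed by opening.
noisy = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
clean = opening(noisy)
```

Note that with zero padding, pixels near the image border are eroded away; real pipelines pad or mask the border (e.g., using the FOV mask) to avoid this.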
The analysis of the Se indicator after denoising in Table 7 shows that Se rises after denoising, which confirms the inference above. On the whole, the proposed model has better segmentation performance. We did the same for Acc and found that the denoised Acc metric also increased; the results are presented in Table 8.

Qualitative Analysis
To visualize the effect of retinal vessel segmentation, the performance of the proposed model was analyzed qualitatively. Some samples were selected from the DRIVE and CHASE_DB1 datasets as experimental objects, and the resulting segmentation maps are shown in Figure 8.
We can clearly see the effect of the proposed model on retinal vessel segmentation in Figure 8. Owing to the varying light intensity in the original images, segmentation of the darker vessel regions is not ideal, and some regions are over-segmented. Densely packed vessel areas also lead to poorer segmentation of small, thin vessels. Nevertheless, based on the qualitative and quantitative results, the proposed model has better segmentation performance for retinal blood vessels.
For comparison, we also show predicted images from other models (Figure 9).

Discussion
Retinal vessel segmentation remains a great challenge, and segmenting retinal vessels accurately is very meaningful for clinical diagnosis. We face not only the light intensity of the retinal vessel image and its contrast with the background, but also the varying thickness and density of the retinal vessels, as well as limits on the accuracy of current image segmentation methods and techniques.
Although various variants of the U-Net network have achieved good results in medical image segmentation in recent years, they still suffer from the information loss caused by repeated convolutions when the encoder extracts feature maps and from the mismatch between high-level and low-level features introduced by the skip connections. To solve these problems, the convolutional layers of the encoder are replaced by a ResNest Block structure, which strengthens feature extraction as much as possible and reduces the loss of feature information. On this basis, a new DFB module is proposed, which strengthens the matching of high-level and low-level features and optimizes the original skip connection part.
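The exact DFB design is given in Figure 3. Purely to illustrate the general idea of recalibrating skip-connection features with channel attention before fusion, a minimal NumPy sketch of a squeeze-and-excitation-style gate (not the actual DFB, which additionally uses depthwise convolution and frequency channel attention; weight shapes here are hypothetical) might look like:

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Squeeze-and-excitation-style channel attention: globally pool each
    channel, pass the pooled vector through a small bottleneck, and rescale
    the channels. `features` has shape (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    squeeze = features.mean(axis=(1, 2))            # (C,) global average pool
    hidden = np.maximum(w1 @ squeeze, 0.0)          # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid gate in (0, 1)
    return features * scale[:, None, None]          # reweight each channel
```

Gating the encoder features in this way lets the network suppress channels that do not match the decoder's semantic level before the skip connection concatenates them.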
To better highlight the segmentation performance of the proposed model, this paper combines quantitative analysis, qualitative analysis, and ablation experiments to prove its effectiveness. The quantitative comparison with current, more advanced models evaluates the proposed model on five important indicators; the proposed model performs better on three key indicators, namely the F1-score, Sp, and Pr. The qualitative analysis shows that the segmented images are of good quality, and the ablation experiments verify that each proposed module contributes an improvement.
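All five indicators can be derived from the pixel-level confusion matrix. A sketch of their standard definitions, assuming binary vessel masks (1 = vessel pixel; the function name is illustrative):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Compute common retinal-vessel segmentation indicators from
    binary prediction and ground-truth masks (1 = vessel pixel)."""
    pred = pred.astype(bool).ravel()
    gt = gt.astype(bool).ravel()
    tp = np.sum(pred & gt)            # vessel pixels correctly detected
    tn = np.sum(~pred & ~gt)          # background correctly detected
    fp = np.sum(pred & ~gt)           # background labelled as vessel
    fn = np.sum(~pred & gt)           # missed vessel pixels
    se = tp / (tp + fn)               # sensitivity (recall)
    sp = tn / (tn + fp)               # specificity
    acc = (tp + tn) / pred.size       # accuracy
    pr = tp / (tp + fp)               # precision
    f1 = 2 * pr * se / (pr + se)      # F1-score
    return {"Se": se, "Sp": sp, "Acc": acc, "Pr": pr, "F1": f1}
```

Because vessel pixels are a small minority of each fundus image, Acc is dominated by the background, which is why Se, Pr, and the F1-score are the more discriminative indicators here.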
However, retinal vessel segmentation will continue to face many challenges, such as the uneven brightness of retinal vessel images and the varying density and thickness of the vessels, which remain the most difficult problems in this task. In future work, we hope to make up for the shortcomings in Acc and Se and to continue optimizing the proposed network model to achieve more accurate segmentation of retinal vessels.

Conclusions
In this paper, an improved model based on the U-Net network structure is proposed, which shows a good segmentation effect on retinal images. Comparison with more advanced network structures reflects the effectiveness of the proposed model. The contributions of this article are as follows: (1) To segment retinal vessels accurately, an improved structure based on U-Net is proposed, which helps patients understand their condition in time. (2) To address the information loss caused by repeated convolutions, ResNest replaces the original convolution operations of the encoder as the main network structure, which better extracts the retinal vessel features and minimizes the loss of information. (3) To solve the mismatch between high-level and low-level features caused by the skip connections, a novel DFB network structure is proposed, which better fuses contextual features and realizes accurate vessel segmentation.
In future research, we will improve each indicator of retinal vessel segmentation and achieve more accurate segmentation.

Institutional Review Board Statement: Not applicable.

Figure 1.
Figure 1. Network structure of the improved segmentation network based on U-Net.

Figure 3.
Figure 3. Diagram of the depthwise FCA block structure.

Figure 5.
Figure 5. (a) DRIVE dataset and (b) CHASE_DB1 dataset. From left to right: the original image, the corresponding ground truth, and the corresponding mask.

Figure 6.
Figure 6. Examples of image preprocessing. (a) Original image. (b) Image graying and image normalization.
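The graying and normalization steps shown in Figure 6 could be sketched as follows; this is a rough illustration using standard luminance weights, and the paper's exact preprocessing coefficients may differ:

```python
import numpy as np

def preprocess(rgb):
    """Grayscale conversion followed by per-image normalization,
    mirroring the preprocessing illustrated in Figure 6 (a sketch)."""
    # Weighted luminance sum over the color channels (ITU-R BT.601 weights).
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # Standardize to zero mean and unit variance per image.
    gray = (gray - gray.mean()) / (gray.std() + 1e-8)
    # Rescale into [0, 1] for display and network input.
    return (gray - gray.min()) / (gray.max() - gray.min() + 1e-8)
```

Per-image standardization of this kind is one common way to counter the uneven illumination across fundus images mentioned in the Discussion.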

Figure 7.
Figure 7. Training loss and accuracy on the datasets.

Figure 8.
Figure 8. Retinal vessel segmentation sample images: (a) the original images; (b) the segmentation prediction maps of the proposed model; (c) the ground truth.

Figure 9.
Figure 9. Predicted images from other models. The top row is the DRIVE dataset, and the bottom row is the CHASE_DB1 dataset.

Table 1.
Overview of the retinal vessel segmentation datasets.

Table 2.
Effect of preprocessing on experimental data.

Table 3.
Methods compared in the experiments.

Table 4.
Results of ablation experiments.

Table 5.
Comparison of performance indicators on the DRIVE dataset.

Table 6.
Comparison of performance indicators on the CHASE_DB1 dataset with other models.

Table 7.
Analysis of the Se performance indicator by removing noise.

Table 8.
Analysis of the Acc performance indicator by removing noise.
