Automatic Retinal Blood Vessel Segmentation Based on Fully Convolutional Neural Networks

Automated retinal vessel segmentation has become an important tool for disease screening and diagnosis in clinical medicine. However, most available methods of retinal vessel segmentation still suffer from poor accuracy and low generalization ability, because the symmetrical and asymmetrical patterns between blood vessels are complicated and the contrast between vessel and background is relatively low due to illumination and pathology. Robust vessel segmentation of the retinal image is essential for improving the diagnosis of diseases such as vein occlusions and diabetic retinopathy, yet automated retinal vessel segmentation remains a challenging task. In this paper, we propose an automatic retinal vessel segmentation framework using deep fully convolutional neural networks (FCN), which integrates novel methods of data preprocessing, data augmentation, and fully convolutional neural networks. It is an end-to-end framework that automatically and efficiently performs retinal vessel segmentation. The framework was evaluated on three publicly available standard datasets, achieving F1 scores of 0.8321, 0.8531, and 0.8243, average accuracies of 0.9706, 0.9777, and 0.9773, and average areas under the Receiver Operating Characteristic (ROC) curve of 0.9880, 0.9923, and 0.9917 on the DRIVE, STARE, and CHASE_DB1 datasets, respectively. The experimental results show that our proposed framework achieves state-of-the-art vessel segmentation performance on all three benchmarks.


Introduction
Some pathological diseases of the human body can be detected through changes in the morphology of the retinal vessels. Therefore, the condition of the retinal vessels is an important indicator for the diagnosis of several retinal diseases. For example, diabetic retinopathy is among the most threatening of these conditions because high blood sugar and hypertension lead to vision loss in its later stages [1]. By examining the eyes, doctors can detect other diseases of the body in advance, make an early diagnosis, and begin the corresponding treatment earlier. According to reports, early detection, timely treatment, and appropriate follow-up procedures can prevent about 95% of blindness [1].
Because the retinal vessels are easily photographed, clinicians use a retinal camera to acquire retinal images of the patient. The fundus vessels are the most stable and important structures, exhibiting detectable symmetrical and asymmetrical patterns. When a disease of the visual organ occurs in the eye, the diameter, color, and degree of bending of the retinal blood vessels may become abnormal. Usually, the ophthalmologist manually segments the blood vessels in the retinal image to extract lesion information. However, this work is cumbersome, error-prone, and time-consuming, even for experienced doctors [2]. Therefore, automatic and accurate segmentation is essential. A relatively recent development is computer-aided diagnosis, whose performance depends in part on suitable preprocessing methods. Data augmentation methods can improve the generalization ability and performance of CNNs; however, existing data augmentation methods are not necessarily applicable to retinal blood vessel images. Moreover, existing methods still suffer from low segmentation accuracy and poor generalization. These problems are the main motivation for this research.
In this paper, we devise a new automatic segmentation framework for retinal vessels based on deep fully convolutional neural networks (FCN). Our specific contributions are as follows: (1) We investigated the effects of several data preprocessing methods on network performance. Applying grayscale conversion, normalization, Contrast Limited Adaptive Histogram Equalization (CLAHE), and gamma correction to the retinal images improves the performance of the model. (2) We devised a new data augmentation method for retinal images, named Random Crop and Fill (RCF), to enhance the performance of the model; it can be combined with existing data augmentation methods to achieve better results. (3) We propose M3FCN, an improved deep fully convolutional neural network, for automatic retinal vessel segmentation. Compared with the basic FCN, M3FCN has three improvements: a multi-scale input module, expansion to a multi-path FCN, and multi-output fusion of the final segmentation result. The experimental results show that all three improvements benefit model performance. (4) We obtain the final segmentation image by overlap-sampling the test patches and applying an overlapping patch reconstruction algorithm. (5) Ablation experiments demonstrate that each of the proposed improvements is effective. The experimental results show that the proposed framework is robust and that the improved methods have the potential to extend to other methods and medical images.
We tested the proposed framework on three standard retinal image datasets: DRIVE [10], STARE [3], and CHASE_DB1 [18]. The proposed automatic retinal vessel segmentation framework achieves state-of-the-art results, which demonstrates the robustness and effectiveness of the method. The contributions of this paper also include the retinal image preprocessing methods and the new data augmentation method, which will also benefit other vessel segmentation tasks.

Methodology
The new retinal vessel segmentation framework is shown in Figure 1 and consists of two stages. The first stage is the training stage, which consists of the following four steps: (1) preprocessing the retinal images; (2) extracting patches with the dynamic patch extraction strategy; (3) feeding the patches to the fully convolutional neural network for feature extraction and classification; (4) updating the network weights with mini-batch gradient descent and the backpropagation algorithm. The second stage is the testing stage, which also includes four steps: (1) the test images are processed with the same data preprocessing method; (2) patches are extracted with the overlapping patch extraction method; (3) the patches are fed to the fully convolutional neural network for feature extraction and classification to obtain segmented patches; (4) the segmented patches are reconstructed into the target segmentation image by the overlapping patch reconstruction algorithm.

Materials
Performance is evaluated on three public datasets: Digital Retinal Images for Vessel Extraction (DRIVE) [10], Structured Analysis of the Retina (STARE) [3], and CHASE_DB1 (CHASE) [18]. The retinal images in the three datasets were acquired with different equipment, illumination, etc. The DRIVE dataset contains 40 retinal images with a resolution of 565 × 584 px, 20 of which are used for training and the rest for testing. The STARE dataset has 20 retinal images with a resolution of 700 × 605 px, 10 of which show lesions. The CHASE dataset consists of 28 retinal images with a resolution of 999 × 960 px. Figure 2 shows examples from the different datasets. All datasets provide segmentations by two experts. We use the first expert's results as ground truth and compare our segmentation results with those of the second expert.

Dataset Preparation
All datasets are divided into training sets for training the networks and test sets for performance evaluation. For DRIVE, we follow the split given by the data publisher, with 20 images for training and the remaining 20 images for testing. No such split was originally provided for the STARE and CHASE datasets. For STARE, we used the 'leave-one-out' method of [2,19,20], which iteratively trains on 19 samples and tests on the remaining one. For CHASE, we adopted the split strategy of [21], training on 20 images and testing on the remaining eight images (from four children).

Image Preprocessing
After proper preprocessing of the images, the deep neural network can learn the image data distribution more effectively. Four image preprocessing strategies are applied in sequence in our proposed framework. Figure 3 shows the image after each preprocessing strategy. The first strategy is to convert the RGB images into single-channel grayscale images. Figure 3a-d shows the original image and the single-channel monochrome images of the red, green, and blue channels, respectively. Decomposing the RGB color image into the three monochrome channels shows that the green channel offers the highest discrimination between vessels and background, whereas the red and blue channels are noisier and have lower contrast. A single-channel grayscale image shows better vessel-background contrast than the RGB image [19]. Therefore, the original RGB image is converted into a single-channel grayscale image following [22]:

Gray = 0.299 \cdot r + 0.587 \cdot g + 0.114 \cdot b,

where r, g, b are the red, green, and blue channels, respectively. According to this equation, red, green, and blue contribute 29.9%, 58.7%, and 11.4%, respectively, with green carrying the largest weight of the three. The grayscale image is shown in Figure 3e.
The second preprocessing strategy is data normalization. Normalizing the image can improve the convergence speed of the model [23]. Let X = {x_1, x_2, ..., x_n} be the image dataset. Z-score normalization [24] sets each dimension of the data X to zero mean and unit variance:

\hat{x}_i = \frac{x_i - \mu}{\sigma},

where \mu and \sigma are the mean and standard deviation of X, respectively. The image data are then mapped to the range 0 to 255 by min-max scaling:

x_i' = \frac{\hat{x}_i - \min(\hat{X})}{\max(\hat{X}) - \min(\hat{X})} \times 255,

where \min(\hat{X}) and \max(\hat{X}) are the minimum and maximum of the z-score-normalized data. The normalized image is shown in Figure 3f.
The third preprocessing strategy is to enhance the foreground-background contrast of the whole dataset using Contrast Limited Adaptive Histogram Equalization (CLAHE) [25]. The CLAHE image is shown in Figure 3g. The last preprocessing strategy is gamma correction, which improves image quality further. In our experiments, the gamma value is set to 1.2. The gamma-corrected image is shown in Figure 3h. We implement the CLAHE and gamma correction functions using OpenCV [26].
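To make the pipeline concrete, the following is a minimal sketch of the four preprocessing steps, assuming NumPy/OpenCV, an RGB channel order, and per-image normalization (the statistics can equally be computed over the whole dataset); the CLAHE clip limit and tile size are illustrative assumptions, not values from our released code.

```python
# Minimal sketch of the four-step preprocessing pipeline (assumptions noted above).
import cv2
import numpy as np

def preprocess(rgb: np.ndarray, gamma: float = 1.2) -> np.ndarray:
    """rgb: H x W x 3 uint8 fundus image (RGB order assumed) -> uint8 grayscale."""
    x = rgb.astype(np.float64)
    gray = 0.299 * x[..., 0] + 0.587 * x[..., 1] + 0.114 * x[..., 2]  # weighted grayscale

    gray = (gray - gray.mean()) / (gray.std() + 1e-8)                  # z-score normalization
    gray = (gray - gray.min()) / (gray.max() - gray.min()) * 255.0     # min-max to [0, 255]
    gray = gray.astype(np.uint8)

    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))        # CLAHE (parameters assumed)
    gray = clahe.apply(gray)

    lut = (np.linspace(0.0, 1.0, 256) ** gamma * 255.0).astype(np.uint8)  # gamma correction, gamma = 1.2
    return cv2.LUT(gray, lut)
```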

Dynamic Patch Extraction
A large number of images is often required to train a convolutional neural network, which reduces the risk of over-fitting and improves model performance. However, the existing retinal image datasets are too small to support training the model directly. Inspired by [2,19,21,27], our patch-based learning strategy obtains the patches fed to the network differently depending on the stage of the framework. To avoid exhausting memory during training, we dynamically extract a small number of patches in each training loop. Algorithm 1 describes the process of training the FCN with the dynamic patch extraction strategy: in each iteration, the center coordinates (x, y) of a patch are generated randomly, patches I and labels T are extracted from X and G centered at (x, y), the loss between FCN(I; θ) and T is computed, and the parameters θ are updated using the Adam [28] optimizer.
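As a concrete illustration, the following is a minimal sketch of the dynamic patch extraction step of Algorithm 1, assuming the preprocessed images and ground truths are stored as NumPy arrays of shape (N, 1, H, W) and that the patch size p is even; the function name is ours, for illustration only.

```python
# Minimal sketch of dynamic patch extraction (Algorithm 1, extraction step).
import numpy as np
import torch

def extract_patches(X: np.ndarray, G: np.ndarray, n: int, p: int):
    """Randomly crop n patch/label pairs of size p x p from the dataset."""
    N, _, H, W = X.shape
    I = np.empty((n, 1, p, p), dtype=X.dtype)
    T = np.empty((n, 1, p, p), dtype=G.dtype)
    for k in range(n):
        i = np.random.randint(N)                    # pick a training image
        cy = np.random.randint(p // 2, H - p // 2)  # random patch center (x, y)
        cx = np.random.randint(p // 2, W - p // 2)
        I[k] = X[i, :, cy - p // 2: cy + p // 2, cx - p // 2: cx + p // 2]
        T[k] = G[i, :, cy - p // 2: cy + p // 2, cx - p // 2: cx + p // 2]
    return torch.from_numpy(I).float(), torch.from_numpy(T).long()
```

Because a fresh set of patches is drawn in every loop, the network rarely sees the same crop twice, which keeps memory usage low while enriching the effective training set.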
During the test stage, the test-set images are oversampled with overlap. Let W and H be the image width and height, p the patch size, and s the overlap sampling stride. Then, the number of patches per test image is

N_{patch} = \left( \left\lfloor \frac{W - p}{s} \right\rfloor + 1 \right) \times \left( \left\lfloor \frac{H - p}{s} \right\rfloor + 1 \right).

We oversample each test image, feed the patches to the model to obtain the corresponding segmented patches, and finally reconstruct the segmented patches into the retinal segmentation image with the overlapping patch reconstruction algorithm. Algorithm 2 describes the process of testing the FCN with the overlapping patch reconstruction algorithm.
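The following is a minimal sketch of the overlapping patch extraction and reconstruction at test time, assuming (W − p) and (H − p) are divisible by the stride s (otherwise the image is typically padded first) and that the model outputs two-channel logits; the prob/count accumulators mirror the O_p and O_s maps of Algorithm 2.

```python
# Minimal sketch of overlap-sampled inference and patch reconstruction (Algorithm 2).
import numpy as np
import torch

@torch.no_grad()
def segment_image(model, img: np.ndarray, p: int, s: int) -> np.ndarray:
    """img: (1, H, W) preprocessed image -> (H, W) vessel probability map."""
    _, H, W = img.shape
    prob = np.zeros((H, W), dtype=np.float64)   # O_p: summed vessel probabilities
    count = np.zeros((H, W), dtype=np.float64)  # O_s: how often each pixel was covered
    for y in range(0, H - p + 1, s):
        for x in range(0, W - p + 1, s):
            patch = torch.from_numpy(img[None, :, y:y + p, x:x + p]).float()
            out = torch.softmax(model(patch), dim=1)[0, 1]  # vessel-class probability
            prob[y:y + p, x:x + p] += out.numpy()
            count[y:y + p, x:x + p] += 1.0
    return prob / np.maximum(count, 1.0)        # average the overlapping predictions
```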

A Novel Retinal Image Data Augmentation Method
Data augmentation is widely used with convolutional neural networks due to its effectiveness, scalability, and ease of implementation. Augmentations such as translating and rotating the training images by a few pixels can generally improve the generalization ability of the model, reduce the risk of over-fitting, and improve robustness [29]. Commonly used data augmentation methods are rotation, flipping, cropping, adding noise, and translation. However, as explained in [27], operations such as continuous rotation make it more difficult for the network to detect vessel segments.
We devised a new data augmentation method for retinal images, called Random Crop and Fill (RCF), to enhance the performance of the model. The conceptual explanation of RCF is shown in Figure 4. The core idea of RCF is to transform a local area of each input image by applying a fixed-size mask at a random position during each training period. In general, we found that the size of the area is a more important hyper-parameter than its shape, so, for simplicity, we used square cropping areas in all experiments. RCF consists of two data manipulation steps. First, we randomly select one patch (p × p) from the training set and randomly select a point C(x, y) in the patch as the cropping center. Let R be the ratio of the width of the square cropping area to the width of the patch; the area centered at C with width w = p × R is cropped from the patch. Second, the cropped area is filled in one of three ways: (1) assign each pixel a random value in [0, 255], denoted RCF-R; (2) fill the deleted area with 0, denoted RCF-0; (3) randomly select another patch from the training set and fill the deleted area with the values of the corresponding region, denoted RCF-A. The filling results are shown in Figure 4. To ensure that the network sometimes receives unmodified patches, we apply the RCF transformation with probability p. RCF is computationally lightweight: at the expense of minimal memory consumption and training time, it greatly improves the diversity of the dataset, without adding trainable parameters and without affecting test time. It is a novel data augmentation for retinal images that works well with existing data augmentation methods. We also train the FCN with standard data augmentation and discuss the relationship between RCF and these methods. We performed data augmentation in the following ways: (1) random rotation by 90°, 180°, or 270° with 50% probability; (2) random horizontal and vertical flips with 50% probability. These transformations are applied to the original patches and the ground truths. A sketch of the RCF operation is given below.
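The sketch below illustrates the three RCF filling variants on a single patch, assuming patches are (1, p, p) uint8 NumPy arrays; the ground-truth patch is left unchanged in this sketch, and the function signature is ours for illustration.

```python
# Minimal sketch of Random Crop and Fill (RCF) with the three filling variants.
import numpy as np

def rcf(patch: np.ndarray, patches: np.ndarray, prob: float = 0.5,
        R: float = 0.5, mode: str = "A") -> np.ndarray:
    """Apply RCF to one patch; `patches` is the pool used by RCF-A."""
    if np.random.rand() > prob:          # sometimes pass the patch through unmodified
        return patch
    p = patch.shape[-1]
    w = int(p * R)                       # width of the square cropping area, w = p * R
    cy, cx = np.random.randint(w // 2, p - w // 2, size=2)  # cropping center C(x, y)
    y0, x0 = cy - w // 2, cx - w // 2
    out = patch.copy()
    if mode == "R":                      # RCF-R: random values in [0, 255]
        out[:, y0:y0 + w, x0:x0 + w] = np.random.randint(0, 256, (1, w, w))
    elif mode == "0":                    # RCF-0: fill the deleted area with 0
        out[:, y0:y0 + w, x0:x0 + w] = 0
    else:                                # RCF-A: fill from the same region of another patch
        other = patches[np.random.randint(len(patches))]
        out[:, y0:y0 + w, x0:x0 + w] = other[:, y0:y0 + w, x0:x0 + w]
    return out
```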

The Basic FCN Architecture
Our custom basic FCN has an overall architecture similar to the standard FCN [30], as shown in Figure 5. The basic FCN consists of an encoder and a decoder arranged symmetrically. The encoding path encodes the input image into lower-dimensional feature maps with progressively richer filters, capturing semantic/contextual information. The decoding path performs the inverse of the encoding, restoring spatial information by upsampling and fusing the low-dimensional features, which makes accurate localization possible. The difference between our custom basic FCN and the standard FCN is that we use residual blocks [29] instead of single convolution layers. The shortcut connection in the residual block avoids the vanishing-gradient problem of CNNs because the gradient through the residual block is always greater than or equal to 1 in back-propagation. Before each convolution layer, we use a batch normalization (BN) layer [23] to normalize the features to zero mean and unit variance; using BN greatly improves training speed as well as the stability and generalization of the model [23]. The M3FCN consists of three improved parts. The first is the multi-scale layer, which constructs an image pyramid input and fuses receptive fields at multiple levels. The second is the multi-path FCN, used as the main structure to learn a rich hierarchical representation. The third is multi-output fusion, which combines low-level features with high-level features and supports deep supervision. A sketch of the residual block used in both networks is given after the description of these three improvements. (1) Multi-path FCN: similar to the basic FCN architecture, M3FCN consists of two encoder paths (1, 3) and two decoder paths (2, 4). Each encoder path uses residual blocks and common convolution layers to produce a set of encoder feature maps; each feature map is normalized with a BN layer and then activated with the Leaky ReLU [31] activation function. Each decoder path decodes the features produced by the encoder with deconvolution layers and residual blocks; each feature map is normalized with a BN layer and then activated with the ReLU [32] activation function. Skip connections fuse the feature maps of the encoder with those of the decoder.
(2) Multi-scale: the multi-scale inputs are integrated into encoder path 1 to ensure feature transfer from the original image and effectively improve segmentation quality. We downsample the image with average pooling layers, then use convolutions to expand the channels of the downsampled images and build a multi-scale input in encoder path 1.
(3) Multi-output fusion: in decoder path 4, we extract the output feature map of each residual block, upsample it to the input image size, and feed it to a classifier. Finally, the probability maps obtained by the different classifiers are fused as the final classification result. For the retinal blood vessel segmentation task, the output is a two-channel probability map, where the two channels correspond to the blood vessel and background classes, respectively.
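As referenced above, the following is a minimal PyTorch sketch of the pre-activation residual block used on the encoder paths (BN before each convolution, Leaky ReLU activation); the channel count, kernel size, and negative slope are illustrative assumptions.

```python
# Minimal sketch of the pre-activation residual block (BN -> Leaky ReLU -> Conv, twice).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)  # identity shortcut keeps the gradient >= 1
```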
The new model is trained to combine low-level features with high-level features and to adapt its receptive field and sampling positions to the proportion and shape of the vessels, both of which enable precise segmentation. With this architecture, M3FCN learns discriminative features and generates accurate retinal blood vessel segmentation results.

Evaluation Metrics
We use several metrics to evaluate performance: the F1 score, Accuracy, Sensitivity, Specificity, and the area under the ROC curve (AUC). The closer these indicators are to 1, the better the model. The metrics are calculated as follows:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}, \quad Sensitivity = \frac{TP}{TP + FN}, \quad Specificity = \frac{TN}{TN + FP}, \quad F1 = \frac{2 \, TP}{2 \, TP + FP + FN},

where TP denotes blood vessel pixels that are correctly labeled, TN background pixels that are correctly labeled, FP background pixels that are mislabeled, and FN blood vessel pixels that are mislabeled.
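A minimal sketch of computing these pixel-wise metrics from binary masks is given below; the AUC additionally requires the raw probability map (e.g., via sklearn.metrics.roc_auc_score).

```python
# Minimal sketch of the pixel-wise evaluation metrics from binary masks.
import numpy as np

def metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """pred, gt: boolean arrays where True marks vessel pixels."""
    tp = np.sum(pred & gt)    # vessel pixels correctly labeled
    tn = np.sum(~pred & ~gt)  # background pixels correctly labeled
    fp = np.sum(pred & ~gt)   # background pixels mislabeled as vessel
    fn = np.sum(~pred & gt)   # vessel pixels mislabeled as background
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
```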

Implementation Details
Our hardware environment includes an NVIDIA GeForce GTX 1060 GPU, an Intel Core i7-7700HQ CPU @ 2.80 GHz, and 32 GB of RAM, running Ubuntu 16.04. All training and testing were performed in the same hardware environment. We initialize the network with the initialization method proposed in [33], train with the Adam optimizer [28], and use the Softmax function for the final classification. The learning rate and mini-batch size are 0.001 and 256, respectively. The binary segmentation is obtained by thresholding the probability map at 0.49. The stride size is 5. On the DRIVE dataset, the patch size is 48 and the number of patches is 100,000. On the STARE dataset, the patch size is 256 and the number of patches is 1900. On the CHASE dataset, the patch size is 128 and the number of patches is 12,800. For more implementation details, please refer to our code and logs at https://github.com/HaiCheung/FCN. All code is implemented using PyTorch [34].

Validation of the Image Preprocessing
We first evaluated the effect of each step of the image preprocessing. The results for each variant are shown in Table 1. Comparing No. 0 with No. 1, conversion to a grayscale image greatly improves the results over the RGB color image, increasing the F1 score by 1.06%; subsequent experiments therefore use grayscale images. Comparing experiments with and without data normalization and CLAHE shows that both methods have a positive effect. When gamma correction is used without CLAHE, performance degrades (No. 4 and No. 6), whereas the combination of gamma correction and CLAHE is more effective (No. 7 and No. 8). In particular, the F1 score reaches 0.8321 when all four preprocessing methods are combined (No. 8), which is 2.73% higher than the baseline without any image preprocessing (No. 0). We therefore use these four preprocessing methods in the following experiments. When applying RCF to FCN training, three hyper-parameters must be evaluated: the probability p, the width ratio R, and the filling method. To demonstrate their impact on model performance, we conducted experiments on DRIVE based on M3FCN under varying hyper-parameter settings, fixing the other two parameters while evaluating one of them. The results are shown in Figure 7 and Table 2. In these experiments, we used data augmentation while comparing the performance of RCF under different parameters; in particular, p = 0 or R = 0 means using data augmentation without RCF, which serves as our baseline. As Figure 7 and Table 2 show, when p = 0.5, R = 0.5, and the filling method is RCF-A, M3FCN reaches the highest F1 score of 0.8321, an increase of 0.33% over the baseline. In the following experiments, we set p = 0.5, R = 0.5, and use the RCF-A filling method.


Validation of the Data Augmentation and RCF
To investigate the impact of data augmentation and RCF, we conducted experiments on DRIVE based on M3FCN. The results for each variant are shown in Table 3. When applied alone, data augmentation (F1 = 0.8288) outperforms RCF-A (F1 = 0.8255), but both outperform the baseline (F1 = 0.8242). Importantly, RCF-A and the data augmentation methods are complementary: combining them yields an F1 score of 0.8321, which is 0.79% better than the baseline. In the following experiments, we continue to use data augmentation together with RCF-A. Compared with the basic FCN, M3FCN has three main improvements: (1) the multi-scale input module; (2) the expansion to a multi-path FCN; and (3) multi-output fusion of the final segmentation result. To analyze the effect of each improvement, we performed an ablation experiment comparing improved and non-improved FCNs under equal experimental setups, evaluated on the DRIVE test data. The F1 score, Accuracy, Sensitivity, Specificity, and AUC are shown in Table 4. Table 4 shows that M3FCN performs better than the other improved models; the global F1 scores of the basic FCN and M3FCN on DRIVE are 0.8279 and 0.8321, respectively. From Table 4 we draw the following conclusions: (1) M3FCN performs better than the other improved models; (2) multi-path FCNs outperform non-multi-path FCNs, indicating that the multi-path structure provides better feature extraction and noise reduction; (3) multi-scale input or multi-output fusion combined with the multi-path FCN brings a large performance improvement. All three improvements affect model performance and achieve the best results when combined.

Comparison with the Existing Methods
Tables 5-7 compare the proposed method with several other state-of-the-art methods on the DRIVE, STARE, and CHASE datasets, respectively. Figure 8 visualizes the F1 scores of the different methods. M3FCN achieves the highest F1 score, indicating that it segments background and blood vessels more accurately; since the F1 score summarizes both recall and precision, our approach strikes a better balance between them. M3FCN also achieves the highest sensitivity, indicating that it correctly labels more vessel pixels. Vessel pixels typically account for only about 10% of the pixels in an image, and this class imbalance makes training a classifier for retinal vessel segmentation more difficult. The high sensitivity of our method is therefore very important, as a computer-aided diagnosis system needs to detect vessels without adding false positives. At the same time, M3FCN achieves the highest accuracy, indicating that it better classifies background and vessels, and does so without increasing the numbers of false positives and false negatives. It is worth noting that, in terms of specificity on the STARE dataset, M3FCN is lower only than the basic FCN and higher than the other existing methods. On the CHASE dataset, we rank first in all indicators except specificity. Zhang et al. [50] proposed a convolutional neural network based on atrous convolution, which combines low-level features with high-level features to obtain effective multi-scale features. Although their method obtained a specificity of 0.9876 on CHASE, its F1 score, Accuracy, Sensitivity, and AUC were much lower than those of M3FCN. In terms of specificity, M3FCN ranks third; however, because accuracy combines information from sensitivity and specificity, we conclude that the gain in true detections outweighs the inclusion of erroneous detections.
Jointly evaluating the F1 score, Accuracy, Sensitivity, Specificity, and AUC, M3FCN shows the best performance on the DRIVE, STARE, and CHASE datasets. Zhuang et al. [21] proposed a U-Net based on shared-weight residual blocks, which improves results through skip connections and residual blocks; compared with their method, M3FCN improves the F1 score by 1.19%/2.13% on DRIVE/CHASE, respectively. Jin et al. [19] proposed DUNet, which uses a U-shaped structure and local features to perform retinal vessel segmentation end-to-end; compared with their method, M3FCN improves the F1 score by 0.84%/3.72%/3.61% on DRIVE/STARE/CHASE, respectively. Therefore, M3FCN is superior to the other vessel segmentation methods on the DRIVE, STARE, and CHASE datasets.

Cross-Testing Evaluation
In clinical practice, it is not feasible to retrain the model whenever a fundus image of a new patient needs to be analyzed. Acquisition equipment in different hospitals often comes from different manufacturers, so a reliable method must successfully analyze images acquired by different equipment; robustness and generalization are therefore important criteria for the practical applicability of a model. In this section, we cross-tested on the DRIVE and STARE datasets, following the generalization experiments of Jin et al. [19]. Unlike the retrained model of [47], we use the well-trained model described in Section 4.2. The results of training on STARE and testing on DRIVE are reported in Table 8. Comparing Tables 5 and 8, the Accuracy and AUC of M3FCN decrease by 0.41% and 0.6%, respectively, while those of Jin et al. decrease by 0.85% and 0.84%. The results of training on DRIVE and testing on STARE are reported in Table 9. Comparing Tables 6 and 9, the Accuracy and AUC of M3FCN decrease by 1.3% and 0.97%, respectively, while those of Jin et al. decrease by 1.96% and 1.42%. Our method therefore yields better results in the cross-testing experiments than the method of Jin et al. On the DRIVE dataset, M3FCN achieves the highest Accuracy and AUC, while its specificity is slightly lower than that of Li et al. [47], who proposed a training strategy to effectively train wide and deep neural networks with strong inductive ability. On the STARE dataset, M3FCN achieves the highest Accuracy, Sensitivity, and AUC, with specificity slightly lower than that of Yan et al. [51], who divided the task into three phases and proposed a three-stage deep learning model to segment thick and thin vessels. Although Li et al. and Yan et al. achieve good specificity, they fall far short of our method in the other respects. The cross-testing results show that our framework generalizes better and is more robust when facing new data.
Beyond the experimental results over existing research, we believe the proposed framework generalizes better and is more robust for the following reasons: (1) M3FCN has a deeper structure, and skip connections at different distances greatly reduce the difficulty of training; (2) appropriate training strategies are used, including data preprocessing, dynamic patch extraction, and data augmentation. The deeper structure brings more parameters and greatly increases model capacity, which can improve generalization. Skip connections preserve the gradient well when training deep neural networks and are an important basis for their success [29]. An appropriate training strategy can greatly improve the generalization of the model [52]. In our framework, we use an effective data preprocessing method to normalize the data, and we use the dynamic patch extraction method and the new data augmentation method to enrich the training samples. These measures all increase the generalization ability of the model and can be extended to other work.

Visualize the Results
Visualizations of the segmented vessel probability maps of M3FCN on all three datasets are shown in Figure 9. In Figure 10, we visualize the F1 score of each test image in the DRIVE, STARE, and CHASE datasets to further examine the segmentation results. The segmentation results show that M3FCN is better than the basic FCN and the 2nd human expert. The means and standard deviations show that the segmentation results of M3FCN are more accurate and stable than those of the basic FCN and the second human expert. This indicates that M3FCN generalizes well, has a strong ability to extract and recognize features, and can segment the vessels of a wide variety of retinal images. In both healthy and diseased retinal images, M3FCN maintains a stable segmentation quality and overcomes the impact of lesions well.
In Figure 11, the Receiver Operating Characteristic (ROC) curve and the Precision-Recall (PR) curve are computed, visualized, and compared with several other state-of-the-art methods such as DRIU [40], Wavelet [53], and HED [54]. The ROC curve relates the false positive rate to the true positive rate as the threshold on the probability map is varied. The PR curve better reflects the true classification performance when the ratio of positive to negative samples is highly skewed; unlike the ROC curve, which bulges toward the upper left, the PR curve bulges toward the upper right. The areas under the ROC and PR curves of M3FCN are 0.19% and 1.18% higher than those of DRIU [40], respectively. We also plot the curves of the 2nd human expert's segmentations, whose ROC and PR curve areas are 0.8781 and 0.7115, respectively; thus, M3FCN achieves better segmentation results than the 2nd human expert. On the STARE and CHASE datasets, we reach the same conclusions as on DRIVE. M3FCN obtains the best performance on the DRIVE dataset (0.9880 AUC_ROC, 0.8303 AUC_PR), the STARE dataset (0.9923 AUC_ROC, 0.8528 AUC_PR), and the CHASE dataset (0.9917 AUC_ROC, 0.8419 AUC_PR). The curves of the proposed method rise faster in true positive rate, indicating better performance than all other existing methods. Compared with other methods, M3FCN extracts deeper representation features and better segments both the background and fine vessels.
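A minimal sketch of computing both curves and their areas from a probability map, assuming scikit-learn is available, is given below; average_precision_score is a common alternative summary of the PR curve.

```python
# Minimal sketch of ROC/PR curve computation from a vessel probability map.
import numpy as np
from sklearn.metrics import auc, precision_recall_curve, roc_curve

def curve_areas(prob: np.ndarray, gt: np.ndarray):
    """prob: vessel probabilities in [0, 1]; gt: binary vessel mask."""
    y_true, y_score = gt.ravel().astype(int), prob.ravel()
    fpr, tpr, _ = roc_curve(y_true, y_score)                  # ROC curve points
    precision, recall, _ = precision_recall_curve(y_true, y_score)  # PR curve points
    return auc(fpr, tpr), auc(recall, precision)              # AUC_ROC, AUC_PR
```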

Figure 1. Overview of the proposed framework.

Figure 2. Examples of color images (first row) and labels (second row) from the different datasets.

Algorithm 1. Training the FCN with the dynamic patch extraction strategy.
Input: training images X ∈ R^{N×1×H×W}, ground truths G ∈ R^{N×1×H×W}; patch size p; dynamic patch number n; initial FCN parameters θ; number of epochs E.
Output: FCN parameters θ.
Initialize patch images I ∈ R^{n×1×p×p} and patch labels T ∈ R^{n×1×p×p}.
for e = 1 to E do
  for i = 1 to N do
    for k = 1 to n/N do
      Randomly generate the center coordinates (x, y) of a patch.
      Extract patch I and label T from X and G centered at (x, y).
    end for
  end for
  Compute the loss between FCN(I; θ) and T. Update the parameters θ using the Adam [28] optimizer.
end for
return θ.

Figure 7. Test results under different hyper-parameters on DRIVE using M3FCN.

Figure 8. Visualization of F1 scores for the different methods.

Figure 11. ROC and PR curves for various methods: (a) DRIVE dataset; (b) STARE dataset; (c) CHASE dataset.

Algorithm 2 (excerpt). For each sampled position taken as the upper-left coordinate: input x into the trained FCN to obtain the output y; add y to the corresponding area of O_p; add 1 to the corresponding area of O_s.

Table 1. Test results with different image preprocessing on DRIVE based on M3FCN.

Table 2. Test results under different hyper-parameters on DRIVE using M3FCN.

Table 3. Test results with data augmentation and RCF-A on DRIVE based on M3FCN. DA: data augmentation.

Table 4. Test results with different improved FCNs on DRIVE.

Table 5. Comparison of the proposed method with other methods on the DRIVE dataset.

Table 6. Comparison of the proposed method with other methods on the STARE dataset.

Table 7. Comparison of the proposed method with other methods on the CHASE dataset.

Table 8. Comparison of experimental results: models trained on the STARE dataset, then tested on the DRIVE dataset.

Table 9. Comparison of experimental results: models trained on the DRIVE dataset, then tested on the STARE dataset.