Artificial Intelligence in Ophthalmology: A Meta-Analysis of Deep Learning Models for Retinal Vessel Segmentation.

Background and Objective: Accurate retinal vessel segmentation is often considered a reliable biomarker for the diagnosis and screening of various diseases, including cardiovascular, diabetic, and ophthalmologic diseases. Recently, deep learning (DL) algorithms have demonstrated high performance in segmenting retinal images, which may enable fast and lifesaving diagnoses. To our knowledge, there is no systematic review of the current work in this research area. Therefore, we performed a systematic review with a meta-analysis of relevant studies to quantify the performance of DL algorithms in retinal vessel segmentation. Methods: A systematic search of EMBASE, PubMed, Google Scholar, Scopus, and Web of Science was conducted for studies published between 1 January 2000 and 15 January 2020. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) procedure. A DL-based study design was mandatory for a study's inclusion. Two authors independently screened all titles and abstracts against predefined inclusion and exclusion criteria. We used the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool to assess the risk of bias and applicability. Results: Thirty-one studies were included in the systematic review; however, only 23 studies met the inclusion criteria for the meta-analysis. DL showed high performance on four publicly available databases, achieving an average area under the receiver operating characteristic curve (AUROC) of 0.96, 0.97, 0.96, and 0.94 on the DRIVE, STARE, CHASE_DB1, and HRF databases, respectively. The pooled sensitivity for the DRIVE, STARE, CHASE_DB1, and HRF databases was 0.77, 0.79, 0.78, and 0.81, respectively. Moreover, the pooled specificity for the DRIVE, STARE, CHASE_DB1, and HRF databases was 0.97, 0.97, 0.97, and 0.92, respectively. Conclusion: The findings of our study showed that DL algorithms had high sensitivity and specificity for segmenting retinal vessels from digital fundus images.
The future role of DL algorithms in retinal vessel segmentation is promising, especially for those countries with limited access to healthcare. More comprehensive studies and global efforts are mandatory for evaluating the cost-effectiveness of DL-based tools for retinal disease screening worldwide.


Rationale
Visual impairment is a public health concern that has a negative impact on physical and mental health [1]. Visual impairment is associated with a high risk of chronic health conditions and even death. The prevalence and economic burden of visual impairment are increasing exponentially as populations age [2]. It is estimated that the number of people with visual impairment will double by 2050 [3]. Several potential factors, such as cataract, age-related macular degeneration (AMD), diabetic retinopathy (DR), and glaucoma, are responsible for an increased risk of blindness [4,5]. This highlights the substantial public health burden that visual impairment and blindness place on our health care system. Therefore, the early detection and quantitative diagnosis of retinal diseases can help to develop more preventive measures, thereby reducing the number of newly diagnosed cases and the associated financial burden [6].

Solution Statement
Retinal fundus images are often used for early diagnosis of different ophthalmologic diseases, including DR and glaucoma [7]. Among the various features in digital fundus images, retinal blood vessels provide useful information that is an important prerequisite for a number of clinical applications [8]. However, manual segmentation of retinal vessels by a trained human expert is time-consuming and highly variable [9,10]. The lack of human observers, infrastructure, and awareness are key challenges that need to be overcome. Over the past decades, automatic retinal vessel segmentation methods have mainly been classified into two categories: supervised and unsupervised. Unsupervised methods typically depend on thresholded filter responses, handcrafted features, or other rule-based techniques. In contrast, supervised methods train a classifier to discriminate between vessel and non-vessel pixels. Recently, deep learning (DL) has achieved tremendous diagnostic performance in segmenting retinal vessels [11,12]. The diagnostic accuracy of DL in retinal vessel segmentation has been shown to be comparable to the accuracy achieved by human experts. DL-based automatic systems offer potential benefits by reducing manual work and achieving faster segmentation with reduced costs and resources. DL-based automatic tools can be incorporated into real-world screening programs that are not widely implemented or routinely practiced [13].

Goal of Investigation
Herein, we report the results of a comprehensive systematic review of studies that investigated the performance of DL algorithms for retinal vessel segmentation in digital fundus photographs. Our primary objective was to precisely gauge the performance of DL methods for retinal vessel segmentation from color fundus images. The evaluation of DL performance can help policymakers understand how DL could be a clinically effective tool for segmenting retinal vessels in under-resourced areas with a severe shortage of experts and infrastructure.

Artificial Neural Network (ANN)
ANNs are one of the main tools used in AI. ANNs are inspired by the neurons of the biological brain and are intended to mimic the way that humans learn. An ANN consists of input, hidden, and output layers. The input layer is the first layer, which receives inputs in the form of numbers, documents, texts, images, or audio files. The middle layers are called hidden layers, and a neural network with a single layer is called a perceptron. However, a network can consist of multiple layers and output single or multiple outcomes.
In Figure 1, x1, x2, x3, and x4 represent four inputs (independent variables) to the network. Each of the four inputs is multiplied by a random weight. The weights are represented as w1, w2, w3, and w4. A weight represents the strength of each node, while b is called the bias. The bias value lets the activation function shift up or down. The following output is passed to the activation function:

x1·w1 + x2·w2 + x3·w3 + x4·w4 + b (1)

The activation function determines whether a neuron will be activated or not, based on the sum of the weighted inputs plus the bias. Its primary objective is to introduce non-linearity into the output of each neuron.
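The weighted sum in Equation (1), followed by an activation function, can be sketched as a minimal perceptron in plain Python (the input values, weights, and bias below are illustrative, not taken from the paper):

```python
# Minimal sketch of a single neuron: the weighted sum of the inputs plus the
# bias b is passed through an activation function (here a simple step function).

def perceptron(x, w, b):
    # Weighted sum of inputs plus bias, as in Equation (1)
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    # Step activation: the neuron "fires" only if z is positive
    return 1 if z > 0 else 0

x = [0.5, -1.0, 2.0, 0.25]   # four inputs x1..x4 (illustrative values)
w = [0.4, 0.3, 0.1, -0.2]    # four random weights w1..w4
b = 0.05                     # bias
print(perceptron(x, w, b))   # → 1
```

In a real ANN the step function would typically be replaced by a differentiable non-linearity such as the sigmoid or ReLU, so that the weights can be learned by gradient descent.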

Convolutional Neural Network (CNN)
A CNN algorithm consists of several network layers, such as input, convolutional, max pooling, average pooling, and output layers. The total number of layers can be increased or decreased based on the size of the input used to train the model. Usually, a deeper network will perform better with large datasets. The advantage of using a CNN is that it does not need any manual feature extraction. In the CNN model, features are automatically and hierarchically extracted from the input and are further classified using a fully connected layer. Figure 2 shows the architecture of the CNN model.


Convolutional Layer
The convolutional layer applies a convolution operation to the given input, such as a digital image. A filter is moved over the input with a stride (which describes how many pixels the filter translates horizontally and vertically), and the size of the stride is usually determined by the model designer. The convolution generates feature maps, which are used as the input of the subsequent layer.
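The sliding-filter operation described above can be sketched as follows (a simplified, single-channel convolution with illustrative values; real CNN libraries add padding, multiple channels, and learned filters):

```python
import numpy as np

# Sketch of a 2D convolution: a 2x2 filter slides over a 4x4 input with
# stride 2, producing a 2x2 feature map.

def conv2d(image, kernel, stride=1):
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1   # output height
    ow = (image.shape[1] - kw) // stride + 1   # output width
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise product of the filter and the current image patch
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
kernel = np.array([[1.0, 0.0], [0.0, 1.0]])        # toy 2x2 filter
print(conv2d(image, kernel, stride=2))             # 2x2 feature map
```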

Activation Function
Different types of activation functions are applied in the convolutional layers. They help to create a non-linear relationship between the data and the output class.
Let layer l be a non-linearity layer that takes the feature volume Y^(l−1) from a convolutional layer (l − 1) and generates the activation volume Y^(l).
There are several types of activation functions, such as tanh, sigmoid, or ReLU, which are used to classify output variables. However, ReLU is a widely used activation function because of its capability to reduce the exploding/vanishing gradient problem.
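The three activation functions named above can be sketched directly (illustrative code, not from the paper); note how ReLU simply clips negative values to zero:

```python
import numpy as np

# Common activation functions used after convolutional layers.

def relu(z):
    # ReLU: max(0, z); gradients do not saturate for positive z
    return np.maximum(0.0, z)

def sigmoid(z):
    # Sigmoid: squashes z into (0, 1); saturates for large |z|
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Tanh: squashes z into (-1, 1)
    return np.tanh(z)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))     # negative values are clipped to zero
print(sigmoid(z))  # values between 0 and 1
```

The saturation of sigmoid and tanh for large inputs is what drives the vanishing gradient problem that ReLU mitigates.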

Max Pooling
A max pooling layer is used to reduce the size of a feature map. A pooling window is moved over the feature map with a chosen stride, and the maximum (or, for average pooling, the average) value within each window is taken to generate a smaller matrix (Figure 3). Consequently, the output of this layer is smaller than that of the previous layer.
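The window-wise reduction described above can be sketched as 2×2 max pooling with stride 2 (illustrative values; each window collapses to its maximum, halving the spatial size):

```python
import numpy as np

# Sketch of max pooling: each 2x2 window of the feature map is reduced to
# its maximum value, so the 4x4 input becomes a 2x2 output.

def max_pool(feature_map, size=2, stride=2):
    oh = (feature_map.shape[0] - size) // stride + 1
    ow = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = feature_map[i*stride:i*stride+size,
                                 j*stride:j*stride+size]
            out[i, j] = window.max()   # use window.mean() for average pooling
    return out

fm = np.array([[1., 3., 2., 4.],
               [5., 6., 1., 2.],
               [7., 2., 9., 0.],
               [4., 8., 3., 1.]])
print(max_pool(fm))   # 2x2 output, smaller than the 4x4 input
```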


Fully Connected Layer
Every neuron of the previous layer (i.e., the max-pooling layer) is connected to each neuron in this layer. The output layer of the MLP has m_1^(l) outputs, where l denotes the number of the layer in the MLP (Figure 4). If layer l − 1 is a fully connected layer, its outputs are flattened into a vector and fed as the input of layer l.
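The all-to-all connectivity described above amounts to a matrix-vector product: one weight per input-output pair, plus a bias per output neuron (the sizes and random values below are illustrative):

```python
import numpy as np

# Sketch of a fully connected layer: every input neuron (e.g., the flattened
# max-pooling output) connects to every output neuron via a weight matrix W.

def fully_connected(x, W, b):
    # W has shape (outputs, inputs): one weight per input-output pair
    return W @ x + b

rng = np.random.default_rng(0)
x = rng.random(8)          # flattened pooled features (illustrative)
W = rng.random((3, 8))     # 3 output neurons, each connected to all 8 inputs
b = np.zeros(3)            # one bias per output neuron
y = fully_connected(x, W, b)
print(y.shape)             # one value per output neuron
```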

Retinal Image Processing
The retinal vessel structure is complex, and there are always immense differences between the vessels in various local areas in terms of size, shape, and intensity [14]. Thus, it is very difficult to build a model that can describe the complex vessel structure. Some features are similar to vessels in shape and intensity (e.g., striped hemorrhage). Moreover, micro-vessels are very thin, and the width of the vessels varies (from one to several pixels), depending on the sizes and image resolutions. Therefore, it is challenging to differentiate retinal vessels from other similar features or noise. Multiple methods developed using vector geometry, image filters, and machine learning techniques have been used to generate low-level feature vectors that can detect the vessels [15,16]. The performance of these models sometimes relied on high-quality image features or heuristic presumptions. However, these traditional methods did not utilize generalized learning patterns to create feature vectors. Recently, deep learning algorithms have been used in retinal vessel segmentation due to their ability to learn higher-level abstractions from diverse data by using multiple layers. Retinal vessel segmentation is conducted through pixel-wise processing. Vessel segmentation is considered to be a pixel-wise binary classification problem (vessel pixel versus non-vessel pixel). The CNN model with multiple layers differentiates images by analyzing them pixel by pixel, without considering the whole structure of the retinal vasculature [17]. The CNN model also combines multi-level features to provide higher segmentation performance. It can produce a vessel probability map while using the same size retinal images and a single forward propagation process.
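The pixel-wise binary formulation above can be sketched in a few lines: a model outputs a vessel probability map the same size as the image, and thresholding it yields the binary vessel/non-vessel segmentation (the probability values here are illustrative, not from any real model):

```python
import numpy as np

# Sketch of pixel-wise vessel segmentation: a (hypothetical) model has
# produced a per-pixel vessel probability map; thresholding at 0.5 turns it
# into a binary mask of vessel (1) versus non-vessel (0) pixels.

prob_map = np.array([[0.9, 0.2, 0.1],
                     [0.8, 0.7, 0.3],
                     [0.1, 0.6, 0.2]])

vessel_mask = (prob_map >= 0.5).astype(int)   # 1 = vessel, 0 = background
print(vessel_mask)
```

Comparing such a mask against the expert-annotated ground truth, pixel by pixel, is what yields the TP/FP/TN/FN counts used by the evaluation metrics later in this review.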

Research Design
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, which is based on the Cochrane Handbook for Systematic Reviews, was used to conduct the current study [18,19]. A written review protocol was drafted (Supplementary Table S1). The process of this study is given below:

Electronic Database Search
We systematically searched the widely used search engines, namely EMBASE, PubMed, Google Scholar, Scopus, and Web of Science, to obtain potentially relevant studies published between 1 January 2000 and 15 January 2020, using the most appropriate free keywords: ("Retinal vessel segmentation" or "Retinal blood segmentation") and ("Deep learning" or "DL" or "Convolutional neural network" or "CNN" or "Deep neural network" or "Automated technique" or "Artificial intelligence") (Figure 5).


Searching for Other Sources
We also carefully searched the bibliography of obtained studies that we deemed to be eligible and relevant previous review studies for additional study inclusion.

Eligibility Criteria
Eligibility was restricted to studies that examined the performance of DL algorithms for retinal vessel segmentation while using digital images. Studies were included if they fulfilled the following inclusion criteria: (1) published in English, (2) provided an outcome of DL algorithms and retinal vessel segmentation, (3) provided information on any of the evaluation metrics, such as accuracy, the area under receiver operating curve, sensitivity, or specificity, (4) provided clear information about the image database and the number of images, (5) provided a clear definition of retinal vessel segmentation, and (6) clearly described the DL algorithms and process used in the retinal vessel segmentation.
Studies were excluded if they were published in the form of a review, editorial, research letter, letter to editor, or short communication.

Selection Process
Two authors (MMI, TNP) independently screened all of the titles and abstracts of the previously obtained studies for inclusion in our systematic review and meta-analysis. They selected relevant studies based on the predefined selection criteria. Any disagreement at this stage was resolved by discussion with a prior agreement; any unsettled conflict was finally settled by discussion with the chief investigator (YC, L).
The same two authors used data collection forms to extract the relevant information from the previously obtained studies. MMI and TNP then assessed the obtained studies for duplication by comparing the publication date, author name, journal name, and sample sizes. Any duplicated study was excluded.

Data Extraction
The primary outcome measures were AUROC, sensitivity, and specificity of the performance of the DL algorithms for retinal vessel segmentation. We also recorded the total number of images used in the training and testing set. We also recorded data regarding the true positive, true negative, false positive, and false negative rate. Other data of interest included general information: author name, publication year, location, sensitivity, specificity, accuracy, AUROC, DL model, camera information, image pixels, and database.

Assessment of Bias Risk
Systematic reviews with meta-analysis of diagnostic studies might have heterogeneous findings due to differences in their study design [20]. Therefore, MMI and TNP independently utilized the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool for assessing the quality of the included diagnostic studies. The QUADAS-2 scale [21] comprises four domains: patient selection, index test, reference standard, and flow and timing. All four domains are used for evaluating the risk of bias, and the first three are also used for evaluating concerns regarding applicability. The overall risk of bias was categorized into three groups (low, high, and unclear risk of bias) (Supplementary Table S2).

Statistical Analysis
Meta-DiSc software (version 1.4, Unit of Biostatistics, Madrid, Spain) was used to calculate the evaluation metrics, such as AUROC, sensitivity, specificity, and diagnostic odds ratio. Meta-DiSc was used to (a) perform statistical pooling of each individual study and (b) assess homogeneity with a variety of statistics, including the chi-square and I-squared statistics. Six evaluation criteria were used: the area under the ROC curve (AUROC), sensitivity (SN), specificity (SP), positive likelihood ratio (LR+), negative likelihood ratio (LR−), and diagnostic odds ratio.
AUROC values of ≥0.90, 0.80–0.89, 0.70–0.79, 0.60–0.69, and <0.60 were considered to be excellent, good, fair, poor, and failed, respectively. An I² value was calculated to assess the statistical heterogeneity among the included studies. I² values of 0–25%, 25–50%, 50–75%, and >75% were considered as very low, low, medium, and high heterogeneity, respectively [7]. The value of I² was computed as follows:

I² = ((Q − df)/Q) × 100%

Here, Q = Cochran's heterogeneity statistic and df = degrees of freedom. Negative values of I² are set to zero, so that I² lies between 0% (no observed heterogeneity) and 100% (maximal heterogeneity). Meta-DiSc also allows for calculating the AUC and Q* index, along with their standard errors.
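The I² computation reduces to a one-line function (the Q and df values below are illustrative, not from this meta-analysis):

```python
# Sketch of the I^2 heterogeneity statistic from Cochran's Q and its degrees
# of freedom: I^2 = max(0, (Q - df) / Q) * 100, clamped so negative values
# are reported as zero.

def i_squared(Q, df):
    if Q <= 0:
        return 0.0
    return max(0.0, (Q - df) / Q) * 100.0

print(i_squared(40.0, 10))   # → 75.0 (high heterogeneity on the scale above)
print(i_squared(5.0, 10))    # → 0.0 (negative raw value clamped to zero)
```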
The SE, SP, LR+, and LR− are defined as follows:

SE = TP/(TP + FN), SP = TN/(TN + FP), LR+ = SE/(1 − SP), LR− = (1 − SE)/SP,

where TP = vessel pixels classified correctly, FN = vessel pixels misclassified as non-vessel pixels, TN = non-vessel pixels classified correctly, and FP = non-vessel pixels misclassified as vessel pixels. The diagnostic odds ratio (DOR) was also computed for assessing how much greater the odds of having DR are for the people with a positive test result than for the people with a negative test result. DOR is calculated by the following equation:

DOR = (TP × TN)/(FP × FN) = LR+/LR−

The likelihood ratios were calculated to express how much more frequent the respective finding is among individuals with DR than among individuals without DR.
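The metric definitions above can be sketched as a single function (the TP/FN/TN/FP counts below are hypothetical, for illustration only):

```python
# Sketch computing the evaluation metrics defined above from illustrative
# pixel counts.

def diagnostic_metrics(TP, FN, TN, FP):
    se = TP / (TP + FN)        # sensitivity: fraction of vessel pixels found
    sp = TN / (TN + FP)        # specificity: fraction of background kept
    lr_pos = se / (1 - sp)     # positive likelihood ratio
    lr_neg = (1 - se) / sp     # negative likelihood ratio
    dor = lr_pos / lr_neg      # diagnostic odds ratio = (TP*TN)/(FP*FN)
    return se, sp, lr_pos, lr_neg, dor

se, sp, lrp, lrn, dor = diagnostic_metrics(TP=80, FN=20, TN=90, FP=10)
print(round(se, 2), round(sp, 2), round(dor, 1))   # 0.8 0.9 36.0
```

Note that the two DOR formulas agree: with these counts, (TP × TN)/(FP × FN) = (80 × 90)/(10 × 20) = 36 as well.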
The pooled AUROC was plotted with the SE versus (1 − SP) by varying the threshold. The perfect classifier achieved an AUC value that was equal to 1.

Study Screening
Our initial search of the five search engines yielded 2637 studies. Of these, 2520 were excluded because of duplication, and 82 of the remaining 117 studies were excluded after reviewing the titles and abstracts against our pre-specified inclusion criteria. We then reviewed the remaining 35 full-text studies and checked their reference lists for further relevant studies. However, we did not find any additional potential study. Three more studies were excluded for insufficient data, and one study was excluded because it was a review. Consequently, we included the remaining 31 studies in this systematic review and meta-analysis [11][12][13]. Figure 6 presents the flow diagram of the systematic study search. Table 1 presents the 31 studies that evaluated the performance of DL algorithms for retinal vessel segmentation. The publication years ranged from 2015 to 2019. All of the studies used DL algorithms, like CNN, MResU-Net, or U-Net, for retinal vessel segmentation. The range of accuracy and AUROC was between 0.85 and 0.99. Seven publicly available databases, namely DRIVE, STARE, CHASEDB1, HRF, TONGREN, DRIONS, and REVIEW, were used in these studies (Table 2).
The REVIEW database had only 16 images, and the DRIONS database had a maximum number of 110 images. Each image had pixel-level vessel annotation provided by experts, and ground truth was used for image annotation (Figure 7). The REVIEW dataset includes retinal images with 193 vessel segments, demonstrating a variety of pathologies and vessel types (8 high-resolution, 4 vascular disease, 2 central light reflex, and 2 kickpoint images). It also contains 5066 manually marked profiles, marked by three observers.

Deep Learning Performance in Retinal Vessel Segmentation
The summary estimate for the retinal vessel segmentation sensitivity of DL systems was 0.77 (95% CI: 0.77-0.77) and the specificity was 0.97 (95% CI: 0.97-0.97), based on the 23 studies that utilized the DRIVE data set (Table 3). The summarized AUROC was 0.96 (Figure 8).

The 18 studies that used the STARE data set had significantly higher sensitivity but the same specificity as the DRIVE data set. The summarized AUROC was 0.97, and the pooled sensitivity and specificity were 0.79 (95% CI: 0.79-0.79) and 0.97 (95% CI: 0.97-0.97), respectively (Figure 9). The 10 studies that used the CHASEDB1 data set for evaluating the performance of DL in retinal vessel segmentation had a summarized AUROC of 0.96, and the pooled sensitivity and specificity were 0.78 (95% CI: 0.78-0.78) and 0.97 (95% CI: 0.97-0.97), respectively (Figure 10).
Furthermore, the six studies that used the HRF data set to assess the performance of DL in retinal vessel segmentation had a summarized AUROC of 0.94, and the pooled sensitivity and specificity were 0.81 (95% CI: 0.81-0.81) and 0.92 (95% CI: 0.92-0.92), respectively (Table 3; Supplementary Figures F1-F4). Table 4 provides the performance of the unsupervised models proposed in the literature in terms of the typical evaluation measures. Note: SE = Sensitivity, SP = Specificity, ACC = Accuracy.

Principal Findings
This systematic review with meta-analysis assessed the performance of automated DL algorithms for retinal vessel segmentation from fundus retinal images. Our key findings are: (a) DL algorithms showed strong performance, in terms of sensitivity and specificity, when assessing images from four publicly available databases; and (b) the performance of DL was comparable to that of human experts (Table 3). Our findings suggest that the application of DL-based tools for retinal vessel segmentation could provide a substitute solution for eye screening, especially in clinical settings with a limited number of ophthalmologists and a scarcity of resources. The implementation of AI screening tools in real-world clinical settings can speed up the screening process, reduce cost, and improve patient care, since the performances of DL-based tools and human graders were similar.

Research and Clinical Implications
Automatic segmentation of retinal vessels is one of the most important elements of precision treatment dealing with huge datasets of retinal images. Manual segmentation is time-consuming and complex because of the vessel structure and the variability across human graders [41,53]. Automatic tools are clinically effective in segmenting retinal images, and automated identification might improve diagnostic accuracy for non-retinal experts; therefore, the application of automatic tools to the analysis of retinal images could provide an alternative solution for large-scale fundus image screening, especially in areas with limited access to ophthalmologic experts [6]. However, the automatic segmentation of retinal images is not an easy task, and several factors, including light exposure, camera focus, motion artifacts, and existing diseases, can hamper image quality [54][55][56]. These potential factors are often responsible for inhomogeneous image quality and thus hamper vessel segmentation. Accordingly, extensive efforts have been made to segment retinal vessels automatically using machine learning techniques, but these failed to show superior performance over human graders.
The DL algorithms have shown promising performance comparable to expert segmentation in fundus images [28,38]. The most unique advantage of DL is its ability to precisely learn and capture a huge number of image features with varying hierarchies and locations, and to optimally integrate these features to obtain a desirable finding. The results of our study show that the DL algorithms were able to segment retinal vessels with performance comparable to that of human experts. The accurate segmentation of retinal vessels assists in making appropriate clinical decisions. It will help to screen high-risk patients, such as those with retinopathy, who need to receive proper treatment, and accurate segmentation can guide retinal disease management. DL-based automatic tools for retinal vessel segmentation could markedly change how retinal disease diagnosis and management are conducted in the near future. Automatic segmentation of retinal vessels could become popular in an era where fast, accurate, and low-cost treatment is recommended [40]. It would be particularly helpful for ophthalmologists who are not trained experts in retinal image identification. It would also assist experienced eye specialists to make decisions more quickly and accurately. Precise risk stratification for eye disease treatment, such as glaucoma and DR, would become possible. However, a high-quality image database is a prerequisite for successfully implementing DL-based automatic tools for retinal vessel segmentation.

Strengths and Limitations
Our study has several strengths. First, this is the first systematic review and meta-analysis that addressed DL performance in retinal vessel detection. Second, this study included a total of 31 studies that had used seven different databases to assess the performance of DL.
Our results indicate that DL has immense potential to improve care. Third, we compared the DL performance with the performance of human experts and other types of unsupervised methods. Our study also has some limitations. First, more than two-thirds of the included studies used the same three databases (namely, DRIVE, STARE, and HRF); therefore, we cannot generalize our results as much as if more databases had been involved in our meta-analysis. However, some studies used the remaining four databases and achieved similar performances. Second, we did not include any study that evaluated a machine learning model for retinal vessel detection. Third, inherited retinal degeneration diseases are genetically heterogeneous. Therefore, changes in the retinal vessels in fundus images could differ between patients with the same retinal disease, and the performance of deep learning could vary. However, a robust design and a trained CNN model or novel post-processing can address this problem.

Conclusions
Our findings show that DL algorithms achieved clinically acceptable performance in retinal vessel segmentation. The implementation of DL-based tools for retinal vessel segmentation can reduce the manpower and cost of retinal vessel screening and resolve the problem of intra-grader and inter-grader variability. In the near future, DL techniques may play a significant role in diagnosing ophthalmological diseases and predicting the prognosis of eye disease patients in an individualized manner. More careful, comprehensive designs and planning are needed in order to expedite this process.