Automatic Segmentation and Classification Methods Using Optical Coherence Tomography Angiography (OCTA): A Review and Handbook

Featured Application: Provision of a review and a handbook for automatic quantification and classification methods using optical coherence tomography angiography.

Abstract: Optical coherence tomography angiography (OCTA) is a promising technology for the non-invasive imaging of vasculature. Many studies in literature present automated algorithms to quantify OCTA images, but there is a lack of a review on the most common methods and their comparison considering multiple clinical applications (e.g., ophthalmology and dermatology). Here, we aim to provide readers with a useful review and handbook for automatic segmentation and classification methods using OCTA images, presenting a comparison of techniques found in the literature based on the adopted segmentation or classification method and on the clinical application. Another goal of this study is to provide insight into the direction of research in automated OCTA image analysis, especially in the current era of deep learning.


Introduction
Optical coherence tomography angiography (OCTA) is an imaging technology that is able to produce images of vasculature that have an unprecedented resolution in a non-invasive and quick fashion [1]. It was originally introduced in the mid-1990s and was based on a combination of time domain optical coherence tomography and Doppler velocimetry [2]. Since then, OCTA imaging has further improved thanks to technological advancements, especially in recent years [3]. OCTA imaging is based on structural optical coherence tomography (OCT) imaging which produces images by measuring the amplitude and delay of reflected or backscattered light in an interferometrical manner [1]. One measurement takes the name of A-scan, whereas one B-scan (i.e., cross-sectional image) is generated by acquiring many A-scans one after another as the light beam is scanned in the transverse direction. The final volumetric information is generated by sequentially acquiring multiple B-scans. Figure 1 shows an example of how the acquired OCT data is arranged. OCTA images are instead obtained by taking advantage of the fact that everything but blood within the imaged volume is mostly stationary. Hence, if multiple B-scans are acquired at the same location, the obtained images should be the same except for the sites where blood is flowing. Then, by looking for pixel-to-pixel differences, which represent the reflectivity or scattering changes from one scan to the next, it is possible to image blood flow and obtain a final image volume of the vasculature. There are various algorithms that are employed to determine the final OCTA image with motion-contrast, also known as optical microangiography (OMAG) [4]. In OCTA imaging, the most popular algorithms use the OCT signal amplitude, the OCT signal phase, or both (also called complex amplitude). 
In particular, the split-spectrum amplitude-decorrelation angiography (SSADA) algorithm [5] was one of the first algorithms to be implemented within commercially available OCTA systems. Figure 2 depicts a block diagram example of an OCT system together with the signal processing unit to obtain OCTA A-scan signals and two example OCTA images. OCTA imaging presents many advantages when compared to other imaging modalities for vasculature, such as being quick and non-invasive, providing volumetric data that can allow the localization of pathology, and the ability to show both structural and blood flow information with a high resolution. Some of its current limitations include a relatively small field of view, a low penetration depth, and being prone to motion artefacts [1]. Hence, OCTA imaging is an ideal solution for a non-invasive quantitative analysis of superficial vasculature that does not cover too large a surface area. In fact, the first clinical application of OCTA was in ophthalmology, where it is now well established in a clinical setting. In recent years, clinical applications of OCTA have also started branching out, particularly towards dermatological applications, which have recently been reviewed in [6].
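The motion-contrast principle described above (repeated B-scans at the same location differ only where blood flows) can be illustrated with a minimal amplitude-decorrelation sketch. This is not the SSADA implementation of [5]: the spectral splitting step is omitted, the function name `amplitude_decorrelation` and the small constant `eps` are assumptions made for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
def amplitude_decorrelation(bscans, eps=1e-12):
    """Average decorrelation between consecutive repeated B-scans.

    bscans: list of N >= 2 B-scans acquired at the same location, each a
    2D list (rows x columns) of non-negative OCT amplitudes.
    Returns a 2D list: values near 0 for static tissue, larger values
    where the amplitude changes between repeats (e.g., blood flow).
    """
    n = len(bscans)
    if n < 2:
        raise ValueError("need at least two repeated B-scans")
    rows, cols = len(bscans[0]), len(bscans[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            corr = 0.0
            for k in range(n - 1):
                a, b = bscans[k][r][c], bscans[k + 1][r][c]
                # Normalized product of the two amplitudes; eps (an
                # assumption) only guards against division by zero.
                corr += (a * b) / (0.5 * a * a + 0.5 * b * b + eps)
            out[r][c] = 1.0 - corr / (n - 1)
    return out


# Two repeats of a one-row B-scan: first pixel static, second pixel changing.
flow_map = amplitude_decorrelation([[[2.0, 1.0]], [[2.0, 3.0]]])
```

In this toy input the static pixel yields a decorrelation close to 0 and the changing pixel yields 0.4, which is the kind of contrast that is then displayed as the angiogram.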
As in numerous other medical imaging fields, there has been an extensive focus in recent years on quantifying and analyzing acquired OCTA images in an automatic or semi-automatic way to help physicians in making a diagnosis. This is known as the development of computer-aided diagnosis (CAD) systems. These systems aim to automatically extract quantitative information useful to clinicians or to automatically classify acquired images/volumes as healthy or pathological as a second opinion to experienced clinicians.
There are numerous reviews in literature that focus on the clinical applications of OCTA imaging, especially when considering ophthalmology and specific diseases, such as, but not limited to, diabetic retinopathy (DR) [7,8], age-related macular degeneration (AMD) [9] or glaucoma [10]. Other reviews found in literature focus on current and future clinical applications of OCTA imaging [3,11,12], to name a few. A couple of recent studies focus on quantitative OCTA imaging, either providing an overview of the quantitative parameters that can be employed for artificial intelligence classification or comparing traditional and deep learning-based segmentation methods [11,12], but both are still limited to ophthalmological applications and do not go into much detail about the various automated methods. Hence, a review and handbook focusing on the actual analysis methods, such as specific segmentation and classification techniques, is still lacking for OCTA imaging. The objectives of this work are (1) to select high-quality papers that use an automated segmentation or classification method applied to OCTA images, (2) to highlight and compare the most commonly used methods for OCTA image segmentation and classification tasks, (3) to provide a handbook containing useful information on how to approach the issue of automatically analyzing OCTA images, and (4) to provide some insight on the direction of research in automated OCTA image analysis.

Literature Search Strategy and Study Selection
The PubMed, Scopus and Google Scholar electronic databases were used between March and August 2021 to find articles that employed an automated method for assessing OCTA images, regardless of the specific application (i.e., DR, dermatology, etc.). The keywords that were used for the electronic database search within the title and/or abstract were as follows: "optical coherence tomography angiography", "OCTA", "quantification", "quantifying", "segmentation", "automatic", "classification". In particular, the specific query that was used to search was ("optical coherence tomography angiography" OR "OCTA") AND ("quantification" OR "quantifying" OR "segmentation" OR "automatic" OR "classification"). The database search was limited to initial studies that were published after January 2016. Once the electronic database search was concluded, the reference lists of the identified articles were further analyzed in order to select any additional relevant studies.
Once the initial electronic database search was completed, the articles were screened by reading the titles, the abstracts, and briefly analyzing the Methods section to establish their suitability for inclusion in this review. Specifically, articles were excluded if they (i) were not written in English, (ii) were too similar to other studies, (iii) were not available in full text, (iv) did not enroll a sufficient number of subjects (<5 subjects) or only provided preclinical phantom or animal studies, (v) did not provide enough detail regarding the quantification/classification algorithm or if only a commercial software was employed or if only manual segmentations were employed, (vi) required multi-modal images for the correct implementation of the algorithm (e.g., OCTA image analysis based on fundus image), and (vii) were focused mainly on the characterization of quantitative features for a specific clinical disease and not on the quantitative feature extraction or classification. Furthermore, articles were excluded if they were out-of-topic with respect to the aims of the present review, such as methods or algorithms for the sole purpose of artefact removal for OCTA images. Hence, we excluded studies that focused only on OCTA image preprocessing, and studies that have an OCTA application but use mainly structural OCT data for the method implementation (e.g., retina layer segmentation) [14,15].

Data Extraction
After the initial database screening, the remaining studies were analyzed individually and the following information was extracted: study title, first author name, year of publication, imaging device used, imaging area field of view (FOV), anatomy of interest (e.g., eye, skin, etc.), whether the proposed method had a final aim of segmentation and/or classification, the main category of the method used (e.g., segmentation based on thresholding or clustering, etc.), details of the proposed method, whether 2D or 3D data were used, database information, validation methods, and the final performance results. During this process, some initially included studies were removed because a more detailed analysis showed that they did not meet the inclusion criteria (e.g., preclinical murine model studies).
This review and handbook is organized as follows: Section 3 provides an initial overview of the global findings after the literature review and then goes into detail regarding the studies found, dividing them into ones focusing on automatic segmentation methods (Section 3.1) or ones focusing on an automatic classification (Section 3.2). Going into more detail, the segmentation and classification methods are subsequently divided into the main categories that were found to be employed for each individual specific task (i.e., segmentation or classification). Section 4 then discusses the main findings and the future scopes for research and Section 5 provides the conclusions of this review.

Results
The initial literature search resulted in finding 193 studies that were screened for title and abstract. After this screening, 109 studies were removed, and the remaining 84 papers were analyzed individually. Figure 3A displays a flowchart of the study selection.
A total of 56 articles were selected for this review and are reported here. Thirty-eight studies (67.9%) focused exclusively on the automatic or semi-automatic segmentation of a structure of interest (e.g., vasculature or foveal avascular zone). The remaining 18 articles (32.1%) had a final goal of classifying the images as pathological or healthy or of disease staging, based either on extracting hand-crafted features and then employing a machine learning technique, or on end-to-end deep learning methods. A number of studies (n = 9, 16.1%) presented both a segmentation and a classification method, all of which employed a machine learning classification method based on extracted features that first required the segmentation of a structure of interest (e.g., vasculature parameters or the foveal avascular zone (FAZ) area). These 9 studies are included in both Section 3.1 on segmentation tasks and in Section 3.2 on classification tasks, hence making the final number of analyzed studies focusing on segmentation equal to 47. Studies that included the comparison of various segmentation or classification methods (e.g., thresholding vs. machine learning for segmentation) are included in each relevant section. The methods for segmentation were global or local thresholding (n = 23/47, 48.9%), deep learning (n = 11/47, 23.4%), clustering (n = 6/47, 12.8%), active contour models (n = 5/47, 10.6%), edge detection (n = 1/47, 2.1%), or machine learning (n = 1/47, 2.1%). For classification tasks, machine learning was used in the majority of studies (n = 12/18, 66.7%) compared to deep learning techniques (n = 6/18, 33.3%). Figure 3B shows a pie chart of the segmentation and classification tasks.

Segmentation Tasks
In this section, the main methods used for the segmentation of structures of interest within the OCTA image are briefly described and compared. When considering ocular applications, the structures of interest that are segmented within the image correspond to either the vasculature or the FAZ. On the other hand, when considering dermatology applications, the structures of interest are mainly the vasculature and, if necessary, the tissue surface. Due to the different segmentation tasks that were found and the importance of comparing different techniques (e.g., thresholding vs. clustering) for one task (e.g., FAZ segmentation), all of the analyzed methods are described in Table 1 and are divided by segmentation task and then by segmentation method. Figure 4 illustrates examples of these segmentation methods.

Thresholding
As can be noted from the large percentage of studies (n = 23, 48.9%), thresholding is the go-to method for segmenting structures of interest in OCTA images. Simply put, it is a method that marks all pixels that have an intensity lower (i.e., darker) or higher (i.e., brighter) than a specifically determined threshold as the object in the obtained binary image. How the intensity threshold is determined can vary greatly and can be divided into two main categories: global or local (also referred to as adaptive).
Global thresholding determines one threshold value for the entire image frame, typically by analyzing the intensity histogram of the whole image. The Otsu method [17] is a commonly used automatic thresholding technique for OCTA images [18-23] and is based on finding a threshold that minimizes the intraclass variance of the thresholded black and white pixels. Other global thresholding methods are based on finding a specific percentile of the image intensity histogram [24], on the progressive weighted mean of the image intensity histogram [25,26], or on simply fine-tuning a specific gray level [27]. Many analyzed studies employed a global thresholding technique without specifying exactly how the final threshold was determined [22,28-34].
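As an illustration of histogram-based global thresholding, the following sketch implements the Otsu criterion in plain Python. It relies on the fact that minimizing the intraclass variance is equivalent to maximizing the between-class variance. The function name `otsu_threshold` and the assumption of integer gray levels in [0, 255] are illustrative choices and are not taken from any of the reviewed studies.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    pixels: flat iterable of integer intensities in [0, levels - 1].
    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0        # number of pixels in the lower class
    sum0 = 0.0    # intensity sum of the lower class
    for t in range(levels - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue      # lower class still empty
        if w1 == 0:
            break         # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor 1 / total**2).
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy bimodal "image": dark background pixels and bright vessel pixels.
toy_pixels = [10] * 50 + [200] * 50
t = otsu_threshold(toy_pixels)
vessel_mask = [1 if p > t else 0 for p in toy_pixels]
```

Because a single threshold is applied to every pixel, this approach can be expected to struggle when the background intensity varies across the field of view, which is one motivation for the local methods discussed next.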
Local, or adaptive, thresholding is based on analyzing the image in smaller areas, defined by a user-specified neighborhood. A threshold is therefore determined for each pixel, typically using first-order statistics, such as the mean and standard deviation of the pixel intensity within each considered neighborhood. The most commonly found local adaptive thresholding technique in OCTA images is the Phansalkar method [35], which was employed in numerous studies reported in this review [19,34,36,37]. Importantly, Chu et al. [38] provided an interesting outlook on using the Phansalkar thresholding technique for quantifying choriocapillaris, demonstrating the need to carefully optimize the method's parameters for an accurate segmentation. Other common local thresholding methods used in OCTA images are the local mean [39] and local median [37,40], and one study employed a signal-to-noise adaptive binarization method [41]. A couple of studies used adaptive thresholding without specifying the exact method [30,42].
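A per-pixel threshold based on the neighborhood mean and standard deviation can be sketched as follows. The formula is the commonly quoted form of the Phansalkar threshold, and the default values (k = 0.25, r = 0.5, p = 2, q = 10, image normalized to [0, 1]) are widely used defaults that are assumed here rather than taken from the reviewed studies. The function name, the square window, and the border handling (window clipped at the image edges) are also assumptions. As noted above for [38], these parameters generally need tuning for a given OCTA application.

```python
import math


def phansalkar_threshold(img, radius=1, k=0.25, r=0.5, p=2.0, q=10.0):
    """Binarize a 2D image (values in [0, 1]) with a per-pixel threshold.

    For every pixel, the mean m and standard deviation s of its
    (2 * radius + 1)^2 neighborhood (clipped at the borders) give
        T = m * (1 + p * exp(-q * m) + k * (s / r - 1))
    and the pixel is marked 1 (object) when its value exceeds T.
    """
    rows, cols = len(img), len(img[0])
    mask = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            vals = [img[j][i]
                    for j in range(max(0, y - radius), min(rows, y + radius + 1))
                    for i in range(max(0, x - radius), min(cols, x + radius + 1))]
            m = sum(vals) / len(vals)
            s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            t = m * (1 + p * math.exp(-q * m) + k * (s / r - 1))
            mask[y][x] = 1 if img[y][x] > t else 0
    return mask


# A single bright pixel on a dark background is kept as object.
toy_img = [[0.1, 0.1, 0.1],
           [0.1, 0.9, 0.1],
           [0.1, 0.1, 0.1]]
toy_mask = phansalkar_threshold(toy_img, radius=1)
```

The nested loops are written for readability only; a practical implementation would compute the local statistics with array operations or integral images.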
Thresholding was the most common technique when considering the segmentation task of vasculature, both in ophthalmology and dermatology applications (see Table 1). However, it is difficult to compare its performance with other techniques, as the majority of the studies did not provide a quantitative validation of the vessel segmentation. Instead, they either continued on to classify a specific disease, compared quantitative parameters computed on the segmentation (healthy vs. pathological subjects), or correlated the parameters with disease staging. The study by Zhang et al. [27] provided a quantitative validation of the obtained segmentation using global thresholding on optimally oriented flux filtered images, showing a Dice coefficient (DSC) equal to 0.8587 for healthy subjects, 0.8434 for proliferative diabetic retinopathy (PDR) subjects, and 0.8520 for severe non-proliferative DR (NPDR) subjects. Although the study was a rare one that employed 3D volumes instead of 2D en face images, the segmentation validation was performed on the 2D projections of the segmentation. Some other studies, such as the one by Meiburger et al. [25], provided a comparison with a semi-automated segmentation and compared quantitative parameters obtained using the various segmentations (i.e., semi-automatic vs. automatic). This study also provided an intra-operator variability analysis, showing a high variability when using the semi-automatic software for segmentation. When considering the task of segmenting the FAZ, the study by Xu et al. [22] used Otsu thresholding, reaching a maximum DSC equal to 0.90.
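Since the Dice coefficient (DSC) is the validation metric quoted most often in this review, together with the Jaccard index (also reported as Intersection over Union, IoU), a minimal sketch of both overlap measures is given here. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_and_jaccard(pred, truth):
    """Overlap scores between two binary masks of identical shape.

    pred, truth: 2D lists of 0/1 values (automatic and reference masks).
    Returns (dice, jaccard).
    """
    tp = fp = fn = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                tp += 1      # pixel marked in both masks
            elif p:
                fp += 1      # marked only in the automatic mask
            elif t:
                fn += 1      # marked only in the reference mask
    if tp + fp + fn == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0, 1.0
    dice = 2 * tp / (2 * tp + fp + fn)
    jaccard = tp / (tp + fp + fn)
    return dice, jaccard


auto_mask = [[1, 1, 0],
             [0, 1, 0]]
manual_mask = [[1, 0, 0],
               [0, 1, 1]]
dsc, jac = dice_and_jaccard(auto_mask, manual_mask)
```

The two scores are monotonically related by J = D / (2 - D), so a DSC and a Jaccard index reported by different studies can be converted into each other before being compared.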
Four interesting studies to note when considering thresholding techniques are the work by Rabiolo et al. [43], Laiginhas et al. [19], Terheyden et al. [20], and Mehta et al. [44]. Each of these studies compared several different thresholding techniques for the quantification of OCTA images, and the main finding from each of them is that the absolute quantification values calculated with different thresholding algorithms are not directly interchangeable. Laiginhas et al. found that local thresholding strategies are significantly superior to global ones [19] when considering choriocapillaris and flow deficit parameters. These studies demonstrate how there is still an unmet need for a uniform strategy to quantify OCTA images, and care must be taken when comparing quantitative parameters computed from different thresholded OCTA images.

Deep Learning
Recently, the use of deep learning frameworks for analyzing medical images has seen an exponential growth. Deep learning implies the use of deep neural networks, that is, artificial neural networks that have many layers between the input and output layers. Convolutional Neural Networks (CNNs) are specifically used in image analysis applications, as they apply numerous convolutions on the input image [45]. The main advantage of CNNs is that they can automatically learn high-level features and then provide a semantic segmentation by associating each pixel of the input image to a label or class. The drawbacks to deep learning methods are (a) the need for a large annotated database, which has been partly, but not totally, mitigated by the employment of transfer learning [46], (b) their complexity (i.e., the immense number of parameters that must be trained), and (c) the difficulty of interacting with any single layer of the network, which can contribute to the view of deep networks as black boxes that do not explain their predictions in a way that is easily understandable by humans [47].
All of the studies that employed deep learning techniques were based on ophthalmological applications, so either for FAZ segmentation or eye vascular segmentation. This can most likely be explained by the fact that larger databases are available for ocular applications, whereas the dermatological applications are still in the research stage and are not used on a daily basis in a clinical setting. The majority of the studies used already-known architecture styles with some modifications, such as the UNet [11,48-52], VGG [53-55], and ResNet [13,56], but two studies also employed custom-made networks [57,58].
The performance of the deep learning methods for eye vasculature segmentation was quite high, as demonstrated by the study by Li et al. [55], which employed a network that took as input the 3D acquired volume and then produced a 2D segmentation using a plane perceptron to enhance the perceptron ability in the horizontal direction. The authors obtained DSC values equal to 0.8941 with images with a 6 × 6 mm² FOV, and equal to 0.9274 with images acquired on a 3 × 3 mm² FOV. Another study that showed promising results was by Giarratano et al. [11], who both released an open dataset and provided their source code. Moreover, it provides an interesting comparison between deep learning techniques, specifically the UNet and CS-Net [59], and traditional methods. The best Dice coefficient was obtained using the deep learning methods (DSC = 0.89), yet the traditional adaptive thresholding method on filtered OCTA images also showed high Dice coefficient values (DSC = 0.86). Their study also emphasizes the importance of evaluating segmentation performance in terms of clinically relevant metrics [11]. When considering the FAZ determination, deep learning techniques also outperformed the other methods, as demonstrated by the study by Guo et al. [60], which used a dataset of 405 images and reached a final DSC value equal to 0.9760. The study by Wang et al. [61] also presented a deep learning method for CNV segmentation, with a maximum Intersection over Union (IoU) equal to 0.88.

Clustering
Clustering is the grouping of similar instances, objects, or pixels in this specific case. In order to group pixels together, there must be some sort of measure that can determine whether they are similar or dissimilar. The two main types of measures used to estimate this relation are distance measures and similarity measures [62].
In the case of OCTA image segmentation, the majority of the analyzed studies used pixel intensity as a way to group together objects, using common methods such as k-means clustering [63-65], or other clustering algorithms such as fuzzy c-means clustering [66] and a modified CLIQUE clustering technique [67]. An interesting study that used local features for clustering and not pixel intensity is a method by Engberg et al. [68] which was based on building a dictionary using pre-annotated data and then processing the unseen images by computing the similarity/dissimilarity.
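Intensity-based clustering can be sketched with a one-dimensional k-means, in which the distance measure is simply the absolute difference between a pixel intensity and a cluster center. The function name, the deterministic initialization (centers evenly spaced between the minimum and maximum intensity), and the fixed iteration cap are assumptions made for illustration. The reviewed studies [63-65] may differ in initialization, feature space, and stopping rule.

```python
def kmeans_intensity(pixels, k=2, iterations=20):
    """Cluster pixel intensities into k groups with 1D k-means.

    pixels: flat list of intensities. Returns (labels, centers), where
    labels[i] is the index of the center closest to pixels[i].
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    lo, hi = min(pixels), max(pixels)
    # Deterministic initialization: centers evenly spaced over the range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(p - centers[c]))
                  for p in pixels]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members
                               else centers[c])
        if new_centers == centers:
            break            # converged
        centers = new_centers
    return labels, centers


toy_intensities = [10, 12, 11, 200, 198, 202]
toy_labels, toy_centers = kmeans_intensity(toy_intensities, k=2)
```

With k = 2 the result is effectively a data-driven global threshold. Larger k values can be used, for example, to separate background, small vessels, and large vessels by intensity.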
Clustering methods were employed in two clinical applications: general eye vasculature segmentation and choroidal neovascularization (CNV)/Choriocapillaris segmentation. The study by Engberg et al. [68] was a rare study that provided a quantitative validation of general eye vessel segmentation, even though only one image was used for validation. On this image, the DSC was equal to 0.82 for larger vessels and 0.71 for capillaries. For the CNV/Choriocapillaris application, the study by Xue et al. [67] had a final DSC equal to 0.84.

Active Contour Models
The model-based segmentation methods, also known as active contours, can be divided into parametric models, or snakes, and geometric models, which are based on the level set method. These deformable models rely on the definition of both an internal and external energy and an initial contour which evolves until the two energy functions reach a balance.
The five studies that employed a model-based segmentation framework were all focused on ocular applications, either segmenting the retinal vessels [69-71] or the FAZ [72,73]. In the first case, the best results were achieved by Sandhu et al. [70] using a database of 100 images and obtaining a final DSC of 0.9502 ± 0.0443. In the same study, the best results were also obtained for FAZ determination, with a DSC equal to 0.93 ± 0.06. Both parametric and geometric active contours were found. One study compared two different ImageJ macros that implement the level set method and the Kanno-Saitama macro [72] with the built-in software for FAZ segmentation, whereas the other three studies used custom-written software implementing the Global Minimization of the Active Contour/Snake model (GMAC) [71], a generalized gradient vector flow (GGVF) snake model [73], and a joint Markov-Gibbs random field (MGRF) model [69].

Edge Detection
Edge detection methods in OCTA images are used rarely as the main segmentation method (n = 1, 2.1%). Briefly, numerous edge detection methods exist, and are based on computing the image gradient, which highlights the sections of the image that present a transition from dark to light or from light to dark along a specific direction.
The study that employed an edge detection method used the Canny method [74], which calculates the gradient using the derivative of a Gaussian filter. The Canny method exploits two thresholds to detect strong and weak edges, including weak edges in the output only if they are connected to strong edges. Thanks to the use of these two thresholds, this method is robust to noise and is likely to detect true weak edges. The study using edge detection applied it to determine the FAZ [75] in ocular applications, showing a Jaccard index equal to 0.82. Another study focusing on dermatological applications also employed an edge detection method, but as a preprocessing stage, that is, for determining the tissue surface in skin burn scars [76]. Hence, this type of segmentation method was not found to be used for segmenting vasculature, which can be explained by the complexity of the vasculature and the difficulty of detecting connected edges at each angle of the image.
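The double-threshold step described above can be sketched as follows. This covers only the hysteresis stage of the Canny method: the gradient magnitude is assumed to be already available (Gaussian smoothing, gradient computation, and non-maximum suppression are omitted), and the function name and the use of 8-connectivity are assumptions made for illustration.

```python
from collections import deque


def hysteresis_threshold(grad, low, high):
    """Keep strong edges and the weak edges connected to them.

    grad: 2D list of gradient magnitudes. Pixels >= high are strong
    edges; pixels >= low are kept only if they are 8-connected,
    directly or through other kept pixels, to a strong edge.
    """
    rows, cols = len(grad), len(grad[0])
    edges = [[0] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every strong edge pixel.
    for y in range(rows):
        for x in range(cols):
            if grad[y][x] >= high:
                edges[y][x] = 1
                queue.append((y, x))
    # Grow from the seeds into neighboring weak edge pixels.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < rows and 0 <= nx < cols
                        and not edges[ny][nx] and grad[ny][nx] >= low):
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges


toy_grad = [[9, 5, 0, 5],
            [0, 0, 0, 0]]
toy_edges = hysteresis_threshold(toy_grad, low=4, high=8)
```

In the toy example, the weak response next to the strong edge is retained, whereas the isolated weak response on the right is discarded as probable noise.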

Machine Learning
Machine learning is a type of artificial intelligence technique that is based on the extraction of hand-crafted features which are then fed into a classifier. This method is more commonly used for classification tasks and will be described in more detail in Section 3.2.1, but it can also be employed for segmentation tasks. In this case, the features that are extracted from regions of interest (ROIs) of the image are fed into a classifier to determine whether the current ROI belongs to the object of interest (or to which of the objects of interest they belong in the case of multi-object segmentation) or to the background.
A machine learning method for a segmentation task was found in only one of the analyzed articles and was focused on the choriocapillaris segmentation [77]. The method was based on the extraction of features from the structural OCT images and the inner retinal and choroidal angiograms. In particular, the features included the standard deviation and directional Gabor filters at multiple scales which were then fed into a random forest classifier. This technique showed a final Jaccard index equal to 0.81 ± 0.12.

Classification Tasks
In this section, the main methods used for the classification of OCTA images are briefly described and compared. There were no studies found that focused on the classification of skin vasculature, so all analyzed studies aimed at classifying ocular OCTA images. The main focus of classification tasks was the detection of retinal diseases, such as DR, AMD, glaucoma, and choroideremia. Two analyzed studies instead focused on the classification between arteries and veins within the OCTA image, which can provide important information for early disease detection and better stage classification [30,78]. Due to the different classification tasks that were found and the importance of comparing different techniques (i.e., machine learning vs. deep learning) for one task (e.g., DR detection), all of the analyzed methods are described in Table 2 and are divided by classification task and then by classification method. Figure 5 illustrates examples of how these classification methods work.

Machine Learning
Machine learning is an artificial intelligence technique that is based on the extraction of hand-crafted features which are then fed into a classifier, such as neural networks (NNs), support vector machines (SVM), or random forests (RF) [79].
In the context of retinal diseases, a recent review has been presented in literature that analyzes the quantitative parameters of retinal OCTA images that have been used in numerous studies [12]. Briefly, the main quantitative parameters that have been used are: blood vessel tortuosity (BVT), blood vessel caliber (BVC) or vessel diameter, blood vessel density (BVD or just VD), vessel perimeter index (VPI), foveal avascular zone area (FAZ-A), foveal avascular zone contour irregularity (FAZ-CI), vessel complexity index (VCI) such as the fractal dimension (FD), branchpoint analysis (BPA), differential artery-vein (A-V) analysis, flow analysis using parameters such as the flow index (FI) or flow void (FV), vessel branching coefficient, vessel branching angle, branching width ratio, and choroidal neovascularization (CNV) analysis. The mathematical description of these quantitative parameters is out of the scope of this review, so interested readers can refer to the study by Yao et al. [12] for a comprehensive analysis and definition of these parameters in quantitative OCTA image analysis. These quantitative parameters are based on the segmentation of the FAZ or of the blood vessels. The vasculature parameters listed above are typically computed not directly on the output segmented image or volume but on its skeleton, which is obtained by applying a thinning technique, often called skeletonization [80], to the vessel segmentation. This method reduces the vasculature to the centerline of the vessels and has been used in numerous other studies and imaging modalities [81,82].
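To give a concrete flavor of how such parameters are derived from a binary vessel mask, the sketch below computes two of them under simple, commonly used definitions: vessel density as the fraction of vessel pixels, and the fractal dimension estimated by box counting. These definitions, the function names, and the chosen box sizes are assumptions made for illustration. They are not necessarily the exact formulations given in [12] or used in the reviewed studies.

```python
import math


def vessel_density(mask):
    """Fraction of pixels marked as vessel in a 2D binary mask."""
    total = sum(len(row) for row in mask)
    return sum(1 for row in mask for v in row if v) / total


def box_counting_dimension(mask, box_sizes=(1, 2, 4, 8)):
    """Estimate the fractal dimension of a 2D binary mask.

    For each box size s, count the boxes N(s) containing at least one
    vessel pixel; the estimate is the least-squares slope of log N(s)
    against log(1 / s).
    """
    rows, cols = len(mask), len(mask[0])
    xs, ys = [], []
    for s in box_sizes:
        count = 0
        for y0 in range(0, rows, s):
            for x0 in range(0, cols, s):
                if any(mask[y][x]
                       for y in range(y0, min(y0 + s, rows))
                       for x in range(x0, min(x0 + s, cols))):
                    count += 1
        if count > 0:
            xs.append(math.log(1.0 / s))
            ys.append(math.log(count))
    if len(xs) < 2:
        raise ValueError("need a non-empty mask and at least two box sizes")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope_num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope_den = sum((x - mx) ** 2 for x in xs)
    return slope_num / slope_den


# Sanity checks on an 8 x 8 grid: a filled square and a single straight line.
filled = [[1] * 8 for _ in range(8)]
line = [[1] * 8] + [[0] * 8 for _ in range(7)]
fd_filled = box_counting_dimension(filled)
fd_line = box_counting_dimension(line)
vd_line = vessel_density(line)
```

The filled square and the straight line give estimates of 2 and 1, respectively, which is why values between 1 and 2 are interpreted as a measure of vascular network complexity.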
A few studies instead computed texture features, such as those based on a local binary pattern (LBP) analysis [83] or the wavelet transform [84], and either used only these features for classification or combined them with other standard quantification parameters that were previously listed.
The most common machine learning method that was found for OCTA image classification was the support vector machine (SVM) [85]. This classifier was used for single disease detection, such as DR [70,84] and glaucoma [24,29], and was also employed for more complex classification tasks, such as DR staging [33] and distinguishing between different retinopathies [42]. The other classifiers that were used were NNs [32,83,86], k-means clustering [42], logistic regression [84], and a gradient boosting tree (XGBoost) [84].
Machine learning classification methods were used in virtually all clinical applications, which included DR classification and staging, glaucoma classification, AMD classification, artery/vein classification, sickle cell retinopathy (SCR) classification, and general retinopathy classification. For general retinopathy classification, the study by Alam et al. [42] used the features extracted from different areas (BVT, BVC, VPI, BVD, FAZ) and FAZ contour irregularity features within an SVM classifier and obtained a maximum accuracy of 97.45% when classifying between healthy and diseased images. When distinguishing between the different pathologies, the accuracy was slightly lower: 94.32% (DR vs. SCR). Alam et al. [87] also presented a study for SCR classification, using the same features as Alam et al. [42] and three different classifiers: SVM, KNN, and discriminant analysis. The best results were obtained using an SVM classifier, with a final accuracy equal to 97%. Alam et al. [30] also presented a study for artery/vein classification using a k-means clustering method, reporting an accuracy equal to 96.57% when considering all vessels. For AMD classification, Alfahaid et al. [83] used rotation-invariant uniform local binary pattern texture features computed on 184 images, coupled with a KNN classifier, to obtain a maximum accuracy of 100% when considering the choriocapillaris layer, and an accuracy of 89% for all layers. For glaucoma classification, Ong et al. [29] presented a promising study using Haralick's texture features and other global and local features, which were then classified using an SVM to obtain an Area Under the Curve (AUC) equal to 0.98, considering a database of 158 images (38 glaucoma). For DR classification, which is the most commonly found clinical application in the analyzed studies, the most promising results were presented by Abdelsalam et al. [33], using multifractal parameter computation with an SVM classifier, which showed an accuracy of 98.5% computed on a database of 80 DR patients and 90 healthy subjects.
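The studies above report either accuracy or AUC, and the two are not interchangeable on imbalanced databases. For instance, in a database of 158 images with 38 glaucoma cases, a classifier that labels every image as healthy would already reach 120/158 ≈ 75.9% accuracy. The following is a minimal sketch of how both metrics can be computed from classifier outputs; the function names and the toy labels and scores are illustrative assumptions and are not taken from any of the reviewed studies.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive sample receives a
    higher score than a random negative sample (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values): 3 diseased (1) and 5 healthy (0) images.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy depends on the chosen decision threshold, whereas the AUC summarizes the ranking quality over all thresholds, which is one reason why values reported with different metrics are difficult to compare directly.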

Deep Learning
As mentioned in Section 3.1.2, deep learning implies the use of deep neural networks, and typically CNNs for image analysis. CNNs can automatically learn high-level features from the input image and therefore have the advantage of not requiring the extraction of hand-crafted features for classification [88], simply needing the input image and the correct class to which it belongs. The drawbacks of deep learning for classification are the same as those mentioned for segmentation tasks in Section 3.1.2. An advantage that classification tasks have over segmentation tasks when considering deep learning is the fact that it is typically less painstaking to obtain the expert ground truth, since manual segmentations can be very time consuming and require the usage of basic image processing software whereas manual classification of images is usually quicker and easier.
Thakoor et al. [91] did not use the 3D acquired volume but instead stacked 2D images of the retinal layers of interest, obtaining a 93.4% testing accuracy at binary classification of neovascular AMD vs. non-AMD.
Deep learning methods were employed in many clinical applications of classification tasks: DR classification, AMD classification, artery/vein classification, and Central Serous Chorioretinopathy (CSC) classification. Aoyama et al. [92] presented a deep learning method based on a pretrained VGG16 model for CSC classification and obtained a final accuracy of 95%. For artery/vein classification, Alam et al. [78] used a fully convolutional network based on the U-Net for classifying 30 DR and 20 healthy images, obtaining an accuracy equal to 86.75%, which is lower than the performance reported by the same authors [42] using a machine learning technique (accuracy = 96.57%). For AMD classification, Thakoor et al. [91] presented an interesting study employing a custom-made 3D CNN and using as input a stack of 2D images of the retinal layers of interest. For a two-class classification (i.e., NV-AMD vs. healthy), the classification accuracy was quite high (93.4%), but for a three-class classification (NV-AMD vs. non-NV-AMD vs. healthy), the accuracy decreased to 77.8%. For DR classification, numerous approaches were presented, and the most promising was the study by Zang et al. [90], which used a densely and continuously connected neural network with adaptive rate dropout. The obtained accuracy ranged from a maximum of 96.5% for a two-class classification to a minimum of 67.9% for a four-class classification. Another study to note is the one by Heisler et al. [86], which employed an ensemble network and obtained an accuracy equal to 92 ± 1.92%. Higher accuracy values were obtained using a machine learning method [33]; however, it must also be pointed out that the databases used in the deep learning methods are almost double or triple in size.

Discussion
In this review and handbook, we aimed to provide the reader with an overview of the most common segmentation and classification methods that are employed for automatic OCTA image or volume analysis. In this section, some key findings and future prospects are discussed.
A first finding is that the vast majority of studies (53 out of 56, 94.6%) focus on ocular applications, which can be explained by the fact that there are numerous clinical devices available for this specific field. The main clinical devices that were used in the analyzed studies were: (a) the Avanti OCTA system (Optovue, Inc., Fremont, CA, USA), (b) the DRI OCT Triton or DRI OCT-1 Triton plus (Topcon Medical Systems, Paramus, NJ, USA), and (c) the PLEX Elite or Cirrus system (Carl Zeiss Meditec, Dublin, CA, USA). Three (5.4%) studies instead focused on the analysis of OCTA data acquired on human skin, two of which used custom-made laboratory OCT/OCTA systems [25,41] and one of which employed a fiber-based swept-source polarization-sensitive OCT system (PSOCT-1300, Thorlabs) [76]. Hence, the use of OCTA imaging is quite established for ocular applications, but it is starting to move in other interesting directions, such as the noninvasive analysis of vasculature in skin. That dermatology is the upcoming research field for OCTA imaging can be explained by the limited penetration depth of OCT/OCTA imaging, which makes the analysis of superficial vasculature an ideal application.
A second important overall aspect to discuss is the type of data analyzed, either two-dimensional or three-dimensional. The acquired OCTA data from devices are inherently three-dimensional, yet the vast majority of studies employ segmentation or classification methods on 2D images instead of the 3D volumes. The 2D images are typically obtained as a Maximum Intensity Projection (MIP) en face image of a specific retinal layer in the case of ocular applications, or of the entire acquired volume in the case of dermatological applications. A few recent studies have instead employed algorithms using the acquired volumetric data, in both ophthalmological and dermatological applications [27,29,36,53]. Of note is an interesting study by Yu et al. [52] that employs a structure-constraint CNN architecture for depth map estimation to map a segmentation obtained on 2D images into a 3D space. Especially when considering the up-and-coming research field of OCTA imaging in dermatological applications, the usage of the 3D volume should be considered preferable, as it can provide an important 3D visualization of the vasculature and, more importantly, a more accurate vascular analysis and quantification [1].
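The en face MIP described above can be sketched in a few lines. This is a simplified illustration that assumes a NumPy array with a known depth axis and a flat slab between two depth indices; in retinal applications the slab boundaries normally follow segmented layer surfaces rather than constant depths. The function name `en_face_mip` and its parameters are hypothetical.

```python
import numpy as np


def en_face_mip(volume, depth_axis=0, z_start=None, z_stop=None):
    """Maximum Intensity Projection of an OCTA volume along the depth axis.

    z_start / z_stop optionally restrict the projection to a flat slab
    (a simplification of a segmented retinal layer).
    """
    vol = np.moveaxis(np.asarray(volume), depth_axis, 0)  # depth first
    slab = vol[z_start:z_stop]
    if slab.shape[0] == 0:
        raise ValueError("empty slab: check z_start and z_stop")
    return slab.max(axis=0)  # brightest voxel along depth for each (x, y)


# Toy volume (hypothetical): 4 depth samples, 3 x 3 transverse pixels.
vol = np.zeros((4, 3, 3))
vol[1, 0, 0] = 5.0  # superficial flow signal
vol[3, 2, 2] = 7.0  # deeper flow signal

full_mip = en_face_mip(vol)               # whole-volume projection (skin case)
slab_mip = en_face_mip(vol, z_stop=2)     # superficial slab only (layer case)
```

The projection keeps only one value per transverse position, so vessels at different depths that overlap in the en face view can no longer be separated, which is the information a volumetric analysis retains.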
A third overall aspect to take into consideration is the imaging area FOV. Considering a scan step size that is proportional to the FOV, the scan density for a smaller FOV (e.g., 1 × 1 mm²) is higher than that for a larger FOV (e.g., 12 × 12 mm²), providing a better scan resolution and hence a better ability to delineate detailed microvasculature. On the contrary, a larger FOV covers a wider area and is hence more likely to detect the presence or absence of pathological features such as non-perfusion and microaneurysms [94]. The FOV in the analyzed studies (not considering the depth, which was not always reported) ranged from 1 × 1 mm² up to 12 × 12 mm². For ocular applications, most of the studies employed a FOV equal to 3 × 3 mm² or 6 × 6 mm², with only three studies employing a larger FOV and one study employing a smaller FOV. Interestingly, each of these four studies adopted either machine learning or deep learning techniques for segmentation and/or classification. For skin applications, the imaging FOV varied and was not consistent throughout the three analyzed studies, employing both a small FOV (i.e., 2.5 × 2.5 mm²) and a larger FOV (i.e., 10 × 10 mm²). When 3D volumes were analyzed, the scanning depth ranged from 1.2 mm to 3 mm.
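The trade-off between FOV and scan density can be made concrete with a short calculation. The sketch below assumes that the number of A-scans per B-scan is fixed, so that the transverse sampling step is approximately the FOV side length divided by that number; the value of 400 A-scans is purely illustrative and is not taken from any device or study discussed here.

```python
def sampling_step_um(fov_mm, n_ascans):
    """Approximate transverse distance between adjacent A-scans, in micrometers."""
    if fov_mm <= 0 or n_ascans <= 0:
        raise ValueError("fov_mm and n_ascans must be positive")
    return fov_mm * 1000.0 / n_ascans


N_ASCANS = 400  # hypothetical A-scans per B-scan, kept fixed across FOVs

# Sampling step for the FOV sizes mentioned in the text (side length in mm).
steps = {fov: sampling_step_um(fov, N_ASCANS) for fov in (1, 3, 6, 12)}
```

Under this assumption, moving from a 3 × 3 mm² to a 12 × 12 mm² FOV makes the sampling step four times coarser, so the smallest capillaries may be sampled by fewer pixels even though a wider region is covered.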
In this review, preprocessing methods for enhancing OCTA images and postprocessing methods for improving the segmentation or classification results were not taken into consideration. Preprocessing and postprocessing methods can improve segmentation and classification outcomes. This has been demonstrated both with traditional techniques on OCTA images, such as thresholding [36], and with deep learning methods in digital pathology, which can also be extended to other research fields [95]. In OCTA imaging, the most commonly found preprocessing steps are those focusing on vessel enhancement. These filters aim to enhance structures within the image or volume that have a vessel-like appearance and to suppress the signal elsewhere. The most commonly used vesselness filter found in literature is the one proposed by Frangi et al., known as the Frangi filter [96]. This filter is characterized by a scale parameter that determines the dimensions of the vessels that are recognized and then enhanced in the image/volume. It is also possible to combine multiscale measurements (i.e., combine different scale parameter values) and hence recognize both smaller and larger vessels. Other common filters for vessel enhancement include the optimally oriented flux (OOF) filter [97], Gabor [98], and SCIRD-TS [99]. All of these filters also require parameter tuning, similar to the Frangi filter. The next most common preprocessing method is histogram normalization and contrast enhancement using methods such as CLAHE [100]. When considering 3D volumes, an important preprocessing method is the removal of projection artefacts, a common OCTA artefact that causes the signal from a superficial vessel to protrude deeper within the volume than it should [101]. Numerous techniques for projection artefact removal have been proposed in literature [102]. One analyzed study combined the removal of stripe artefacts, which are also common in OCTA images, with an active contour model [71]. 
Regarding segmentation postprocessing methods, the main techniques that were used were hole filling, small object removal and morphological operators to smooth the final boundaries.
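The multiscale vesselness idea described above for the Frangi filter can be sketched for 2D en face images as follows. This is a simplified illustration of the Hessian-based approach, not a reference implementation: it assumes bright vessels on a dark background, an image larger than the widest Gaussian kernel, and a per-scale normalization constant set to half of the maximum Hessian norm, which is a common heuristic. The function names and default parameter values are assumptions.

```python
import numpy as np


def _gauss_kernel(sigma, order):
    """1D Gaussian kernel (order 0) or its first/second derivative."""
    radius = int(4 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    if order == 1:
        return -x / sigma**2 * g
    if order == 2:
        return (x**2 - sigma**2) / sigma**4 * g
    return g


def _filter(img, order_rows, order_cols, sigma):
    """Separable Gaussian-derivative filtering along rows, then columns."""
    k_r = _gauss_kernel(sigma, order_rows)
    k_c = _gauss_kernel(sigma, order_cols)
    out = np.apply_along_axis(lambda v: np.convolve(v, k_r, mode="same"), 0, img)
    return np.apply_along_axis(lambda v: np.convolve(v, k_c, mode="same"), 1, out)


def frangi_2d(image, sigmas=(1.0, 2.0, 3.0), beta=0.5):
    """Multiscale Frangi-style vesselness: maximum response over all scales."""
    img = np.asarray(image, dtype=float)
    best = np.zeros_like(img)
    for s in sigmas:
        # Scale-normalized Hessian entries at scale s.
        hrr = s**2 * _filter(img, 2, 0, s)
        hcc = s**2 * _filter(img, 0, 2, s)
        hrc = s**2 * _filter(img, 1, 1, s)
        # Eigenvalues of the symmetric 2 x 2 Hessian, sorted so |lam1| <= |lam2|.
        root = np.sqrt((hrr - hcc) ** 2 + 4 * hrc**2)
        e1 = 0.5 * (hrr + hcc + root)
        e2 = 0.5 * (hrr + hcc - root)
        swap = np.abs(e1) > np.abs(e2)
        lam1 = np.where(swap, e2, e1)
        lam2 = np.where(swap, e1, e2)
        norm = np.sqrt(lam1**2 + lam2**2)   # "structureness"
        c = 0.5 * norm.max()                # heuristic normalization (assumption)
        if c == 0:
            continue
        rb = lam1 / np.where(lam2 == 0, 1e-12, lam2)  # "blobness" ratio
        v = np.exp(-rb**2 / (2 * beta**2)) * (1 - np.exp(-norm**2 / (2 * c**2)))
        v[lam2 > 0] = 0.0                   # keep bright ridges only
        best = np.maximum(best, v)
    return best


# Toy image (hypothetical): one bright horizontal vessel, 3 pixels wide.
toy = np.zeros((64, 64))
toy[31:34, :] = 1.0
vesselness = frangi_2d(toy)
```

The tuple `sigmas` plays the role of the scale parameter discussed in the text: small values respond to capillary-sized structures and larger values to wider vessels, and taking the maximum over scales combines them. The parameter `beta` and the normalization constant are the tuning parameters that need to be adapted to the data.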
Another important factor to note is the difficulty of direct comparisons between studies. This can be observed when considering quantitative parameters obtained using different segmentation techniques, which has been accurately demonstrated for various thresholding methods [21,45,46], but can be extended to include any segmentation technique. Any segmentation method that is used will provide a different final binary image and therefore will change, even if only slightly, the obtained quantitative parameter. As mentioned previously, this calls for a consensus across the research community on OCTA image quantification. The difficulty can be partially attributed to the fact that the majority of the studies that presented an automated technique for combined segmentation and classification using quantitative parameters did not actually validate the segmentation method against a manual segmentation, but only validated the final classification results against a manual classification. Other studies that presented a segmentation technique did not actually validate the obtained segmentation but rather focused on the repeatability of the measurements, such as the studies by [21,33,41,46], or on the statistical differences or correlation between quantitative parameters obtained on images from healthy and pathological subjects, such as the studies by [36,42,78]. Another comparison difficulty is simply the fact that almost all studies used proprietary databases. Fortunately, the open science movement has recently also reached OCTA imaging applications in the ophthalmological field, and a few recent studies provide not only a segmentation method for retinal OCTA images but also an open dataset. Specifically, Giarratano et al. [11] published the first open dataset of retinal parafoveal OCTA images with their associated ground truth manual segmentations, including a database of 55 ROIs from OCTA images acquired on 11 subjects. Ma et al. [13] presented the ROSE dataset, which contains 229 OCTA images with vessel annotations at either centerline level or pixel level, and Li et al. [55] presented the OCTA-500 method and dataset, which contains data acquired on 500 subjects with two FOV types. The dataset includes both OCT and OCTA volumes, six types of projections, four types of text labels, and two types of pixel-level labels. Very recently, a preprint by Untracht et al. [103] was made available that presents OCTAVA, an open-source toolbox for the quantitative analysis of optical coherence tomography angiography images. The authors present a MATLAB GUI to help automate the quantitative analysis of en face OCTA maximum intensity projection images in a standardized workflow, including preprocessing, segmentation, and quantitative parameter computation steps. Thanks to these datasets and tools, and to the trend of making datasets and automatic methods open for researchers to use, the problem of a lack of consensus should be mitigated in the coming years.
Among the methods that presented a segmentation validation, from Table 1 it can be seen that the methods that employed a thresholding technique were mainly also those that did not present any segmentation validation, but rather focused on the clinical analysis of specific parameters obtained from the segmentation. On the other hand, the other segmentation methods tend to include a validation of the segmentation and are more strictly focused on the presentation of a unique segmentation algorithm. When considering a complicated segmentation task, such as vasculature segmentation, the GGMRF models by Eladawi et al. [69] and Sandhu et al. [70] show very promising results, with a DSC equal to 0.95, but are limited to a database of slightly over 100 images. The more recent deep learning methods include much larger databases, such as the one presented by Li et al. [55], which includes 500 images and shows very promising results (DSC = 0.9274) when considering a 3 × 3 mm² FOV. When considering easier segmentation tasks, such as the FAZ segmentation, the highest state-of-the-art segmentation results are reached only by deep learning methods, showing a 5-10% increase in segmentation performance parameters.
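The DSC values quoted above compare an automatic binary segmentation with a manual one as DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch of this computation is given below; the function name and the toy masks are illustrative, and returning 1.0 when both masks are empty is a convention assumed here rather than one stated in the reviewed studies.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks (nested lists)."""
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred_flat, truth_flat) if p and t)
    total = sum(pred_flat) + sum(truth_flat)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2.0 * intersection / total


# Toy masks (hypothetical): automatic vs. manual vessel segmentation.
automatic = [[1, 1, 0],
             [0, 1, 0]]
manual = [[1, 0, 0],
          [0, 1, 1]]

dsc = dice_coefficient(automatic, manual)  # 2 * 2 / (3 + 3)
```

Because the DSC is normalized by the size of the segmented structures, the same number of misclassified pixels penalizes a thin capillary network more than a large compact region such as the FAZ, which should be kept in mind when comparing values across segmentation tasks.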
From the methods analyzed in this review, it can be observed that machine learning methods are still the majority and also typically present the highest performance results for now, in terms of accuracy, when considering classification tasks. For example, for diabetic retinopathy classification, the highest accuracy was obtained by Abdelsalam et al. [33], reaching a 98.5% accuracy on a database of 170 images using an SVM classifier. Still, the DcardNet presented by Zang et al. [90] showed very similar, albeit slightly lower, results with a 96.5% accuracy on a dataset that was almost twice the size (303 images). Overall, what can be observed with both machine learning and deep learning classification methods is that, as the classification task increases in complexity (e.g., disease staging or multiple disease classification), the obtained classification results tend to decrease when using a similar-sized dataset, which can be expected.
Quantitative OCTA imaging and the employment of automatic segmentation and classification methods is an emerging field, with a solid basis of various techniques for ophthalmological applications and the beginnings of a foundation of methods for dermatological applications. Although still the minority in literature for ocular applications, recent studies have begun to focus on the valuable volumetric information OCTA imaging provides, and it could be that the tendency in upcoming years will keep building on these recent studies and that the usage of only flattened 2D OCTA images may eventually become obsolete. This is not to say that valuable information cannot be extracted from 2D en face images, but rather that a 3D analysis enriches the information and can provide a more comprehensive analysis of healthy and pathological situations. As mentioned in the previous paragraph, open databases of OCTA images are starting to become more available; due to this, it is likely that segmentation tasks in OCTA imaging will gradually see fewer and fewer studies that apply only traditional methods, such as thresholding, and that there will be an increase in the application of deep learning methods. The actual segmentation step of OCTA images may also become less common, as deep learning methods can also directly classify images without computing any hand-crafted features. Still, the 3D visualization and quantitative analysis of vasculature is bound to keep its importance, especially in fields where the non-invasive analysis of neovascularization and vascular network complexity is of fundamental importance, such as cancer [104]. In the case of direct classification of images using deep learning methods, there has recently been a significant increase in the use of "explainability" methods, such as Grad-CAM [105], that can highlight which part of the image is the most influential for the final classification decision. 
Future studies focusing on the classification of OCTA images need to continue this trend, as it is fundamental for comparing and evaluating developed methods.

Conclusions
In this review, we summarized the state-of-the-art methods and techniques for automatic segmentation and classification of OCTA images. OCTA imaging is an emerging method in some research fields and the automatic quantification and classification are of fundamental importance. Upcoming studies should focus on continuing the trend of open science and contributing to the standardization of automatic OCTA image analysis methods.