Automatic Segmentation of Choroid Layer Using Deep Learning on Spectral Domain Optical Coherence Tomography

Abstract: The purpose of this article is to evaluate the accuracy of optical coherence tomography (OCT) measurement of choroidal thickness in healthy eyes using a deep-learning method with the Mask R-CNN model. Thirty EDI-OCT scans of thirty patients were enrolled. A mask region-based convolutional neural network (Mask R-CNN) model composed of a deep residual network (ResNet) and feature pyramid networks (FPNs) with standard convolution and fully connected heads for mask and box prediction, respectively, was used to automatically delineate the choroid layer. The average choroidal thickness and subfoveal choroidal thickness were measured. Two models were trained: ResNet 50 layers deep (R50) and ResNet 101 layers deep (R101). The R50 model and the R101 ∪ R50 (OR model) demonstrated the best accuracy, with average errors of 4.85 pixels and 4.86 pixels, respectively. The R101 ∩ R50 (AND model) took the least time, with an average execution time of 4.6 s. Mask R-CNN models showed a good prediction rate of the choroid layer, with accuracy rates of 90% and 89.9% for average choroidal thickness and average subfoveal choroidal thickness, respectively. In conclusion, the deep-learning method using the Mask R-CNN model provides a fast and accurate measurement of choroidal thickness. Compared with manual delineation, it is more effective and is feasible for clinical application and larger-scale research on the choroid.


Introduction
The choroid, the vascular layer of the eye, lies between the retina and the sclera. The choroid provides oxygen and nourishment to the outer layers of the retina [1]. Accordingly, many studies have revealed a pathophysiological association between choroidal thickness and certain chorioretinal diseases [2], such as polypoidal choroidal vasculopathy (PCV), central serous chorioretinopathy (CSCR), age-related macular degeneration (AMD), and pathologic myopia. The estimation of choroidal thickness can thus serve as an indicator for clinical diagnosis [3].
Optical coherence tomography (OCT) is a noninvasive imaging technology that reconstructs micrometer-resolution images of the posterior segment, including the vitreous, retina and choroid, via light rays reflected from different layers of ocular structures. It enables physicians to assess pathological changes in the blood vessels and nerve fibers of the retina and choroid, and it is useful in diagnosing retinal and choroidal diseases and monitoring their changes following different therapies. Qualitative OCT images offer the basis of individualized treatment for various patients, such as the choice of different anti-VEGF drugs, and treat-and-extend versus fixed-regimen treatment protocols for age-related macular degeneration (AMD), polypoidal choroidal vasculopathy (PCV), and diabetic macular edema (DME). Many factors influence the quality of an OCT image, such as age, pupil size, cataract and vitreous opacity [4]. Hence, in the development of automatic segmentation of the choroid boundary, difficult clinical situations, such as abnormal ocular anatomy and nonignorable noise, may be encountered. Various approaches [5,6] have been proposed for automated choroid layer segmentation in OCT images of the posterior segment. However, Schlegl et al. propose a framework only for segmentation of the internal limiting membrane (ILM) and choroidal inner boundary (CiB) [5]. Segmentation of the choroidal outer boundary (CoB) is more difficult than that of the ILM and CiB: numerous issues, such as the heterogeneous texture of tissues, artifact speckles and boundary discontinuities, often hinder the extraction of accurate choroid boundaries in OCT imaging, and the extent of CoB blur varies. Although the CNN classifier and l2-lq (0 < q < 1) fitter model proposed by Lin et al. is able to depict the CoB, its structure is based on the framework of LeNet-5, a traditional convolutional neural network (CNN) [6]. LeNet-5's convolutional structure is shallow and extracts only low-level features. Compared with the mask region-based convolutional neural network (Mask R-CNN), which combines shallow- and deep-layer features, LeNet-5 might lose critically important location information. Figure 1a shows an original OCT image with a contrast-enhanced area (the dotted rectangle) that includes various noises and an artifact speckle. Figure 1b shows the proposed method, which segments the OCT image into three layers, namely, the ILM, CiB, and CoB. Segmentation procedures are complicated and especially time-consuming. In this study, we propose an efficient method for automatic choroidal segmentation and thickness estimation to reduce the time required to sketch precise boundaries.

Data Acquisition
This research utilized an OCT image database comprising data from 30 healthy subjects. Eyes with a history of previous intraocular intervention (e.g., intravitreal injection) or surgery (e.g., vitrectomy), intraocular implants, or other ocular conditions that may influence the normal contour of the retina and choroid layer (such as intraretinal fluid, subretinal fluid, large drusen or tumor) were excluded. The database included only one OCT image sequence from each patient. All patients underwent an SD-OCT scan (Heidelberg Engineering Inc., Heidelberg, Germany) with the enhanced depth imaging (EDI) method, which provides a more detailed image of the choroid layer than conventional SD-OCT [7]. The scan protocol covered a 6 × 6 mm area sampled by 25 scans, each separated by 24 µm. One pixel represented 4 µm in this study. The quality of each scan was assessed by the physicians. Scans with low signal strength or severe motion artifacts were repeated until adequate quality was achieved. Mydriatic agents were used if necessary. Each monochrome OCT image was quantized into eight bits with 256 gray levels. The entire database was supplied by coauthors, Dr. Hsia and Dr. Chang, from the Department of Ophthalmology, Taichung Veterans General Hospital, Taiwan.
Each OCT image sequence contained 25 two-dimensional (2D) scans, and the spatial resolution of each scan was 1008 × 596 pixels. In order to discard redundant image contents, every 2D scan was cropped to form a 480 × 480 image slice. Figure 2a illustrates the relationship between an original 2D scan and the extracted image slice. The inner border of the choroid was deemed to be Bruch's membrane, while the choroid-scleral interface (CSI) was considered to represent the outer border of the choroid.
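The cropping step can be sketched as follows; the crop offsets (`top`, `left`) are illustrative assumptions, since the paper does not state where the 480 × 480 window is placed within the 1008 × 596 scan.

```python
import numpy as np

# Simulated SD-OCT B-scan: 1008 (width) x 596 (height), 8-bit grayscale.
scan = np.zeros((596, 1008), dtype=np.uint8)

def crop_slice(scan, top=58, left=264, size=480):
    """Crop a size x size window from a B-scan to discard redundant
    content.  The offsets here are hypothetical; the paper only
    states the input (1008 x 596) and output (480 x 480) sizes."""
    return scan[top:top + size, left:left + size]

slice_ = crop_slice(scan)
print(slice_.shape)  # (480, 480)
```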

Deep-Learning Model
Mask R-CNN [8] is a two-stage framework. In the first stage, the network scans the image and outputs a feature map, from which the region proposal network (RPN) generates proposals, as illustrated in Figure 2b. The second stage classifies the proposals and generates bounding boxes and masks. In this study, the Mask R-CNN model utilized a deep residual network (ResNet) [9] plus a feature pyramid network (FPN) [10] backbone, with standard convolution and fully connected heads for mask and box prediction, respectively. This model obtains the best speed/accuracy trade-off. Generally speaking, shallower features mainly provide detailed information, while deeper features provide semantic information. Mask R-CNN extends the faster region-based convolutional neural network (Faster R-CNN) to instance segmentation. Compared with Faster R-CNN, Mask R-CNN has three outputs for each candidate object: a class label, a bounding box and an object mask. The mask, however, requires the extraction of fine detail from an object. Mask R-CNN therefore replaces region of interest pooling (RoIPool), which suffers from a misalignment problem, with region of interest align (RoIAlign), which avoids any quantization of the region of interest (RoI) boundaries and adopts bilinear resampling. This plays an important role for the boundary of the object mask, as it significantly improves the accuracy of boundary localization. These deep-learning techniques hold considerable promise as a robust automatic sketching scheme for evaluating choroidal thickness on OCT imaging without human intervention.
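As a rough illustration of why RoIAlign's bilinear resampling preserves boundary accuracy, the sketch below samples a feature map at a continuous coordinate instead of rounding it to the nearest cell; this is a simplified stand-in for the real operator, which averages several such samples per RoI bin.

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample feature map `feat` at a continuous (y, x) location,
    as RoIAlign does, instead of quantizing to the nearest cell
    as RoIPool would."""
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1 = min(y0 + 1, feat.shape[0] - 1)
    x1 = min(x0 + 1, feat.shape[1] - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

feat = np.array([[0.0, 1.0],
                 [2.0, 3.0]])
v = bilinear_sample(feat, 0.5, 0.5)  # average of the four cells -> 1.5
```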


ResNet and FPN
ResNet is a very powerful and resilient model and has become a widely used deep-learning model. ResNet solved the problem of deep network degradation caused by increasing the number of layers. In principle, when the network cannot learn new parameters as layers are added, the new layers fall into identity mapping, so that the model still maintains the performance of the shallower network before learning has finished. The 101-layer ResNet (R101) has deeper features that provide semantic information. In contrast, the 50-layer ResNet (R50), with shallower features, provides pixel relationships. Figure 3a uses R101 to sketch a rigorous rendering of the CoB and CiB boundaries. Figure 3b uses R50 to sketch a rough rendering of the CoB and CiB boundaries. Deep features are significant for boundary sketching, but they lose location information at the up-sampling step; location information, however, is extremely important for the sketching scheme. Therefore, Mask R-CNN utilizes ResNet to provide the features and adds an FPN to mitigate the loss of location information. The FPN is mainly used to make predictions from features at different levels, and it then combines the outputs to obtain a final result, as shown in Figure 4.
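A minimal sketch of the FPN top-down merge described above, assuming nearest-neighbour upsampling and omitting the 1 × 1 lateral convolutions of the real network:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a 2D feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fpn_merge(deep, lateral):
    """One top-down FPN step: upsample the semantically rich deep map
    and add the higher-resolution lateral map, recovering location
    detail that the deep features lost.  (The real FPN first passes
    the lateral path through a 1x1 convolution; omitted for brevity.)"""
    return upsample2x(deep) + lateral

deep = np.array([[1.0]])                 # coarse, semantic feature
lateral = np.arange(4.0).reshape(2, 2)   # fine, location-rich feature
merged = fpn_merge(deep, lateral)
print(merged.tolist())  # [[1.0, 2.0], [3.0, 4.0]]
```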

Transfer Learning with COCO Dataset
Training is an extremely important step in the proposed method and requires a large amount of data to achieve good performance; a small amount of data would cause overfitting. In addition, medical images require an ophthalmologist's professional annotation, so the training data would not be sufficient to support learning from scratch [11]. Therefore, this study performed transfer learning, transferring weights pre-trained on the COCO (Common Objects in Context) dataset to the OCT dataset for training. The COCO dataset was provided by Microsoft. For semantic scene labeling, it includes 1.5 million object instances belonging to 80 object-detection classes and 91 stuff classes. It offers a variety of pre-trained weights and is suitable for various transfer-learning tasks.
Regarding frozen stages, the default is to freeze one stage. The purpose of freezing is to fix the convolutional layers of the pre-trained model so that only one's own customized fully connected layers need to be trained, allowing the model to grasp the important features quickly. This study attempted freezing different numbers of stages, and freezing three gave the best training performance. Freezing more than three stages resulted in worse performance, and overfitting even occurred, possibly because more detailed features are needed for the mask prediction. The reason for freezing three stages is to transfer weights stably while still being able to obtain more detailed imaging information.
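The frozen-stages setting can be pictured with the toy sketch below; the stage names and flag layout are hypothetical, following the usual ResNet naming convention rather than this study's actual configuration keys.

```python
# Hypothetical sketch of the "frozen stages" setting: freeze the first
# n backbone stages of a pre-trained model and train only the rest.
stages = ["conv1", "res2", "res3", "res4", "res5", "heads"]

def set_frozen_stages(stages, n_frozen):
    """Return {stage: trainable} with the first n_frozen stages frozen."""
    return {s: i >= n_frozen for i, s in enumerate(stages)}

# Freezing three stages gave the best performance in this study.
trainable = set_frozen_stages(stages, n_frozen=3)
# conv1, res2, res3 frozen; res4, res5 and the heads remain trainable.
```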

Model Ensembles
The proposed method trained two Mask R-CNN models, i.e., a 50-layer ResNet (denoted the R50 model) and a 101-layer ResNet (denoted the R101 model), both using FPN settings. The only difference lay in the depth of the model. The features captured by the R50 model were relatively rough, but its advantage was richer location information, which is important for mapping the locations of the CiB and CoB. The R101 model acquired relatively deeper features and could better represent boundary information. Since the two models had their own advantages and disadvantages, the proposed method performed two different boundary predictions on the same OCT image. Moreover, this study developed an intersection version (denoted the AND model) and a union version (denoted the OR model) of the sketching results from the R50 and R101 models. It is hoped that the results of this study provide ophthalmologists with a comprehensive reference and reduce the burden on doctors. Of the union and intersection versions, the sketching results from the union version most closely resembled the doctor's drawing.
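The AND and OR ensembles amount to a per-pixel intersection and union of the two models' binary masks, as in this illustrative sketch (the mask values are made up for demonstration):

```python
import numpy as np

# Toy binary choroid masks predicted by the R50 and R101 models
# (True = pixel classified as choroid).  Values are illustrative only.
mask_r50 = np.array([[0, 1, 1],
                     [0, 1, 1]], dtype=bool)
mask_r101 = np.array([[1, 1, 0],
                      [0, 1, 1]], dtype=bool)

# AND model: intersection of the two predictions (conservative).
mask_and = mask_r50 & mask_r101
# OR model: union of the two predictions (inclusive).
mask_or = mask_r50 | mask_r101

print(mask_and.sum(), mask_or.sum())  # 3 5
```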

Post-Processing
Mask R-CNN performed proposal generation, classification, bounding-box and mask prediction for each target. In this study, the classification and bounding-box outputs were discarded, and the remaining mask was used to obtain the two boundaries, the CoB and CiB. Even though the CoB boundary could be generated quickly through Mask R-CNN, there were some unreasonable dents on the boundary, as shown in Figure 5a. To solve the dent problem, the proposed method created a vector for each side of the boundary polygon and then used the cross product of adjacent sides to test convexity; the vectors of all sides were examined to determine whether the polygon was concave. The proposed post-processing procedure connected the first corner of a concave region with the next corner along the x-axis, so that the unreasonable dents could be filled. Figure 5b shows the result after post-processing, in which the unreasonable dents have disappeared. The post-processing procedure was effortless and efficient for refining the sketching results of the proposed R50 and R101 models.
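The dent-detection step described above can be sketched with a cross-product convexity test; the boundary coordinates are illustrative, and the dent-filling step itself is omitted.

```python
def cross_z(o, a, b):
    """z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def concave_corners(polygon):
    """Indices of concave corners of a counter-clockwise polygon,
    found by testing the cross product of each pair of adjacent
    sides, as in the paper's dent-detection step."""
    n = len(polygon)
    return [i for i in range(n)
            if cross_z(polygon[i - 1], polygon[i], polygon[(i + 1) % n]) < 0]

# Counter-clockwise square boundary with one "dent" pushed inward
# at (1, 0.5), which the test reports as a concave corner.
boundary = [(0, 0), (1, 0.5), (2, 0), (2, 2), (0, 2)]
dents = concave_corners(boundary)
print(dents)  # [1]
```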

Results
In this study, experiments were conducted using a total of 30 cases to test the accuracy of the proposed method as an automatic sketching scheme. The k-fold cross-validation method, with k = 3, was used to estimate the performance of the proposed method. This study used random initialization and kept a threshold of 0.5 for each training and testing run. We also tested a threshold of 0.75, but there was no significant difference between thresholds of 0.5 and 0.75. For this experiment, changing the number of frozen layers was more beneficial than changing the threshold, increasing average precision by 2-3%. Therefore, the number of frozen layers was set to 3 in each training run of our model. The parameters used to execute the experiments are shown in Table 1. In an image coordinate plane, the distance between two points is usually given by the Euclidean distance (2-norm distance); the distance from a point to a line is the shortest distance from the point to any point on the line in Euclidean geometry. This study evaluated the thickness between the CiB and CoB by using the Euclidean distance, and the distance between the layers was taken as the choroidal thickness.
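The thickness evaluation can be sketched as below; for brevity, the point-to-line distance is approximated by the distance to the nearest boundary point, and the boundary coordinates are made up.

```python
import math

def dist_to_boundary(p, boundary):
    """Shortest Euclidean (2-norm) distance from point p to a boundary,
    approximated here as the distance to the nearest boundary point."""
    return min(math.dist(p, q) for q in boundary)

# Hypothetical CiB and CoB boundary points (pixel coordinates).
cib = [(x, 100) for x in range(5)]
cob = [(x, 160) for x in range(5)]

# Choroidal thickness per CiB point: distance between the layers,
# converted with this study's scale of 4 um per pixel.
UM_PER_PIXEL = 4
thickness_um = [dist_to_boundary(p, cob) * UM_PER_PIXEL for p in cib]
print(thickness_um[0])  # 240.0
```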

In this study, k was three, and each group had ten OCT cases. Table 2 demonstrates the average error of choroid segmentation via the proposed method. In the simulations, the R50 model obtained the smallest error, i.e., 4.85 pixels. The result from the OR model, 4.86 pixels, was relatively close to that of the R50 model. However, there were still differences from the actual delineations, so the model could be used by ophthalmologists as a second reference. Figure 6a shows the final result of the cropped OCT image slice of Figure 5 obtained by using the proposed method. Figure 6b illustrates the choroidal thickness (µm) evaluated in Figure 6a. Doctors could also refer to the four proposed automatic sketching models. Figure 6c shows an accurate depiction by this method after overlapping Figure 6a,b. Figure 6d shows the accuracy rate of measuring choroidal thickness in all 30 cases as a function of choroidal thickness. Most of the cases achieved good accuracy of choroidal thickness measurement, with an average accuracy rate of 87.5%. There were three cases with relatively lower accuracy rates (70.3%, 66.6% and 54.2%, respectively; Table S1). This could be explained by some large vessels passing through the choroid-scleral interface in these cases, which obscure the inferior border of the choroid layer. After excluding these three cases, the average accuracy rate of choroidal thickness measurement was 90.2%.
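Since one pixel represents 4 µm in this study, the reported per-model pixel errors translate directly into microns:

```python
# Convert the reported segmentation errors from pixels to microns,
# using the study's scale of 4 um per pixel.
UM_PER_PIXEL = 4
errors_px = {"R50": 4.85, "OR": 4.86, "R101": 5.06, "AND": 5.04}
errors_um = {model: err * UM_PER_PIXEL for model, err in errors_px.items()}
print(errors_um["R50"])  # 19.4
```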
Figures 7 and 8 show the results of applying the proposed segmentation method to various cases. The average execution time for each case, comprising 25 images, was less than 6 s. These simulations were run on a single-CPU PC (Intel Core i7-9700, 3.0 GHz) with a GeForce RTX 2070 SUPER graphics card on a Linux operating system. The programming development environment utilized Python in a Jupyter notebook.

Discussion
Many choroidal imaging biomarkers have been proposed in previous studies, from the boundaries of the choroid to the whole layer and components of the choroid, such as choroidal morphology, choroidal thickness, choroidal volume, thickness of choroidal vessel layer, and choroidal vascularity index [12]. This study attempted to develop a measurement method for choroidal thickness with greater convenience and accessibility.
There are five components of the choroid layer: Bruch's membrane, Haller's layer (the outer large-vessel layer), Sattler's layer (the inner medium-vessel layer), the choriocapillaris and the suprachoroid [13]. Initially, we were unsure whether Haller's layer should be included as part of the choroidal thickness, in consideration of its ambiguous boundary on OCT images. Esmaeelpour et al. compared the differences in the choroidal sublayers of intermediate AMD eyes with or without neovascular AMD in the fellow eye. They found significant thinning of Sattler's and Haller's layers in intermediate AMD with fellow-eye nAMD, compared with healthy eyes and intermediate AMD without fellow-eye nAMD [14]. Baek et al. noted different patterns of pachyvessels in chorioretinal diseases, with diffuse distribution of pachyvessels in CSCR and thick-choroid PCV, and focal distribution in non-neovascular AMD, neovascular AMD, and, conversely, thin-choroid PCV [15]. Foo and associates noticed a reduced choroidal vascularity index (CVI) of the subfoveal Haller's layer as the initial presentation in diabetic patients before the onset of diabetic retinopathy [16]. The importance of Haller's layer has been mentioned in multiple investigations, and we therefore included Haller's layer in our study.

Various methodologies exist for depicting the CSI and measuring choroidal thickness, and they can be divided into two types. The first is non-deep learning, i.e., image-processing-based and semi-automatic. The disadvantage of the image-processing methodology is its time-consuming nature, which limits its convenience for use in clinical practice. The second is the deep-learning method, which overcomes the confounding effect of the noise and artifacts encountered in the image-processing course. Fang et al. presented CNNs combined with the graph search technique (CNN-GS) to distinguish the SD-OCT retinal layers of 39 non-exudative AMD patients, with an error of 1.26 pixels between the manual-segmentation and CNN-GS methods [17]. Kugelman et al. replaced the CNN with RNNs (recurrent neural networks) for depicting OCT retinal layers [18]. Seven retinal layers of healthy children and three retinal layer boundaries (including the inner limiting membrane, the inner boundary of the RPE and Bruch's membrane) were automatically segmented through RNNs, with average errors of 0.53 pixels and 1.17 pixels, respectively. Unsupervised anomaly detection based on generative adversarial networks (GANs) is able to produce a synthesized image and a discriminator to identify the synthesized image [5,19]. The two networks are independent, each with its own learning state, and they compete to enhance learning ability. GANs also have the benefit of working without labels, which saves a lot of time, but hidden worries exist when they are used on medical images: medical images should be labeled by physicians instead of using a discriminator to identify the synthesized image. Supervised learning learns from data provided by physicians, and these data are traceable to the standard answer of labeling.
For rare diseases, data augmentation achieved via GANs can tackle the difficulty of collecting cases. A GAN without ℓ1 loss has been used to synthesize images of lung nodules with sufficient diversity [19]. Han et al. proposed a two-step unsupervised medical anomaly detection generative adversarial network (MADGAN), composed of a reconstruction step via Wasserstein loss with gradient penalty + 100 ℓ1 loss and a diagnosis step via ℓ2 loss, to discriminate subtle lesions on brain MRI in patients with early-stage Alzheimer's disease and brain metastasis [20]. Its ability to identify diffuse lesions across multiple slices may play a role in recognizing multifocal chorioretinal lesions on OCT images. There are still inevitable noises in synthesized images, which may affect the strategy for selecting training models and need further study.
Previous studies have provided different deep-learning networks for automatic segmentation of retinal layers in OCT, in which the accuracy and practicability were found to be reliable. With respect to the use of deep learning for depicting the choroid, Masood et al. provided a three-step method for measuring choroidal thickness that combined morphological manipulation and a deep-learning technique [21]: binarization and reconstruction of Bruch's membrane, choroid segmentation with CNNs, and measurement of the distance between Bruch's membrane and the CSI. Kugelman et al. used U-Net to depict the CSI [18]. They compared three networks, Cifar CNN (CNN 1), Complex CNN (CNN 2), and an RNN, for identifying the boundaries of the ILM, RPE, and CSI. Overall, the RNN model revealed the best results, with a statistically significantly lower error of 3.64 pixels in CSI localization, compared with Cifar CNN (error of 3.74 pixels) and Complex CNN (error of 3.97 pixels). In our study, we set 4 µm per pixel and 400 × 400 pixels per image. The choroid contour was defined by Bruch's membrane and the choroid-scleral interface (CSI). An alternative deep-learning network to CNN and RNN, composed of ResNet and FPN, was used, and four different training models were utilized. The R50 and OR models were significantly more accurate for depicting the OCT choroid layer (errors of 4.85 pixels and 4.86 pixels, respectively) than the R101 and AND models (errors of 5.06 pixels and 5.04 pixels, respectively), as shown in Table 1. The increased tendency toward overfitting as neural networks grow deeper [22,23] may be the reason why the error conversely increased when more detailed training data were given. Unsupervised training of deep neural networks has been proposed to overcome the issue of overfitting [23,24].
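As a concrete illustration of the measurement step described above, the following sketch converts two delineated boundary curves (Bruch's membrane and the CSI, as per-column row indices) into average and subfoveal choroidal thickness, assuming the study's axial scale of 4 µm per pixel. The function and variable names are hypothetical, not the study's actual code.

```python
import numpy as np

MICRONS_PER_PIXEL = 4.0  # axial scale used in this study: 4 um per pixel


def choroidal_thickness_um(bm_rows, csi_rows):
    """Per-column choroidal thickness in microns.

    bm_rows, csi_rows: 1-D sequences (length = image width) giving the row
    index of Bruch's membrane and the choroid-scleral interface (CSI) in
    each A-scan column of the 400 x 400 OCT image.
    """
    return (np.asarray(csi_rows) - np.asarray(bm_rows)) * MICRONS_PER_PIXEL


def average_thickness_um(bm_rows, csi_rows):
    """Average choroidal thickness across all columns."""
    return float(np.mean(choroidal_thickness_um(bm_rows, csi_rows)))


def subfoveal_thickness_um(bm_rows, csi_rows, fovea_col):
    """Choroidal thickness at the column under the fovea."""
    return float(choroidal_thickness_um(bm_rows, csi_rows)[fovea_col])
```

Because the model outputs a contour of numerous continuous points rather than a single point, the same per-column thickness array also supports peripapillary or peripheral measurements by indexing different columns.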
Table 1 provides the average error and average execution time of the different models, allowing physicians to choose the model suited to different clinical goals: the OR model (R101 U R50) has higher sensitivity but is more time-consuming, suiting it to screening for pachychoroid diseases, while the AND model (R101 ∩ R50) has the advantage of saving time with an acceptable error, so it can also be an alternative choice for clinical application. Mask R-CNN, as used in our study, multiplies feature channels rather than summing them, providing faster and higher-quality choroid segmentation. A choroidal contour composed of numerous continuous points rather than a single point was acquired through this model, and the measurement of distance at various locations was available, which provides access to study peripapillary, subfoveal, or peripheral choroidal thickness.
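The OR and AND combinations of the two backbones' predictions can be sketched as a pixel-wise union or intersection of the binary choroid masks produced by the R50 and R101 models. This is a minimal illustration of the combination step only; the function and variable names are hypothetical.

```python
import numpy as np


def combine_masks(mask_r50, mask_r101, mode="or"):
    """Combine binary choroid masks predicted by the R50 and R101 models.

    mode="or"  -> union (R101 U R50): keeps pixels flagged by either model,
                  giving higher sensitivity.
    mode="and" -> intersection (R101 ∩ R50): keeps only pixels flagged by
                  both models, giving a stricter, smaller mask.
    """
    a = np.asarray(mask_r50, dtype=bool)
    b = np.asarray(mask_r101, dtype=bool)
    if mode == "or":
        return a | b
    if mode == "and":
        return a & b
    raise ValueError("mode must be 'or' or 'and'")
```

The choroid boundary curves can then be re-extracted from the combined mask (e.g., as the topmost and bottommost true pixel in each column) before thickness measurement.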
There were two limitations in this study. First, it remained difficult to accurately depict the boundary of the choroid layer in cases with a very thick choroid, even though we used EDI-OCT, which provides more detailed images of the choroid layer without pupillary dilation than conventional SD-OCT [25]. The average choroidal thickness (defined as the subfoveal distance from the RPE to the CSI) measured with EDI-OCT in previous studies ranged from 292 µm to 335 µm [25,26]. Kong et al. compared the accuracy of subfoveal choroidal thickness (SFCT) measurement in a healthy population using conventional SD-OCT and EDI-OCT. The accumulated accuracy of SFCT measurement with SD-OCT was 82.2% in the SFCT ≤ 320 µm group, but accuracy decreased to 48.1% in the SFCT 281-320 µm group. The result became unreliable if SFCT > 360 µm, with an accuracy of 0% [27]. In contrast, Ayyildiz et al. found consistent results in the measurement of peripapillary choroidal thickness between conventional SD-OCT and EDI-OCT [28]. Based on these findings, the choroidal thickness itself appears to matter more than the type of OCT. Thus, in the current study, cases with an extremely ambiguous demarcation line between choroid and sclera due to an excessively thick SFCT, even after adjusting the brightness and contrast of the images, and cases with obvious media opacity, such as corneal diseases, dense cataract, or severe vitreous opacity or haze, were excluded in order to improve the accuracy of the input data. In our study, the accuracy rates of choroidal thickness measurements were 88.5% to 90% depending on the model, which was better than previous studies (Table 1). There were only three cases with a choroidal thickness of more than 280 µm (288 µm, 296 µm, 352 µm), with accuracy rates of 93%, 97%, and 78%, respectively.
The second limitation is that Mask R-CNN achieves greater speed by operating on a feature map composed of various features rather than performing multi-stage analysis of every single feature, but at the cost of lower accuracy for each individual feature channel compared with U-Net.

Conclusions
This study proposes a new choroid segmentation method using the Mask R-CNN model to automatically depict Cob and Cib, combining the shallow and deep information of the ResNet backbone through the FPN architecture to obtain more complete spatial edge rendering. This method overcomes the major disadvantages of the previous graph- and texture-based image processing methods, namely their longer processing time and poorer accuracy compared with manual segmentation performed by physicians. When compared to the Cifar CNN, Complex CNN, and RNN models used in recent studies [18], the Mask R-CNN model provides faster and more precise outcomes. However, the accuracy of depicting the choroid layer declined in the extremely thick group in this study, a finding that has also been previously reported. Ophthalmologists have focused on the pathophysiology linking the choroid layer and various ocular diseases, ranging from the anterior segment (keratoconus, glaucoma) to the posterior segment (AMD, PCV, optic neuropathy) [29][30][31][32][33][34]. Through the thirty experimental cases in our study, the feasibility of this Mask R-CNN model has been demonstrated. A validation of the proposed automatic segmentation method needs to be performed in the future. Plans are underway to construct a choroidal thickness profile for different parameters, including every decade of age as well as various spherical equivalent refractive errors and axial lengths, in healthy people, in patients with the ocular diseases mentioned above, and in patients with non-ocular diseases, including perfusion-related diseases such as cardiovascular disease [35] and Alzheimer's disease [36]. The choroidal vascularity index (CVI), a novel OCT parameter that evaluates the vascular portion of the choroid with the advantage of less variability, has been gradually emphasized [37].
We plan to use this Mask R-CNN model to depict the luminal and stromal areas of the choroid layer in order to analyze both choroidal thickness and choroidal vascularity index from OCT images in a future study. Recent studies have revealed a correlation between changes in choroidal thickness and anti-VEGF injections [38,39]. Exploring this association via the proposed model is also part of our future plans. Overall, this model has potential for use as a robust automatic sketching scheme to evaluate choroidal thickness on OCT images without human intervention and as a useful tool in clinical application to help diagnose and modify treatment strategies for chorioretinal diseases.
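As a sketch of the planned CVI analysis, the standard definition of CVI (luminal area divided by total choroidal area) can be computed once the choroid and its luminal pixels have been segmented into binary masks, for example via local thresholding restricted to the choroid region. The names below are hypothetical and the masks are assumed to be precomputed.

```python
import numpy as np


def choroidal_vascularity_index(choroid_mask, luminal_mask):
    """CVI = luminal (vascular) area / total choroidal area.

    choroid_mask: boolean mask of the whole segmented choroid layer.
    luminal_mask: boolean mask of dark (luminal) pixels, e.g. obtained by
    local binarization; it is clipped to the choroid region here so stray
    dark pixels outside the choroid do not inflate the index.
    """
    choroid = np.asarray(choroid_mask, dtype=bool)
    luminal = np.asarray(luminal_mask, dtype=bool) & choroid
    total = int(choroid.sum())
    if total == 0:
        raise ValueError("empty choroid mask")
    return float(luminal.sum()) / total
```

The stromal area then follows as the complement of the luminal area within the choroid mask, so thickness and CVI can be reported from the same segmentation.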

Data Availability Statement:
The data supporting the findings of this study are available within the article and its Supplementary Materials. The image data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.