Retinal Image Analysis for Diabetes-Based Eye Disease Detection Using Deep Learning



Introduction
Diabetes is a disease in which blood sugar levels are too high. Patients are at risk of developing different eye diseases, such as diabetic retinopathy (DR) [1], diabetic macular edema (DME) [2], and glaucoma [3], that may result in complete vision loss. DR is a disease that damages the retina; its main symptoms are blurred vision, floaters, and sudden vision loss. Hemorrhages, microaneurysms, and hard and soft exudates are the signs of abnormality [4] in DR.

• Hard exudates are bright yellow spots with a waxy appearance on the retina, which are formed because of the leakage of blood from vessels.
• Soft exudates are white lesions on the retina that occur due to occlusion of the arteriole.
• Hemorrhages develop due to blood leakage from damaged vessels and appear as dark red spots.
• Microaneurysms develop due to distortions in the boundary of blood vessels and appear as small red dots on the retina.
DME is another eye disease that occurs when a patient already suffers from DR, and it is a cause of vision loss. Additional medical conditions resulting from poor blood sugar control increase the risk of blindness for people with DME. DME can occur at any phase of DR, although it is more likely to occur later as the disease progresses. DME leads to an accumulation of fluid in the macula region, resulting in swelling of the macula, which is the central part of the retina and is dedicated to central vision. Any damage in the macula region causes loss of central vision. Blurry vision, double vision, and floaters are common symptoms of DME, which can lead to blindness if untreated [5].
Glaucoma is an eye disease that damages the optic disc (OD) and optic cup (OC) and causes vision loss in advanced stages; it is estimated to affect approximately 80 million people around the world [6,7]. Basic structural signs of glaucoma, such as disc size, cup-to-disc ratio (CDR), the ratio of the neuroretinal rim in the inferior, superior, nasal, and temporal quadrants, and peripapillary atrophy, are typically focused on the OD. The OD is the morphological structure seen in the cross-sectional view of the optic nerve linking to the retina, and the OC is the central part of the OD. Glaucoma damages the optic nerve due to an imbalance in the intraocular pressure (IOP) inside the eye. The affected nerve fibers deteriorate the retinal layer and increase the CDR and OD [8]. IOP causes injury to the nerve fibers comprising the optic nerve, and a crater-like hole begins to form in the OD in front of the optic nerve head. Glaucoma causes the boundary of the disc to enlarge and its color to change from pink to pale.
Eye diseases are normally diagnosed by measuring IOP, obtaining the patient's medical history, and performing visual field loss tests, accompanied by visual evaluation of the disease through ophthalmoscopy to analyze the color, size, and shape of the optic nerve [8]. Therefore, segmentation of the affected region is not only beneficial for further intensive clinical evaluation by experts but also effective for preparing a computer-based automatic classification procedure that is more robust to localization errors [9]. Initially, ophthalmologists diagnose eye diseases through a visual checkup of DR lesions and the OD, by determining the CDR, diameter, ratio of disc area, and border irregularity. However, because of the limited number of available ophthalmologists, early disease diagnosis is usually delayed [10], although timely diagnosis and treatment have the potential to prevent vision loss. To deal with these shortcomings, research is aiming for automated glaucoma detection through computer-aided diagnosis (CAD)-based solutions.
For automated detection of eye diseases, handcrafted features have been used to differentiate between affected and normal regions of the images [11–25]. However, these features cannot effectively represent the DR, DME, and glaucoma regions because of variations in color and size, high intra-class variation, and bright regions other than the OD, which leads to unsatisfactory results from CAD solutions [26]. To optimize the performance of CAD systems, glaucoma classification is performed on a correctly segmented glaucoma region, called the region of interest (ROI), which enhances the detection ability because the affected region gives a good representation of the glaucoma characteristics. Segmentation is therefore the main phase before classification for improving the performance of CAD systems.
Reference | Method | Findings | Limitations
--- | --- | --- | ---
 | | To locate and classify the DR lesions into different classes according to their severity level. | The method suffers from high computational cost.
Zhang et al. [40] | A DL framework named DeepDR was presented for DR detection. In addition, a new database of labelled DR images was introduced. | The proposed network attained a sensitivity of 97.5% and a specificity of 97.7%. | The introduced model needs to be evaluated on a larger and more complex dataset.
Torre et al. [41] | A DL-based method was used to predict the expected DR class and to assign scores to individual pixels to exhibit their relevance in each input sample. The assigned scores were used to take the final classification decision. | The introduced DL framework achieved sensitivity and specificity values of more than 90%. | The evaluation performance of the presented algorithm can be improved through appropriate measures.
Rekhi et al. [2] | The method was based on geometrical, morphological, and orientation features. The classification was performed through SVM. | Grading and classification of DME from fundus images with an accuracy of 92.11%. | The detection accuracy needs further improvement.
Kunwar et al. [5] | The method was based on texture features and the SVM classifier. | High-risk DME detection with an accuracy of 86%. | Experiments were performed on a small dataset.
Marin et al. [30] | The method was based on thresholding and regularized regression techniques. | DME risk detection with 0.90 sensitivity. | The detection performance requires improvement.
Perdomo et al. [31] | The presented method was composed of two-stage CNNs. | The method detects regions of interest in the retinal image and then predicts its DME class. | The technique is computationally complex.
Jiang et al. [32] | An end-to-end Region-based Convolutional Neural Network was used for OD and OC segmentation. | OD and OC segmentation with an AUC of 0.901. The method is robust for glaucoma detection. | The method is computationally complex because it employs two separate RCNNs to compute the bboxes of the OC and OD, respectively.
Bajwa et al. [37] | Localization was achieved through RCNN, while the other stage used a deep CNN to classify the computed disc as glaucomatous or healthy. | Localization and classification of glaucoma with an AUC of 0.874. | The method is computationally complex because it uses a two-stage framework to localize and classify glaucoma. The performance is affected by increasing the network hierarchy, as this results in losing the discriminative set of features.
Zheng Lu et al. [38] | A modified U-Net model was obtained by minimizing the original U-shape structure and adding a 2-dimensional convolutional layer. | Before OD segmentation, the ground truths were generated through the GrabCut method. | The presented technique requires less training; however, it shows lower segmentation accuracy than the latest approaches because of missing ground truths.
Ramani et al. [39] | A region-based pixel density calculation method based on the Circular Hough Transform with Hough Peak Value Selection and the Red Channel Super-pixel method. | The technique is robust and efficient for optic disc segmentation. | The detection accuracy is affected on images with pathological distractions.
Appl. Sci. 2020, 10, 6185
In this work, we present a deep learning (DL) approach that detects the bounding boxes (bboxes) of disease regions. Then, instead of segmenting the whole image, we segment only the localized regions. Experiments showed that this approach is efficient in terms of both complexity and time consumption. The proposed method is based on the fast region-based convolutional neural network (FRCNN) and fuzzy k-means (FKM) clustering.
Our proposed method deals with the inadequate sample problem by using unsupervised pre-training and supervised fine-tuning of regions for localization. FRCNN is a single-stage training technique that classifies region proposals and refines their localization. In the test phase, FRCNN produces class-independent region proposals for an image and obtains a fixed-size feature descriptor from each proposal using a CNN; afterward, the softmax layer classifies the regions. Finally, FKM clustering precisely extracts the affected regions from the localized regions. The proposed method detects the abnormalities of DR, DME, and glaucoma regions simultaneously using FRCNN.
The contributions of our work are given below:
• Early and automated detection of diabetes-based eye disease regions using machine learning-based segmentation is a complex task. In the presented methodology, we use an FRCNN-based method to localize the disease regions. Our findings show that combining FRCNN with FKM clustering results in accurate localization of the affected areas, which ensures precise recognition of the disease in an automated manner.
• To achieve human-level performance on challenging datasets, i.e., ORIGA and MESSIDOR, the retinal images are represented by FRCNN deep features, which are then segmented through FKM clustering.
• The proposed method can detect the signs of disease, including early signs, simultaneously, and has no issue in learning to recognize an image of a healthy eye.
• The available datasets do not have bbox ground truths, so we first developed the bbox annotations from the given ground truths of the datasets, which are necessary for training FRCNN.
The rest of our paper is organized as follows: Section 2 briefly explains the proposed architecture i.e., localization and segmentation of localized regions. In Section 3, the experimental results and their discussion are presented to highlight the significance of findings. Section 4 presents the general discussion and the last section follows with the conclusions and future work suggestions.

Proposed Methodology
Detection of diabetes-based eye diseases from fundus images is treated as a two-step method. The first step is the detection and localization of the disease, and the second step is the segmentation of the localized regions using FKM clustering. In the localization step, we utilize the FRCNN method. We develop the annotations for the three diseases and pass them to FRCNN training, which extracts features from the images and passes them to the RoI pooling layer, whose output feeds the fully connected classification and bbox regression layers. The model is evaluated on the test images to localize the affected portions with a confidence score. Finally, FKM clustering, which is considered a robust method for image segmentation, is applied for the segmentation. Figure 1 shows the framework of the proposed method.
Figure 1. Framework of the proposed method. In the object detection phase, the ROI for the concerned disease is detected using FRCNN. In the segmentation phase, the detected regions of all three diseases are segmented out through FKM clustering.

Ground Truth Generation
A ground-truth bbox for each image is required to identify the affected region during training. The LabelImg [42] tool is used to annotate the retinal images and to manually create the bboxes for each image. Figure 2 shows an example of an original image and the corresponding ground truth image. The annotations are saved in .xml files, which include the class of each object and its bbox values, i.e., xmin, ymin, xmax, ymax, width, and height. An XML file is created for each image; these files are used to create a CSV file, from which the train.record file used later in the training process is generated.
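The XML-to-CSV step can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the annotations use the Pascal VOC layout that LabelImg writes by default, and the file name, image size, class names, and coordinates in `SAMPLE_XML` are hypothetical. Generating train.record from the CSV is not shown.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout (values are made up).
SAMPLE_XML = """<annotation>
  <filename>image_001.jpg</filename>
  <size><width>1440</width><height>960</height><depth>3</depth></size>
  <object>
    <name>hard_exudate</name>
    <bndbox><xmin>120</xmin><ymin>340</ymin><xmax>180</xmax><ymax>395</ymax></bndbox>
  </object>
  <object>
    <name>hemorrhage</name>
    <bndbox><xmin>610</xmin><ymin>205</ymin><xmax>650</xmax><ymax>250</ymax></bndbox>
  </object>
</annotation>"""

FIELDS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def xml_to_rows(xml_text):
    """Return one dict per annotated object found in a single annotation file."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append({
            "filename": filename,
            "width": width,
            "height": height,
            "class": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows)
```

In practice, `xml_to_rows` would be called once per annotation file and the rows concatenated before writing a single CSV.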
We generated the annotations of the three diseases according to their signs, as shown in Table 2. A DR-affected image has five classes, i.e., microaneurysms, soft exudates, hard exudates, hemorrhages, and background; DME has two classes, i.e., DME and background; and glaucoma has three classes, i.e., OD, OC, and background.

Localization Phase
Because the nature of the task is the same for all three diseases, i.e., to localize objects, the procedure applied remains consistent. The only difference is the number of classes, which varies across the three diseases.

Localization of DR Regions
FRCNN takes the images and the region proposals as input. Our method uses max-pooling and convolutional (Conv) layers to process the whole image and produce a Conv feature map. From the Conv feature map, fixed-size feature vectors are extracted using the ROI pooling layer. These vectors are then fed into a series of fully connected (fc) layers before branching into two separate output layers: the first computes Softmax probability estimates over k = 5 region classes, and the other produces four real-valued numbers for each of the five region classes, which represent the position of the bbox for one of the five classes [43]. We trained the model for multi-class DR object detection using FRCNN, which can accurately localize multi-class objects in an image. For DR localization, there are 4 + 1 classes in total, i.e., hemorrhages, microaneurysms, soft exudates, and hard exudates, plus one background class.
The RoI pooling layer converts the features inside each RoI into a fixed-size map of h × w (i.e., 7 × 7), where h and w are hyper-parameters, by using max-pooling. In our proposed method, each RoI is a rectangular window defined by a four-tuple (d, e, f, g), where (d, e) represents the top-left corner and (f, g) represent the height and width. The RoI pooling layer divides the f × g RoI window into an F × G grid of sub-windows of approximate size f/F × g/G, and then max-pools the values in each sub-window into the corresponding output cell. The RoI pooling layer is a special case of the spatial pyramid pooling layer used in SPPnets [44], in which there is just one pyramid level. We used the pooling sub-window calculation given in [44].
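The sub-window arithmetic of RoI max-pooling can be sketched in plain Python. This is an illustrative sketch, not the implementation used in the paper: it assumes a single-channel feature map stored as a list of lists, interprets (d, e) as (row, column) in feature-map cells, and uses floor/ceiling bin boundaries, which is one common way of realising the approximate f/F × g/G sub-windows.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool one RoI of a 2-D feature map into an out_h x out_w grid.

    roi = (d, e, f, g): (d, e) is the top-left corner as (row, col),
    f is the RoI height and g is the RoI width, all in feature-map cells.
    """
    d, e, f, g = roi
    pooled = []
    for i in range(out_h):
        # Rows covered by sub-window i (height roughly f / out_h).
        r0 = d + (i * f) // out_h
        r1 = d + ceil_div((i + 1) * f, out_h)
        row = []
        for j in range(out_w):
            # Columns covered by sub-window j (width roughly g / out_w).
            c0 = e + (j * g) // out_w
            c1 = e + ceil_div((j + 1) * g, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4 x 6 feature map whose value encodes its position (10 * row + col).
fmap = [[10 * r + c for c in range(6)] for r in range(4)]
small = roi_max_pool(fmap, (0, 0, 4, 4), out_h=2, out_w=2)
fixed = roi_max_pool(fmap, (1, 1, 3, 5))  # default 7 x 7 output
```

Whatever the RoI size, the output grid has the same shape, which is what lets the subsequent fc layers accept proposals of arbitrary size.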

Localization of DME Region
FRCNN can also be trained to localize binary-class objects, so we used FRCNN to localize the macula region in the retinal image. In this case, we have two region classes, i.e., background and macula region, which are represented by 0 and 1. The region inside the bbox represents the macula, and the rest of the image is considered background.
The max-pooling and Conv layers are used to process the whole image and produce the feature map. From the Conv feature map, fixed-size feature vectors are extracted using the ROI pooling layer. These vectors are then fed into a series of fc layers before branching into two separate output layers: the first computes softmax probability estimates over k = 2 region classes, and the other produces four real-valued numbers for each of the two DME region classes, which represent the position of the bbox for one of the DME classes.

Localization of Glaucoma Regions
For recognition of the third eye disease, i.e., glaucoma, we trained our method on three classes, i.e., OD, OC, and background. FRCNN localizes the OD and OC regions, and the rest of the retinal image is considered background. The max-pooling and Conv layers produce the feature map, from which fixed-size feature vectors are extracted using the ROI pooling layer. These vectors are then fed into a series of fc layers before branching into two separate output layers: the first computes Softmax probability estimates over k = 3 region classes, and the other produces four real-valued numbers for each of the three region classes, which represent the position of the bbox for one of the glaucoma classes.

Feature Extraction
Object localization approaches employ a sliding window to identify the region of interest, e.g., vehicles, buildings, etc. Advances in DL methods such as CNNs have replaced the earlier detection approaches and deliver good performance. However, these techniques are computationally expensive because they still rely on the sliding-window method for localizing objects, so FRCNN proposes regions through a selective search approach to improve performance, and it can train all network weights with backpropagation. FRCNN is initialized from a pre-trained CNN and undergoes three transformations: (i) the last max-pooling layer is replaced by an RoI pooling layer with h = w = 7 for VGG16; (ii) the final fc and softmax layers are replaced by two sibling output layers, namely an fc layer with Softmax over the k + 1 categories and a bbox regressor; (iii) the network is modified to take two inputs: the list of images I(x,y) and the list of ROIs from those images [45].
FRCNN can train all weights with backpropagation, which is not practical in SPPnet [26] because backpropagation through the SPP layer becomes inefficient when the RoIs come from different sample images. In FRCNN training, stochastic gradient descent (SGD) mini-batches are sampled hierarchically: first N images are sampled, and then R/N ROIs are sampled from each retinal image. The mini-batch computation decreases as N decreases. For example, with N = 2 and R = 128, R/N becomes 64, which means the training scheme is roughly 64 times faster than sampling one RoI from each of 128 different images (as in RCNN and SPPnets) [44].
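The hierarchical sampling scheme (first N images, then R/N RoIs per image) can be illustrated with a short sketch. The function name, the dictionary layout, and the toy proposal pool are hypothetical; the sketch only shows why the convolutional features need to be computed for N images rather than for R different images.

```python
import random


def sample_minibatch(rois_per_image, n_images=2, batch_rois=128, seed=0):
    """Sample N images, then R/N RoIs from each, as (image_id, roi) pairs.

    rois_per_image maps an image id to its list of candidate RoIs.
    """
    rng = random.Random(seed)
    per_image = batch_rois // n_images
    image_ids = rng.sample(sorted(rois_per_image), n_images)
    batch = []
    for image_id in image_ids:
        candidates = rois_per_image[image_id]
        count = min(per_image, len(candidates))
        batch.extend((image_id, roi) for roi in rng.sample(candidates, count))
    return batch


# Toy pool: 10 images with 200 candidate RoIs each (RoIs are just indices here).
pool = {"img%d" % i: list(range(200)) for i in range(10)}
batch = sample_minibatch(pool)            # N = 2, R = 128
images_used = {image_id for image_id, _ in batch}
```

With N = 2 and R = 128, all 128 RoIs share the feature maps of only two images, so the forward and backward passes through the Conv layers are run twice per mini-batch instead of 128 times.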

Multi-Task Loss
FRCNN is an end-to-end learning framework that uses a multi-task loss function to learn the class of each region together with the position and size of the related bbox. Each ROI is labeled with a class u and a bbox regression target v. Let p be the output of the Softmax sibling layer, i.e., the probability distribution over the k classes including the background class, and let t be the output of the regressor sibling layer, i.e., the bbox tuple for each class k [46].
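We used a multi-task loss L on each labeled ROI to jointly train for the classification loss and the bbox regression loss, as shown in Equation (1):
$$L(p, u, t^{u}, v) = L_{cls}(p, u) + \lambda\,[u \geq 1]\,L_{loc}(t^{u}, v) \qquad (1)$$
The log loss for the true class u is $L_{cls}(p, u) = -\log p_{u}$, and $L_{loc}(t^{u}, v)$ denotes a smooth L1 loss ($S_{L1}$) on the regression output, as in Equation (2):
$$L_{loc}(t^{u}, v) = \sum_{i \in \{x, y, w, h\}} S_{L1}(t^{u}_{i} - v_{i}) \qquad (2)$$
where:
$$S_{L1}(x) = \begin{cases} 0.5\,x^{2} & \text{if } |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases} \qquad (3)$$
In Equation (1), λ denotes a hyperparameter that determines the relative weight of the regression loss versus the classification loss in the overall loss. We used λ = 1 for all experiments. The indicator [u ≥ 1] equals 1 when u ≥ 1 and 0 otherwise, which means that no $L_{loc}$ is defined for the background class. In Equation (3), x is the element-wise difference between the predicted and target bbox vectors. The RoI pooling layer uses backpropagation to transmit the derivatives, as in [47].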
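As a worked illustration of the multi-task loss, the following sketch evaluates the log loss plus the smooth L1 term for a single RoI. It assumes the usual convention that class index 0 is the background and that the bbox vectors have four components; the probabilities and offsets in the example are hypothetical numbers.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1 else ax - 0.5


def multi_task_loss(p, u, t_u, v, lam=1.0):
    """Loss for one RoI.

    p   : softmax probabilities over all classes (index 0 = background, assumed)
    u   : true class index
    t_u : predicted bbox tuple for class u
    v   : bbox regression target
    lam : weight of the regression term (the text uses 1)
    """
    l_cls = -math.log(p[u])
    # The indicator [u >= 1] switches the regression term off for background.
    l_loc = sum(smooth_l1(ti - vi) for ti, vi in zip(t_u, v)) if u >= 1 else 0.0
    return l_cls + lam * l_loc


# Foreground RoI: classification term plus regression term.
fg = multi_task_loss([0.1, 0.9], 1, (0.5, 0.0, 0.0, 2.0), (0.0, 0.0, 0.0, 0.0))
# Background RoI: the (large) box error is ignored.
bg = multi_task_loss([0.5, 0.5], 0, (0.0, 0.0, 0.0, 0.0), (9.0, 9.0, 9.0, 9.0))
```

In the foreground example the regression term is 0.125 + 1.5 = 1.625, because one offset error falls in the quadratic regime and one in the linear regime.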

SGD Hyperparameters
For SGD, the fc layers of FRCNN used for softmax classification and bbox regression are initialized from zero-mean Gaussian distributions with standard deviations of 0.01 and 0.001, respectively. The biases are initialized to 0. The global learning rate (LR) for all layers is set to 0.001, and we run SGD for k mini-batch iterations. A momentum of 0.9 and a parameter decay of 0.0005 are applied to the weights and biases.
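The role of these hyperparameters can be sketched with a scalar update. This is an assumption-laden illustration rather than the training code: it uses the common velocity form of SGD with momentum, treats parameter decay as an L2 term added to the gradient, and takes the momentum to be the conventional 0.9.

```python
import random


def init_fc(n_out, n_in, std, seed=0):
    """Zero-mean Gaussian weights with the given standard deviation, zero biases."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def sgd_step(w, grad, velocity, lr=0.001, momentum=0.9, decay=0.0005):
    """One momentum-SGD update of a single parameter; returns (new_w, new_velocity)."""
    new_velocity = momentum * velocity - lr * (grad + decay * w)
    return w + new_velocity, new_velocity


cls_weights, cls_biases = init_fc(4, 3, std=0.01)    # classification layer
box_weights, box_biases = init_fc(4, 3, std=0.001)   # bbox regression layer

w1, v1 = sgd_step(1.0, 2.0, 0.0)   # first step: no accumulated velocity yet
w2, v2 = sgd_step(w1, 0.0, v1)     # zero gradient: momentum keeps moving w
```

The second call shows the effect of momentum: even with a zero gradient, the parameter continues to move in the previous direction, damped by the factor 0.9.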

Testing through FRCNN
Once the FRCNN model is trained, localization amounts to little more than performing a forward pass, given that the object proposals have been computed. The framework takes as input an image and a list of R object proposals to score. We used the Softmax cross-entropy probabilities [48] to detect the DR-affected regions, the macula region, and the glaucoma regions, i.e., OD and OC. Each detected region i has a class, a score, and a bbox value. To obtain precise localization, among the obtained proposals the region with the higher IoU overlap is retained.

Segmentation of Regions through Fuzzy K-Means Clustering
The localized regions are cropped using the calculated coordinates to determine the accurate boundary of each affected region. We used the fuzzy k-means (FKM) technique separately to segment the regions of the three eye diseases. In the case of DR, the localized regions are segmented separately: the localized regions are set as foreground, and the rest of the image is set to zero, i.e., black. Similarly, the localized macular region is segmented through the FKM clustering method. Finally, we segmented the localized glaucoma regions, i.e., OD and OC, separately. The FKM method is described as follows.
The method [49] divides the image into k regions $R_{j}$ (j = 1, 2, 3, ..., k), each associated with a cluster center $C_{j}$. There is a fuzzy relationship between the regions and the image data. The FKM method is based on minimizing the distortion given by Equation (4):
$$J = \sum_{i=1}^{N} \sum_{j=1}^{k} b_{i,j}^{\,f}\, g_{i,j}^{2} \qquad (4)$$
Here, N is the number of data points, k is the number of clusters, and f is the fuzzifier parameter, which strongly influences how the data points are distributed over the resulting clusters. For a given data point $X_{i}$ and cluster $C_{j}$, $b_{i,j} \in [0,1]$ represents the degree to which $X_{i}$ belongs to $C_{j}$, while $g_{i,j}$ represents the distance between $C_{j}$ and $X_{i}$. FKM clustering proceeds by iteratively updating the representative vectors (cluster centers) and the partitioning of the data points [50]. It performs the following steps:
(1) Specify the number of clusters k.
(2) Initialize the cluster centers $C_{j}$.
(3) Compute the membership $b_{i,j}$ of every data point $X_{i}$ for each $C_{j}$ using the following equation:
$$b_{i,j} = \frac{1}{\sum_{l=1}^{k} \left( g_{i,j} / g_{i,l} \right)^{2/(m-1)}}$$
where m is the fuzzification coefficient (the fuzzifier f of Equation (4)).
(4) Update the cluster centers using the following equation:
$$C_{j} = \frac{\sum_{i=1}^{N} b_{i,j}^{\,m}\, X_{i}}{\sum_{i=1}^{N} b_{i,j}^{\,m}}$$
(5) Repeat from step 3 until FKM has converged, i.e., until the change in the centroids between two passes is no greater than ε, the defined sensitivity threshold.
Finally, we have clustered the regions into affected and non-affected portions.
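The FKM iteration can be sketched on one-dimensional data such as the grey levels of a cropped region. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses the standard fuzzy k-means membership and center updates with absolute difference as the distance, initializes the centers evenly over the data range (an arbitrary choice), and runs on made-up intensity values.

```python
def fuzzy_k_means(points, k=2, m=2.0, eps=1e-4, max_iter=100):
    """Cluster 1-D values; returns (centers, memberships).

    memberships[i][j] is the degree to which points[i] belongs to cluster j.
    """
    lo, hi = min(points), max(points)
    # Initial centers spread evenly over the data range (assumed initialization).
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    memberships = []
    for _ in range(max_iter):
        # Membership update for every point and every cluster.
        memberships = []
        for x in points:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # The point sits exactly on a center: assign it crisply.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((dj / dl) ** (2.0 / (m - 1.0)) for dl in dists)
                       for dj in dists]
            memberships.append(row)
        # Center update: membership-weighted mean of the points.
        new_centers = []
        for j in range(k):
            weights = [row[j] ** m for row in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, points)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift <= eps:  # converged: centers moved by no more than eps
            break
    return centers, memberships


# Hypothetical grey levels: four dark pixels and four bright pixels.
pixels = [10, 12, 11, 13, 200, 205, 198, 202]
centers, memberships = fuzzy_k_means(pixels, k=2)
labels = [row.index(max(row)) for row in memberships]
```

Taking the cluster with the largest membership for each pixel, as in `labels`, yields the two-way split into affected and non-affected portions described in the text.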

Datasets
We employed five open-source databases (i.e., Diaretdb1, MESSIDOR, HRF, DRHAGIS, and ORIGA) in our experiments on eye disease detection. These databases do not have bbox ground truths, so we first developed the bbox annotations of the datasets, which are necessary for FRCNN training.
Diaretdb1 [51] is a dataset used for benchmarking DR identification from retinal images. The Diaretdb1 database comprises 89 fundus samples, of which 84 images contain DR signs and the remaining five are considered normal, i.e., they show no sign of DR. DR lesions are characterized by the following signs: hemorrhages, microaneurysms, and hard and soft exudates. The images are of size 1500 × 1125 pixels and were taken with a fundus camera using the same 50-degree field of view (FOV) with different image settings. The MESSIDOR database comprises 1200 images with two grades, DR and DME. The images were captured with a 3CCD camera with a 45-degree FOV at resolutions of 2304 × 1536, 2240 × 1488, and 1440 × 960 pixels.
HRF stands for the high-resolution fundus database, developed by the ophthalmology department at Friedrich-Alexander University Erlangen-Nuremberg (Erlangen, Germany) and the Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University (Brno, Czech Republic). The HRF database has a total of 45 images, of which 15 are of healthy patients, 15 are of patients with DR, and 15 are of glaucomatous patients [52]. DRHAGIS stands for the Diabetic Retinopathy, Hypertension, Age-related macular degeneration and Glaucoma Images database, which was developed by Health Intelligence (Sandbach, UK). The database consists of 40 images, of which 10 are affected by glaucoma [53]. ORIGA is the Online Retinal Fundus Image Database for Glaucoma Analysis, with 650 data samples containing 168 glaucomatous and 482 normal images. The images were collected by the Singapore Eye Research Institute [54] and were annotated by medical experts.

Evaluation Metrics
To analyze the performance of the proposed technique, we considered the following evaluation metrics. For localization, the FRCNN predictions are assessed using the overlap measure between the predicted box and the ground-truth box, called Intersection over Union (IoU), i.e., the area of overlap of the two boxes divided by the area of their union. A correct prediction is a true positive (TP) and any other prediction is a false positive (FP), so that precision is TP/(TP + FP). To calculate the average precision (AP), we iterate through the test images and average the precision. Equation (8) gives the mean average precision (mAP):
$$mAP = \frac{1}{T} \sum_{i=1}^{T} AP(t_{i}) \qquad (8)$$
Here, T is the number of test images and $AP(t_{i})$ is the average precision for the given test image category $t_{i}$. That is, we calculate the AP of each category for a given test image and then average over all test images. Combining all AP scores gives a single number, the mAP [55], which describes how well the trained model detects bboxes with respect to the ground-truth bboxes.
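The localization metrics can be sketched as follows. The sketch assumes boxes given as (xmin, ymin, xmax, ymax) tuples, and the box coordinates and AP values in the example are hypothetical, not results from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def precision(tp, fp):
    """Fraction of predicted boxes that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def mean_average_precision(ap_values):
    """Mean of the per-item AP scores."""
    return sum(ap_values) / len(ap_values)


overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))    # half of each box overlaps
disjoint = iou((0, 0, 10, 10), (20, 20, 30, 30))
m_ap = mean_average_precision([0.9, 0.8, 1.0])    # hypothetical AP scores
```

In the first example the intersection is 50 and the union is 150, giving an IoU of one third; such a prediction would count as a TP or an FP depending on the IoU threshold chosen.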
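For segmentation, we used the sensitivity (SE), specificity (SP), accuracy (Acc), Dice coefficient ($D_{c}$), and area under the curve (AUC) as the evaluation measures. In terms of the numbers of true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) pixels, the first four measures have the standard definitions:
$$SE = \frac{TP}{TP + FN}, \qquad SP = \frac{TN}{TN + FP}, \qquad Acc = \frac{TP + TN}{TP + TN + FP + FN}, \qquad D_{c} = \frac{2\,TP}{2\,TP + FP + FN}$$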
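The pixel-wise segmentation measures can be computed from a predicted mask and a ground-truth mask as sketched below. The sketch uses the standard definitions of sensitivity, specificity, accuracy, and Dice coefficient, assumes the masks are flattened to lists of 0/1 values, and runs on a hypothetical eight-pixel example; AUC is omitted because it requires continuous scores rather than a binary mask.

```python
def confusion_counts(pred, truth):
    """Count TP, TN, FP, FN over two equally long lists of 0/1 pixel labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, tn, fp, fn


def segmentation_metrics(pred, truth):
    """Return SE, SP, Acc and Dice for a binary segmentation.

    Assumes both classes occur in truth, so no denominator is zero.
    """
    tp, tn, fp, fn = confusion_counts(pred, truth)
    return {
        "SE": tp / (tp + fn),
        "SP": tn / (tn + fp),
        "Acc": (tp + tn) / (tp + tn + fp + fn),
        "Dice": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical flattened masks (1 = affected pixel, 0 = background).
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
reference = [1, 0, 0, 0, 1, 1, 0, 0]
scores = segmentation_metrics(predicted, reference)
```

Here TP = 2, FP = 1, FN = 1, and TN = 4, so the example has a sensitivity and Dice coefficient of 2/3, a specificity of 0.8, and an accuracy of 0.75.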

Results
This section presents the results of the proposed method and compares the introduced technique with the latest approaches.

Evaluation of FRCNN
To analyze the performance of FRCNN, we compared it with other techniques, namely RCNN [56] and SPPnet [44], which use a similar pre-trained framework and bbox regression. SPPnet uses five scales in training and testing, while FRCNN uses single-scale training and testing. The RCNN technique achieves good performance by using a deep ConvNet; however, it has some limitations: training is a multi-stage pipeline (ConvNet, then SVMs, then bbox regressors), training is expensive in time and space, and object detection is slow. We used FRCNN, which utilizes ideas from SPPnet and RCNN and fixes the key issue in SPPnet by sharing the computation of the Conv layers among the various proposals and by exchanging the order of producing region proposals and running the CNN. FRCNN trains with backpropagation, adds bbox regression and classification heads, and trains the model with a multi-task loss. The FRCNN approach yields a large improvement in mAP because the Conv layers are fine-tuned, and it reduces disk storage because features do not need to be cached. The proposed method achieves precise localization of disease regions with an mAP of 0.94 (Table 3) in comparison with other approaches.
Table 3. Performance comparison of the presented technique with other approaches.

Localization of DR Regions
For localization of the DR signs (i.e., hemorrhages, microaneurysms, hard and soft exudates), the affected regions are considered positive examples, while the other portions and the background are considered negative examples. Overlapping regions are labeled using IoU thresholds: a region with an IoU below 0.3 is considered background, and a region with an IoU above 0.7 is considered an affected region. We adopted the FRCNN method for DR lesion localization. The localization outcome of FRCNN is shown in Figure 3 for 16 test retinal images, each with a confidence score. The test results show high scores, ranging from 0.89 to 0.99.

Localization of DME Regions
To evaluate the performance of DME detection from retinal images, we used the MESSIDOR dataset. FRCNN precisely localized the macula region, and the Softmax layer classified the normal and DME-affected regions. Visualization results of DME localization are shown in Figure 4. FRCNN localized the macular edema region at the regressor layer with a mean average precision of 0.943.

Localization of Glaucoma Regions
We adopted the FRCNN method for glaucoma localization. Given an image, the region proposal network (RPN) generates several random rectangular region proposals for the glaucoma regions, with associated region scores. The glaucoma localization outcome of FRCNN is shown in Figure 5 for 35 test images from three datasets. The test results show high scores, ranging from 0.84 to 0.94. The precision of glaucoma localization on the three datasets, i.e., HRF, DR-HAGIS, and ORIGA, is reported in Table 4 (0.946, 0.940, and 0.938, respectively). Our method achieved an mAP of 0.940 over all datasets, which indicates that it can localize the glaucoma regions accurately.

Segmentation Results
Localizing the affected regions first not only yields a low-dimensional initial sample that is cost-effective to process but also enables the deep neural network to focus on the significant portion of the image.
The pixel-wise segmentation results of DR signs are presented in Figure 6. All four signs of DR, which are localized through the FRCNN method, are segmented separately. The segmented images are then compared with the ground truth images. The results of the proposed method are evaluated using SE, SP, and Acc over all images of the test dataset. Table 5 shows that the proposed system obtained average scores of 0.961 for SE, 0.965 for SP, and 0.952 for Acc. The proposed method performs well because FRCNN localizes the lesions accurately.
The localized macula region is segmented through FKM clustering. The visual segmentation results of DME are presented in Figure 7, which clearly shows the abnormalities, i.e., exudates, in the macula region. The presented work achieved average values of 0.958, 0.958, and 0.96 for Acc, SP, and SE, respectively.
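To illustrate the clustering step, the following is a minimal NumPy sketch of fuzzy k-means (also known as fuzzy c-means) in its standard textbook form. It is not the Matlab implementation used in this work; the function name `fuzzy_kmeans`, the fuzzifier `m = 2`, the iteration count, and the random initialization are all assumptions made for the example.

```python
import numpy as np


def fuzzy_kmeans(x, k=2, m=2.0, iters=50, seed=0):
    """Standard fuzzy k-means on an (n, d) array of samples.

    Returns (centers, memberships) with shapes (k, d) and (n, k).
    All parameter defaults are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)

    # Random initial memberships; each row sums to 1.
    u = rng.random((x.shape[0], k))
    u /= u.sum(axis=1, keepdims=True)

    for _ in range(iters):
        # Centers are membership-weighted means of the samples.
        w = u ** m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]

        # Distance of every sample to every center (clipped to avoid 1/0).
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)

        # Membership update: inversely related to the relative distance.
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)

    return centers, u
```

For a localized grayscale patch, the pixel intensities could be passed as `patch.reshape(-1, 1)`, and a hard segmentation obtained by taking the arg-max of the memberships and reshaping it back to the patch size.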
The glaucoma regions were segmented for all three datasets. Table 6 demonstrates that the proposed system attained average values of 0.951 for SE, 0.961 for SP, 0.952 for Acc, and 0.928 for Di. The proposed methodology shows good segmentation performance due to the accurate localization of the OD and OC by FRCNN.
However, in a few images FRCNN detected false OD regions (as shown in Figure 9) for the following reasons: (i) the visual similarity of the OD to other bright regions, and (ii) the failure to detect the OC in low-intensity areas.
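The segmentation measures reported in this section can be computed from a predicted binary mask and its ground truth mask. The sketch below assumes the standard pixel-wise definitions of sensitivity, specificity, accuracy, and the Dice coefficient; the function name `segmentation_scores` and the dictionary keys are hypothetical, and the code is an illustration rather than the evaluation script used for the reported results.

```python
import numpy as np


def segmentation_scores(pred, truth):
    """Pixel-wise SE, SP, Acc and Dice for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    # Confusion-matrix counts over all pixels.
    tp = int(np.sum(pred & truth))
    tn = int(np.sum(~pred & ~truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))

    def ratio(num, den):
        # Return 0.0 when a measure is undefined (empty denominator).
        return num / den if den else 0.0

    return {
        "SE": ratio(tp, tp + fn),             # sensitivity (recall)
        "SP": ratio(tn, tn + fp),             # specificity
        "Acc": ratio(tp + tn, tp + tn + fp + fn),
        "Dice": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

Averaging these per-image scores over the test set gives dataset-level values of the kind reported in the tables.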

Comparative Studies
In this section we compare our method for DR, DME, and glaucoma detection against state-of-the-art methods. For DR detection, we compared our method against the works of Zeng et al. [28], Gulshan et al. [57], Zhou et al. [58], Kaur et al. [59], Abbas et al. [60], and Colomer et al. [61] using a 10-fold cross-validation scheme. The proposed method was implemented in Matlab-2019 and run on an Nvidia GTX1070 GPU-based system. The comparative results in terms of accuracy (Acc), specificity (SP), sensitivity (SE), and area under the curve (AUC) are reported in Table 7. The results show that our method achieved 95% accuracy, which is higher than that of all the compared methods. In terms of specificity, the work of Kaur et al. [59] was the closest to our method, with a difference of just 0.5%. However, that method exhibited lower sensitivity, where our method surpasses it by a significant margin of 8%. The method of Zhou et al. [58] showed a 3% higher sensitivity than our method, but its specificity is approximately 10% lower than ours.
The higher sensitivity combined with lower specificity indicates that the method of Zhou et al. [58] mislabels non-DR signs as DR, which may lead to an imprecise diagnosis for healthy patients. In terms of AUC, our method also outperformed all the compared approaches. The consistent performance of the proposed method is attributed to the low-resolution feature map generation based on the region proposals, which assist in the accurate localization of DR signs even in images with low illumination. Hence, the results indicate that our method is more precise for DR sign detection than the other approaches used for the comparison.
For DME detection, our method was compared against existing methods, including the works of Li et al. [62], Lim et al. [65], Rahim et al. [66], [68], and Xiaodong et al. [69], using the MESSIDOR dataset. The results of the comparison are presented in Table 8. In terms of sensitivity, our method achieved an SE of 0.96, which is equal to that of Syed's method. However, in terms of specificity and accuracy, our method performed better than Syed's method and most of the other methods. A significant difference in sensitivity can be observed against Li et al. [62], Lim et al. [65], and Rahim et al. [66], where the difference ranges from 11% to 26%. The method of Lim et al. [65], whose specificity is higher than its sensitivity, tends to treat DME regions as non-DME regions, whereas methods whose sensitivity is higher than their specificity, e.g., Rahim et al. [66], tend to treat even non-DME regions as DME. According to Table 8, our method achieved an SP of 0.958, which is higher than that of most other methods (0.55-0.95); the SP of Xiaodong's method is 0.97, but its SE is lower than that of our method. Hence, the performance comparison indicates that our method reliably detects DME.
For glaucoma detection, our method was compared against existing methods, including the works of Liao et al. [70], Ramani et al. [39], Parakash et al. [74], and Krishna et al. [12]. The comparison results on the ORIGA, HRF, and DR-HAGIS datasets are reported in Table 9.
The comparative methods for glaucoma detection used different sets of performance evaluation measures. For a fair comparison, we report the performance of our method using all of these measures, i.e., SE, SP, AUC, dice score (Dc), and computation test time, over all three datasets. The results show that our method attained the highest SE and AUC, i.e., 0.945 and 0.947, respectively, which signifies the reliability of our approach. The proposed method achieved a specificity of 0.96, which is slightly lower (only for the HRF dataset) than that of the methods presented in [39] and [74]; however, both of those methods showed lower sensitivity, which indicates that they miss glaucoma signs more often. In terms of SE over the HRF dataset, our method showed an approximately 10% performance gain over the work of Ramani et al. [39] and an approximately 24% gain over the work of Parakash et al. [74]. Liao et al. [70] also reported the performance of their method using the Dc measure, on which our method also outperformed their work. Moreover, our technique can run on CPU or GPU machines, and its test time of 0.9 s per image is faster than that of the work of Ramani et al. [39], which takes 1.49 s. Hence, the results indicate that our method is equally reliable for glaucoma detection.

Discussion
We have applied an FRCNN technique for the localization and recognition of diabetes-based eye diseases. Our method is based on fast RCNN and fuzzy k-means clustering. Our main contribution is a consolidated model that targets three eye diseases, i.e., diabetic retinopathy, diabetic macular edema, and glaucoma. In localization, the proposed model achieved mAP of 0.945, 0.943, and 0.941 for DR, DME, and glaucoma, respectively. FRCNN is trained through backpropagation with a multi-task loss over its bounding-box regression and classification heads. The proposed method detects the abnormal regions of DR, DME, and glaucoma simultaneously using FRCNN, and FKM clustering then precisely extracts the affected regions from the localized regions. For segmentation, we achieved accuracies of 0.952, 0.958, and 0.9526 for DR, DME, and glaucoma regions, respectively.
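As an illustration of the multi-task loss mentioned above, the sketch below follows the form introduced with Fast R-CNN: a log loss on the true class plus a smooth L1 loss on the box offsets, applied only to non-background regions. That this exact form and weighting are used in the present work is an assumption; the function names, the convention that class 0 is background, and the balancing weight `lam` are chosen for the example only.

```python
import numpy as np


def smooth_l1(diff):
    """Smooth L1 penalty: quadratic near zero, linear for |diff| >= 1."""
    a = np.abs(diff)
    return np.where(a < 1.0, 0.5 * a ** 2, a - 0.5)


def multitask_loss(class_probs, true_class, box_pred, box_target, lam=1.0):
    """Classification log loss plus weighted box-regression loss for one region.

    class_probs: softmax output over classes (index 0 assumed to be background).
    box_pred, box_target: four box offsets each.
    """
    cls_loss = -np.log(class_probs[true_class])

    # Background regions have no ground-truth box, so only the class term counts.
    if true_class == 0:
        return float(cls_loss)

    diff = np.asarray(box_pred, dtype=float) - np.asarray(box_target, dtype=float)
    loc_loss = smooth_l1(diff).sum()
    return float(cls_loss + lam * loc_loss)
```

Summing the two terms lets a single backward pass update the shared features for both classification and localization, which is the motivation for training the two heads jointly.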
The existing literature emphasizes the recognition of these diseases individually, and very little work is available that expands the scope of modern machine/deep learning models to detect multiple eye diseases simultaneously. The fundamental reason is that each disease is characterized by different abnormality signs that are specific to that particular eye disease. Moreover, a model that is optimized for one disease may perform poorly on other diseases. With this work, we have addressed this misconception and proposed a robust model that can detect and recognize three different eye diseases with very high accuracy. This work therefore shows that deep learning models have the potential to detect multiple eye diseases, much as an ophthalmologist does. Moreover, this work leans towards the application end of computer vision, and our target audience is engineers and manufacturers, particularly those developing intelligent CAD systems.

Conclusions
In the presented work, a novel methodology based on FRCNN with FKM clustering is proposed for the automated localization and recognition of diabetes-based eye diseases, i.e., glaucoma, DR, and DME, in retinal images. The proposed technique is composed of two phases: a disease detection and localization phase, followed by segmentation of the localized regions through FKM clustering. The FRCNN technique extracts deep features that provide an effective representation of eye diseases and improves segmentation performance compared with the latest solutions. The results demonstrate that the proposed solution achieved a mean IoU of 0.95 and an mAP value above 0.94 for all three diseases. Moreover, the proposed approach can also be utilized to address other segmentation problems in medical imaging. In the future, this research will be extended to other eye diseases, e.g., cataracts and age-related macular degeneration.