Multiple Ocular Disease Diagnosis Using Fundus Images Based on Multi-Label Deep Learning Classification

Ouda, Osama; AbdelMaksoud, Eman; Abd El-Aziz, A. A.; Elmogy, Mohammed

doi:10.3390/electronics11131966

Open AccessArticle

Multiple Ocular Disease Diagnosis Using Fundus Images Based on Multi-Label Deep Learning Classification

¹

Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia

²

Department of Information Technology, Faculty of Computers and Information, Mansoura University, Mansoura 35516, Egypt

³

Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia

⁴

Department of Information Systems and Technology, Faculty of Graduates Studies and Research, Cairo University, Giza 12613, Egypt

^*

Author to whom correspondence should be addressed.

Electronics 2022, 11(13), 1966; https://doi.org/10.3390/electronics11131966

Submission received: 3 May 2022 / Revised: 13 June 2022 / Accepted: 19 June 2022 / Published: 23 June 2022

(This article belongs to the Section Computer Science & Engineering)

Download

Browse Figures

Versions Notes

Abstract

:

Designing computer-aided diagnosis (CAD) systems that can automatically detect ocular diseases (ODs) has become an active research field in the health domain. Although the human eye might have more than one OD simultaneously, most existing systems are designed to detect specific eye diseases. Therefore, it is crucial to develop new CAD systems that can detect multiple ODs simultaneously. This paper presents a novel multi-label convolutional neural network (ML-CNN) system based on ML classification (MLC) to diagnose various ODs from color fundus images. The proposed ML-CNN-based system consists of three main phases: the preprocessing phase, which includes normalization and augmentation using several transformation processes, the modeling phase, and the prediction phase. The proposed ML-CNN consists of three convolution (CONV) layers and one max pooling (MP) layer. Then, two CONV layers are performed, followed by one MP and dropout (DO). After that, one flatten layer is performed, followed by one fully connected (FC) layer. We added another DO once again, and finally, one FC layer with 45 nodes is performed. The system outputs the probabilities of all 45 diseases in each image. We validated the model by using cross-validation (CV) and measured the performance by five different metrics: accuracy (ACC), recall, precision, Dice similarity coefficient (DSC), and area under the curve (AUC). The results are 94.3%, 80%, 91.5%, 99%, and 96.7%, respectively. The comparisons with the existing built-in models, such as MobileNetV2, DenseNet201, SeResNext50, InceptionV3, and InceptionresNetv2, demonstrate the superiority of the proposed ML-CNN model.

Keywords:

ocular diseases; deep learning; multi-label classification; multi-label computer-aided diagnosis

1. Introduction

Vision-threatening ocular diseases (ODs), such as age-related macular degeneration (AMD), diabetic retinopathy (DR), cataracts, uncorrected refractive errors, and trachoma, have become remarkably common over the past two decades. A recent world report on vision from the World Health Organization (WHO) demonstrated that visually impaired persons worldwide exceed 2.2 billion. At least

45 %

of these cases could have been prevented or are yet to be addressed [1]. Trachoma, cataracts, and uncorrected refractive errors (e.g., myopia, astigmatism, hypermetropia, and presbyopia) are three leading causes of blindness and vision impairment. According to the WHO, more than 153 million people are visually impaired due to uncorrected refractive errors, almost 18 million people are bilaterally blind from cataracts, and approximately

10.6

million are diagnosed with trachoma [2]. Moreover, studies showed that AMD is the most common cause of blindness, particularly in developed countries, as it accounts for

8.7 %

(i.e., 3 million people) of all blindness worldwide. The number of cases is expected to increase to 10 million by 2040 [1,3]. Recent studies [4,5] also showed that out of the 37 million cases of blindness worldwide,

4.8 %

of cases are due to DR (i.e., 1.8 million persons). According to the WHO [6], more than 171 million people globally had diabetes in 2000. This number is projected to rise to 366 million by the year 2030. Nearly half of patients who have diabetes are unaware of their condition. About

2 %

of persons with diabetes become blind, and about

10 %

develop severe visual loss after 15 years. Moreover, more than

75 %

of patients will have some form of DR after 20 years of having diabetes. Thus, early diagnosis and timely treatment of ODs are vital to prevent irreversible vision loss. Ocular fundus imaging [7] is commonly utilized as an effective and economical tool for screening retina disorders and monitoring disease progression in ophthalmology. Compared to in-person ophthalmologist examination, retinal photography has high sensitivity, specificity, and inter-/intra-examination agreement. Thereby, retinal photographs can be used in place of ophthalmoscopy in many clinical situations. Advances in optical fundus imaging have made it easier to obtain high-quality retinal images, even without pupillary dilation. Fundus cameras offer several advantages. They are convenient for patients because of the single flash exposition of the floodlight. Moreover, they do not affect the image quality in different situations, such as a degradation reduction in cataract cases. In general, digital retinal photography can facilitate telemedical consultation, which provides increased access to accurate and timely sub-specialty care, particularly for under-served areas.

However, there are several challenges associated with diagnosing ocular diseases. First, common ODs, such as DR, cataracts, and AMD, progress with few initial visible symptoms, making it difficult to achieve precise diagnoses in the early stages [8]. Second, physicians may need a long time to diagnose the patient’s condition. Third, the diagnosis needs experts who are not available all the time. Fourth, despite the merits offered by ocular fundus imaging, it is sometimes difficult to obtain sufficient accurate fundus images, especially for some rare fundus diseases [9]. This is primarily because the produced fundus images have a minimal contrast and might contain features similar to eye anatomies, making it challenging to differentiate between them [10]. As a result, all signs of eye diseases might not be precisely discovered by an ophthalmologist. Thus, an accurate diagnosis of the exact disease grade would be challenging. To address the above-mentioned challenges, several computer-aided diagnosis (CAD) systems have been proposed to automate the process of OD detection [11]. Machine learning techniques have been widely employed for ocular disease diagnosis in such CAD systems [12]. Ocular diagnosis systems based on conventional classifiers, such as support vector machines (SVM) [13] and K-nearest neighbor (KNN) [14], demonstrated good performance for small datasets but work unwell in large-scale datasets. Thus, such methods might not suit OD detection, which is more challenging and specific. Moreover, conventional machine learning techniques require manual feature extraction, feature selection, and classification.

Deep learning (DL) has recently become the mainstream technology in computer vision. It has received extensive research interest in developing new medical image processing algorithms to support disease detection and diagnosis [15,16,17,18,19,20,21]. Compared to conventional machine learning technologies, DL methods avoid lesion segmentation and handcrafted feature identification and computation. These tasks, especially in retinal fundus images, burden the developer because of the previously mentioned problems. Thus, utilizing DL can develop CAD systems schemes more efficiently. Convolutional neural networks (CNN) have achieved revolutionizing success in fundamental computer vision and image processing problems, including classification and segmentation [22]. Highly discriminative features can be learned from raw pixel intensities using CNNs [23]. The first layers of a CNN can extract edges at particular locations and orientations in the image. The middle layers can detect structures composed of particular arrangements of edges. More complex structures that correspond to parts of familiar objects can be detected using the last layers [24].

Several different ODs can co-occur in the human eye. Therefore, a model needs to be optimized to diagnose the various ODs types based on multi-label classification (ML-C). ML-C means that each training example is associated with more than one label [25]. Prediction of a single label is the goal of typical classification tasks. However, it might be required to predict the likelihood across several class labels at the same time where the classes are generally assumed to be mutually exclusive. In ML-C, it is necessary to simultaneously output zero or more labels for each input sample [26]. Many studies were conducted to detect the various ODs, but a few utilized ML-C of the ODs as one patient can have more than one type of eye disease. Therefore, optimizing the models that support the concept of ML-C is very important. On the other hand, many studies merely apply the existing universal models in classifying ODs rather than building new or optimizing current architectures. That lessens the classification accuracy because the model can perform accurately in one field while it cannot be applied to the other.

This paper presents a novel CAD system for detecting various ODs using ML-C. We utilize a multi-label (ML) benchmark dataset, the retinal fundus multi-disease image dataset (RFMiD) [27], that contains 45 classes of fundus retinal images. Most of the images in the utilized dataset may include more than one disease simultaneously, as shown in Figure 1. Table 1 clarifies the different ODs shown in Figure 1, along with their labels and definitions. The proposed CAD system starts with preprocessing steps that include the normalization and augmentation by some transformation processes. After that, we train the divided data using our proposed CNN model. Then, we test the unknown images from different diseases. The results of the proposed CAD system are probabilities of the ODs. We utilize different evaluation metrics and compare the proposed system with various state-of-the-art schemes to demonstrate its efficiency and reliability. The main contributions of this work are summarized in the following points:

We propose an ML-CAD framework based on DL for simultaneous diagnosis of interleaved ODs from color fundus images.
The effectiveness of the proposed framework is verified utilizing a recent publicly available ML dataset (RFMiD) that contains a wide variety of challenging ocular diseases.
We compare the performance of the proposed framework, utilizing five different measures, with similar frameworks and built-in models. The experimental results illustrate the practicality and superiority of the proposed framework.
Compared to existing multiple ocular diseases frameworks that can detect at most ten ODs, the proposed framework can detect more than twenty-nine ODs.

The rest of this paper is organized into five sections. Section 2 presents the related works. It discusses the current limitations and highlights the main directions and solutions included in the proposed system to overcome the current shortcomings. Section 3 explains the detailed phases and techniques utilized in the proposed CAD framework. Section 4 describes the different experiments conducted and presents the findings. Section 5 introduces the discussion and provides a comparative analytical study of the proposed CAD system and other state-of-the-art techniques. Finally, Section 6 presents the conclusion of the work and findings and highlights future research directions.

Figure 1. Different retinal color fundus images with different ODs. (a) Normal, (b) DR, (c) RT, (d) MH and MS, (e) MH and DN, (f) MH, MYA, and ODC, (g) DR, LS, and TV, and (h) EDN, ODP, and TSLN.

Table 1. Different ODs with their labels, definitions, and indicators.

ODs	Label	Definition	Indicators
Diabetic Retinopathy [28]	DR	A microvascular complication of diabetes mellitus caused by high blood sugar levels damaging the back of the eye (retina).	Microaneurysms, retinal dot and blot hemorrhage, hard exudates, or cotton wool spots.
Age-related Macular Degeneration [29]	AMD	Known as macular degeneration and is caused by deterioration of the macula.	Multiple drusen in the macular region, geographic atrophy involving the fovea.
Media Haze [30]	MH	The opacity of media.	Cataracts, vitreous opacities, corneal edema, or small pupils.
Retinitis [31]	RS	It is inflammation of the retina.	Numerous microbes cause it. Vitreous inflammation, macular star, intraretinal hemorrhage, phlebitis, arteritis, and hyperemic disc.
Macular Scar [32]	MS	A scar at the central part of the retina.	The macular pucker separates from the retina.
Retinal Traction [33]	RT	The separation of the neurosensory retina from the underlying retinal pigment epithelium (RPE) due to the traction resulting from membranes in the vitreous or over the retinal surface.	Retinal ischemia and atrophy of the photoreceptor layer: neovascularization of iris (NVI), neovascularization of angle (NVA), and neovascular glaucoma (NVG).
Exudation [27]	EDN	Represented as a circle of exudates surrounding the macular area.	The hard exudates are white or yellowish lipid deposits with sharp edges.
Drusens [34]	DN	Yellow deposits under the retina. They are made up of lipids and proteins. Drusens likely do not cause ARMD, but they may be a sign of ARMD.	Tiny pebbles of debris that build up over time.
Myopia [35]	MYA	Objects in the distance appear blurred while close objects are often seen clearly.	Degenerative changes in the choroid, sclera. The eye is too long or the cornea is more curved.
Optic Disc Cupping [36]	ODC	The thinning of the neuroretinal rim such that the optic disc appears excavated. It is usually identified with glaucoma.	Congenital optic disc anomalies, ischaemic, hereditary, and traumatic optic neuropathies, or in situations in which the anterior visual pathway is compromised, such as intracranial aneurysms or tumors.
Laser Scars [27]	LS	Laser therapy treatment to stop the progression of vascular leaks.	Circular or irregularly shaped scars on the retinal surface.
Tortuous Vessels [27]	TV	Related to hypertension and diabetes.	Tortuosity of the retinal blood vessels.
Optic Disc Pallor [37]	ODP	The anatomic sequelae of atrophy of the anterior visual pathway with the loss of retinal ganglion cells.	Pale yellow discoloration of the optic disc and absence of many small vessels.
Tessellation [38]	TSLN	A common characteristic of myopic eyes and a clinical marker for the development of retinochoroidal changes.	It appears due to the thinning of RPE and choriocapillaris. The choroidal vessels are visible due to the reduced density of the pigments.

2. Related Work

This section reviews the existing diagnosis systems for ODs. We discuss the current limitations and highlight the main directions and remedies suggested in the proposed system to overcome the current shortcomings. For example, He et al. [8] presented a CAD system using a dense correlated network (DCNet) to classify color fundus images. They used a public dataset (ODIR 2019) based on seven types of ODs using ML-C. The authors utilized two fully connected layers with rectified linear unit (ReLU) activation in one of them. One dense layer is 512, and the other is 8 as the number of the output categories of the ODs. They employed an ML soft margin loss function. The main advantage of their method is that it can be used in multi-modal image analysis. However, they could not compare their model with other existing works because their method is patient-based, and the other studies are image-based. They shared the same backbone CNN to extract features from the right and left eyes to reduce the computation complexity. Hence, they could not handle the unbalanced distribution of patient cases.

Wang et al. [9] utilized transfer learning to extract features of the color fundus images and then utilized ML-C based on problem transformation. The authors utilized a multi-label dataset with eight labels. They utilized histogram equalization on gray and colored images. Then, they applied two classification models to the two images sets. Finally, they obtained the average of the sigmoid output probabilities from the two models. The main limitation of their work is the low network performance because of the variety of uncommon ODs categorized in the label “other diseases” in the utilized dataset. In addition, their system suffers from the data imbalance problem due to the limited data in some disease categories. Hence, some of the specific features learned are unknown.

Cheng et al. [39] used a graphical convolution network (GCN) to detect eight DR lesions (laser scars, drusen, cup disc ratio, hemorrhages, retinal arteriosclerosis, microaneurysms, hard and soft exudates) from color fundus images. They utilized ResNet-101, then two convolutional (CONV) layers with a

3 \times 3

kernel, stride 2, and adaptive max pooling for feature extraction. Their model’s accuracy (ACC) and receiver operating characteristic (ROC) values showed better detection results for laser scars, drusen, and hemorrhage lesions. In contrast, their system had poor detection ability for microaneurysms, soft exudates, and hard exudates. This is mainly because microaneurysms appeared as small red spots in the retinal capillaries. Thus, the model could not distinguish microaneurysms from the background of the fundus images. On the other hand, soft and hard exudate lesions often accompanied multiple other fundus lesions simultaneously, which made it difficult for the model to extract the features of all fundus lesions.

Dipu et al. [40] utilized transfer learning to detect eight ODs from the ODIR2019 dataset. They compared the performance of some state-of-the-art DL networks, such as Resnet-34, EfficientNet, MobileNetV2, and VGG-16. The authors trained the cutting-edge on the utilized dataset and reported the results. They evaluated the models’ performance by estimating the ACC. They ordered the models due to the resulting ACC into VGG-16, Resnet-34, MobileNetV2, and EfficientNet. However, the authors did not build a new model to detect the ODs. Moreover, calculating only ACC was not enough to estimate the model performance.

Choi et al. [41] proposed a CAD system using random forest transfer learning based on VGG-19. They utilized a small dataset in order to detect ten OD categories. They observed that the ACC increases by decreasing the classes that need to be detected to three. On the contrary, when they increased the categories to ten, the ACC decreased with a difference of about

30 %

. The authors tried to use an ensemble classifier with transfer learning and found a slight increase in ACC of

5.5 %

. Although the authors made augmentation, they could not achieve good performance because of the data imbalance.

Diaz Pinto et al. [23] utilized DL to detect glaucoma from color fundus images. First, they cropped the images around the optic disc. Then, other transformation processes were applied, such as random rotations, zooming by a range between 0 and 0.2, and horizontal and vertical flipping. They utilized VGG16, VGG19, InceptionV3, ResNet50, and Xception. Each architecture was followed by global average pooling. They used the softmax classifier. The authors used stochastic gradient descent (SGD) for updating weights. They set epochs to 100 and 250. The batch size was set to 8, the learning rate (LR) was set to

10^{- 4}

, and the momentum rate was 0.9. The fine-tuning performance was decreased when testing the CNNs on databases that were different from those used for training.

Tan et al. [42] classified AMD by using a CNN. Their proposed model consists of seven CONVs with a

3 \times 3

kernel, four max-pooling (MP) (

2 \times 2

), and three fully connected (FC) layers. The proposed model was fully automatic so that no hand-crafted feature extraction or selection was required. Moreover, no classifier was required as they involved meticulous engineering in designing a feature extraction module that could extract highly distinctive features for classification. The proposed model can be installed in a cloud system. On the other hand, the authors highlighted the issues of their model, as the overall diagnostic performance of their CNN model was poor and would improve with more extensive data. Furthermore, the proposed model suffered from convergence and overfitting problems. In addition, the training of the CNN model was slow and intensive. A summary of the most recent studies (published from 2018 to 2021) is shown in Table 2.

From the previous review of the current studies conducted recently, we can conclude their main limitations in diagnosing multiple ODs based on the ML-C concept as follows:

Increasing the number of classes decreases the model performance, mainly if the number of training samples is not sufficiently large;
Some systems are conservative and cannot be applied in the real world because of the imbalanced and/or insufficient datasets;
Some models suffer from overfitting;
The overall performance of the ML-C based models is minor compared to ML- or binary classification-based models because of overlapping labels.

To overcome the previous limitations, we propose a novel ML CAD system to accurately diagnose the different ODs from various ML color fundus images. We apply some augmentation processes to enlarge the training dataset and avoid overfitting and data imbalance issues. On the other hand, data augmentation improves the model performance. We utilize basic augmentation for preserving the labels after transformation. Moreover, we suggest a novel CNN model to extract feature maps (FM) of the augmented images. We customized the different hyper-parameters of the CONV layers, dense layers, dropout (DO), stride, kernel, filters, MP, optimizer, regularization, LR, loss function, number of epochs, batch size, and classifier to provide precise results. The proposed model can report the probability of each disease of the 45 ODs in each image. Finally, we evaluated the system performance using six different metrics and compared it with many current systems and models.

3. The Proposed ML CAD System

This section gives a detailed explanation of the proposed multiple OD ML-based CAD system. The proposed system consists of three phases. It starts by supplying the preprocessing phase with the ML dataset [27]. This phase aims to scale the images to a standard size and apply some transformation processes, such as vertical and horizontal flipping, rotation, and brightness, contrast, hue, and saturation adjustments. The normalized preprocessed images are then fed to the proposed CNN model in the feature extraction and ML-C phase. Finally, the prediction is obtained by testing the proposed CNN model. Figure 2 shows the proposed ML CAD system. We present below the three phases of the proposed system in detail.

3.1. Preprocessing

The preprocessing stage consists of two main phases, which are image resizing and data augmentation. These phases are discussed in the following subsections.

Image Resizing. All images are resized to a standard size of

128 \times 128

pixels. Although using the actual size could be helpful in learning, resizing the input images is essential to save memory and space and reduce the training time.

Data augmentation. Each image in the utilized dataset could contain multiple diseases out of forty-five different ODs. However, this dataset is imbalanced because the number of normal samples could greatly outnumber the samples of some abnormal classes (diseases). For instance, while the number of normal images (i.e., negative samples) in which no ODs appear in the training dataset is 401, the number of images in which some diseases appear, such as cotton-wool spots (CWS) and choroidal folds (CF), are less than 10. Thus, data augmentation methods should be utilized to increase the number of images in which rare diseases appear and reduce the positive-negative class imbalance.

Different transformation methods [43] can be applied to the image dataset to enlarge the dataset, solve class imbalances, and avoid overfitting. In this work, we utilized simple, safe, and manageable augmentation methods to preserve labels and encourage the model to learn more general features. The safety of a data augmentation method refers to its likelihood of preserving the label post-transformation. Precisely, we used different augmentation methods, including horizontal and vertical flipping, rotation, brightness change, saturation change, and hue change, to images of rare diseases to ensure that each label in the dataset occurs at least 100 times. Table 3 lists the utilized augmentation methods and the corresponding parameters’ values, and Figure 3 shows an example of an augmented image to 10 images. The number of training samples increased from 1920 to 4784 after data augmentation.

3.2. Feature Extraction and ML-C

The CNN general architecture is made up of CONV, pooling (PO), and FC layers [44,45]. Each component may at least consist of one layer. Different mapping functions and regulatory units, such as batch normalization (BN) and DO, are also included in the architecture to optimize the performance and avoid overfitting. The CONV operation picks up distinct features from the input color fundus image. It is performed with kernel filters to generate feature maps (FMs). Some down-samplings are included in the architecture, such as stride and PO. A stride is the number of units the filter slides upon for CONV and MP. On the other hand, each successive FM would get smaller after the CONV process without zero paddings. At last, the CONV layer output is passed to a non-linear activation function (AF), such as sigmoid and ReLU.

Pooling [46] regulates the CNN complexity and ensures the fixed size of the output. It decreases the learnable parameters and reduces FM size. Moreover, it increases the generalization and reduces overfitting. Pooling in CNNs can be MP or global pooling (GP), among other alternatives. The MP operation takes the highest value from each kernel. Finally, in FC (dense layer) [47], the output of CONV and PO layers is flattened and transformed into a 1D array. The weight connects each input with each output. The output nodes have the same number of classes.

It is necessary to focus on the arrangement of all components in CNN architecture. This arrangement plays a vital role in building new architectures and achieving the needed performance. Therefore, the proposed ML-CNN model consists of three CONV layers with 32 filters and a kernel size of

3 \times 3

, and the AF is ReLU. Then, the MP layer with a

2 \times 2

kernal is performed, followed by DO with 0.25 for regularization. Again, two CONV layers with 64 filters are performed, followed by one MP (

2 \times 2

) and DO with 0.25. After that, one flatten layer is performed, followed by one FC layer with 512 neurons. We add another DO with 0.5. Finally, one FC layer with 45 nodes is performed. The total trainable parameters are 29,589,613.

Figure 4 demonstrates in detail the layers of the proposed ML-CNN model, and Table 4 shows the proposed ML-CNN architecture. The configurations of the hyper-parameters utilized in the proposed ML-CNN model are presented in Table 5. The optimizer is the SGD with LR equal to 0.01, decay equal to

10^{- 6}

, and momentum rate equal to 0.9. The batch size is 32, the loss function is categorical cross-entropy, the AF of the final FC layer is sigmoid, and the number of epochs is 50.

We utilize the sigmoid function at the end of the ML-CNN model classifier to transform the raw output values into probabilities, which is the final understandable format. Moreover, the sigmoid function is suitable in the ML-C problem [48] because since there is more than one “right answer”, the outputs are not mutually exclusive. The probabilities produced by the sigmoid function are independent and are not constrained to sum to one. In addition, the sigmoid function allows us to have a high probability for all OD classes, some of them or none. For example, when classifying ODs in the color fundus image, the image might contain DR and/or RS, or none of those defects.

3.3. The Prediction

The last layer of the proposed model is a fully connected layer that consists of forty-five output neurons. The output of this layer provides the probabilities of the 45 ODs in the utilized RFMiD dataset. Based on the output probabilities, the model can predict whether each OD appears in the image presented to the model. For each label (disease), if the corresponding probability is larger than a preset cut-off value (0.50), our model confirms the presence of this disease in the presented image. Figure 5 shows an example of an image presented to the proposed model along with the top 10 ODs with the highest probabilities. It can be observed that only the top three ODs are confirmed by the model based on the preset cut-off value.

The proposed model is validated using the 10-fold cross-validation technique to reduce overfitting. The training set is split into k (=10) smaller subsets. The proposed ML-CNN model is trained using

k - 1

of the folds as the training set of data, and the resulting model is validated on the remaining part of the data. The output of the model and validation using five different metrics for evaluating the performance of the proposed model are discussed in detail in the next section.

4. Experimental Results

This section gives a detailed description of the dataset and explains the utilized performance metrics. Finally, we present the results obtained from applying the proposed ML-CNN model.

RFMiD is an ML dataset where each image contains multiple ODs. It consists of 3200 images, including normal and abnormal cases (669 images are normal, and the rest are abnormal). The abnormal images include 45 diseases or classes in which DR appears in 632 images, MH appears in 523 images, and ODC appears in 445 images. The remaining diseases appear in smaller numbers of images. The full list of diseases and the number of images in which each disease appears is shown in Table 6. In addition, the complete name of each disease can be found in [27]. All images were captured with three fundus cameras, namely, TOPCON 3D OCT-2000, TOPCON TRC-NW300 with a distance of 40.7 mm and a

45^{\circ}

field of view (FOV), and Kowa VX-10 with a distance of 39 mm and a

50^{\circ}

FOV. The images are in various resolutions, such as

2144 \times 1424

,

4288 \times 2848

and

2048 \times 1536

.

The utilized dataset contains a CSV file that includes the image ID, disease risk (presence of disease/abnormality), and 45 classes of ODs, which are found in the color fundus images. An image is assigned the value ‘0’, in the CSV sheet, if it does not have a specific disease (out of the 45 diseases) and labeled by ‘1’ otherwise. For instance, if the image includes DR, RS, and LS, but does not include the remaining ODs, only columns representing DR, RS, and LS, for this image, will be assigned the value ‘1’ in the CSV sheet.

4.1. The Performance Measures

We utilized five different metrics to evaluate the performance of the proposed ML-CAD system, namely, SEN/recall, the Dice similarity coefficient (DSC), accuracy (ACC), area under the ROC curve (AUC), and the positive predictive value (PPV), which are defined in Equations (1)–(5) [49].

S E N / r e c a l l = \frac{T P}{T P + F N},

(1)

D S C = \frac{2 \times T P}{2 \times T P + F P + F N},

(2)

A C C = \frac{T P + T N}{T P + T N + F P + F N},

(3)

P P V = \frac{T P}{T P + F P},

(4)

A U C (f) = \frac{\sum_{t_{0} \in D^{0}} \sum_{t_{1} \in D^{1}} 1 [f (t_{0}) < f (t_{1})]}{| D^{0} | \cdot | D^{1} |},

(5)

where

T P, F P, T N, F N

denote true positive, false positive, true negative, and false negative, respectively. f denotes the predictor (model),

D^{0}

is the set of negative examples,

D^{1}

is the set of positive examples, and

1 [\cdot]

denotes an indicator function that returns

1

if its argument is true and returns

0

otherwise. Each argument is determined for each class against the rest of the classes. This means that

T P

,

F P

,

T N

, and

F N

are evaluated for each category of classes separately.

4.2. The Results

The proposed framework was implemented using python 3.7 on the free cloud service “Google Colab”. We utilized the popular TensorFlow 2.4 machine learning library. We also utilized the open-source Python library OpenCV and “roboflow.ai” for the preprocessing steps and the DL Python open-source library “TFLearn” for classification. We ran all experiments using a local machine with a core i5/2.4 GHz, 8 GB RAM, and an NVIDIA VGA card with 1 GB VRAM.

4.2.1. Dataset Splitting (90% Training:10% Testing)

After we built the proposed ML-CNN model, we had to improve its performance by customizing the various hyperparameters such as the optimizer and its LR,

L_{1}

, and

L_{2}

regularization, number of epochs, and batch size (BS). We evaluated the model’s performance using the five performance measures described in the previous subsection for each tested combination to find the best combination of hyper-parameters. We split the dataset using the built-in library “sklearn” into

90 %

for the training set and

10 %

for the testing set. We assigned a random state (seeds) to be 114 and the shuffle to true. Table 7 shows the hyper-parameter optimization experiments of the proposed ML-CNN model.

The hyper-parameters have been customized in the network based on the experiments. We have considered the relationship between the loss function and learning rate, optimizer type, calculating stride, dropout values, and the number of epochs. The learning rate controls how quickly the model is adapted to the problem. Smaller learning rates require more epochs given the smaller changes made to the weights each update, whereas larger learning rates result in rapid changes and require fewer training epochs. From Table 7, we can observe that the hyper-parameters are 20 epochs, 4 BS, the Adam optimizer with (

L R = 10^{- 5}

), and 0.001 for

L_{1}

regularization. This achieved

100 %

for ACC,

100 %

for DSC,

94 %

for AUC,

67 %

for precision and recall curve.

Figure 6 and Figure 7 show the training and validation ACC and loss. Figure 8 shows the ROC curve for all the 45 classes of the utilized dataset. The figure shows that the micro average ROC curve area for all 45 classes or ODs is

94 %

. Figure 9 shows the precision and recall (PR) curve for all 45 classes of the utilized dataset. The figure shows that the micro average PR curve area for all 45 classes or ODs is

67 %

.

The proposed model outputs the images with a title that informs the physician of the predicted ODs by the system and the actual labels or ground truth (GT) of that image. Figure 10 shows an example of output predictions and the actual GTs. The image in (a) has TSLN disease, but it is predicted that it has DR, TSLN, and MH ODs. On the other hand, the image in (b) has the DR and LS diseases simultaneously, but it is predicted to have TSLN, DR, and MH ODs.

Now, we demonstrate the comparison between the proposed ML-CNN model and the other state-of-the-art models, such as MobileNetV2, DenseNet201, SeResNext50, Xception, InceptionV3, and InceptionresNetv2, through the five performance metrics defined above. Table 8 shows the resulting averages of ACC, SEN, PREC, DSC, Loss, and AUC for MobileNetV2, DenseNet201, SeResNext50, Xception, InceptionV3, InceptionresNetv2, and the proposed ML-CNN by using dataset split and 10-fold cross-validation techniques. We can observe that the proposed system with a 10-fold cross-validation technique is the best in SEN, PREC, and AUC.

Figure 11 shows the training and validation of ACC and loss of utilizing the InceptionV3 model on the same utilized dataset in our ML-CNN model. Figure 12 shows the training and validation of ACC and loss of utilizing the mobileNetV2 model on the same utilized dataset in our ML-CNN model. Figure 13 shows the training and validation of ACC and loss of utilizing the DenseNet201 model on the same utilized dataset in our ML-CNN model. Figure 14 shows the training and validation of ACC and loss of utilizing the seResNext model on the same utilized dataset in our ML-CNN model. Figure 15 shows the training and validation of ACC and loss of utilizing the Xception model on the same utilized dataset in our ML-CNN model. Figure 16 shows the training and validation of ACC and loss of utilizing the InceptionResNetV2 model on the same utilized dataset in the ML-CNN model.

4.2.2. K-Fold Cross-Validation

We also evaluated the proposed CAD system using 2-fold cross-validation (CV), 5-fold CV, and 10-fold CV. Table 9 shows the resulting averages of ACC, SEN, PREC, DSC, Loss, and AUC after applying 2-, 5-, and 10-fold CV. We can observe that the 10-fold CV achieves better results than the others, especially in SEN, PREC, and AUC. The values are

80 %

,

91.5 %

, and

96.7 %

, respectively. Table 10 presents in detail the AUC values for each class or ODs in the 2-, 5-, and 10-fold CV. We notice that all the k-folds CVs could not predict eight classes: TV, CWS, ODPM, HR, TD, VH, VS, and PLQ. On the contrary, DR, AMD, MH, DN, MYA, BRVO, TSLN, CSR, ODC, CRVO, LS, AH, ODP, ODE, and AION can be detected in all folds. The others may be detected in one or two folds.

5. Discussion

This section provides detailed analytical comparisons between the proposed ML-CNN system and the other studies reported in the literature for detecting various ODs. In the system proposed by Choi et al. [41], it was observed that increasing the classes to be predicted in MLC decreases the ACC. They achieved

30.5 %

for ACC of classification of ten classes. However, when they predicted only three classes, the ACC increased to

72.8 %

. Wang et al. [9] achieved

90 %

for ACC to make MLC of only eight classes from the ODIR2019 dataset by using the EfficientNet model, while Dipu et al. [40] achieved

93.82 %

for ACC in 2021. We achieved

100 %

,

100 %

,

94 %

,

81 %

, and

67 %

for ACC, DSC, AUC, PREC, and SEN, respectively. Apart from PREC and SEN, our system outperforms the system proposed by Wang et al. [9]. In the proposed ML-CNN, the AUC is higher than what is achieved by He et al. [8] by

1 %

. On the other hand, the proposed ML-CNN system achieved

94.3 %

,

80 %

,

91.5 %

,

99 %

, and

96.7 %

for ACC, SEN, PREC, DSC, and AUC, respectively, by using 10-fold CV.

Although ML-C suffers from the overlapped classes and is not mutually exclusive, it enables flexible (soft) classification. Each image may include more than one or two ODs simultaneously. If ML-C cannot predict all ODs, the patient is at risk. On the contrary, binary (hard) classification predicts only the presence/absence of the disease. ML-C gives the probability of occurring the ODs in each case. Our results are good enough compared to the other works that employ the ML-C concept to detect various ODs from color fundus images, such as Dipu et al. [40], He et al. [8], Wang et al. [9], Cheng et al. [39], and Choi et al. [41], as shown in Figure 17.

The k-fold CV reduces the overfitting but does not completely eliminate overfitting. Therefore, we will split data manually in the future. The limitations of our system are that it falls into overfitting in some epochs, and SEN/recall could be relatively low. In the case of increasing the epochs, the recall increases, and the ACC decreases.

6. Conclusions

ODs threaten human health. They can result from infections in other body parts or be indicators of any other disease. We developed a new ML-CNN system that depends on ML-C to diagnose various ODs from color fundus images. We utilized an ML dataset that includes 45 different diseases. After applying the augmentation processes required to enlarge the dataset, we split the dataset to

90 %

for training and

10 %

for testing. We validated the system using five performance measures: ACC, SEN, PREC, DSC, and AUC. We utilized a k-fold CV with different values of k to verify the obtained result. Specifically, we applied the 2-fold, 5-fold, and 10-fold CV. Furthermore, we compared the proposed ML-CNN system and other previously proposed systems to diagnose multiple ODs. Additionally, we compared our system with other built-in models by applying them to the same dataset. The proposed system gives promising performance for future research. In the future, we intend to establish a large and balanced ML dataset and split it manually to give us the flexibility to eliminate overfitting.

Author Contributions

Conceptualization, O.O., E.A. and M.E.; methodology, O.O., E.A. and M.E.; software, E.A.; validation, O.O. and M.E.; formal analysis, O.O. and M.E.; investigation, A.A.A.E.-A.; resources, E.A. and M.E.; data curation, E.A. and M.E.; writing—original draft preparation, E.A., M.E. and O.O.; writing—review and editing, O.O. and A.A.A.E.-A.; project administration, O.O. and M.E.; funding acquisition, O.O. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deanship of Scientific Research at Jouf University for funding this work through research grant no. (DSR2020-04-3654).

Data Availability Statement

This research study was conducted retrospectively using human subject data made available in open access by (accessed on 30 June 2021): https://idrid.grand-challenge.org/, https://www5.cs.fau.de/research/data/fundus-images/, https://ieee-dataport.org/open-access/retinal-fundus-multi-disease-image-dataset-rfmid.

Conflicts of Interest

The authors declare no conflict of interest.

References

Flaxman, S.R.; Bourne, R.R.; Resnikoff, S.; Ackland, P.; Braithwaite, T.; Cicinelli, M.V.; Das, A.; Jonas, J.B.; Keeffe, J.; Kempen, J.H.; et al. Global causes of blindness and distance vision impairment 1990–2020: A systematic review and meta-analysis. Lancet Glob. Health 2017, 5, e1221–e1234. [Google Scholar] [CrossRef] [Green Version]
WHO. World Report on Vision. 2019. Available online: https://www.who.int/publications/i/item/9789241516570 (accessed on 2 October 2021).
Wong, W.L.; Su, X.; Li, X.; Cheung, C.M.G.; Klein, R.; Cheng, C.Y.; Wong, T.Y. Global prevalence of age-related macular degeneration and disease burden projection for 2020 and 2040: A systematic review and meta-analysis. Lancet Glob. Health 2014, 2, e106–e116. [Google Scholar] [CrossRef] [Green Version]
Cheyne, C.P.; Burgess, P.I.; Broadbent, D.M.; García-Fiñana, M.; Stratton, I.M.; Criddle, T.; Wang, A.; Alshukri, A.; Rahni, M.M.; Vazquez-Arango, P.; et al. Incidence of sight threatening diabetic retinopathy in an established urban screening programme: An 11-year cohort study. Diabet. Med. 2021, 38, e14583. [Google Scholar] [CrossRef] [PubMed]
Schultz, N.M.; Bhardwaj, S.; Barclay, C.; Gaspar, L.; Schwartz, J. Global burden of dry age-related macular degeneration: A targeted literature review. Clin. Ther. 2021, 43, 1792–1818. [Google Scholar] [CrossRef]
Wild, S.; Roglic, G.; Green, A.; Sicree, R.; King, H. Global prevalence of diabetes: Estimates for the year 2000 and projections for 2030. Diabetes Care 2004, 27, 1047–1053. [Google Scholar] [CrossRef] [Green Version]
Lim, G.; Bellemo, V.; Xie, Y.; Lee, X.Q.; Yip, M.Y.; Ting, D.S. Different fundus imaging modalities and technical factors in AI screening for diabetic retinopathy: A review. Eye Vis. 2020, 7, 1–13. [Google Scholar] [CrossRef] [Green Version]
He, J.; Li, C.; Ye, J.; Qiao, Y.; Gu, L. Multi-label ocular disease classification with a dense correlation deep neural network. Biomed. Signal Process. Control 2021, 63, 102167. [Google Scholar] [CrossRef]
Wang, J.; Yang, L.; Huo, Z.; He, W.; Luo, J. Multi-Label Classification of Fundus Images With EfficientNet. IEEE Access 2020, 8, 212499–212508. [Google Scholar] [CrossRef]
Azzopardi, G.; Strisciuglio, N.; Vento, M.; Petkov, N. Trainable COSFIRE filters for vessel delineation with application to retinal images. Med. Image Anal. 2015, 19, 46–57. [Google Scholar] [CrossRef] [Green Version]
Zhang, Z.; Srivastava, R.; Liu, H.; Chen, X.; Duan, L.; Wong, D.W.K.; Kwoh, C.K.; Wong, T.Y.; Liu, J. A survey on computer aided diagnosis for ocular diseases. BMC Med. Inform. Decis. Mak. 2014, 14, 1–29. [Google Scholar] [CrossRef]
Sarhan, M.H.; Nasseri, M.A.; Zapp, D.; Maier, M.; Lohmann, C.P.; Navab, N.; Eslami, A. Machine learning techniques for ophthalmic data processing: A review. IEEE J. Biomed. Health Inform. 2020, 24, 3338–3350. [Google Scholar] [CrossRef]
Burlina, P.M.; Joshi, N.; Pekala, M.; Pacheco, K.D.; Freund, D.E.; Bressler, N.M. Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmol. 2017, 135, 1170–1176. [Google Scholar] [CrossRef]
Chen, X.; Xu, Y.; Duan, L.; Yan, S.; Zhang, Z.; Wong, D.W.K.; Liu, J. Multiple ocular diseases classification with graph regularized probabilistic multi-label learning. In Proceedings of the Asian Conference on Computer Vision, Singapore, 1–5 November 2014; Springer: Cham, Switzerland, 2014; pp. 127–142. [Google Scholar]
Li, Z.; Keel, S.; Liu, C.; He, Y.; Meng, W.; Scheetz, J.; Lee, P.Y.; Shaw, J.; Ting, D.; Wong, T.Y.; et al. An automated grading system for detection of vision-threatening referable diabetic retinopathy on the basis of color fundus photographs. Diabetes Care 2018, 41, 2509–2516. [Google Scholar] [CrossRef] [Green Version]
Richards, B.A.; Lillicrap, T.P.; Beaudoin, P.; Bengio, Y.; Bogacz, R.; Christensen, A.; Clopath, C.; Costa, R.P.; de Berker, A.; Ganguli, S.; et al. A deep learning framework for neuroscience. Nat. Neurosci. 2019, 22, 1761–1770. [Google Scholar] [CrossRef]
Wang, Z.; Keane, P.A.; Chiang, M.; Cheung, C.Y.; Wong, T.Y.; Ting, D.S.W. Artificial intelligence and deep learning in ophthalmology. Artif. Intell. Med. 2020, 103, 167–175. [Google Scholar]
Gulshan, V.; Peng, L.; Coram, M.; Stumpe, M.C.; Wu, D.; Narayanaswamy, A.; Venugopalan, S.; Widner, K.; Madams, T.; Cuadros, J.; et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016, 316, 2402–2410. [Google Scholar] [CrossRef]
Litjens, G.; Ciompi, F.; Wolterink, J.M.; de Vos, B.D.; Leiner, T.; Teuwen, J.; Išgum, I. State-of-the-art deep learning in cardiovascular image analysis. JACC Cardiovasc. Imaging 2019, 12, 1549–1565. [Google Scholar] [CrossRef]
Esteva, A.; Kuprel, B.; Novoa, R.A.; Ko, J.; Swetter, S.M.; Blau, H.M.; Thrun, S. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017, 542, 115–118. [Google Scholar] [CrossRef]
Rifaioglu, A.S.; Doğan, T.; Martin, M.J.; Cetin-Atalay, R.; Atalay, V. DEEPred: Automated protein function prediction with multi-task feed-forward deep neural networks. Sci. Rep. 2019, 9, 7344. [Google Scholar] [CrossRef] [Green Version]
Sultan, A.S.; Elgharib, M.A.; Tavares, T.; Jessri, M.; Basile, J.R. The use of artificial intelligence, machine learning and deep learning in oncologic histopathology. J. Oral Pathol. Med. 2020, 49, 849–856. [Google Scholar] [CrossRef]
Diaz-Pinto, A.; Morales, S.; Naranjo, V.; Köhler, T.; Mossi, J.M.; Navea, A. CNNs for automatic glaucoma assessment using fundus images: An extensive validation. Biomed. Eng. Online 2019, 18, 1–19. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
Tsoumakas, G.; Katakis, I. Multi-label classification: An overview. Int. J. Data Warehous. Min. 2007, 3, 1–13. [Google Scholar] [CrossRef] [Green Version]
Wehrmann, J.; Cerri, R.; Barros, R. Hierarchical multi-label classification networks. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 5075–5084. [Google Scholar]
Pachade, S.; Porwal, P.; Thulkar, D.; Kokare, M.; Deshmukh, G.; Sahasrabuddhe, V.; Giancardo, L.; Quellec, G.; Mériaudeau, F. Retinal Fundus Multi-Disease Image Dataset (RFMiD): A Dataset for Multi-Disease Detection Research. Data 2021, 6, 14. [Google Scholar] [CrossRef]
Diabetic Retinopath. Available online: https://www.mayoclinic.org/diseases-conditions/diabetic-retinopathy/symptoms-causes/syc-20371611 (accessed on 2 December 2021).
Health Line. Available online: https://www.healthline.com/health/macular-degeneration (accessed on 2 December 2021).
Kawali, A.; Mahendradas, P.; Sanjay, S.; KM, P.; Yadav, J.; Panchagnula, R.; KS, S. Diagnostic and Therapeutic Challenges in Ocular Histoplasmosis—A Case Report. Ocul. Immunol. Inflamm. 2022, 30, 149–152. [Google Scholar] [CrossRef]
Hartong, D.T.; Berson, E.L.; Dryja, T.P. Retinitis pigmentosa. Lancet 2006, 368, 1795–1809. [Google Scholar] [CrossRef]
Associated Retina Consultants. Available online: https://associatedretinaconsultants.com/can-macular-pucker-heal/ (accessed on 4 December 2021).
Mishra, C.; Tripathy, K. Retinal traction detachment. In Statpearls [Internet]; StatPearls Publishing: Treasure Island, FL, USA, 2021. [Google Scholar]
American Academy of ophthalmology. Available online: https://www.aao.org/eye-health/diseases/what-are-drusen (accessed on 4 December 2021).
Optegra. Available online: https://www.optegra.com/conditions/myopia/ (accessed on 5 December 2021).
Rebolleda, G.; Noval, S.; Contreras, I.; Arnalich-Montiel, F.; Garcia-Perez, J.; Munoz-Negrete, F. Optic disc cupping after optic neuritis evaluated with optic coherence tomography. Eye 2009, 23, 890–894. [Google Scholar] [CrossRef]
Quigley, H.A.; Dunkelberger, G.R.; Green, W.R. Retinal ganglion cell atrophy correlated with automated perimetry in human eyes with glaucoma. Am. J. Ophthalmol. 1989, 107, 453–464. [Google Scholar] [CrossRef]
Yoshihara, N.; Yamashita, T.; Ohno-Matsui, K.; Sakamoto, T. Objective analyses of tessellated fundi and significant correlation between degree of tessellation and choroidal thickness in healthy eyes. PLoS ONE 2014, 9, e103586. [Google Scholar] [CrossRef]
Cheng, Y.; Ma, M.; Li, X.; Zhou, Y. Multi-label classification of fundus images based on graph convolutional network. BMC Med. Inform. Decis. Mak. 2021, 21, 1–9. [Google Scholar] [CrossRef]
Dipu, N.M.; Shohan, S.A.; Salam, K. Ocular Disease Detection Using Advanced Neural Network Based Classification Algorithms. Asian J. Converg. Technol. 2021, 7, 91–99. [Google Scholar] [CrossRef]
Choi, J.Y.; Yoo, T.K.; Seo, J.G.; Kwak, J.; Um, T.T.; Rim, T.H. Multi-categorical deep learning neural network to classify retinal images: A pilot study employing small database. PLoS ONE 2017, 12, e0187336. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Tan, J.H.; Bhandary, S.V.; Sivaprasad, S.; Hagiwara, Y.; Bagchi, A.; Raghavendra, U.; Rao, A.K.; Raju, B.; Shetty, N.S.; Gertych, A.; et al. Age-related macular degeneration detection using deep convolutional neural network. Future Gener. Comput. Syst. 2018, 87, 127–135. [Google Scholar] [CrossRef]
Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
Chandore, V.; Asati, S. Automatic detection of diabetic retinopathy using deep convolutional neural network. Int. J. Adv. Res. Ideas Innov. Technol. 2017, 3, 633–641. [Google Scholar]
Srinivas, S.; Sarvadevabhatla, R.K.; Mopuri, K.R.; Prabhu, N.; Kruthiventi, S.S.; Babu, R.V. Chapter 2—An Introduction to Deep Convolutional Neural Nets for Computer Vision. In Deep Learning for Medical Image Analysis; Zhou, S.K., Greenspan, H., Shen, D., Eds.; Academic Press: Cambridge, MA, USA, 2017; pp. 25–52. [Google Scholar] [CrossRef]
Yamashita, R.; Nishio, M.; Do, R.K.G.; Togashi, K. Convolutional neural networks: An overview and application in radiology. Insights Imaging 2018, 9, 611–629. [Google Scholar] [CrossRef] [Green Version]
Ahmadi, M.; Vakili, S.; Langlois, J.P.; Gross, W. Power reduction in cnn pooling layers with a preliminary partial computation strategy. In Proceedings of the 2018 16th IEEE International New Circuits and Systems Conference (NEWCAS), Montreal, QC, Canada, 24–27 June 2018; pp. 125–129. [Google Scholar]
Rachel, D. Glass Box Machine Learning and Medicine. Available online: https://glassboxmedicine.com/2019/05/26/classification-sigmoid-vs-softmax/ (accessed on 21 October 2021).
Thada, V.; Vivek, J. Comparison of Jaccard, Dice, Cosine Similarity Coefficient To Find Best Fitness Value for Web Retrieved Documents Using Genetic Algorithm. Front. Comput. Sci. 2013, 2, 202–205. [Google Scholar]

Figure 2. The proposed ML CAD system for detecting and diagnosing various ODs from ML color fundus images.

Figure 3. An example of the augmented images.

Figure 4. The layers of the proposed ML-CNN model.

Figure 5. An example of an output image with the top 10 OD probabilities.

Figure 6. The training and validation accuracy through 20 epochs of the proposed ML-CNN mode.

Figure 7. The training and validation loss through 20 epochs of the proposed ML-CNN model.

Figure 8. The ROC curves of the proposed ML-CNN model.

Figure 9. The precision and recall curve of the proposed ML-CNN model.

Figure 10. The output images with the predicted ODs and the actual GTs (a) an image that has TSLN and predicted to have DR, TSLN, and MH ODs, and (b) an image that has DR and LS diseases and predicted to have TSLN, DR, and MH ODs.

Figure 11. The training and validation ACC and loss through 50 epochs of the InceptionV3 model.

Figure 12. The training and validation ACC and loss through 50 epochs of the mobileNetV2 model.

Figure 13. The training and validation ACC and loss through 50 epochs of the DenseNet201 model.

Figure 14. The training and validation ACC and loss through 50 epochs of the seResNext model.

Figure 15. The training and validation ACC and loss through 50 epochs of the Xception model.

Figure 16. The training and validation ACC and loss through 50 epochs of the InceptionResNetV2 model.

Figure 17. The comparison between some current ML-C studies and the proposed ML-CNN.

Table 2. A summary of some current studies from 2018 to 2021. AUC: area under curve, ACC: accuracy, SPE: specificity, SEN: sensitivity, Kappa score,

F_{β}

: F-Beta, and DSC: Dice similarity coefficient.

Table 2. A summary of some current studies from 2018 to 2021. AUC: area under curve, ACC: accuracy, SPE: specificity, SEN: sensitivity, Kappa score,

F_{β}

: F-Beta, and DSC: Dice similarity coefficient.

Study	Year	Analysis Type	Methodology	Dataset	Performance
He et al. [8]	2021	ML-C of eight ODs	ResNet-50, SCM	ODIR2019	Kappa = $0.637$ , DSC $= 91.3 %$ , AUC $= 93.0 %$ .
Wang et al. [9]	2020	ML-C of eight ODs	EfficientNet	ODIR2019	ACC $= 90 %$ , PREC $= 66 %$ , SEN $= 50 %$ , $F_{β} = 89 %$ , AUC = $67 %$ , Kappa $= 0.43$ , DSC $= 85 %$
Cheng et al. [39]	2021	ML-C of DR lesions	ResNet-101, GCN	Private	DSC and the average DSC per-class were $80.8 %$ and $79.2 %$ , respectively. AUC were $98.6 %$ , $95.4 %$ , $94.6 %$ , $95.7 %$ , $95.2 %$ , $88.9 %$ , $93.7 %$ and $92.6 %$ for laser scars, drusen, cup disc ratio, hemorrhages, retinal arteriosclerosis, microaneurysms, hard and soft exudates, respectively.
Dipu et al. [40]	2021	ML-C of eight ODs	ResNet-34, MobileNetV2, EfficientNet	ODIR2019	VGG-16 ACC $= 97.23 %$ ; Resnet-34 ACC $= 90.85 %$ ; MobileNetV2 ACC $= 94.32 %$ , and EfficientNet ACC $= 93.82 %$ .
Choi et al. [41]	2017	ML-C of ten ODs	VGG-19, RF, SVM	STARE	For 10 classes: ACC $= 30.5 %$ , Kappa $= 0.224$ . For 3 classes: ACC $= 72.8 %$ , Kappa $= 0.577$
Diaz Pinto et al. [23]	2019	Binary classification to detect Glaucoma	VGG16, 19, InceptionV3, ResNet50 and Xception	ACRIMA, HRF, Drishti-GS1, RIM-ONE, sjchoi86-HRF	For ACRIMA, AUC $= 76 %$ with a $95 %$ confidence interval (CI) of $68.41$ – $81.81 %$ was obtained. For the others, AUC $= 83, 80, 85$ and $77 %$ , respectively.
Tan et al. [42]	2018	Binary classification to detect ARMD	CNN model	Kaggle	ACC $= 91.17 %$ and $95.45 %$ with the blindfold and ten-fold cross validation strategies, respectively.

Table 3. Data augmentation parameters.

Augmentation Method		Parameters
Color Jitter	Brightness change	[−25% to +25%]
	Saturation change	[−25% to +25%]
	Hue change	[ $- 45^{\circ}$ to $+ 45^{\circ}$ ]
Affine Transformation	Flipping	Vertical, horizontal, and both directions
Affine Transformation	Rotation	$- 90^{\circ}, + 90^{\circ}$

Table 4. The proposed ML-CNN model.

Layers	Type	FMs	Neurons	Kernel	Trainable Params
0	Input	3	$128 \times 128 \times 3$	-	0
1	CONV	32	$128 \times 128 \times 32$	$3 \times 3 \times 3$	896
2	CONV	32	$126 \times 126 \times 32$	$3 \times 3 \times 32$	9248
3	CONV	32	$124 \times 124 \times 32$	$3 \times 3 \times 32$	9248
4	MP	32	$62 \times 62 \times 32$	$2 \times 2$	0
5	DO(0.25)	32	$62 \times 62 \times 32$	-	0
6	CONV	64	$62 \times 62 \times 64$	$3 \times 3 \times 32$	18,496
7	CONV	64	$60 \times 60 \times 64$	$3 \times 3 \times 64$	36,928
8	MP	64	$30 \times 30 \times 64$	$2 \times 2$	0
9	DO(0.25)	64	$30 \times 30 \times 64$	-	0
10	Flatten	-	57,600	-	0
11	FC	-	512	-	29,491,712
12	DO(0.5)	-	512	-	0
13	FC	-	45	-	23,085
Total parameters: 29,589,613
Trainable parameters: 29,589,613
Non-trainable parameters: 0

Table 5. The hyper-parameter values of the proposed ML-CNN model.

Configuration	Value
The optimizer	SGD
Optimizer parameters	LR = 0.01, decay = $10^{- 6}$ , momentum = 0.9
Batch size	32
Epochs	50
Loss	categorical cross entropy
Classifier	Sigmoid

Table 6. Annotation frequency for each class in the RFMiD dataset.

Disease	# Images	Disease	# Images	Disease	#Images
DR	632	RPEC	32	PTCR	6
MH	523	MS	27	CF	6
ODC	445	ERM	26	PRH	5
TSLN	304	AION	26	CRAO	4
DN	230	AH	25	VH	4
ARMD	169	RT	25	VS	4
MYA	167	EDN	24	BRAO	4
BRVO	119	PT	19	MNF	3
ODP	115	MHL	17	CB	2
ODE	96	ST	11	ODPM	2
LS	79	TV	10	PLQ	2
RS	71	RP	10	CL	2
CSR	61	TD	9	HR	1
CRS	54	CWS	8	MCA	1
CRVO	45	CME	7	HPED	1

Table 7. The hyper-parameters of the proposed ML-CNN model to diagnose various ODs.

Parameters	ACC (%)	DSC (%)	Prec (%)	Sen (%)	Loss
Epochs = 50, optimizer = SGD ( $L_{R} = 10 - 4$ )
Batch size (BS) = 4	98	88	82	44	0.1
BS = 8	45	97	15	50	0.6
BS = 12	96	96	81	48	0.18
BS = 16	97	94	87	47.2	0.1
BS= 32	97	96	88	48	0.2
BS = 64	17	99.3	32	55	0.6
Epochs = 20, BS = 4
Optimizer = Adam ( $L_{R} = 10 - 4$ )	95	99	81	58	0.08
Optimizer = Adam ( $L_{R} = 10 - 6$ )	100	100	81	67	0.05
Epochs = 50, optimizer = Adam ( $L_{R} = 10 - 3$ )
BS = 4	61	98	95	94	0.01
BS = 8	64	99	95	94	0.01
BS = 12	82	99	85	73	0.04
BS = 16	67	99.5	95	95	0.007
BS = 32	91	100	83	65	0.03
BS = 64	86	100	86	71	0.1
Epochs = 50, Adam ( $L_{R} = 5 \times 10 - 4$ )
BS = 8	65	100	98	97	0.004
Epochs = 50, Adam ( $L_{R} = 5 \times 10 - 5$ )
BS = 64	94	100	73	54	0.08
Epochs = 50, optimizer = RMSprop ( $L_{R} = 10 - 4$ )
BS = 4	98	100	83	56	0.06
BS = 8	93	100	83	61	0.05
BS = 12	84	100	89	79	0.03
BS = 16	80	100	88	80	0.02
BS = 32	84	100	86	71	0.04
BS = 64	90	98	82	62	0.05
BS = 4, optimizer = Adagrad ( $L_{R} = 0.01$ )	93	100	60	80	0.06
BS = 4, optimizer = Adadelta ( $L_{R} = 0.01$ )	98	95	80	48	0.09
BS = 4, optimizer = Adamax ( $L_{R} = 0.001$ )	84	90	70	56	0.24
BS = 4, optimizer = Nadam ( $L_{R} = 0.01$ )	100	91	82	45	0.1
BS = 4, optimizer = Ftrl ( $L_{R} = 0.001$ )	100	90	82	46	0.6

Table 8. The comparison of the proposed ML-CNN model with some state-of-the-art DL techniques.

Model	ACC	SEN	PREC	DSC	Loss	AUC
MobileNetV2	100%	61%	35%	100%	4.9	0.91
DenseNet201	35%	98%	5%	40%	22.5	0.55
SeResNext50	40%	100%	5%	42%	15	0.57
Xception	40%	59%	79%	40%	4.5	0.57
InceptionV3	100%	100%	5%	100%	0.07	1.00
InceptionresNetv2	38%	98%	5%	100%	4.6	0.56
ML-CNN (split)	100%	55%	82%	100%	0.02	0.94
ML-CNN (10-fold)	94%	80%	92%	99%	0.05	0.97

Table 9. The average validation values of ACC, SEN, PREC, DSC, and AUC by using k-fold CV.

k-Fold	ACC	SEN	PREC	DSC	Loss	AUC
2-fold	96%	49%	72%	96%	0.08	0.94
5-fold	94.6%	58.8%	78.8%	100%	0.06	0.96
10-fold	94.3%	80%	91.5%	99%	0.05	0.97

Table 10. AUC of all 45 ODs by using 2-, 5-, and 10-fold CVs.

	2-Fold CV		5-Fold CV					10-Fold CV
ODs	F1	F2	F1	F2	F3	F4	F5	F1	F2	F3	F4	F5	F6	F7	F8	F9	F10
DR	0.77	0.77	0.76	0.79	0.82	0.86	0.92	0.74	0.81	0.90	0.98	0.73	0.82	0.93	0.96	0.99	0.69
ARMD	0.82	0.82	0.80	0.82	0.87	0.89	0.94	0.83	0.79	0.91	0.97	0.83	0.89	0.96	0.98	0.99	0.70
MH	0.71	0.74	0.66	0.68	0.66	0.75	0.92	0.76	0.74	0.98	0.96	0.48	0.80	1.00	1.00	1.00	0.62
DN	0.79	0.75	0.83	0.79	0.90	0.96	0.95	0.86	0.81	0.98	0.98	0.91	0.91	0.98	0.99	1.00	0.94
MYA	0.83	0.82	0.88	0.80	0.98	0.89	0.99	0.84	0.94	0.97	1.00	0.70	0.90	1.00	1.00	1.00	0.64
BRVO	0.61	0.60	0.71	0.70	0.83	0.84	0.97	0.83	0.60	0.96	0.98	0.82	0.91	0.97	1.00	1.00	0.26
TSLN	0.70	0.74	0.77	0.72	0.76	0.89	0.94	0.75	0.69	0.90	0.96	0.71	0.82	0.99	1.00	1.00	0.79
ERM	0.80	0.50	Nan	0.84	0.94	1.00	1.00	Nan	Nan	1.00	Nan	Nan	Nan	1.00	Nan	1.00	Nan
LS	0.72	0.74	0.48	0.75	0.82	0.91	0.96	0.60	0.75	0.94	1.00	0.72	0.94	1.00	1.00	1.00	0.51
MS	0.85	0.78	0.52	0.84	Nan	Nan	1.00	0.36	Nan	1.00	Nan	0.18	Nan	1.00	Nan	Nan	0.26
CSR	0.86	0.85	0.87	0.83	0.85	0.93	0.98	0.83	0.97	1.00	1.00	0.63	1.00	1.00	1.00	1.00	0.70
ODC	0.57	0.64	0.56	0.56	0.76	0.81	0.88	0.61	0.82	0.85	0.94	0.62	0.76	0.97	0.99	1.00	0.60
CRVO	0.77	0.59	0.70	0.69	0.62	0.85	0.97	0.83	0.72	0.97	Nan	0.50	0.94	1.00	Nan	1.00	0.50
TV	Nan	0.62	Nan	Nan	0.96	0.93	1.00	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan
AH	0.71	0.76	0.90	0.55	0.84	0.90	1.00	0.61	0.94	0.82	1.00	0.70	1.00	1.00	1.00	Nan	0.84
ODP	0.57	0.58	0.61	0.87	0.73	0.83	0.96	0.94	0.59	0.82	0.94	0.62	0.76	1.00	1.00	100	0.65
ODE	0.72	0.72	0.71	0.67	0.87	0.92	0.98	0.68	0.73	0.85	1.00	0.95	0.75	1.00	1.00	1.00	0.52
ST	0.50	0.59	0.62	Nan	0.72	1.00	1.00	0.83	0.86	Nan	Nan	0.50	0.98	Nan	Nan	1.00	0.51
AION	0.57	0.63	0.38	0.70	0.88	0.97	1.00	0.72	0.80	1.00	0.99	0.84	0.99	1.00	1.00	1.00	0.74
PT	0.81	0.94	Nan	0.84	0.87	0.91	1.00	Nan	Nan	1.00	1.00	Nan	Nan	1.00	1.00	1.00	Nan
RT	0.83	0.84	Nan	0.98	0.93	1.00	1.00	Nan	Nan	Nan	1.00	Nan	Nan	Nan	1.00	1.00	Nan
RS	0.89	0.89	0.87	0.93	0.92	0.87	0.97	0.68	0.90	0.82	0.90	0.66	0.98	1.00	1.00	1.00	0.50
CRS	0.81	0.74	0.63	0.86	0.81	0.90	1.00	0.85	0.59	0.97	1.00	0.51	0.68	1.00	1.00	1.00	0.63
EDN	0.55	0.51	0.50	0.79	0.70	0.92	0.99	Nan	0.79	Nan	1.00	Nan	1.00	Nan	1.00	1.00	Nan
RPEC	0.61	0.61	0.55	0.70	0.91	0.93	0.84	0.71	0.50	0.97	1.00	0.17	0.97	1.00	1.00	1.00	0.20
MHL	0.64	0.58	Nan	0.80	0.87	1.00	Nan	Nan	Nan	1.00	1.00	Nan	Nan	1.00	1.00	1.00	Nan
RP	0.87	0.74	0.91	Nan	0.98	0.98	1.00	Nan	0.94	Nan	Nan	Nan	1.00	Nan	Nan	1.00	Nan
CWS	Nan	0.89	Nan	Nan	Nan	1.00	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan
CB	0.73	Nan	0.28	Nan	Nan	Nan	Nan	Nan	0.71	Nan	Nan	Nan	1.00	Nan	Nan	Nan	Nan
ODPM	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan
PRH	0.48	Nan	0.41	Nan	Nan	Nan	Nan	0.25	Nan	Nan	Nan	0.56	Nan	Nan	Nan	Nan	0.30
MNF	0.70	0.88	Nan	Nan	0.99	Nan	1.00	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	1.00	Nan
HR	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan
CRAO	0.36	Nan	Nan	0.94	Nan	Nan	Nan	Nan	Nan	Nan	1.00	Nan	Nan	Nan	1.00	Nan	Nan
TD	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan
CME	0.88	0.87	0.88	Nan	Nan	0.96	1.00	Nan	0.92	Nan	Nan	Nan	1.00	Nan	Nan	Nan	Nan
PTCR	0.45	0.71	0.91	Nan	0.71	1.00	1.00	Nan	0.89	Nan	Nan	Nan	1.00	Nan	Nan	1.00	Nan
CF	1.00	0.97	0.94	Nan	0.98	Nan	Nan	99	Nan	Nan	Nan	0.55	Nan	Nan	Nan	Nan	0.93
VH	Nan	0.29	Nan	Nan	1.00	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan
MCA	0.37	Nan	Nan	0.91	Nan	Nan	Nan	Nan	Nan	Nan	1.00	Nan	Nan	Nan	1.00	Nan	Nan
VS	Nan	0.68	Nan	Nan	0.97	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan
BRAO	0.70	0.40	0.47	Nan	Nan	Nan	1.00	0.55	Nan	Nan	Nan	1.00	Nan	Nan	Nan	Nan	0.90
PLQ	Nan	0.99	Nan	Nan	1.00	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan	Nan
HPED	0.45	Nan	0.49	Nan	Nan	Nan	Nan	Nan	0.32	Nan	Nan	Nan	1.00	Nan	Nan	Nan	Nan
CL	0.42	Nan	0.80	Nan	Nan	Nan	Nan	Nan	0.64	Nan	Nan	Nan	0.99	Nan	Nan	Nan	Nan

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ouda, O.; AbdelMaksoud, E.; Abd El-Aziz, A.A.; Elmogy, M. Multiple Ocular Disease Diagnosis Using Fundus Images Based on Multi-Label Deep Learning Classification. Electronics 2022, 11, 1966. https://doi.org/10.3390/electronics11131966

AMA Style

Ouda O, AbdelMaksoud E, Abd El-Aziz AA, Elmogy M. Multiple Ocular Disease Diagnosis Using Fundus Images Based on Multi-Label Deep Learning Classification. Electronics. 2022; 11(13):1966. https://doi.org/10.3390/electronics11131966

Chicago/Turabian Style

Ouda, Osama, Eman AbdelMaksoud, A. A. Abd El-Aziz, and Mohammed Elmogy. 2022. "Multiple Ocular Disease Diagnosis Using Fundus Images Based on Multi-Label Deep Learning Classification" Electronics 11, no. 13: 1966. https://doi.org/10.3390/electronics11131966

APA Style

Ouda, O., AbdelMaksoud, E., Abd El-Aziz, A. A., & Elmogy, M. (2022). Multiple Ocular Disease Diagnosis Using Fundus Images Based on Multi-Label Deep Learning Classification. Electronics, 11(13), 1966. https://doi.org/10.3390/electronics11131966

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Multiple Ocular Disease Diagnosis Using Fundus Images Based on Multi-Label Deep Learning Classification

Abstract

1. Introduction

2. Related Work

3. The Proposed ML CAD System

3.1. Preprocessing

3.2. Feature Extraction and ML-C

3.3. The Prediction

4. Experimental Results

4.1. The Performance Measures

4.2. The Results

4.2.1. Dataset Splitting (90% Training:10% Testing)

4.2.2. K-Fold Cross-Validation

5. Discussion

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI