From Pixels to Diagnosis: Early Detection of Diabetic Retinopathy Using Optical Images and Deep Neural Networks

Zaylaa, Amira J.; Kourtian, Sylva

doi:10.3390/app15052684

by Amira J. Zaylaa ¹,* and Sylva Kourtian ²

¹ Program of Biomedical Engineering, Department of Electrical and Computer Engineering, Faculty of Engineering, Beirut Arab University, Debbieh P.O. Box 11-5020, Lebanon

² Centre de Recherche du Centre Hospitalier, L’Université de Montréal, Montréal, QC H2X 0A9, Canada

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(5), 2684; https://doi.org/10.3390/app15052684

Submission received: 9 October 2024 / Revised: 14 February 2025 / Accepted: 26 February 2025 / Published: 3 March 2025

(This article belongs to the Special Issue Diagnosis and Therapy for Retinal Diseases)

Abstract

The detection of diabetic retinopathy (DR) is challenging, as the current diagnostic methods rely heavily on the expertise of specialists and require the mass screening of diabetic patients. The prevalence of avoidable vision impairment due to DR necessitates the exploration of alternative diagnostic techniques. Specifically, it is necessary to develop reliable automatic methods to enable the early diagnosis and detection of DR from optical images. To address the lack of such methods, this research focused on employing various pre-trained deep neural networks (DNNs) and statistical metrics to provide an automatic framework for detecting DR in optical images. The receiver operating characteristic (ROC) was employed to examine the performance of each network. Ethically obtained real datasets were utilized to validate and enhance the robustness of the proposed detection framework. The experimental results showed that, in terms of the overall performance in DR detection, ResNet-50 performed best, followed by GoogleNet, which achieved 99.44% sensitivity; the two networks had the same accuracy (93.56%). ResNet-50 outperformed GoogleNet in terms of the specificity (89.74%) and precision (90.07%) of DR detection. The ROC curves of both ResNet-50 and GoogleNet yielded optimal results, followed by SqueezeNet. MobileNet-v2 showed the weakest performance in terms of the ROC, while all networks showed negligible errors in diagnosis and detection. These results show that the automatic detection and diagnosis framework for DR is a promising tool enabling doctors to diagnose DR early and save time. As future directions, it is necessary to develop a grading algorithm and to explore other strategies to further improve the automatic detection and diagnosis of DR and integrate it into digital slit lamp machines.

Keywords:

diabetic retinopathy; fundus images; optical medical imaging; digital slit lamp machine; neural networks; deep learning; early detection and diagnosis

1. Introduction

Diabetic retinopathy (DR) is the most common diabetic eye disease and is the leading cause of blindness in those with diabetes. DR is caused by changes in the blood vessels of the retina and may worsen over time [1,2]. According to research performed by Varma et al., the number of Americans with DR will nearly double from 7.7 million in 2010 to 14.6 million by 2050 [3]. This statistic is demonstrated in the bar graph in Figure 1. Additionally, as highlighted by Spanakis and Golden, in the United States (US), minority populations are generally more prone to the development of DR compared to the white population [4], as illustrated in Table 1.

Globally, in 2010, DR was responsible for blindness in 0.8 million individuals and for visual impairment in 3.7 million [5]. However, with the rising prevalence of diabetes, it is projected that the number of individuals affected by DR will reach approximately 191 million by the year 2030 [6,7].

Some individuals with DR exhibit swollen blood vessels in the retina, which leak fluid or blood. Other DR patients exhibit new blood vessels with abnormal growth on the surface of the retina. The main indicators of DR through which a diagnosis can be obtained are shown in Figure 2. These can be used to indicate the severity of DR and provide clinicians with information about the necessary course of treatment.

DR can be clinically diagnosed based on the presence of one or more retinal lesions, including micro-aneurysms, hemorrhages, hard exudates, and soft exudates [1,2]. Some studies have examined patients with different types of DR and considered the early diagnosis and detection of this disease [8,9]. Another study focused on patients who received dexamethasone intravitreal implants [10]. For instance, Oliverio et al. used clinical and optical coherence tomography (OCT) biomarkers as prognostic factors when examining dexamethasone intravitreal implants for diabetic macular edema [10]. They relied on a database comprising patients who received dexamethasone implants, extracted several features, and explored the links between the biomarkers and the treated eyes through parametric and statistical significance analyses, i.e., using p-values [10].

Meanwhile, retinal fundus images have been used to examine patients with different types of DR (who have not undergone treatment) and diagnose retinal diseases early on. Some studies have relied on optical imaging, while others have relied on color Doppler imaging [9]. Oliverio et al. used features extracted from optical coherence tomography angiography (OCTA) in patients with diabetes mellitus types 1 and 2 [8]. Their study was a cross-sectional observational investigation, and they examined the statistical significance based on p-values. Meanwhile, Ratanapakorn et al. used image processing algorithms to detect and identify DR in fundus images. Specifically, they developed an algorithm using the MATLAB Image Processing Toolbox to extract the clinically significant features of DR pathologies and determine the severity [11]. The DR detection results were precise in comparison with the diagnosis of an ophthalmologist. The error rate was very small, which confirmed the accuracy of the algorithm in detecting DR. Moreover, the software had high accuracy in determining the severity of DR. However, Ratanapakorn et al. stated that the accuracy of the software could be improved by including additional image processing techniques or other methods based on artificial intelligence (AI) or deep learning (DL), although they did not explore this further [11].

Another study carried out by Gharaibeh et al. provided an effective image processing method for DR detection from retinal fundus images [12]. They presented a system that could convert red, green, and blue (RGB) images into hue, saturation, and intensity (HSI) images for processing and feature extraction. The main goal of feature extraction was to identify micro-aneurysms, exudates, and hemorrhages and then perform classification based on them. Gharaibeh et al. used machine learning (ML) algorithms such as a support vector machine (SVM) [13], a probabilistic neural network (PNN), and a support vector machine optimized with a genetic algorithm (SVMGA) for classification [12]. In order to evaluate the results, they compared the sensitivity, specificity, and accuracy of the three methods used. This showed that SVMGA was equal to or better than SVM and PNN. The first step that they applied was fundus image normalization to improve the quality of the images. The second step involved feature extraction from general fundus images; these were divided into the following three features: optic disks, fovea, and vessel structures. In the third step, algorithms were developed to cover DR pathologies corresponding to the severity classification. The proposed algorithms were separated according to the structure of each pathology in the fundus image. The last step was the determination of the severity of the DR detected [12].

Moreover, the studies carried out by Liu et al. and Mathews et al. examined the discrimination of DR using ML methods in optical coherence tomography angiography (OCTA) images [14] and optical coherence tomography (OCT) images [15], respectively. Liu et al. considered the evaluation of OCTA to discriminate between DR and healthy controls (HC) using 144 images [14]. Four ML models, namely logistic regression (LR) [14,16], logistic regression regularized with an elastic net (LR-EN) penalty, SVM, and the gradient boosting tree (XGBoost), were used to classify wavelet features between groups. OCTA data consisting of the superficial vascular plexus, deep vascular plexus, and retinal vascular network were acquired from 19 DR (38 eyes) patients and 25 HC (44 eyes). A discrete wavelet transform was applied to extract texture features from each image. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve, sensitivity, specificity, and diagnostic accuracy of the classifiers were obtained. The results showed that the LR-EN algorithm had high sensitivity and specificity in identifying DR, indicating that it may be a promising method in facilitating the early diagnosis of DR. However, the used database was insufficient, as it only comprised 144 images.

Other large-scale and multicenter studies have been carried out to assess the applicability of the LR-EN algorithm in DR and related eye diseases, seeking to improve the vision-related outcomes of patients [17,18,19,20]. Although previous studies have employed DL algorithms, some have focused on OCTA, others on fundus images, and some have focused specifically on detecting non-proliferative DR.

Moreover, in 2023, Fu et al. proposed a novel architecture combining ResNet-50 with a channel attention squeeze-and-excitation network (SENet), designed to extract features, and they introduced a disease attention module to supplement disease-specific information about diabetic macular edema (DME) [21]. DME is a specific complication of DR in which fluid accumulates in the macula, the central part of the retina that is responsible for sharp, central vision. However, DR is wide-ranging and is considered a general condition caused by damage to the blood vessels in the retina due to prolonged high blood sugar levels in diabetes. Thus, any study that explores DR will include DME, but the opposite is not true; i.e., studying only DME limits the cases that can be obtained by imaging, and the findings would be specific to DME rather than generalizable across all cases of DR.

In addition, Abdelsalam et al. presented a novel approach to early DR detection based on multifractal geometry analysis and SVM, focusing on OCTA macular images [22,23]. Their model involved using a supervised ML method, such as the SVM algorithm, on 170 eye images from patients with early-stage DR, which were divided into 90 healthy eyes and 80 eyes. These were obtained from the Ophthalmology Center at Mansoura University, Egypt. The training procedure involved using seven extracted features as the training dataset. The extracted features were specific to the two classified stages, indicating that they were a good choice for the classification process. These features were the α at the maximum of the singularity spectrum (α), the shift in the singularity spectrum’s symmetrical axis, the width of the singularity spectrum (W), the lacunarity, the dimension of box counting (DB), the dimension of information (DI), and the dimension of correlation (DC). These generalized features reflect the self-similarity, morphological characteristics, and pixel correlations. Furthermore, the objective was to classify the input data images into one of two classes, namely normal and non-proliferative diabetic retinopathy (NPDR). Classification was performed using an SVM classifier with a radial basis function (RBF) kernel. The authors showed that their proposed technique achieved the best performance in the early detection of DR compared to the others [22,23]. These results could be enhanced by increasing the training dataset [23,24,25,26,27].

Furthermore, Zaylaa et al. adopted AI to diagnose DR automatically from optical coherence tomography angiography (OCTA) images [28]. Their research focused on providing an adequate ML and DL technique to differentiate between normal images and images of patients with DR, using OCTA images obtained from 90 patients [28]. They used data collected prospectively over the course of a year from a comprehensive medical center in Lebanon. The main algorithms were the mixed convolutional neural network (CNN, SVM) algorithm, the feed-forward backpropagation NN, SVM with a linear kernel, and SVM with a polynomial kernel. The results showed that the proposed combination (CNN, SVM) offered the best detection as compared to the regular SVM, the polynomial-based SVM, and the NN [28]. However, the authors discussed the importance of employing more algorithms—specifically, pure DL algorithms—and comparing the results to those extracted from fundus images to enable the early detection and diagnosis of DR [28].

Detecting DR at an early stage is crucial for identifying the disease before its progression to an advanced state, characterized by problems such as vision loss, as early intervention allows for the implementation of suitable treatment methods. In this context, patients with diabetes are required to undergo annual or biannual retina monitoring [2,29]. Additionally, as stated by the National Eye Institute, up to 95% of instances of vision loss can be avoided through the early detection and treatment of DR [29]. The main goal that motivated the present research was the early detection of DR automatically from fundus images. In addition, any method for DR detection must provide adequate results to be considered efficient, i.e., the algorithm must have high accuracy and a low rate of false detection.

Based on the above, our research aimed to develop a framework that could detect DR automatically based on DL and based on the evaluation of a retrospective, controlled clinical database. This could improve the quality of life of patients with DR and improve the provision of healthcare in this context. This could be achieved by training pre-existing NNs, employing DL algorithms, and evaluating the results quantitatively through sensitivity, specificity, accuracy, and precision metrics, as well as calculating the cross-entropy and loss or error in DR diagnosis. Qualitatively, ROC curves could be used to assess the performance of different algorithms. Although our aim was partly to explore existing networks, several classical and combined networks were presented in an earlier research work [28], and this was extended in the current work based on a thorough review of the topic. One of the novelties is that this work explores “new processes” and/or “new levels” of diagnosis for patients with DR, thus enhancing the related healthcare services. Another novelty is the discussion of augmented and autonomous clinical decision making (augmented diagnosis), in addition to exploring the power of DL algorithms in the detection of this particular disease (DR) for preventative purposes (to avoid blindness). Moreover, we shed light on the importance of the types of images used, and we target automatic and optimal outputs and minimal errors. Furthermore, at the AI level, we explore the currently used DL algorithms and the optimization of their parameters, as well as calculating the error in DR detection. Meanwhile, at the machine level, we provide a framework that could be integrated with the software of the digital slit lamp in the lab to automate the diagnostic procedure, thus providing an AI-integrated digital slit lamp that could enhance the healthcare system.

This work is divided into five main sections. After introducing the research context and recent studies related to DR detection and diagnosis, Section 2 describes the experimental materials and methods used, including the collection of the experimental data, a priori information, features, classifiers, and the evaluation method and metrics used. Section 3 presents the results, which are divided into quantitative and qualitative results. Then, Section 4 discusses these results. Finally, Section 5 provides the conclusions and recommends future research directions that can be considered to promote the automatic diagnosis of DR.

2. Novel Diabetic Retinopathy (DR) Detection Framework and Experiment

The presented experiment focusing on detecting DR in optical fundus images involved the use of a fundus camera and a digital slit lamp, as well as software utilized for detection and evaluation. The framework was based on the block diagram shown in Figure 3. This diagram illustrates the steps followed and some of the materials used; the black box indicates the novel framework and the experimental steps. Moreover, the materials used in both the framework and in the experiment are described in detail in the following subsection.

2.1. Experimental Setup

The fundus images were collected through the digital slit lamp machine present at the Aravind Eye Hospital in India. The materials also included MATLAB software R2022a, used for the processing of the data and for DR detection. This was available at the Biomedical Engineering Laboratory at Beirut Arab University. Furthermore, some of the data used, i.e., fundus images, were derived from the publicly available Asia Pacific Tele-Ophthalmology Society (APTOS 2019) dataset [30]. The images were obtained using a specialized fundus camera consisting of a microscope attached to a flash-enabled camera through a digital slit lamp machine [30].

A photograph of the digital slit lamp machine used in the examination of a candidate at the Biomedical Imaging Lab at the Faculty of Engineering, Beirut Arab University, is shown in Figure 4a,b. Note that Figure 4a illustrates the preparation and initialization of the setup, where the doctor sits on the right and the candidate on the left, facing both the machine and the doctor. Meanwhile, the examination process is shown in Figure 4b.

2.2. Methodology

In order to conduct the exploratory study, the above-mentioned publicly available dataset, consisting of fundus images, was used to detect DR using eight pre-trained deep neural networks (DNNs), as shown in the block diagram in Figure 3. Starting with screening and data collection, the optical images were pre-processed; then, early detection algorithms were applied to the processed data. After training the networks, the results of each network were obtained, and a detailed evaluation was performed to compare them and determine the optimal network for DR detection and diagnosis.

2.2.1. Data Description

A large set of optical images was obtained from the Kaggle Blindness Challenge dataset APTOS 2019 (https://www.kaggle.com/competitions/aptos2019-blindness-detection/overview, accessed on 25 February 2025), which contained separate cases for testing and training. The training dataset contained a total of 3662 images, divided into different classes (0—Normal; 1—Mild; 2—Moderate; 3—Severe; and 4—Proliferative) by expert clinicians. The distribution of the images belonging to each class is reported in Table 2.

A substantial number of images was captured at the Aravind Eye Hospital in India [31]. Numerous retinal images were captured using fundus photography under a variety of imaging conditions. Based on prior knowledge, expert clinicians assessed the severity of the DR in each image on a scale of 0 to 4. The fundus images of the eyes served as the default images. Fundus photography is the standard and most commonly used imaging technique for capturing detailed views of the interior surface of the eye, including the retina, optic disc, macula, and blood vessels. Fundus images are widely used for several reasons: (i) their diagnostic importance in detecting and monitoring conditions such as DR, glaucoma, macular degeneration, and hypertensive retinopathy; (ii) their common use in ophthalmology for routine eye scans and telemedicine, as well as in research; and (iii) their high resolution, which is obtained with a non-invasive method. Moreover, fundus images form the baseline for AI and research, as many models and medical databases use fundus images as primary references for the detection of eye diseases.

In the current research, the previously verified images were considered as a priori information, because they were verified, diagnosed, and rated according to standard screening and diagnostic criteria. Specifically, the images were obtained via the digital slit lamp in the lab and had been clinically verified before processing. Moreover, the two groups of patients were verified in the lab by labeling their conditions as normal or abnormal (including severe conditions) to establish the ground truth.

The ground truth is the most accurate and reliable label for a dataset. It is applied for the training, validation, and testing of frameworks. It ensures that the model learns the correct patterns and generates accurate predictions. The sources for the ground truth in DR diagnosis are the annotations of expert ophthalmologists, who manually grade the images according to standardized DR classification systems (e.g., the International Clinical DR (ICDR) scale). Specifically, multiple experts annotate the images, and consensus or majority voting is used to reduce the risk of bias and inter-observer variability. Other ground truth sources could be publicly available labeled datasets, such as APTOS, which provides labeled DR images with ground truth annotations that are verified by experts. In the current work, both aforementioned sources were used for additional verification.

2.2.2. DR Diagnosis Criteria

DR is diagnosed according to certain criteria through the clinical, comprehensive observation of the eye, and it depends strongly on the experience of the medical practitioner. It can also be performed indirectly based on the imaging of the eye and the analysis of the images based on visual evidence; this is also dependent on the experience of the doctor.

In the first method, a comprehensive dilated eye exam is performed, where drops are placed into the eyes to widen (dilate) the pupils and provide the doctor with a better view of the interior of the eyes. However, these drops can cause the vision of the patient to blur, which lasts until they wear off several hours later.

In the second method, i.e., a fundoscopic exam, DR is characterized based on cotton wool spots, flame hemorrhages, dot-blot hemorrhages, and boat hemorrhages. As both methods are dependent on the experience of the doctor, and as the first method is uncomfortable for patients, it is essential to consider a new type of diagnosis—an automatic framework based on optical imaging. In this study, in order to utilize such optical images, pre-processing methods were adopted.

2.2.3. Optical Image Pre-Processing

The reliable detection of DR strongly depends on the quality of the fundus images obtained during screening. As with any real-world dataset, noise is encountered in both the images and the labels. The images may contain artifacts or be out of focus, underexposed, or overexposed. Among the pre-processing techniques applied for image enhancement, some have used signal processing techniques [32], either based on wavelets, such as the simple and quasi-optimal wavelet denoising approach [33], or on the concept of compressed sensing, which reduces noise and preserves the image texture [34]. The employed images may be gathered from multiple clinics, using a variety of cameras and over an extended period of time, introducing further variation.

Examples of two types of fundus images are shown in Figure 5a and Figure 6a. The image in Figure 5a corresponds to an image of a normal eye, and the image shown in Figure 6a corresponds to an image of an eye with DR. Moreover, the corresponding image processing steps are shown in Figure 5b–d and Figure 6b–d, for normal and DR eyes, respectively.

The images shown in Figure 5a and Figure 6a were resized as shown in Figure 5b and Figure 6b to ensure their alignment with the input size of the network model employed. The pre-processing stage was applied to eliminate insignificant information and enhance the speed and efficiency of the system. Then, the images were converted to binary-scale images, as shown in Figure 5c and Figure 6c, to emphasize regions of interest and isolate significant areas via thresholding. We focused our equalization efforts on these areas, rather than the entire image, which might have included irrelevant regions.

After the binarization step, each image was converted into grayscale using color space conversion, i.e., by converting the colored images into grayscale images, as shown in Figure 5d and Figure 6d. This was beneficial for reducing the noise and enhancing image quality. Then, local brightness equalization methods, including adaptive histogram equalization and dynamic histogram equalization, were implemented to compensate for non-uniform illumination. Afterwards, the pre-processed and clean images were fed into the DL algorithms. The goal was to detect DR using different pre-trained DNNs and to explore the performance of DR diagnosis using several algorithms.
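The individual pre-processing operations described above can be illustrated with a minimal pure-Python sketch. It is not the authors' MATLAB pipeline: the threshold value of 128 is a hypothetical choice, resizing is omitted, and plain global histogram equalization is swapped in for the adaptive and dynamic variants named in the text, purely to keep the example short.

```python
def rgb_to_gray(image):
    """Convert rows of (R, G, B) pixels (0-255) to grayscale with luminance weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in image]


def binarize(gray, threshold=128):
    """Threshold a grayscale image; the threshold is an assumed, illustrative value."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray]


def equalize_histogram(gray, levels=256):
    """Global histogram equalization (a simplification of the adaptive variant)."""
    flat = [px for row in gray for px in row]
    hist = [0] * levels
    for px in flat:
        hist[px] += 1
    # Cumulative distribution of the intensity levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:
        # Constant image: nothing to equalize.
        return [row[:] for row in gray]
    # Lookup table stretching the cumulative distribution over the full range.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[px] for px in row] for row in gray]


# Tiny hypothetical 2x2 RGB image, used only to exercise the functions.
tiny = [[(0, 0, 0), (255, 255, 255)],
        [(100, 100, 100), (200, 200, 200)]]
gray = rgb_to_gray(tiny)
mask = binarize(gray)
equalized = equalize_histogram(gray)
```

In practice, a tile-based adaptive equalization would apply the same lookup-table idea per image region rather than to the whole image at once.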

2.2.4. Early DR Detection Algorithms

The architecture used for DR detection was based on a transfer learning-based approach using known DL networks. This has been shown to be effective for detection in medical imaging applications. Eight pre-trained DNNs were used. The DNNs were kept in their original states, with the exception of the last layer, which was replaced with a fully connected layer with two units for DR detection. This was followed by a SoftMax function as the activation function, as well as a classification layer.

DR detection was performed by means of pre-trained DNNs and convolutional neural networks (CNNs) such as AlexNet, ResNet, Inception-v3, and GoogleNet. The performance of each of these networks was studied and compared by means of the sensitivity, specificity, accuracy, and precision measures, using the same testing data. For the detection, all images of the retinas that were marked as mild DR, moderate DR, severe DR, and proliferative DR were combined into a single ‘Yes’ category, and the retinal images without DR comprised the ‘No’ category.
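The grouping of the five severity grades into two detection categories can be sketched as follows. The function and dictionary names are hypothetical; only the grade-to-category mapping comes from the text.

```python
# APTOS 2019 severity grades as listed in the data description.
GRADE_NAMES = {0: "Normal", 1: "Mild", 2: "Moderate", 3: "Severe", 4: "Proliferative"}


def to_detection_label(grade):
    """Return 'No' for grade 0 and 'Yes' for any DR grade (1 to 4)."""
    if grade not in GRADE_NAMES:
        raise ValueError(f"unknown DR grade: {grade}")
    return "No" if grade == 0 else "Yes"


labels = [to_detection_label(g) for g in (0, 1, 2, 3, 4)]
```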

Of the available dataset, with N = 3662 (100%), 80% of the data were used for training and validation, and the other 20% were used for testing. This is illustrated in Table 3, showing the distribution of the training, testing, and validation datasets for DR detection; the total number of images without DR was 1805, and the total number of images with DR was 1857. Hence, the normal images constituted 49.3% and the DR images constituted 50.7%. These two classes/sets were considered approximately uniform, with a tolerance range of 1–2%.
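The class proportions and the 80/20 partition can be checked with a short sketch. The class counts come from the text; the per-class (stratified) hold-out, the random seed, and the function names are assumptions made for illustration, since the exact splitting procedure is not described here.

```python
import random


def class_shares(n_no_dr, n_dr):
    """Return the percentage share of each class, rounded to one decimal."""
    total = n_no_dr + n_dr
    return round(100 * n_no_dr / total, 1), round(100 * n_dr / total, 1)


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Hold out test_fraction of each class; return (train_val_idx, test_idx)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_val, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train_val.extend(indices[n_test:])
    return sorted(train_val), sorted(test)


shares = class_shares(1805, 1857)
labels = ["No"] * 1805 + ["Yes"] * 1857
train_val_idx, test_idx = stratified_split(labels)
```

Splitting per class keeps the near-even class balance in both partitions, which is one reasonable way to honor the stated tolerance of 1 to 2%.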

As described earlier, different pre-trained CNNs were used for DR detection. The selected networks were pre-trained on ImageNet, and their main properties, including the network name, depth, size, parameters, and image input size, are listed in Table 4. The network depth was defined as the largest number of sequential convolutional or fully connected layers on a path from the input layer to the output layer. The networks were AlexNet, GoogleNet, DarkNet-53, EfficientNet-b0, SqueezeNet, Inception-v3, MobileNet-v2, and ResNet-50.

ResNet-50 is a popular deep residual network (ResNet), i.e., a CNN with a depth of 50 layers. A ResNet is an artificial neural network (ANN) that stacks residual blocks on top of each other to form a network. The inputs to the frameworks were the RGB images.

In addition to the DNNs’ properties, the models’ learning rates, hyperparameters, and hardware requirements are reported in Table 5. The learning rate ranged between 0.001 and 0.045, and the decay strategy was the same for all networks. Concerning the hyperparameters, the learning rate for AlexNet, GoogleNet, ResNet-50, EfficientNet-b0, and SqueezeNet was 0.01; that of DarkNet-53 was 0.001; and that of Inception-v3 and MobileNet-v2 was 0.045. Regarding the decay, step decay was applied for all networks, as reported in Table 5.
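A step decay schedule can be sketched as below. The per-network learning rates are those listed in the text; the drop factor (0.1) and drop period (2 epochs) are hypothetical values chosen for illustration, as the decay settings are not stated here.

```python
# Learning rates per network, as listed in the text.
LEARNING_RATES = {
    "AlexNet": 0.01, "GoogleNet": 0.01, "ResNet-50": 0.01,
    "EfficientNet-b0": 0.01, "SqueezeNet": 0.01,
    "DarkNet-53": 0.001,
    "Inception-v3": 0.045, "MobileNet-v2": 0.045,
}


def step_decay(base_lr, epoch, drop_factor=0.1, drop_period=2):
    """Multiply the rate by drop_factor every drop_period epochs (assumed values)."""
    return base_lr * drop_factor ** (epoch // drop_period)


# Learning rate of ResNet-50 over the six training epochs (0-based epoch index).
schedule = [step_decay(LEARNING_RATES["ResNet-50"], e) for e in range(6)]
```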

Regarding the CPU and GPU requirements, AlexNet required a moderate CPU/moderate GPU, GoogleNet required a high CPU/moderate GPU, ResNet-50 required a high CPU/high GPU, DarkNet-53 required a high CPU/high GPU, EfficientNet-b0 required a moderate CPU/moderate GPU, SqueezeNet required a low CPU/low GPU, Inception-v3 required a high CPU/high GPU, and MobileNet-v2 required a low CPU/moderate GPU, as shown in Table 5.

Finally, the Adam optimization technique was used for all networks, with a maximum number of epochs of six and with 274 iterations per epoch, based on empirical inference.

2.2.5. Adam Optimizer

Adam optimization is a stochastic gradient descent method that is based on the adaptive estimation of first-order and second-order moments. The Adam optimizer was used during the fine-tuning step, adopting smaller learning rates. The learning rate can be adjusted: it is increased if the model converges too slowly and decreased if the model oscillates or diverges. Two key benefits of Adam optimization are faster convergence compared to plain stochastic gradient descent (SGD) and ease of implementation, as it only requires first-order gradients.

The Adam optimizer is used in DL to update the network weights iteratively based on the training data. It maintains an individual learning rate for each weight, which improves the performance in problems with sparse gradients, and it adapts this rate based on the average of the magnitudes of the recent gradients for that weight. Adam achieves good results and converges quickly in computer vision and natural language processing tasks [35]. Following the use of the optimizer, it was essential to evaluate the results obtained from the experiment using the new framework.
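The update rule behind this description can be sketched in a few lines. The decay rates (0.9 and 0.999) and the epsilon term are the commonly used defaults from the original Adam formulation and are assumptions here, as the values used in the experiment are not stated; the toy objective is likewise hypothetical.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; t is the 1-based step count. Returns (params, m, v)."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g        # first-moment (mean) estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)             # bias-corrected moments
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Toy example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = [0.0], [0.0], [0.0]
first_step = None
for t in range(1, 2001):
    grad = [2 * (w[0] - 3.0)]
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
    if t == 1:
        first_step = w[0]
```

Note that the first update has a magnitude close to the learning rate regardless of the gradient scale, which reflects the per-weight normalization described above.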

2.2.6. Evaluation Method

After training the networks, a confusion matrix was plotted for each network in order to study the performance. This was achieved by calculating several metrics [36]. The accuracy was determined via Equation (1) as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{1}$$

The sensitivity was determined via Equation (2) as follows:

$$\text{Sensitivity} = \frac{TP}{TP + FN}. \tag{2}$$

The specificity was determined via Equation (3) as follows:

$$\text{Specificity} = \frac{TN}{TN + FP}. \tag{3}$$

Finally, the precision was determined via Equation (4) as follows:

$$\text{Precision} = \frac{TP}{TP + FP}. \tag{4}$$

Here, TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively [37]. The performance of the novel framework and its capability in detecting DR were also studied qualitatively through the ROC curve [37], which demonstrates the diagnostic ability of a classifier and the trade-off between the sensitivity and specificity.
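As a minimal sketch, the four metrics of Equations (1) to (4) can be computed directly from the confusion-matrix counts. The counts below are hypothetical and are not taken from the experimental results.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute the metrics of Equations (1) to (4) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Hypothetical counts, used only to exercise the formulas.
metrics = detection_metrics(tp=90, fp=10, tn=85, fn=15)
```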

A loss function offers a formalized, differentiable measure of the errors in training and is used to improve a model’s performance. It was used to evaluate the DR diagnosis procedure using DNNs. The loss function used here was the cross-entropy loss, reflecting the difference between the predicted probability distribution (output) and the true class labels. The formula for the cross-entropy loss is as follows:

$$\text{Loss} = -\sum_{i} y_{i} \log(\hat{y}_{i}), \tag{5}$$

where $y_{i}$ is the true label for class i (1 for the correct class and 0 otherwise), $\hat{y}_{i}$ is the predicted probability for class i (output from the softmax layer), and ∑ indicates the summation over all classes.
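The cross-entropy loss of Equation (5) can be illustrated for a two-class output as follows. The logit values are hypothetical, and the small clipping constant is an assumption added to avoid taking the logarithm of zero.

```python
import math


def softmax(logits):
    """Convert raw network outputs into class probabilities."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy of Equation (5) for one sample with one-hot labels y_true."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))


# Hypothetical logits for the two detection classes; the first class is correct.
probs = softmax([2.0, 0.0])
loss = cross_entropy([1, 0], probs)
```

Because the label vector is one-hot, only the predicted probability of the correct class contributes to the sum.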

After applying the aforementioned framework, the results were obtained; these are provided in the following section.

3. Experimental Results

The novel framework was applied to the experimental optical images using a total of eight pre-trained DNNs implemented in MATLAB v2021. The quantitative results regarding the performance of the eight networks in DR detection are shown in Table 6.

GoogleNet exhibited a remarkable sensitivity of 99.44% for DR detection. Although the sensitivity of GoogleNet surpassed that of the other seven networks (AlexNet, ResNet-50, DarkNet-53, EfficientNet-60, SqueezeNet, Inception-v3, and MobileNet-v2), ResNet-50 showed the best overall performance in DR detection, as its specificity and precision were 1.38 and 1.74 percentage points higher, respectively, than those obtained by GoogleNet. Meanwhile, their detection accuracies were the same.

On the other hand, among the considered networks, MobileNet-v2 exhibited the lowest sensitivity and AlexNet showed the lowest specificity, accuracy, and precision. It is noteworthy that the overall performance of the eight networks in DR detection was acceptable in terms of all measured aspects, i.e., all measurements were above 87%, as shown in Table 6.

Additional qualitative results regarding DR detection were obtained in the form of the ROC curves for GoogleNet, AlexNet, ResNet-50, DarkNet-53, EfficientNet-60, SqueezeNet, Inception-v3, and MobileNet-v2, as shown in Figure 7 and Figure 8.

The ROC curves of AlexNet, GoogleNet, ResNet-50, and DarkNet-53 are shown in Figure 7a–d, respectively. Moreover, the ROC curves of EfficientNet-60, SqueezeNet, Inception-v3, and MobileNet-v2 are shown in Figure 8a–d, respectively.

Among the tested networks, GoogleNet achieved the highest sensitivity and ResNet-50 the highest specificity and precision (Table 6); consistent with this, the ROC curves of both GoogleNet and ResNet-50 were the best compared to the other networks. Moreover, SqueezeNet showed the second-best performance in DR detection. However, among the explored networks, MobileNet-v2 showed the weakest performance in terms of the ROC. Overall, the general behavior of the ROC curves was acceptable regarding DR detection.
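
To make the construction of an ROC curve concrete, the following Python sketch sweeps the decision threshold over a set of classifier scores, records the true positive rate (sensitivity) against the false positive rate (1 − specificity), and integrates the curve with the trapezoidal rule. The labels and scores are hypothetical, and the helper names `roc_points` and `auc` are illustrative; this is a sketch of the general technique, not the routine used to generate Figure 7 and Figure 8.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sort by descending score so that each step lowers the threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve computed with the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels (1 = DR, 0 = No DR) and predicted DR probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
curve = roc_points(labels, scores)
print(curve, auc(curve))
```

For these illustrative values, eight of the nine (DR, No DR) pairs are ranked correctly, so the area under the curve is 8/9 ≈ 0.889. A curve that hugs the top-left corner, as described for the best networks, corresponds to an area close to 1.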

In addition to the ROC curves obtained to demonstrate the performance of the DL frameworks in DR detection, graphs showing the accuracy versus the number of epochs, as well as the error rate versus the epochs during training (smoothed) and validation, were obtained. These are shown in Figure 9 and Figure 10, with the accuracy in the top panels and the error rate in the bottom panels.

The accuracy of AlexNet, GoogleNet, ResNet-50, and DarkNet-53 is shown in Figure 9a–d. Their values were high and fluctuated within an approximate range of 92% to 94%. The values for these four networks converged after epoch 3; however, more fluctuations were observed for ResNet-50 after epoch 3 compared to AlexNet, GoogleNet, and DarkNet-53.

Furthermore, the accuracy of EfficientNet-60, SqueezeNet, Inception-v3, and MobileNet-v2, as shown in Figure 10a–d, was high and fluctuated within an approximate range of 92% to 93.5%. These values converged after epoch 2 for the networks EfficientNet-60, SqueezeNet, and MobileNet-v2; however, for Inception-v3, they converged after epoch 3. Moreover, more fluctuations were observed for EfficientNet-60, Inception-v3, and MobileNet-v2 between the training and validation data as compared to SqueezeNet.

The error rates in DR detection using AlexNet, GoogleNet, ResNet-50, and DarkNet-53 were low, as shown in Figure 9a–d, ranging between 0 and 0.5% and fluctuating at around 0%. These values converged after 250 iterations for these four networks.

Similarly, the error rates in DR detection using EfficientNet-60, SqueezeNet, Inception-v3, and MobileNet-v2, as shown in Figure 10a–d, were less than or equal to 0.5%. These values also converged after 250 iterations for these four networks.

4. Discussion

In this study, we examined and compared the performance of eight pre-trained DNNs in DR detection. The best networks in this context were GoogleNet and ResNet-50, as they both achieved promising results across the selected criteria. Both the qualitative and quantitative results obtained in our research were in agreement, and all networks showed performance scores above 87% in detecting DR, with the highest being approximately 99.4% (the sensitivity of GoogleNet).

Although the studies of Ratanapakorn et al. and Gharaibeh et al. dealt with the classification of retinal images, they used different databases, namely the KKU Eye Center database [11] and the DIARETDB1 database [12]. Our research focused on the APTOS 2019 dataset, which presents challenges due to the limited number of images included. Ratanapakorn et al. [11] and Gharaibeh et al. [12] focused on the detection and classification of DR with a single algorithm, which was beneficial in some aspects but also resulted in lower overall accuracy, which was not the case for the algorithms used in our framework, specifically GoogleNet and ResNet-50. Moreover, according to Gharaibeh et al., the sensitivity of the PNN, SVM, and SVMGA models was 90, 98, and 99%, respectively [12]. In our study, the sensitivity of GoogleNet was almost 100%, and that of the other networks was approximately 97% or higher (Table 6). This indicates the significant contributions of our framework, which mainly relies on DNNs for the automatic detection of DR.

Furthermore, a thorough review of the previous research and the statistical results achieved was carried out to enable a comparison with our findings. The proposed algorithm was compared to 11 methods reported in the literature, as shown in Table 7. These 11 methods were (CNN, SVM) from 2021 [28], the feedforward backpropagation NN [28], the SVM from 2021 [28], the SVM modified in 2021 for OCTA images [28], (CNN, SVM) for OCT images [36], the SVM from 2018 [12], SVMGA from 2018 [12], a PNN for fundus images [12], K-Nearest Neighbors (KNN) for OCT images [36], random forest for OCT images [36], and a random tree method from 2020 [36].

According to the results shown in Table 7, some state-of-the-art algorithms presented in the literature showed higher scores than our framework using the eight different DL networks. Although the SVM, SVMGA, and PNN models used by Gharaibeh et al. [12] showed robust results, indicated in bold, they could not be used for a rigorous comparison, as they only made use of 64 DR images. In contrast, in our framework, we utilized a large database of 3662 images, with 1857 DR images. An additional advantage of this framework is that, regardless of the type of input image, the algorithm can identify it as a healthy or DR case automatically, with high percentages of sensitivity, accuracy, and specificity compared to the techniques available in the literature.

Some evaluation measures were lacking, denoted by ’-’ in Table 7, due to the lack of data in [13,28]. It is noteworthy that other recent studies, such as one published in 2022, employed UNETs, which are advanced algorithms applied to biomedical images. However, these UNETs showed a prediction score of 88% for the disease [38], and they were used for the detection of blood diseases rather than eye diseases. Although UNETs exhibit powerful prediction capabilities, our framework demonstrates stronger performance, both quantitatively and qualitatively, as reported in Table 7.

Concerning the accuracy in DR detection relative to the number of epochs, among the employed DL networks, the accuracy ranged from 92% to 94% after epoch 2. This indicates remarkable performance as compared to the values obtained in the literature, such as in the work of Gharaibeh et al. [12]. In addition, our results were based on a larger database of fundus images.

With regard to the error/loss in DR detection, the new framework employing the eight DL networks exhibited a negligible error rate of less than or equal to 0.5%. The eight networks—AlexNet, GoogleNet, ResNet-50, DarkNet-53, EfficientNet-60, SqueezeNet, Inception-v3, and MobileNet-v2—converged after 250 iterations and after epoch 2. The error rates during diagnosis were calculated and reported for the first time for these DNNs; these are shown in Table 7 and were used to link the findings obtained with the new framework to the clinical findings.

An additional comparison was carried out using the methods for the diagnosis of retinal diseases. Traditional DL models, such as ResNet and GoogleNet, are often preferred over newer classification networks, such as Vision Transformer (ViT) [39], Swin Transformer (ST) [40], ConvNeXt, and Vision Mamba.

This comparison is shown in Table 8, and it is based on different criteria, namely the computational efficiency, training data requirements, feature extraction, generalization to small datasets, interpretability, availability of pre-trained medical models, localization of retinal features, stability and robustness, and adoption in medical AI applications, considering CNN models [41] and ViT and ST [39,40]. Dosovitskiy et al. introduced ViT and discussed its high data dependency [39], as reported in Table 8. Liu et al. demonstrated the performance of ST but also revealed its computational complexity [40]. On the other hand, the advantages of the CNNs with respect to these criteria are also listed, for instance, their capability in detecting small abnormalities in retinal images and their ability to be fine-tuned on medical data.

Moreover, the images were clinically verified in the lab before processing, and the output of the new AI-based process was in accordance with the clinical outcomes—this was confirmed by the calculation of the error rate per epoch. It is noteworthy that most studies in the literature do not provide diagnosis errors. The decision to calculate the error rate in the current work was motivated by the fact that, without such algorithms, doctors would need to perform diagnosis based on either their experience or anatomical or functional images. This type of medical diagnosis is subject to errors. Therefore, we sought to demonstrate the significance of calculating the error rate in automatic diagnosis in comparison to the errors encountered in clinical diagnosis. It is noteworthy that the error rate obtained with the new framework was significantly lower.

This work also sought to illustrate the importance of carrying out such research and the significance of DR detection. It was found that most cases are due to “functional problems”, i.e., related to the flow of blood or the presence of blood in other regions. Hence, it was crucial to use optical images and not only classical images. Meanwhile, given that the prevalence of DR varies according to the region, diet, stress levels, etc., our attempts to cover a larger population are significant. As the APTOS data were collected in India, it is crucial to consider a broader international dataset to explore all possible variables in this field.

In addition to the use of a larger database, the broader implementation of retinal imaging in various settings, such as primary care clinics, remote areas without ophthalmologists, and areas with underserved populations, is likely to enhance DR detection and improve access to care. A standardized, reliable method to analyze retinal images is of great interest, and early AI methods seem to hold promise. These technological solutions reduce the need for trained specialists in the primary care setting and enable more efficient screening. Moreover, the promising results derived in the current work support the fact that AI reduces the burden associated with the manual review of fundus images [42] through the use of technology to automatically screen such images for the presence of pathologies, especially for DR detection.

5. Conclusions and Future Work

This work involved the exploration of different types of DL algorithms in order to automatically detect DR from fundus images. We utilized eight networks and assessed their sensitivity, specificity, accuracy, and precision in detecting DR. The results indicated that DL-based approaches are effective in detecting DR.

The highest sensitivity was obtained using GoogleNet, which exhibited 99.44% sensitivity in DR detection, i.e., almost 100%, while ResNet-50 exhibited the highest specificity and precision at 89.74% and 90.07%, respectively. Moreover, both GoogleNet and ResNet-50 were 93.56% accurate in detecting DR, while the other networks exhibited similar scores.

Although there are numerous classification networks available in the literature, the studied algorithms were selected based on previous research focusing on disease detection from images. The current work focused on exploring the significance of the different types of images and datasets. Significant results were obtained due to the use of a combination of medical, clinical, computational, and statistical evaluations, aiming to contribute to improving healthcare services. Moreover, we determined the optimal network by taking into consideration the demographics and types of images.

We summarize the main outcomes and achievements as follows:

We provided a new process and a new level of diagnosis for patients with DR, thus paving the way for enhanced healthcare services;
We explored the most powerful DL algorithms in the detection of this particular disease and in a specific population for preventative purposes;
At the clinical level, we developed an “augmented diagnosis”;
At the medical level, we highlighted the significance of using specific types of images, mainly fundus images with DNNs and functional images with ML or combined ML, where the DNNs showed significant accuracy;
At the computer level, we developed an autonomous and automatic evaluation;
At the AI level, we combined algorithms presented in earlier research with the currently used DL algorithms, as well as exploring existing DL networks and optimizing their parameters;
At the statistical and evaluation level, we achieved the highest accuracy in diagnosis and calculated the error rate in DR diagnosis for the first time;
We devised a framework that resulted in the minimum error in automatic diagnosis and early detection, with results that were in accordance with the clinical findings.

Lastly, the results obtained through the new framework’s application to medical images show that it is a promising tool for the diagnosis of DR, comparing favorably with the tools reported in the literature. As DR is the leading cause of avoidable vision impairment worldwide, the automatic early detection and diagnosis of DR, enabled by our framework, plays a vital role in preventing blindness or other types of visual impairment. The automatic detection of DR could enable doctors to quickly diagnose the disease, assist specialists in saving time and money, and benefit patients in terms of preventing the onset of blindness.

As a future direction, the development of a grading algorithm to further improve the application of this framework should be considered. Certain networks, such as Vision Transformer, Swin Transformer, ConvNeXt, and Vision Mamba, could also be considered in future research.

Moreover, while the construction of a new network is an important innovation, it must exhibit promising outcomes. Our framework can also be applied to different types of eye images and different population demographics.

Regarding the aspects of DR that were not explored, additional work linking the diagnosis to the factors behind the disease must be carried out. Moreover, additional metrics and scores, such as the AUC and boxplots to demonstrate the standard deviation, could be considered for the evaluation of DR diagnoses, thus paving the way for enhanced healthcare services. Furthermore, the consideration of different demographics and datasets is crucial, e.g., the Middle Eastern population, where there is a high rate of diabetes and DR, and international databases, such as those containing European, American, and other data.

Author Contributions

Conceptualization, A.J.Z. and S.K.; methodology and formal analysis, A.J.Z.; data curation, A.J.Z.; writing—original draft preparation, A.J.Z.; writing—review and editing, A.J.Z. and S.K.; supervision, A.J.Z.; project administration, A.J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank the anonymous reviewers of the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wong, T.Y.; Gemmy, C.C.M.; Larsen, M.; Sharma, S.; Rafael, S. Erratum: Diabetic retinopathy. Nat. Rev. Dis. Prim. 2016, 2, 16030.
2. Abràmoff, M.D.; Garvin, M.K.; Sonka, M. Retinal imaging and image analysis. IEEE Rev. Biomed. Eng. 2010, 3, 169–208.
3. Varma, R.; Vajaranant, T.S.; Burkemper, B.; Wu, S.; Torres, M.; Hsu, C.; Choudhury, F.; McKean-Cowdin, R. Visual impairment and blindness in adults in the United States: Demographic and geographic variations from 2015 to 2050. JAMA Ophthalmol. 2016, 134, 802–809.
4. Spanakis, E.K.; Golden, S.H. Race/ethnic difference in diabetes and diabetic complications. Curr. Diabetes Rep. 2013, 13, 814–823.
5. Leasher, J.L.; Bourne, R.R.A.; Flaxman, S.R.; Jonas, J.B.; Keeffe, J.; Naidoo, K.; Pesudovs, K.; Price, H.; White, R.A.; Wong, T.Y.; et al. Global estimates on the number of people blind or visually impaired by diabetic retinopathy: A meta-analysis from 1990 to 2010. Diabetes Care 2016, 39, 1643–1649.
6. Ogurtsova, K.; da Rocha Fernandes, J.; Huang, Y.; Linnenkamp, U.; Guariguata, L.; Cho, N.H.; Cavan, D.; Shaw, J.; Makaroff, L. IDF Diabetes Atlas: Global estimates for the prevalence of diabetes for 2015 and 2040. Diabetes Res. Clin. Pract. 2017, 128, 40–50.
7. Ting, D.S.W.; Cheung, G.C.M.; Wong, T.Y. Diabetic retinopathy: Global prevalence, major risk factors, screening practices and public health challenges: A review. Clin. Exp. Ophthalmol. 2016, 44, 260–277.
8. Oliverio, G.W.; Meduri, A.; De Salvo, G.; Trombetta, L.; Aragona, P. OCT angiography features in diabetes mellitus type 1 and 2. Diagnostics 2022, 12, 2942.
9. Pauk-Domańska, M.; Walasik-Szemplińska, D. Color Doppler imaging of the retrobulbar vessels in diabetic retinopathy. J. Ultrason. 2014, 14, 28.
10. Oliverio, G.W.; Meduri, A.; Brancati, V.U.; Ingrande, I.; De Luca, L.; Raimondo, E.D.; Minutoli, L.; Aragona, E.; Aragona, P. Clinical and optical coherence tomography biomarkers as prognostic factors in dexamethasone intravitreal implant for diabetic macular edema. Eur. J. Ophthalmol. 2024, 34, 1810–1818.
11. Ratanapakorn, T.; Daengphoonphol, A.; Eua-Anant, N.; Yospaiboon, Y. Digital image processing software for diagnosing diabetic retinopathy from fundus photograph. Clin. Ophthalmol. 2019, 13, 641.
12. Gharaibeh, N.; Al-Hazaimeh, O.M.; Al-Naami, B.; Nahar, K.M.O. An effective image processing method for detection of diabetic retinopathy diseases from retinal fundus images. Int. J. Signal Imaging Syst. Eng. 2018, 11, 206–216.
13. Agarwal, A. Support Vector Machine-Formulation and Derivation. Medium Towards Data Sci. 2019. Available online: https://medium.com/towards-data-science/support-vector-machine-formulation-and-derivation-b146ce89f28 (accessed on 25 February 2025).
14. Liu, Z.; Wang, C.; Cai, X.; Jiang, H.; Wang, J. Discrimination of diabetic retinopathy from optical coherence tomography angiography images using machine learning methods. IEEE Access 2021, 9, 51689–51694.
15. Mathews, M.R.; Anzar, S.T.M. A lightweight deep learning model for retinal optical coherence tomography image classification. Int. J. Imaging Syst. Technol. 2023, 33, 204–216.
16. Sharma, A. Logistic Regression Explained from Scratch (Visually, Mathematically and Programmatically). Towards Data Sci. 2021. Available online: https://towardsdatascience.com/logistic-regression-explained-from-scratch-visually-mathematically-and-programmatically-eb83520fdf9a/ (accessed on 25 February 2025).
17. Ting, D.S.W.; Cheung, C.Y.L.; Lim, G.; Tan, G.S.W.; Quang, N.D.; Gan, A.; Hamzah, H.; Garcia-Franco, R.; San Yeo, I.Y.; Lee, S.Y.; et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 2017, 318, 2211–2223.
18. Tsiknakis, N.; Theodoropoulos, D.; Manikis, G.; Ktistakis, E.; Boutsora, O.; Berto, A.; Scarpa, F.; Scarpa, A.; Fotiadis, D.I.; Marias, K. Deep learning for diabetic retinopathy detection and classification based on fundus images: A review. Comput. Biol. Med. 2021, 135, 104599.
19. Ferreira, V.; Da Silva, R.; Silva, D.; Gomes, E. Production of pectate lyase by Penicillium viridicatum RFC3 in solid-state and submerged fermentation. Int. J. Microbiol. 2010, 2010, 276590.
20. Qiao, L.; Zhu, Y.; Zhou, H. Diabetic retinopathy detection using prognosis of microaneurysm and early diagnosis system for non-proliferative diabetic retinopathy based on deep learning algorithms. IEEE Access 2020, 8, 104292–104302.
21. Fu, Y.; Lu, X.; Zhang, G.; Lu, Q.; Wang, C.; Zhang, D. Automatic grading of Diabetic macular edema based on end-to-end network. Expert Syst. Appl. 2023, 213, 118835.
22. Abdelsalam, M.M. Effective blood vessels reconstruction methodology for early detection and classification of diabetic retinopathy using OCTA images by artificial neural network. Inform. Med. Unlocked 2020, 20, 100390.
23. Abdelsalam, M.M.; Zahran, M. A novel approach of diabetic retinopathy early detection based on multifractal geometry analysis for OCTA macular images using support vector machine. IEEE Access 2021, 9, 22844–22858.
24. Hemanth, D.J.; Deperlioglu, O.; Kose, U. An enhanced diabetic retinopathy detection and classification approach using deep convolutional neural network. Neural Comput. Appl. 2020, 32, 707–721.
25. Damrawi, G.E.; Zahran, M.; Amin, E.; Abdelsalam, M.M. Numerical detection of diabetic retinopathy stages by multifractal analysis for OCTA macular images using multistage artificial neural network. J. Ambient. Intell. Humaniz. Comput. 2021, 14, 7133–7145.
26. Erciyas, A.; Barışçı, N. An effective method for detecting and classifying diabetic retinopathy lesions based on deep learning. Comput. Math. Methods Med. 2021, 2021, 9928899.
27. Gulshan, V.; Peng, L.; Coram, M.; Stumpe, M.C.; Wu, D.; Narayanaswamy, A.; Venugopalan, S.; Widner, K.; Madams, T.; Cuadros, J.; et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016, 316, 2402–2410.
28. Zaylaa, A.J.; Wehbe, G.I.; Ouahabi, A.M. Bringing AI to Automatic Diagnosis of Diabetic Retinopathy from Optical Coherence Tomography Angiography. In Proceedings of the 2021 Sixth International Conference on Advances in Biomedical Engineering (ICABME), Wardaniyeh, Lebanon, 7–9 October 2021; pp. 49–53.
29. National Eye Institute. At a Glance: Diabetic Retinopathy. 2023. Available online: https://www.nei.nih.gov/learn-about-eye-health/eye-conditions-and-diseases/diabetic-retinopathy (accessed on 25 February 2025).
30. Karthik, M.; Dane, S. APTOS 2019 Blindness Detection. Kaggle. 2019. Available online: https://kaggle.com/competitions/aptos2019-blindness-detection (accessed on 25 February 2025).
31. Kim, R.; Mishra, C.; Sen, S. The use of teleconsultation and technology by the Aravind Eye Care System, India. Community Eye Health J. 2022, 35, 10. Available online: https://pmc.ncbi.nlm.nih.gov/articles/PMC9412088/ (accessed on 25 February 2025).
32. Femmam, S.; M’Sirdi, N.; Ouahabi, A. Perception and characterization of materials using signal processing techniques. IEEE Trans. Instrum. Meas. 2001, 50, 1203–1211.
33. Ouahabi, A. A review of wavelet denoising in medical imaging. In Proceedings of the 2013 8th International Workshop on Systems, Signal Processing and Their Applications (WoSSPA), Algiers, Algeria, 2–15 May 2013; pp. 19–26.
34. Mahdaoui, A.E.; Ouahabi, A.; Moulay, M.S. Image denoising using a compressive sensing approach based on regularization constraints. Sensors 2022, 22, 2199.
35. Brownlee, J. Deep Learning for Natural Language Processing: Develop Deep Learning Models for Your Natural Language Problems; Machine Learning Mastery: Vermont, Australia, 2017.
36. Ghazal, M.; Ali, S.S.; Mahmoud, A.H.; Shalaby, A.M.; El-Baz, A. Accurate detection of non-proliferative diabetic retinopathy in optical coherence tomography images using convolutional neural networks. IEEE Access 2020, 8, 34387–34397.
37. Zaylaa, A. Analysis and Extraction of Complexity Parameters of Biomedical Signals. Ph.D. Thesis, François-Rabelais University of Tours, Tours, France, 2014.
38. Zaylaa, A.J.; Makki, M.; Kassem, R. Thalassemia Diagnosis Through Medical Imaging: A New Artificial Intelligence-Based Framework. In Proceedings of the 2022 International Conference on Smart Systems and Power Management (IC2SPM), Beirut, Lebanon, 10–12 November 2022; pp. 41–46.
39. Dosovitskiy, A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
40. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
41. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
42. Nørgaard, M.F.; Grauslund, J. Automated screening for diabetic retinopathy – A systematic review. Ophthalmic Res. 2018, 60, 9–17.

Figure 1. Cumulative growth in diabetic retinopathy cases in the USA from 2010 to 2050.

Figure 2. Normal and diabetic eyes. The diabetic eye contains abnormal blood vessels, hemorrhage, cotton wool spots, and aneurysms.

Figure 3. The block diagram of the proposed framework for the diagnosis and detection of diabetic retinopathy (DR).

Figure 4. Digital slit lamp used in the examination of a patient at the Biomedical Imaging Lab at the Faculty of Engineering at Beirut Arab University. (a) Preparation and initialization of the machine, with the doctor on the right and the patient on the left. (b) Examination process.

Figure 5. Illustration of the steps involved in processing a fundus image of the right eye of a normal patient, i.e., ‘No DR’. (a) Original image. (b) Resized image of the original image of a normal patient. (c) Binary information. (d) Grayscale information of the image.

Figure 6. Illustration of the steps involved in processing the fundus image of the right eye of a patient suffering from DR. (a) Original image. (b) Resized image of the original image of the DR patient. (c) Binary information. (d) Grayscale information of the image.

Figure 7. The Receiver Operating Characteristics (ROC) curves of the deep neural network (DNN) models: (a) AlexNet; (b) GoogleNet; (c) ResNet-50; and (d) DarkNet-53.

Figure 8. The Receiver Operating Characteristics (ROC) curves of the deep neural network (DNN) models: (a) EfficientNet-60; (b) SqueezeNet; (c) Inception-v3; and (d) MobileNet-v2.

Figure 9. Accuracy (top) and error rate (bottom) versus the number of epochs for the deep neural network (DNN) frameworks: (a) AlexNet; (b) GoogleNet; (c) ResNet-50; and (d) DarkNet-53.

Figure 10. Accuracy (top) and error rate (bottom) versus the number of epochs for the deep neural network (DNN) frameworks: (a) EfficientNet-60; (b) SqueezeNet; (c) Inception-v3; and (d) MobileNet-v2.

Table 1. Cumulative growth in diabetic retinopathy cases based on race/ethnicity from 2010 to 2050.

Year	White	Black	Hispanic	Other	Total
2010	5,251,907	826,102	1,194,231	412,997	7,685,237
2030	6,384,275	1,191,481	2,939,136	835,113	11,350,005
2050	6,374,626	1,547,724	5,254,328	1,382,786	14,559,464

Table 2. The APTOS 2019 dataset: multi-class image distribution.

Class	Number of Images
No DR	1805
Mild DR	370
Moderate DR	999
Severe DR	193
Proliferative DR	295

Table 3. The APTOS 2019 dataset: binary-class image distribution with the total number of images, and the number of trained, validated and tested images.

DR Label	# Total Images	# Training Images	# Validation Images	# Test Images
No	1805	1300	144	361
Yes	1857	1337	149	371
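
The counts in Table 3 are consistent with holding out 20% of each class for testing and then 10% of the remaining images for validation. The text does not state the exact splitting procedure, so the fractions and the function name `split_counts` in the following Python sketch are assumptions inferred from the reported numbers rather than a description of the authors' code.

```python
def split_counts(total, test_fraction=0.20, val_fraction=0.10):
    """Return (train, validation, test) counts for one class.

    Assumed procedure: hold out `test_fraction` of the images for testing,
    then `val_fraction` of the remainder for validation.
    """
    test = round(total * test_fraction)
    remaining = total - test
    val = round(remaining * val_fraction)
    train = remaining - val
    return train, val, test

# Class totals taken from Table 3.
for label, total in (("No DR", 1805), ("DR", 1857)):
    train, val, test = split_counts(total)
    print(label, train, val, test)
```

Under these assumed fractions, the sketch reproduces the training, validation, and test counts listed in Table 3 for both classes.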

Table 4. Properties of deep neural network models.

Network	Depth	Size (MB)	Parameters (Millions)	Image Input Size
AlexNet	8	227	61.0	224-by-224
GoogleNet	22	27	7.0	224-by-224
ResNet-50	50	96	25.6	224-by-224
DarkNet-53	53	155	41.6	256-by-256
EfficientNet-b0	20	20–30	5.3	224-by-224
SqueezeNet	18	5–10	1–2	224-by-224
Inception-v3	48	89	23.9	299-by-299
MobileNet-v2	53	13	3.5	224-by-224

Table 5. Learning rates, hyperparameters, and hardware requirements of deep neural network models.

Network	Learning Rate	Decay Strategy	Hardware (CPU/GPU) Requirements
AlexNet	0.01	Step decay	Moderate CPU/moderate GPU
GoogleNet	0.01	Step decay	High CPU/moderate GPU
ResNet-50	0.01	Step decay	High CPU/high GPU
DarkNet-53	0.001	Step decay	High CPU/high GPU
EfficientNet-b0	0.01	Step decay	Moderate CPU/moderate GPU
SqueezeNet	0.01	Step decay	Low CPU/low GPU
Inception-v3	0.045	Step decay	High CPU/high GPU
MobileNet-v2	0.045	Step decay	Low CPU/moderate GPU
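
The step decay strategy listed in Table 5 multiplies the learning rate by a fixed factor after a fixed number of epochs. The drop factor and drop period are not reported in this work, so the values used in the following Python sketch (`drop_factor=0.1`, `drop_period=10`) are hypothetical and serve only to illustrate the schedule.

```python
def step_decay(initial_lr, epoch, drop_factor=0.1, drop_period=10):
    """Learning rate at a given epoch under a piecewise-constant step schedule.

    drop_factor and drop_period are hypothetical values, not taken from Table 5.
    """
    return initial_lr * drop_factor ** (epoch // drop_period)

# Initial learning rate of 0.01, as listed for several networks in Table 5.
for epoch in (0, 9, 10, 20):
    print(epoch, step_decay(0.01, epoch))
```

With these illustrative settings, the learning rate stays at its initial value for the first ten epochs and is then reduced tenfold at every subsequent tenth epoch.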

Table 6. The statistical evaluation results regarding DR detection through the eight DNNs, including the sensitivity, specificity, accuracy, and precision, in percentages (%).

Network Model	Sensitivity (%)	Specificity (%)	Accuracy (%)	Precision (%)
AlexNet	97.7	87.09	92.12	87.09
GoogleNet	99.4	88.36	93.56	88.33
ResNet-50	97.58	89.74	93.56	90.07
DarkNet-53	98.05	87.56	92.51	87.59
EfficientNet-b0	98.05	87.64	92.78	87.90
SqueezeNet	99.16	88.33	93.43	88.33
Inception-v3	97.82	88.86	93.17	89.08
MobileNet-v2	96.99	87.87	92.25	88.08

Note: Bold values indicate the best performance among the compared methods.

Table 7. Comparison of the statistical evaluation results regarding the detection and diagnosis of DR (namely, the sensitivity, specificity, accuracy, and diagnosis error) obtained with the new framework versus the methods described in the literature, based on the optical images involved.

Evaluation Algorithm for DR Diagnosis	Type and Number of Images (N)	Sensitivity (%)	Specificity (%)	Accuracy (%)	DR Diagnosis Error (%) After Epoch 2
Framework GoogleNet	Fundus images (1857)	99.40	88.36	93.56	0.50%
Framework ResNet-50	Fundus images (1857)	97.58	89.74	93.56	0.25%
Framework DarkNet-53	Fundus images (1857)	98.05	87.56	92.51	0.25%
Framework EfficientNet-60	Fundus images (1857)	98.05	87.64	92.78	0.30%
Framework SqueezeNet	Fundus images (1857)	99.16	88.33	93.43	0.25%
Framework Inception-v3	Fundus images (1857)	97.82	88.86	93.17	0.25%
Framework MobileNet-v2	Fundus images (1857)	96.99	87.87	92.25	0.25%
Framework AlexNet	Fundus images (1857)	97.7	87.09	92.12	0.25%
(CNN, SVM) in 2021 [28]	OCTA images (45)	88.88	95.55	-	-
Feedforward Backpropagation NN [28]	OCTA images (45)	66.66	71.11	68.88	-
SVM in 2021 [28]	OCTA images (45)	40.00	62.22	-	-
SVM modified in 2021 [28]	OCTA images (45)	62.22	73.33	-	-
(CNN, SVM) in 2020 [36]	OCT images (26)	-	88	94	-
SVM in 2018 [12]	Fundus images (64)	98	96	97.6	-
SVMGA in 2018 [12]	Fundus images (64)	99	96	98.4	-
PNN in 2018 [12]	Fundus images (64)	90	88	89.6	-
K-Nearest Neighbors (KNN) in 2020 [36]	OCT images (26)	-	83	84	-
Random Forest in 2020 [36]	OCT images (26)	-	82	82	-
Random Tree in 2020 [36]	OCT images (26)	-	81	81	-

Note: Bold values indicate the best performance among the compared methods.

Table 8. Comparison of CNNs and the newer networks used in retinal disease diagnosis.

Criterion	CNN-Based Models (8 Networks: ResNet, GoogleNet, etc.)	Newer Networks (e.g., ViT, Swin Transformer, ConvNeXt, Vision Mamba)
Computational Efficiency	Efficient due to optimized convolutional operations and suitable for medical imaging on standard GPUs.	Require large computational resources, such as high-end GPUs, increasing the inference time.
Training Data Requirements	Perform well on small datasets; ImageNet-pre-trained models can be fine-tuned on medical data.	Need large-scale datasets for training due to the absence of spatial inductive bias.
Feature Extraction	Possess strong local feature extraction capabilities, useful in detecting small abnormalities in retinal images.	These models focus on global relationships; however, they may miss fine-grained details in small medical images.
Generalization to Small Datasets	Work well with limited labeled medical data due to strong hierarchical feature learning.	Prone to overfitting on small datasets without extensive augmentation and regularization.
Interpretability	More interpretable with feature visualization tools such as Grad-CAM.	Transformers lack straightforward interpretability, leading to challenges in clinical validation.
Availability of Pre-Trained Medical Models	CNN-based models pre-trained on large image datasets (e.g., ImageNet) are widely available and can be fine-tuned on medical imaging data.	Limited availability of pre-trained models for medical imaging; require domain-specific fine-tuning.
Localization of Retinal Features	CNNs naturally capture local spatial features (blood vessels, lesions) with small receptive fields.	Transformers analyze entire images at once, which can reduce the accuracy in detecting small retinal abnormalities.
Stability and Robustness	CNNs are well tested for medical image classification and generalize well across datasets.	Transformers are newer and need extensive hyperparameter tuning to avoid instability.
Adoption in Medical AI	Widely employed in retinal disease diagnosis (e.g., diabetic retinopathy, glaucoma detection).	Limited adoption in medical AI due to high data needs and computational costs.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
