Article

Optimal Convolutional Networks for Staging and Detecting of Diabetic Retinopathy

Department of Computer Science, Deanship of Preparatory Year and Supporting Studies, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
* Author to whom correspondence should be addressed.
Information 2025, 16(3), 221; https://doi.org/10.3390/info16030221
Submission received: 8 February 2025 / Revised: 24 February 2025 / Accepted: 7 March 2025 / Published: 13 March 2025
(This article belongs to the Section Artificial Intelligence)

Abstract

Diabetic retinopathy (DR) is the main ocular complication of diabetes. Asymptomatic for a long time, it is subject to annual screening using dilated fundus or retinal photography to look for early signs. Fundus photography and optical coherence tomography (OCT) are used by ophthalmologists to assess retinal thickness and structure, as well as to detect edema, hemorrhage, and scarring. The effectiveness of ConvNets no longer needs to be demonstrated, and their use in the field of imaging has made it possible to overcome many barriers that were until now insurmountable with older methods. Throughout this study, a robust and optimal deep ConvNet is proposed to analyze fundus images and automatically distinguish between healthy, moderate, and severe DR. The proposed model combines ConvNet architectures taken from ImageNet, data augmentation, class balancing, and transfer learning in order to establish a benchmark. A significant improvement is achieved for the middle class, which corresponds to the early stage of DR and was the major problem in previous studies. By eliminating the need for retina specialists and broadening access to retinal care, the proposed model is substantially more robust in objectively staging and detecting DR early.

1. Introduction

Diabetic retinopathy (DR) is a complication linked to long-term untreated diabetes. People with this condition generally experience serious visual loss due to late detection. These visual deficiencies can be more or less serious depending on the stage of the pathology, leading in the worst case to blindness in patients. If DR is discovered early on, the development of visual impairment can be slowed down or prevented. This can be challenging, though, as the illness sometimes does not show any signs until it is too late to receive a successful therapy.
The process of detecting the disease is manual and tedious, and requires a qualified ophthalmologist to evaluate the digital fundus images; however, it is often too late when the disease is diagnosed, with the patient having suffered irreversible losses. The number of adults suffering from blindness due to late-diagnosed retinopathy has been estimated at more than 93 million, hence the importance of a comprehensive and automated method to help detect this pathology early.
The recognition of medical images to detect anomalies or diagnose diseases has progressed significantly in recent years, particularly via deep learning (DL), a subfield of artificial intelligence (AI) distinguished mainly by its problem-solving approach.
Currently, the most successful approaches reside in the use of DL algorithms [1,2,3,4,5] with deep ConvNets, as demonstrated by a series of studies using models inspired by ConvNet architectures dating back to 2010. To this, we can add results that are more than indicative of the potential of these algorithms, some of which have become the state of the art and are even used in the medical field as diagnostic aids [6] or diagnostic tools [7,8].
The effectiveness of ConvNets no longer needs to be demonstrated, and their use in the field of imaging has made it possible to overcome many barriers that were until now insurmountable with older methods. In the same vein, we propose optimal and robust models for detecting the presence of DR at an early stage in fundus images, using three different models based on deep ConvNet-type neural network architectures.
The proposed solution combines the use of ConvNet architectures [9,10,11] taken from ImageNet, data augmentation, class balancing, and transfer learning. Our choice was oriented towards ConvNets due to the nature of the problem (supervised classification), as well as the type of data to be processed (images). The solution aims to detect the presence or absence of symptoms linked to DR according to the different stages of the pathology, from healthy to the final (proliferative) stage, corresponding to distinct classes of the pathology.
The evaluation will be made on a dataset of 2750 samples containing 5 classes, distributed by stage of the pathology by an ophthalmologist. The size of the images varies from 2400 × 3400 to 2500 × 3550. The images will undergo a preprocessing phase in which the raw data are transformed into a form that can be used by the model.
All models will be trained under the same conditions, and we will vary the optimization function for comparative purposes, retaining the model that obtains the best results to compare it with previous studies. Xception, Inception-ResNet V2, and DenseNet201 will be updated to suit the proposed deep ConvNet design.
The remainder of this paper is organized as follows. Section 2 shows how AI-based related work is at the service of early DR detection. Section 3 illustrates the deep ConvNet models for DR staging and detection. Section 4 shows the experimental setup and findings. Section 5 wraps up the article by reviewing the suggested approach’s possible uses and future possibilities.

2. AI at the Service of Early DR Detection: Issues, Challenges, and Approaches

DR is an ocular complication of diabetes caused by progressive damage to the small vessels of the retina. It is an irreversible disease, worsened by the duration of progression of diabetes and its imbalance. The wall of the vessels deteriorates, leading to a loss of tightness and areas of occlusion. Retinal abnormalities must be detected, monitored, and, if necessary, treated in order to prevent ocular complications of diabetes. Research on early DR focuses on the identification, characterization, and management of the disease in its early stages to prevent progression and minimize vision impairment [12,13,14,15].
Development of accurate and efficient screening methods for early DR detection is a key area of research. This includes the use of AI algorithms for analyzing retinal images to identify subtle changes indicative of early-stage disease [16,17,18]. Leveraging machine learning (ML) algorithms and deep neural networks, AI systems can analyze retinal images to identify signs of DR at its initial stages. In [19], AI algorithms are trained on large datasets of retinal images to learn patterns associated with DR. The algorithms can then automatically analyze new images and identify abnormalities. In [20,21,22], the authors used AI models which excelled at extracting subtle features indicative of early-stage DR, such as microaneurysms, hemorrhages, and exudates.
Convolutional Neural Networks (CNNs), a type of DL neural network, have shown significant success in image recognition tasks, including the classification of retinal images for DR [1,2,3,4,5]. In [23], the authors used CNNs to learn hierarchical representations of image features, allowing for the identification of intricate details relevant to disease progression.
In [24], the authors used AI models to classify DR into different severity grades, helping prioritize patients for appropriate interventions based on the level of retinal damage. The proposed systems can categorize retinal images into classes representing various stages of DR, from mild non-proliferative to severe proliferative DR. In [25,26], the authors used ensemble models to combine predictions from multiple AI algorithms, enhancing overall accuracy and robustness in detecting early signs of DR. They used voting systems to aggregate individual algorithm predictions, reducing the likelihood of false positives or false negatives. In [27,28,29], eXplainable AI (XAI) techniques were used. The aim is to make AI models more interpretable by providing insights into the decision-making process. This is crucial for gaining trust among healthcare professionals and ensuring the reliability of AI-based diagnoses. In [30], AI systems have been integrated with electronic health records, streamlining the workflow for healthcare professionals and ensuring seamless communication between different components of patient care.
Some AI solutions offer real-time analysis of retinal images, allowing for immediate feedback and intervention during routine clinical examinations [31,32].
While AI holds great promise in the early detection of DR, it is crucial to address challenges such as dataset biases, ethical considerations, and regulatory standards to ensure the responsible and effective deployment of AI in clinical settings. Ongoing research and collaboration between AI developers, healthcare professionals, and regulatory bodies are essential for harnessing the full potential of AI in improving DR detection and patient outcomes.

3. Deep ConvNet for DR Staging and Detecting

The models used in this article come from the ImageNet community, an independent organization that organizes computer vision competitions bringing together the best models in the field.
We will make changes to the different architectures to improve DR staging and detection.

3.1. Model Architectures

Xception [33] is a deep CNN architecture that involves depth-wise separable convolutions. It has an efficient architecture which is based on two main points:
  • Depthwise separable convolution: This is an alternative to classic convolutions that is much more efficient in terms of computation time; in other words, certain kernels or filters can be separated into as many one-dimensional vectors as there are dimensions (the x- and y-axes in the 2D case; in 3D, a depth vector is added). This overcomes the computational cost of classic convolutions.
  • Shortcuts between convolution blocks: The activation of one layer is transferred directly to a deeper layer of the neural network, forming what is called a residual or identity block.
In the inception module, a series of operations is performed in parallel instead of just one, as in a classic ConvNet. These operations can be convolutions with different filter sizes (5 × 5, 3 × 3, 1 × 1), as well as average pooling. Typical parallel branches are average pooling, a 1 × 1 convolution, and a 1 × 1 convolution followed by a 5 × 5 convolution; the parameters of this combination can be chosen so that the module's output is smaller than its input. The results of the different operations are then concatenated and sent to the next inception module.
Figure 1 shows the basic architecture of XceptionNet.
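As a rough illustration of why depthwise separable convolutions are cheaper, the following sketch compares parameter counts for a classic convolution and its separable counterpart. This is a minimal sketch in Keras, which the paper does not publish; the feature-map shape and channel counts are toy values chosen for the example.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Toy input: a 32x32 feature map with 64 channels (illustrative shape only).
inp = layers.Input(shape=(32, 32, 64))

# Classic convolution: each of the 128 output channels uses a 3x3 kernel
# spanning all 64 input channels.
classic = models.Model(inp, layers.Conv2D(128, (3, 3), padding="same")(inp))

# Depthwise separable convolution (as in Xception): one 3x3 filter per input
# channel, followed by a 1x1 pointwise convolution that mixes channels.
separable = models.Model(inp, layers.SeparableConv2D(128, (3, 3), padding="same")(inp))

print("classic conv parameters:  ", classic.count_params())    # 3*3*64*128 + 128 = 73,856
print("separable conv parameters:", separable.count_params())  # 3*3*64 + 64*128 + 128 = 8,896
```

The roughly eightfold parameter reduction in this toy case is the efficiency gain the Xception design exploits.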
Inception-ResNet V2 [34] is a CNN trained on over a million images from the ImageNet dataset. The network can categorize images into 1000 different object categories and has a depth of 164 layers. Consequently, the network has learned a vast array of rich feature representations. Its inception module is similar to the Inception module shown in Figure 1a.
Figure 2 shows the Inception-ResNet V2 architecture.
DenseNet [35] is a newer ConvNet architecture (from ImageNet) which has obtained high results on classification datasets using fewer parameters than its predecessors (other ImageNet models). This architecture is composed of dense blocks. In these blocks, the layers are closely linked to each other: each layer receives as input all the output feature maps of the previous layers. Figure 3 shows the DenseNet architecture.
Extreme use of residuals is the mechanism DenseNet uses to connect the layers of one block with those of the following blocks.
The shorter connections create a form of deep supervision, because each layer receives increased loss-function supervision. Within a layer of a dense block, we find the following operations:
  • Batch normalization: This is a technique that accelerates the convergence of a model and improves its performance;
  • ReLU activation;
  • 3 × 3 convolution.
A transition layer is employed to regulate the model's complexity. It reduces complexity by using a 1 × 1 convolutional layer to decrease the number of channels and by halving the height and width with an average pooling layer of stride 2. It also provides a compression function, which helps improve the compactness of the model.
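The following minimal sketch illustrates the dense block and transition layer described above. It is an illustrative Keras reconstruction, not the authors' code; the growth rate and compression factor are hypothetical values.

```python
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    """Dense block: each layer receives the concatenated outputs of all previous layers."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)             # BN -> ReLU -> 3x3 conv, as listed above
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, (3, 3), padding="same")(y)
        x = layers.Concatenate()([x, y])               # dense connectivity
    return x

def transition_layer(x, compression=0.5):
    """Transition layer: 1x1 conv shrinks channels; stride-2 average pooling halves H and W."""
    channels = int(int(x.shape[-1]) * compression)     # compression factor is an assumption
    x = layers.BatchNormalization()(x)
    x = layers.Conv2D(channels, (1, 1))(x)
    x = layers.AveragePooling2D(pool_size=(2, 2), strides=2)(x)
    return x
```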
All the models mentioned above underwent the following architectural transformation (a sketch of the resulting classification head follows the list):
  • For the input layer, we add:
    1 layer (299, 299, 3) for Xception and Inception-ResNet V2.
    1 layer (224, 224, 3) for DenseNet201.
  • For the output layer, we add:
    A dense layer (fully connected, with 256 neurons) + ReLU activation function.
    A dropout layer (0.4): units (hidden or visible) are dropped at random with probability $1-p$ or kept with probability $p$, leaving a reduced network; the dropped neurons are not taken into account during the forward or backward pass, which helps avoid overfitting.
    A categorical cross-entropy loss computed over three categories (for our three classes).
    A softmax layer.
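A minimal sketch of this transformation is shown below for DenseNet201; the other two backbones follow the same pattern with (299, 299, 3) inputs. The global-average-pooling step used to flatten the backbone output is our assumption, as the paper does not specify how the backbone is connected to the dense layer.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet201

# ImageNet-pretrained backbone, without its original 1000-class head.
base = DenseNet201(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

x = layers.GlobalAveragePooling2D()(base.output)  # assumed flattening step
x = layers.Dense(256, activation="relu")(x)       # fully connected layer, 256 neurons + ReLU
x = layers.Dropout(0.4)(x)                        # each unit dropped at random with rate 0.4
out = layers.Dense(3, activation="softmax")(x)    # softmax over the three DR classes

model = models.Model(base.input, out)
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",    # three-category cross-entropy loss
              metrics=["accuracy"])
```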
The selection and tuning of the hyperparameters used in our study are as follows (a configuration sketch follows the list):
  • Learning rate: a unified learning rate of 0.001 was applied with the three optimization methods (Adam, Adagrad, and SGD).
  • Batch size: a batch size of 32 was used for both training and testing, balancing the trade-off between computational efficiency and model performance.
  • Dropout rate: a dropout layer with a rate of 0.4 was applied to counter overfitting by randomly ablating redundant neurons.
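Under these assumptions, the three optimizer cases of Table 1 could be set up as follows. `model` refers to the head sketched above, while `train_ds` and `test_ds` stand for hypothetical dataset objects; this is a sketch, not the authors' training script.

```python
from tensorflow.keras import optimizers

# One configuration per case in Table 1; only the optimizer changes.
cases = {
    "(a)": optimizers.Adam(learning_rate=0.001),
    "(b)": optimizers.Adagrad(learning_rate=0.001),
    "(c)": optimizers.SGD(learning_rate=0.001),
}

for name, opt in cases.items():
    # In a full experiment, each case would start from freshly loaded pretrained weights.
    model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=["accuracy"])
    # model.fit(train_ds, validation_data=test_ds, epochs=100, batch_size=32)
```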
Although EfficientNet and Vision Transformers show great potential, their high overhead during processing and requirement of extensive data augmentation meant they were not well-suited to our study. With improvements in hardware capabilities and dataset sizes, these models can perhaps be included in future research studies. Xception, Inception-ResNet V2, and DenseNet201 were chosen based on a few key considerations:
  • Medical imaging task performance: Convolutional Neural Networks (CNNs) have recorded remarkable performance in medical image classification tasks, specifically diabetic retinopathy detection. Xception, Inception-ResNet V2, and DenseNet201 have been thoroughly explored in the literature, and they have been found to possess excellent feature extraction capabilities for fundus images.
  • Stability in training and computational efficiency: Although EfficientNet is renowned for parameter efficiency, we favored models that have been stable and consistent in their training on comparably modest medical datasets. Despite their strength, Vision Transformers (ViTs) usually need significantly larger datasets and processing resources to be comparable with other models.
  • Transfer learning and pretrained weights: The chosen models are already pretrained on ImageNet, enabling efficient transfer learning for our dataset. This helped counter the problem of sparse training data, since medical datasets tend to be small and not very diverse.

3.2. Datasets

We used a dataset of 2750 samples provided by the Kaggle platform, containing 5 classes distributed by stage of the pathology by a specialist (ophthalmologist); the size of the images varies from 2400 × 3400 to 2500 × 3550 [36].
The fundus images were checked by an ophthalmologist, who classified them into five categories (classes), each of which represents a stage of the disease:
  • 0—Healthy (No DR): 100 images;
  • 1—Mild DR: 370 images;
  • 2—Moderate DR: 900 images;
  • 3—Severe DR: 190 images;
  • 4—Proliferative DR: 290 images.
In our study, we were able to use only 1134 data samples (due to resource constraints), split into three classes:
  • Class #1: Healthy (No DR);
  • Class #2: Merge Mild DR and Moderate DR classes;
  • Class #3: Merge Severe DR and Proliferative DR classes.
Issues with data distribution and model performance were the main factors in the decision to reduce the original five classes to three. There were fewer photos in the mild and severe DR categories, indicating an unequal sample distribution across classes, which could have biased model predictions. Second, the dataset annotations obtained from Kaggle reflected inconsistent classification between mild and moderate DR, a distinction that can be tricky even for specialists. Mild and moderate DR share overlapping clinical characteristics, such as microaneurysms and small hemorrhages. With a limited data size, training to discriminate these two stages with few labeled examples would likely have led to misclassifications. By merging them into a single class (i.e., #2), we aimed to further improve model robustness while still enabling early-stage detection. This merged class allowed the model to excel at identifying early-stage DR rather than performing poorly while attempting to differentiate between two very similar classes with limited available data.
The dataset was split into a training set (80%) and a test set (20%), with a random arrangement of categories within each set. The data then undergo a preprocessing phase in which the raw data are transformed into a form usable by the model.
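A sketch of the class merging and the 80/20 split, assuming scikit-learn and hypothetical `images` and `labels5` arrays; the stratification option is our assumption, as the paper only states that categories were randomly arranged.

```python
from sklearn.model_selection import train_test_split

# Map the five Kaggle grades onto the three study classes.
merge = {0: 0,          # healthy
         1: 1, 2: 1,    # mild + moderate -> middle/early DR
         3: 2, 4: 2}    # severe + proliferative -> severe DR
labels3 = [merge[g] for g in labels5]   # labels5: hypothetical list of 0-4 grades

# 80/20 split with shuffling; stratification keeps class proportions comparable.
X_train, X_test, y_train, y_test = train_test_split(
    images, labels3, test_size=0.20, shuffle=True, stratify=labels3, random_state=42)
```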

3.3. Data Preprocessing

Preprocessing is the step during which the raw data undergo processing to make them usable by the model.
In our case, the images are fundus photographs; the treatments applied to them are as follows (a sketch of this pipeline appears after the list):
  • Change of data format to tensor (matrix with dimension greater than 3).
  • We carried out under-sampling (sample reduction) on our data to overcome the class imbalance problem, bringing all classes down to the number of samples of the class with the fewest samples.
  • Grouping of classes with a low number of samples as follows:
    Class #0: Healthy (No DR) (Class #0).
    Class #1: Middle DR (Class #1 and #2).
    Class #2: Severe DR (Class #3 and #4).
  • Increase in the number of data available per class in a homogeneous manner.
  • Normalize images by subtracting the minimum pixel intensity of each channel and dividing by the average pixel intensity, so as to represent pixels in a range of 0 to 1.
  • Image processing using the CLAHE algorithm (Contrast-Limited Adaptive Histogram Equalization) [37] on RGB images, whose role is to improve the quality of the image. First, the image is separated into rectangular parts that resemble a grid, and each zone is subjected to typical histogram equalization. All these regions are combined, to obtain a complete optimized image.
  • Resizing images to a format acceptable to models:
    (299, 299, 3) for Xception and Inception-ResNet V2.
    (224, 224, 3) for DenseNet201.
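A hedged sketch of this preprocessing pipeline using OpenCV is shown below; the CLAHE clip limit and tile grid size are assumed defaults, since the paper does not report them.

```python
import cv2
import numpy as np

def preprocess(path, target_size=(224, 224)):
    """Illustrative pipeline: CLAHE per RGB channel, min/mean normalization, resizing."""
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)

    # Contrast-Limited Adaptive Histogram Equalization, applied channel by channel.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # assumed parameters
    img = np.stack([clahe.apply(img[..., c]) for c in range(3)], axis=-1)

    # Resize to the model's expected input, e.g., (224, 224) for DenseNet201.
    img = cv2.resize(img, target_size).astype(np.float32)

    # Subtract each channel's minimum and divide by its mean, as described above.
    img -= img.min(axis=(0, 1), keepdims=True)
    img /= img.mean(axis=(0, 1), keepdims=True) + 1e-8
    return img
```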

3.4. Improving Model Performance

To improve training performance, we used transfer learning to save the knowledge learned by one model and reuse it in another. In our case, each model (Xception, Inception-ResNet V2, and DenseNet201) is pretrained on the ImageNet dataset (image classification task), and we then reuse it, with its already-learned parameters, on our problem (DR detection).
Another way to increase the performance of a model is to perform data augmentation. Data augmentation is a process that allows models to generalize better. In practice, this involves performing transformations on an image during training, such as rotations, image enlargements, or flips. This makes it possible to increase the variability of a database from the original images for many reasons:
  • The model requires more data to be trained, creating a risk of over- or under-training (everything will depend on the behavior of the model).
  • The dataset is unbalanced, which results in an imbalance within classes having a disproportionate number of samples (which makes the evaluation task more complex).
To augment our data, we generated new samples from the originals by transposing images and rotating them at different angles (120°, 72°, and 45°).
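A minimal sketch of this augmentation step, assuming NumPy image arrays in (height, width, channel) layout and SciPy for rotation:

```python
import numpy as np
from scipy import ndimage

def augment(img):
    """Generate extra samples: one transposition plus rotations at 120, 72, and 45 degrees."""
    samples = [img.transpose(1, 0, 2)]   # swap height and width axes (transposition)
    for angle in (120, 72, 45):
        samples.append(ndimage.rotate(img, angle, reshape=False, mode="nearest"))
    return samples
```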

4. Simulation Results and Discussion

4.1. Testing and Evaluation

We subjected our models to a series of tests to evaluate their performance. Given the medical domain in which the model is intended to operate, we considered metrics that best respect the constraints of that field; our choices focused on sensitivity, specificity, and the ROC-AUC curve.
Sensitivity represents the True Positive Rate (TPR), or recall: the proportion of actual positives that the test identifies correctly. In other words, among a population of patients who have the disease, sensitivity is the proportion who were correctly predicted as positive. The mathematical definition is given by Equation (1).
$\mathrm{Sensitivity} = \dfrac{TP}{TP + FN}$  (1)
where TP (True Positive) is the number of cases that the test declares positive and which are positive, FP (False Positive) is the number of cases that the test declares positive but which are negative, TN (True Negative) is the number of cases that the test declares negative and which are negative, and FN (False Negative) is the number of cases that the test declares negative but which are positive.
Specificity represents the true negative rate, which means the proportion of people who were predicted to be negative on the test. It aims to calculate the proportion of people who do not suffer from the disease, and who have been predicted as such. The mathematical definition is given by Equation (2).
$\mathrm{Specificity} = \dfrac{TN}{TN + FP}$  (2)
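For a multiclass problem, both metrics can be computed per class in a one-vs-rest fashion from the confusion matrix. The sketch below (scikit-learn assumed) follows the definitions of Equations (1) and (2); the function name is ours, not the paper's.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def sensitivity_specificity(y_true, y_pred, n_classes=3):
    """One-vs-rest sensitivity and specificity per class, per Equations (1) and (2)."""
    cm = confusion_matrix(y_true, y_pred, labels=list(range(n_classes)))
    scores = {}
    for c in range(n_classes):
        tp = cm[c, c]
        fn = cm[c, :].sum() - tp          # class-c samples predicted as something else
        fp = cm[:, c].sum() - tp          # other samples predicted as class c
        tn = cm.sum() - tp - fn - fp
        scores[c] = {"sensitivity": tp / (tp + fn),
                     "specificity": tn / (tn + fp)}
    return scores
```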
Taking these two metrics into account, as well as the correct choice of the threshold, will allow us to maximize the specificity and sensitivity values.
This step is crucial because it will allow us to avoid certain cases which would make our model unreliable for medical diagnosis.
  • Case 1 (Low specificity): Predict sickness among healthy patients.
  • Case 2 (Low sensitivity): Predict health among sick patients.
  • Case 3 (Low sensitivity and specificity): Wrong prediction, the model would have a prediction value below a random model.
The best case would be to have maximum specificity as well as sensitivity, but this turns out to be a complex task because there is a trade-off between these two metrics: pushing one to its maximum deteriorates the other. Thus, we compromise specificity in favor of sensitivity.
In a practical case, it would be more judicious to have a so-called healthy patient predicted as sick than the opposite because we could always have them undergo additional tests confirming or not the veracity of our diagnosis. To predict the opposite would be disastrous, especially in a field as critical as the medical field.
The ROC curve plots sensitivity against 1 − Specificity (the false positive rate) [38]. This curve is widely used when dealing with the metrics mentioned above, as it provides information on the behavior of the model. The AUC, or area under the ROC curve, is the most important value for estimating a model's performance from the ROC curve.
Since the ROC-AUC curve is only defined for binary classification problems, we extend it to our multiclass problem using the One vs. All technique. Thus, for our three classes, Class #0, Class #1, and Class #2, the ROC for Class #0 is generated by classifying 0 against not-0 (i.e., 1 and 2), the ROC for Class #1 by classifying 1 against not-1, and so on.
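A sketch of this One vs. All construction with scikit-learn, where `y_test` holds the integer labels and `y_score` stands for the hypothetical softmax outputs of a trained model (e.g., `model.predict(X_test)`):

```python
from sklearn.preprocessing import label_binarize
from sklearn.metrics import roc_curve, auc

# y_test: integer labels in {0, 1, 2}; y_score: softmax outputs of shape (n_samples, 3).
y_bin = label_binarize(y_test, classes=[0, 1, 2])

for c in range(3):
    fpr, tpr, _ = roc_curve(y_bin[:, c], y_score[:, c])   # class c vs. the rest
    print(f"Class #{c}: AUC = {auc(fpr, tpr):.2f}")
```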

4.2. Results

The process of adjusting weights to produce increasingly accurate predictions about data is known as parameter optimization. This process can be considered a scientific method in which a hypothesis is formulated and tested against reality. Then, this hypothesis is refined or replaced (possibly several times) to better describe events in the world.
The problem of choosing appropriate weights for the model is a daunting task because a DL model usually consists of millions of parameters. It is, therefore, necessary to choose an optimization algorithm adapted to the type of application being processed. Among the optimization functions most used in the literature are Adam (Adaptive Moment Estimation), Adagrad (Adaptive Gradient Algorithm), and SGD (Stochastic Gradient Descent) [39].
Table 1 presents Xception, Inception-ResNet V2, and DenseNet setups.
Regarding the number of iterations, it was set to 100 (epochs = 100), with a batch_size of 32 for the training phase and the testing phase.
Figure 4 shows the ROC AUC curves for the three classes, #0, #1, and #2, with the Xception architecture. For Classes #1 and #2, the AUC is 0.5, which means that the model behaves randomly. For Class #0, the AUC = 1, which means that the model is in the ideal case for predicting Class #0. Figure 4c shows an AUC of 0 for Classes #0 and #1, which means that the model is not able to distinguish between positives and negatives; for Class #2, it reached the maximum of 1.
Figure 5 shows the ROC AUC curves for the three classes, #0, #1, and #2, with the Inception-ResNetV2 architecture. For Classes #1 and #2, the AUC was 0.5; thus, the model behaves randomly on these classes. For Class #0, it obtained a maximum of 1.
In Figure 5a, Class #0 obtained an AUC of 0, which means that the model cannot distinguish between positives and negatives; Class #1 obtained the maximum; and Class #2 is at a random level. The results shown in Figure 5b,c are at 0 for all three classes, which means that the model is not able to distinguish between positives and negatives for any of them.
Figure 6 shows the ROC AUC curves for the three classes, #0, #1, and #2, with the DenseNet architecture. Classes #0 and #2 obtained the maximum AUC of 1, as shown in Figure 6a, while Class #1 obtained 0.5, meaning the model is random for the early-stage class. Classes #0 and #2 in Figure 6b obtained an AUC of 1, but Class #1 obtained an AUC of 0.
After the series of experimental tests carried out on all the configurations cited in Table 2, DenseNet Case (c) obtained the best results in terms of evaluation metrics as well as prediction time, placing it ahead of the other two models. The Inception-ResNetV2 model produced the worst results in terms of both specificity and sensitivity, which is clearly visible in the ROC curve graphs regardless of the optimization algorithm used.
Concerning the Xception model, the results are indeed better than Inception-ResNetV2, but remain quite low given the costly training time; however, it remains faster in terms of prediction than its predecessors Inception V3 and V4.
The DenseNet model provided the best results in this study, outperforming Xception and Inception-ResNet V2 by far in terms of specificity and sensitivity; however, it has too long a prediction time due to the model depth.
Finally, regarding the methodology followed throughout this study, our model obtained high classification results. We can also note a significant improvement for Class #2, which corresponds to the early stage of the pathology and was a major problem in previous studies.
Table 3 presents the results of statistical significance tests comparing DenseNet201 with Xception and Inception-ResNet V2 using t-tests and Wilcoxon tests. DenseNet201 performs significantly better than both Xception and Inception-ResNet V2 using the t-test. The Wilcoxon test confirms that the differences in performance between DenseNet201 and the other models are statistically significant.
Since all p-values are below 0.05, the differences in model performance are statistically significant. DenseNet201 is statistically proven to outperform Xception and Inception-ResNet V2 in this study. This supports the claim that DenseNet201 is the best-performing model among the three for this task.
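For reproducibility, such a comparison can be run with SciPy. The sketch below assumes paired per-run scores (the paper does not specify the pairing), and the numbers are purely illustrative placeholders, not the study's data.

```python
from scipy import stats

# Purely illustrative placeholder scores (e.g., AUC per repeated run); NOT the study's data.
densenet_scores = [0.91, 0.89, 0.93, 0.90, 0.92]
xception_scores = [0.85, 0.84, 0.88, 0.83, 0.86]

t_stat, p_t = stats.ttest_rel(densenet_scores, xception_scores)  # paired t-test
w_stat, p_w = stats.wilcoxon(densenet_scores, xception_scores)   # Wilcoxon signed-rank test

print(f"paired t-test p = {p_t:.3f}, Wilcoxon p = {p_w:.3f}")
```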

5. Conclusions and Limitations

This study explored an optimized deep ConvNet method for DR staging and detection from fundus images. The models showed a significant improvement in classification accuracy with the help of transfer learning, data augmentation, and class balancing. Upon comparing different architectures, it was observed that DenseNet201 provided higher sensitivity and specificity measures compared to Xception and Inception-ResNet V2 at the cost of prediction efficiency due to depth. The results demonstrate the potential of DL to diagnose DR automatically, eliminating the need for expert ophthalmologists and permitting earlier treatment to prevent severe visual handicaps.
Despite such promising findings, there are limitations. Although the dataset used for this study is adequate, additional balanced and representative samples should be added to further enhance the generalizability of the model. Moreover, despite the high accuracy of DL models, the low interpretability of these models may limit their acceptance in clinical applications. Overcoming these limitations is essential to enabling AI-based DR detection to make it to the clinic.
Further improvements in model performance and stability can be achieved by optimizing the preprocessing techniques and integrating domain-specific optimizations for retinal image processing. In addition, optimizing balancing techniques, particularly for the severity levels of underrepresented DR, would result in increased classification reliability. Future studies should also tackle the creation of hybrid architectures that take advantage of convolutional neural networks together with attention mechanisms to further improve feature extraction efficiency.
Validation of these results in medical centers is the next significant benchmark. The coordination of the hospital and the clinical center will facilitate easy testing of the performance in different sets of patient groups and image states. An AI usability interface will also help to master computerized DR screening tools in routine general medical consultations. With rigorous testing and integration into clinical practice, AI-driven DR detection has the potential to transform early diagnosis and treatment, eventually stopping irreversible vision loss in diabetic patients.

Author Contributions

Conceptualization, M.S.H., A.H. and M.A.; Methodology, M.S.H. and A.H.; Software, M.S.H. and A.H.; Validation, M.S.H., A.H. and M.A.; Formal analysis, S.A.A. and E.A.; Investigation, E.A.; Resources, S.A.A. and M.A.; Data curation, S.A.A., M.A. and E.A.; Writing—original draft, S.A.A., M.A. and E.A.; Writing—review and editing, A.H.; Visualization, S.A.A. and E.A.; Supervision, M.S.H.; Project administration, M.S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original data presented in the study are openly available at https://www.kaggle.com/datasets/sachinkumar413/diabetic-retinopathy-dataset (accessed on 23 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Nguyen, Q.H.; Muthuraman, R.; Singh, L.; Sen, G.; Tran, A.C.; Nguyen, B.P.; Chua, M. Diabetic Retinopathy Detection Using Deep Learning. In Proceedings of the 4th International Conference on Machine Learning and Soft Computing, Haiphong, Vietnam, 17–19 January 2020; pp. 103–107. [Google Scholar] [CrossRef]
  2. Joshi, S.; Kumar, R.; Rai, P.K.; Garg, S. Diabetic Retinopathy Using Deep Learning. In Proceedings of the 2023 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES), Greater Noida, India, 28–30 April 2023; pp. 145–149. [Google Scholar] [CrossRef]
  3. Wahab Sait, A.R. A Lightweight Diabetic Retinopathy Detection Model Using a Deep-Learning Technique. Diagnostics 2023, 13, 3120. [Google Scholar] [CrossRef] [PubMed]
  4. Alwakid, G.; Gouda, W.; Humayun, M. Deep Learning-Based Prediction of Diabetic Retinopathy Using CLAHE and ESRGAN for Enhancement. Healthcare 2023, 11, 863. [Google Scholar] [CrossRef] [PubMed]
  5. Ten Dam, W.; Grol, M.; Zeegers, Z.; Dehghani, A.; Aldewereld, H. Representative Data Generation of Diabetic Retinopathy Synthetic Retinal Images. In Proceedings of the 2023 Conference on Human Centered Artificial Intelligence: Education and Practice, Dublin, Ireland, 14–15 December 2023; pp. 9–15. [Google Scholar] [CrossRef]
  6. Xu, K.; Feng, D.; Mi, H. Deep Convolutional Neural Network-Based Early Automated Detection of Diabetic Retinopathy Using Fundus Image. Molecules 2017, 22, 2054. [Google Scholar] [CrossRef]
  7. Gargeya, R.; Leng, T. Automated Identification of Diabetic Retinopathy Using Deep Learning. Ophthalmology 2017, 124, 962–969. [Google Scholar] [CrossRef]
  8. Jimenez-Baez, M.V.; Márquez-González, H.; Barcenas-Contreras, R.; Morales Montoya, C.; García, L.F. Early diagnosis of diabetic retinopathy in primary care. Colomb. Médica 2015, 46, 14–18. [Google Scholar] [CrossRef]
  9. García, G.; Gallardo, J.; Mauricio, A.; López, J.; Carpio, C.D. Detection of Diabetic Retinopathy Based on a Convolutional Neural Network Using Retinal Fundus Images. In Proceedings of the Artificial Neural Networks and Machine Learning —ICANN 2017—26th International Conference on Artificial Neural Networks, Alghero, Italy, 11–14 September 2017; Proceedings, Part II. Lecture Notes in Computer Science. Lintas, A., Rovetta, S., Verschure, P.F.M.J., Villa, A.E.P., Eds.; Springer: Cham, Switzerland, 2017; Volume 10614, pp. 635–642. [Google Scholar] [CrossRef]
  10. Firke, S.N.; Jain, R.B. Convolutional Neural Network for Diabetic Retinopathy Detection. In Proceedings of the 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), Coimbatore, India, 25–27 March 2021; pp. 549–553. [Google Scholar] [CrossRef]
  11. Rashid, H.; Mohsin Abdulazeez, A.; Hasan, D. Detection of Diabetic Retinopathy Based on Convolutional Neural Networks: A Review. Asian J. Res. Comput. Sci. 2021, 8, 1–15. [Google Scholar] [CrossRef]
  12. Hernández, C.; Porta, M.; Bandello, F.; Grauslund, J.; Harding, S.P.; Aldington, S.J.; Egan, C.; Frydkjaer-Olsen, U.; García-Arumí, J.; Gibson, J.; et al. The Usefulness of Serum Biomarkers in the Early Stages of Diabetic Retinopathy: Results of the EUROCONDOR Clinical Trial. J. Clin. Med. 2020, 9, 1233. [Google Scholar] [CrossRef]
  13. Waheed, N.K.; Rosen, R.B.; Jia, Y.; Munk, M.R.; Huang, D.; Fawzi, A.; Chong, V.; Nguyen, Q.D.; Sepah, Y.; Pearce, E. Optical coherence tomography angiography in diabetic retinopathy. Prog. Retin. Eye Res. 2023, 97, 101206. [Google Scholar] [CrossRef]
  14. Abràmoff, M.D.; Reinhardt, J.M.; Russell, S.R.; Folk, J.C.; Mahajan, V.B.; Niemeijer, M.; Quellec, G. Automated Early Detection of Diabetic Retinopathy. Ophthalmology 2010, 117, 1147–1154. [Google Scholar] [CrossRef]
  15. Padmapriya, M.; Pasupathy, S.; Punitha, V. Early diagnosis of diabetic retinopathy using unsupervised learning. Soft Comput. 2023, 27, 9093–9104. [Google Scholar] [CrossRef]
  16. Wang, Z.; Li, Z.; Li, K.; Mu, S.; Zhou, X.; Di, Y. Performance of artificial intelligence in diabetic retinopathy screening: A systematic review and meta-analysis of prospective studies. Front. Endocrinol. 2023, 14, 1197783. [Google Scholar] [CrossRef] [PubMed]
  17. Barakat, A.A.; Mobarak, O.; Javaid, H.A.; Awad, M.R.; Hamweyah, K.; Ouban, A.; Al-Hazzaa, S.A.F. The application of artificial intelligence in diabetic retinopathy screening: A Saudi Arabian perspective. Front. Med. 2023, 10, 1303300. [Google Scholar] [CrossRef] [PubMed]
  18. Poly, T.N.; Islam, M.M.; Walther, B.A.; Lin, M.C.; Li, Y.J. Artificial intelligence in diabetic retinopathy: Bibliometric analysis. Comput. Methods Programs Biomed. 2023, 231, 107358. [Google Scholar] [CrossRef] [PubMed]
  19. Noriega, A.; Meizner, D.; Camacho, D.; Enciso, J.; Quiroz-Mercado, H.; Morales-Canton, V.; Almaatouq, A.; Pentland, A. Screening Diabetic Retinopathy Using an Automated Retinal Image Analysis System in Mexico: Independent and Assistive use Cases. medRxiv 2020. [Google Scholar] [CrossRef]
  20. Das, D.; Das, S.; Biswas, S.K.; Purkayastha, B. Deep Diabetic Retinopathy Feature eXtraction and Random Forest based ensemble Classification System (DDRFXRFCS). In Proceedings of the 2021 Asian Conference on Innovation in Technology (ASIANCON), Pune, India, 27–29 August 2021; pp. 1–7. [Google Scholar] [CrossRef]
  21. Uppamma, P.; Bhattacharya, S. A multidomain bio-inspired feature extraction and selection model for diabetic retinopathy severity classification: An ensemble learning approach. Sci. Rep. 2023, 13, 18572. [Google Scholar] [CrossRef]
  22. Usman, T.M.; Saheed, Y.K.; Ignace, D.; Nsang, A. Diabetic retinopathy detection using principal component analysis multi-label feature extraction and classification. Int. J. Cogn. Comput. Eng. 2023, 4, 78–88. [Google Scholar] [CrossRef]
  23. de Sousa, T.F.; Camilo, C.G. HDeep: Hierarchical Deep Learning Combination for Detection of Diabetic Retinopathy. Procedia Comput. Sci. 2023, 222, 425–434. [Google Scholar] [CrossRef]
  24. Tajudin, N.M.A.; Kipli, K.; Mahmood, M.H.; Lim, L.T.; Awang Mat, D.A.; Sapawi, R.; Sahari, S.K.; Lias, K.; Jali, S.K.; Hoque, M.E. Deep learning in the grading of diabetic retinopathy: A review. IET Comput. Vis. 2022, 16, 667–682. [Google Scholar] [CrossRef]
  25. Shen, Z.; Wu, Q.; Wang, Z.; Chen, G.; Lin, B. Diabetic Retinopathy Prediction by Ensemble Learning Based on Biochemical and Physical Data. Sensors 2021, 21, 3663. [Google Scholar] [CrossRef]
  26. Reddy, G.T.; Bhattacharya, S.; Siva Ramakrishnan, S.; Chowdhary, C.L.; Hakak, S.; Kaluri, R.; Praveen Kumar Reddy, M. An Ensemble based Machine Learning model for Diabetic Retinopathy Classification. In Proceedings of the 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 24–25 February 2020; pp. 1–6. [Google Scholar] [CrossRef]
  27. Quellec, G.; Al Hajj, H.; Lamard, M.; Conze, P.H.; Massin, P.; Cochener, B. ExplAIn: Explanatory artificial intelligence for diabetic retinopathy diagnosis. Med. Image Anal. 2021, 72, 102118. [Google Scholar] [CrossRef]
  28. Obayya, M.; Nemri, N.; Nour, M.K.; Al Duhayyim, M.; Mohsen, H.; Rizwanullah, M.; Sarwar Zamani, A.; Motwakel, A. Explainable Artificial Intelligence Enabled TeleOphthalmology for Diabetic Retinopathy Grading and Classification. Appl. Sci. 2022, 12, 8749. [Google Scholar] [CrossRef]
  29. Shorfuzzaman, M.; Hossain, M.S.; El Saddik, A. An Explainable Deep Learning Ensemble Model for Robust Diagnosis of Diabetic Retinopathy Grading. ACM Trans. Multimed. Comput. Commun. Appl. 2021, 17, 1–14. [Google Scholar] [CrossRef]
  30. Lin, W.C.; Chen, J.S.; Chiang, M.F.; Hribar, M.R. Applications of Artificial Intelligence to Electronic Health Record Data in Ophthalmology. Transl. Vis. Sci. Technol. 2020, 9, 13. [Google Scholar] [CrossRef]
  31. Gupta, S.; Panwar, A.; Kapruwan, A.; Chaube, N.; Chauhan, M. Real Time Analysis of Diabetic Retinopathy Lesions by Employing Deep Learning and Machine Learning Algorithms using Color Fundus Data. In Proceedings of the 2022 International Conference on Innovative Trends in Information Technology (ICITIIT), Kottayam, India, 12–13 February 2022; pp. 1–5. [Google Scholar] [CrossRef]
  32. Ruamviboonsuk, P.; Tiwari, R.T.; Sayres, R.; Nganthavee, V.; Hemarat, K.; Kongprayoon, A.; Raman, R.; Levinstein, B.; Liu, Y.; Schaekermann, M.; et al. Real-time diabetic retinopathy screening by deep learning in a multisite national screening programme: A prospective interventional cohort study. Lancet Digit. Health 2022, 4, E235–E244. [Google Scholar] [CrossRef]
  33. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  34. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  35. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
  36. Kumar, S. Diabetic Retinopathy Dataset. 2023. Available online: https://www.kaggle.com/datasets/way2tutorials/diabetic-retinopathy-dataset-2023 (accessed on 23 November 2024).
  37. Hana, F.; Maulida, I. Analysis of contrast limited adaptive histogram equalization (CLAHE) parameters on finger knuckle print identification. J. Phys. Conf. Ser. 2021, 1764, 012049. [Google Scholar] [CrossRef]
  38. Hanley, J.A.; McNeil, B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143, 29–36. [Google Scholar] [CrossRef]
  39. Mustapha, A.; Mohamed, L.; Ali, K. Comparative study of optimization techniques in deep learning: Application in the ophthalmology field. J. Phys. Conf. Ser. 2021, 1743, 012002. [Google Scholar] [CrossRef]
Figure 1. Basic architecture of XceptionNet: (a) Inception module, (b) XceptionNet module [33].
Figure 2. Inception-ResNet V2 architecture [34].
Figure 3. DenseNet architecture [35].
Figure 4. ROC AUC curves for the three classes, #0, #1, and #2, with the Xception architecture.
Figure 5. ROC AUC curves for the three classes, #0, #1, and #2, with the Inception-ResNetV2 architecture.
Figure 6. ROC AUC curves for the three classes, #0, #1, and #2, with the DenseNet architecture.
Table 1. Xception, Inception-ResNet V2, and DenseNet setups.

| Case | Learning Rate | Dropout Layer | Input Resolution | Optimizer |
|------|---------------|---------------|------------------|-----------|
| (a) | 0.001 | 0.4 | (299, 299, 3), (299, 299, 3), (224, 224, 3) | Adam |
| (b) | 0.001 | 0.4 | (299, 299, 3), (299, 299, 3), (224, 224, 3) | Adagrad |
| (c) | 0.001 | 0.4 | (299, 299, 3), (299, 299, 3), (224, 224, 3) | SGD |
Table 2. Comparative study.

| Model | Case | Class | Specificity | Sensitivity | AUC ROC |
|-------|------|-------|-------------|-------------|---------|
| Xception | (a) | #0 | 0.88 | 0.33 | 1 |
| | | #1 | 1.54 | 0.24 | 0.50 |
| | | #2 | 0.66 | 0.39 | 0.50 |
| | (b) | #0 | 0.79 | 0.35 | 1 |
| | | #1 | 0.85 | 0.33 | 0.50 |
| | | #2 | 1.10 | 0.24 | 0.50 |
| | (c) | #0 | 1.0 | 0.33 | 0 |
| | | #1 | 0.9 | 0.45 | 0 |
| | | #2 | 1.51 | 0.29 | 1 |
| Inception-ResNetV2 | (a) | #0 | 2.5 | 0.18 | 0.50 |
| | | #1 | 0.36 | 0.50 | 0 |
| | | #2 | 1.13 | 0.26 | 0.50 |
| | (b) | #0 | 0.96 | 0.38 | 0 |
| | | #1 | 1.02 | 0.38 | 1 |
| | | #2 | 1.30 | 0.29 | 0.5 |
| | (c) | #0 | 0.90 | 0.39 | 0 |
| | | #1 | 1.12 | 0.32 | 0 |
| | | #2 | 1.12 | 0.32 | 0 |
| DenseNet | (a) | #0 | 0.88 | 0.41 | 0 |
| | | #1 | 1.33 | 0.36 | 0 |
| | | #2 | 1.41 | 0.33 | 0 |
| | (b) | #0 | 0.57 | 0.44 | 1 |
| | | #1 | 1.54 | 0.25 | 0.50 |
| | | #2 | 1.46 | 0.31 | 1 |
| | (c) | #0 | 0.46 | 0.47 | 1 |
| | | #1 | 1.51 | 0.20 | 0 |
| | | #2 | 1.39 | 0.26 | 1 |
Table 3. Statistical significance results comparing DenseNet201 with Xception and Inception-ResNet V2.

| Statistical Test | p-Value |
|------------------|---------|
| t-test (Xception vs. DenseNet201) | 0.02 |
| t-test (Inception-ResNet V2 vs. DenseNet201) | 0.01 |
| Wilcoxon test (Xception vs. DenseNet201) | 0.03 |
| Wilcoxon test (Inception-ResNet V2 vs. DenseNet201) | 0.02 |