Article

Application of Deep Learning Framework for Early Prediction of Diabetic Retinopathy

by Fahad Mostafa 1,2,*, Hafiz Khan 3, Fardous Farhana 4 and Md Ariful Haque Miah 5

1 Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, USA
2 Biostatistics and Analytics Core, ACCORDS, University of Colorado School of Medicine, Aurora, CO 80045, USA
3 Department of Public Health, Julia Jones Matthews School of Population and Public Health, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA
4 Nutritional Sciences Department, Texas Tech University, Lubbock, TX 79430, USA
5 Department of Industrial Engineering, University of Arkansas, Fayetteville, AR 72701, USA
* Author to whom correspondence should be addressed.
AppliedMath 2025, 5(1), 11; https://doi.org/10.3390/appliedmath5010011
Submission received: 11 November 2024 / Revised: 5 January 2025 / Accepted: 23 January 2025 / Published: 5 February 2025
(This article belongs to the Special Issue Optimization and Machine Learning)

Abstract
Diabetic retinopathy (DR) is a severe microvascular complication of diabetes that affects the eyes, leading to progressive damage to the retina and potential vision loss. Timely detection and intervention are crucial for preventing irreversible damage. With the advancement of technology, deep learning (DL) has emerged as a powerful tool in medical diagnostics, offering a promising solution for the early prediction of DR. This study compares four convolutional neural network architectures, DenseNet201, ResNet50, VGG19, and MobileNetV2, for predicting DR. The evaluation is based on both accuracy and training time. MobileNetV2 outperforms the other models, with a validation accuracy of 78.22%, and ResNet50 has the shortest training time (15.37 s). These findings emphasize the trade-off between model accuracy and computational efficiency, highlighting MobileNetV2’s potential applicability for DR prediction due to its balance of high accuracy and reasonable training time. Under 5-fold cross-validation with 100 repetitions, the ensemble of MobileNetV2 and a Graph Convolutional Network (GCN) exhibits a validation accuracy of 82.5%, significantly outperforming MobileNetV2 alone, which shows a 5-fold validation accuracy of 77.4%. This superior performance is further validated by the area under the receiver operating characteristic (ROC) curve, demonstrating the enhanced capability of the ensemble method in accurately detecting diabetic retinopathy. This suggests its competence in effectively classifying data and highlights its robustness across multiple validation scenarios. Moreover, the proposed clustering approach can locate damaged regions of the retina using the developed Isolate Regions of Interest method, which achieves almost 90% accuracy. These findings are useful for researchers and healthcare practitioners looking to investigate efficient and effective models for predictive analytics to diagnose diabetic retinopathy.

1. Introduction

DR is a serious complication of diabetes that can lead to vision impairment or even blindness if not detected and treated early [1,2,3]. Diabetes mellitus (DM), commonly known as diabetes, refers to a category of metabolic disorders characterized by the persistent elevation of blood sugar levels due to impaired insulin secretion or insulin action [4]. Frequent urination, increased thirst, and increased hunger are all common symptoms of DM [5]. Diabetes, if left untreated, can lead to a variety of complications. DR is a disorder in which excessive glucose levels in the bloodstream damage the small blood vessels in the retina, the rear part of the eye [6]. DR is a leading cause of blindness in working-age Americans, affecting 7.7 million Americans today, and this figure is expected to rise to more than 14.6 million by 2030 [7,8]. According to the CDC [9], in 2021, an estimated 9.6 million people in the U.S., spanning all age groups, lived with DR, and 1.84 million of them experienced vision-threatening DR. Prevalence varied by age, with the lowest rate (13.0%) in people under 25 and the highest (28.4%) in the 65–79 age group. For those aged 40 and above, 8.94 million had DR, and 1.71 million had vision-threatening DR. Non-Hispanic black people showed the highest prevalence rates (DR: 3.26%, vision-threatening DR: 1.11%). Males had higher rates than females (DR: 2.74% vs. 1.94%, vision-threatening DR: 0.64% vs. 0.47%). DR prevalence differed at the state and county levels; for example, it ranged from 20.8% in Nevada to 31.3% in Massachusetts among people with diabetes. Diabetes raises the risk of DR, underlining the necessity of early detection and prevention. Diabetes also increases the risk of other eye problems such as glaucoma and cataracts. DR [10] is characterized by changes in the retina’s blood vessels, which can lead to swelling, leakage, or the abnormal growth of blood vessels. In its early stages, DR may not present noticeable symptoms, making regular eye examinations imperative for those with diabetes [11]. DR progression varies among individuals due to a mix of non-modifiable factors like genetics and modifiable factors such as glycemic control [12]. Effective lifestyle management and diabetic care can mitigate risks [2,8]. Tailored interventions, regular monitoring, and patient education are vital in addressing the diverse factors influencing DR progression [13]. Unfortunately, the global burden of diabetes has been rising, emphasizing the need for cost-effective and scalable methods for early DR detection [14]. Managing DR is difficult due to the lack of a customized risk model for the accurate prediction of disease onset and progression. The development of such a model could potentially transform DR management by providing precise risk assessments and predicting disease progression timelines, significantly improving the efficacy of DR screening programs. Furthermore, with a tailored risk model, healthcare professionals could assign more intensive management measures to those at higher risk, increasing the likelihood of avoiding or halting the onset of DR more frequently. The Vision and Eye Health Surveillance System (VEHSS) used Bayesian meta-regression to estimate the prevalence of DR, utilizing 2005–2008 NHANES retinal imaging data as an example [9,15]. The diabetic population was identified from 2017–2020 NHANES, and additional data sources provided insights into DR prevalence by stage and demographics, single-year age trends, regional variations, and the underlying U.S.
population at risk in 2021. Detailed methods are described in Lundeen et al., 2023 [16]. Nowadays, DL, a subset of artificial intelligence (AI) [17], has demonstrated remarkable capabilities across various domains, including healthcare [18]. DL algorithms, particularly convolutional neural networks (CNNs), have proven effective in image recognition tasks. In the context of DR, DL models can analyze retinal images and identify subtle abnormalities that may go unnoticed by the human eye. The benefits of employing DL for predicting DR are multifaceted [19], and the Related Works section discusses recent advancements in ML/DL-based prediction of DR. Despite significant improvements in DR diagnosis, many gaps still remain. In this paper, we introduce the following contributions to DR prediction using retinal scans:
a.
Our study aims to significantly enhance retinal image analysis by contrasting four state-of-the-art deep learning models (DenseNet201, ResNet50, VGG19, and MobileNetV2) in detecting DR. These models are evaluated to create an early detection method that is both effective and accurate, potentially revolutionizing early diagnostic practices. The comparative analysis provides insights into the strengths and limitations of each model, guiding the selection of the most appropriate model for clinical implementation.
b.
The study systematically compares the performances and training speeds of several models using the APTOS 2019 Blindness Detection dataset. The findings reveal that MobileNetV2 is the most suitable for practical implementation, owing to its high validation accuracy coupled with its relatively low computational resource demands. This model’s efficiency makes it accessible for use in resource-constrained environments, potentially extending the benefits of early detection to a wider population.
c.
To further enhance the accuracy, robustness, and generalizability of the models, an ensemble method is employed. The study utilizes 5-fold cross-validation with 100 repeats, reinforcing the reliability of the results and ensuring that the models can perform well across diverse datasets. This ensemble approach combines the strengths of multiple models, resulting in an improved performance and reduced variance, which are critical for reliable clinical applications.
d.
In its novel application of Isolate Regions of Interest (IRIs), the study employs clustering methods to statistically classify retinal lesions and identify damaged regions of the retina with high accuracy. These research findings underscore the potential of selecting computationally efficient and accurate CNN architectures for large-scale DR screening, thereby bolstering confidence in clinical decisions and contributing to the prevention of vision loss in diabetic patients. Additionally, the use of clustering methods offers a robust way to segment and analyze retinal images, paving the way for more precise and targeted interventions.
e.
This research contributes to the development of a scalable and efficient screening tool that can be easily integrated into existing healthcare systems. By prioritizing models that balance accuracy and computational efficiency, the study supports the creation of tools that are both effective and practical for widespread use. This approach not only enhances early detection capabilities but also helps to optimize resource allocation in healthcare settings.
Overall, DL models excel at early detection, adeptly discerning subtle changes in retinal images that enable the identification of DR in its nascent stages, ensuring that interventions are effective (Figure A1). Additionally, the scalability of DL algorithms is instrumental, as they can rapidly process vast volumes of data, rendering them suitable for large-scale screening programs and population-wide initiatives. Another crucial advantage is the reduction in human errors. By automating the analysis of retinal images, DL mitigates the risk associated with manual assessments, ensuring consistent and reliable diagnostic outcomes. Moreover, DL facilitates resource optimization by allowing models to prioritize patients in need of immediate attention, thereby streamlining healthcare resources and expediting timely interventions.

2. Related Works

Lin and Wu [20] used the Kaggle dataset for DR fundus images, with 35,126 pictures categorized as normal or DR, with the latter further divided into four stages. The ResNet-50 architecture was updated, a standardized preprocessing procedure was applied to enhance the images, and adaptive learning rates and regularization were used to improve the model’s accuracy and stability. The authors proposed a “revised ResNet-50” model for DR detection, with test and validation accuracies of 74.32% and 74.16%, respectively. However, the following three significant limitations were identified in this study: (i) the need for diversity with larger datasets to be showcased, (ii) preprocessing steps potentially causing a loss of detail, and (iii) a lack of real-time access, as the system was only trialed on a Local Area Network [7]. Pratt et al. [21] studied a DR dataset from Kaggle consisting of over 80,000 fundus images. They used pre-processing steps for color normalization and image resizing and a classifier CNN with several NN architectures. Class imbalance and overfitting were handled with data augmentation and regularization methodologies. Instead of feature-specific detection, the authors proposed an end-to-end CNN for image classification, achieving a 75% accuracy and 30% sensitivity. Limitations included a low sensitivity (especially for differentiating mild, moderate, and severe cases), ungradable images, and potential quality-dependent classification errors [18]. Sharma et al. [22] used a Kaggle DR dataset containing 1000 fundus images for training and 200 for testing. Their study employed a CNN with six convolutional layers and one fully connected layer, with images pre-processed through color normalization, resizing, cropping of the black border, and focusing on the required features. The suggested CNN-based model achieved a five-class classification accuracy of 74.04% for the different stages of the disease. With a small training set and basic hardware, the accuracy was low; a larger dataset could improve it, although good computational efficiency was demonstrated on a smaller scale [23]. In another study, researchers used the APTOS 2019 blindness detection dataset, comprising approximately 3600 annotated fundus images at various DR levels. Dekhil et al. [24] adapted a VGG-16-based convolutional neural network (CNN) architecture, which incorporated five convolutional stages followed by three fully connected layers. Due to the limited DR dataset, they adopted transfer learning with weights pre-trained on ImageNet. The authors proposed a deep CNN model based on the VGG-16 architecture for DR classification and achieved a 77% test accuracy with a 78% quadratic weighted Kappa score. A pitfall was bias toward the majority classes, despite using class weighting and transfer learning, as the small DR dataset did not contain enough examples to train a deep model. The authors noted that larger, more balanced datasets would be better suited for training the model [1]. Gangwar and Ravi [25] used the following two datasets for their experiment: the Messidor-1 dataset and the Kaggle APTOS 2019 blindness detection dataset. They proposed a hybrid model for their study that utilized Inception-ResNet-v2 combined with a custom CNN block for feature extraction purposes.
Transfer learning was applied to Inception-ResNet-v2, with ImageNet weights loaded and additional layers trained. The proposed architecture was a new hybrid deep model, which the authors called Hybrid Inception-ResNet-v2 with custom CNN layers. It achieved a 72.33% and 82.18% accuracy on the Messidor-1 and APTOS 2019 datasets, respectively. Some limitations were mentioned, primarily related to dataset size, and it was recommended that a single test be applied to every site in order to improve the generalizability of the model. In a separate study, the authors also suggested applying a generative adversarial network (GAN) to address class imbalance problems through data augmentation, but not as an individual study [26]. The first public EyePACS DR dataset, consisting of 35,126 fundus images from Kaggle, was used to assess various types of edge neural network classes based on DL networks for DR classification. A detailed comparative analysis compared 26 standard techniques against different pre-trained deep learning networks such as ResNet50, Inception V3, and EfficientNet. The authors referred to their model as DR Feature Extraction and Classification (DRFEC); it was used to test various DL models for DR classification. With DRFEC, the highest validation accuracy was 79.11%, achieved by the EfficientNetB4 model. The limitations included excessive overfitting of specific models, imbalanced data in the DR dataset, and very limited generalizability between datasets. The authors noted that datasets must be cleaner and well-balanced for better generalization [2]. Lam et al. [27] used the Kaggle DR dataset of nearly 35,000 images and the Messidor-1 dataset of 1200 images with pixel-wise annotations for different stages of DR. The authors used CNNs for transfer learning (models such as GoogLeNet and AlexNet). CLAHE (Contrast Limited Adaptive Histogram Equalization) and data augmentation were applied to improve picture contrast and early-stage disease identification. Modified GoogLeNet and AlexNet were used for the multistage classification of DR. Binary classification (normal vs. abnormal) resulted in a peak test accuracy of 74.5% and moderate multi-class classification accuracies (a lower accuracy as the number of classes increased). The study also faced obstacles, including features related to mild diabetic retinopathy, which can be hard to identify, as there are no apparent lesions at this early stage. Higher-fidelity datasets are needed to increase the ability to discriminate in these early stages [28]. Muthusamy and Palani [29,30] developed the MAPCRCI-DMPLC model using a scaled DR dataset from Kaggle. The researchers used MAP-estimated local region filtering to remove noise and improve image quality, increasing the peak signal-to-noise ratio, and used Camargo’s Index for ROI extraction, which helped them to identify contaminated locations. Using the collected attributes, the swish activation function in the output layer classified images according to DR severity (normal, mild, moderate, severe, or proliferative). Their model scored 94.28%, surpassing ours; however, the datasets were different, resulting in differing accuracy outcomes. Model generalizability and time complexity constrain their work [29].
In a follow-up study, the same scaled Kaggle DR dataset was used with a new model called LN-SDCTC (Luminosity Normalized Symmetric Deep Convolute Tubular Classifier). The luminosity-normalized retinal color fundus preprocessing reduced noise and enhanced contrast. To find features unique to each lesion type, such as microaneurysms and hemorrhages, they used a symmetric deep convolute tubular neighborhood classifier, convolutional layers, average pooling, and max pooling. They used multinomial logistic regression to classify diabetic retinopathy as normal, mild, moderate, severe, or proliferative. The proposed model had a 93.75% sensitivity, a 90.23% specificity, and a higher peak signal-to-noise ratio than previous methods. Divergent datasets caused inconsistent outcomes compared to our model, and their approach needs further validation on a diverse dataset and in real-time clinical use [30]. Hamza [31] found that a CNN with a 97.2% accuracy outperformed other machine learning models using augmented datasets from APTOS 2019, MESSIDOR, IDRiD, and DIARETDB1. He compared the CNN against random forest, SVM, gradient boosting, and K-nearest neighbors classifiers, using data augmentation, preprocessing, and accuracy, precision, recall, and F1-score metrics. Overfitting, class imbalances, and interpretability issues limited this work [31]. In 2024, Navaneethan and Devarajan [32] used the Fundus Image Dataset from Zhao et al. [33]. Their MGA-CSG model achieved a 98.8% accuracy. Their strategy included preprocessing, CNN-based feature extraction, GAN-based augmentation, and optimization. They tested the suggested model on a single dataset, potentially limiting its generalizability, and despite optimization, computing costs limited their efforts [32,33].

3. Data Preparation

3.1. Data Overview

The success of DL models in predicting DR depends on the availability of high-quality and diverse datasets. These datasets typically consist of thousands of annotated retinal images, allowing the model to learn and generalize the patterns associated with different stages of DR. Our dataset comprises retinal scan images subjected to Gaussian filtering for diabetic retinopathy detection. These images, sourced from the APTOS 2019 Blindness Detection dataset [34], were resized to 224 × 224 pixels to facilitate compatibility with various pre-trained deep learning models. The dataset is organized based on the severity or stage of DR, as outlined in the accompanying train.csv file [34]. There are five distinct directories representing different severity levels, as follows: 0—no DR, 1—mild, 2—moderate, 3—severe, and 4—proliferative DR. This categorization enables a stratified analysis, aiding in the development and evaluation of predictive models for DR at various stages. Regarding the number of images categorized by DR level, there were 1805 images in the “no DR” class, indicating the absence of DR, 370 images with mild DR, 999 with moderate DR, 193 with severe DR, and 295 with proliferative DR. Figure 1 shows a sample image from the original data file [34]; the image is shown in two separate colorings to identify some retinal lesions clearly.

3.2. Data Pre-Processing

Denoising DR image data [26] is important for DL classification because it improves the quality of the input information, allowing the network to focus on essential features rather than noise [26]. Noise in eye scan images can introduce unnecessary features, making it difficult for various CNN approaches to recognize important patterns and appropriately identify objects [23]. Mathematically, denoising in the context of CNN classification involves the application of a denoising function, typically represented as $D(X)$, where $X$ is the input image data. The denoising function aims to remove unwanted noise and artifacts, producing a cleaner version of the image, denoted as $\hat{X}$. This process can be expressed as follows:
$\hat{X} = D(X).$ (1)
Here, $D$ represents the denoising operation and $\hat{X}$ is the denoised image data used as input for the CNN classification. The denoising function in Equation (1) plays a critical role in improving the signal quality, allowing the CNN to focus on relevant features and patterns during the DR classification process. Figure 2 shows an example from our dataset.
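To make this step concrete, the following is a minimal sketch of the denoising operation $D$, assuming OpenCV and a Gaussian filter consistent with the Gaussian filtering described in Section 3.1; the kernel size is an illustrative choice.

```python
import cv2

def denoise(image, kernel_size=5, sigma=0):
    """Apply D(X): Gaussian filtering to suppress noise in a retinal scan.

    kernel_size and sigma are illustrative; with sigma=0, OpenCV derives
    the standard deviation from the kernel size.
    """
    return cv2.GaussianBlur(image, (kernel_size, kernel_size), sigma)

# X_hat = denoise(cv2.imread("retina_scan.png"))  # denoised input for the CNN
```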

3.3. Training and Validation

To build a robust and representative deep learning model for predicting DR severity, we employed a simple random sampling approach to partition our dataset into training and validation sets [28]. The dataset was randomly divided, with 80% of the images allocated to the training set and the remaining 20% to the validation set [35]. This method ensured that both sets maintained a diverse representation of images from each severity class (0—no DR, 1—mild, 2—moderate, 3—severe, and 4—proliferative DR), allowing the model to learn effectively from a varied range of cases. The 80–20 split struck a balance between training on a sufficiently large dataset and validating on an independent subset, promoting the generalization of the DL model to unseen data while minimizing the risk of overfitting. In Figure 3, the blue bars indicate the distribution in the training set and the orange bars represent the validation set; the graphic compares the numbers of images in the two subsets at different DR levels.
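A minimal sketch of this 80–20 partition, assuming scikit-learn and parallel arrays of image paths and severity labels; stratification is one way to realize the class-balanced split described above.

```python
from sklearn.model_selection import train_test_split

# image_paths: list of retinal image files; labels: severity classes 0-4
train_paths, val_paths, train_labels, val_labels = train_test_split(
    image_paths, labels,
    test_size=0.20,    # 20% reserved for validation
    stratify=labels,   # keep all five severity classes represented in both sets
    random_state=42,   # illustrative seed for reproducibility
)
```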

4. Proposed Models

The training process involves feeding the DL model with labeled retinal images, enabling it to learn the features indicative of DR. The model is then validated on a separate set of images to ensure its ability to generalize to new data [35]. Continuous refinement and optimization are performed to enhance the model’s accuracy and sensitivity. Various CNN [8] architectures have been employed for DR classification, each bringing its own strengths to the task, as described in the following sections. Models like DenseNet, VGG19, ResNet, and MobileNetV2 have demonstrated effectiveness in capturing intricate features within retinal images, enabling the accurate identification of DR stages. MobileNetV2, known for its lightweight design, is suitable for resource-constrained environments, making it advantageous for deployment on mobile and edge devices. Transfer learning techniques are commonly applied, leveraging models pre-trained on large datasets to enhance the performance of DR classification with limited labeled medical images. Ensemble methods, combining predictions from multiple CNNs, further contribute to robust and reliable classification results. The choice of CNN architecture depends on factors such as computational efficiency, model interpretability, and the specific requirements of the healthcare application. In this paper, different CNNs are applied to improve the accuracy and efficiency of diabetic retinopathy classification, as shown in Figure 4 below. We train four deep learning models whose mathematical formulations are discussed below; an illustrative setup is sketched next.
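As a concrete illustration, the following minimal transfer-learning sketch builds any of the four architectures with a five-class head; it assumes TensorFlow/Keras (the paper does not name its framework), and the hyperparameter values echo those reported in Section 4.7.

```python
import tensorflow as tf

BACKBONES = {
    "DenseNet201": tf.keras.applications.DenseNet201,
    "ResNet50": tf.keras.applications.ResNet50,
    "VGG19": tf.keras.applications.VGG19,
    "MobileNetV2": tf.keras.applications.MobileNetV2,
}

def build_model(name, num_classes=5, dense_units=128, learning_rate=1e-3,
                input_shape=(224, 224, 3)):
    """Attach a small classification head to an ImageNet-pre-trained backbone."""
    base = BACKBONES[name](weights="imagenet", include_top=False,
                           input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    x = tf.keras.layers.Dense(dense_units, activation="relu")(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(base.input, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_model("MobileNetV2")  # any of the four architectures above
```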

4.1. DenseNet201

DenseNet201 (Densely Connected Convolutional Network with 201 layers) [36] is a neural network architecture that facilitates the maximum information flow between layers by densely connecting each layer to every other layer. The mathematical formulation of DenseNet involves the concept of dense blocks and transition layers, as shown in Equation (2). Let us denote $H_l$ as the feature maps at layer $l$, $k$ as the growth rate (the number of feature maps produced by each layer in a dense block), and $L$ as the total number of layers in the network.
Dense Block: The output of the $l$-th layer in a dense block, denoted as $H_l$, is given by the following:
$H_l = H_{l-1} \oplus f_l\left(\left[H_0, H_1, \ldots, H_{l-1}\right]\right),$ (2)
where:
  • $\oplus$ denotes the concatenation operation,
  • $f_l$ is a composite function typically consisting of batch normalization, ReLU activation, and a convolution operation with a kernel size of $3 \times 3$,
  • $\left[H_0, H_1, \ldots, H_{l-1}\right]$ represents the concatenation of the feature maps from all preceding layers up to the current layer.
Transition Layer: After each dense block, a transition layer is introduced to down-sample the spatial dimensions and reduce the number of feature maps. The output of the $l$-th transition layer, denoted as $H_l'$, is given by the following:
$H_l' = f_l\left(H_l\right),$ (3)
where:
  • $f_l$ typically involves batch normalization, $1 \times 1$ convolution, and average pooling. The DenseNet architecture consists of multiple dense blocks interleaved with transition layers, as shown in Equation (3). The overall output of the network, $H_L$, is then fed into a global average pooling layer and a fully connected layer for final classification. In summary, for a given input retinal scan $\hat{X}$, DenseNet transforms it through a series of dense blocks and transition layers, capturing intricate features at different scales.
Final Output of the Network:
$H_L = \text{FinalTransition}\left(H_{\text{final}}\right).$ (4)
Global Average Pooling and Fully Connected Layer:
$O = \text{GlobalAvgPooling}\left(H_L\right).$ (5)
$\text{ClassificationOutput} = \text{FullyConnected}\left(O\right).$ (6)
The specific definitions of Transition, FinalTransition, GlobalAvgPooling, and FullyConnected in Equations (3)–(6) depend on the exact architecture implementation and hyperparameters chosen for the DenseNet201 model. The network’s final output in Equation (4), $H_L$, encapsulates the learned representations, which are then utilized for DR classification tasks after passing through global average pooling and a fully connected layer.
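For illustration, the dense block of Equation (2) and the transition layer of Equation (3) can be sketched with Keras layers as follows; the layer count, growth rate, and compression factor are illustrative choices, not the exact DenseNet201 configuration.

```python
from tensorflow.keras import layers

def dense_block(x, num_layers=6, growth_rate=32):
    """Each layer's f_l sees the concatenation [H_0, ..., H_{l-1}] (Eq. (2))."""
    for _ in range(num_layers):
        h = layers.BatchNormalization()(x)
        h = layers.ReLU()(h)
        h = layers.Conv2D(growth_rate, 3, padding="same", use_bias=False)(h)  # f_l
        x = layers.Concatenate()([x, h])   # concatenate all preceding feature maps
    return x

def transition_layer(x, compression=0.5):
    """Eq. (3): batch norm, 1x1 convolution, and average pooling to down-sample."""
    h = layers.BatchNormalization()(x)
    h = layers.Conv2D(int(x.shape[-1] * compression), 1, use_bias=False)(h)
    return layers.AveragePooling2D(pool_size=2)(h)
```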

4.2. ResNet50

ResNet50 [37] is a convolutional neural network architecture consisting of 50 layers. It employs a series of stacked residual blocks with down-sampling layers for spatial dimension reduction. Its architecture includes skip connections to enable the direct flow of information between inputs and outputs across multiple layers, mitigating the risk of vanishing gradients. Each layer consists of convolutional operations, batch normalization, Rectified Linear Unit (ReLU) activations, and global average pooling, ultimately leading to a SoftMax layer for classification in image recognition tasks.
Residual Block: A residual block in a neural network, denoted as $F(\hat{X})$, is defined mathematically as follows:
$F(\hat{X}) = W_2\,\sigma\left(W_1 \hat{X} + b_1\right) + b_2 + \hat{X}.$ (7)
Here:
  • $W_1$ and $W_2$ are weight matrices.
  • $b_1$ and $b_2$ are bias terms.
The bias terms $b_1$ and $b_2$ in the ResNet50 architecture play crucial roles in enhancing the network’s learning capabilities. The term $b_1$ is added to the weighted input before the activation function, allowing the model to adjust neuron activation thresholds and better fit the training data, which provides additional flexibility during learning. Similarly, $b_2$ is added after the activation and second weighting step, further refining the output of the residual block and allowing the model to fine-tune its predictions more precisely. Together, $b_1$ and $b_2$ contribute significantly to the effective training of deep neural networks by maintaining gradient flow during backpropagation, mitigating vanishing gradient issues, and enhancing the model’s ability to identify complex patterns in data, which is vital for tasks such as diabetic retinopathy classification.
  • $\sigma$ is the activation function; here, we use ReLU. This formulation in Equation (7) enables the training of very deep neural networks by addressing the vanishing gradient problem in DR classification.
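A toy NumPy sketch of Equation (7); in ResNet50 itself the weight operations $W_1$ and $W_2$ are convolutions with batch normalization, but the skip-connection algebra is the same.

```python
import numpy as np

def residual_block(x_hat, W1, b1, W2, b2):
    """F(X) = W2 * sigma(W1 * X + b1) + b2 + X  (Eq. (7)), with sigma = ReLU."""
    sigma = lambda z: np.maximum(z, 0.0)          # ReLU activation
    return W2 @ sigma(W1 @ x_hat + b1) + b2 + x_hat
```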

4.3. VGG19

VGG19, a convolutional neural network architecture, is characterized by its deep structure with 19 layers, predominantly comprising convolutional and pooling layers [27,38]. Let $\hat{X}$ denote the input image and $W_{ab}^{l}$ the weights of the convolutional filters at layer $l$. The convolutional operation at layer $l$ can be expressed as follows:
$Z_{ij}^{l} = \sum_{a=1}^{n_{l-1}} \sum_{b=1}^{n_{l-1}} W_{ab}^{l}\, X_{i+a-1,\, j+b-1}^{l-1} + b_{ij}^{l},$ (8)
where $Z_{ij}^{l}$ is the pre-activation, $X_{i+a-1,\, j+b-1}^{l-1}$ is the input from the previous layer, and $b_{ij}^{l}$ is the bias term in Equation (8). Following the convolution, the Rectified Linear Unit (ReLU) activation function is applied elementwise, as follows:
$A_{ij}^{l} = \max\left(0, Z_{ij}^{l}\right).$ (9)
The network also employs max-pooling layers to down-sample the spatial dimensions. VGG19’s architecture is characterized by repeated blocks of two or more convolutional layers (each followed by the activation in Equation (9)) and a max-pooling layer. The final fully connected layers are responsible for DR classification.
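A naive single-channel sketch of the convolution in Equation (8) followed by the ReLU of Equation (9), assuming NumPy; real VGG19 layers operate over many channels, which this simplification omits.

```python
import numpy as np

def conv_relu(X_prev, W, b):
    """Valid convolution (Eq. (8)) followed by elementwise ReLU (Eq. (9))."""
    n = W.shape[0]                                  # square kernel, e.g., 3x3
    rows, cols = X_prev.shape
    Z = np.empty((rows - n + 1, cols - n + 1))
    for i in range(Z.shape[0]):
        for j in range(Z.shape[1]):
            # Z_ij = sum over a,b of W_ab * X_{i+a-1, j+b-1}, plus the bias
            Z[i, j] = np.sum(W * X_prev[i:i + n, j:j + n]) + b
    return np.maximum(0.0, Z)                       # A_ij = max(0, Z_ij)
```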

4.4. MobileNetV2

MobileNetV2 is a convolutional neural network architecture designed for efficient and lightweight deep learning applications [39], particularly on mobile and edge devices. The network introduces a novel building block called an inverted residual with a linear bottleneck, which significantly reduces the computational cost while preserving representational capacity. Mathematically, given an input tensor $X_T$ derived from $\hat{X}$, the inverted residual block consists of a lightweight depthwise separable convolution with a non-linear activation function, followed by a linear bottleneck layer with a $3 \times 3$ convolution and linear activation. The output $H$ of the block is computed as follows:
$H = F_{\text{ReLU}}\left(W_2 \times F_{\text{ReLU}}\left(W_1 \times X_T + b_1\right) + b_2\right) + X_T,$ (10)
where $W_1$ and $W_2$ denote the convolutional weights, $b_1$ and $b_2$ are the biases, and $F_{\text{ReLU}}$ represents the rectified linear unit activation function in Equation (10). This architecture efficiently balances model size and computational efficiency, making it well-suited for resource-constrained environments. MobileNetV2 was applied to achieve a better accuracy in DR classification.
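The following sketch shows the canonical inverted-residual pattern (expansion, depthwise convolution, linear bottleneck) in Keras; the expansion factor and channel counts are illustrative, and batch normalization is omitted for brevity.

```python
from tensorflow.keras import layers

def inverted_residual(x, out_channels, expansion=6, stride=1):
    """Inverted residual with a linear bottleneck; residual add as in Eq. (10)."""
    in_channels = x.shape[-1]
    h = layers.Conv2D(expansion * in_channels, 1, use_bias=False)(x)  # expand
    h = layers.ReLU(6.0)(h)
    h = layers.DepthwiseConv2D(3, strides=stride, padding="same",
                               use_bias=False)(h)                     # depthwise
    h = layers.ReLU(6.0)(h)
    h = layers.Conv2D(out_channels, 1, use_bias=False)(h)  # linear bottleneck
    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([x, h])                           # skip connection
    return h
```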

4.5. Ensembling MobileNetV2 and GCN

We aim to improve accuracy by ensembling MobileNetV2 and a GCN. Since we have already discussed MobileNetV2, this section focuses on the GCN and the formulation of the ensembling technique. For a graph $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges, with an input feature matrix $X$ of size $N \times F$, where $N$ is the number of nodes and $F$ is the number of features per node, the output of a single GCN layer can be mathematically represented as follows:
1.
Initialization:
  • $X$ is the input feature matrix.
  • $W$ is the weight matrix.
  • $\sigma$ is an activation function such as ReLU.
2.
GCN Layer Operation:
$H = \sigma\left(D^{-1/2} A D^{-1/2} X W\right),$
where:
  • $H$ is the output feature matrix after applying the GCN layer.
  • $\sigma$ is an activation function applied elementwise.
  • $D^{-1/2} A D^{-1/2}$ is the symmetrically normalized adjacency matrix, where $A$ is the adjacency matrix and $D$ is the degree matrix.
3.
Explanation:
  • $D^{-1/2} A D^{-1/2}$ ensures stable training by controlling the scale of the feature vectors during propagation.
  • Multiplying $X$ by $A$ effectively aggregates features from neighboring nodes based on the graph structure.
  • $D^{-1/2}$ ensures that the features of nodes with higher degrees are down-weighted, and vice versa, promoting more stable training.
  • $W$ is a weight matrix that is learned during training.
  • $\sigma$ applies an element-wise non-linear activation function to introduce non-linearity into the model.
This operation is typically followed by additional GCN layers or other types of layers (e.g., fully connected layers), depending on the architecture of the network. The final output can be fed into a classifier for tasks like node classification or graph classification.
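A minimal NumPy sketch of the layer operation above; adding self-loops to $A$ before normalization follows the common Kipf and Welling formulation and is an assumption beyond the formula as written.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN layer: H = sigma(D^{-1/2} A D^{-1/2} X W), with sigma = ReLU."""
    A_hat = A + np.eye(A.shape[0])                    # self-loops (assumption)
    d_inv_sqrt = np.diag(A_hat.sum(axis=1) ** -0.5)   # D^{-1/2}
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt          # normalized adjacency
    return np.maximum(A_norm @ X @ W, 0.0)            # aggregate, transform, ReLU
```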
  • Training
    • Train MobileNetV2 on the image data for diabetic retinopathy classification.
    • Train GCN on graph data (if available) or features extracted from data.
  • Prediction of Individual Models
    • Let $M(x)$ be the prediction of MobileNetV2 for input $x$.
    • Let $G(x)$ be the prediction of the GCN for input $x$.
  • Ensemble Method
    • Combine predictions using a weighted average, stacking, or other methods.
    • Let $E(x)$ be the ensemble prediction for input $x$.
  • Weighted Average Ensemble
$E(x) = \alpha M(x) + (1 - \alpha)\, G(x),$
where $\alpha$ is the weight assigned to MobileNetV2’s prediction.
  • Stacking Ensemble
Train a meta-learner (such as logistic regression, random forest, or another neural network) on top of the predictions of MobileNetV2 and the GCN to learn the best way to combine them. Optionally, fine-tune the weights of MobileNetV2 and/or the GCN during ensemble training to further improve performance. The ensemble model can then be deployed for real-world DR classification tasks. A sketch of the weighted-average combination follows.
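A minimal sketch of the weighted-average combination; the value of $\alpha$ is illustrative and would in practice be tuned on a validation fold.

```python
import numpy as np

def ensemble_predict(m_probs, g_probs, alpha=0.6):
    """E(x) = alpha * M(x) + (1 - alpha) * G(x) over class-probability vectors."""
    return alpha * m_probs + (1.0 - alpha) * g_probs

# m_probs, g_probs: softmax outputs of shape (n_samples, 5) from MobileNetV2
# and the GCN; the predicted class is the arg-max of the combined scores.
# labels = ensemble_predict(m_probs, g_probs).argmax(axis=1)
```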

4.6. Cross Validation

To assess the performance and generalization capabilities of our DR prediction models, we employed 5-fold cross-validation. Specifically, we divided the dataset into five folds, and in each iteration, four folds were used for training the model, while the remaining fold was reserved for validation [35]. We repeated this process five times, ensuring that each fold served as the validation set exactly once. To implement 5-fold cross-validation for DenseNet201, VGG19, ResNet50, and MobileNetV2, we first divided the dataset into five subsets, denoted as $D_1, D_2, D_3, D_4, D_5$. In each iteration $i$, the model was trained on a combination of four subsets, excluding $D_i$, and then validated on $D_i$. Mathematically, the training and validation sets for each iteration are defined as follows:
  • Iteration 1: train on $D_2, D_3, D_4, D_5$; validate on $D_1$.
  • Iteration 2: train on $D_1, D_3, D_4, D_5$; validate on $D_2$.
  • Iteration 3: train on $D_1, D_2, D_4, D_5$; validate on $D_3$.
  • Iteration 4: train on $D_1, D_2, D_3, D_5$; validate on $D_4$.
  • Iteration 5: train on $D_1, D_2, D_3, D_4$; validate on $D_5$.
This process was repeated five times, with each subset serving as the validation set exactly once. The final performance metric was computed by averaging the results from all iterations.
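A sketch of the 5-fold procedure, assuming scikit-learn, image and label arrays X and y, and the hypothetical build_model helper sketched in Section 4; stratified folds are an assumption made here to preserve class balance.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_accuracies = []
for train_idx, val_idx in skf.split(X, y):
    model = build_model("MobileNetV2")               # fresh model per fold
    model.fit(X[train_idx], y[train_idx], epochs=100, batch_size=32, verbose=0)
    _, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
    fold_accuracies.append(acc)
print(f"Mean 5-fold validation accuracy: {np.mean(fold_accuracies):.4f}")
```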

4.7. Hyperparameter Tuning

Hyperparameter tuning entails adjusting the values that the model does not learn during training. Potential hyperparameters to tune for the provided code included the learning rate, batch size, and number of units in the densely connected layers. The learning rate affects the step size during optimization and can be modified to obtain the best value for convergence. The batch size, which determines the amount of data processed in each iteration, affects the training speed and memory utilization. In this work, we chose a batch size of 32 and 100 epochs for each model. Furthermore, the number of units in the dense layers (for example, 128 in the code) could be varied to find a design that balances complexity and efficiency for this DR classification problem. In all the above models, we used consistent hyperparameter values, such as a learning rate of 0.001 and a batch size of 32, with the number of dense-layer units explored between 64 and 256 during hyperparameter tuning. The grid search technique was employed to systematically search the hyperparameter space and improve the models’ overall performance, using HPCC resources and Google Colab Pro.
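A sketch of the grid search over the hyperparameters named above, reusing the hypothetical build_model helper from Section 4; the candidate values are illustrative.

```python
import itertools

# Candidate values for learning rate, dense-layer units, and batch size.
grid = itertools.product([1e-3, 1e-4], [64, 128, 256], [32])

best_acc, best_cfg = -1.0, None
for lr, units, batch in grid:
    model = build_model("MobileNetV2", dense_units=units, learning_rate=lr)
    model.fit(X_train, y_train, epochs=100, batch_size=batch, verbose=0)
    _, acc = model.evaluate(X_val, y_val, verbose=0)
    if acc > best_acc:
        best_acc, best_cfg = acc, {"lr": lr, "units": units, "batch": batch}
```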

4.8. Model Diagnosis by Evaluation Metrics

The evaluation metrics for the performance of the DenseNet201, VGG19, ResNet50, and MobileNetV2 techniques in DR classification were calculated using standard classification metrics. Let $TP$, $TN$, $FP$, and $FN$ represent true positives, true negatives, false positives, and false negatives, respectively. The accuracy (Acc), precision (Prec), recall (Rec), and F1-score (F1) are defined as follows:
$\text{Acc} = \dfrac{TP + TN}{TP + TN + FP + FN}.$ (11)
$\text{Prec} = \dfrac{TP}{TP + FP}.$ (12)
$\text{Rec} = \dfrac{TP}{TP + FN}.$ (13)
$F1 = \dfrac{2 \cdot \text{Prec} \cdot \text{Rec}}{\text{Prec} + \text{Rec}}.$ (14)
These metrics in Equations (11)–(14) provide a comprehensive assessment of the models’ performances. They consider both correct and incorrect classifications, with precision emphasizing the accuracy of positive predictions, recall capturing the ability to detect positives, and F1-score balancing precision and recall for a more nuanced evaluation of DR prediction.
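These quantities can be computed directly from the validation predictions; a sketch assuming scikit-learn, with weighted averaging matching the Weighted Avg scores reported in Table 1.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

acc = accuracy_score(y_true, y_pred)                    # Eq. (11)
prec, rec, f1, _ = precision_recall_fscore_support(     # Eqs. (12)-(14)
    y_true, y_pred, average="weighted", zero_division=0)
```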

4.9. AUC Analysis

AUC values, representing the Area Under the Receiver Operating Characteristic (ROC) curve, provide insights into the performances of the four models (DenseNet201, ResNet50, VGG19, and MobileNetV2) in the context of DR classification. AUC is a metric that evaluates the ability of a model to distinguish between positive and negative instances, with a higher AUC value indicating better discriminatory power. In our results, the AUC values varied slightly among the models, demonstrating nuanced differences in their performance. The AUC was calculated by integrating the true positive rate (TPR) with respect to the false positive rate (FPR) over the interval $[0, 1]$:
$\text{AUC} = \int_{0}^{1} \text{TPR}\left(\text{FPR}^{-1}(t)\right)\, dt.$ (15)
Here, TPR represents the true positive rate, FPR represents the false positive rate, and $t$ varies from 0 to 1 in Equation (15).
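For the five-class DR problem, a multi-class AUC can be computed from the predicted class probabilities; one-vs-rest averaging, used in the sketch below, is one common choice (an assumption, as the text does not specify the averaging scheme).

```python
from sklearn.metrics import roc_auc_score

# y_score: predicted probabilities of shape (n_samples, 5); y_true: labels 0-4.
auc = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
```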

4.10. Identification of Isolate Regions of Interest (IRIs) in a Retinal Scan

The primary objective of this section is to develop a robust and efficient method for segmenting retinal images in the context of predicted DR scans by leveraging advanced clustering and active contour models. Specifically, this study aims to enhance the identification and delineation of pathological features, such as lesions and blood vessels, which are indicative of varying severity levels of DR. Moreover, we propose the use of the Modified Gradient Vector Flow (MGVF) active contour model to address the challenge of non-convex regions in DR images, which are difficult to segment using traditional clustering methods. By incorporating gradient information into the MGVF model, the objective is to improve the accuracy of segmenting irregular and complex patterns within retinal images, ultimately enabling the more precise detection of DR severity in a particular location. This approach will contribute to the development of more reliable diagnostic tools for the early detection and classification of DR. Let us discuss image segmentation first, where we can use the DR images detected by the ensemble method discussed above.
In the context of DR, the K-means segmentation technique [40] aids in the identification and isolation of disease-specific regions of interest, such as hemorrhages or exudates. The K-means algorithm groups pixels optimally while limiting intra-cluster variation, allowing abnormal features to be distinguished from normal retinal structures. In the context of diabetic retinopathy image segmentation, K-means clustering can be expressed mathematically as in Equation (16), where $I(\hat{X})$ represents the input retinal image and $P$ is the set of pixels in the image. The goal is to partition $P$ into $K$ clusters $C_1, C_2, \ldots, C_K$ by minimizing the intra-cluster variance. The objective function for K-means can be written as follows:
$J = \sum_{k=1}^{K} \sum_{i \in C_k} \left\| I_i - \mu_k \right\|^2,$ (16)
where $I_i$ is the color or intensity of pixel $i$, $\mu_k$ is the mean color or intensity of cluster $C_k$, and $\|\cdot\|$ denotes the Euclidean norm. The algorithm iteratively updates the cluster assignments and means until convergence, resulting in a segmentation of the DR image into distinct regions corresponding to different pathologies or structures. This mathematical representation captures the essence of how K-means clustering is utilized for image segmentation in the context of DR. However, DR retinal images contain non-convex clusters, i.e., irregular and complex patterns that represent different severity levels or distinct characteristics of the disease [36,41]. When applying clustering techniques to DR images, the goal is to identify and group pixels or regions that correspond to specific pathological features or severity levels. In this case, spectral-clustering-based segmentation [28] may not work well because of the non-convex regions. In the context of DR retina scans, active contours [42] are employed to delineate the boundaries of structures such as blood vessels or lesions within the retina. Among the many variants, the Modified Gradient Vector Flow (MGVF) model is an enhancement of the traditional snake or active contour models [43]. The MGVF model is designed to enhance the traditional active contour model by incorporating gradient information. In the context of detecting the optic disk boundary for DR and locating the IRIs, the MGVF model minimizes an energy functional consisting of internal and external energy terms, as follows:
$E_{\text{total}} = E_{\text{internal}} + E_{\text{external}}.$
The internal energy encourages smoothness in the contour by considering the Gradient Vector Flow, as follows:
$E_{\text{internal}} = \int_{\text{contour}} \mu\, |F|^2\, ds.$
Here, $F$ is the Gradient Vector Flow field and $\mu$ is a parameter controlling the influence of the gradient. The external energy attracts the contour toward image features using the Gradient Vector Flow, as follows:
$E_{\text{external}} = \int_{\text{contour}} g\, |G|^2\, ds.$
Here, $G$ is the Gradient Vector Flow field and $g$ is a function emphasizing the Gradient Vector Flow gradient. The Gradient Vector Flow fields $F$ and $G$ are computed by solving partial differential equations (PDEs) derived from the image gradient. The contour is updated by minimizing the total energy functional, as follows:
$\underset{\text{contour}}{\arg\min}\; E_{\text{total}}.$
The gradient descent [44] method is employed to solve this optimization problem. More specifically, when dealing with DR retina scans, adapting the energy terms to capture relevant features in the images is essential for accurate segmentation and for identifying the specific retinal regions where lesions are present. Appendix B: Algorithm A1 provides a detailed, step-by-step mathematical framework for identifying diabetic retinopathy using deep learning and clustering techniques. A sketch of the clustering step is given below.
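A sketch of the K-means step applied to an RGB retinal scan, assuming scikit-learn; the silhouette sub-sampling is an illustrative efficiency choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def segment_retina(image, K=3):
    """Cluster pixel colors to isolate candidate lesion regions (Eq. (16))."""
    pixels = image.reshape(-1, 3).astype(np.float64)   # one row per pixel
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(pixels)
    score = silhouette_score(pixels, km.labels_,
                             sample_size=5000, random_state=0)
    segmented = km.cluster_centers_[km.labels_].reshape(image.shape)
    return segmented, score  # compare scores across K = 3, 4, 5, 6, 8
```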

5. Results

Figure 5 illustrates a model comparison based on the accuracy and training times of the following four convolutional neural network architectures: DenseNet201, ResNet50, VGG19, and MobileNetV2. In terms of accuracy, MobileNetV2 outperforms the other models, with a validation accuracy of 78.22%, followed closely by DenseNet201 at 76.98%. ResNet50 and VGG19 achieve lower validation accuracies of 71.02% and 71.01%, respectively. In terms of training time, ResNet50 exhibits the shortest training time at 15.37 s, followed by VGG19 (19.11 s), MobileNetV2 (21.20 s), and DenseNet201 (43.15 s). The error bars represent confidence intervals, providing a sense of the variability in the measurements. Overall, the figure highlights the trade-off between model accuracy and training time, with MobileNetV2 emerging as a strong contender for a balance between high accuracy and a relatively low training time.
Figure 6 presents a series of images that distinctly categorize the different severity levels of DR. Each directory within the figure is meticulously organized to represent the progression of the condition, ranging from mild to proliferative stages. This visual arrangement allows for a clear comparison and understanding of the morphological changes that occur in the retina as the disease advances.
Table 1 presents the performance metrics for the different models in the classification of DR. Precision, recall, and F1-score are reported for each model across the different DR severity levels (mild, moderate, no DR, proliferative DR, and severe), providing insights into their ability to correctly classify instances of each category. Starting with DenseNet201, the model demonstrates a strong performance in the no DR category, achieving a high precision (0.959) and recall (0.969), resulting in a balanced F1-score (0.956). However, the model struggles with the proliferative DR and severe categories, where both its precision and recall are notably low, leading to a low F1-score (0.001). The Weighted Avg F1-score is 0.758, indicating a moderate overall performance.
In contrast, ResNet50 exhibits a more balanced performance across the different categories. While its precision for the mild category is lower than that of DenseNet201, ResNet50 achieves a higher recall and F1-score for both the moderate and no DR categories. However, the model fails to predict instances in the proliferative DR and severe categories, resulting in an F1-score of 0.001. The Weighted Avg F1-score for ResNet50 is 0.660, reflecting a lower overall performance compared to DenseNet201. Moving on to VGG19, the model struggles with recall in all categories, particularly in the mild and severe categories, where it achieves 0.101 and 0.001, respectively. This contributes to lower F1-scores across the board, resulting in a Weighted Avg F1-score of 0.625. VGG19 exhibits a significant drop in performance compared to DenseNet201 and ResNet50. Finally, MobileNetV2 stands out for its higher precision in the mild and proliferative DR categories. It demonstrates a competitive recall in the moderate and no DR categories, resulting in relatively higher F1-scores. The model, however, faces challenges in correctly classifying severe instances, leading to a lower F1-score in that category. The Weighted Avg F1-score for MobileNetV2 is 0.728, positioning it as a strong performer, though slightly below DenseNet201. To enhance the accuracy of DR detection, this study employs an ensemble method that combines MobileNetV2 and a GCN. This combined approach leverages the strengths of both models, resulting in an improved accuracy of 82.5%, an AUC of 0.88, an F1-score of 0.81, and a precision of 0.83. These enhanced performance metrics underscore the potential of this ensemble method to create robust and efficient models for large-scale DR screening. By ensuring better generalizability and reliability, this study contributes to more accurate and earlier diagnoses of DR, ultimately aiding in the prevention of vision loss in diabetic patients. Moreover, we also compared a more advanced model, EfficientNet [45], to broaden the analysis. While the primary goal of this study was to assess well-established models like DenseNet201, ResNet50, VGG19, and MobileNetV2, we also recorded the EfficientNet model’s validation accuracy. Our findings show that EfficientNet achieved a validation accuracy of 80.2%, which is higher than that of MobileNetV2, demonstrating its potential for improving diabetic retinopathy detection; however, our proposed ensemble technique shows a slightly greater validation accuracy.
By applying the identification of IRIs to DR images, such as in Figure 7, clinicians can acquire significant insights into the spatial distribution of abnormalities, which can help with diagnostic assessments and treatment planning for people suffering from this vision-threatening condition. K-means clustering was performed on DR eye scans with varied values of $K$ (3, 4, 5, 6, 8), yielding silhouette scores of 0.49, 0.44, 0.41, 0.42, and 0.40, respectively. The silhouette scores reflect the quality of the clusters, with higher values signifying better-defined clusters. In this case, K-means aided in the segmentation of DR images by splitting them into groups based on similarities. The silhouette scores indicate that $K = 3$ produced the best-defined clusters, implying that the images may be efficiently divided into three discrete portions. This segmentation is critical for identifying and analyzing different features in eye scans, which may aid in the detection and assessment of DR, and it improves both the efficiency and accuracy of the analysis.
The MGVF active contour model precisely determines the DR-affected boundaries in the affected retinas. The algorithm’s efficacy was evaluated on a dataset containing 370 images representing mild DR, 999 images of moderate DR, 193 images of severe DR, and 295 images of proliferative DR, all of which were colored retinal scans. Encouragingly, the results showcased a commendable 90% average accuracy in affected-area detection using the MGVF technique. Figure 8 shows the affected DR areas in some images.

6. Discussion

In summary, DenseNet201 shows a strong performance in the no DR category, but struggles with severe cases. ResNet50 achieves a more balanced performance across the categories, although it fails to predict proliferative and severe DR. VGG19 exhibits a lower performance, particularly in terms of recall. MobileNetV2 stands out with a competitive performance, showcasing a higher precision in specific categories. The choice of the most suitable model depends on the specific priorities of the application, considering factors such as the importance of correctly identifying severe cases and the balance between precision and recall. Further fine-tuning and optimization may be necessary to enhance the models’ capabilities in handling the complexities of DR classification.
Figure 6 not only categorizes the different severity levels of DR, but also displays the true and predicted labels using MobileNetV2. This dual representation provides a clear comparison between the actual diagnosis and the model’s predictions, highlighting the accuracy and effectiveness of MobileNetV2 in detecting DR. By visually showcasing both the true and predicted labels, Figure 6 offers valuable insights into the model’s performance, emphasizing its potential application in clinical settings for reliable and precise diagnoses of the disease.
Among the individual models, MobileNetV2 achieves the highest AUC (0.774), followed by DenseNet201 (0.762), VGG19 (0.731), and ResNet50 (0.715). VGG19’s AUC of 0.731 indicates a solid ability to discriminate between different severity levels of DR, capturing subtle patterns and features in the data, and ResNet50 follows closely with a robust performance in distinguishing between positive and negative cases. DenseNet201 and MobileNetV2 exhibit slightly higher discriminatory capabilities than VGG19 and ResNet50. These AUC differences stem from the architectural variations and complexities inherent in each model; MobileNetV2 seems to capture intricate details most effectively, resulting in the highest AUC. However, it is essential to interpret the AUC in conjunction with other metrics and consider the specific context of the application. While the AUC for DR prediction provides a global assessment of model performance, individual model strengths and weaknesses may be better illuminated by examining the precision, recall, and F1-score metrics for each severity level. Additionally, the closeness of the AUC values (0.731, 0.715, 0.762, and 0.774) to the corresponding validation accuracies indicates a reasonable alignment between the models’ classification performance and their ability to discriminate between the different classes of DR. Table 2 compares various models for DR detection. Sharma et al. [22] used a CNN with six convolutional layers, achieving a 74.04% accuracy on the APTOS 2019 dataset. Lin et al. [20] revised ResNet-50, obtaining 74.16%. Lam et al. [27] modified GoogLeNet and AlexNet with data augmentation, reaching 74.5%. Our study explored architectures including DenseNet201, ResNet50, VGG19, and MobileNetV2, prioritizing accuracy and efficiency, and achieved 78.22% with MobileNetV2 alone and 82.5% with the MobileNetV2-GCN ensemble. This substantial improvement underscores the value of ensembling, particularly in leveraging the complementary strengths of different models to enhance predictive accuracy. The superior performance of the ensemble model is further affirmed by the AUC metric, which serves as a robust indicator of model discrimination. An AUC close to 1.0 signifies an excellent ability to distinguish between classes, highlighting the ensemble model’s proficiency in accurately identifying DR cases. Incorporating the GCN into the ensemble enriches the feature extraction process by capturing intricate relationships within the data that MobileNetV2 alone might overlook. Thus, the ensemble method not only boosts overall accuracy, but also enhances the model’s robustness and generalizability across varied datasets, which is particularly critical for medical applications, where precision and reliability are paramount. The findings of this study emphasize the practical implications of adopting ensemble methods in clinical settings. By achieving a high validation accuracy with relatively modest computational resources, the ensemble model presents a feasible solution for large-scale screening programs. The implementation of 5-fold cross-validation with 100 repetitions demonstrates the efficacy of the ensemble model combining MobileNetV2 and the GCN: this approach yields a notable validation accuracy of 82.5%, markedly surpassing the standalone performance of MobileNetV2, which achieves a 5-fold validation accuracy of 77.4%. This can facilitate early detection and intervention, ultimately aiding in the prevention of vision loss among diabetic patients.
Furthermore, the rigorous validation process, involving extensive cross-validation and repeated testing, lends credence to the reliability and applicability of the results. Our proposed ensemble technique produces an AUC of 0.88, and the repeated validation ensures that the model’s performance is not an artifact of specific data splits, but is consistent and reproducible across different subsets of data. These insights advocate for the broader adoption of ensemble techniques in developing predictive models for medical diagnostics, where the balance between accuracy and computational efficiency can significantly impact patient outcomes.
By applying IRIs to retinal images, clinicians can gain significant insights into the spatial distribution of abnormalities. This information is crucial for diagnostic assessments and treatment planning for individuals with this vision-threatening condition. The use of K-means clustering aids in the segmentation of images by dividing them into various groups based on similarities. This segmentation (Figure 7) is critical for identifying and analyzing different features in eye scans, which supports the detection and assessment of DR. Overall, this method enhances the efficiency and accuracy of clinical evaluations, potentially leading to better-targeted treatments and improved outcomes for patients. The approach provides a robust framework for integrating advanced image analysis techniques into routine clinical practice, contributing to the effective management of DR. The MGVF active contour model is a sophisticated algorithm designed to precisely delineate the boundaries of regions affected by DR. This model operates by utilizing gradient vector fields to guide the contour towards the edges of the affected areas in retinal images. The robustness of the MGVF model lies in its ability to handle variations in the intensity and contrast of retinal images, which are common challenges in medical image analysis. These images were taken from colored retinal scans, which provide the detailed visual information necessary for accurate boundary detection. Each category represents a different severity level of DR, ranging from minor lesions and microaneurysms in mild DR to extensive neovascularization and retinal detachment in proliferative DR. The algorithm’s performance was particularly noteworthy, achieving an impressive average accuracy of 90% in detecting the affected areas. This high level of accuracy indicates that the MGVF model can reliably identify and outline pathological regions in the retina, which is crucial for diagnosing and monitoring the progression of DR. Accurate boundary detection helps clinicians to make more informed decisions regarding treatment plans and interventions, potentially mitigating the risk of severe vision impairment. Additionally, the successful application of the MGVF model across such a diverse dataset underscores its generalizability and robustness. It demonstrates that the model can adapt to various degrees of retinal damage and provide consistent results, regardless of the severity of the disease. This adaptability is essential for clinical settings, where variability in patient data is a common occurrence. Figure 8, referenced in the study, visually presents areas affected by DR, as detected by the MGVF model. By illustrating the exact regions of damage, these images validate the model’s precision and effectiveness, providing a clear visual confirmation of its performance. Such visualizations are invaluable for both clinical diagnosis and further research, offering insights into the morphological changes associated with different stages of DR.

7. Conclusions

Early detection through accurate and efficient predictive models can prevent the progression of DR and reduce the incidence of vision loss among people with diabetes. This study underscores the potential of DL models in the early prediction of DR, a severe complication of diabetes that can lead to vision loss. Among the evaluated convolutional neural network architectures, MobileNetV2 demonstrated the most promising balance between a high validation accuracy (78.22%) and a reasonable training time, making it a viable candidate for practical applications. These findings highlight the trade-off between accuracy and computational efficiency, with MobileNetV2 showing a robust performance across multiple validation scenarios. Ensembling MobileNetV2 with a GCN further demonstrated the potential for creating robust and efficient models for detecting DR; by combining these models, the study achieved better generalizability and reliability in practical applications, ultimately contributing to more accurate and earlier diagnoses of DR.

This study also showed that DL requires less time and is more cost-effective and efficient than ophthalmologists or trained healthcare personnel in identifying DR in patients. DL could therefore play a significant role in public health initiatives, such as mass screening programs for DR in people with diabetes, even in areas with limited healthcare infrastructure [46]. The integration of DL with routine eye screenings holds the promise of enabling timely interventions and preventing vision loss. Although DL holds great promise, it faces several challenges, including interpretability, ethical considerations, and data accuracy; clinicians and public health practitioners should collaborate to develop guidelines for implementing DL in clinical practice, ensuring its responsible and ethical deployment. Moreover, the MGVF active contour model represents a significant advancement in retinal image analysis for DR: its high accuracy and robustness make it a promising tool for improving the early detection and monitoring of this debilitating condition, ultimately aiding in the prevention of vision loss in diabetic patients.

We acknowledge the concern regarding the relatively small size of the APTOS 2019 dataset [34] and its potential impact on the generalizability of our models. To address this, we plan to validate the models on larger and more diverse datasets in future studies. We will also explore data augmentation techniques and transfer learning to enhance model robustness and minimize overfitting, with the aim of improving the models’ reliability and applicability in various clinical environments.

Author Contributions

Conceptualization, F.M. and H.K.; methodology, F.M.; validation, F.M., M.A.H.M. and H.K.; data curation, F.M.; writing—original draft, F.M., M.A.H.M. and F.F.; writing—review and editing, F.M., H.K., M.A.H.M. and F.F.; supervision, H.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable. This study used data from a publicly available source whose collectors obtained their own IRB approval.

Informed Consent Statement

The collectors of the APTOS 2019 Blindness Detection Dataset obtained all permissions for use of the data. It is an open-source dataset and is therefore freely available for anyone to use.

Data Availability Statement

The clean datasets generated and/or analyzed during the current study are available from the corresponding author upon reasonable request. Readers should note that the data supporting the findings of this study are publicly available through the APTOS 2019 Blindness Detection Dataset [34].

Acknowledgments

The authors are thankful to the anonymous reviewers and to TTU HPC and the CU Anschutz Medical Campus for computational support. The authors also thank the University of Arkansas, Fayetteville Writing Studio for improving the English language of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1 shows a deep learning framework for detecting diabetic retinopathy (DR) in retinal fundus images. The process begins with image preprocessing to enhance quality, followed by segmentation to isolate regions of interest. A deep convolutional neural network (CNN) then classifies images through layers that detect and extract features, distinguishing between DR, no DR, and specific lesions like intraretinal hemorrhages. Finally, parameter optimization refines the model’s accuracy, enabling automated DR diagnosis.
Figure A1. Deep-convolutional-neural-network-based framework for diabetic retinopathy detection in retinal fundus images.
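For readers who wish to reproduce the general structure of Figure A1, a minimal Keras transfer-learning sketch is given below; the input resolution, classifier head, and optimizer are illustrative assumptions rather than the authors’ exact configuration.

import tensorflow as tf
from tensorflow.keras import layers, models

IMG_SIZE = (224, 224)  # assumed input resolution

base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False  # freeze ImageNet features for the initial training phase

model = models.Sequential([
    tf.keras.Input(shape=IMG_SIZE + (3,)),
    layers.Rescaling(1.0 / 255),            # the I_norm = I_resized / 255 step
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(5, activation="softmax"),  # five DR severity classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])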

Appendix B

Algorithm A1. Pseudocode for the DR detection algorithm
 # Initialize parameters
 cnn_models = ['DenseNet201', 'ResNet50', 'VGG19', 'MobileNetV2']
 ensemble_model = 'MobileNetV2 + GCN'
 results = {}

 # Data preparation (shared across models): resize each image I_i and normalize
 dataset = load_dataset()
 I_norm = preprocess_images(dataset)  # I_norm = I_resized / 255

 # Loop through CNN models
 for model in cnn_models:
  # Feature extraction
  F = extract_features(model, I_norm)

  # Model evaluation
  accuracy, training_time = evaluate_model(model, F)
  results[model] = {'accuracy': accuracy, 'training_time': training_time}

  # Cross-validation (5-fold, 100 repeats)
  validation_accuracy = cross_validate(model, F)
  results[model]['validation_accuracy'] = validation_accuracy

  # Ensemble method and performance metrics (MobileNetV2 only)
  if model == 'MobileNetV2':
   ensemble_accuracy = cross_validate(ensemble_model, F)
   results['Ensemble'] = {'validation_accuracy': ensemble_accuracy}
   auc, f1, precision = calculate_metrics(ensemble_model, F)
   results['Ensemble'].update({'AUC': auc, 'F1': f1, 'Precision': precision})

 # Clustering analysis (features F from the final model, MobileNetV2)
 results['Clustering'] = {}
 for k in [3, 4, 5, 6, 8]:  # candidate cluster counts; adjust per problem
  silhouette = apply_kmeans(F, k)
  results['Clustering'][k] = {'silhouette_score': silhouette}

 # MGVF active contour model
 results['MGVF'] = apply_MGVF(F)

 # Output results
 display_results(results)

References

  1. Tan, G.S.; Cheung, N.; Simó, R.; Cheung, G.C.; Wong, T.Y. Diabetic macular oedema. Lancet Diabetes Endocrinol. 2017, 5, 143–155. [Google Scholar] [CrossRef] [PubMed]
  2. Ting, D.S.W.; Tan, K.-A.; Phua, V.; Tan, G.S.W.; Wong, C.W.; Wong, T.Y. Biomarkers of diabetic retinopathy. Curr. Diabetes Rep. 2016, 16, 159. [Google Scholar] [CrossRef] [PubMed]
  3. Wong, T.Y.; Gemmy, C.C.M.; Larsen, M.; Sharma, S.; Rafael, S. Erratum: Diabetic retinopathy. Nat. Rev. Dis. Primers 2016, 2, 16012. [Google Scholar] [CrossRef]
  4. Alberti, K.G.M.M.; Zimmet, P.Z. Definition, diagnosis and classification of diabetes mellitus and its complications. Part 1: Diagnosis and classification of diabetes mellitus. Provisional report of a WHO consultation. Diabet. Med. 1998, 15, 539–553. [Google Scholar] [CrossRef]
  5. Geerlings, S.E.; Hoepelman, A.I. Immune dysfunction in patients with diabetes mellitus (DM). FEMS Immunol. Med. Microbiol. 1999, 26, 259–265. [Google Scholar] [CrossRef] [PubMed]
  6. Bansal, P.; Gupta, R.P.; Kotecha, M. Frequency of diabetic retinopathy in patients with diabetes mellitus and its correlation with duration of diabetes mellitus. Med. J. Dr. DY Patil Univ. 2013, 6, 366–369. [Google Scholar]
  7. NIH. People with Diabetes Can Prevent Vision Loss. Available online: https://www.nei.nih.gov/sites/default/files/2019-06/diabetes-prevent-vision-loss.pdf (accessed on 7 January 2024).
  8. Albawi, S.; Bayat, O.; Al-Azawi, S.; Ucan, O.N. Social touch gesture recognition using convolutional neural network. Comput. Intell. Neurosci. 2018, 2018, 6973103. [Google Scholar] [CrossRef]
  9. Centers for Disease Control and Prevention. VEHSS Modeled Prevalence Estimates. Available online: https://www.cdc.gov/vision-health-data/prevalence-estimates/index.html (accessed on 1 November 2024).
  10. Lumbroso, B.; Rispoli, M.; Savastano, M.C. Diabetic Retinopathy; JP Medical Ltd.: Altrincham, UK, 2015. [Google Scholar]
  11. Kumar, S.; Kumar, G.; Velu, S.; Pardhan, S.; Sivaprasad, S.; Ruamviboonsuk, P.; Raman, R. Patient and provider perspectives on barriers to screening for diabetic retinopathy: An exploratory study from southern India. BMJ Open 2020, 10, e037277. [Google Scholar] [CrossRef]
  12. Amoaku, W.M.; Ghanchi, F.; Bailey, C.; Banerjee, S.; Banerjee, S.; Downey, L.; Gale, R.; Hamilton, R.; Khunti, K.; Posner, E.; et al. Diabetic retinopathy and diabetic macular oedema pathways and management: UK consensus working group. Eye 2020, 34 (Suppl. S1), 1–51. [Google Scholar] [CrossRef] [PubMed]
  13. Wong, T.Y.; Sun, J.; Kawasaki, R.; Ruamviboonsuk, P.; Gupta, N.; Lansingh, V.C.; Maia, M.; Mathenge, W.; Moreker, S.; Muqit, M.M.; et al. Guidelines on diabetic eye care: The international council of ophthalmology recommendations for screening, follow-up, referral, and treatment based on resource settings. Ophthalmology 2018, 125, 1608–1622. [Google Scholar] [CrossRef]
  14. Mayya, V.; Kamath, S.; Kulkarni, U. Automated microaneurysms detection for early diagnosis of diabetic retinopathy: A comprehensive review. Comput. Methods Programs Biomed. Update 2021, 1, 100013. [Google Scholar] [CrossRef]
  15. Flaxman, A.D.; Wittenborn, J.S.; Robalik, T.; Gulia, R.; Gerzoff, R.B.; Lundeen, E.A.; Saaddine, J.; Rein, D.B.; Baldonado, K.N.; Davidson, C.; et al. Prevalence of visual acuity loss or blindness in the us: A bayesian meta-analysis. JAMA Ophthalmol. 2021, 139, 717–723. [Google Scholar] [CrossRef]
  16. Lundeen, E.A.; Burke-Conte, Z.; Rein, D.B.; Wittenborn, J.S.; Saaddine, J.; Lee, A.Y.; Flaxman, A.D. Prevalence of diabetic retinopathy in the US in 2021. JAMA Ophthalmol. 2023, 141, 747–754. [Google Scholar] [CrossRef]
  17. Grauslund, J. Diabetic retinopathy screening in the emerging era of artificial intelligence. Diabetologia 2022, 65, 1415–1423. [Google Scholar] [CrossRef] [PubMed]
  18. Qummar, S.; Khan, F.G.; Shah, S.; Khan, A.; Shamshirband, S.; Rehman, Z.U.; Khan, I.A.; Jadoon, W. A deep learning ensemble approach for diabetic retinopathy detection. IEEE Access 2019, 7, 150530–150539. [Google Scholar] [CrossRef]
  19. Nielsen, K.B.; Lautrup, M.L.; Andersen, J.K.; Savarimuthu, T.R.; Grauslund, J. Deep learning-based algorithms in screening of diabetic retinopathy: A systematic review of diagnostic performance. Ophthalmol. Retin. 2019, 3, 294–304. [Google Scholar] [CrossRef]
  20. Lin, C.L.; Wu, K.C. Development of revised ResNet-50 for diabetic retinopathy detection. BMC Bioinform. 2023, 24, 157. [Google Scholar] [CrossRef] [PubMed]
  21. Pratt, H.; Coenen, F.; Broadbent, D.M.; Harding, S.P.; Zheng, Y. Convolutional neural networks for diabetic retinopathy. Procedia Comput. Sci. 2016, 90, 200–205. [Google Scholar] [CrossRef]
  22. Sharma, H.S.; Singh, A.; Chandel, A.S.; Singh, P.; Sapkal, P. Detection of diabetic retinopathy using convolutional neural network. In Proceedings of the International Conference on Communication and Information Processing (ICCIP), Chongqing, China, 15–17 November 2019. [Google Scholar]
  23. Sil, D.; Dutta, A.; Chandra, A. Convolutional neural networks for noise classification and denoising of images. In Proceedings of the TENCON 2019–2019 IEEE Region 10 Conference (TENCON), Kerala, India, 17–20 October 2019; pp. 447–451. [Google Scholar]
  24. Dekhil, O.; Naglah, A.; Shaban, M.; Ghazal, M.; Taher, F.; Elbaz, A. Deep learning-based method for computer aided diagnosis of diabetic retinopathy. In Proceedings of the 2019 IEEE International Conference on Imaging Systems and Techniques (IST), Abu Dhabi, United Arab Emirates, 9–10 December 2019; pp. 1–4. [Google Scholar]
  25. Gangwar, A.K.; Ravi, V. Diabetic retinopathy detection using transfer learning and deep learning. In Evolution in Computational Intelligence: Frontiers in Intelligent Computing: Theory and Applications (FICTA 2020); Springer: Singapore, 2021; Volume 1, pp. 679–689. [Google Scholar]
  26. Tian, C.; Fei, L.; Zheng, W.; Xu, Y.; Zuo, W.; Lin, C.-W. Deep learning on image denoising: An overview. Neural Netw. 2020, 131, 251–275. [Google Scholar] [CrossRef] [PubMed]
  27. Lam, C.; Yi, D.; Guo, M.; Lindsey, T. Automated detection of diabetic retinopathy using deep learning. AMIA Summits Transl. Sci. Proc. 2018, 2018, 147. [Google Scholar]
  28. Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416. [Google Scholar] [CrossRef]
  29. Muthusamy, D.; Palani, P. Deep learning model using classification for diabetic retinopathy detection: An overview. Artif. Intell. Rev. 2024, 57, 185. [Google Scholar] [CrossRef]
  30. Muthusamy, D.; Palani, P. Deep neural network model for diagnosing diabetic retinopathy detection: An efficient mechanism for diabetic management. Biomed. Signal Process. Control. 2025, 100, 107035. [Google Scholar] [CrossRef]
  31. Hamza, M. Optimizing early detection of diabetes through retinal imaging: A comparative analysis of deep learning and machine learning algorithms. J. Comput. Inform. Bus. 2024, 1, 1. [Google Scholar]
  32. Navaneethan, R.; Devarajan, H. Enhancing diabetic retinopathy detection through preprocessing and feature extraction with MGA-CSG algorithm. Expert Syst. Appl. 2024, 249, 123418. [Google Scholar] [CrossRef]
  33. Zhao, H.; Li, H.; Maurer-Stroh, S.; Guo, Y.; Deng, Q.; Cheng, L. Supervised segmentation of un-annotated retinal fundus images by synthesis. IEEE Trans. Med. Imaging 2018, 38, 46–56. [Google Scholar] [CrossRef] [PubMed]
  34. APTOS 2019 Blindness Detection Dataset. Available online: https://www.kaggle.com/competitions/aptos2019-blindness-detection (accessed on 1 November 2024).
  35. Moreno-Torres, J.G.; Sáez, J.A.; Herrera, F. Study on the impact of partition-induced dataset shift on k-fold cross-validation. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1304–1312. [Google Scholar] [CrossRef] [PubMed]
  36. Zhang, K.; Guo, Y.; Wang, X.; Yuan, J.; Ding, Q. Multiple feature reweight dense net for image classification. IEEE Access 2019, 7, 9872–9880. [Google Scholar] [CrossRef]
  37. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  38. Dey, N.; Zhang, Y.-D.; Rajinikanth, V.; Pugalenthi, R.; Raja, N.S.M. Customized vgg 19 architectures for pneumonia detection in chest X-rays. Pattern Recognit. Lett. 2021, 143, 67–74. [Google Scholar] [CrossRef]
  39. Xiang, Q.; Wang, X.; Li, R.; Zhang, G.; Lai, J.; Hu, Q. Fruit image classification based on mobilenetv2 with transfer learning technique. In Proceedings of the 3rd International Conference On Computer Science and Application Engineering, Sanya, China, 22–24 October 2019; pp. 1–7. [Google Scholar]
  40. Yun, W.L.; Mookiah, M.R.K. Detection of diabetic retinopathy using k-means clustering and self-organizing map. J. Med. Imaging Health Inform. 2013, 3, 575–581. [Google Scholar] [CrossRef]
  41. Javidi, M.; Harati, A.; Pourreza, H. Retinal image assessment using bilevel adaptive morphological component analysis. Artif. Intell. Med. 2019, 99, 101702. [Google Scholar] [CrossRef] [PubMed]
  42. Chan, T.F.; Vese, L.A. Active contours without edges. IEEE Trans. Image Process. 2001, 10, 266–277. [Google Scholar] [CrossRef] [PubMed]
  43. Li, Q.; Deng, T.; Xie, W. Active contours driven by divergence of gradient vector flow. Signal Process. 2016, 120, 185–199. [Google Scholar] [CrossRef]
  44. Andrychowicz, M.; Denil, M.; Gomez, S.; Hoffman, M.W.; Pfau, D.; Schaul, T.; Shillingford, B.; De Freitas, N. Learning to learn by gradient descent by gradient descent. In Proceedings of the Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, 5–10 December 2016; pp. 3981–3989. [Google Scholar]
  45. Arora, L.; Singh, S.K.; Kumar, S.; Gupta, H.; Alhalabi, W.; Arya, V.; Gupta, B.B. Ensemble deep learning and EfficientNet for accurate diagnosis of diabetic retinopathy. Sci. Rep. 2024, 14, 30554. [Google Scholar] [CrossRef] [PubMed]
  46. Dai, L.; Wu, L.; Li, H.; Cai, C.; Wu, Q.; Kong, H.; Liu, R.; Wang, X.; Hou, X.; Liu, Y.; et al. A deep learning system for detecting diabetic retinopathy across the disease spectrum. Nat. Commun. 2021, 12, 3242. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Images showing distinct DR in a human eye. Moderate DR has 999 images. The severe DR class consists of 193 images, and the proliferative DR class comprises 295 images. These counts offer a comprehensive overview of the distribution of images across different DR levels in the dataset.
Figure 2. Example of image denoising for a human eye scan showing distinct DR: (a) original grayscale, (b) with Gaussian noise added, (c) after denoising, and (d) RGB scale.
Figure 3. Training and validation splits of the retina scan image datasets. The figure gives insight into the dataset’s composition and assists in assessing the balance of class representation in the training and validation sets.
Figure 4. CNNs are made up of three types of layers: convolutional layers, pooling layers, and fully connected (FC) layers. Stacking these layers produces a CNN architecture. In addition to these three layer types, two other significant components are the dropout layer and the activation function.
Figure 5. A comparison of different deep learning models, namely DenseNet201, ResNet50, VGG19, and MobileNetV2, based on their training and validation accuracies, as well as training times, with 5-fold CV and 100 repeats.
Figure 6. Images show distinct directories representing different severity levels.
Figure 7. Algorithm in (3.9) optimally groups pixels by minimizing intra-cluster variance, enabling the separation of pathological features from normal retinal structures.
Figure 8. MGVF active contour model precisely determines the DR-affected boundaries in the affected retinas for four different samples.
Table 1. Performance metrics for different models in diabetic retinopathy classification with 5-fold CV and 100 repeats.

DenseNet201
Condition of DR | Precision | Recall | F1-Score
Mild | 0.610 | 0.511 | 0.541
Moderate | 0.602 | 0.772 | 0.671
No DR | 0.959 | 0.969 | 0.956
Proliferate DR | 0.458 | 0.391 | 0.425
Severe | 0.680 | 0.221 | 0.339
Weighted Avg | 0.767 | 0.769 | 0.758

ResNet50
Condition of DR | Precision | Recall | F1-Score
Mild | 0.512 | 0.169 | 0.248
Moderate | 0.457 | 0.849 | 0.601
No DR | 0.869 | 0.909 | 0.888
Proliferate DR | 0.001 | 0.001 | 0.001
Severe | 0.001 | 0.001 | 0.001
Weighted Avg | 0.600 | 0.378 | 0.660

VGG19
Condition of DR | Precision | Recall | F1-Score
Mild | 0.527 | 0.101 | 0.159
Moderate | 0.465 | 0.919 | 0.617
No DR | 0.919 | 0.929 | 0.927
Proliferate DR | 0.001 | 0.001 | 0.001
Severe | 0.001 | 0.001 | 0.001
Weighted Avg | 0.629 | 0.701 | 0.625

MobileNetV2
Condition of DR | Precision | Recall | F1-Score
Mild | 0.422 | 0.278 | 0.333
Moderate | 0.528 | 0.728 | 0.618
No DR | 0.94 | 0.96 | 0.95
Proliferate DR | 0.579 | 0.252 | 0.344
Severe | 0.328 | 0.209 | 0.251
Weighted Avg | 0.739 | 0.748 | 0.728
Table 2. Comparative study with other models.

Authors | Dataset Used | Method | Accuracy
Sharma et al. [22] | APTOS 2019 Blindness Detection dataset [34] | CNN proposed with six convolution layers and one fully connected layer | 74.04%
Lin and Wu [20] | | Revised ResNet-50 model for diabetic retinopathy detection | 74.16%
Lam et al. [27] | | CNNs (modified GoogLeNet and AlexNet) for multistage classification of diabetic retinopathy; CLAHE (Contrast Limited Adaptive Histogram Equalization) and data augmentation | 74.5%
Our model | | Different CNN architectures (DenseNet201, ResNet50, VGG19, MobileNetV2), balancing accuracy and computational efficiency, with cross-validation and hyperparameter tuning to select the best-performing model (MobileNetV2); Gaussian filtering and image resizing for preprocessing; accuracy further improved by ensembling MobileNetV2 and GCN | 78.22% and 82.5%