Article

Custom Deep Learning Framework for Interpreting Diabetic Retinopathy in Healthcare Diagnostics

by Tamoor Aziz 1,*, Chalie Charoenlarpnopparut 2, Srijidtra Mahapakulchai 3,*, Babatunde Oluwaseun Ajayi 4 and Mayowa Emmanuel Bamisaye 4

1 Computer Engineering Department, KOSEN-King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
2 Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani 12120, Thailand
3 Department of Electrical Engineering, Kasetsart University, Bangkok 10900, Thailand
4 King Prajadhipok’s Institute, Bangkok 10210, Thailand
* Authors to whom correspondence should be addressed.
Signals 2026, 7(2), 34; https://doi.org/10.3390/signals7020034
Submission received: 4 March 2026 / Revised: 27 March 2026 / Accepted: 31 March 2026 / Published: 7 April 2026

Abstract

Diabetic retinopathy is a prevalent condition and a major public health concern due to its detrimental impact on eyesight. It develops when prolonged high blood sugar levels in diabetes damage the small blood vessels of the retina. The degenerative consequences of diabetic retinopathy are irreversible if not diagnosed in the early stages of its progression. The disease triggers the development of retinal lesions, which can be identified for diagnosis and prognosis. However, lesion detection is challenging because lesions resemble other retinal features in intensity, vary in size, and appear at random locations. This research evaluates a custom deep learning network for classifying retinal images and compares it with state-of-the-art classifiers. A novel preprocessing method is introduced to reduce the complexity of the diagnostic process and to enhance classification performance by adaptively enhancing images. Despite being a shallow network, the proposed model yields competitive results with an accuracy of 87.66% and an F1-score of 0.78. The evaluation metrics indicate that class imbalance affects the performance of the proposed model despite the use of a weighted cross-entropy loss. Future work will incorporate generative adversarial networks to generate synthetic images and balance the dataset. This research aims to develop a robust computer-aided diagnostic system that serves as a second interpreter for ophthalmologists during the diagnosis and prognosis stages.

1. Introduction

Diabetic retinopathy (DR) is one of the most common microvascular complications of diabetes mellitus and remains a leading cause of preventable blindness worldwide. According to [1], DR will affect approximately 16.16 million people by 2030. Its prevalence is rising in parallel with the global increase in diabetes cases, which makes it a profound public health concern [2]. The pathophysiology of DR begins with prolonged hyperglycemia, which damages the retinal microvasculature [3] and leads to increased vascular permeability, capillary occlusion, and ischemia. These changes deteriorate the structural and functional integrity of the retina and result in permanent vision impairment, which significantly affects the patient’s quality of life and imposes a substantial socioeconomic burden on healthcare systems.
The progression of DR is characterized by the appearance of distinct retinal lesions, such as microaneurysms, hemorrhages, hard exudates, and cotton-wool spots [4]. These lesions are the earliest visible indicators of retinal damage and serve as critical biomarkers for diagnosis and prognosis [5]. However, their detection is not straightforward due to certain impediments. Lesions can vary greatly in size, shape, and distribution [6]. Their visual characteristics often overlap with normal retinal features, such as blood vessels or the optic disc. These challenges make manual screening time-consuming and prone to observer variability, especially in large-scale screening programs.
Accurate and efficient lesion detection is essential for early diagnosis, timely treatment, and prevention of vision loss [7]. Traditional screening methods rely on ophthalmologists to examine fundus images; however, increasing patient demand and a shortage of specialists place immense pressure on the healthcare system. This creates a significant impediment that delays timely diagnosis and treatment. Furthermore, inconsistent lesion sizes, random spatial distribution, and similarity in intensity profiles with other retinal structures complicate the diagnosis process even for experienced clinicians [8]. These limitations underscore the need for robust, automated diagnostic tools. These tools can assist in lesion detection and disease classification. They improve screening efficiency, reduce diagnostic errors, and enable earlier intervention.
The development of automated computer-aided diagnostic systems for DR analysis is extremely challenging due to variable imaging conditions. Fundus photographs often suffer from poor illumination (Figure 1a), low contrast (Figure 1b), and uneven brightness (Figure 1c). These issues can obscure fine retinal details critical for lesion detection [9]. Additionally, variations in image resolution (Figure 1d), aspect ratio (Figure 1e), and field of view (Figure 1f) introduce further complexities. These variations make it difficult to standardize inputs for deep learning models. In addition to inherent image inconsistencies, class imbalance can significantly bias model training [10]. Imbalanced gradients and a feature space dominated by the majority class can destabilize the optimization process. This steers the model toward suboptimal representations and overlapping decision boundaries.
This study introduces several key contributions to address these challenges in retinal fundus image analysis. First, an adaptive gamma correction in combination with contrast-limited adaptive histogram equalization [11] was applied to address image quality issues. This preprocessing strategy selectively enhances low-quality images by improving brightness and contrast. It minimally affects images that already possess adequate quality. As a result, the classification model can better interpret subtle retinal features. Second, a retinal mask was generated using empirical image processing techniques to effectively handle variations in image resolution and aspect ratios. This step removes redundant peripheral regions introduced by fundus camera settings. It streamlines diagnosis and reduces computational complexity. Third, a weighted loss function was employed to mitigate extreme class imbalance. It ensures adequate representation of minority classes during training. Finally, a custom deep learning architecture was introduced for retinal fundus image classification. The proposed model incorporates a multi-scale fusion block at the input stage to encode multi-scale retinal features and lesion characteristics. The first branch preserves high-resolution spatial details to extract fine-grained features such as microaneurysms, small hemorrhages, and thin vessels. The second branch captures coarse contextual information, including lesion distribution and macular structure. By integrating these feature maps, the model constructs a rich multi-scale representation that combines local detail with global context. Furthermore, adaptive Gaussian noise, proportional to the standard deviation of the feature maps, is injected during training to learn robust feature embeddings. This mechanism is a data-dependent regularization strategy that improves generalization without significantly degrading informative representations. 
Despite being a relatively shallow network compared to conventional state-of-the-art architectures, the proposed model demonstrated competitive performance. This highlights the effectiveness of targeted design choices over architectural depth alone. Overall, these contributions establish a practical and efficient framework for automated diabetic retinopathy screening that balances accuracy, interpretability, and computational efficiency.

2. Literature Review

A comprehensive review [12] highlighted that deep learning has emerged as the dominant paradigm for DR screening and grading. However, it also emphasized recurring practical challenges, such as variable image quality, heterogeneous data sources, differences in field of view, and severe class imbalance, that limit real-world deployment. Several studies have proposed diverse strategies to address these limitations. Mehboob et al. [13] utilized the large-scale EyePACS dataset and experimented with three different frameworks: a cascaded binary classifier; an ensemble of CNNs across different color spaces (HSV, RGB, and normalized); and a CNN-LSTM hybrid that captures sequential feature dependencies. The ensemble of CNNs with color-space inputs performed best and achieved 78.06% accuracy without augmentation and 83.78% with augmentation. Ayala et al. [14] employed transfer learning with DenseNet-121 using publicly available datasets (including APTOS). Their technique used preprocessing for resolution standardization and augmentation. Their model achieved up to 81% accuracy, demonstrating the value of cross-dataset validation for generalization. Devi et al. [15] addressed the issue of low-quality and imbalanced data by applying a sophisticated denoising pipeline (Wavelet + Retinex + Bilateral filter), followed by dual-phase feature extraction (global and local) and fine-tuning EfficientNet-B7. Their results showed that advanced preprocessing and feature engineering significantly enhance performance on noisy datasets.
Other works have focused on refining CNN-based pipelines. Yang et al. [16] presented a deep learning framework for automatic detection of diabetic retinopathy from retinal fundus images. The proposed method highlighted the effectiveness of convolutional neural networks (CNNs) in extracting hierarchical features without manual preprocessing. The study reported improved accuracy and sensitivity compared to traditional machine learning approaches and demonstrated robustness across multiple datasets. Key contributions included optimized image enhancement, balanced handling of class imbalance, and evaluation against standard benchmarks. Atwany et al. [17] proposed a methodology that involves preprocessing techniques such as image normalization and augmentation, followed by training a CNN to automatically extract discriminative features. The model was evaluated on publicly available datasets and achieved promising evaluation metrics compared to conventional machine learning methods. Results demonstrate the framework’s robustness in handling class imbalance and its potential for reliable early-stage diagnosis. Mohanty et al. [18] proposed two deep learning approaches for DR detection and classification. The study compared a hybrid model that combined VGG16 for feature extraction with XGBoost for classification, and a DenseNet-121 architecture known for its dense connectivity and efficient feature reuse. Images were preprocessed (resizing, Gaussian blurring, and Ben Graham’s cropping method), and the dataset was balanced to mitigate class imbalance. Experimental results showed that the hybrid VGG16 + XGBoost achieved 79.5% accuracy, while DenseNet-121 significantly outperformed with 97.3% accuracy.
Ali et al. [19] proposed a hybrid convolutional neural network (IR-CNN) for automatic DR classification using fundus images. The model combined ResNet50 and InceptionV3 for feature extraction and concatenated their outputs before feeding them into a CNN classifier. Preprocessing steps, such as histogram equalization, intensity normalization, and data augmentation, were applied to enhance image quality and training performance. Results showed that the hybrid IR-CNN significantly outperformed standalone models, achieving 96.85% accuracy and a 98.65% F1-score. Hayati et al. [20] investigated the impact of Contrast Limited Adaptive Histogram Equalization (CLAHE) on DR image classification using deep learning. The study used the APTOS 2019 dataset (3288 images) and evaluated four CNN architectures: VGG16, ResNet34, InceptionV3, and EfficientNetB4. CLAHE preprocessing was applied to enhance contrast and brightness in retinal images before training. Results showed significant improvements in accuracy for VGG16 (87% to 91%), InceptionV3 (90% to 95%), and EfficientNetB4 (95% to 97.8%), while ResNet34 performed better on original images (95% vs. 84% with CLAHE). The findings highlight that image enhancement can substantially improve DR detection performance, especially for lighter CNN models. Aziz et al. [21] proposed a novel Smart Window-based Adaptive Thresholding for segmentation and Gaussian matched filtering and entropy-based thresholding for seed-point extraction. A custom shallow CNN architecture was introduced for hemorrhage classification. Experimental validation showed that the resulting network, HemNet, achieved a competitive accuracy of 97.19% on the DIARETDB1 dataset, outperforming deeper networks such as ResNet50, AlexNet, and VGG-16. Das et al. [22] developed a deep learning-based system for automatic detection and classification of DR. 
The methodology involved image preprocessing (resizing, normalization, and augmentation), followed by training CNN architectures including ResNet50 and DenseNet121. Results showed that DenseNet121 achieved the highest accuracy of 97.3% and outperformed other tested models.

3. Methodology

The proposed algorithm is designed to systematically address the inherent challenges of retinal fundus image analysis and improve the reliability of automated diabetic retinopathy classification. First, redundant peripheral regions introduced by fundus camera settings are removed by isolating the retinal region. This step ensures uniformity across images and mitigates the resolution and aspect-ratio inconsistencies that could otherwise hinder model performance. Next, image quality is enhanced using CLAHE and adaptive gamma correction. These preprocessing techniques amplify the visibility of subtle retinal features. Consequently, preprocessing enhances the representation of early diabetic retinopathy indicators and enables the deep learning model to extract more discriminative latent features. Finally, the classifier is trained using a weighted loss function to counteract the effects of dataset imbalance, which ensures that minority classes are adequately represented during optimization. The overall workflow of the proposed algorithm is illustrated in Figure 2.

3.1. Dataset Description

In this study, the Retinal Fundus Multi-disease Image Dataset (RFMiD) was employed. This is a publicly available resource designed to facilitate research in automated ocular disease detection and classification [23]. The dataset consists of 3200 color fundus images acquired using three different fundus cameras, thereby incorporating natural variability in image quality, resolution, and acquisition settings. This dataset is divided into training (1920), validation (640), and testing (640) sets, of which only 376 images correspond to the DR cases in the training set, 132 images belong to DR in the validation set, and 124 images in the testing set. This indicates that approximately 20% of the samples belong to the abnormal class in each subset. Each image has been annotated through an adjudicated consensus of two senior retinal experts to ensure high-quality ground truth labels. The dataset is particularly challenging due to its heterogeneity in imaging conditions (e.g., variations in illumination, contrast, and field of view) and the class imbalance across disease categories. These characteristics make RFMiD a realistic proxy for clinical screening environments, where image quality varies widely. By providing a large-scale dataset, RFMiD enables the development of robust deep learning models for DR classification.

3.2. Image Calibration

Image calibration is an essential preprocessing step to reduce variability in aspect ratio and image resolution across retinal fundus images. In addition, it removes redundant peripheral regions, such as the black background surrounding the retina, which do not contribute to diagnostic information. By standardizing the retinal region, this step produces uniformly scaled images that can be more effectively interpreted by deep learning models and facilitates the automation of the diagnostic procedure.
The calibration process begins with the application of a median filter to suppress intensity variations at the retinal periphery. The filtered image is then binarized to generate a retinal mask, which delineates the retinal region where DR-related features are present. A bounding box is subsequently derived from this mask to define the retinal coordinates, which are then used to crop and calibrate the image. This procedure not only ensures that the model focuses on clinically relevant regions but also reduces computational complexity, thereby improving the efficiency and reliability of the proposed algorithm for automatic DR identification.
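The calibration steps above can be sketched as follows. This is a minimal illustration, assuming an 8-bit grayscale input and a fixed binarization threshold of 10 (both assumptions; the paper's median-filter step is abbreviated to a simple threshold here):

```python
import numpy as np

def calibrate_fundus(gray, thresh=10):
    """Crop a fundus image to its retinal region.

    A binary mask separates the bright retina from the black periphery;
    the mask's bounding box defines the crop. The paper first applies a
    median filter to suppress peripheral intensity variations; that step
    is omitted here for brevity, and `thresh` is an assumed value.
    """
    mask = gray > thresh                     # binarize to get the retinal mask
    rows = np.any(mask, axis=1)              # rows containing retina pixels
    cols = np.any(mask, axis=0)              # columns containing retina pixels
    r0, r1 = np.where(rows)[0][[0, -1]]      # top/bottom of the bounding box
    c0, c1 = np.where(cols)[0][[0, -1]]      # left/right of the bounding box
    return gray[r0:r1 + 1, c0:c1 + 1]

# Synthetic example: a bright disc region on a black background.
img = np.zeros((100, 120), dtype=np.uint8)
img[20:80, 30:90] = 150                      # stand-in "retina" region
cropped = calibrate_fundus(img)
```

Cropping away the black border both removes diagnostically irrelevant pixels and normalizes the effective field of view before resizing.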

3.3. Image Enhancement

Retinal fundus images often suffer from poor illumination and low contrast due to variable acquisition conditions, which can obscure subtle pathological features. To address these challenges, two complementary enhancement techniques are employed: CLAHE and adaptive gamma correction.
Conventional histogram equalization (HE) enhances contrast by redistributing pixel intensities across the full dynamic range. However, HE over-amplifies noise in homogeneous regions. CLAHE mitigates this limitation by operating on small contextual regions (tiles) and applying a clip limit to prevent over-saturation of contrast. For a contextual region $R$, the transformation function is defined as

$$I_{CLAHE}(x, y) = I_{min} + (I_{max} - I_{min}) \cdot CDF_{clipped}(I(x, y))$$

where $I(x, y)$ is the original pixel intensity, $CDF_{clipped}$ is the clipped cumulative distribution function of the histogram in the region $R$, and $I_{min}$, $I_{max}$ are the minimum and maximum intensity values. By limiting histogram amplification, CLAHE adaptively enhances local contrast while suppressing noise. This step significantly improves the visibility of fine retinal structures such as microaneurysms and exudates, which are critical for early DR detection.
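The per-tile transformation can be illustrated with a simplified, single-region NumPy sketch (full CLAHE additionally interpolates bilinearly between neighboring tile mappings; the clip limit of 40 is an illustrative assumption):

```python
import numpy as np

def clipped_hist_equalize(tile, clip_limit=40):
    """Contrast-limited equalization for one contextual region.

    Implements I_min + (I_max - I_min) * CDF_clipped(I): the histogram is
    clipped at `clip_limit`, the excess mass is redistributed uniformly,
    and the resulting CDF drives the intensity remapping.
    """
    hist, _ = np.histogram(tile, bins=256, range=(0, 256))
    excess = np.maximum(hist - clip_limit, 0).sum()
    hist = np.minimum(hist, clip_limit) + excess / 256  # clip + redistribute
    cdf = np.cumsum(hist) / hist.sum()                  # clipped CDF in [0, 1]
    i_min, i_max = tile.min(), tile.max()
    return (i_min + (i_max - i_min) * cdf[tile]).astype(np.uint8)

rng = np.random.default_rng(0)
tile = rng.integers(80, 120, size=(64, 64)).astype(np.uint8)  # low-contrast tile
out = clipped_hist_equalize(tile)
```

Because the output is bounded by the tile's own $I_{min}$ and $I_{max}$, the enhancement is local and cannot blow out intensities beyond the region's original range.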
To correct poor illumination, gamma correction is applied. The standard gamma transformation is expressed as
$$I_{enhanced}(x, y) = I_{max} \left( \frac{I(x, y)}{I_{max}} \right)^{\gamma}$$
where γ is the gamma parameter that controls brightness adjustment. However, a fixed γ value is suboptimal because some images are already adequately illuminated, while others are dark. To make the process adaptive, the Sobel gradient operator is used to estimate edge information, which indirectly reflects image brightness and contrast. The adaptive gamma is defined as
$$\gamma = \frac{\alpha \cdot (\beta \times 100)}{10}$$
where $\beta$ represents the edge density derived from the Sobel gradient, and $\alpha$ is the brightness adjustment coefficient. A large $\beta$ indicates that the image is adequately bright, so little correction is needed ($\gamma \approx 1$), whereas a small $\beta$ corresponds to a darker image, so stronger correction ($\gamma < 1$) is required.
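A minimal sketch of this adaptive gamma scheme, assuming an illustrative $\alpha$ and edge-magnitude threshold (neither value is specified above):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def adaptive_gamma(gray, alpha=0.25):
    """Adaptive gamma correction driven by Sobel edge density.

    gamma = alpha * (beta * 100) / 10, where beta is the fraction of
    edge pixels: dark, low-detail images (small beta) receive gamma < 1
    (brightening), while well-lit images are changed only mildly.
    `alpha` and the edge threshold below are illustrative assumptions.
    """
    img = gray.astype(np.float64)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    win = sliding_window_view(img, (3, 3))       # 'valid' 3x3 windows
    gx = (win * kx).sum(axis=(-1, -2))           # horizontal Sobel response
    gy = (win * kx.T).sum(axis=(-1, -2))         # vertical Sobel response
    mag = np.hypot(gx, gy)
    beta = (mag > 50).mean()                     # edge density (assumed threshold)
    gamma = alpha * (beta * 100) / 10
    i_max = img.max() if img.max() > 0 else 1.0
    return i_max * (img / i_max) ** gamma, gamma

# A dark image with only a few edges should be brightened (gamma < 1).
img = np.full((64, 64), 30, dtype=np.uint8)
img[30:34, 30:34] = 120                          # small bright patch -> few edges
out, gamma = adaptive_gamma(img)
```

Because $\beta$ is computed per image, well-exposed inputs pass through nearly unchanged, matching the design goal of selectively enhancing only low-quality images.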
By combining CLAHE for local contrast enhancement with adaptive gamma correction for illumination normalization, the proposed enhancement technique ensures that subtle DR features are preserved and emphasized. This dual strategy not only improves the visual quality of retinal images but also enhances the discriminative power of the deep learning model that enables more reliable and robust detection of early pathological signs.

3.4. Custom Deep Learning Model

To effectively capture subtle DR features while maintaining computational efficiency, a custom CNN classifier was designed. Unlike standard off-the-shelf architectures, the proposed model integrates multi-scale feature extraction, adaptive noise regularization, and lightweight fully connected layers to balance accuracy and efficiency, as illustrated in Figure 3.
The network begins with a multi-scale fusion block that enables the model to simultaneously preserve fine vascular details and broader lesion context. This is achieved by applying two convolutional branches with different kernel sizes and stride lengths.
The first branch applies a 3 × 3 convolution with a stride length of 1, followed by batch normalization and max pooling:
$$x_{1a} = Pool(\sigma(BN(W_{1a} * I + b_{1a})))$$
where $I$ is the input image, $W$ and $b$ are the convolutional weights and bias, $*$ denotes convolution, $BN$ is batch normalization, and $\sigma(\cdot)$ is the ReLU activation. This stage extracts low-level retinal features such as vessels, microaneurysms, and exudates.
The second branch applies a 5 × 5 convolution with a stride of 2 to capture coarser contextual features. Due to the difference in kernel size and stride, spatial misalignment occurs. To address this, the latent features are interpolated and concatenated:
$$x_{1b} = Pool(\sigma(BN(W_{1b} * I + b_{1b})))$$
$$x_1 = Concat(x_{1a}, Interp(x_{1b}))$$
This multi-scale fusion ensures that both fine and coarse retinal structures are represented in the feature space.
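A hedged PyTorch sketch of such a two-branch fusion block (channel counts, padding, and pooling configuration are illustrative assumptions, not the paper's exact settings):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    """Two-branch input block: a 3x3/stride-1 branch preserves fine detail,
    a 5x5/stride-2 branch captures coarse context; the coarse map is
    upsampled to realign spatial sizes, then the two are concatenated."""

    def __init__(self, in_ch=3, ch=16):
        super().__init__()
        self.fine = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, stride=1, padding=1),
            nn.BatchNorm2d(ch), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.coarse = nn.Sequential(
            nn.Conv2d(in_ch, ch, 5, stride=2, padding=2),
            nn.BatchNorm2d(ch), nn.ReLU(),
            nn.MaxPool2d(2),
        )

    def forward(self, x):
        xa = self.fine(x)                        # (B, ch, H/2, W/2)
        xb = self.coarse(x)                      # (B, ch, H/4, W/4)
        xb = F.interpolate(xb, size=xa.shape[-2:], mode="bilinear",
                           align_corners=False)  # fix spatial misalignment
        return torch.cat([xa, xb], dim=1)        # (B, 2*ch, H/2, W/2)

x = torch.randn(2, 3, 224, 224)                  # paper's 224x224 input size
y = MultiScaleFusion()(x)
```

The interpolation step is what makes the concatenation legal: without it, the differing kernel sizes and strides leave the two feature maps spatially mismatched.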
A key contribution of the model is the adaptive noise injection mechanism, applied during training to improve generalization. For each mini-batch, Gaussian noise is added with variance proportional to the standard deviation of the feature maps:
$$x' = x + \epsilon, \quad \epsilon \sim \mathcal{N}\left(0, \sigma^2(x) \cdot \eta^2\right)$$
where η is the noise scaling factor. This strategy prevents overfitting by simulating acquisition variability (illumination changes, sensor noise), which makes the model more robust to real-world screening conditions.
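The noise-injection rule can be sketched as a small PyTorch helper; treating $\sigma(x)$ per feature map and the value of η = 0.1 are assumptions here:

```python
import torch

def adaptive_noise(x, eta=0.1, training=True):
    """Inject Gaussian noise with std proportional to each feature map's
    own std: x' = x + eps, eps ~ N(0, sigma(x)^2 * eta^2). Applied only
    during training, so inference is deterministic."""
    if not training:
        return x
    sigma = x.std(dim=(-2, -1), keepdim=True)  # data-dependent noise scale
    return x + torch.randn_like(x) * sigma * eta

feats = torch.randn(4, 16, 56, 56)             # example feature maps
noisy = adaptive_noise(feats, eta=0.1)
```

Scaling the noise by the feature statistics keeps the perturbation proportionate, so informative activations are jittered rather than drowned out.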
Finally, a convolutional block reduces dimensionality, followed by flattening and two fully connected layers:
$$h = \sigma(W_{fc1} \cdot Flatten(x_3) + b_{fc1}), \quad \hat{y} = Softmax(W_{fc2} \cdot h + b_{fc2})$$
where $\hat{y} \in \mathbb{R}^2$ corresponds to the binary classification (DR vs. non-DR).
The model was trained using hyperparameters optimized by Bayesian optimization. The optimal configuration consisted of a batch size of 32, the Adam optimizer, and a learning rate of 5.61 × 10−4. An input image size of 224 × 224 pixels was used. The model was trained for a fixed number of epochs with early stopping based on validation performance to prevent overfitting.
Data augmentation was not applied in this study. Preliminary experiments indicated that certain augmentation techniques, particularly geometric transformations such as rotation, can distort clinically relevant lesion patterns and alter their appearance. This may lead to class ambiguity, where pathological features resemble non-DR cases, thereby introducing label noise and reducing class separability.

3.5. Weighted Cross-Entropy Loss

A major challenge in DR classification is the class imbalance inherent in medical imaging datasets. In the RFMiD dataset, the number of negative samples significantly exceeds the number of positive samples. Training a deep learning model directly on such imbalanced data often biases the classifier toward the majority class.
To mitigate this issue, we employed a weighted cross-entropy (WCE) loss function, which assigns higher penalties to misclassified minority-class samples. The weight for each class c is defined as
$$w_c = \frac{N}{2 N_c}$$
where $N$ is the total number of training samples, $N_c$ is the number of samples in class $c$, and the factor 2 in the denominator corresponds to the total number of classes. This formulation ensures that the contribution of each class to the loss is inversely proportional to its frequency. Specifically, the minority class receives a larger weight that amplifies its influence during optimization. Conversely, the majority class receives a smaller weight that prevents it from dominating the gradient updates.
The WCE loss is then expressed as
$$L = -\sum_{i=1}^{N} \left[ w_1 \, y_i \log \hat{y}_i + w_0 \, (1 - y_i) \log(1 - \hat{y}_i) \right]$$
where $y_i \in \{0, 1\}$ is the true label, $\hat{y}_i$ is the predicted probability of the positive class for sample $i$, and $w_0$, $w_1$ are the weights of the negative and positive classes, respectively.
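The weighting scheme and loss can be reproduced in a few lines of NumPy; the DR/non-DR counts below follow the RFMiD training split described in Section 3.1 (376 DR out of 1920 images):

```python
import numpy as np

def class_weights(labels):
    """w_c = N / (2 * N_c): inverse-frequency weights for two classes."""
    labels = np.asarray(labels)
    n = len(labels)
    return {c: n / (2 * np.sum(labels == c)) for c in (0, 1)}

def weighted_bce(y_true, y_prob, w):
    """Weighted cross-entropy: misclassified minority-class samples
    contribute more to the loss than majority-class samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), 1e-12, 1 - 1e-12)
    return -np.sum(w[1] * y_true * np.log(y_prob)
                   + w[0] * (1 - y_true) * np.log(1 - y_prob))

# RFMiD-like training imbalance: 376 DR (positive) vs 1544 non-DR.
labels = np.array([1] * 376 + [0] * 1544)
w = class_weights(labels)
```

With these counts the DR class receives a weight of roughly 2.55 against about 0.62 for the non-DR class, so a missed DR case costs the optimizer about four times as much as a missed healthy case.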

4. Experiment Results and Comparative Analysis

To evaluate the effectiveness of the proposed model, two widely used performance metrics were used: Accuracy and F1-score. These metrics provide complementary perspectives on classification performance.
Accuracy measures the overall proportion of correctly classified samples and is defined as
$$ACC = \frac{TP + TN}{TP + FP + TN + FN}$$
where TP denotes true positives (DR cases correctly identified), TN represents true negatives (non-DR cases correctly identified), FP is false positives (non-DR cases misclassified as DR), and FN denotes false negatives (DR cases missed by the model).
Accuracy provides a global measure of correctness; however, it can be misleading in imbalanced datasets. For instance, if the majority of samples are non-DR, a model biased toward predicting the negative class may achieve high accuracy while failing to detect DR cases, which are clinically the most important.
To address this limitation, the F1-score is reported. The F1-score is then defined as the harmonic mean of precision and recall:
$$F1\text{-}score = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$
Here, precision measures the reliability of the model’s positive predictions, while recall measures its ability to capture the positive cases. Unlike accuracy, the F1-score is sensitive to class imbalance. A high F1-score indicates that the model not only identifies DR cases correctly but also avoids excessive false alarms. This makes it a more clinically meaningful metric, since missing DR cases (false negatives) can delay treatment, while excessive false positives may burden healthcare systems with unnecessary follow-ups.
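Both metrics, and the failure mode that accuracy can hide, can be illustrated directly: a model that always predicts non-DR on a 20%-positive set scores 80% accuracy yet an F1-score of 0:

```python
import numpy as np

def accuracy_f1(y_true, y_pred):
    """Accuracy and F1 computed from the confusion-matrix counts
    (TP, TN, FP, FN) defined in the text."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return acc, f1

# A degenerate "always negative" classifier on an imbalanced set.
y_true = np.array([0] * 80 + [1] * 20)
acc, f1 = accuracy_f1(y_true, np.zeros(100, dtype=int))
```

This is exactly why the F1-score is reported alongside accuracy throughout the comparison.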
The performance of the proposed model was compared against several state-of-the-art CNN architectures, including LeNet [24], VGG16 [25], ResNet18 [26], and AlexNet [27]. Table 1 summarizes the training, validation, and test accuracies along with the corresponding F1-scores, while the F1-score statistics are pictorially illustrated in Figure 4.
LeNet, being a relatively shallow design, achieved training, validation, and test accuracies of 85.99%, 84.38%, and 80.31%, with F1-scores of 0.79, 0.75, and 0.69, respectively. While LeNet demonstrated stable training behavior, its limited depth and representational capacity restricted its ability to capture the complex retinal features necessary for robust DR classification.
VGG16, with its deeper architecture and uniform 3 × 3 convolutional kernels, slightly improved test accuracy to 81.41% and F1-score to 0.70. However, the model exhibited a drop in validation performance (80.78% accuracy, 0.72 F1-score) compared to training (86.56% accuracy, 0.79 F1-score), suggesting overfitting due to its large parameter count and lack of regularization tailored to medical imaging data.
ResNet18, which incorporates residual connections to mitigate vanishing gradients, achieved very high training accuracy (98.91%) and F1-score (0.98), but its validation and test performance dropped to 83.12% and 82.97% accuracy, with F1-scores of 0.75 and 0.71, respectively. This discrepancy indicates that ResNet18 can fit the training data extremely well, but this overfitting leads to weaker generalization on unseen samples.
AlexNet, with its moderate depth and use of dropout regularization, provided stronger generalization compared to LeNet and VGG16 and more balanced evaluation metrics compared to ResNet18. It achieved validation and test accuracies of 88.44% and 88.12%, with F1-scores of 0.82 and 0.80, respectively. This balance highlights the effectiveness of its architecture in capturing discriminative features while controlling overfitting.
The proposed custom CNN achieved training, validation, and test accuracies of 91.98%, 90.31%, and 87.66%, with F1-scores of 0.86, 0.84, and 0.78, respectively. Although its training accuracy was lower than that of ResNet18 and AlexNet because of its shallow architecture, the proposed model demonstrated the most consistent performance across training, validation, and test sets. This stability reflects the effectiveness of its multi-scale feature fusion and adaptive noise injection, which enhanced generalization by capturing both fine and coarse retinal structures while simulating acquisition variability.
From a clinical perspective, the F1-score is particularly important due to class imbalance. The proposed model’s validation F1-score of 0.84 surpasses all baselines, which indicates superior reliability in detecting DR cases without excessive false alarms. While AlexNet achieved slightly higher test accuracy, the proposed model maintained a stronger balance between accuracy and F1-score, and therefore, it was a more suitable choice for real-world screening scenarios.

5. Discussion

This section provides a comprehensive reflection on the experimental findings and highlights the strengths and limitations of the proposed approach. The discussion examines the discriminative ability of the models through receiver operating characteristic (ROC) analysis, interprets the architectural choices that shaped the observed outcomes, and evaluates the effectiveness and limitations of weighted cross-entropy loss in handling class imbalance. These perspectives provide a deep understanding of the models’ behavior, their clinical applicability, and potential directions for further refinement.

5.1. Evaluation of Models Using ROC Curves

The ROC analysis provides an additional perspective on the discriminative ability of the evaluated classifiers. The area under the ROC curve (AUC) reflects the overall capacity of a model to separate positive and negative cases across varying thresholds, while the optimal threshold indicates the decision point at which the trade-off between maximizing True-Positive Rate (TPR) and minimizing False-Negative Rate (FNR) is optimized. The ROC curves of the baseline models and the proposed model are shown in Figure 5.
LeNet-5 and VGG16 achieved AUC values of 0.803 and 0.801, with optimal thresholds of 0.350 and 0.265, respectively. These moderate AUC scores are consistent with their lower test accuracies (80.31% and 81.41%) and F1-scores (0.69 and 0.70). This confirms that their limited representational capability restricts them to capturing the subtle retinal features necessary for robust classification.
ResNet18 achieved a higher AUC of 0.823 with an optimal threshold of 0.142. The very low threshold suggests that the model must operate with a high TPR bias to compensate for its weaker generalization, which aligns with its tendency to overfit: while training accuracy and F1-score were extremely high (98.91% and 0.98), validation and test F1-scores dropped sharply (0.75 and 0.71). This indicates that despite its theoretical discriminative strength, ResNet18 struggles to generalize effectively.
AlexNet achieved the highest AUC of 0.901 with an optimal threshold of 0.399, demonstrating excellent discriminative ability and strong generalization. Its validation and test F1-scores (0.82 and 0.80) confirm this balance, which makes it a competitive baseline.
The proposed model achieved an AUC of 0.882 with an optimal threshold of 0.134. Although its threshold is low, this does not reflect overfitting. Instead, the model achieves high TPR while keeping FNR under control, as evidenced by its superior validation F1-score of 0.84 and consistent accuracy across training, validation, and test sets (91.98%, 90.31%, and 87.66%). This indicates that the proposed architecture is well-calibrated: it favors detecting DR cases early. From a clinical perspective, this is highly desirable because reducing FNR is critical to avoid missed diagnoses, and the strong F1-score confirms that high TPR is not achieved at the expense of excessive false positives.

5.2. Model Interpretation Using the Grad-CAM Method

To further understand the decision-making process of the proposed model, the Gradient-weighted Class Activation Mapping (Grad-CAM) [28] method was employed, which highlights the image regions most influential in classification. This interpretability analysis offers insight into how the model perceives pathological structures and allows clinicians to verify whether its learned representations align with clinically meaningful features. This improves transparency and facilitates clinical decision support by guiding attention to potentially abnormal regions, thereby supporting more efficient and reliable diagnosis. Representative Grad-CAM visualizations of the custom model are shown in Figure 6.
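As a rough illustration of how Grad-CAM derives such maps, the sketch below applies the standard computation (channel-wise gradient averaging, weighted feature-map sum, then ReLU) to a tiny toy CNN in PyTorch; the network, layer sizes, and random input are placeholders for illustration only, not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny illustrative CNN (placeholder, not the proposed model)
class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(16, 2)

    def forward(self, x):
        fmap = self.features(x)          # B x 16 x H x W feature maps
        pooled = fmap.mean(dim=(2, 3))   # global average pooling
        return self.head(pooled), fmap

torch.manual_seed(0)
model = TinyCNN().eval()
image = torch.rand(1, 3, 64, 64)

logits, fmap = model(image)
fmap.retain_grad()                       # keep gradients of the non-leaf tensor
logits[0, 1].backward()                  # gradient of the target class score

# Grad-CAM: weight each channel by its spatially averaged gradient,
# sum the weighted channels, and keep only positive contributions.
channel_w = fmap.grad.mean(dim=(2, 3), keepdim=True)   # 1 x 16 x 1 x 1
cam = F.relu((channel_w * fmap).sum(dim=1)).squeeze(0).detach()
cam = cam / (cam.max() + 1e-8)           # normalize to [0, 1] for overlay
```

In practice the resulting map is upsampled to the input resolution and overlaid on the fundus image, as in Figure 6.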
The Grad-CAM visualizations revealed that the proposed model is particularly effective at detecting exudates, which are categorized as bright DR features. The model consistently activated regions corresponding to these lesions, indicating that its convolutional filters are well-tuned to capture high-intensity contrasts against the retinal background. This ability to localize exudates demonstrates the model’s strength in identifying features with distinct brightness and sharp boundaries.
In contrast, the model showed greater difficulty in fully characterizing hemorrhages, which are darker lesions with less distinct borders. The Grad-CAM maps did highlight regions associated with hemorrhages, but the activations were less precise and sometimes extended into the peripheral black regions of the fundus images, which share similar intensity profiles with hemorrhages. This overlap suggests that the model can detect hemorrhage-related latent features but may confuse background artifacts with true pathological regions, leading to weaker and less consistent representations.
This observation provides two important insights. First, it underscores the robustness of the model in detecting bright pathological features, which contributes to its strong F1-score and balanced performance across datasets. Second, it highlights a potential vulnerability in the consistent identification of dark lesions, which could be addressed in future work through targeted preprocessing (e.g., peripheral masking), data augmentation strategies that emphasize hemorrhage patterns, or architectural refinements that enhance sensitivity to low-intensity features.

5.3. Limitations of Weighted Cross-Entropy Loss

A widely adopted approach to addressing class imbalance in medical imaging tasks is the WCE loss (Equation (10)), where class weights are inversely proportional to class frequencies (Equation (9)). While WCE mitigates the dominance of majority classes, it is not without limitations. Assigning very high weights to minority classes may lead to unstable optimization and overfitting, especially when the number of minority samples is very small: the gradients associated with rare classes can dominate updates, causing oscillations or even divergence during training [29,30]. Furthermore, WCE assumes that all samples within a class contribute equally once weighted. However, in DR datasets, minority samples are highly heterogeneous, and not all images of a severe class contain informative features. WCE therefore amplifies noisy images, which biases the decision boundaries. Another limitation of the WCE loss is that it does not distinguish between hard and easy samples; for example, correctly classified minority instances continue to contribute as much as misclassified ones, which is inefficient [31].
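To make the weighting concrete, the snippet below sketches one common inverse-frequency scheme, w_c = N / (K · n_c), together with the corresponding weighted cross-entropy. The exact normalization used in Equations (9) and (10) may differ, and the class counts here are purely hypothetical:

```python
import numpy as np

# Hypothetical class counts for an imbalanced DR dataset
counts = np.array([1500, 300])       # majority (no DR) vs. minority (DR)
n_classes = len(counts)

# Inverse-frequency weights: w_c = N / (K * n_c)
# (one common form; the paper's Equation (9) may normalize differently)
weights = counts.sum() / (n_classes * counts)

def weighted_cross_entropy(probs, labels, weights):
    """Weighted cross-entropy: each sample's log-loss is scaled by the
    weight of its true class, boosting the minority class's gradient."""
    eps = 1e-12
    sample_w = weights[labels]
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(sample_w * np.log(true_class_probs + eps))

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=10)
probs = rng.dirichlet(np.ones(2), size=10)   # rows sum to 1
loss = weighted_cross_entropy(probs, labels, weights)
```

With these counts the minority class receives a fivefold larger weight (3.0 vs. 0.6), which illustrates the instability risk discussed above: a handful of rare-class samples can dominate the gradient signal.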
These limitations suggest that while WCE is a useful baseline, it is insufficient for highly imbalanced and heterogeneous medical datasets like RFMiD, which necessitate more adaptive approaches for dealing with imbalanced datasets.

6. Conclusions

The early and accurate detection of diabetic retinopathy remains a critical challenge in medical imaging, as timely diagnosis can significantly reduce the risk of vision loss. Automated classification of retinal fundus images offers a scalable solution to support clinicians. However, the complexity of the RFMiD dataset, including its diverse pathological manifestations, varying image quality and resolution, and class imbalance, makes this task particularly demanding.
In this study, an automated classification algorithm was proposed that combines preprocessing with a custom deep learning architecture. The proposed model incorporated multiscale feature fusion to enable the network to capture both global retinal structures and fine-grained pathological details, while adaptive noise injection improved robustness against variability in fundus image acquisition. These strategies enhanced the model's ability to generalize across heterogeneous samples.
Despite these advances, certain limitations were observed. The use of a weighted cross-entropy loss partially mitigated class imbalance but did not fully resolve the underrepresentation of rare pathological features. Moreover, while the model demonstrated strong performance in detecting exudates, its interpretability analysis revealed less consistent localization of hemorrhages, likely due to the similarity between dark lesions and peripheral background regions. These findings highlight both the strengths of the proposed approach and the areas where refinement is needed.
Future work will focus on addressing these limitations by exploring more effective loss functions tailored to imbalanced medical datasets. In addition, data augmentation and attention mechanisms will be employed to emphasize underrepresented features like hemorrhages to ensure more balanced learning across lesion types. Such refinements, combined with continued exploration of interpretability methods, will further enhance the reliability and clinical applicability of automated DR screening systems.

Author Contributions

T.A. investigation, methodology, software implementation, writing the original draft, conceptualization, and formal analysis. C.C. funding acquisition, investigation, project administration, validation, supervision, review, and writing—review and editing. S.M. project administration, results verification, formal analysis, resources, writing—review and editing. B.O.A. writing—review and editing, formal analysis. M.E.B. visualization, writing—original draft. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

A publicly available dataset was used in the experimentation and is available at https://riadd.grand-challenge.org/download-all-classes/ (accessed on 17 May 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Haider, S.; Thayakaran, R.; Subramanian, A.; Toulis, K.A.; Moore, D.; Price, M.J.; Nirantharakumar, K. Disease burden of diabetes, diabetic retinopathy and their future projections in the UK: Cross-sectional analyses of a primary care database. BMJ Open 2021, 11, e050058.
  2. Aziz, T.; Charoenlarpnopparut, C.; Mahapakulchai, S. Comparing conventional and deep feature models for classifying fundus photography of hemorrhages. J. Healthc. Eng. 2022, 2022, 7387174.
  3. Kusuhara, S.; Fukushima, Y.; Ogura, S.; Inoue, N.; Uemura, A. Pathophysiology of diabetic retinopathy: The old and the new. Diabetes Metab. J. 2018, 42, 364–376.
  4. Ansari, P.; Tabasumma, N.; Snigdha, N.N.; Siam, N.H.; Panduru, R.V.; Azam, S.; Hannan, J.M.A.; Abdel-Wahab, Y.H. Diabetic retinopathy: An overview on mechanisms, pathophysiology and pharmacotherapy. Diabetology 2022, 3, 159–175.
  5. Parravano, M.; Cennamo, G.; Di Antonio, L.; Grassi, M.O.; Lupidi, M.; Rispoli, M.; Savastano, M.C.; Veritti, D.; Vujosevic, S. Multimodal imaging in diabetic retinopathy and macular edema: An update about biomarkers. Surv. Ophthalmol. 2024, 69, 893–904.
  6. Aziz, T.; Ilesanmi, A.E.; Charoenlarpnopparut, C. Efficient and accurate hemorrhages detection in retinal fundus images using smart window features. Appl. Sci. 2021, 11, 6391.
  7. Patibandla, R.L.; Rao, B.T.; Murty, M.R. Revolutionizing diabetic retinopathy diagnostics and therapy through artificial intelligence: A smart vision initiative. In Transformative Approaches to Patient Literacy and Healthcare Innovation; IGI Global Scientific Publishing: Palmdale, PA, USA, 2024; pp. 136–155.
  8. Sher, M.; Sharma, R.; Remyes, D.; Nasef, D.; Nasef, D.; Toma, M. Stratified Multisource Optical Coherence Tomography Integration and Cross-Pathology Validation Framework for Automated Retinal Diagnostics. Appl. Sci. 2025, 15, 4985.
  9. Goutam, B.; Hashmi, M.F.; Geem, Z.W.; Bokde, N.D. A comprehensive review of deep learning strategies in retinal disease diagnosis using fundus images. IEEE Access 2022, 10, 57796–57823.
  10. Ghosh, K.; Bellinger, C.; Corizzo, R.; Branco, P.; Krawczyk, B.; Japkowicz, N. The class imbalance problem in deep learning. Mach. Learn. 2024, 113, 4845–4901.
  11. Liu, X.; Nguyen, T.D. Medical images enhancement by integrating CLAHE with wavelet transform and non-local means denoising. Acad. J. Comput. Inf. Sci. 2024, 7, 52–58.
  12. Tsiknakis, N.; Theodoropoulos, D.; Manikis, G.; Ktistakis, E.; Boutsora, O.; Berto, A.; Scarpa, F.; Scarpa, A.; Fotiadis, D.I.; Marias, K. Deep learning for diabetic retinopathy detection and classification based on fundus images: A review. Comput. Biol. Med. 2021, 135, 104599.
  13. Mehboob, A.; Akram, M.U.; Alghamdi, N.S.; Abdul Salam, A. A deep learning based approach for grading of diabetic retinopathy using large fundus image dataset. Diagnostics 2022, 12, 3084.
  14. Ayala, A.; Ortiz Figueroa, T.; Fernandes, B.; Cruz, F. Diabetic retinopathy improved detection using deep learning. Appl. Sci. 2021, 11, 11970.
  15. Devi, T.M.; Karthikeyan, P.; Muthu Kumar, B.; Manikandakumar, M. Diabetic retinopathy detection via deep learning based dual features integrated classification model. Technol. Health Care 2025, 33, 1066–1080.
  16. Yang, Z.; Tan, T.E.; Shao, Y.; Wong, T.Y.; Li, X. Classification of diabetic retinopathy: Past, present and future. Front. Endocrinol. 2022, 13, 1079217.
  17. Atwany, M.Z.; Sahyoun, A.H.; Yaqub, M. Deep learning techniques for diabetic retinopathy classification: A survey. IEEE Access 2022, 10, 28642–28655.
  18. Mohanty, C.; Mahapatra, S.; Acharya, B.; Kokkoras, F.; Gerogiannis, V.C.; Karamitsos, I.; Kanavos, A. Using deep learning architectures for detection and classification of diabetic retinopathy. Sensors 2023, 23, 5726.
  19. Ali, G.; Dastgir, A.; Iqbal, M.W.; Anwar, M.; Faheem, M. A hybrid convolutional neural network model for automatic diabetic retinopathy classification from fundus images. IEEE J. Transl. Eng. Health Med. 2023, 11, 341–350.
  20. Hayati, M.; Muchtar, K.; Maulina, N.; Syamsuddin, I.; Elwirehardja, G.N.; Pardamean, B. Impact of CLAHE-based image enhancement for diabetic retinopathy classification through deep learning. Procedia Comput. Sci. 2023, 216, 57–66.
  21. Aziz, T.; Charoenlarpnopparut, C.; Mahapakulchai, S. Deep learning-based hemorrhage detection for diabetic retinopathy screening. Sci. Rep. 2023, 13, 1479.
  22. Das, D.; Biswas, S.K.; Bandyopadhyay, S. Detection of diabetic retinopathy using convolutional neural networks for feature extraction and classification (DRFEC). Multimed. Tools Appl. 2023, 82, 29943–30001.
  23. Pachade, S.; Porwal, P.; Thulkar, D.; Kokare, M.; Deshmukh, G.; Sahasrabuddhe, V.; Giancardo, L.; Quellec, G.; Mériaudeau, F. Retinal Fundus Multi-disease Image Dataset (RFMiD). IEEE Dataport 2020, 6, 14.
  24. Zhang, J.; Yu, X.; Lei, X.; Wu, C. A novel deep LeNet-5 convolutional neural network model for image recognition. Comput. Sci. Inf. Syst. 2022, 19, 1463–1480.
  25. Tao, J.; Gu, Y.; Sun, J.; Bie, Y.; Wang, H. Research on VGG16 convolutional neural network feature classification algorithm based on transfer learning. In Proceedings of the 2021 2nd China International SAR Symposium (CISS), Shanghai, China, 3–5 November 2021; IEEE: New York, NY, USA; pp. 1–3.
  26. Chen, Z.; Jiang, Y.; Zhang, X.; Zheng, R.; Qiu, R.; Sun, Y.; Zhao, C.; Shang, H. ResNet18DNN: Prediction approach of drug-induced liver injury by deep neural network with ResNet18. Brief. Bioinform. 2022, 23, bbab503.
  27. Siuly, S.; Khare, S.K.; Kabir, E.; Sadiq, M.T.; Wang, H. An efficient Parkinson’s disease detection framework: Leveraging time-frequency representation and AlexNet convolutional neural network. Comput. Biol. Med. 2024, 174, 108462.
  28. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
  29. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
  30. Cui, Y.; Jia, M.; Lin, T.Y.; Song, Y.; Belongie, S. Class-balanced loss based on effective number of samples. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9268–9277.
  31. Khan, S.; Hayat, M.; Zamir, S.W.; Shen, J.; Shao, L. Striking the right balance with uncertainty. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 103–112.
Figure 1. Key limitations of retinal fundus images for Computer-Aided Diagnosis. (a) Poor illumination, (b) low contrast, (c) uneven brightness, (d) varying resolution, (e) different aspect ratios, and (f) inconsistent field of view.
Figure 2. Proposed Algorithm for Automated Diabetic Retinopathy Classification.
Figure 3. Architecture of the Custom CNN Classifier with Multi-Scale Fusion and Adaptive Noise Injection.
Figure 4. Comparison of the Proposed Model with Baseline Classifiers in terms of F1-score.
Figure 5. ROC Analysis of the Proposed Model with Baseline Classifiers.
Figure 6. Latent Feature Visualization for Model Interpretation: Original Fundus Images (a–c) and Grad-CAM Overlays (d–f).
Table 1. Performance comparison of the proposed lightweight model with Baseline Classifiers.
Method | Set | Accuracy (%) | F1-Score
LeNet-5 | Train | 85.99 | 0.79
LeNet-5 | Validation | 84.38 | 0.75
LeNet-5 | Test | 80.31 | 0.69
VGG-16 | Train | 86.56 | 0.79
VGG-16 | Validation | 80.78 | 0.72
VGG-16 | Test | 81.41 | 0.70
ResNet18 | Train | 98.91 | 0.98
ResNet18 | Validation | 83.12 | 0.75
ResNet18 | Test | 82.97 | 0.71
AlexNet | Train | 94.64 | 0.92
AlexNet | Validation | 88.44 | 0.82
AlexNet | Test | 88.12 | 0.80
Proposed Model | Train | 91.98 | 0.86
Proposed Model | Validation | 90.31 | 0.84
Proposed Model | Test | 87.66 | 0.78