Explainable AI for Diabetic Retinopathy: Utilizing YOLO Model on a Novel Dataset
Abstract
1. Introduction
- We provide a novel, carefully curated collection of retinal fundus images designed specifically for DR classification. The dataset contains well-labeled images suitable for both binary (DR vs. Normal) and multi-class (severity-graded) classification tasks.
- We perform both binary and multi-class DR classification with the latest image-classification architectures, YOLOv8 and YOLO11, each in five model sizes (nano, small, medium, large, and extra-large). This comparison across architectures and configurations gives a robust picture of model performance, and achieving strong results on both tasks with efficient architectures supports their suitability for clinical deployment.
- CLAHE was applied during preprocessing to enhance the visibility of retinal features, improving both image quality and the model’s ability to recognize subtle clinical patterns (a CLAHE sketch follows this list).
- To improve model transparency, we use Eigen-CAM to visualize and explain the decision-making of the best-performing YOLO models, making clear which image regions most influenced the classification results. Beyond interpretability, Eigen-CAM lets ophthalmologists verify the location of diseased areas, which is important for clinical use (an Eigen-CAM sketch also follows this list).
- The study combines automated classification with visual explainability to support ophthalmologists’ decision-making.
- We developed an interactive model demonstration with Gradio for real-time evaluation. This tool not only showcases and evaluates the model’s behavior but also lays a foundation for building a comprehensive application in the future (a minimal Gradio sketch appears after this list).
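A minimal sketch of the CLAHE preprocessing step mentioned above, using OpenCV on the luminance channel of a LAB-converted fundus image; the clip limit, tile grid, and file names are illustrative, not the study’s exact settings:

```python
import cv2

def apply_clahe(image_path: str, clip_limit: float = 2.0, tile_grid=(8, 8)):
    """Enhance local contrast of a fundus image with CLAHE on the L channel."""
    bgr = cv2.imread(image_path)                  # OpenCV loads images as BGR
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)    # equalize luminance, not color
    l_chan, a_chan, b_chan = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid)
    lab_eq = cv2.merge((clahe.apply(l_chan), a_chan, b_chan))
    return cv2.cvtColor(lab_eq, cv2.COLOR_LAB2BGR)

cv2.imwrite("fundus_clahe.png", apply_clahe("fundus.png"))
```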
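Likewise, a hedged sketch of the Eigen-CAM visualization, assuming the open-source pytorch-grad-cam package; the weights path and target-layer choice are placeholders, and hooking Ultralytics model internals may need adjustment across library versions:

```python
import cv2
import numpy as np
import torch
from pytorch_grad_cam import EigenCAM
from pytorch_grad_cam.utils.image import show_cam_on_image
from ultralytics import YOLO

yolo = YOLO("best.pt")             # placeholder for the trained weights
model = yolo.model.eval()
target_layers = [model.model[-2]]  # illustrative choice: a late feature layer

# Prepare one fundus image as a normalized float tensor.
rgb = cv2.cvtColor(cv2.imread("fundus_clahe.png"), cv2.COLOR_BGR2RGB)
rgb = cv2.resize(rgb, (640, 640)).astype(np.float32) / 255.0
tensor = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)

# Eigen-CAM needs no class-specific gradients: it projects activations onto
# their first principal component, which is why it pairs well with YOLO heads.
cam = EigenCAM(model=model, target_layers=target_layers)
heatmap = cam(input_tensor=tensor)[0]
overlay = show_cam_on_image(rgb, heatmap, use_rgb=True)
cv2.imwrite("eigencam_overlay.png", cv2.cvtColor(overlay, cv2.COLOR_RGB2BGR))
```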
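Finally, a minimal sketch of the Gradio demonstration; the weights path is a placeholder and the interface options are illustrative rather than the study’s exact configuration:

```python
import gradio as gr
from ultralytics import YOLO

model = YOLO("best.pt")  # placeholder path to the trained classification weights

def classify(image):
    """Return per-class probabilities for one uploaded fundus image."""
    result = model(image, verbose=False)[0]
    probs = result.probs.data.tolist()
    return {result.names[i]: float(p) for i, p in enumerate(probs)}

demo = gr.Interface(fn=classify,
                    inputs=gr.Image(type="pil"),
                    outputs=gr.Label(num_top_classes=5),
                    title="Diabetic Retinopathy Classification Demo")
demo.launch()
```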
2. Literature Review
YOLO Models
3. Materials and Methods
3.1. Dataset
- Color: RGB;
- Field of view: 133°;
- Optical resolution: 7.3 µm;
- Minimum Pupil Diameter: 2.5 mm;
- Image Format: PNG;
- Width: varying (1366 or 1920 pixels);
- Height: varying (991 or 679 pixels).
3.1.1. Cleaning the Dataset
3.1.2. Sample Image
3.1.3. Image Quality
3.2. Splitting, Preprocessing, and Augmenting
3.3. YOLO Models
3.4. Model Parameters
3.5. Performance Metrics
3.6. Model Explainability and Demonstration
4. Results
4.1. Binary Classification
4.2. Multi-Class Classification
4.3. Comparison with Other Pretrained Models
5. Discussion
5.1. Ablation Study
5.2. Limitations and Future Work
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Boyd, K. Diabetic Retinopathy: Causes, Symptoms, Treatment. American Academy of Ophthalmology. Available online: https://www.aao.org/eye-health/diseases/what-is-diabetic-retinopathy (accessed on 16 June 2025).
- Mutawa, A.M.; Al-Sabti, K.; Raizada, S.; Sruthi, S. A Deep Learning Model for Detecting Diabetic Retinopathy Stages with Discrete Wavelet Transform. Appl. Sci. 2024, 14, 4428. [Google Scholar] [CrossRef]
- Macsik, P.; Pavlovicova, J.; Kajan, S.; Goga, J.; Kurilova, V. Image preprocessing-based ensemble deep learning classification of diabetic retinopathy. IET Image Process. 2024, 18, 807–828. [Google Scholar] [CrossRef]
- Zaylaa, A.J.; Kourtian, S. From Pixels to Diagnosis: Early Detection of Diabetic Retinopathy Using Optical Images and Deep Neural Networks. Appl. Sci. 2025, 15, 2684. (In English) [Google Scholar] [CrossRef]
- Renu, D.S.; Saji, K.S. Hybrid deep learning framework for diabetic retinopathy classification with optimized attention AlexNet. Comput. Biol. Med. 2025, 190, 110054. [Google Scholar] [CrossRef]
- Moannaei, M.; Jadidian, F.; Doustmohammadi, T.; Kiapasha, A.M.; Bayani, R.; Rahmani, M.; Jahanbazy, M.R.; Sohrabivafa, F.; Anar, M.A.; Magsudy, A.; et al. Performance and limitation of machine learning algorithms for diabetic retinopathy screening and its application in health management: A meta-analysis. Biomed. Eng. Online 2025, 24, 34. [Google Scholar] [CrossRef]
- Alsohemi, R.; Dardouri, S. Fundus Image-Based Eye Disease Detection Using EfficientNetB3 Architecture. J. Imaging 2025, 11, 279. [Google Scholar] [CrossRef]
- Yu, T.; Shao, A.; Wu, H.; Su, Z.; Shen, W.; Zhou, J.; Lin, X.; Shi, D.; Grzybowski, A.; Wu, J.; et al. A Systematic Review of Advances in AI-Assisted Analysis of Fundus Fluorescein Angiography (FFA) Images: From Detection to Report Generation. Ophthalmol. Ther. 2025, 14, 599–619. [Google Scholar] [CrossRef] [PubMed]
- Seo, H.; Park, S.-J.; Song, M. Diabetic Retinopathy (DR): Mechanisms, Current Therapies, and Emerging Strategies. Cells 2025, 14, 376. (In English) [Google Scholar] [CrossRef] [PubMed]
- Manohar, R.; Aarthi, M.S.; Ancy Jenifer, J. Leveraging Deep Learning For Early Stage Diabetic Retinopathy Detection: A Novel CNN and Transfer Learning Comparison. In Proceedings of the 2024 2nd International Conference on Artificial Intelligence and Machine Learning Applications Theme: Healthcare and Internet of Things (AIMLA), Namakkal, India, 15–16 March 2024; pp. 1–6. [Google Scholar] [CrossRef]
- Asif, S. DEO-Fusion: Differential evolution optimization for fusion of CNN models in eye disease detection. Biomed. Signal Process. Control. 2025, 107, 107853. [Google Scholar] [CrossRef]
- Zubair, M.; Umair, M.; Naqvi, R.A.; Hussain, D.; Owais, M.; Werghi, N. A comprehensive computer-aided system for an early-stage diagnosis and classification of diabetic macular edema. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 101719. [Google Scholar] [CrossRef]
- Hussain, M. YOLOv1 to v8: Unveiling Each Variant–A Comprehensive Review of YOLO. IEEE Access 2024, 12, 42816–42833. [Google Scholar] [CrossRef]
- Mao, M.; Hong, M. YOLO Object Detection for Real-Time Fabric Defect Inspection in the Textile Industry: A Review of YOLOv1 to YOLOv11. Sensors 2025, 25, 2270. (In English) [Google Scholar] [CrossRef]
- Du, J. Understanding of Object Detection Based on CNN Family and YOLO. J. Phys. Conf. Ser. 2018, 1004, 012029. (In English) [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6517–6525. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar] [CrossRef]
- Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
- Jocher, G.; Stoken, A.; Chaurasia, A.; Borovec, J.; NanoCode; TaoXie; Kwon, Y.; Michael, K.; Liu, C.; Fang, J.; et al. ultralytics/yolov5: v6.0—YOLOv5n ‘Nano’ Models, Roboflow Integration, TensorFlow Export, OpenCV DNN Support; Zenodo: Geneve, Switzerland, 2021. [Google Scholar] [CrossRef]
- Li, C.; Li, L.; Geng, Y.; Jiang, H.; Cheng, M.; Zhang, B.; Ke, Z.; Xu, X.; Chu, X. YOLOv6 v3.0: A full-scale reloading. arXiv 2023, arXiv:2301.05586. [Google Scholar] [CrossRef]
- Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar] [CrossRef]
- Varghese, R.; Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In Proceedings of the 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), Chennai, India, 18–19 April 2024; pp. 1–6. [Google Scholar] [CrossRef]
- Wang, Y.; Rong, Q.; Hu, C. Ripe tomato detection algorithm based on improved YOLOv9. Plants 2024, 13, 3253. [Google Scholar] [CrossRef] [PubMed]
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. YOLOv10: Real-time end-to-end object detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar] [CrossRef]
- Jocher, G.; Qiu, J. Ultralytics YOLO11. Available online: https://docs.ultralytics.com/models/yolo11/ (accessed on 30 September 2024).
- Tian, Y.; Ye, Q.; Doermann, D. YOLOv12: Attention-centric real-time object detectors. arXiv 2025, arXiv:2502.12524. [Google Scholar]
- Zhang, Z.; Zhao, H.; Dong, L.; Luo, L.; Wang, H. A Study on the Interpretability of Diabetic Retinopathy Diagnostic Models. Bioengineering 2025, 12, 1231. [Google Scholar] [CrossRef]
- Şevik, U.; Mutlu, O. Automated Multi-Class Classification of Retinal Pathologies: A Deep Learning Approach to Unified Ophthalmic Screening. Diagnostics 2025, 15, 2745. [Google Scholar] [CrossRef]
- Singh, A.; Jain, S.; Arora, V. A Multi-Model Image Enhancement and Tailored U-Net Architecture for Robust Diabetic Retinopathy Grading. Diagnostics 2025, 15, 2355. [Google Scholar] [CrossRef]
- Youldash, M.; Rahman, A.; Alsayed, M.; Sebiany, A.; Alzayat, J.; Aljishi, N.; Alshammari, G.; Alqahtani, M. Early Detection and Classification of Diabetic Retinopathy: A Deep Learning Approach. AI 2024, 5, 2586–2617. [Google Scholar] [CrossRef]
- Wei, X.; Liu, Y.; Zhang, F.; Geng, L.; Shan, C.; Cao, X.; Xiao, Z. MSTNet: Multi-scale spatial-aware transformer with multi-instance learning for diabetic retinopathy classification. Med. Image Anal. 2025, 102, 103511. [Google Scholar] [CrossRef] [PubMed]
- Hidri, M.S.; Hidri, A.; Alsaif, S.A.; Alahmari, M.; AlShehri, E. Optimal Convolutional Networks for Staging and Detecting of Diabetic Retinopathy. Information 2025, 16, 221. (In English) [Google Scholar] [CrossRef]
- Sharma, N.; Lalwani, P. A multi model deep net with an explainable AI based framework for diabetic retinopathy segmentation and classification. Sci. Rep. 2025, 15, 8777. [Google Scholar] [CrossRef]
- Herrero-Tudela, M.; Romero-Oraá, R.; Hornero, R.; Tobal, G.C.G.; López, M.I.; García, M. An explainable deep-learning model reveals clinical clues in diabetic retinopathy through SHAP. Biomed. Signal Process. Control. 2025, 102, 107328. [Google Scholar] [CrossRef]
- Posham, U.; Bhattacharya, S. Diabetic Retinopathy Detection Using Deep Learning Framework and Explainable Artificial Intelligence Technique. In Proceedings of the 2024 14th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 18–19 January 2024; pp. 411–415. [Google Scholar] [CrossRef]
- Ohri, K.; Kumar, M. Supervised fine-tuned approach for automated detection of diabetic retinopathy. Multimed. Tools Appl. 2024, 83, 14259–14280. [Google Scholar] [CrossRef]
- Naz, H.; Nijhawan, R.; Ahuja, N.J.; Saba, T.; Alamri, F.S.; Rehman, A. Micro-segmentation of retinal image lesions in diabetic retinopathy using energy-based fuzzy C-Means clustering (EFM-FCM). Microsc. Res. Tech. 2024, 87, 78–94. [Google Scholar] [CrossRef]
- Navaneethan, R.; Devarajan, H. Enhancing diabetic retinopathy detection through preprocessing and feature extraction with MGA-CSG algorithm. Expert Syst. Appl. 2024, 249, 123418. [Google Scholar] [CrossRef]
- Manoj S, H.; Bosale, A.A. Detection and Classification of Diabetic Retinopathy using Deep Learning Algorithms for Segmentation to Facilitate Referral Recommendation for Test and Treatment Prediction. arXiv 2024, arXiv:2401.02759. [Google Scholar] [CrossRef]
- Bhimavarapu, U. Diagnosis and multiclass classification of diabetic retinopathy using enhanced multi thresholding optimization algorithms and improved Naive Bayes classifier. Multimed. Tools Appl. 2024, 83, 81325–81359. [Google Scholar] [CrossRef]
- Nahiduzzaman, M.; Islam, M.R.; Goni, M.O.F.; Anower, M.S.; Ahsan, M.; Haider, J.; Kowalski, M. Diabetic Retinopathy Identification Using Parallel Convolutional Neural Network Based Feature Extractor and ELM Classifier. Expert Syst. Appl. 2023, 217, 119557. [Google Scholar] [CrossRef]
- Li, Z.; Han, Y.; Yang, X. Multi-Fundus Diseases Classification Using Retinal Optical Coherence Tomography Images with Swin Transformer V2. J. Imaging 2023, 9, 203. [Google Scholar] [CrossRef]
- Acosta-Jiménez, S.; Maeda-Gutiérrez, V.; Galván-Tejada, C.E.; Mendoza-Mendoza, M.M.; Reveles-Gómez, L.C.; Celaya-Padilla, J.M.; Galván-Tejada, J.I.; García-Domínguez, A. Assessing ResNeXt and RegNet Models for Diabetic Retinopathy Classification: A Comprehensive Comparative Study. Diagnostics 2025, 15, 1966. [Google Scholar] [CrossRef]
- Touati, M.; Touati, R.; Nana, L.; Benzarti, F.; Ben Yahia, S. DRCCT: Enhancing Diabetic Retinopathy Classification with a Compact Convolutional Transformer. Big Data Cogn. Comput. 2025, 9, 9. [Google Scholar] [CrossRef]
- Ema, R.R.; Shill, P.C. Multi-model approach for precise lesion localization and severity grading for diabetic retinopathy and age-related macular degeneration. Front. Comput. Sci. 2025, 7, 1497929. (In English) [Google Scholar] [CrossRef]
- Geetha, D.A.; Lakshmi, T.H.; Sagar, K.V.; Chaitanya, M.; Kantamaneni, S.; Battula, V.V.R.; Borra, S.P.R.; Meena, P.; Gupta, P.; Agarwal, D.S.; et al. Detection and Classification of Diabetic Retinopathy Using YOLO-V8 Deep Learning Methodology. J. Theor. Appl. Inf. Technol. 2024, 102, 7580–7588. [Google Scholar]
- Rizzieri, N.; Dall’asta, L.; Ozoliņš, M. Diabetic Retinopathy Features Segmentation without Coding Experience with Computer Vision Models YOLOv8 and YOLOv9. Vision 2024, 8, 48. (In English) [Google Scholar] [CrossRef] [PubMed]
- Zhang, B.; Li, J.; Bai, Y.; Jiang, Q.; Yan, B.; Wang, Z. An Improved Microaneurysm Detection Model Based on SwinIR and YOLOv8. Bioengineering 2023, 10, 1405. (In English) [Google Scholar] [CrossRef] [PubMed]
- Sait, A.R.W. A Lightweight Diabetic Retinopathy Detection Model Using a Deep-Learning Technique. Diagnostics 2023, 13, 3120. [Google Scholar] [CrossRef]
- L., R.; Padyana, A. Detection of Diabetic Retinopathy in Retinal Fundus Image Using YOLO-RF Model. In Proceedings of the 2021 Sixth International Conference on Image Information Processing (ICIIP), Shimla, India, 26–28 November 2021; Volume 6, pp. 105–109. [Google Scholar] [CrossRef]
- Santos, C.; Aguiar, M.; Welfer, D.; Belloni, B. A New Approach for Detecting Fundus Lesions Using Image Processing and Deep Neural Network Architecture Based on YOLO Model. Sensors 2022, 22, 6441. [Google Scholar] [CrossRef] [PubMed]
- Mahapadi, A.A.; Shirsath, V.; Pundge, A. Real-Time Diabetic Retinopathy Detection Using YOLO-v10 with Nature-Inspired Optimization. Biomed. Mater. Devices 2025, 3, 1–23. [Google Scholar] [CrossRef]
- Liao, Y.; Li, L.; Xiao, H.; Xu, F.; Shan, B.; Yin, H. YOLO-MECD: Citrus Detection Algorithm Based on YOLOv11. Agronomy 2025, 15, 687. [Google Scholar] [CrossRef]
- Dihin, R.A.; AlShemmary, E.N.; Al-Jawher, W.A.M. Wavelet-Attention Swin for Automatic Diabetic Retinopathy Classification. Baghdad Sci. J. 2024, 21, 2741–2756. [Google Scholar] [CrossRef]
- Abid, A.; Abdalla, A.; Abid, A.; Khan, D.; Alfozan, A.; Zou, J. Gradio: Hassle-free sharing and testing of ML models in the wild. arXiv 2019, arXiv:1906.02569. [Google Scholar] [CrossRef]
- Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-Reference Image Quality Assessment in the Spatial Domain. IEEE Trans. Image Process. 2012, 21, 4695–4708. [Google Scholar] [CrossRef]
- Ž, T.; Pintarić, N.; Saulig, N. Blind Image Quality Assessment Score for Humanities Online Digital Repositories. In Proceedings of the 2024 9th International Conference on Smart and Sustainable Technologies (SpliTech), Split, Croatia, 25–28 June 2024; pp. 1–6. [Google Scholar] [CrossRef]
- Setiawan, A.W.; Mengko, T.R.; Santoso, O.S.; Suksmono, A.B. Color retinal image enhancement using CLAHE. In Proceedings of the International Conference on ICT for Smart Society, Yogyakarta, Indonesia, 13–14 June 2013; pp. 1–3. [Google Scholar] [CrossRef]
- Ultralytics. Data Preprocessing Techniques for Annotated Computer Vision Data. Available online: https://docs.ultralytics.com/guides/preprocessing_annotated_data/ (accessed on 26 April 2025).
- Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. RandAugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 13–19 June 2020; pp. 702–703. [Google Scholar]
- Ali, M.L.; Zhang, Z. The YOLO Framework: A Comprehensive Review of Evolution, Applications, and Benchmarks in Object Detection. Computers 2024, 13, 336. [Google Scholar] [CrossRef]
- Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716. [Google Scholar] [CrossRef]
- Hussain, M. YOLOv5, YOLOv8 and YOLOv10: The go-to detectors for real-time vision. arXiv 2024, arXiv:2407.02988. [Google Scholar]
- Liu, Y.; Liu, Y.; Guo, X.; Ling, X.; Geng, Q. Metal surface defect detection using SLF-YOLO enhanced YOLOv8 model. Sci. Rep. 2025, 15, 11105. [Google Scholar] [CrossRef] [PubMed]
- Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLOv8. Available online: https://docs.ultralytics.com/models/yolov8/ (accessed on 26 September 2024).
- Khanam, R.; Hussain, M. YOLOv11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
- Sun, R.-Y. Optimization for Deep Learning: An Overview. J. Oper. Res. Soc. China 2020, 8, 249–294. [Google Scholar] [CrossRef]
- Fan, J.; Upadhye, S.; Worster, A. Understanding receiver operating characteristic (ROC) curves. Can. J. Emerg. Med. 2006, 8, 19–20. [Google Scholar] [CrossRef]
- Muhammad, M.B.; Yeasin, M. Eigen-CAM: Visual Explanations for Deep Convolutional Neural Networks. SN Comput. Sci. 2021, 2, 47. [Google Scholar] [CrossRef]
- Nandhini, E.; Vadivu, G. Convolutional Neural Network-Based Multi-Fruit Classification and Quality Grading with a Gradio Interface. In Proceedings of the 2024 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), Chennai, India, 12–13 December 2024; pp. 1–7. [Google Scholar] [CrossRef]
- Selvaraj, J.; Sadaf, K.; Aslam, S.M.; Umapathy, S. Multiclassification of Colorectal Polyps from Colonoscopy Images Using AI for Early Diagnosis. Diagnostics 2025, 15, 1285. (In English) [Google Scholar] [CrossRef]
- Cuadros, J.; Bresnick, G. EyePACS: An adaptable telemedicine system for diabetic retinopathy screening. J. Diabetes Sci. Technol. 2009, 3, 509–516. (In English) [Google Scholar] [CrossRef] [PubMed]
| YOLO Version | Year | Key Architectural Innovation | Core Methodological Improvement | Reference |
|---|---|---|---|---|
| YOLOv1 | 2015 | Darknet-based custom CNN | First end-to-end, single-stage detector, greatly improving speed and enabling real-time prediction. | [16] |
| YOLOv2 | 2016 | Darknet-19 | Introduced anchor boxes and dimension clusters for better localization; used batch normalization for stability. | [17] |
| YOLOv3 | 2018 | Darknet-53 | Introduced multi-scale detection via a Feature Pyramid Network (FPN) mechanism for improved small object detection. | [18] |
| YOLOv4 | 2020 | CSPDarknet53 | Incorporated numerous ‘Bag of Freebies’ (e.g., CutMix, Mosaic) for significant performance gains. | [19] |
| YOLOv5 | 2020 | PyTorch 1.8.0 Implementation, Focus Layer | Introduced Focus Layer (later replaced by Conv) and major pipeline optimizations; emphasized efficiency and accessibility. | [20] |
| YOLOv6 | 2022 | EfficientRep Backbone | Added a bidirectional concatenation (BiC) module to improve localization signals and an auxiliary regression branch during training. | [21] |
| YOLOv7 | 2022 | E-ELAN Block, Trainable Bag-of-Freebies | Introduced extended efficient layer aggregation networks (E-ELAN) for better parameter usage and re-parameterization techniques. | [22] |
| YOLOv8 | 2023 | C2f Block (C3-like), Decoupled Head | Used the simpler C2f module and replaced the coupled head with a decoupled classification/regression head for improved convergence. | [23] |
| YOLOv9 | 2024 | Generalized Efficient Layer Aggregation Network (GELAN) | Introduced GELAN to prevent information loss in deep networks, enhancing data utilization and boosting performance. | [24] |
| YOLOv10 | 2024 | Enhanced version of CSPNet | Added large-kernel convolutions and partial self-attention modules to improve performance. | [25] |
| YOLO 11 | 2024 | Enhanced version of CSPNet | Expanded capabilities across multiple computer vision tasks (object detection, instance segmentation, classification). | [26] |
| YOLO 12 | 2025 | Residual Efficient Layer Aggregation Networks | Attention-centric architecture designed for real-time detection. | [27] |
| Reference, Year | Dataset | Dataset Size | Methodology | Findings | Limitations |
|---|---|---|---|---|---|
| [4], 2025 | APTOS with 5 classes | 3662 images | Employed different pretrained models | ResNet-50 and GoogleNet models outperform other pretrained models with 93.56% accuracy. | The model must be externally validated to be generalized. |
| [5], 2025 | APTOS and EyePACS with 5 classes | APTOS-3662 EyePACS-35,126 | Attention block with AlexNet model | The model reached 99.51% accuracy for APTOS data and 99.43% for EyePACS data. | The model must be externally validated to be generalized. |
| [32], 2025 | APTOS, Messidor-5 classes, RFMiD2020, IDRiD-binary | APTOS-3662, RFMiD2020-1900 Messidor-1200 IDRiD-516 | A transformer network is utilized for the classification model. | The APTOS and RFMiD2020 datasets achieved the best accuracy, at 97% each. | Different evaluation metrics are used for different datasets. |
| [34], 2025 | DiaRetDB1-binary, APTOS and EyePACS-5 classes. | DiaRetDB1-1008 APTOS-7170 EyePACS-21,600 | Different feature extraction and optimization algorithms are employed with the U-Net classification model. | The model achieved 99% on all three datasets. | Despite the model’s lightweight architecture, real-time applications may still encounter difficulties due to the complexity of fundus images. |
| [35], 2025 | APTOS-2019, EyePACS, DDR, IDRiD, and SUSTech-SYSU | APTOS-3662 EyePACS-35,126 SUSTech-SYSU-1219 DDR-12,552 IDRiD-516 | Employed different pretrained models and SHAP explainability for the model. | Achieved an accuracy of 89% when tested on the SUSTech-SYSU dataset. | Despite the model’s lightweight architecture, real-time applications may still encounter difficulties due to the complexity of fundus images. |
| [46], 2025 | KLC and Shiromon1 datasets with binary classification | KLC-8000 Shiromon1-10,000 | Uses a YOLO model for localization and CNN, RF, and SVM for classification | With CNN-RF, the model shows 98.81% accuracy for KLC data and 92.11% for the Shiromon1 dataset. | The necessity for additional inquiry into the interpretability and explainability of models in medical applications is overlooked. |
| [47], 2024 | Not specified; 5 classes | - | The YOLOv8 model is compared with CNN, SVM, VGG16, and ResNet50. | VGG16 achieved 63.47% accuracy, the highest among the compared models. | The study needs to evaluate more performance metrics, external validation, and the model’s interpretability. |
| [55], 2024 | APTOS with 5 classes | 3662 images | Wavelet with Swin Transformer model | The accuracy of classification was enhanced. | The study employed just a single image set for evaluating the model. |
| [50], 2023 | APTOS and EyePACS with 5 classes | APTOS-5590 EyePACS-35,100 | Employed YOLOv7 for feature extraction and MobileNetV3 for classification. | The model achieved 98% for APTOS and 98.4% for the EyePACS dataset. | Despite the model’s lightweight architecture, real-time applications may still encounter difficulties due to the complexity of fundus images. |
| [51], 2021 | EyePACS and IDRiD dataset with 5 classes | EyePACS-35,100 IDRiD-516 | The YOLO-RF model is compared to YOLO, RF, SVM, and Decision Tree. | The YOLO-RF model achieved 99.3% accuracy, outperforming the compared models. | Only accuracy, precision, and recall are provided. The model must be externally validated to be generalized. |
| Stages | Normal | Mild | Moderate | Severe | PDR |
|---|---|---|---|---|---|
| Before cleaning | 426 | 81 | 113 | 84 | 392 |
| After cleaning | 319 | 69 | 92 | 69 | 257 |
| For binary classification | 319 (Normal) | 487 (DR: Mild + Moderate + Severe + PDR) | | | |
| Metrics | Original | Noisy | Blurry |
|---|---|---|---|
| BRISQUE | 30.487 | 48.618 | 50.072 |
| NIQE | 3.709 | 12.438 | 5.593 |
| PIQE | 8.099 | 74.114 | 91.703 |
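The no-reference scores above (lower is better for all three metrics) could in principle be reproduced as sketched below, assuming the pyiqa package, which implements BRISQUE and NIQE and, in recent versions, PIQE; the file path is illustrative:

```python
import pyiqa
from torchvision.io import read_image

def quality_scores(path: str) -> dict:
    """No-reference quality scores for one image; lower is better."""
    img = read_image(path).unsqueeze(0).float() / 255.0  # 1 x C x H x W in [0, 1]
    return {name: float(pyiqa.create_metric(name)(img))
            for name in ("brisque", "niqe", "piqe")}

print(quality_scores("fundus.png"))
```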
| Classes | Training | Validation | Testing |
|---|---|---|---|
| Normal | 279 | 18 | 22 |
| Mild | 54 | 10 | 5 |
| Moderate | 63 | 13 | 16 |
| Severe | 48 | 10 | 11 |
| PDR | 195 | 34 | 28 |
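A sketch of how a class-stratified split like the one above could be produced with scikit-learn; the folder layout, split ratios, and random seed are assumptions (the table’s exact per-class counts imply slightly different ratios):

```python
from pathlib import Path
from sklearn.model_selection import train_test_split

# Collect (path, label) pairs from a class-per-folder layout (illustrative).
paths, labels = zip(*[(str(p), p.parent.name)
                      for p in Path("dataset").rglob("*.png")])

# Hold out 20% of the data, then split it evenly into validation and test;
# stratifying preserves each DR grade's proportion despite class imbalance.
train_p, hold_p, train_y, hold_y = train_test_split(
    paths, labels, test_size=0.2, stratify=labels, random_state=42)
val_p, test_p, val_y, test_y = train_test_split(
    hold_p, hold_y, test_size=0.5, stratify=hold_y, random_state=42)
```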
| Model | Layers | Params (M) | FLOPs (B) | Architectural Difference |
|---|---|---|---|---|
| YOLO11n | 86 | 1.53 | 3.3 | Employed C3k2 block in backbone and added C2PSA to improve spatial attention |
| YOLO11s | 86 | 5.46 | 12.1 | |
| YOLO11m | 106 | 10.36 | 39.6 | |
| YOLO11l | 176 | 12.84 | 49.8 | |
| YOLO11x | 176 | 28.36 | 111.0 | |
| YOLOv8n | 56 | 1.44 | 3.4 | Dynamic Kernel Attention, Path Aggregation Network |
| YOLOv8s | 56 | 5.08 | 12.6 | |
| YOLOv8m | 80 | 15.77 | 41.9 | |
| YOLOv8l | 104 | 36.20 | 99.1 | |
| YOLOv8x | 104 | 56.14 | 154.3 |
| Parameter | Value |
|---|---|
| Image size | 640 × 640 × 3 |
| Optimizer | AdamW (auto-selected by the training framework) |
| Learning rate | 0.01 |
| Batch size | 16 |
| Cosine learning rate scheduler | True |
| Epochs | 100 |
| Patience | 15 |
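Under these settings, a training run with the Ultralytics API might look like the sketch below; the dataset path and checkpoint choice are placeholders rather than the study’s exact setup:

```python
from ultralytics import YOLO

model = YOLO("yolo11l-cls.pt")  # pretrained classification checkpoint (placeholder)
model.train(
    data="dr_dataset",  # placeholder folder with train/val/test class subfolders
    imgsz=640,          # 640 x 640 input images
    epochs=100,
    patience=15,        # early-stopping patience
    batch=16,
    optimizer="AdamW",
    lr0=0.01,           # initial learning rate
    cos_lr=True,        # cosine learning-rate scheduler
)
metrics = model.val()   # evaluate the best checkpoint on the validation split
```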
| Models | Early Stopping (Epoch) | Final Learning Rate | Training Time (hours) | Inference Latency (ms) |
|---|---|---|---|---|
| YOLO 8n | 55 | 0.0016 | 0.033 | 1.1 |
| YOLO 8s | 35 | 0.0016 | 0.026 | 2.0 |
| YOLO 8m | 39 | 0.0016 | 0.026 | 3.8 |
| YOLO 8l | 33 | 0.0016 | 0.021 | 4.7 |
| YOLO 8x | 43 | 0.0016 | 0.042 | 6.6 |
| YOLO 11n | 33 | 0.0016 | 0.028 | 8.7 |
| YOLO 11s | 33 | 0.0016 | 0.030 | 13.2 |
| YOLO 11m | 25 | 0.0016 | 0.024 | 7.1 |
| YOLO 11l | 29 | 0.0016 | 0.026 | 11.6 |
| YOLO 11x | 34 | 0.0016 | 0.036 | 16.8 |
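Per-image latency figures like those above can be estimated by averaging repeated timed predictions, as in this sketch; the paths are placeholders and results depend heavily on hardware:

```python
import time
from ultralytics import YOLO

model = YOLO("best.pt")             # placeholder trained checkpoint
model("fundus.png", verbose=False)  # warm-up run, excluded from timing

n = 100
t0 = time.perf_counter()
for _ in range(n):
    model("fundus.png", verbose=False)
print(f"mean latency: {(time.perf_counter() - t0) / n * 1e3:.1f} ms")
```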
| Models | Validation Accuracy | Testing Accuracy | Precision | Recall | F1-Score | MCC |
|---|---|---|---|---|---|---|
| YOLO 8n | 0.9601 | 0.9431 | 0.9436 | 0.9428 | 0.9430 | 0.8864 |
| YOLO 8s | 0.9504 | 0.9593 | 0.9593 | 0.9594 | 0.9593 | 0.9187 |
| YOLO 8m | 0.9582 | 0.9621 | 0.9620 | 0.9621 | 0.9621 | 0.9242 |
| YOLO 8l | 0.9598 | 0.9539 | 0.9543 | 0.9543 | 0.9539 | 0.9085 |
| YOLO 8x | 0.9600 | 0.9621 | 0.9626 | 0.9625 | 0.9621 | 0.9250 |
| YOLO 11n | 0.9640 | 0.9621 | 0.9623 | 0.9624 | 0.9621 | 0.9246 |
| YOLO 11s | 0.9651 | 0.9566 | 0.9571 | 0.9570 | 0.9566 | 0.9142 |
| YOLO 11m | 0.9541 | 0.9521 | 0.9512 | 0.9513 | 0.9512 | 0.9025 |
| YOLO 11l | 0.9524 | 0.9702 | 0.9702 | 0.9701 | 0.9702 | 0.9404 |
| YOLO 11x | 0.9630 | 0.9539 | 0.9540 | 0.9542 | 0.9539 | 0.9082 |
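The reported metrics can be computed with scikit-learn as sketched below; the label arrays are illustrative, and weighted averaging is an assumption since the averaging scheme is not stated here:

```python
from sklearn.metrics import (accuracy_score, matthews_corrcoef,
                             precision_recall_fscore_support)

# Illustrative test-set labels: 0 = Normal, 1 = DR.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]

acc = accuracy_score(y_true, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
mcc = matthews_corrcoef(y_true, y_pred)
print(f"acc={acc:.4f} prec={prec:.4f} rec={rec:.4f} f1={f1:.4f} mcc={mcc:.4f}")
```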
| Models | Early Stopping (Epoch) | Final Learning Rate | Training Time (hours) | Inference Latency (ms) |
|---|---|---|---|---|
| YOLO 8n | 23 | 0.0011 | 0.015 | 1.6 |
| YOLO 8s | 24 | 0.0011 | 0.016 | 3.8 |
| YOLO 8m | 24 | 0.0011 | 0.020 | 6.5 |
| YOLO 8l | 23 | 0.0011 | 0.021 | 11.2 |
| YOLO 8x | 18 | 0.0011 | 0.018 | 16.8 |
| YOLO 11n | 25 | 0.0011 | 0.024 | 8.3 |
| YOLO 11s | 26 | 0.0011 | 0.025 | 5.7 |
| YOLO 11m | 23 | 0.0011 | 0.024 | 20.2 |
| YOLO 11l | 28 | 0.0011 | 0.029 | 11.6 |
| YOLO 11x | 30 | 0.0011 | 0.032 | 9.2 |
| Models | Validation Accuracy | Testing Accuracy | Precision | Recall | F1-Score | MCC |
|---|---|---|---|---|---|---|
| YOLO 8n | 0.7680 | 0.7751 | 0.7521 | 0.7751 | 0.7559 | 0.6529 |
| YOLO 8s | 0.7818 | 0.7642 | 0.7483 | 0.7642 | 0.7514 | 0.6385 |
| YOLO 8m | 0.7956 | 0.7642 | 0.7608 | 0.7642 | 0.7576 | 0.6468 |
| YOLO 8l | 0.7680 | 0.7561 | 0.7368 | 0.7561 | 0.7427 | 0.6244 |
| YOLO 8x | 0.8228 | 0.8012 | 0.8514 | 0.8012 | 0.8143 | 0.7288 |
| YOLO 11n | 0.7983 | 0.7642 | 0.7503 | 0.7642 | 0.7529 | 0.6390 |
| YOLO 11s | 0.8066 | 0.7751 | 0.7671 | 0.7751 | 0.7700 | 0.6583 |
| YOLO 11m | 0.7707 | 0.7805 | 0.7585 | 0.7804 | 0.7646 | 0.6643 |
| YOLO 11l | 0.7845 | 0.7696 | 0.7585 | 0.7696 | 0.7467 | 0.6517 |
| YOLO 11x | 0.8039 | 0.7886 | 0.7774 | 0.7886 | 0.7816 | 0.6777 |
| Classification | Models | Testing Accuracy | Precision | Recall | F1-Score | AUC |
|---|---|---|---|---|---|---|
| Binary | ConvNeXtSmall | 0.61 | 0.37 | 0.61 | 0.46 | 0.50 |
| ConvNeXtBase | 0.62 | 0.76 | 0.61 | 0.47 | 0.51 | |
| ConvNeXtTiny | 0.60 | 0.30 | 0.50 | 0.38 | 0.50 | |
| EfficientNetV2S | 0.60 | 0.30 | 0.50 | 0.38 | 0.50 | |
| EfficientNetB0 | 0.61 | 0.37 | 0.61 | 0.46 | 0.50 | |
| ViT | 0.61 | 0.37 | 0.61 | 0.46 | 0.50 | |
| 5-class | ConvNeXtSmall | 0.65 | 0.54 | 0.65 | 0.58 | 0.89 |
| ConvNeXtBase | 0.68 | 0.59 | 0.68 | 0.63 | 0.89 | |
| ConvNeXtTiny | 0.66 | 0.55 | 0.66 | 0.59 | 0.88 | |
| EfficientNetV2S | 0.69 | 0.60 | 0.69 | 0.63 | 0.90 | |
| EfficientNetB0 | 0.66 | 0.55 | 0.66 | 0.59 | 0.85 | |
| ViT | 0.39 | 0.15 | 0.39 | 0.22 | 0.47 |
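A hedged sketch of one pretrained-baseline setup, assuming torchvision ImageNet weights and a frozen-backbone, new-head fine-tuning recipe; the study’s exact protocol for these baselines may differ:

```python
import torch
import torch.nn as nn
from torchvision.models import EfficientNet_B0_Weights, efficientnet_b0

# Load ImageNet weights, freeze the backbone, and retrain a new 5-class head.
model = efficientnet_b0(weights=EfficientNet_B0_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 5)

optimizer = torch.optim.AdamW(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```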
| Classification | Models | Val Accuracy | Test Accuracy | Precision | Recall | F1-Score | MCC | Avg Rank |
|---|---|---|---|---|---|---|---|---|
| Binary, Chi-square = 35.77, p-value = 4.33 × 10⁻⁵ | YOLO 8n | 4 | 10 | 10 | 10 | 10 | 10 | 9 |
| YOLO 8s | 10 | 5 | 5 | 5 | 5 | 5 | 5.83 | |
| YOLO 8m | 7 | 3 | 4 | 4 | 3 | 4 | 4.16 | |
| YOLO 8l | 6 | 7.5 | 7 | 7 | 7.5 | 7 | 7 | |
| YOLO 8x | 5 | 3 | 2 | 2 | 3 | 2 | 2.83 | |
| YOLO 11n | 2 | 3 | 3 | 3 | 3 | 3 | 2.83 | |
| YOLO 11s | 1 | 6 | 6 | 6 | 6 | 6 | 5.16 | |
| YOLO 11m | 8 | 9 | 9 | 9 | 9 | 9 | 8.83 | |
| YOLO 11l | 9 | 1 | 1 | 1 | 1 | 1 | 2.33 | |
| YOLO 11x | 3 | 7.5 | 8 | 8 | 7.5 | 8 | 7 | |
| 5-class, Chi-square = 44.76, p-value = 1.02 × 10⁻⁶ | YOLO 8n | 9.5 | 4.5 | 7 | 4.5 | 6 | 5 | 6.08 |
| YOLO 8s | 7 | 8 | 9 | 8 | 8 | 9 | 8.16 | |
| YOLO 8m | 5 | 8 | 4 | 8 | 5 | 7 | 6.16 | |
| YOLO 8l | 9.5 | 10 | 10 | 10 | 10 | 10 | 9.91 | |
| YOLO 8x | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| YOLO 11n | 4 | 8 | 8 | 8 | 7 | 8 | 7.16 | |
| YOLO 11s | 2 | 4.5 | 3 | 4.5 | 3 | 4 | 3.5 | |
| YOLO 11m | 8 | 3 | 5.5 | 3 | 4 | 3 | 4.41 | |
| YOLO 11l | 6 | 6 | 5.5 | 6 | 9 | 6 | 6.41 | |
| YOLO 11x | 3 | 2 | 2 | 2 | 2 | 2 | 2.16 |
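The Chi-square statistics and p-values above correspond to a Friedman test across the per-metric model ranks; a SciPy sketch using the binary-task ranks from the table (one row per metric, one column per model):

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Binary-task ranks, columns ordered 8n, 8s, 8m, 8l, 8x, 11n, 11s, 11m, 11l, 11x.
ranks = np.array([
    [4, 10, 7, 6, 5, 2, 1, 8, 9, 3],      # validation accuracy
    [10, 5, 3, 7.5, 3, 3, 6, 9, 1, 7.5],  # testing accuracy
    [10, 5, 4, 7, 2, 3, 6, 9, 1, 8],      # precision
    [10, 5, 4, 7, 2, 3, 6, 9, 1, 8],      # recall
    [10, 5, 3, 7.5, 3, 3, 6, 9, 1, 7.5],  # F1-score
    [10, 5, 4, 7, 2, 3, 6, 9, 1, 8],      # MCC
])
stat, p = friedmanchisquare(*ranks.T)     # one sample per model
print(f"Chi-square = {stat:.2f}, p-value = {p:.2e}")
```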
| Reference, Year | Dataset | Methodology | Findings |
|---|---|---|---|
| [46], 2025 | KLC and Shiromon1 datasets with binary classification | YOLO model for localization CNN, RF, and SVM for classification | With CNN-RF, the model shows 98.81% accuracy for the KLC data and 92.11% with the Shiromon1 dataset. |
| [47], 2024 | Dataset not specified; 5 classes | The YOLOv8 model is compared with CNN, SVM, VGG16, and ResNet50. | VGG16 achieved 63.47% accuracy, the highest among the compared models. |
| [50], 2023 | APTOS and EyePACS with 5 classes | YOLOv7 for feature extraction MobileNetV3 for classification. | Accuracy: 98% for APTOS and 98.4% for the EyePACS dataset. |
| Proposed Work | Own Dataset | YOLO Version 8 and 11 for classification | Accuracy: 97.02% for binary and 80.12% for multiclass. |
| Classification | CLAHE | Image Size | Accuracy | Precision | F1 Score |
|---|---|---|---|---|---|
| Binary model | No | 640 × 640 | 0.9398 | 0.9789 | 0.9489 |
| No | 240 × 240 | 0.9036 | 0.9775 | 0.9159 | |
| 5-class model | No | 640 × 640 | 0.7711 | 0.7331 | 0.7461 |
| No | 240 × 240 | 0.7651 | 0.7541 | 0.7424 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).