A Hierarchical Deep Learning Architecture for Diagnosing Retinal Diseases Using Cross-Modal OCT to Fundus Translation in the Absence of Paired Data
Abstract
1. Introduction
Primary Clinical Promise and Contributions
- A hierarchical deep learning architecture that models the clinical dependency between retinal diseases (e.g., DR as a risk factor for AMD).
- A cross-modal translation mechanism that enables the model to learn fundus-specific features, which are essential for DR staging, even when paired datasets are limited.
- A contrast-equalization bridge that aligns domain-specific features, improving calibration and accuracy in real-world clinical settings.
2. Methods
2.1. Overview of the System Architecture
2.2. Operation of HMS Components
2.2.1. The Parent Model
2.2.2. The Child Model
3. Results
3.1. Dataset Creation
3.2. Label Harmonization and Grading Protocol
Harmonization Rules
- In-house OCT: AMD staging labels are based on AREDS-derived OCT interpretation; these stages correspond to the binary AMD label used for the parent model, whereas NORM/DR/DME labels adhere to the internal clinical annotation protocol established during dataset curation.
- OCTID: the original dataset provides disease-level OCT categories (e.g., Normal, DR) without stage annotations. Consequently, OCTID samples were used only to supply the corresponding binary parent labels (NORM or DR) and were excluded from DR staging.
- OCT–fundus dataset: fundus images are utilized for DR staging based on the ICDR scale. In contrast, OCT images are used to establish the parent labels (NORM, AMD, DR, or DME) according to the diagnostic framework provided by the dataset.
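The harmonization rules above can be sketched as a simple mapping function. The dataset keys and raw label strings below are illustrative assumptions for demonstration, not the exact identifiers used in our pipeline.

```python
# Illustrative sketch of the label-harmonization rules; dataset keys and
# raw label formats are assumptions, not the pipeline's exact identifiers.

PARENT_CLASSES = ("NORM", "AMD", "DR", "DME")

def harmonize(dataset: str, raw_label: str) -> dict:
    """Map a dataset-specific label to a harmonized parent label and stage.

    Mirrors the rules above: OCTID contributes parent labels only (no DR
    staging); the OCT-fundus dataset stages DR from fundus via ICDR; the
    in-house AREDS-derived AMD stages collapse to the binary AMD label.
    """
    if dataset == "octid":
        # Disease-level OCT categories only -> binary parent label, no stage.
        parent = "NORM" if raw_label == "Normal" else raw_label.upper()
        return {"parent": parent, "stage": None}
    if dataset == "oct_fundus":
        # OCT fixes the parent label; fundus supplies the ICDR DR stage
        # (here encoded as "PARENT:stage" purely for illustration).
        parent, _, icdr = raw_label.partition(":")
        return {"parent": parent, "stage": icdr or None}
    if dataset == "inhouse_oct":
        # AREDS-derived AMD staging maps onto the binary AMD parent label;
        # NORM/DR/DME follow the internal annotation protocol unchanged.
        if raw_label.startswith("AMD"):
            return {"parent": "AMD", "stage": raw_label}
        return {"parent": raw_label, "stage": None}
    raise ValueError(f"unknown dataset: {dataset}")
```

In this sketch, any stage detail beyond the parent class is simply carried along where available and dropped where the source dataset cannot support it.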
3.3. A Comparison of Backbone Architectures for Choosing a Base Classifier Model
3.4. Calibrating Thresholds to Compensate for Class Imbalance
3.5. Outcomes of the Parent Model Operation
3.6. Results from the Specialized AMD Staging Module
3.7. Results from the Specialized DR Staging Module
3.8. The Cross-Modal Bridge and Analysis of Cross-Modal Inconsistencies
- Geometric Alignment: directly minimizes the metric distance and maximizes the angular (cosine) similarity between projected OCT and fundus embeddings.
- Contrastive and Structural Constraints: the contrastive term enforces discriminative separation of positive/negative pairs, while the centroid term clusters embeddings around class centroids to preserve semantic separability.
- Statistical Distribution Matching: these regularizers align the higher-order moments of the feature distributions to reduce domain shift.
- Critical components: removing the contrastive or angular-alignment term produces the largest drops in fundus agreement (3.91 and 3.13 percentage points, respectively), confirming that contrastive pressure and angular alignment are the primary contributors to cross-modal transfer.
- Secondary regularizers: removing any one of the remaining structural terms leads to smaller but consistent decreases in agreement (2.35, 1.57, and 1.56 percentage points, respectively), suggesting that these terms stabilize the mapping and enhance semantic consistency.
- Lightweight distribution matching: removing the moment-matching term has a comparatively modest effect (0.78 percentage points), indicating that it functions primarily as an auxiliary regularizer in our setting.
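The alignment terms above can be illustrated on plain Python vectors. This is a minimal sketch under assumed names and weights (`geometric_term`, `moment_matching_term`, `w_geo`, `w_dist` are illustrative); a real implementation would operate on batched tensors and include the contrastive and centroid terms as well.

```python
import math

# Minimal sketch of the geometric and distribution-matching bridge terms.
# Function names and loss weights are illustrative assumptions.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def geometric_term(oct_emb, fundus_emb):
    # Minimize metric distance and maximize angular (cosine) similarity.
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(oct_emb, fundus_emb)))
    return dist + (1.0 - cosine(oct_emb, fundus_emb))

def moment_matching_term(oct_batch, fundus_batch):
    # Align per-dimension mean and variance of the two embedding
    # distributions to reduce domain shift between modalities.
    def moments(batch):
        dims = list(zip(*batch))
        means = [sum(d) / len(d) for d in dims]
        vars_ = [sum((x - m) ** 2 for x in d) / len(d)
                 for d, m in zip(dims, means)]
        return means, vars_
    (mo, vo), (mf, vf) = moments(oct_batch), moments(fundus_batch)
    return (sum(abs(a - b) for a, b in zip(mo, mf))
            + sum(abs(a - b) for a, b in zip(vo, vf)))

def bridge_loss(oct_batch, fundus_batch, w_geo=1.0, w_dist=0.1):
    geo = sum(geometric_term(o, f)
              for o, f in zip(oct_batch, fundus_batch)) / len(oct_batch)
    return w_geo * geo + w_dist * moment_matching_term(oct_batch, fundus_batch)
```

Perfectly aligned embeddings drive both terms to zero, which is why removing the angular component degrades fundus agreement more than removing the lightweight moment-matching regularizer.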
Failure Taxonomy for OCT-Only DR Staging When Fundus Is Absent
3.9. Calibration Assessment, Risk Interpretation in Clinical Scenarios, and Computational Efficiency
3.10. Comparative Analysis of Model Effectiveness in Diagnosis and Staging
3.11. Cross-Scan Validation and Robustness to Domain Shifts
4. Discussion
- Anatomical Field-of-View (FOV) Constraints. The most significant clinical limitation is the discrepancy in FOV between macular OCT and wide-field fundus photography. As detailed in our failure taxonomy (Table 2), peripheral diabetic lesions (e.g., neovascularization elsewhere, NVE) are optically invisible to standard macular OCT. Consequently, our OCT-only DR staging is inherently limited to macula-correlated signs. To mitigate this, the QC gate flags such ambiguous cases for mandatory fundus review; nevertheless, a risk of under-staging disease confined to the peripheral retina remains.
- Unpaired Training Risks. Relying on unpaired cross-modal translation carries the risk of semantic misalignment, leading the system to generate plausible fundus features that do not actually exist in the OCT images. While our contrastive constraints and validation on the 128-pair holdout reduce this risk, the system’s reliability when encountering rare, anomalous pathologies that were not included in the training data remains untested.
- Dataset and Labeling Bias. The study relies primarily on single-center annotations (Optimed), which may introduce institutional bias. Furthermore, the dataset-specific heuristic penalty was implemented to stabilize training due to the limited availability of AMD–DR co-labels. Although our audit (see Section S4.12) confirms that strong co-signals are still detected, we recommend relaxing this constraint as larger, more diverse comorbidity datasets become available.
5. Practical Deployment Workflow and Audit Logic
5.1. When OCT-Only Is Sufficient vs. When Fundus Is Required
- Step 1 (OCT acquisition and parent screening). From a single OCT scan, the parent model predicts probabilities for NORM, AMD, DR, and DME. If all pathology probabilities remain below their calibrated operating thresholds, the case is classified as low risk and scheduled for routine follow-up.
- Step 2 (Specialist staging). If AMD is detected, the AMD specialist module performs OCT-based staging using AREDS-based interpretation. If DR is detected, the system attempts OCT-only DR staging via the OCT → Fundus latent bridge.
- Step 3 (Quality control gate for OCT-only DR staging). OCT-only DR staging is accepted only if the cross-modal projection quality score (cosine similarity between the projected OCT embedding and the reference fundus embedding) meets the calibrated acceptance threshold, a criterion satisfied by 96.1% of paired hold-out cases in our study. If the score falls below this threshold, the system defers staging and recommends fundus imaging and/or expert review, acting as a conservative fail-safe.
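The three-step workflow above can be sketched as a single routing function. All threshold values are placeholder parameters (the paper's calibrated values are not repeated here), and the returned action strings are stand-ins for calls into the actual specialist modules.

```python
# Sketch of the three-step triage logic; thresholds are placeholder
# parameters and action names are stand-ins for the specialist modules.

def triage(parent_probs, qc_score, thresholds, qc_threshold):
    """Return a routing decision for one OCT scan.

    parent_probs: dict mapping NORM/AMD/DR/DME to predicted probability.
    qc_score: cosine similarity of the projected OCT embedding to the
              reference fundus embedding (the QC-gate statistic).
    thresholds: calibrated per-class operating thresholds.
    qc_threshold: calibrated acceptance threshold for the QC gate.
    """
    positives = [c for c in ("AMD", "DR", "DME")
                 if parent_probs.get(c, 0.0) >= thresholds[c]]
    if not positives:
        return "low_risk_routine_followup"          # Step 1: all below threshold
    actions = []
    if "AMD" in positives:
        actions.append("amd_oct_staging")           # Step 2: OCT is sufficient
    if "DME" in positives:
        actions.append("dme_oct_assessment")
    if "DR" in positives:
        if qc_score >= qc_threshold:                # Step 3: QC gate passes
            actions.append("dr_oct_only_staging")
        else:                                       # Conservative fail-safe
            actions.append("defer_request_fundus")
    return actions
```

The key design point is that the QC gate only mediates DR staging; AMD and DME pathways never depend on the cross-modal bridge.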
5.2. Referral Triggers and Safety-Oriented Deferral Rules
- Fundus required: (i) when the QC gate fails for OCT-only DR staging; (ii) for suspected severe DR/PDR near stage boundaries; (iii) when poor OCT quality (motion/shadowing) is associated with reduced bridge agreement.
- OCT-only acceptable: AMD staging and DME assessment are performed directly on OCT; DR staging is accepted only when the QC gate is satisfied.
- Referral recommendation: any high-confidence pathology prediction (above the calibrated per-class threshold) combined with uncertainty signals (near-threshold probabilities or QC failure) triggers referral for confirmatory imaging and clinical adjudication.
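The referral rule can be stated compactly in code. The margin parameter and the boolean interface below are illustrative assumptions; in deployment the uncertainty signals would come from the calibrated probabilities and the QC gate directly.

```python
# Sketch of the referral trigger: a confident pathology call combined
# with any uncertainty signal triggers referral. The near-threshold
# margin is an illustrative assumption, not a calibrated value.

def needs_referral(prob, class_threshold, qc_passed, margin=0.05):
    """True when a high-confidence positive coincides with uncertainty.

    Uncertainty signals: probability lying near the operating threshold,
    or a failed cross-modal QC gate.
    """
    high_confidence = prob >= class_threshold
    near_threshold = abs(prob - class_threshold) <= margin
    return high_confidence and (near_threshold or not qc_passed)
```

A clearly positive case with a passing QC gate proceeds through automated staging, while the same prediction with a failed gate is escalated for confirmatory imaging and clinical adjudication.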
5.3. Post-Deployment Auditing over Time
6. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Viñals, R.; Thiran, J.-P. A KL Divergence-Based Loss for In Vivo Ultrafast Ultrasound Image Enhancement with Deep Learning. J. Imaging 2023, 9, 256. [Google Scholar] [CrossRef]
- Lei, X.; Chen, Z.; Liu, H.; Chen, J.; Tan, H.; Dai, W.; Wang, X.; Xu, H. A Cross-Modal Feature Fusion Method to Diagnose Macular Fibrosis in Neovascular Age-Related Macular Degeneration. In Proceedings of the 2024 IEEE International Symposium on Biomedical Imaging (ISBI), Athens, Greece, 27–30 May 2024; pp. 1–5. [Google Scholar]
- Zedadra, A.; Salah-Salah, M.Y.; Zedadra, O.; Guerrieri, A. Multi-Modal AI for Multi-Label Retinal Disease Prediction Using OCT and Fundus Images: A Hybrid Approach. Sensors 2025, 25, 4492. [Google Scholar] [CrossRef]
- Ashrafkhorasani, M.; Habibi, A.; Nittala, M.G.; Corradetti, G.; Emamverdi, M.; Sadda, S.R. Peripheral retinal lesions in diabetic retinopathy on ultra-widefield imaging. Saudi J. Ophthalmol. 2024, 38, 123–131. [Google Scholar] [CrossRef]
- Chen, W.; Wang, H. OCTSharp: An open-source and real-time OCT imaging software based on C#. Biomed. Opt. Express 2023, 14, 6060–6071. [Google Scholar] [CrossRef]
- Li, J.; Wang, Z.; Chen, Y.; Zhu, C.; Xiong, M.; Bai, H.X. A Transformer utilizing bidirectional cross-attention for multi-modal classification of Age-Related Macular Degeneration. Biomed. Signal Process. Control 2025, 109, 107887. [Google Scholar] [CrossRef]




| Source | Native Labels (Used) | Mapping to Unified Label Space (This Paper) |
|---|---|---|
| In-house OCT | AMD stage (AREDS-based), plus clinical labels for DR/DME/NORM | Parent OCT labels: AMD = 1 if any AREDS stage; DR = 1 if DR present; DME = 1 if DME present; NORM = 1 if no pathology |
| OCTID (OCT-only) | Disease-level OCT categories (e.g., Normal, DR) | Parent OCT labels only: Normal → NORM; DR → DR; not used for DR staging |
| OCT–fundus dataset | Fundus DR stage (ICDR-based); OCT diagnostic classes | Fundus DR staging: Mild/Moderate/Severe NPDR, PDR; Parent OCT labels: NORM/AMD/DR/DME as provided |
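The mapping in the table above can be sketched as a small harmonization function. This is a minimal illustration only: the dataset identifiers, native-label field names, and the `UnifiedLabels` container are assumptions for demonstration, not the authors' actual data pipeline.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UnifiedLabels:
    norm: bool = False
    amd: bool = False
    dr: bool = False
    dme: bool = False
    dr_stage: Optional[str] = None  # ICDR stage; fundus-derived only

def harmonize(source: str, native: dict) -> UnifiedLabels:
    """Map one sample's native labels onto the unified parent/child label space."""
    u = UnifiedLabels()
    if source == "in_house_oct":
        u.amd = native.get("areds_stage") is not None  # any AREDS stage -> AMD = 1
        u.dr = bool(native.get("dr"))
        u.dme = bool(native.get("dme"))
        u.norm = not (u.amd or u.dr or u.dme)  # NORM = 1 if no pathology
    elif source == "octid":
        # Disease-level OCT categories only; never used for DR staging.
        u.norm = native["category"] == "Normal"
        u.dr = native["category"] == "DR"
    elif source == "oct_fundus":
        # OCT supplies the parent label; fundus supplies the ICDR DR stage.
        setattr(u, native["oct_class"].lower(), True)  # one of NORM/AMD/DR/DME
        u.dr_stage = native.get("icdr_stage")
    return u
```

Keeping OCTID samples out of the staging path, as encoded above, ensures the child DR-staging module only ever sees fundus-derived ICDR labels.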

| Failure Category | Fundus-Visible Evidence (ICDR-Relevant) | Why OCT-Only May Miss It (FOV/Visibility) | Typical Error & Mitigation |
|---|---|---|---|
| Peripheral proliferative signs | Neovascularization away from the macula (e.g., NVE/NVD), peripheral hemorrhages | Standard macular OCT does not cover peripheral retina; the lesion may be outside the scanned area (missing FOV) | Under-staging (PDR → severe/moderate). Mitigation: acquire fundus or wider-field imaging; defer if QC fails. |
| Severity driven by lesion burden across fields | Stage boundaries depend on lesion extent across multiple retinal regions (counting-based rules) | OCT B-scans sample limited regions; disease burden outside sampled locations is unobserved (missing FOV) | Moderate↔severe confusion. Mitigation: request fundus/UWFI for confirmation in borderline cases. |
| Subtle early fundus signs | Single/few microaneurysms and small hemorrhages | These signs are more reliably assessed en-face on fundus; OCT-only may show weak or ambiguous correlates | Under-staging (mild → no DR). Mitigation: conservative routing/deferral for low-confidence cases. |
| Image-quality-driven errors | Fundus has sufficient quality, but OCT has artifacts (motion, shadowing, low signal) | Artifacts corrupt structural cues and degrade cross-modal projection quality | Unstable staging. Mitigation: quality control and manual verification when QC is triggered. |
| Borderline/subjective boundaries | Cases close to ICDR thresholds even for experts | Small differences in captured evidence across modalities amplify ambiguity | Stage flip near boundary. Mitigation: defer to fundus when clinically consequential. |
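The mitigation column above amounts to a routing policy: automate only confident, QC-passing OCT-only stagings and defer the rest. A hedged sketch follows; the threshold value, probability dictionary, and return codes are illustrative assumptions rather than the system's actual deferral logic.

```python
def route_dr_case(stage_probs: dict, oct_qc_passed: bool,
                  confidence_threshold: float = 0.70) -> str:
    """Return 'auto:<stage>' for confident OCT-only staging, else a deferral action."""
    if not oct_qc_passed:
        # Image-quality-driven errors: trigger manual verification.
        return "defer_manual_qc"
    stage, p = max(stage_probs.items(), key=lambda kv: kv[1])
    if p < confidence_threshold:
        # Borderline or subtle-sign cases: request fundus or wide-field imaging.
        return "defer_fundus"
    return f"auto:{stage}"
```

Under this policy, a high-confidence prediction on a clean scan is accepted automatically, while peripheral-lesion and near-threshold cases fall into the deferral branches described in the table.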
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Lopukhova, E.A.; Idrisova, G.M.; Mukhamadeev, T.R.; Voronkov, G.S.; Kutluyarov, R.V.; Topolskaya, E.P. A Hierarchical Deep Learning Architecture for Diagnosing Retinal Diseases Using Cross-Modal OCT to Fundus Translation in the Lack of Paired Data. J. Imaging 2026, 12, 36. https://doi.org/10.3390/jimaging12010036

