Performance and Clinical Utility of Deep Learning for Detecting Referable Age-Related Macular Degeneration on Fundus Photographs: A Systematic Review and Meta-Analysis
Abstract
1. Introduction
2. Materials and Methods
2.1. General Guideline
2.2. Database Searches and Identification of Eligible Manuscripts
2.3. Data Extraction and Management
2.4. Quality Assessment
2.5. Statistical Analysis
3. Results
3.1. Study Identification and Selection
3.2. Overview of Study Characteristics
3.3. Comparative Overview of Deep Learning Algorithms in Diagnostic Studies
3.4. Quality Assessment
3.5. Overall Meta-Analysis of Deep Learning Algorithms
3.6. Subgroup Analysis
3.7. Clinical Applicability
3.8. Contrast-Based Meta-Analysis of Deep Learning Versus Human Graders
4. Discussion
4.1. Principal Findings
4.2. Relationship to Prior Evidence
4.3. DL Versus Human Graders: A Clinically Relevant Trade-Off
4.4. Sources of Heterogeneity and Why They Matter
4.4.1. Differences in the Definition of “Referable AMD”
4.4.2. Reference Standards and Adjudication
4.4.3. Imaging Devices, Field of View, and Image Quality
4.4.4. Handling of Ungradable Images
4.4.5. Model Families and Non-Traditional Architectures
4.4.6. Dataset Provenance (Public vs. Private) and Population Spectrum
4.4.7. Methodological Quality and Risk of Bias
4.5. Implications for Screening, Triage, and Service Delivery
4.6. Methodological Strengths and Limitations
4.7. Future Directions
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| AMD | Age-related macular degeneration |
| rAMD | Referable age-related macular degeneration |
| DL | Deep learning |
| AI | Artificial intelligence |
| CNN | Convolutional neural network |
| OCT | Optical coherence tomography |
| AREDS | Age-Related Eye Disease Study |
| DTA | Diagnostic test accuracy |
| PRISMA-DTA | Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy |
| PROBAST | Prediction model Risk Of Bias ASsessment Tool |
| PROBAST+AI | Updated PROBAST for prediction models using regression or artificial intelligence methods |
| TP | True positive |
| FP | False positive |
| TN | True negative |
| FN | False negative |
| CI | Confidence interval |
| SROC | Summary receiver operating characteristic |
| PLR | Positive likelihood ratio |
| NLR | Negative likelihood ratio |
| UWF | Ultra-wide field |
| ViT | Vision Transformer |
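
The 2×2 quantities abbreviated above (TP, FP, TN, FN) determine each study's accuracy measures. The minimal Python sketch below shows how sensitivity, specificity, PLR, and NLR follow from those counts for a single study, assuming no continuity correction for zero cells; the helper `two_by_two_metrics` and the example counts are hypothetical and do not come from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccuracyMetrics:
    sensitivity: float  # TP / (TP + FN): share of referable AMD cases flagged
    specificity: float  # TN / (TN + FP): share of non-referable cases cleared
    plr: float  # positive likelihood ratio = sensitivity / (1 - specificity)
    nlr: float  # negative likelihood ratio = (1 - sensitivity) / specificity


def two_by_two_metrics(tp: int, fp: int, tn: int, fn: int) -> AccuracyMetrics:
    """Hypothetical helper: per-study accuracy measures from a 2x2 table."""
    if min(tp, fp, tn, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # Without a continuity correction, a zero cell makes a likelihood ratio unbounded.
    plr = sens / (1 - spec) if spec < 1 else float("inf")
    nlr = (1 - sens) / spec if spec > 0 else float("inf")
    return AccuracyMetrics(sens, spec, plr, nlr)


# Hypothetical counts for illustration only (not data from the review).
m = two_by_two_metrics(tp=90, fp=8, tn=92, fn=10)
print(f"Sens {m.sensitivity:.2f}, Spec {m.specificity:.2f}, PLR {m.plr:.2f}, NLR {m.nlr:.3f}")
# prints: Sens 0.90, Spec 0.92, PLR 11.25, NLR 0.109
```

Per-study values like these are not simply averaged in a DTA meta-analysis; pooled estimates are typically obtained with a bivariate random-effects model (for example, via the `metadta` Stata command cited in the reference list).
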
References
1. Wong, W.L.; Su, X.; Li, X.; Cheung, C.M.G.; Klein, R.; Cheng, C.Y.; Wong, T.Y. Global prevalence of age-related macular degeneration and disease burden projection for 2020 and 2040: A systematic review and meta-analysis. Lancet Glob. Health 2014, 2, e106–e116.
2. Hsu, T.K.; Lai, I.P.; Tsai, M.J.; Lee, P.-J.; Hung, K.-C.; Yang, S.; Chan, L.-W.; Lin, I.-C.; Chang, W.-H.; Huang, Y.-J.; et al. A deep learning approach for the screening of referable age-related macular degeneration - Model development and external validation. J. Formos. Med. Assoc. 2026, 125, 21–25.
3. Burlina, P.M.; Pacheco, K.D.; Joshi, N.; Freund, D.E.; Bressler, N.M. Comparing humans and deep learning performance for grading AMD: A study using universal deep features and transfer learning for automated analysis. Comput. Biol. Med. 2017, 82, 80–86.
4. Dong, L.; He, W.; Zhang, R.; Ge, Z.; Wang, Y.X.; Zhou, J.; Xu, J.; Shao, L.; Wang, Q.; Yan, Y.; et al. Artificial intelligence for screening of multiple retinal and optic nerve diseases. JAMA Netw. Open 2022, 5, e229960.
5. Burlina, P.M.; Pacheco, K.D.; Joshi, N.; Pekala, M.; Freund, D.E.; Bressler, N.M. Automated grading of age-related macular degeneration from color fundus images using deep convolutional neural networks. JAMA Ophthalmol. 2017, 135, 1170–1176.
6. Ting, D.S.W.; Cheung, C.Y.; Lim, G.; Tan, G.S.W.; Quang, N.D.; Gan, A.; Hamzah, H.; Garcia-Franco, R.; Yeo, I.Y.S.; Lee, S.Y.; et al. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multi-ethnic populations with diabetes. JAMA 2017, 318, 2211–2223.
7. Grassmann, F.; Mengelkamp, J.; Brandl, C.; Harsch, S.; Zimmermann, M.E.; Linkohr, B.; Peters, A.; Heid, I.M.; Palm, C.; Weber, B.H. A deep learning algorithm for prediction of AMD severity on the AREDS simplified severity scale using fundus photographs. Ophthalmology 2018, 125, 1410–1420.
8. De Fauw, J.; Ledsam, J.R.; Romera-Paredes, B.; Nikolov, S.; Tomasev, N.; Blackwell, S.; Askham, H.; Glorot, X.; O’Donoghue, B.; Visentin, D.; et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 2018, 24, 1342–1350.
9. Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.S.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. Cell 2018, 172, 1122–1131.e9.
10. Burlina, P.M.; Freund, D.E.; Bressler, N.M. Utility of deep learning methods for referability classification of age-related macular degeneration. JAMA Ophthalmol. 2018, 136, 1305–1307.
11. Savoy, F.M.; Rao, D.P.; Toh, J.K.; Ong, B.; Sivaraman, A.; Sharma, A.; Das, T. Empowering portable age-related macular degeneration screening: Evaluation of a deep learning algorithm for a smartphone fundus camera. BMJ Open 2024, 14, e081398.
12. Leng, X.; Shi, R.; Wu, Y.; Zhu, S.; Cai, X.; Lu, X.; Liu, R. Deep learning for detection of age-related macular degeneration: A systematic review and meta-analysis of diagnostic test accuracy studies. PLoS ONE 2023, 18, e0284060.
13. Keane, P.A.; Topol, E.J. With an eye to AI and autonomous diagnosis. npj Digit. Med. 2018, 1, 40.
14. Yim, J.; Chopra, R.; Spitz, T.; Winkens, J.; Obika, A.; Kelly, C.; Askham, H.; Lukic, M.; Huemer, J.; Fasler, K.; et al. Predicting conversion to wet age-related macular degeneration using deep learning. Nat. Med. 2020, 26, 892–899.
15. Kanagasingam, Y.; Abramoff, M.D.; Jonnal, R.; Bhuiyan, A.; Goldschmidt, L.; Wong, T.Y. Progress on retinal image analysis for age-related macular degeneration. Prog. Retin. Eye Res. 2014, 38, 20–42.
16. Chew, E.Y.; Schachat, A.P. Should we add screening of age-related macular degeneration to current screening programs for diabetic retinopathy? Ophthalmology 2015, 122, 2155–2156.
17. Nagendran, M.; Chen, Y.; Lovejoy, C.A.; Gordon, A.C.; Komorowski, M.; Harvey, H.; Topol, E.J.; Ioannidis, J.P.A.; Collins, G.S.; Maruthappu, M. Artificial intelligence versus clinicians: Systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020, 368, m689.
18. Ting, D.S.W.; Pasquale, L.R.; Peng, L.; Campbell, J.P.; Lee, A.Y.; Raman, R.; Tan, G.S.W.; Schmetterer, L.; Keane, P.A.; Wong, T.Y. Artificial intelligence and deep learning in ophthalmology. Br. J. Ophthalmol. 2019, 103, 167–175.
19. Elsman, E.B.M.; van Rens, G.H.M.B.; van Nispen, R.M.A. Artificial intelligence in retina: From segmentation to disease prediction. Acta Ophthalmol. 2019, 97, 165–172.
20. Liu, Y.; Holekamp, N.M.; Heier, J.S. Prospective, longitudinal study: Daily self-imaging with home OCT for neovascular age-related macular degeneration. Ophthalmol. Retin. 2022, 6, 575–585.
21. McInnes, M.D.F.; Moher, D.; Thombs, B.D.; McGrath, T.A.; Bossuyt, P.M.; Clifford, T.; Cohen, J.F.; Deeks, J.J.; Gatsonis, C.; Hooft, L.; et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: The PRISMA-DTA statement. JAMA 2018, 319, 388–396.
22. Luo, W.; Wang, T. Diagnostic accuracy of deep learning for referable age-related macular degeneration: A systematic review and meta-analysis. INPLASY Protocol 2025.
23. Moons, K.G.M.; Damen, J.A.A.; Kaul, T.; Hooft, L.; Navarro, C.A.; Dhiman, P.; Beam, A.L.; Van Calster, B.; Celi, L.A.; Denaxas, S.; et al. PROBAST+AI: An updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods. BMJ 2025, 388, e082505.
24. World Bank. World Bank Country Classifications by Income Level for 2024-2025. World Bank Blogs. Available online: https://blogs.worldbank.org/opendata/world-bank-country-classifications-by-income-level-for-2024-2025 (accessed on 1 February 2025).
25. Deeks, J.J.; Macaskill, P.; Irwig, L. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy. J. Clin. Epidemiol. 2005, 58, 882–893.
26. Nyaga, V.N.; Arbyn, M. Metadta: A Stata command for meta-analysis and meta-regression of diagnostic test accuracy data - a tutorial. Arch. Public Health 2022, 80, 95; Erratum in Arch. Public Health 2022, 80, 216. https://doi.org/10.1186/s13690-022-00953-9.
27. Most, J.A.; Folk, G.A.; Walker, E.H.; Nagel, I.D.; Mehta, N.N.; Flester, E.; Borooah, S. Evaluating the clinical utility of multimodal large language models for detecting age-related macular degeneration from retinal imaging. Sci. Rep. 2025, 15, 33214.
28. Negiloni, K.; Baskaran, P.; Rao, D.P.; Maitray, A.; Savoy, F.M.; Suresh, S.; Mahalingam, M.; Vighnesh, M.J.; Rajendran, A. Advancing AMD screening with an offline, AI-powered smartphone-based fundus camera: A prospective, real-world clinical validation. Eye 2025, 39, 2548–2554.
29. Taylor, J.R.; Drinkwater, J.; Sousa, D.C.; Shah, V.; Turner, A.W. Real-world evaluation of RetCAD deep-learning system for the detection of referable diabetic retinopathy and age-related macular degeneration. Clin. Exp. Optom. 2025, 108, 601–606.
30. González-Gonzalo, C.; Sánchez-Gutiérrez, V.; Hernández-Martínez, P.; Contreras, I.; Lechanteur, Y.T.; Domanian, A.; van Ginneken, B.; Sánchez, C.I. Evaluation of a deep learning system for the joint automated detection of diabetic retinopathy and age-related macular degeneration. Acta Ophthalmol. 2020, 98, 368–377.
31. Bhuiyan, A.; Wong, T.Y.; Ting, D.S.W.; Govindaiah, A.; Souied, E.H.; Smith, R.T. Artificial intelligence to stratify severity of age-related macular degeneration (AMD) and predict risk of progression to late AMD. Transl. Vis. Sci. Technol. 2020, 9, 25.
32. Peng, Y.; Dharssi, S.; Chen, Q.; Keenan, T.D.; Agrón, E.; Wong, W.T.; Chew, E.Y.; Lu, Z. DeepSeeNet: A deep learning model for automated classification of patient-based age-related macular degeneration severity from color fundus photographs. Ophthalmology 2019, 126, 565–575.






| Study (Ref) | Design | Study Period | Source/Setting | Country | Population | Age (Reported) | Female % | Sample Size (Patients/Images) | Other Targets | Income Category | AMD Target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Most et al., 2025 [27] | Retrospective | Apr 2023–Jun 2024 (collection); Dec 2024–Feb 2025 (testing) | UCSD Shiley Eye Institute; tertiary | United States | Adults | Mean 81.1 | 69.7 | 76/136 | None | High income | Referable AMD |
| Negiloni et al., 2025 [28] | Prospective | Nov 2022–Nov 2023 | Aravind Eye Hospital (Chennai); tertiary | India | Adults ≥40 | Mean 61.8 ± 9.9 | 56.0 | 492/984 | None | Lower-middle income | Referable AMD |
| Taylor et al., 2025 [29] | Retrospective | Jan 2020–Jun 2022 | Rural ophthalmology clinics; primary care | Australia | High-risk rural adults (known DR or AMD) | Mean 64.0 ± 12.8 | 51.0 | 82/150 | Diabetic retinopathy | High income | Referable AMD |
| Hsu et al., 2024 [2] | Retrospective | Dev Aug 2010–Jun 2019; Val Jul 2019–Apr 2021 | NTUH; tertiary | Taiwan | Adults ≥50 | NR | NR | NR/7738 | None | High income | Referable AMD |
| Savoy et al., 2024 [11] | Retrospective | Mar 2013–Oct 2020; Jan 2022–Mar 2022 | AREDS + South Asian target-device evaluation; mixed settings | United States; India | Adults (AREDS + South Asian cohorts) | Mean 51.9 | 44.3 † | AREDS: 909/108,251; Target: 238/1108 | None | Mixed (High + Lower-middle) | Referable AMD |
| Dong et al., 2022 [4] | Prospective | Dev Jun 2018–Jun 2020; Val Nov 2020–Feb 2021 | iKang health check-up centers; primary care | China | General screening population | Median 42 (range 8–87) | 55.3 | Dev: 63,400/120,002; Val: 110,784/208,758 | Multiple (e.g., DR and glaucoma) | Upper-middle income | Referable AMD |
| González-Gonzalo et al., 2020 [30] | Retrospective | Images acquired Aug 2011–Oct 2016 | Routine clinical practice + public datasets | Europe; United States | Routine retinal imaging patients | NR | NR | DR-AMD: 288/600; Messidor: NR/1200; AREDS: NR/133,821 | Diabetic retinopathy | High income (predominant) | Referable AMD |
| Bhuiyan et al., 2020 [31] | Retrospective | AREDS long-term follow-up (~12 years) | AREDS research cohorts | France | Adults (AREDS participants) | Mean 69.4 | 55.7 | 4753/16,875 | None | High income | Referable AMD |
| Peng et al., 2019 [32] | Retrospective | AREDS follow-up (~12 years) | AREDS dataset | United States | Adults (AREDS participants) | NR | NR | 4549/59,302 | None | High income | Referable AMD |
| Burlina et al., 2018 [10] | Retrospective | Nov 1992–Nov 2005 | AREDS study period | United States | Adults (AREDS participants) | NR | NR | 4613/67,401 | None | High income | Referable AMD |
| Grassmann et al., 2018 [7] | Retrospective | AREDS follow-up (~12 years) | AREDS + KORA cohorts | Germany | Adults (AREDS; KORA analyses) | NR | NR | AREDS: 3654/120,656; KORA: 5555/NR | None | High income | Referable AMD |
| Burlina et al., 2017a [5] | Retrospective | AREDS follow-up (~12 years) | AREDS dataset | United States | Adults (AREDS participants) | NR | NR | 4613/133,821 | None | High income | Referable AMD |
| Burlina et al., 2017b [3] | Retrospective | AREDS follow-up (~12 years) | AREDS dataset | United States | Adults (AREDS participants) | >50 (reported) | NR | NR/5664 | None | High income | Referable AMD |
| Ting et al., 2017 [6] | Retrospective | Train 2010–2013; Val 2014–2017 | National DR screening + multi-country validation | Singapore + multi-country | Adults with diabetes | Mean 60.2 | 45.4 | 14,880/35,948 | Diabetic retinopathy (and related eye diseases) | Mixed (multi-country) | Referable AMD |

| Study (Year) | Participants | Predictors (AI Model) | Outcome (Reference Standard) | Analysis | Overall Risk of Bias |
|---|---|---|---|---|---|
| Most et al. (2025) [27] | High | High | Low | High | High |
| Negiloni et al. (2025) [28] | Low | Low | Low | Low | Low |
| Taylor et al. (2025) [29] | Unclear | Low | Low | Unclear | Unclear |
| Hsu et al. (2024) [2] | Low | Low | Low | Low | Low |
| Savoy et al. (2024) [11] | Unclear | Low | Low | Unclear | Unclear |
| Dong et al. (2022) [4] | Low | Low | Low | Unclear | Unclear |
| González-Gonzalo et al. (2020) [30] | Unclear | Low | Low | Unclear | Unclear |
| Bhuiyan et al. (2020) [31] | Low | Low | Low | Unclear | Unclear |
| Peng et al. (2019) [32] | Low | Low | Low | High | High |
| Burlina et al. (2018) [10] | Low | Low | Low | High | High |
| Grassmann et al. (2018) [7] | Unclear | Low | Low | High | High |
| Burlina et al. (2017a) [5] | Low | Low | Low | High | High |
| Burlina et al. (2017b) [3] | Low | Low | Low | High | High |
| Ting et al. (2017) [6] | Low | Low | Low | Unclear | Unclear |
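
The overall judgments in the risk-of-bias table appear consistent with the usual PROBAST aggregation rule: overall risk is low only when every domain is low, high when any domain is high, and unclear otherwise. The minimal Python sketch below encodes that rule and checks each row of the table against it; it assumes exactly these four domains and three rating levels, covers only the final aggregation step (not the domain-level signalling questions or any study-specific exceptions the guidance permits), and the helper name `overall_rob` is hypothetical.

```python
# Domain ratings transcribed from the risk-of-bias table above:
# (study, participants, predictors, outcome, analysis, reported overall rating)
TABLE = [
    ("Most et al. (2025)", "High", "High", "Low", "High", "High"),
    ("Negiloni et al. (2025)", "Low", "Low", "Low", "Low", "Low"),
    ("Taylor et al. (2025)", "Unclear", "Low", "Low", "Unclear", "Unclear"),
    ("Hsu et al. (2024)", "Low", "Low", "Low", "Low", "Low"),
    ("Savoy et al. (2024)", "Unclear", "Low", "Low", "Unclear", "Unclear"),
    ("Dong et al. (2022)", "Low", "Low", "Low", "Unclear", "Unclear"),
    ("González-Gonzalo et al. (2020)", "Unclear", "Low", "Low", "Unclear", "Unclear"),
    ("Bhuiyan et al. (2020)", "Low", "Low", "Low", "Unclear", "Unclear"),
    ("Peng et al. (2019)", "Low", "Low", "Low", "High", "High"),
    ("Burlina et al. (2018)", "Low", "Low", "Low", "High", "High"),
    ("Grassmann et al. (2018)", "Unclear", "Low", "Low", "High", "High"),
    ("Burlina et al. (2017a)", "Low", "Low", "Low", "High", "High"),
    ("Burlina et al. (2017b)", "Low", "Low", "Low", "High", "High"),
    ("Ting et al. (2017)", "Low", "Low", "Low", "Unclear", "Unclear"),
]

RATINGS = {"Low", "Unclear", "High"}


def overall_rob(*domains: str) -> str:
    """Hypothetical helper: combine domain ratings into an overall judgment."""
    if not domains or not set(domains) <= RATINGS:
        raise ValueError(f"each rating must be one of {sorted(RATINGS)}")
    if "High" in domains:  # a single high-risk domain makes the study high risk
        return "High"
    if "Unclear" in domains:  # no high-risk domain, but at least one unclear
        return "Unclear"
    return "Low"  # every domain rated low


mismatches = [row[0] for row in TABLE if overall_rob(*row[1:5]) != row[5]]
print("Rows inconsistent with the rule:", mismatches or "none")
```

Run as written, the check reports no inconsistent rows, which suggests the overall column can be read directly as the output of this rule.
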

| Moderator | Category | N | DL Sensitivity (CI) | p-Value | DL Specificity (CI) | p-Value |
|---|---|---|---|---|---|---|
| Study design | Retrospective | 10 | 0.91 (0.87–0.94) | 0.67 | 0.91 (0.84–0.95) | 0.21 |
| | Prospective | 4 | 0.89 (0.72–0.96) | | 0.97 (0.86–0.99) | |
| Economic status | High | 11 | 0.91 (0.86–0.94) | 0.74 | 0.92 (0.84–0.96) | 0.53 |
| | Low-Middle | 3 | 0.90 (0.77–0.96) | | 0.95 (0.81–0.99) | |
| Healthcare setting | Primary | 9 | 0.88 (0.82–0.92) | 0.08 | 0.92 (0.83–0.96) | 0.73 |
| | Tertiary | 5 | 0.94 (0.89–0.97) | | 0.94 (0.82–0.98) | |
| Dataset | Public dataset | 5 | 0.92 (0.87–0.95) | 0.49 | 0.93 (0.85–0.97) | 0.81 |
| | Private dataset | 8 | 0.89 (0.80–0.94) | | 0.92 (0.78–0.97) | |
| External validation | Yes | 9 | 0.91 (0.85–0.95) | 0.88 | 0.94 (0.86–0.97) | 0.57 |
| | No | 5 | 0.90 (0.82–0.95) | | 0.91 (0.76–0.97) | |
| Other target | Yes | 4 | 0.90 (0.79–0.95) | 0.74 | 0.93 (0.79–0.98) | 0.87 |
| | No | 10 | 0.91 (0.86–0.94) | | 0.92 (0.84–0.96) | |
| Camera type | Desktop | 12 | 0.91 (0.86–0.94) | 0.92 | 0.93 (0.87–0.97) | 0.35 |
| | Smartphone | 2 | 0.91 (0.76–0.97) | | 0.85 (0.49–0.97) | |
| Criteria | AREDS | 10 | 0.91 (0.87–0.94) | 0.41 | 0.92 (0.84–0.96) | 0.39 |
| | Beckman | 2 | 0.86 (0.67–0.95) | | 0.96 (0.81–0.99) | |
| Vendor involved | No | 8 | 0.89 (0.82–0.93) | 0.21 | 0.88 (0.77–0.94) | 0.07 |
| | Yes | 6 | 0.93 (0.88–0.96) | | 0.96 (0.91–0.98) | |
| Risk of bias | High | 6 | 0.88 (0.80–0.93) | 0.37 | 0.89 (0.76–0.96) | 0.58 |
| | Unclear | 6 | 0.93 (0.88–0.96) | | 0.95 (0.87–0.98) | |
| | Low | 2 | 0.90 (0.77–0.96) | | 0.93 (0.69–0.99) | |
| Article type | Original | 14 | 0.91 (0.87–0.94) | 0.51 | 0.93 (0.87–0.96) | 0.27 |
| | Supplement | 2 | 0.87 (0.68–0.95) | | 0.90 (0.61–0.98) | |
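
Pooled sensitivity and specificity can be converted into likelihood ratios and, for an assumed pre-test probability, into post-test probabilities using Bayes' theorem on the odds scale. The Python sketch below does this for the two "Vendor involved" rows of the subgroup table using their point estimates only. The 10% pre-test probability is an illustrative assumption, not a figure from the review, and likelihood ratios computed from point estimates may differ from any likelihood ratios the review derives within its bivariate model. The helper names are hypothetical.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Return (PLR, NLR) from sensitivity and specificity point estimates."""
    if not (0 < sens < 1 and 0 < spec < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    return sens / (1 - spec), (1 - sens) / spec


def post_test_probability(pre_test: float, lr: float) -> float:
    """Bayes' theorem on the odds scale: post-test odds = pre-test odds x LR."""
    if not 0 < pre_test < 1 or lr < 0:
        raise ValueError("pre_test must be in (0, 1) and lr must be non-negative")
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


PRE_TEST = 0.10  # hypothetical pre-test probability of rAMD in a screened population

# Pooled point estimates (sensitivity, specificity) from the "Vendor involved" rows.
subgroups = {"Vendor involved: No": (0.89, 0.88), "Vendor involved: Yes": (0.93, 0.96)}

for label, (sens, spec) in subgroups.items():
    plr, nlr = likelihood_ratios(sens, spec)
    pos = post_test_probability(PRE_TEST, plr)  # probability of rAMD after a positive result
    neg = post_test_probability(PRE_TEST, nlr)  # probability of rAMD after a negative result
    print(f"{label}: PLR {plr:.1f}, NLR {nlr:.2f}, P(rAMD | +) {pos:.0%}, P(rAMD | -) {neg:.1%}")
```

Because neither vendor-related p-value in the table reaches conventional significance (0.21 and 0.07), the gap in post-test probabilities between the two rows is illustrative only, not evidence that vendor-involved studies perform better.
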
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Luo, W.-T.; Wang, T.-W. Performance and Clinical Utility of Deep Learning for Detecting Referable Age-Related Macular Degeneration on Fundus Photographs: A Systematic Review and Meta-Analysis. Diagnostics 2026, 16, 633. https://doi.org/10.3390/diagnostics16040633
Luo W-T, Wang T-W. Performance and Clinical Utility of Deep Learning for Detecting Referable Age-Related Macular Degeneration on Fundus Photographs: A Systematic Review and Meta-Analysis. Diagnostics. 2026; 16(4):633. https://doi.org/10.3390/diagnostics16040633
Chicago/Turabian Style: Luo, Wei-Ting, and Ting-Wei Wang. 2026. "Performance and Clinical Utility of Deep Learning for Detecting Referable Age-Related Macular Degeneration on Fundus Photographs: A Systematic Review and Meta-Analysis" Diagnostics 16, no. 4: 633. https://doi.org/10.3390/diagnostics16040633
APA Style: Luo, W.-T., & Wang, T.-W. (2026). Performance and Clinical Utility of Deep Learning for Detecting Referable Age-Related Macular Degeneration on Fundus Photographs: A Systematic Review and Meta-Analysis. Diagnostics, 16(4), 633. https://doi.org/10.3390/diagnostics16040633

