Augmented Decisions: AI-Enhanced Accuracy in Glaucoma Diagnosis and Treatment
Abstract
1. Introduction
2. Materials and Methods
- Validation (external, multicenter, and prospective).
- Calibration and clinical utility (calibration curves and decision curve analysis; a minimal computational sketch follows this list).
- Reporting (adherence to DECIDE-AI and TRIPOD-AI, when applicable).
- Bias management (case selection, disease spectrum, imbalances by ethnicity/age/myopia, data leakage, and overfitting).
- Robustness (subgroup and sensitivity analyses).
- Transferability (clinical domain, device/vendor, and image quality).
- Safety (uncertainty estimates and clinician hand-off).
- Evidence of impact (clinical or process endpoints).
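The appraisal items on calibration and clinical utility reduce to short formulas, so a minimal sketch may make them concrete. The Python below is illustrative only; the function names, synthetic data, and threshold grid are our own and are not drawn from any reviewed study. Net benefit at a risk threshold pt is TP/n − FP/n × pt/(1 − pt), and a calibration curve plots mean predicted risk against observed event rate per probability bin.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions at one risk threshold:
    TP/n - FP/n * (pt / (1 - pt)), the y-axis of a decision curve."""
    act = y_prob >= threshold
    n = len(y_true)
    tp = np.sum(act & (y_true == 1))
    fp = np.sum(act & (y_true == 0))
    return tp / n - fp / n * (threshold / (1.0 - threshold))

def calibration_points(y_true, y_prob, n_bins=10):
    """Mean predicted vs. observed risk per probability bin -- the points
    on a calibration curve (a well-calibrated model lies on the diagonal)."""
    bins = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    pred, obs = [], []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            pred.append(y_prob[mask].mean())
            obs.append(y_true[mask].mean())
    return np.array(pred), np.array(obs)

# Toy data: 200 synthetic "referable glaucoma" risk scores, well
# calibrated by construction.
rng = np.random.default_rng(0)
y_prob = rng.uniform(0, 1, 200)
y_true = (rng.uniform(0, 1, 200) < y_prob).astype(int)

for pt in (0.1, 0.2, 0.3):
    print(f"threshold {pt:.1f}: net benefit {net_benefit(y_true, y_prob, pt):.3f}")
```

Plotting net benefit across a range of thresholds against the "treat-all" and "treat-none" strategies is what turns discrimination metrics such as AUC into a statement about clinical utility.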
3. Results
3.1. Diagnostic Imaging
3.2. Progression Prediction and Monitoring
3.3. Clinical Decision Support and LLMs
3.4. Teleophthalmology and Remote Monitoring
3.5. Explainable and Federated AI
3.6. Emerging Insights
4. Regulatory, Ethical, and Economic Considerations
5. Discussion
5.1. Overall Synthesis of Findings
5.2. Comparison with Current Gold-Standard Practice
5.3. Strengths and Limitations of the Evidence Base
5.4. Implementation Barriers Mapped to the NASSS Framework
5.5. Gaps and Priorities for Future Research
- Randomized controlled trials (RCTs). Large pragmatic randomized trials of AI-based glaucoma triage have not yet begun. Multicenter studies with thousands of participants from different regions will be needed to assess effects on vision-related quality of life. In the meantime, smaller investigator-initiated RCTs can compare AI-guided escalation with usual care in fast progressors [16].
- Multimodal fusion. Future multimodal fusion strategies—including corneal biomechanics, OCT angiography, and IOP telemetry—could enhance predictive accuracy beyond the current AUC of 0.832 achieved by Koornwinder et al. using RNFL OCT and EHR data [42].
- Fairness auditing and bias mitigation. Few papers publish stratified metrics by ethnicity, axial length, or disc size. Recent ethics commentaries call for routine bias audits and for exploration of counterfactual or reweighting techniques [19]; a minimal audit sketch follows this list.
- Economic and environmental sustainability. Although not directly addressed, lifecycle analyses comparing cloud inference with edge devices could complement privacy-preserving frameworks by supporting environmentally sustainable AI deployment in ophthalmology [52].
- Regulatory science. Under the EU AI Act, any substantial modification to a high-risk AI system triggers a renewed conformity assessment. In practice this enforces a locked release cycle, with a formal change-control plan required to accompany each update [24].
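The routine bias audit called for above can start from something as simple as stratified confusion-matrix metrics. The Python sketch below is a hypothetical illustration: the column names (y_true, y_pred, group), subgroup labels, and synthetic data are placeholders rather than any cited study's dataset, and inverse-frequency weighting is shown as one basic reweighting option.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300

# Hypothetical evaluation table: one row per eye, with the model's binary
# call, the ground truth, and a subgroup label (e.g., ethnicity or an
# axial-length band).
df = pd.DataFrame({
    "y_true": rng.integers(0, 2, n),
    "y_pred": rng.integers(0, 2, n),
    "group": rng.choice(["A", "B", "C"], n, p=[0.6, 0.3, 0.1]),
})

def stratified_metrics(frame):
    """Sensitivity and specificity per subgroup: the stratified reporting
    that fairness audits ask AI studies to publish routinely."""
    rows = []
    for g, sub in frame.groupby("group"):
        tp = ((sub.y_pred == 1) & (sub.y_true == 1)).sum()
        fn = ((sub.y_pred == 0) & (sub.y_true == 1)).sum()
        tn = ((sub.y_pred == 0) & (sub.y_true == 0)).sum()
        fp = ((sub.y_pred == 1) & (sub.y_true == 0)).sum()
        rows.append({"group": g, "n": len(sub),
                     "sensitivity": tp / max(tp + fn, 1),
                     "specificity": tn / max(tn + fp, 1)})
    return pd.DataFrame(rows)

print(stratified_metrics(df))

# One simple mitigation for retraining: inverse-frequency ("balanced")
# sample weights, so minority subgroups are not swamped by the majority.
weights = len(df) / (df["group"].nunique() * df["group"].map(df["group"].value_counts()))
```

A gap between subgroups in such a table is the audit's output; reweighting (or counterfactual augmentation) is one way to act on it before redeployment.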
5.6. Limitations of This Review
5.7. Clinical and Policy Implications
5.8. Summary
6. Conclusions and Future Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Tham, Y.C.; Li, X.; Wong, T.Y.; Quigley, H.A.; Aung, T.; Cheng, C.Y. Global prevalence of glaucoma and projections of glaucoma burden through 2040: A systematic review and meta-analysis. Ophthalmology 2014, 121, 2081–2090.
2. Davey, P.G.; Ranganathan, A. Feast your eyes: Diet and nutrition for optimal eye health. Front. Nutr. 2025, 12, 1579901.
3. Vision Loss Expert Group of the Global Burden of Disease Study; The GBD 2019 Blindness and Vision Impairment Collaborators. Global estimates on the number of people blind or visually impaired by glaucoma: A meta-analysis from 2000 to 2020. Eye 2024, 38, 2036–2046.
4. Shan, S.; Wu, J.; Cao, J.; Feng, Y.; Zhou, J.; Luo, Z.; Song, P.; Rudan, I.; Global Health Epidemiology Research Group (GHERG). Global incidence and risk factors for glaucoma: A systematic review and meta-analysis of prospective studies. J. Glob. Health 2024, 14, 04252.
5. Huang, X.M.; Yang, B.F.; Zheng, W.L.; Liu, Q.; Xiao, F.; Ouyang, P.W.; Li, M.J.; Li, X.Y.; Meng, J.; Zhang, T.T.; et al. Cost-effectiveness of artificial intelligence screening for diabetic retinopathy in rural China. BMC Health Serv. Res. 2022, 22, 260.
6. Stuermer, L.; Braga, S.; Martin, R.; Wolffsohn, J.S. Artificial intelligence virtual assistants in primary eye care practice. Ophthalmic Physiol. Opt. 2025, 45, 437–449.
7. Thompson, A.C.; Jammal, A.A.; Medeiros, F.A. A review of deep learning for screening, diagnosis, and detection of glaucoma progression. Transl. Vis. Sci. Technol. 2020, 9, 42.
8. Lan, C.H.; Chiu, T.H.; Yen, W.T.; Lu, D.W. Artificial intelligence in glaucoma: Advances in diagnosis, progression forecasting, and surgical outcome prediction. Int. J. Mol. Sci. 2025, 26, 4473.
9. Nguyen, V.; Iyengar, S.; Rasheed, H.; Apolo, G.; Li, Z.; Kumar, A.; Nguyen, H.; Bohner, A.; Bolo, K.; Dhodapkar, R.; et al. Comparison of deep learning and clinician performance for detecting referable glaucoma from fundus photographs in a safety net population. Ophthalmol. Sci. 2025, 5, 100751.
10. Hussain, S.; Chua, J.; Wong, D.; Lo, J.; Kadziauskiene, A.; Asoklis, R.; Barbastathis, G.; Schmetterer, L.; Yong, L. Predicting glaucoma progression using deep learning framework guided by generative algorithm. Sci. Rep. 2023, 13, 19960.
11. Ashtari-Majlan, M.; Dehshibi, M.M.; Masip, D. Deep learning and computer vision for glaucoma detection: A review. Comput. Vis. Image Underst. 2024, 244, 104605.
12. Girard, M.J.A.; Schmetterer, L. Artificial intelligence and deep learning in glaucoma: Current state and future prospects. Prog. Brain Res. 2020, 257, 37–64.
13. Liu, H.; Li, L.; Wormstone, I.M.; Qiao, C.; Zhang, C.; Liu, P.; Li, S.; Wang, H.; Mou, D.; Pang, R.; et al. Development and validation of a deep learning system to detect glaucomatous optic neuropathy using fundus photographs. JAMA Ophthalmol. 2019, 137, 1353–1360.
14. Martucci, A.; Gallo Afflitto, G.; Pocobelli, G.; Aiello, F.; Mancino, R.; Nucci, C. Lights and shadows on artificial intelligence in glaucoma: Transforming screening, monitoring, and prognosis. J. Clin. Med. 2025, 14, 2139.
15. Ran, A.R.; Wang, X.; Chan, P.P.; Yeung, D.; Ma, J.; Yeung, A.C.; Liu, Z.; Chan, W.; Tham, C.C.; Cheung, C.Y.-L.; et al. Developing a privacy-preserving deep learning model for glaucoma detection: A multicentre study with federated learning. Br. J. Ophthalmol. 2024, 108, 1114–1123.
16. Chen, J.S.; Baxter, S.L.; van den Brandt, A.; Lieu, A.; Camp, A.S.; Do, J.L.; Welsbie, D.S.; Moghimi, S.; Christopher, M.; Weinreb, R.N.; et al. Usability and clinician acceptance of a deep learning-based clinical decision support tool for predicting glaucomatous visual field progression. J. Glaucoma 2023, 32, 151–158.
17. Huang, A.S.; Hirabayashi, K.; Barna, L.; Ma, D.S.; Schmetterer, L.; Chang, A.C.; Wilson, G.A.; Durkin, S.R.; Muthiah, N.; Shieh, A.; et al. Assessment of a large language model’s responses to questions and cases about glaucoma and retina management. JAMA Ophthalmol. 2024, 142, 371–375.
18. Wong, C.Y.T.; Antaki, F.; Woodward-Court, P.; Ong, A.Y.; Keane, P.A. The role of saliency maps in enhancing ophthalmologists’ trust in artificial intelligence models. Asia Pac. J. Ophthalmol. 2024, 13, 100087.
19. Veritti, D.; Rubinato, L.; Sarao, V.; De Nardin, A.; Foresti, G.L.; Lanzetta, P. Behind the mask: A critical perspective on the ethical, moral, and legal implications of artificial intelligence in ophthalmology. Graefes Arch. Clin. Exp. Ophthalmol. 2024, 262, 975–982.
20. Sabharwal, J.; Hou, K.; Herbert, P.; Bradley, C.; Johnson, C.A.; Wall, M.; Ramulu, P.Y.; Unberath, M.; Yohannan, J. A deep learning model incorporating spatial and temporal information successfully detects visual field worsening using a consensus-based approach. Sci. Rep. 2023, 13, 1041.
21. Akkara, J.D.; Kuriakose, A. Rise of machine learning and artificial intelligence in ophthalmology. Indian J. Ophthalmol. 2019, 67, 1009–1010.
22. Tian, Y.; Sharma, A.; Mehta, S.; Kaushal, S.; Liebmann, J.M.; Cioffi, G.A.; Thakoor, K.A. Automated identification of clinically relevant regions in glaucoma OCT reports using expert eye tracking data and deep learning. Transl. Vis. Sci. Technol. 2024, 13, 24.
23. Leonard-Hawkhead, B.; Higgins, B.E.; Wright, D.; Azuara-Blanco, A. AI for glaucoma—Are we reporting well? A systematic literature review of DECIDE-AI checklist adherence. Eye 2025, 39, 1070–1080.
24. Aboy, M.; Minssen, T.; Vayena, E. Navigating the EU AI Act: Implications for regulated digital medical products. npj Digit. Med. 2024, 7, 237.
25. Wang, R.; Bradley, C.; Herbert, P.; Hou, K.; Ramulu, P.; Breininger, K.; Unberath, M.; Yohannan, J. Deep learning-based identification of eyes at risk for glaucoma surgery. Sci. Rep. 2024, 14, 599.
26. Medeiros, F.A.; Jammal, A.A.; Thompson, A.C. From machine to machine: An OCT-trained deep learning algorithm for objective quantification of glaucomatous damage in fundus photographs. Ophthalmology 2019, 126, 513–521.
27. Rao, D.P.; Shroff, S.; Savoy, F.M.; Shruthi, S.; Hsu, C.K.; Negiloni, K.; Pradhan, Z.S.; Jayasree, P.V.; Sivaraman, A.; Rao, H.L. Evaluation of an offline, artificial intelligence system for referable glaucoma screening using a smartphone-based fundus camera: A prospective study. Eye 2024, 38, 1104–1111.
28. Senthil, S.; Rao, D.P.; Savoy, F.M.; Negiloni, K.; Bhandary, S.; Chary, R.; Chandrashekar, G. Evaluating real-world performance of an automated offline glaucoma AI on a smartphone fundus camera across glaucoma severity stages. PLoS ONE 2025, 20, e0324883.
29. Hemelings, R.; Elen, B.; Barbosa-Breda, J.; Blaschko, M.B.; De Boever, P.; Stalmans, I. Deep learning on fundus images detects glaucoma beyond the optic disc. Sci. Rep. 2021, 11, 20313.
30. Pascal, L.; Perdomo, O.J.; Bost, X.; Huet, B.; Otálora, S.; Zuluaga, M.A. Multi-task deep learning for glaucoma detection from color fundus images. Sci. Rep. 2022, 12, 12361.
31. Yousefi, S.; Kiwaki, T.; Zheng, Y.; Sugiura, H.; Asaoka, R.; Murata, H.; Lemij, H.; Yamanishi, K. Detection of longitudinal visual field progression in glaucoma using machine learning. Am. J. Ophthalmol. 2018, 193, 71–79.
32. Lee, T.; Jammal, A.A.; Mariottoni, E.B.; Medeiros, F.A. Predicting glaucoma development with longitudinal deep learning predictions from fundus photographs. Am. J. Ophthalmol. 2021, 225, 86–94.
33. Delsoz, M.; Raja, H.; Madadi, Y.; Tang, A.A.; Wirostko, B.M.; Kahook, M.Y.; Yousefi, S. The use of ChatGPT to assist in diagnosing glaucoma based on clinical case reports. Ophthalmol. Ther. 2023, 12, 3121–3132.
34. Zhang, J.; Ma, Y.; Zhang, R.; Chen, Y.; Xu, M.; Su, R.; Ma, K. A comparative study of GPT-4o and human ophthalmologists in glaucoma diagnosis. Sci. Rep. 2024, 14, 30385.
35. Dolar-Szczasny, J.; Drab, A.; Rejdak, R. Home monitoring/remote optical coherence tomography in teleophthalmology in patients with eye disorders: A systematic review. Front. Med. 2024, 11, 1442758.
36. Herbert, P.; Hou, K.; Bradley, C.; Hager, G.; Boland, M.V.; Ramulu, P.; Unberath, M.; Yohannan, J. Forecasting risk of future rapid glaucoma worsening using early visual field, OCT, and clinical data. Ophthalmol. Glaucoma 2023, 6, 466–473.
37. Shi, M.; Tian, Y.; Luo, Y.; Elze, T.; Wang, M. RNFLT2Vec: Artifact-corrected representation learning for retinal nerve fiber layer thickness maps. Med. Image Anal. 2024, 94, 103110.
38. Ha, A.; Sun, S.; Kim, Y.K.; Jeoung, J.W.; Kim, H.C.; Park, K.H. Deep learning-based prediction of glaucoma conversion in normotensive glaucoma suspects. Br. J. Ophthalmol. 2024, 108, 927–932.
39. Mandal, S.; Jammal, A.A.; Malek, D.; Medeiros, F.A. Progression or aging? A deep learning approach for distinguishing glaucoma progression from age-related changes in OCT scans. Am. J. Ophthalmol. 2024, 266, 46–55.
40. Mariottoni, E.B.; Datta, S.; Shigueoka, L.S.; Jammal, A.A.; Tavares, I.M.; Henao, R.; Carin, L.; Medeiros, F.A. Deep learning–assisted detection of glaucoma progression in spectral-domain OCT. Ophthalmol. Glaucoma 2023, 6, 228–238.
41. Sriwatana, K.; Puttanawarut, C.; Suwan, Y.; Achakulvisut, T. Explainable deep learning for glaucomatous visual field prediction: Artifact correction enhances transformer models. Transl. Vis. Sci. Technol. 2025, 14, 22.
42. Koornwinder, A.; Zhang, Y.; Ravindranath, R.; Chang, R.T.; Bernstein, I.A.; Wang, S.Y. Multimodal artificial intelligence models predicting glaucoma progression using electronic health records and retinal nerve fiber layer scans. Transl. Vis. Sci. Technol. 2025, 14, 27.
43. Yuksel Elgin, C. Democratizing glaucoma care: A framework for AI-driven progression prediction across diverse healthcare settings. J. Ophthalmol. 2025, 2025, 9803788.
44. Anton, N.; Lisa, C.; Doroftei, B.; Pîrvulescu, R.A.; Barac, R.I.; Lungu, I.I.; Bogdănici, C.M. The role of artificial intelligence in predicting the progression of intraocular hypertension to glaucoma. Life 2025, 15, 865.
45. Shie, S.S.; Su, W.W. High-accuracy digitization of Humphrey visual field reports using convolutional neural networks. Transl. Vis. Sci. Technol. 2025, 14, 6.
46. Akter, N.; Gordon, J.; Li, S.; Poon, M.; Perry, S.; Fletcher, J.; Chan, T.; White, A.; Roy, M. Glaucoma detection and staging from visual field images using machine learning techniques. PLoS ONE 2025, 20, e0316919.
47. Jalili, J.; Jiravarnsirikul, A.; Bowd, C.; Chuter, B.; Belghith, A.; Goldbaum, M.H.; Baxter, S.L.; Weinreb, R.N.; Zangwill, L.M.; Christopher, M. Glaucoma detection and feature identification via GPT-4V fundus image analysis. Ophthalmol. Sci. 2025, 5, 100667.
48. Boverhof, B.J.; Corro Ramos, I.; Vermeer, K.A.; de Vries, V.A.; Klaver, C.C.W.; Ramdas, W.D.; Lemij, H.G.; Rutten-van Mölken, M. The cost-effectiveness of an artificial intelligence-based population-wide screening program for primary open-angle glaucoma in The Netherlands. Value Health 2025, 28, 1317–1326.
49. Liu, H.; Li, R.; Zhang, Y.; Zhang, K.; Yusufu, M.; Liu, Y.; Mou, D.; Chen, X.; Tian, J.; Li, H.; et al. Economic evaluation of combined population-based screening for multiple blindness-causing eye diseases in China: A cost-effectiveness analysis. Lancet Glob. Health 2023, 11, e240–e249.
50. El Arab, R.A.; Al Moosa, O.A. Systematic review of cost-effectiveness and budget impact of artificial intelligence in healthcare. npj Digit. Med. 2025, 8, 548.
51. Greenhalgh, T.; Wherton, J.; Papoutsi, C.; Lynch, J.; Hughes, G.; A’Court, C.; Hinder, S.; Fahy, N.; Procter, R.; Shaw, S. Beyond adoption: A new framework for theorising and evaluating non-adoption, abandonment, and challenges to the scale-up, spread, and sustainability of health and care technologies. J. Med. Internet Res. 2017, 19, e367.
52. Teo, Z.L.; Zhang, X.; Yang, Y.; Jin, L.; Zhang, C.; Poh, S.S.J.; Yu, W.; Chen, Y.; Jonas, J.B.; Wang, Y.X.; et al. Privacy-preserving technology using federated learning and blockchain in protecting against adversarial attacks for retinal imaging. Ophthalmology 2025, 132, 484–494.
53. Akerman, M.A.; Choudhary, S.; Liebmann, J.M.; Medeiros, F.A.; Moghimi, S.; Bowd, C.; Schlottmann, P.G.; Weinreb, R.N. Extracting decision-making features from the unstructured eye movements of clinicians on glaucoma OCT reports and developing AI models to classify expertise. Front. Med. 2023, 10, 1251183.
Aspect | Description |
---|---|
Databases searched | PubMed/MEDLINE, Scopus, IEEE Xplore |
Timeframe | 1 January 2019–1 July 2025 |
Search terms | glaucoma, artificial intelligence, machine learning, deep learning, augmented intelligence, OCT, fundus, visual field, decision support, teleophthalmology (combined with Boolean operators) |
Inclusion criteria | Articles published in peer-reviewed journals |
Study (Year) | Model Type | Input Modality | Primary Task | Performance | Advantages | Limitations |
---|---|---|---|---|---|---|
Liu et al. (2019) [13] | GD-CNN (ResNet-based) | Fundus photographs | Detection of glaucomatous optic neuropathy | AUC 0.996 (internal); 0.995–0.987 (clinical); 0.964 (population); 0.923 (multi-ethnic). Sens 93.6–96.2%; Spec 95.6–97.7% (drops to Sens 87.7%, Spec 80.8% on multi-ethnic data; 82%/70% on low-quality images) | Very large training set (241,032 images); multiple external validations; online learning module | Needs centered, good-quality fundus images; performance falls on multi-ethnic and low-quality datasets
Sabharwal et al. (2023) [20] | CNN–LSTM | Serial SAP deviation maps | Detection of VF worsening | AUC 0.94 (full data) vs. 0.82 for a mixed-effects model, p < 0.001; AUC 0.78 with six VFs removed | Spatiotemporal modeling; robust with fewer tests | Single-center; ≥7 reliable VFs required; external validation pending
Wang et al. (2024) [25] | Vision transformer + FC classifier | Baseline VF 24-2 grid + OCT RNFL grid + clinical and demographic data | Forecast need for glaucoma surgery | AUROC 0.92 (0–3 months); ≥0.85 up to 2 years; 0.76 at 4–5 years | Single-visit risk stratification; SHAP interpretability highlights IOP, MD, and RNFL | Tertiary center cohort; performance declines > 3 years; requires both OCT and VF |
Chen et al. (2023) [16] | Deep learning CDSS | EHR data + visual field metrics | Usability of a CDSS displaying predicted MD (from OCT) + clinical data | Likert trust = 3.27; utility = 3.42; willingness to decrease VF testing frequency = 2.64; SUS = 66.1 ± 16.0 (43rd percentile). Clinicians tended to maintain management in mild cases and to escalate in advanced cases, without formal measurement of changes from a predefined plan | Real-world integration; enhances shared decision-making | Single center; n = 10 clinicians/6 cases; demographically non-diverse sample (all white patients); usability study (not clinical outcomes)
Ran et al. (2024) [15] | Federated 3D CNNs (FedProx) | Volumetric OCT scans (7 centers) | Glaucoma detection via FL | Accuracy per center 78–98%; two unseen test sets 81–88%; FL not inferior to the centralized model | Preserves data privacy; leverages multi-site OCT without data sharing | Needs FL infrastructure; performance still variable across sites; OCT required |
Study (Year) | Model Type | Input Modality | Primary Task | Performance | Advantages | Limitations |
---|---|---|---|---|---|---|
Liu et al. (2019) [13] | ResNet-based CNN | Fundus photographs | Detection of glaucomatous optic neuropathy | AUC up to 0.996 (internal); 0.964 (population); 0.923 (multi-ethnic). Sensitivity 93–96%; specificity 95–97% (drops to ~82/70% on low-quality images). | Very large training set (>240 k images); multiple external validations; online module for continuous learning. | Requires high-quality, centered fundus images; performance falls in multi-ethnic and low-quality datasets. |
Medeiros et al. (2019) [26] | ResNet-34 CNN | Fundus photographs → OCT regression | Cross-modal prediction of RNFL thickness | r ≈ 0.83; MAE ≈ 7 µm. | First demonstration of cross-modal learning; bridges fundus and OCT. | Single center; requires further external validation. |
Sabharwal et al. (2023) [20] | CNN–LSTM | Serial SAP deviation maps | Detection of visual field worsening | AUC 0.94 vs. 0.82 (linear model); robust even with fewer tests (AUC 0.78). | Spatiotemporal modeling; detects change earlier with limited data. | Single center; requires ≥ 7 reliable VFs; external validation pending. |
Wang et al. (2024) [25] | Vision transformer + classifier | Baseline VF + OCT RNFL + demographics | Forecast need for glaucoma surgery | AUROC 0.92 at 3 months; ≥0.85 up to 2 years; 0.76 at 4–5 years. | Single-visit risk stratification; interpretable (SHAP values highlighted IOP, MD, and RNFL). | Cohort from a tertiary center; performance drops at longer horizons; requires both VF and OCT. |
Rao et al. (2024) [27] | Offline CNN | Smartphone fundus camera | Referable glaucoma screening | Sensitivity 93.7%; specificity 85.6%; TN rate 94.7%. | Works offline; feasible in low-resource settings. | Performance depends on image quality; prospective trial but limited geography. |
Senthil et al. (2025) [28] | Offline CNN | Smartphone fundus camera | Real-world screening across glaucoma stages | Sensitivity 91.4%; specificity 94.1%. | Specialist-confirmed cohort; good performance across stages. | False negatives in early/moderate stages; limited to Indian clinics. |
Ran et al. (2024) [15] | Federated 3D CNNs (FedProx) | Multicenter OCT scans (7 sites) | Glaucoma detection via federated learning | Accuracy 78–98% per center; 81–88% on unseen test sets; FL non-inferior to centralized model. | Preserves privacy; leverages data across centers without sharing raw images (see the FedProx sketch after this table). | Requires federated infrastructure; variable performance across sites.
Chen et al. (2023) [16] | Deep learning CDSS | EHR data + VF metrics | Usability of decision support for VF progression | Trust Likert 3.27; usefulness 3.42; SUS 66.1 (43rd percentile). | Real-world integration; supports shared decision-making. | Pilot usability study; only 10 clinicians, 6 cases; not clinical outcomes. |
Tian et al. (2024) [22] | U-Net trained on expert gaze | OCT glaucoma reports with eye tracking | Prediction of clinically relevant regions | Precision 0.72; recall 0.56; F1 0.61. | First use of expert gaze to guide saliency; enhances interpretability. | Still experimental; small dataset; requires validation for clinical use. |
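Federated learning of the kind used by Ran et al. [15] reduces to a short server-client loop. The sketch below, flagged in the Ran et al. row above, is a toy illustration: it substitutes logistic regression for their volumetric CNNs and simulates three non-IID "centers", but the FedProx mechanics are the same, namely local updates penalized by a proximal term that anchors each client to the current global model, followed by size-weighted averaging.

```python
import numpy as np

rng = np.random.default_rng(42)

def local_update(w_global, X, y, mu=0.1, lr=0.1, epochs=20):
    """One client's FedProx step: gradient descent on the local logistic
    loss plus the proximal penalty (mu/2)*||w - w_global||^2."""
    w = w_global.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        grad = X.T @ (p - y) / len(y)      # logistic-loss gradient
        grad += mu * (w - w_global)        # gradient of the proximal term
        w -= lr * grad
    return w

# Three simulated "centers" sharing one underlying signal but with shifted
# prevalence, mimicking non-IID multicenter data.
true_w = rng.normal(size=5)
centers = []
for shift in (-1.0, 0.0, 1.0):
    X = rng.normal(size=(200, 5))
    y = ((X @ true_w + shift + rng.normal(scale=0.5, size=200)) > 0).astype(float)
    centers.append((X, y))

w_global = np.zeros(5)
for _ in range(10):  # communication rounds
    local = [local_update(w_global, X, y) for X, y in centers]
    sizes = [len(y) for _, y in centers]
    w_global = np.average(local, axis=0, weights=sizes)  # FedAvg-style merge

print("global weights after 10 rounds:", np.round(w_global, 3))
```

Only the weight vectors cross institutional boundaries; raw images never leave a site, which is the privacy property the federated rows above describe.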
Domain | Strengths | Weaknesses | Typical risks | Evidence maturity (TRL) | Next steps | Key Refs. |
---|---|---|---|---|---|---|
3.1 Diagnostic imaging (fundus and OCT) | Consistently high diagnostic accuracy (AUC ~0.90–0.97); validated across large datasets; newer models (multi-task CNNs, ViTs, and 3D U-Nets) are more efficient; multimodal approaches cut false positives. | Calibration rarely reported; metrics often image- rather than patient-based; performance falls on multi-ethnic or low-quality datasets; little attention to challenging phenotypes (NTG, high myopia, and small discs); poor vendor generalizability. | Spectrum bias from enriched case series; domain shift with new devices; risk of data leakage. | TRL 6–7: solid retrospective, some multicenter; few prospective trials with clinical endpoints. | Large prospective multi-vendor studies; patient-level metrics; decision curve analyses; subgroup robustness; explicit domain-shift protocols. | [7,9,11,14,23,26,29,30] |
3.2 Progression prediction and monitoring | Longitudinal models detect worsening earlier than linear methods; good sensitivity even with fewer tests; single-visit multimodal models predict outcomes, such as need for surgery; some provide individualized risk curves. | Mostly retrospective, often single-center; definitions of progression heterogeneous; little use of lead-time or net-benefit analyses; risk of temporal data leakage; limited testing in complex phenotypes (PXF, PACG, and NTG). | Temporal leakage; censoring bias; selective follow-up. | TRL 5–6: strong longitudinal cohorts but very few intervention studies. | Pragmatic RCTs on follow-up schedules and treatment impact; adoption of decision curve analysis; harmonized progression definitions; analyses in complex phenotypes. | [20,31,32] |
3.3 Clinical decision support and LLMs | CDSS judged useful, especially in complex cases; LLMs perform at subspecialist level in knowledge tasks; first prototypes integrated. | Small samples; usability studies dominate over outcome-driven trials; lack of calibration and uncertainty estimates; risk of hallucination; no standardized human-in-the-loop workflows; medico-legal concerns. | Over-reliance on uncalibrated output; drift in EHR data; liability uncertainties. | TRL 4–5: promising prototypes, but impact evidence still limited. | Clinical trials measuring process and patient outcomes; explicit uncertainty and calibration; governance for medico-legal accountability; periodic audits. | [16,17,33,34] |
3.4 Teleophthalmology and remote monitoring | Offline/on-device algorithms with high sensitivity and specificity; feasibility shown in low-resource settings; home OCT highly concordant with in-clinic OCT; early multimodal models forecast rapid worsening. | Evidence geographically concentrated (mainly India); performance tied to image quality; short follow-up; limited data on adherence; little cost-effectiveness outside Asia. | Workload redistribution; image quality variability; digital divide; self-selection of adherent users. | TRL 6: early prospective/real-world studies, but community-level outcomes still lacking. | Multicenter community trials; cost-effectiveness in different regions; structured referral pathways; long-term adherence data. | [5,27,28,35,36] |
3.5 Explainable and federated AI | Saliency and attention maps often overlap with clinically meaningful regions (an occlusion-based saliency sketch follows this table); federated learning preserves privacy and achieves non-inferior accuracy; first conceptual frameworks for privacy-preserving deployment. | No standard metrics for XAI; saliency maps not proven to improve trust or accuracy; FL infrastructure demanding; regulatory pathway for continuously updated models unclear; little evidence on clinical impact. | Misleading saliency; inter-site heterogeneity; cybersecurity threats; weak version control. | TRL 4–6: proof-of-concept strong, but long-term maintainability and impact evidence thin. | Shared benchmarks for XAI; routine bias audits; studies on maintainability and change control (EU AI Act and FDA GMLP); scalable FL implementations. | [15,18,19,22,24]
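The saliency maps discussed in the row above are usually gradient-based and tied to a specific network's internals. A model-agnostic way to convey the same idea is occlusion sensitivity: mask each region of the image in turn and record how far the model's output falls. The sketch below is our own illustration under that assumption; occlusion_saliency and the toy scoring function are hypothetical stand-ins, not the method of any reviewed study.

```python
import numpy as np

def occlusion_saliency(predict, image, patch=16, stride=16, fill=0.0):
    """Model-agnostic occlusion map: mask each patch in turn and record
    how much the prediction drops. Large drops mark regions the model
    depends on."""
    base = predict(image)
    h, w = image.shape[:2]
    heat = np.zeros((h, w))
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            masked = image.copy()
            masked[i:i + patch, j:j + patch] = fill
            heat[i:i + patch, j:j + patch] = base - predict(masked)
    return heat

# Stand-in "model": scores a 256x256 image by the mean intensity of a
# central disc-like region, so the map should light up there.
def toy_predict(img):
    return float(img[96:160, 96:160].mean())

img = np.random.default_rng(0).uniform(size=(256, 256))
heat = occlusion_saliency(toy_predict, img)
print("most influential patch near", np.unravel_index(heat.argmax(), heat.shape))
```

As Wong et al. [18] caution, such overlays do not by themselves improve trust or accuracy; standardized evaluation metrics for XAI remain the open "next step" listed above.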
Theme | Study (Year) | Contribution | Limitations | Future Directions
---|---|---|---|---
Diagnosis | Liu et al. (2019) [13] | CNN on >240 k fundus photos, AUROC ~0.96–0.99; robust across datasets. | Performance drops in low-quality and multi-ethnic datasets. | Larger, more diverse cohorts to improve generalizability.
Diagnosis | Nguyen et al. (2025) [9] | Head-to-head: AI matched/surpassed 13 clinicians in detecting referable glaucoma. | Tested retrospectively; generalizability beyond safety-net populations unclear. | Prospective validation in routine care.
Diagnosis | Medeiros et al. (2019) [26] | Cross-modal CNN predicting OCT RNFL thickness from fundus images (r ≈ 0.83). | Proof of concept; not validated in real-world settings. | Extend to multimodal integration and prospective testing.
Progression | Sabharwal et al. (2023) [20] | CNN–LSTM detected VF worsening (AUC 0.94), outperforming mixed-effects models. | Single center; requires ≥7 reliable VFs. | Multicenter validation and inclusion of fewer tests.
Progression | Yousefi et al. (2018) [31] | Machine learning VF progression index detected deterioration ~1.7 years earlier. | Retrospective; limited external validation. | Incorporate into clinical workflows to reduce detection lag.
Decision Support and LLMs | Chen et al. (2023) [16] | GLANCE CDSS dashboard judged usable; supported but did not replace management. | Usability study only; small clinician sample. | Larger trials assessing impact on patient outcomes.
Decision Support and LLMs | Huang et al. (2024) [17]; Delsoz et al. (2023) [33] | GPT-4 and ChatGPT matched or surpassed subspecialists on clinical cases. | Requires expert oversight; limited datasets. | Define safe integration of LLMs into decision-making.
Decision Support and LLMs | Zhang et al. (2024) [34] | GPT-4o comparable to experienced ophthalmologists in differential diagnosis. | Lower absolute accuracy vs. clinicians; single center. | Evaluate LLM utility in complex cases with multicenter data.
Teleophthalmology | Rao et al. (2024) [27] | Smartphone-based AI achieved 93.7% sensitivity, 85.6% specificity. | Tested in Indian cohorts only. | Broader validation in community screening programs.
Teleophthalmology | Senthil et al. (2025) [28] | Real-world study confirmed 91.4% sensitivity, 94.1% specificity. | False negatives in early/moderate stages. | Improve sensitivity for early disease.
Teleophthalmology | Dolar-Szczasny et al. (2024) [35] | Systematic review: home OCT devices highly concordant with in-clinic OCT. | Pilot studies, short follow-up. | Longitudinal trials to test real-world adoption.
Explainable and Federated AI | Ran et al. (2024) [15] | Federated learning across 7 centers preserved privacy, accuracy up to 98%. | Infrastructure heavy; performance varied across sites. | Scalable FL frameworks for clinical use.
Explainable and Federated AI | Wong et al. (2024) [18] | Review: saliency maps do not reliably improve trust or accuracy. | Low empirical support; risk of misleading overlays. | Standardized metrics for XAI evaluation.
Explainable and Federated AI | Teo et al. (2025) [52] | FL + blockchain framework proposed for adversarial protection. | Conceptual; not yet clinically tested. | Pilot implementations in resource-constrained settings.
Explainable and Federated AI | Akerman et al. (2023) [53] | Eye tracking revealed expert gaze patterns informative for AI training. | Exploratory; small sample. | Human–AI interaction science to refine interpretability.
Reporting and Evidence Quality | Leonard-Hawkhead et al. (2025) [23] | Systematic review: DECIDE-AI adherence ~45% overall, ~30% for AI-specific items. | Highlights poor reporting standards. | Enforce standardized reporting for AI clinical trials.
Reporting and Evidence Quality | Stuermer et al. (2025) [6] | Review: AI virtual assistants in primary care raise cost and workflow concerns. | No outcome studies; mainly conceptual. | Prospective health–economic evaluations.