Abstract
Purpose: To evaluate the performance of deep learning (DL) in diagnosing glaucoma and predicting its progression using fundus photography and retinal optical coherence tomography (OCT) images. Materials and Methods: Relevant studies published up to 30 October 2024 were retrieved from PubMed, Medline, EMBASE, Cochrane Library, Web of Science, and ClinicalKey. A bivariate random-effects model was employed to calculate pooled sensitivity, specificity, positive and negative likelihood ratios, and area under the receiver operating characteristic curve (AUROC). Results: A total of 48 studies were included in the meta-analysis. DL algorithms demonstrated high diagnostic performance in glaucoma detection using fundus photography and OCT images. For fundus photography, the pooled sensitivity and specificity were 0.92 (95% CI: 0.89–0.94) and 0.93 (95% CI: 0.90–0.95), respectively, with an AUROC of 0.90 (95% CI: 0.88–0.92). For OCT imaging, the pooled sensitivity and specificity were 0.90 (95% CI: 0.84–0.94) and 0.87 (95% CI: 0.81–0.91), respectively, with an AUROC of 0.86 (95% CI: 0.83–0.90). In predicting glaucoma progression, DL models showed less robust performance, with pooled sensitivities and specificities generally lower than in diagnostic tasks. Internal validation datasets showed higher accuracy than external validation datasets. Conclusions: DL algorithms achieve excellent performance in diagnosing glaucoma using fundus photography and OCT imaging. To enhance the prediction of glaucoma progression, future DL models should integrate multimodal data, including functional assessments such as visual field measurements, and undergo extensive validation in real-world clinical settings.
1. Introduction
Glaucoma is a leading cause of irreversible vision impairment worldwide, with projections indicating that it will affect over 111 million individuals by 2040 [1]. Given the increasing global disease burden, accurate detection and diagnosis of glaucoma are essential. Diagnosis draws on multiple modalities: measurement of intraocular pressure (IOP), which is an important risk factor, alongside optic disc evaluation, optical coherence tomography (OCT), and Humphrey visual field analysis [2]. In particular, cases of normal tension glaucoma require a comprehensive evaluation of optic nerve structure and visual field function, as IOP may fall within the normal range [3]. Consequently, widespread glaucoma evaluation by certified clinicians is resource-intensive, skill-dependent, and sometimes inconsistent [4,5].
Deep learning (DL) is a subset of artificial intelligence (AI) that involves training neural networks to learn patterns and make predictions based on large datasets [6]. It can learn intricate features directly from raw data, making it suitable for tasks involving large and complex datasets, such as medical image analysis [6]. DL can be utilized in glaucoma detection via analyzing fundus photographs or OCT scans [7] by recognizing subtle structural changes indicative of glaucoma, such as optic nerve head morphology (cup-to-disc ratio) or retinal nerve fiber layer thickness [8]. Additionally, with longitudinal visual field and clinical data, DL models may extract spatiotemporal features that may provide better assessments of glaucoma progression [9].
Despite promising results from various studies, the translation of AI into clinical practice remains limited [10,11]. Conducting an updated meta-analysis will help consolidate existing evidence and identify factors that influence the performance of AI. This is important now, as recent advancements in DL models have significantly improved their ability to analyze complex imaging data, offering enhanced accuracy and efficiency in glaucoma detection and progression prediction [12,13]. Additionally, the growing availability of large, annotated datasets [14,15] provides an unprecedented opportunity to train and validate these models, necessitating a consolidation of the evidence to evaluate their generalizability and clinical applicability.
Previous meta-analyses have explored the use of AI in glaucoma diagnosis [10,16,17], highlighting the generally strong performance of AI algorithms in detecting glaucoma using fundus or OCT imaging. However, these studies have primarily focused on diagnostic accuracy, with limited analysis on glaucoma progression. This gap in the literature has guided our study design, motivating us to evaluate the potential of DL algorithms to not only diagnose glaucoma but also assess its progression.
Our primary aim is to evaluate the overall recent performance of DL algorithms in diagnosing glaucoma and identifying its progression using fundus photography and OCT, as these are the most commonly available apparatuses in clinical programs worldwide. Through this evaluation, we seek to further understand the potential utility of AI in real-world glaucoma assessment and management.
2. Materials and Methods
This section describes the protocols, strategies, and analytical methods used to identify and assess studies on the application of DL in glaucoma detection and progression prediction. It outlines the systematic approach to study selection, data collection, and quality assessment, followed by the analytical procedures for synthesizing results.
2.1. Protocol and Registration
The study was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analysis of Diagnostic Test Accuracy (PRISMA-DTA) extension guidelines [18]. Institutional Review Board (IRB) approval and informed consent were not required. This study protocol was registered in the International Prospective Register of Systematic Reviews (PROSPERO ID: CRD42024510390). All data were extracted from past publications, and all research adhered to the Declaration of Helsinki. These measures helped establish the transparency and reproducibility of the research process.
2.2. Search Strategy for Identifying Studies
Relevant studies published through 30 October 2024 regarding the utilization of DL or AI in glaucoma detection, diagnosis, or progression prediction were retrieved from PubMed, Medline, EMBASE, Cochrane Library, Web of Science (WoS), and ClinicalKey.
Search keywords comprised a combination of terms associated with glaucoma—“Glaucoma”, “Open-angle glaucoma”, and “Glaucomatous optic neuropathy”—with the following keywords associated with artificial intelligence: “Deep learning”, “Machine learning”, and “Artificial intelligence”. Terms involving outcomes were also used, such as “Glaucoma diagnosis” and “Glaucoma Progression”. No language or time restrictions were applied. The detailed search strategy is provided in the Supplemental Material.
2.3. Eligibility Criteria and Study Selection
The study selection, data collection, and quality assessment were carried out by the authors XCL and YSL. After removing duplicates, titles and abstracts were screened for eligibility based on the publication type and reported outcomes. The eligible studies included full-length journal articles presenting findings on the performance of DL in distinguishing glaucoma from the normal population using automated analyses of fundus or OCT images. In terms of progression predictions, the eligible studies were those that reported glaucoma progression in terms of categorical outcomes such as “no progression” or “progression/conversion”. Articles were excluded if they did not assess primary open-angle glaucoma. Case reports, guidelines, conference papers, letters, editorials, and review articles were excluded. Additionally, studies that did not provide primary outcomes related to predicting glaucoma onset, detecting glaucoma progression, or distinguishing glaucoma from other ocular diseases were omitted from consideration.
2.4. Data Collection
Data collection was conducted during the full-text review. The extracted variables from each study, when available, included the first author’s name, publication year, country of origin, type of images (fundus or OCT), primary DL classifier, total size of image data, data sources for training and validation, total size of testing data, ground truth information, validation type (internal or external), as well as the sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) of the best-performing DL algorithm obtained from each validation or testing dataset. We thoroughly reviewed each full-text article to identify irrelevant outcomes and data insufficiency, defined as the absence of the information required to construct a 2 × 2 contingency table for the testing results.
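As an illustration of the data-sufficiency criterion, the sketch below shows one common way a 2 × 2 table can be recovered when a study reports sensitivity, specificity, and the number of diseased and healthy test cases. The function name and the example counts are hypothetical and are not taken from any included study; studies that report only sensitivity and specificity without group sizes would not allow this reconstruction.

```python
def reconstruct_2x2(sensitivity, specificity, n_diseased, n_healthy):
    """Rebuild TP, FN, FP, TN counts from reported summary metrics.

    Counts are rounded to the nearest integer, so the result is an
    approximation of the original table when the inputs were rounded.
    """
    tp = round(sensitivity * n_diseased)   # diseased eyes flagged as glaucoma
    fn = n_diseased - tp                   # diseased eyes missed
    tn = round(specificity * n_healthy)    # healthy eyes correctly cleared
    fp = n_healthy - tn                    # healthy eyes flagged as glaucoma
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Hypothetical test set: 200 glaucomatous and 800 healthy eyes.
table = reconstruct_2x2(0.90, 0.95, n_diseased=200, n_healthy=800)
print(table)
```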
2.5. Risk of Bias Assessment
We evaluated the quality of each incorporated study utilizing the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool. QUADAS-2 encompasses four key domains for assessment: patient selection, index test, reference standard, and flow and timing. The risk of bias for each domain was rated as low, high, or unclear based on the information available in each study. Studies with low risk across most domains were considered methodologically robust, while high or unclear risks indicated potential bias or gaps in reporting. Applicability concerns were similarly assessed to determine the relevance of the study populations, diagnostic tests, and reference standards to clinical practice. The findings from the QUADAS-2 assessments were visualized using traffic-light plots to summarize the proportion of studies with low, high, or unclear risk in each domain. Additionally, we assessed the general applicability of the studies to the population of interest, ensuring that the findings would be relevant to real-world clinical practice.
2.6. Analysis and Data Synthesis
A bivariate random-effects model was utilized to perform a meta-analysis on the diagnostic and progression prediction performance of DL in glaucoma. The bivariate method was used to model logit-transformed sensitivity and specificity together, addressing the negative correlation resulting from different threshold settings across various studies. Data from different image types (fundus and OCT) were analyzed separately. When multiple DL algorithms or feature selections were tested, the best-performing DL testing results of each dataset were chosen. Multiple datasets from a study were documented separately and were considered independent data for the meta-analysis.
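For reference, one common formulation of the bivariate random-effects model is written out below. It is a sketch under assumptions: the text does not state which within-study likelihood was used, and the symbols ($n_{1i}$ and $n_{0i}$ for the numbers of diseased and non-diseased cases in dataset $i$, $\mu$, $\sigma$, and $\rho$ for the between-study parameters) are introduced here for illustration only.

```latex
\begin{align}
\mathrm{TP}_i &\sim \mathrm{Binomial}(n_{1i},\, Se_i), &
\mathrm{TN}_i &\sim \mathrm{Binomial}(n_{0i},\, Sp_i), \\
\begin{pmatrix} \operatorname{logit}(Se_i) \\ \operatorname{logit}(Sp_i) \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{Se} \\ \mu_{Sp} \end{pmatrix},
\begin{pmatrix}
\sigma_{Se}^{2} & \rho\,\sigma_{Se}\sigma_{Sp} \\
\rho\,\sigma_{Se}\sigma_{Sp} & \sigma_{Sp}^{2}
\end{pmatrix}
\right), \\
\widehat{Se}_{\text{pooled}} &= \frac{1}{1 + e^{-\hat{\mu}_{Se}}}, &
\widehat{Sp}_{\text{pooled}} &= \frac{1}{1 + e^{-\hat{\mu}_{Sp}}}.
\end{align}
```

In this formulation, the correlation $\rho$ captures the threshold-driven trade-off between sensitivity and specificity across datasets, and the pooled estimates are obtained by back-transforming the estimated means from the logit scale.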
We calculated the pooled sensitivity, specificity, positive and negative likelihood ratios (LRs), and AUROC, with the results presented through hierarchical summary receiver operating characteristic (sROC) curve plots. The sROC curves were generated using a bivariate random-effects model, which accounts for the correlation between sensitivity and specificity across the studies. This approach enables the estimation of a summary ROC curve that reflects the overall performance of the models while addressing the inter-study heterogeneity. We further quantified the variation of effect size due to inter-study heterogeneity with forest plots, providing visual insights into study-level variability and the pooled estimates.
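The likelihood ratios follow directly from sensitivity and specificity. The short sketch below applies the standard definitions to the pooled fundus estimates reported in the abstract (sensitivity 0.92, specificity 0.93); the function name is illustrative. Small differences from the model-based pooled ratios are to be expected, since the inputs here are rounded point estimates rather than the fitted model parameters.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_pos, lr_neg


# Pooled fundus photography estimates for glaucoma detection.
lr_pos, lr_neg = likelihood_ratios(0.92, 0.93)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```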
To assess the risk of publication bias, Deeks’ funnel plots were used. We also used Egger’s test, which regresses the standardized effect estimate on its precision to test for funnel plot asymmetry. We employed DeLong’s test, a non-parametric statistical method, for comparing the calculated performance metrics. Given the multiple comparisons conducted across several metrics and datasets, we applied a Bonferroni correction to adjust for the increased risk of type I errors. Data synthesis and analysis were performed using RevMan, Version 5.4 (Cochrane). A two-sided p-value of <0.05 was considered statistically significant.
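The Bonferroni adjustment can be sketched as follows. The raw p-values are hypothetical and serve only to show the mechanics: each p-value is multiplied by the number of comparisons (capped at 1) and then compared with the significance level.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and significance flags."""
    m = len(p_values)                                   # number of comparisons
    adjusted = [min(1.0, p * m) for p in p_values]      # scale and cap at 1
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


# Hypothetical raw p-values from four pairwise comparisons.
raw_p = [0.004, 0.020, 0.030, 0.600]
adjusted, significant = bonferroni(raw_p)
for p, p_adj, sig in zip(raw_p, adjusted, significant):
    print(f"raw p = {p:.3f}, adjusted p = {p_adj:.3f}, significant = {sig}")
```

In this example, only the first comparison remains significant after adjustment, although three of the four raw p-values fall below 0.05.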
3. Results
The results of this meta-analysis are structured into distinct subsections to provide clarity into the steps of the analysis, from the initial search to the final performance outcomes of the DL models.
3.1. Search Results
The literature search yielded 2781 records in total from PubMed, EMBASE, and Web of Science. After excluding 1088 duplicates, the titles and abstracts of 1693 studies were screened for eligibility. A total of 1509 records were excluded due to non-relevant titles and abstracts. Three reports could not be retrieved in full even after requesting them from the authors. On full-text assessment, a total of 112 records were excluded, including 28 due to insufficient data to derive the sensitivity and specificity, 8 because they focused on angle-closure glaucoma, 5 because they were letters or reviews, and 71 due to inaccessible or insufficiently reported data. Finally, 72 studies were included in the systematic review, of which 48 were included in the meta-analysis; a 2 × 2 contingency table could not be constructed for the remaining studies. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flowchart of the study selection is presented in Figure 1.
Figure 1.
PRISMA 2020 flow diagram for systematic reviews and meta-analysis.
3.2. Study Characteristics
The characteristics of all included studies are summarized in Table 1 and Table 2. A total of 55 datasets from 42 studies used fundus or OCT imaging DL algorithms to diagnose glaucoma (Table 1), while 8 datasets from 7 studies used these algorithms in glaucoma progression prediction (Table 2). Across all studies evaluating diagnostic performance, 2,229,810 fundus images and 51,386 OCT images were used for training, while fundus images from 194,259 subjects and OCT images from 17,882 subjects were used for testing. For the studies evaluating glaucoma progression, a total of 22,422 fundus and 86,123 OCT images were used for training, while fundus images from 841 subjects and 34,847 OCT images were used for testing.
Table 1.
Characteristics of selected studies (datasets) used in glaucoma diagnosis.
Table 2.
Characteristics of selected articles (datasets) used in glaucoma progression prediction.
Out of the 63 datasets from 48 studies included, 22 datasets (34.92%) utilized external testing to evaluate AI performance in diagnosis or progression prediction, meaning that the testing datasets were different from the training datasets. The included studies employed a diverse range of DL architectures. Notably, convolutional neural networks (CNNs) were a common choice (58.33%) due to their proven efficacy in image-based diagnostics. Other approaches included ensemble methods such as XGBoost, random forest, and gradient boosting, among others [19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66].
3.2.1. Risk of Bias and Applicability of Included Studies
The graphical results of the quality assessment are presented in Supplementary Figures S1 and S2. According to the QUADAS-2 tool, a majority of studies had a low risk of bias and low concerns regarding applicability for the reference standard.
About 75% of the included studies demonstrated a low risk of bias in patient selection, indicating that the majority utilized appropriate inclusion and exclusion criteria to ensure representative study populations. A small proportion of the studies were classified as unclear or high risk of bias due to inadequate reporting of recruitment strategies or the use of convenience samples rather than random or consecutive sampling.
The index test domain assessed the risk of bias associated with the DL models used in the studies. Less than 25% of the studies were classified as unclear due to insufficient details regarding how the DL models were validated or whether thresholds were pre-specified. The applicability concerns were similarly low across most studies, indicating that the DL models were suitable for the intended diagnostic tasks.
The reference standard domain was evaluated to ensure that the criteria used to confirm glaucoma diagnoses or progression were reliable and unbiased. A large proportion of studies (>80%) had a low risk of bias, as they employed well-established diagnostic methods, such as clinical evaluations by glaucoma specialists or well-established imaging techniques.
The flow and timing domain was assessed for potential bias due to deviations in the timing between the index test and reference standard or issues related to participant attrition. A considerable number (about 20%) were classified as unclear due to insufficient reporting of the timing between tests or missing data.
3.2.2. Publication Bias
We utilized Deeks’ funnel plot asymmetry test to evaluate publication bias and to illustrate asymmetry in the funnel plot. From the funnel plot, shown in Supplementary Figure S3, the studies in this analysis appeared to be symmetrically distributed around the regression line, indicating no apparent publication bias. Egger’s test for overall publication bias revealed p = 0.55 (Supplementary Figure S3). Publication bias is considered significant if p < 0.10. This result suggested that the included studies were not disproportionately skewed toward those reporting more favorable or positive findings.
3.3. Overall Performance of DL Using Fundus Photography in Glaucoma Detection
The pooled summary of DL performance in glaucoma detection using fundus photography is shown in Table 3. Forest plots detailing the sensitivities and specificities of all datasets are shown in Figure 2. The corresponding summary receiver operating characteristic (sROC) plots for diagnostic performance are shown in Figure 3. A total of 42 datasets were plotted (data size n = 194,259). The sROC curves were plotted separately for internal validation (black) and external validation (red) to illustrate the distinction between internal and external validation of the DL algorithms in the diagnosis. The internal testing generally showed higher accuracy (higher sensitivity and specificity) than the external testing, which revealed higher variability among the datasets, as shown in the sROC plot (Figure 3).
Table 3.
Summary estimates for overall AI performance in glaucoma detection and subgroup analyses.
Figure 2.
Forest plot for sensitivities and specificities of glaucoma diagnosis with DL using fundus photography [19,21,22,23,24,25,26,27,28,29,30,31,33,37,38,39,40,41,43,44,45,46,49,50,51,52,53,55,58,59].
Figure 3.
Summary receiver operating characteristic (sROC) of all included data using fundus photographs in glaucoma diagnosis. DL: deep learning.
Considering all data for the use of fundus photography in the DL algorithm (Table 3), the pooled mean sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and area under the receiver operating characteristic curve (AUROC) were 0.92 (95% CI 0.89–0.94), 0.93 (95% CI 0.90–0.95), 12.99 (95% CI 9.23–18.30), 0.09 (95% CI 0.07–0.12), and 0.90 (95% CI 0.88–0.92), respectively. From the internally validated datasets (n = 124,552), the pooled mean sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and AUROC were 0.93 (95% CI 0.91–0.95), 0.95 (95% CI 0.93–0.97), 18.77 (95% CI 13.09–26.91), 0.07 (95% CI 0.05–0.10), and 0.91 (95% CI 0.90–0.93), respectively. The externally validated datasets (n = 69,707) revealed a generally decreased performance as follows: the pooled mean sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and AUROC were 0.86 (95% CI 0.82–0.93), 0.88 (95% CI 0.80–0.93), 7.26 (95% CI 4.14–12.73), 0.13 (95% CI 0.08–0.22), and 0.88 (95% CI 0.86–0.91), respectively.
3.4. Overall Performance of DL Using OCT in Glaucoma Detection
In terms of OCT imaging, the aggregated data (n = 17,882) from all datasets are also shown in Table 3. The forest plots and sROC curves for both internal (black) and external (red) datasets are shown in Figure 4 and Figure 5, respectively. As with fundus photography, the externally tested performance of the AI algorithm using OCT for glaucoma diagnosis had lower overall accuracy than the internal testing, as shown in Figure 5.
Figure 4.
Forest plot for sensitivities and specificities of glaucoma diagnosis with DL using OCT imaging [20,32,34,35,36,47,48,54,56,57,60].
Figure 5.
Summary receiver operating characteristic (sROC) of all included data using OCT imaging in glaucoma diagnosis. DL: deep learning.
The pooled mean sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and AUROC for the overall OCT images in glaucoma diagnosis were 0.90 (95% CI 0.84–0.94), 0.87 (95% CI 0.81–0.91), 6.87 (95% CI 4.57–10.33), 0.11 (95% CI 0.07–0.19), and 0.86 (95% CI 0.83–0.90), respectively. The internal testing (n = 15,009) revealed much better diagnostic performance than the external testing. The pooled mean sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and AUROC for the internal testing were 0.93 (95% CI 0.85–0.96), 0.93 (95% CI 0.85–0.96), 8.46 (95% CI 5.23–13.68), 0.08 (95% CI 0.04–0.17), and 0.89 (95% CI 0.85–0.93), respectively. In contrast, the external testing (n = 2873) showed pooled mean sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and AUROC of 0.83 (95% CI 0.73–0.90), 0.80 (95% CI 0.68–0.88), 4.20 (95% CI 2.56–6.89), 0.21 (95% CI 0.13–0.34), and 0.80 (95% CI 0.75–0.86), respectively.
3.5. Overall Performance of DL Using Fundus Photography/OCT in Glaucoma Progression Prediction
Table 4 summarizes the overall performance of both fundus photography and OCT in DL for glaucoma progression predictions. The number of tested eyes in each dataset was used for calculating the performance metrics, instead of data size, ensuring a more accurate representation of the longitudinal nature of the data, as each eye could contribute multiple images for testing. The forest plots and sROC curves for glaucoma progression predictions are shown in Figure 6 and Figure 7, respectively.
Table 4.
Summary estimates for overall AI performance in glaucoma progression prediction and subgroup analyses.
Figure 6.
Forest plot for sensitivities and specificities of glaucoma progression predictions with DL using fundus photography/OCT imaging [37,61,62,63,64,65,66].
Figure 7.
Summary receiver operating characteristic (sROC) of all included data using fundus photography/OCT imaging in progression predictions.
The pooled mean sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and AUROC for all fundus photography (tested patients, n = 841) were 0.89 (95% CI 0.78–0.95), 0.77 (95% CI 0.64–0.87), 3.88 (95% CI 2.31–6.51), 0.14 (95% CI 0.07–0.30), and 0.91 (95% CI 0.85–0.97), respectively. The internally tested datasets (tested patients, n = 375) had a pooled mean sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and AUROC of 0.95 (95% CI 0.82–0.99), 0.88 (95% CI 0.60–0.97), 7.87 (95% CI 1.97–31.48), 0.06 (95% CI 0.02–0.23), and 0.94 (95% CI 0.84–1.00), respectively. Similar to the diagnostic performance, the external testing for progression predictions (tested patients, n = 466) produced a lower performance, with pooled mean sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and AUROC at 0.81 (95% CI 0.77–0.85), 0.67 (95% CI 0.51–0.79), 2.44 (95% CI 2.18–2.73), 0.28 (95% CI 0.23–0.34), and 0.88 (95% CI 0.84–0.92), respectively.
The pooled mean sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and AUROC for OCT in predicting glaucoma progression were 0.74 (95% CI 0.51–0.89), 0.93 (95% CI 0.88–0.96), 10.09 (95% CI 7.60–13.39), 0.28 (95% CI 0.14–0.57), and 0.90 (95% CI 0.84–0.95), respectively. Currently, only the internally tested databases have been published in the literature and are included in this meta-analysis. The pooled specificity is significantly higher than the pooled sensitivity, indicating that the OCT AI algorithm is more effective at correctly identifying patients who will not experience glaucoma progression.
3.6. Comparative Analyses Between Performance Metrics (Sensitivity, Specificity, Positive and Negative Likelihood Ratios, and AUROC)
Each performance metric between the subgroups was statistically compared via DeLong’s test with Bonferroni correction (Supplementary Table S1). From the comparative analysis, fundus imaging DL demonstrated significantly lower sensitivity (p < 0.05) for progression predictions when tested on external datasets compared to its performance in glaucoma diagnosis when tested internally. This highlights the challenge of generalizing progression prediction models, particularly across diverse external databases. Fundus imaging DL also consistently achieved the highest specificity in internal databases, underscoring its robustness in accurately excluding non-glaucomatous cases.
OCT demonstrated a significantly higher positive likelihood ratio compared to fundus imaging in progression predictions. This indicates that OCT was more effective in confirming disease progression when an abnormal test result was observed. The negative likelihood ratio for fundus imaging was consistently lower in both glaucoma diagnosis and progression predictions compared to OCT.
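To make the clinical meaning of these ratios concrete, the sketch below converts a pre-test probability into a post-test probability using the pooled positive likelihood ratios for progression prediction (10.09 for OCT and 3.88 for fundus photography). The pre-test probability of 20% is a hypothetical value chosen only for illustration and does not come from the included studies.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


PRE_TEST = 0.20  # hypothetical prior probability of progression

oct_post = post_test_probability(PRE_TEST, 10.09)     # pooled LR+ for OCT
fundus_post = post_test_probability(PRE_TEST, 3.88)   # pooled LR+ for fundus
print(f"OCT positive result:    {oct_post:.2f}")
print(f"Fundus positive result: {fundus_post:.2f}")
```

Under this assumed prior, a positive OCT-based prediction raises the probability of progression to roughly 0.72, compared with roughly 0.49 for a positive fundus-based prediction.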
The internally tested fundus imaging showed statistically significant superiority (p < 0.05) in AUROC scores for glaucoma diagnosis compared to the externally tested OCT. This highlights the strength of fundus imaging in achieving high diagnostic accuracy in controlled settings.
4. Discussion
This study evaluated recent performances of DL algorithms in diagnosing glaucoma and predicting glaucoma progression using fundus photography and OCT images. Overall, DL models achieved good diagnostic performance, as evidenced by high sensitivity, specificity, and AUROC. The glaucoma progression prediction test results with DL using fundus and OCT imaging were less robust, probably due to the limited availability of longitudinal data, the complexity of structural features, and challenges in DL training without functional assessment integration. Throughout all studies, the internal testing of both diagnostic and progression prediction DL performance produced better results compared to the external testing.
In comparison to previous reviews, this analysis demonstrated that DL algorithms based on fundus and OCT imaging modalities attained good accuracy in glaucoma diagnosis (AUC 90% and AUC 86%, respectively) and progression prediction (AUC 91% and AUC 90%, respectively). A recent study by Wu et al. [16] on machine learning algorithms showed good performance in glaucoma diagnosis. Although Chaurasia et al. [10] demonstrated in their meta-analysis that AI algorithms for both fundus and OCT imaging modalities showed comparable accuracy (96.2% AUC and 96.0% AUC, respectively) for glaucoma detection, another study by Murtagh et al. [67] reported a lower AUC (92.3%) on OCT images, which was similar to our study.
Sensitivity and specificity are key metrics for evaluating diagnostic algorithms. Sensitivity reflects the ability to identify true positive cases, while specificity measures the accuracy in identifying true negatives. The relative importance of these metrics depends on the clinical context. For glaucoma diagnosis, fundus imaging achieves a sensitivity of 92% and a specificity of 93%, while OCT achieves a sensitivity of 90% and a specificity of 87%. These high values highlight the robustness of the models in detecting glaucoma. However, sensitivity and specificity differ for glaucoma progression predictions. Fundus imaging shows a sensitivity of 89% but a lower specificity of 77%, whereas OCT demonstrates a sensitivity of 74% and a higher specificity of 93%. The higher specificity of OCT in glaucoma progression predictions reflects that the structural markers in OCT are well defined and quantifiable with time, thus making OCT more effective at identifying true negative cases (no progression).
There are several reasons why DL algorithms using OCT showed lower sensitivity, specificity, and overall AUC compared to fundus imaging in diagnosis. Firstly, studies included in this meta-analysis showed that OCT data had comparatively fewer training samples than fundus imaging, which may result in poorer generalization. Additionally, published studies often highlight that while OCT provides valuable depth information for certain aspects of glaucoma assessments, fundus images remain more straightforward for automated detection tasks due to their consistency, simplicity, and greater prevalence in large datasets used for model training [68]. Notably, fundus imaging is widely used in large screening programs, leading to more standardized protocols from which AI models can benefit. However, OCT imaging may not yet have the same level of standardization across different devices and studies, resulting in variability that can challenge model performance [69].
In general, DL models demonstrated less robust performance in predicting glaucoma progression using fundus and OCT compared to their diagnostic capabilities. Predicting disease progression requires large-scale longitudinal data that track patients over time for good accuracy. For the time being, such datasets are still less prevalent, thus limiting the amount of data available for robust model training [70]. Additionally, while longitudinal data are valuable, modeling temporal data often necessitates complex DL architectures, like recurrent neural networks (RNNs) or long short-term memory (LSTM), which could further affect the model’s performance [71]. For instance, Thakur et al. [72] trained a DL algorithm for predicting glaucoma development prior to disease onset using fundus photography from a longitudinal database. However, the longer the time was before disease onset, the less accurate the model was [72].
Additionally, this meta-analysis included studies that primarily trained DL algorithms for progression prediction using fundus and OCT imaging only. Although these modalities are effective in structural assessments and, therefore, glaucoma diagnosis, they may not reflect the functional deterioration associated with glaucoma progression [73]. Notably, additional studies [74,75,76,77] not included in this meta-analysis (excluded for not meeting the outcome-type criteria) transformed structural data from optic disc photographs or OCT images into quantitative functional data, such as VF parameters. Other predictors, such as IOP and central corneal thickness (CCT), could potentially enhance the performance and refinement of diagnostic algorithms by providing additional physiological and biomechanical context to the analysis [78,79]. However, these factors were not included as primary components in the training processes of the included studies. Our results suggest that although fundus photography and OCT imaging alone can yield significant findings, the lack of robustness indicates that integrating functional assessment data, such as VF measurements, may be essential to further enhance DL models for predicting glaucoma progression.
What, then, are practical steps to enhance DL model performance? One approach involves developing a dual-stream CNN architecture that processes structural imaging data alongside functional assessments, thereby allowing the model to capture intricate patterns associated with glaucoma more effectively [81]. Attention mechanisms can further enhance such frameworks by enabling the model to focus on critical features from each modality [80]. For instance, an attention-based DL model trained on multimodal inputs might prioritize optic nerve head parameters from fundus images and retinal nerve fiber layer thickness from OCT scans while integrating these with VF defect patterns, thus leveraging both structural and functional markers.
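The idea of attention over modalities can be illustrated with a deliberately small sketch. It is not the architecture of any cited study: the three-dimensional embeddings, the modality names, and the query vector are hypothetical stand-ins for features that, in a real model, would be produced by trained image and VF encoders and learned jointly. The sketch only shows how scaled dot-product scores are turned into softmax weights and used to fuse per-modality features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(embeddings, query):
    """Fuse modality embeddings using scaled dot-product attention weights."""
    names = list(embeddings)
    scale = math.sqrt(len(query))
    scores = [
        sum(q * x for q, x in zip(query, embeddings[name])) / scale
        for name in names
    ]
    weights = softmax(scores)
    fused = [
        sum(w * embeddings[name][i] for w, name in zip(weights, names))
        for i in range(len(query))
    ]
    return dict(zip(names, weights)), fused


# Hypothetical embeddings for three modalities (values are arbitrary).
embeddings = {
    "fundus_onh": [0.8, 0.1, 0.3],   # optic nerve head features
    "oct_rnfl": [0.2, 0.9, 0.4],     # retinal nerve fiber layer features
    "vf_pattern": [0.5, 0.5, 0.9],   # visual field defect pattern features
}
query = [0.1, 0.2, 1.0]              # hypothetical task-specific query vector

weights, fused = attention_fuse(embeddings, query)
print(weights)
print(fused)
```

With these arbitrary values, the VF embedding aligns most closely with the query and therefore receives the largest weight; in a trained model, the weighting would instead reflect which modality is most informative for a given eye.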
In this meta-analysis, the superior performance on internal validation compared with external validation highlights the importance of how models are trained and evaluated. In internal validation, the model is typically tested on a subset of the same dataset used for training, often split via methods such as k-fold cross-validation [21,24,40]. As a result, the internal test data may share characteristics with the training data, such as imaging protocols, patient demographics, and disease prevalence, which can lead to inflated performance metrics [82,83].
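To make the point about internal validation concrete, the sketch below builds k-fold splits from a single pool of sample indices. It is an illustrative assumption rather than the procedure of any included study; the function name and the sample count are hypothetical. Every held-out fold is drawn from the same pool as the training folds, which is why internal test data tend to share acquisition and population characteristics with the training data.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Return k (train, test) index splits drawn from one shuffled pool."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        splits.append((train, test))
    return splits


# Hypothetical dataset of 10 images from a single source.
splits = k_fold_indices(n_samples=10, k=5)
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train={train} test={test}")
```

External validation, by contrast, would evaluate the trained model on a dataset that never enters this pool, for example images from a different device or institution.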
Certain studies [37,43,45,48,49] in the meta-analysis employed both internal and external testing with different datasets and generally showed lower accuracy in external validation. This suggests that some DL models were prone to overfitting, in which an algorithm performs well on training data because it has learned both signal and noise, but generalizes poorly to new data [84]. Factors such as different imaging devices, settings, and patient populations may also affect a model’s performance and its ability to generalize during external testing [85]. To address this, future studies are likely to include data from multiple sources, imaging devices, and patient populations during the training stage [43].
It is also important to compare the performance of these DL algorithms with non-AI-based approaches to assess how DL models can complement or enhance conventional diagnostic techniques. Traditional diagnostic approaches, such as visual field testing and optic nerve head evaluation, also show good diagnostic performance but with more variability across studies [86]. These methods typically yield sensitivity values between 0.75 and 0.85, depending on the stage of glaucoma and the quality of the test [87]. They are also often time-consuming, require expert knowledge, and are subject to inter-observer variability. In contrast, and in line with previous research, our findings indicate that AI-based tools can quickly analyze fundus and OCT images, providing reliable diagnostic results with good sensitivity and specificity and reducing the need for costly and labor-intensive manual analysis [10,16,88].
One limitation of our meta-analysis is the heterogeneity among the included studies. Variations in DL architectures, imaging protocols, and patient populations from multiple nations across different continents may contribute to variability in DL performance. Nevertheless, considerable heterogeneity was anticipated given the large number of included studies and the diverse nature of real-world study designs. To account for this variability, a random-effects model was employed in the analysis. While the inclusion of progression prediction studies is a strength of this meta-analysis, the lack of longitudinal datasets and the reliance on imaging alone to capture quantitative characteristics signifying progression may have resulted in less robust findings. To more accurately reflect the state of DL applications in glaucoma progression prediction, a greater number of datasets incorporating multimodal data, including visual field assessments, are needed in the literature.
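How a random-effects model absorbs between-study heterogeneity can be sketched with a simplified univariate example. The code below pools logit-transformed sensitivities using the DerSimonian-Laird estimator; it is a deliberate simplification and not the bivariate random-effects model used in this meta-analysis, which models sensitivity and specificity jointly. The function name and the per-study counts are hypothetical and do not correspond to any included study.

```python
import math


def inv_logit(x):
    """Map a logit value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_sensitivity_dl(true_positives, diseased_totals):
    """Pool sensitivities on the logit scale (DerSimonian-Laird).

    A 0.5 continuity correction is applied to every study (an assumption
    made for simplicity). Returns (pooled, (ci_low, ci_high), tau2).
    """
    effects, variances = [], []
    for tp, n in zip(true_positives, diseased_totals):
        tp_c = tp + 0.5
        fn_c = (n - tp) + 0.5
        effects.append(math.log(tp_c / fn_c))      # logit sensitivity
        variances.append(1.0 / tp_c + 1.0 / fn_c)  # its approximate variance

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)

    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se))
    return inv_logit(pooled), ci, tau2


# Hypothetical counts: true positives and diseased eyes per study.
pooled, ci, tau2 = pool_sensitivity_dl([90, 170, 45, 120], [100, 200, 60, 125])
print(f"pooled sensitivity {pooled:.3f}, 95% CI {ci[0]:.3f}-{ci[1]:.3f}, tau^2 {tau2:.3f}")
```

When the estimated between-study variance is greater than zero, the weights become more even across studies and the confidence interval widens relative to a fixed-effect analysis, which is the intended way for the model to reflect heterogeneity.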
In conclusion, this meta-analysis demonstrates that DL algorithms achieve excellent performance in diagnosing glaucoma using fundus photography and OCT imaging, whereas their performance in predicting glaucoma progression is less robust. The results also reflect real-world observations in which external testing tends to have lower accuracy than internal testing due to the greater variability in datasets, reference standards, and clinical settings. The challenges for DL algorithms in glaucoma include the need for high-quality annotated datasets for training, the potential for overfitting, and the need for clinical validation in real-world settings. Despite these challenges, this study adds to the growing body of evidence supporting the integration of AI-based tools to enhance glaucoma diagnosis and progression prediction, though such tools should not replace clinical judgment.
Supplementary Materials
The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/biomedicines13020420/s1, Supplementary Figure S1: Quality assessment of included studies using the QUADAS-2 tool; Supplementary Figure S2: Graphical presentation of quality assessment using the QUADAS-2 tool; Supplementary Figure S3: Deeks’ funnel plot asymmetry test; Supplementary Table S1: Statistical analyses with DeLong’s test to compare all sensitivities, specificities, likelihood ratios, and AUROCs.
Author Contributions
Conceptualization, X.C.L. and Y.-S.L.; methodology, X.C.L.; software, X.C.L.; validation, X.C.L., H.S.-L.C., and Y.-S.L.; formal analysis, X.C.L.; investigation, X.C.L.; resources, X.C.L.; data curation, X.C.L., H.S.-L.C., and P.-H.Y.; writing—original draft preparation, X.C.L.; writing—review and editing, X.C.L. and Y.-S.L.; visualization, Y.-C.C.; supervision, Y.-S.L.; project administration, Y.-S.L.; funding acquisition, Y.-C.C., C.-Y.H., and S.-C.S. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Ethical review and approval were waived for this study due to it involving a meta-analysis of previously published data.
Informed Consent Statement
Patient consent was waived due to the study being a meta-analysis of previously published data.
Data Availability Statement
Data sharing is not applicable to this article, as no new data were created in this study. All data supporting the findings are available from the original publications cited in the reference list.
Acknowledgments
This work did not receive funding from any research grant or any sponsoring organization. The study design, data analysis or interpretation, and writing were all completed independently by all the authors.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Allison, K.; Patel, D.; Alabi, O. Epidemiology of Glaucoma: The Past, Present, and Predictions for the Future. Cureus 2020, 12, e11686. [Google Scholar] [CrossRef]
- Jonas, J.B.; Aung, T.; Bourne, R.R.; Bron, A.M.; Ritch, R.; Panda-Jonas, S. Glaucoma. Lancet 2017, 390, 2183–2193. [Google Scholar] [CrossRef] [PubMed]
- Killer, H.E.; Pircher, A. Normal tension glaucoma: Review of current understanding and mechanisms of the pathogenesis. Eye 2018, 32, 924–930. [Google Scholar] [CrossRef] [PubMed]
- Roberti, G.; Michelessi, M.; Tanga, L.; Belfonte, L.; Del Grande, L.M.; Bruno, M.; Oddone, F. Glaucoma Progression Diagnosis: The Agreement between Clinical Judgment and Statistical Software. J. Clin. Med. 2022, 11, 5508. [Google Scholar] [CrossRef]
- Chou, R.; Selph, S.; Blazina, I.; Bougatsos, C.; Jungbauer, R.; Fu, R.; Grusing, S.; Jonas, D.E.; Tehrani, S. Screening for Glaucoma in Adults: Updated Evidence Report and Systematic Review for the US Preventive Services Task Force. JAMA 2022, 327, 1998–2012. [Google Scholar] [CrossRef]
- Sarker, I.H. Deep Learning: A Comprehensive Overview on Techniques, Taxonomy, Applications and Research Directions. SN Comput. Sci. 2021, 2, 420. [Google Scholar] [CrossRef] [PubMed]
- Gutierrez, A.; Chen, T.C. Artificial intelligence in glaucoma: Posterior segment optical coherence tomography. Curr. Opin. Ophthalmol. 2023, 34, 245–254. [Google Scholar] [CrossRef] [PubMed]
- Akter, N.; Fletcher, J.; Perry, S.; Simunovic, M.P.; Briggs, N.; Roy, M. Glaucoma diagnosis using multi-feature analysis and a deep learning technique. Sci. Rep. 2022, 12, 8064. [Google Scholar] [CrossRef]
- Dixit, A.; Yohannan, J.; Boland, M.V. Assessing Glaucoma Progression Using Machine Learning Trained on Longitudinal Visual Field and Clinical Data. Ophthalmology 2021, 128, 1016–1026. [Google Scholar] [CrossRef] [PubMed]
- Chaurasia, A.K.; Greatbatch, C.J.; Hewitt, A.W. Diagnostic Accuracy of Artificial Intelligence in Glaucoma Screening and Clinical Practice. J. Glaucoma 2022, 31, 285–299. [Google Scholar] [CrossRef]
- Mursch-Edlmayr, A.S.; Ng, W.S.; Diniz-Filho, A.; Sousa, D.C.; Arnold, L.; Schlenker, M.B.; Duenas-Angeles, K.; Keane, P.A.; Crowston, J.G.; Jayaram, H. Artificial Intelligence Algorithms to Diagnose Glaucoma and Detect Glaucoma Progression: Translation to Clinical Practice. Transl. Vis. Sci. Technol. 2020, 9, 55. [Google Scholar] [CrossRef] [PubMed]
- Tonti, E.; Tonti, S.; Mancini, F.; Bonini, C.; Spadea, L.; D’Esposito, F.; Gagliano, C.; Musa, M.; Zeppieri, M. Artificial Intelligence and Advanced Technology in Glaucoma: A Review. J. Pers. Med. 2024, 14, 1062. [Google Scholar] [CrossRef]
- Jan, C.; He, M.; Vingrys, A.; Zhu, Z.; Stafford, R.S. Diagnosing glaucoma in primary eye care and the role of Artificial Intelligence applications for reducing the prevalence of undetected glaucoma in Australia. Eye 2024, 38, 2003–2013. [Google Scholar] [CrossRef] [PubMed]
- Lemij, H.G.; Vente, C.; Sanchez, C.I.; Vermeer, K.A. Characteristics of a Large, Labeled Data Set for the Training of Artificial Intelligence for Glaucoma Screening with Fundus Photographs. Ophthalmol. Sci. 2023, 3, 100300. [Google Scholar] [CrossRef]
- Chen, J.S.; Lin, W.C.; Yang, S.; Chiang, M.F.; Hribar, M.R. Development of an Open-Source Annotated Glaucoma Medication Dataset From Clinical Notes in the Electronic Health Record. Transl. Vis. Sci. Technol. 2022, 11, 20. [Google Scholar] [CrossRef] [PubMed]
- Wu, J.H.; Nishida, T.; Weinreb, R.N.; Lin, J.W. Performances of Machine Learning in Detecting Glaucoma Using Fundus and Retinal Optical Coherence Tomography Images: A Meta-Analysis. Am. J. Ophthalmol. 2022, 237, 1–12. [Google Scholar] [CrossRef]
- Shi, N.N.; Li, J.; Liu, G.H.; Cao, M.F. Artificial intelligence for the detection of glaucoma with SD-OCT images: A systematic review and Meta-analysis. Int. J. Ophthalmol. 2024, 17, 408–419. [Google Scholar] [CrossRef] [PubMed]
- McInnes, M.D.F.; Moher, D.; Thombs, B.D.; McGrath, T.A.; Bossuyt, P.M.; the PRISMA-DTA Group; Clifford, T.; Cohen, J.F.; Deeks, J.J.; Gatsonis, C.; et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement. JAMA 2018, 319, 388–396. [Google Scholar] [CrossRef]
- Al-Aswad, L.A.; Kapoor, R.; Chu, C.K.; Walters, S.; Gong, D.; Garg, A.; Gopal, K.; Patel, V.; Sameer, T.; Rogers, T.W.; et al. Evaluation of a Deep Learning System For Identifying Glaucomatous Optic Neuropathy Based on Color Fundus Photographs. J. Glaucoma 2019, 28, 1029–1034. [Google Scholar] [CrossRef]
- Asaoka, R.; Murata, H.; Hirasawa, K.; Fujino, Y.; Matsuura, M.; Miki, A.; Kanamoto, T.; Ikeda, Y.; Mori, K.; Iwase, A.; et al. Using Deep Learning and Transfer Learning to Accurately Diagnose Early-Onset Glaucoma From Macular Optical Coherence Tomography Images. Am. J. Ophthalmol. 2019, 198, 136–145. [Google Scholar] [CrossRef] [PubMed]
- Bajwa, M.N.; Malik, M.I.; Siddiqui, S.A.; Dengel, A.; Shafait, F.; Neumeier, W.; Ahmed, S. Two-stage framework for optic disc localization and glaucoma classification in retinal fundus images using deep learning. BMC Med. Inform. Decis. Mak. 2019, 19, 136. [Google Scholar] [CrossRef]
- Chakrabarty, L.; Joshi, G.D.; Chakravarty, A.; Raman, G.V.; Krishnadas, S.R.; Sivaswamy, J. Automated Detection of Glaucoma From Topographic Features of the Optic Nerve Head in Color Fundus Photographs. J. Glaucoma 2016, 25, 590–597. [Google Scholar] [CrossRef]
- Chang, J.; Lee, J.; Ha, A.; Han, Y.S.; Bak, E.; Choi, S.; Yun, J.M.; Kang, U.; Shin, I.H.; Shin, J.Y.; et al. Explaining the Rationale of Deep Learning Glaucoma Decisions with Adversarial Examples. Ophthalmology 2021, 128, 78–88. [Google Scholar] [CrossRef]
- Christopher, M.; Belghith, A.; Bowd, C.; Proudfoot, J.A.; Goldbaum, M.H.; Weinreb, R.N.; Girkin, C.A.; Liebmann, J.M.; Zangwill, L.M. Performance of Deep Learning Architectures and Transfer Learning for Detecting Glaucomatous Optic Neuropathy in Fundus Photographs. Sci. Rep. 2018, 8, 16685. [Google Scholar] [CrossRef]
- Civit-Masot, J.M.; Domínguez-Morales, M.J.; Vicente-Díaz, S.; Civit, A. Dual Machine-Learning System to Aid Glaucoma Diagnosis Using Disc and Cup Feature Extraction. IEEE Access 2020, 8, 127519–127529. [Google Scholar] [CrossRef]
- Diaz-Pinto, A.; Morales, S.; Naranjo, V.; Kohler, T.; Mossi, J.M.; Navea, A. CNNs for automatic glaucoma assessment using fundus images: An extensive validation. BioMed. Eng. OnLine 2019, 18, 29. [Google Scholar] [CrossRef] [PubMed]
- Fu, H.; Li, F.; Xu, Y.; Liao, J.; Xiong, J.; Shen, J.; Liu, J.; Zhang, X.; for iChallenge-GON study group. A Retrospective Comparison of Deep Learning to Manual Annotations for Optic Disc and Optic Cup Segmentation in Fundus Photographs. Transl. Vis. Sci. Technol. 2020, 9, 33. [Google Scholar] [CrossRef] [PubMed]
- Gomez-Valverde, J.J.; Anton, A.; Fatti, G.; Liefers, B.; Herranz, A.; Santos, A.; Sanchez, C.I.; Ledesma-Carbayo, M.J. Automatic glaucoma classification using color fundus images based on convolutional neural networks and transfer learning. Biomed. Opt. Express 2019, 10, 892–913. [Google Scholar] [CrossRef]
- Hemelings, R.; Elen, B.; Schuster, A.K.; Blaschko, M.B.; Barbosa-Breda, J.; Hujanen, P.; Junglas, A.; Nickels, S.; White, A.; Pfeiffer, N.; et al. A generalizable deep learning regression model for automated glaucoma screening from fundus images. npj Digit. Med. 2023, 6, 112. [Google Scholar] [CrossRef] [PubMed]
- Kashyap, R.; Nair, R.; Gangadharan, S.M.P.; Botto-Tobar, M.; Farooq, S.; Rizwan, A. Glaucoma Detection and Classification Using Improved U-Net Deep Learning Model. Healthcare 2022, 10, 2497. [Google Scholar] [CrossRef] [PubMed]
- Kausu, T.R.; Gopi, V.P.; Khan; Wahid, A.; Doma, W.; Niwas, S.I. Combination of clinical and multiresolution features for glaucoma detection and its classification using fundus images. Biocybern. Biomed. Eng. 2018, 38, 329–341. [Google Scholar] [CrossRef]
- Kim, K.E.; Kim, J.M.; Song, J.E.; Kee, C.; Han, J.C.; Hyun, S.H. Development and Validation of a Deep Learning System for Diagnosing Glaucoma Using Optical Coherence Tomography. J. Clin. Med. 2020, 9, 2167. [Google Scholar] [CrossRef] [PubMed]
- Ko, Y.C.; Wey, S.Y.; Chen, W.T.; Chang, Y.F.; Chen, M.J.; Chiou, S.H.; Liu, C.J.; Lee, C.Y. Deep learning assisted detection of glaucomatous optic neuropathy and potential designs for a generalizable model. PLoS ONE 2020, 15, e0233079. [Google Scholar] [CrossRef] [PubMed]
- Lee, J.; Kim, Y.; Kim, J.H.; Park, K.H. Screening Glaucoma With Red-free Fundus Photography Using Deep Learning Classifier and Polar Transformation. J. Glaucoma 2019, 28, 258–264. [Google Scholar] [CrossRef]
- Lee, J.; Kim, Y.K.; Park, K.H.; Jeoung, J.W. Diagnosing Glaucoma with Spectral-Domain Optical Coherence Tomography Using Deep Learning Classifier. J. Glaucoma 2020, 29, 287–294. [Google Scholar] [CrossRef]
- Lee, J.; Kim, J.S.; Lee, H.J.; Kim, S.J.; Kim, Y.K.; Park, K.H.; Jeoung, J.W. Discriminating glaucomatous and compressive optic neuropathy on spectral-domain optical coherence tomography with deep learning classifier. Br. J. Ophthalmol. 2020, 104, 1717–1723. [Google Scholar] [CrossRef] [PubMed]
- Li, F.; Su, Y.; Lin, F.; Li, Z.; Song, Y.; Nie, S.; Xu, J.; Chen, L.; Chen, S.; Li, H.; et al. A deep-learning system predicts glaucoma incidence and progression using retinal photographs. J. Clin. Investig. 2022, 132, e157968. [Google Scholar] [CrossRef]
- Li, Z.; Guo, C.; Lin, D.; Nie, D.; Zhu, Y.; Chen, C.; Zhao, L.; Wang, J.; Zhang, X.; Dongye, M.; et al. Deep learning for automated glaucomatous optic neuropathy detection from ultra-widefield fundus images. Br. J. Ophthalmol. 2021, 105, 1548–1554. [Google Scholar] [CrossRef] [PubMed]
- Li, L.; Xu, M.; Liu, H.; Li, Y.; Wang, X.; Jiang, L. A Large-Scale Database and a CNN Model for Attention-Based Glaucoma Detection. IEEE Trans. Med. Imaging 2020, 39, 413–424. [Google Scholar] [CrossRef] [PubMed]
- Li, F.; Yan, L.; Wang, Y.; Shi, J.; Chen, H.; Zhang, X.; Jiang, M.; Wu, Z.; Zhou, K. Deep learning-based automated detection of glaucomatous optic neuropathy on color fundus photographs. Graefe’s Arch. Clin. Exp. Ophthalmol. 2020, 258, 851–867. [Google Scholar] [CrossRef]
- Lin, M.; Hou, B.; Liu, L.; Gordon, M.; Kass, M.; Wang, F.; Van Tassel, S.H.; Peng, Y. Automated diagnosing primary open-angle glaucoma from fundus image by simulating human’s grading with deep learning. Sci. Rep. 2022, 12, 14080. [Google Scholar] [CrossRef] [PubMed]
- Liu, S.; Graham, S.L.; Schulz, A.; Kalloniatis, M.; Zangerl, B.; Cai, W.; Gao, Y.; Chua, B.; Arvind, H.; Grigg, J.; et al. A Deep Learning-Based Algorithm Identifies Glaucomatous Discs Using Monoscopic Fundus Photographs. Ophthalmol. Glaucoma 2018, 1, 15–22. [Google Scholar] [CrossRef]
- Liu, H.; Li, L.; Wormstone, I.M.; Qiao, C.; Zhang, C.; Liu, P.; Li, S.; Wang, H.; Mou, D.; Pang, R.; et al. Development and Validation of a Deep Learning System to Detect Glaucomatous Optic Neuropathy Using Fundus Photographs. JAMA Ophthalmol. 2019, 137, 1353–1360. [Google Scholar] [CrossRef] [PubMed]
- Liu, S.; Hong, J.; Lu, X.; Jia, X.; Lin, Z.; Zhou, Y.; Liu, Y.; Zhang, H. Joint optic disc and cup segmentation using semi-supervised conditional GANs. Comput. Biol. Med. 2019, 115, 103485. [Google Scholar] [CrossRef]
- MacCormick, I.J.C.; Williams, B.M.; Zheng, Y.; Li, K.; Al-Bander, B.; Czanner, S.; Cheeseman, R.; Willoughby, C.E.; Brown, E.N.; Spaeth, G.L.; et al. Accurate, fast, data efficient and interpretable glaucoma diagnosis with automated spatial analysis of the whole cup to disc profile. PLoS ONE 2019, 14, e0209409. [Google Scholar] [CrossRef]
- Martins, J.; Cardoso, J.S.; Soares, F. Offline computer-aided diagnosis for Glaucoma detection using fundus images targeted at mobile devices. Comput. Methods Programs Biomed. 2020, 192, 105341. [Google Scholar] [CrossRef] [PubMed]
- Medeiros, F.A.; Jammal, A.A.; Thompson, A.C. From Machine to Machine: An OCT-Trained Deep Learning Algorithm for Objective Quantification of Glaucomatous Damage in Fundus Photographs. Ophthalmology 2019, 126, 513–521. [Google Scholar] [CrossRef] [PubMed]
- Noury, E.; Mannil, S.S.; Chang, R.T.; Ran, A.R.; Cheung, C.Y.; Thapa, S.S.; Rao, H.L.; Dasari, S.; Riyazuddin, M.; Chang, D.; et al. Deep Learning for Glaucoma Detection and Identification of Novel Diagnostic Areas in Diverse Real-World Datasets. Transl. Vis. Sci. Technol. 2022, 11, 11. [Google Scholar] [CrossRef] [PubMed]
- Phene, S.; Dunn, R.C.; Hammel, N.; Liu, Y.; Krause, J.; Kitade, N.; Schaekermann, M.; Sayres, R.; Wu, D.J.; Bora, A.; et al. Deep Learning and Glaucoma Specialists: The Relative Importance of Optic Disc Features to Predict Glaucoma Referral in Fundus Photographs. Ophthalmology 2019, 126, 1627–1639. [Google Scholar] [CrossRef]
- Raghavendra, U.; Gudigar, A.; Bhandary, S.V.; Rao, T.N.; Ciaccio, E.J.; Acharya, U.R. A Two Layer Sparse Autoencoder for Glaucoma Identification with Fundus Images. J. Med. Syst. 2019, 43, 299. [Google Scholar] [CrossRef] [PubMed]
- Ran, A.R.; Cheung, C.Y.; Wang, X.; Chen, H.; Luo, L.Y.; Chan, P.P.; Wong, M.O.M.; Chang, R.T.; Mannil, S.S.; Young, A.L.; et al. Detection of glaucomatous optic neuropathy with spectral-domain optical coherence tomography: A retrospective training and validation deep-learning analysis. Lancet Digit. Health 2019, 1, e172–e182. [Google Scholar] [CrossRef]
- Rogers, T.W.; Jaccard, N.; Carbonaro, F.; Lemij, H.G.; Vermeer, K.A.; Reus, N.J.; Trikha, S. Evaluation of an AI system for the automated detection of glaucoma from stereoscopic optic disc photographs: The European Optic Disc Assessment Study. Eye 2019, 33, 1791–1797. [Google Scholar] [CrossRef] [PubMed]
- Shibata, N.; Tanito, M.; Mitsuhashi, K.; Fujino, Y.; Matsuura, M.; Murata, H.; Asaoka, R. Development of a deep residual learning algorithm to screen for glaucoma from fundus photography. Sci. Rep. 2018, 8, 14665. [Google Scholar] [CrossRef] [PubMed]
- Singh, L.K.; Pooja; Garg, H.; Khanna, M. An Artificial Intelligence-Based Smart System for Early Glaucoma Recognition Using OCT Images. Int. J. E-Health Med. Commun. 2021, 12, 32–59. [Google Scholar] [CrossRef]
- Soorya, M.; Issac, A.; Dutta, M.K. Automated Framework for Screening of Glaucoma Through Cloud Computing. J. Med. Syst. 2019, 43, 136. [Google Scholar] [CrossRef] [PubMed]
- Thompson, A.C.; Jammal, A.A.; Berchuck, S.I.; Mariottoni, E.B.; Medeiros, F.A. Assessment of a Segmentation-Free Deep Learning Algorithm for Diagnosing Glaucoma From Optical Coherence Tomography Scans. JAMA Ophthalmol. 2020, 138, 333–339. [Google Scholar] [CrossRef]
- Wu, C.W.; Chen, H.Y.; Chen, J.Y.; Lee, C.H. Glaucoma Detection Using Support Vector Machine Method Based on Spectralis OCT. Diagnostics 2022, 12, 391. [Google Scholar] [CrossRef] [PubMed]
- Xu, Y.; Hu, M.; Liu, H.; Yang, H.; Wang, H.; Lu, S.; Liang, T.; Li, X.; Xu, M.; Li, L.; et al. A hierarchical deep learning approach with transparency and interpretability based on small samples for glaucoma diagnosis. npj Digit. Med. 2021, 4, 48. [Google Scholar] [CrossRef]
- Yang, H.K.; Kim, Y.J.; Sung, J.Y.; Kim, D.H.; Kim, K.G.; Hwang, J.M. Efficacy for Differentiating Nonglaucomatous Versus Glaucomatous Optic Neuropathy Using Deep Learning Systems. Am. J. Ophthalmol. 2020, 216, 140–146. [Google Scholar] [CrossRef]
- Zheng, C.; Xie, X.; Huang, L.; Chen, B.; Yang, J.; Lu, J.; Qiao, T.; Fan, Z.; Zhang, M. Detecting glaucoma based on spectral domain optical coherence tomography imaging of peripapillary retinal nerve fiber layer: A comparison study between hand-crafted features and deep learning model. Graefe’s Arch. Clin. Exp. Ophthalmol. 2020, 258, 577–585. [Google Scholar] [CrossRef]
- Brown, A.; Cousins, H.; Cousins, C.; Esquenazi, K.; Elze, T.; Harris, A.; Filipowicz, A.; Barna, L.; Yonwook, K.; Vinod, K.; et al. Deep Learning for Localized Detection of Optic Disc Hemorrhages. Am. J. Ophthalmol. 2023, 255, 161–169. [Google Scholar] [CrossRef] [PubMed]
- Ha, A.; Sun, S.; Kim, Y.K.; Jeoung, J.W.; Kim, H.C.; Park, K.H. Deep-learning-based prediction of glaucoma conversion in normotensive glaucoma suspects. Br. J. Ophthalmol. 2024, 108, 927–932. [Google Scholar] [CrossRef]
- Hussain, S.; Chua, J.; Wong, D.; Lo, J.; Kadziauskiene, A.; Asoklis, R.; Barbastathis, G.; Schmetterer, L.; Yong, L. Predicting glaucoma progression using deep learning framework guided by generative algorithm. Sci. Rep. 2023, 13, 19960. [Google Scholar] [CrossRef]
- Mandal, S.; Jammal, A.A.; Malek, D.; Medeiros, F.A. Progression or Aging? A Deep Learning Approach for Distinguishing Glaucoma Progression From Age-Related Changes in OCT Scans. Am. J. Ophthalmol. 2024, 266, 46–55. [Google Scholar] [CrossRef] [PubMed]
- Mariottoni, E.B.; Datta, S.; Shigueoka, L.S.; Jammal, A.A.; Tavares, I.M.; Henao, R.; Carin, L.; Medeiros, F.A. Deep Learning-Assisted Detection of Glaucoma Progression in Spectral-Domain OCT. Ophthalmol. Glaucoma 2023, 6, 228–238. [Google Scholar] [CrossRef] [PubMed]
- Medeiros, F.A.; Jammal, A.A.; Mariottoni, E.B. Detection of Progressive Glaucomatous Optic Nerve Damage on Fundus Photographs with Deep Learning. Ophthalmology 2021, 128, 383–392. [Google Scholar] [CrossRef]
- Murtagh, P.; Greene, G.; O’Brien, C. Current applications of machine learning in the screening and diagnosis of glaucoma: A systematic review and Meta-analysis. Int. J. Ophthalmol. 2020, 13, 149–162. [Google Scholar] [CrossRef]
- Ran, A.R.; Tham, C.C.; Chan, P.P.; Cheng, C.Y.; Tham, Y.C.; Rim, T.H.; Cheung, C.Y. Deep learning in glaucoma with optical coherence tomography: A review. Eye 2021, 35, 188–201. [Google Scholar] [CrossRef]
- Mirzania, D.; Thompson, A.C.; Muir, K.W. Applications of deep learning in detection of glaucoma: A systematic review. Eur. J. Ophthalmol. 2021, 31, 1618–1642. [Google Scholar] [CrossRef]
- Thakur, S.; Dinh, L.L.; Lavanya, R.; Quek, T.C.; Liu, Y.; Cheng, C.Y. Use of artificial intelligence in forecasting glaucoma progression. Taiwan J. Ophthalmol. 2023, 13, 168–183. [Google Scholar] [CrossRef] [PubMed]
- Cascarano, A.; Mur-Petit, J.; Hernández-González, J.; Camacho, M.; de Toro Eadie, N.; Gkontra, P.; Chadeau-Hyam, M.; Vitrià, J.; Lekadir, K. Machine and deep learning for longitudinal biomedical data: A review of methods and applications. Artif. Intell. Rev. 2023, 56 (Suppl. S2), 1711–1771. [Google Scholar] [CrossRef]
- Thakur, A.; Goldbaum, M.; Yousefi, S. Predicting Glaucoma before Onset Using Deep Learning. Ophthalmol. Glaucoma 2020, 3, 262–268. [Google Scholar] [CrossRef] [PubMed]
- Zhang, L.; Tang, L.; Xia, M.; Cao, G. The application of artificial intelligence in glaucoma diagnosis and prediction. Front. Cell Dev. Biol. 2023, 11, 1173094. [Google Scholar] [CrossRef] [PubMed]
- Lee, J.; Kim, Y.W.; Ha, A.; Kim, Y.K.; Park, K.H.; Choi, H.J.; Jeoung, J.W. Estimating visual field loss from monoscopic optic disc photography using deep learning model. Sci. Rep. 2020, 10, 21052. [Google Scholar] [CrossRef] [PubMed]
- Xu, L.; Asaoka, R.; Kiwaki, T.; Murata, H.; Fujino, Y.; Matsuura, M.; Hashimoto, Y.; Asano, S.; Miki, A.; Mori, K.; et al. Predicting the Glaucomatous Central 10-Degree Visual Field From Optical Coherence Tomography Using Deep Learning and Tensor Regression. Am. J. Ophthalmol. 2020, 218, 304–313. [Google Scholar] [CrossRef]
- Shin, J.; Kim, S.; Kim, J.; Park, K. Visual Field Inference From Optical Coherence Tomography Using Deep Learning Algorithms: A Comparison Between Devices. Transl. Vis. Sci. Technol. 2021, 10, 4. [Google Scholar] [CrossRef] [PubMed]
- Kim, D.; Seo, S.B.; Park, S.J.; Cho, H.K. Deep learning visual field global index prediction with optical coherence tomography parameters in glaucoma patients. Sci. Rep. 2023, 13, 18304. [Google Scholar] [CrossRef] [PubMed]
- Vinciguerra, R.; Rehman, S.; Vallabh, N.A.; Batterbury, M.; Czanner, G.; Choudhary, A.; Cheeseman, R.; Elsheikh, A.; Willoughby, C.E. Corneal biomechanics and biomechanically corrected intraocular pressure in primary open-angle glaucoma, ocular hypertension and controls. Br. J. Ophthalmol. 2020, 104, 121–126. [Google Scholar] [CrossRef]
- Kaushik, S.; Pandav, S.S.; Banger, A.; Aggarwal, K.; Gupta, A. Relationship between corneal biomechanical properties, central corneal thickness, and intraocular pressure across the spectrum of glaucoma. Am. J. Ophthalmol. 2012, 153, 840–849.e2. [Google Scholar] [CrossRef]
- Wang, S.; He, X.; Jian, Z.; Li, J.; Xu, C.; Chen, Y.; Liu, Y.; Chen, H.; Huang, C.; Hu, J.; et al. Advances and prospects of multi-modal ophthalmic artificial intelligence based on deep learning: A review. Eye Vis. 2024, 11, 38. [Google Scholar] [CrossRef]
- Kihara, Y.; Montesano, G.; Chen, A.; Amerasinghe, N.; Dimitriou, C.; Jacob, A.; Chabi, A.; Crabb, D.P.; Lee, A.Y. Policy-Driven, Multimodal Deep Learning for Predicting Visual Fields from the Optic Disc and OCT Imaging. Ophthalmology 2022, 129, 781–791. [Google Scholar] [CrossRef] [PubMed]
- Beam, A.L.; Kohane, I.S. Big Data and Machine Learning in Health Care. JAMA 2018, 319, 1317–1318. [Google Scholar] [CrossRef]
- Van Calster, B.; Steyerberg, E.W.; Wynants, L.; van Smeden, M. There is no such thing as a validated prediction model. BMC Med. 2023, 21, 70. [Google Scholar] [CrossRef] [PubMed]
- Montesinos López, O.A.; Montesinos López, A.; Crossa, J. Chapter 4, Overfitting, Model Tuning, and Evaluation of Prediction Performance. In Multivariate Statistical Machine Learning Methods for Genomic Prediction; Springer: Cham, Switzerland, 2022. [Google Scholar]
- Arora, A.; Alderman, J.E.; Palmer, J.; Ganapathi, S.; Laws, E.; McCradden, M.D.; Oakden-Rayner, L.; Pfohl, S.R.; Ghassemi, M.; McKay, F.; et al. The value of standards for health datasets in artificial intelligence-based applications. Nat. Med. 2023, 29, 2929–2938. [Google Scholar] [CrossRef]
- Sihota, R.; Sidhu, T.; Dada, T. The role of clinical examination of the optic nerve head in glaucoma today. Curr. Opin. Ophthalmol. 2021, 32, 83–91. [Google Scholar] [CrossRef] [PubMed]
- Michelessi, M.; Lucenteforte, E.; Oddone, F.; Brazzelli, M.; Parravano, M.; Franchi, S.; Ng, S.M.; Virgili, G. Optic nerve head and fibre layer imaging for diagnosing glaucoma. Cochrane Database Syst. Rev. 2015, 2015, CD008803. [Google Scholar] [CrossRef] [PubMed]
- Li, Z.; Wang, L.; Wu, X.; Jiang, J.; Qiang, W.; Xie, H.; Zhou, H.; Wu, S.; Shao, Y.; Chen, W. Artificial intelligence in ophthalmology: The path to the real-world clinic. Cell Rep. Med. 2023, 4, 101095. [Google Scholar] [CrossRef] [PubMed]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).