Systematic Review

Diagnostic Accuracy of Artificial Intelligence in Predicting Anti-VEGF Treatment Response in Diabetic Macular Edema: A Systematic Review and Meta-Analysis

by Faisal A. Al-Harbi 1,*, Mohanad A. Alkuwaiti 2, Meshari A. Alharbi 1, Ahmed A. Alessa 3, Ajwan A. Alhassan 4, Elan A. Aleidan 1, Fatimah Y. Al-Theyab 1, Mohammed Alfalah 4, Sajjad M. AlHaddad 5 and Ahmed Y. Azzam 6,7

1 College of Medicine, Qassim University, Buraydah 51452, Qassim, Saudi Arabia
2 College of Medicine, Imam Abdulrahman Bin Faisal University, Dammam 31441, Ash-Sharqīyah, Saudi Arabia
3 College of Medicine, King Abdulaziz University, Jeddah 22254, Mecca, Saudi Arabia
4 College of Medicine, King Faisal University, Hofuf 31982, Alhssa, Saudi Arabia
5 Diabetes Fellow, Department of Endocrinology and Diabetes, King Fahad Medical City (KFMC), Riyadh 12231, Riyadh, Saudi Arabia
6 Department of the Clinical Research and Clinical Artificial Intelligence, ASIDE Healthcare, Lewes, DE 19958, USA
7 Division of Global Health and Public Health, School of Nursing, Midwifery and Public Health, University of Suffolk, Ipswich IP4 1QJ, Suffolk, UK
* Author to whom correspondence should be addressed.
J. Clin. Med. 2025, 14(22), 8177; https://doi.org/10.3390/jcm14228177
Submission received: 8 September 2025 / Revised: 10 October 2025 / Accepted: 14 October 2025 / Published: 18 November 2025
(This article belongs to the Section Ophthalmology)

Abstract

Background/Objectives: Diabetic macular edema (DME) is a leading cause of vision loss in diabetic patients, with anti-vascular endothelial growth factor (anti-VEGF) therapy being the standard management. However, treatment response varies significantly among patients, necessitating predictive tools. This systematic review and meta-analysis evaluated the diagnostic accuracy of artificial intelligence (AI) models in predicting anti-VEGF treatment response in DME patients. Methods: We conducted a systematic literature search following PRISMA 2020 guidelines, covering the PubMed, Web of Science, Embase, Scopus, and Cochrane Library databases from inception up to 30 September 2025. Studies evaluating AI-based prediction models for anti-VEGF response in DME patients were included. The primary outcomes were sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). A bivariate random-effects meta-analysis was performed using available diagnostic accuracy data. Results: From 3107 participants across 18 studies, six studies with 427 participants provided complete diagnostic accuracy data for meta-analysis. The pooled sensitivity was 86.4% (95% CI: 82.1–90.1%) and the specificity was 77.6% (95% CI: 72.8–82.0%). The summary AUC was 0.89 with a diagnostic odds ratio of 22.0 (95% CI: 12.8–37.9). AI models demonstrated superior performance compared to other methods in 87.5% of comparative studies. Moderate heterogeneity was observed (I² = 45.2%). Conclusions: AI models demonstrate good diagnostic accuracy for predicting anti-VEGF treatment response in DME patients, with a promising role in enabling personalized management strategies and improved outcomes.

1. Introduction

Diabetic macular edema (DME) is a microvascular complication of diabetes mellitus and represents one of the most common causes of visual impairment and blindness in working-age adults worldwide. DME affects around 7.5% of patients with diabetes and is characterized by fluid accumulation in the macular region due to the breakdown of the blood–retinal barrier, leading to central vision loss and significant functional disability [1,2,3].
Anti-vascular endothelial growth factor (anti-VEGF) therapy has improved DME management and is currently the first-line treatment according to several guidelines. Ranibizumab, aflibercept, and bevacizumab have demonstrated significant efficacy in improving visual acuity and reducing central macular thickness. However, data from previous studies reveal significant heterogeneity in treatment response, with around 30% to 40% of patients showing suboptimal responses to anti-VEGF therapy. This variability presents significant challenges for treatment planning and patient counseling, while also imposing an economic burden on healthcare systems due to the high cost of anti-VEGF medications and frequent monitoring requirements [4,5,6,7,8,9,10,11].
The ability to predict treatment response before initiating therapy would represent a significant advancement in DME management, allowing for more personalized treatment strategies, optimizing resource allocation, and improving patient outcomes. The currently utilized predictors for management, including baseline visual acuity, central macular thickness, and demographic factors, have shown limited predictive accuracy and insufficient reliability in several patients [2,12,13,14,15,16,17].
Artificial intelligence (AI), including machine learning and deep learning approaches, has emerged as a promising technical solution for predictive tasks, offering the potential to identify complex patterns in multimodal data that may not be apparent to human observers. In ophthalmology, AI has demonstrated significant success in various diagnostic and prognostic applications, including diabetic retinopathy screening, glaucoma detection, and age-related macular degeneration classification. Recent studies have begun investigating AI applications for predicting anti-VEGF treatment response in DME, utilizing various data inputs including optical coherence tomography (OCT) images, fundus photographs, and additional clinical parameters [18,19,20,21,22,23].
Despite growing interest in this field, the diagnostic accuracy and clinical utility of AI-based prediction models for anti-VEGF response in DME remain heterogeneous. Previous studies have reported varying methodologies, different outcome definitions, and various performance metrics, making it difficult to synthesize the overall evidence and assess the potential for clinical implementation [14,17,24].
Therefore, we conducted a systematic review and meta-analysis to evaluate the diagnostic accuracy of AI models in predicting anti-VEGF treatment response in DME patients, assess the quality of the available evidence, and formulate recommendations for practice and directions for future research.

2. Methods

2.1. Study Design and Registration

This systematic review and meta-analysis was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines [25]. The study protocol was developed a priori and registered with PROSPERO using the following identification code: CRD420251054631.

2.2. Search Strategy

A literature search was conducted across multiple electronic databases, including PubMed, Web of Science, Embase, Scopus, and the Cochrane Library, from inception up to 30 September 2025. The search strategy included the following terms and keywords: (“Diabetic Macular Edema” OR “Diabetic Maculopathy” OR “DME”) AND (“Artificial Intelligence” OR “Machine Learning” OR “Deep Learning” OR “Neural Network*”) AND (“Anti-VEGF” OR “Vascular Endothelial Growth Factor” OR “Ranibizumab” OR “Aflibercept” OR “Bevacizumab”). Reference lists of included studies and relevant review articles were manually screened to identify any additional eligible studies. No language restrictions were applied during the initial search; however, only English-language studies were included.

2.3. Study Selection Criteria

Studies were included if they met the following criteria: participants diagnosed with DME receiving anti-VEGF therapy; utilization of AI-based models for predicting treatment response; reporting of diagnostic accuracy metrics including sensitivity, specificity, positive predictive value, negative predictive value, or area under the receiver operating characteristic curve (AUC); use of imaging data such as OCT or fundus photography, clinical variables, or multimodal data as AI model inputs; and study designs including randomized controlled trials (RCTs), prospective cohort studies, or retrospective cohort studies. Studies were excluded if they included non-human subjects; focused on retinal diseases other than DME without DME-specific data; used non-AI-based prediction models; were case reports, editorials, reviews, or conference abstracts without full-text availability; or lacked sufficient extractable data for diagnostic accuracy assessment.

2.4. Data Extraction and Management

The extracted data included study characteristics such as author information, publication year, country, study design, and sample size; participant demographics including age, gender, and baseline characteristics; intervention details including AI model type, input data modality, and training methodology; anti-VEGF agent specifications and dosing regimens; outcome measures and response definitions; diagnostic accuracy metrics; and follow-up duration.

2.5. Quality Assessment

A risk of bias assessment was conducted using the Prediction model Risk Of Bias ASsessment Tool for AI (PROBAST-AI) framework, which is specifically designed for evaluating AI-based prediction models. PROBAST-AI assesses four key domains: (1) participants and data sources, evaluating patient selection, data source representativeness, and inclusion/exclusion criteria; (2) predictors, assessing predictor definition, measurement consistency, and timing relative to outcome; (3) outcome, evaluating outcome definition, measurement methods, blinding, and timing of assessment; and (4) analysis, evaluating sample size adequacy, handling of missing data, model complexity relative to sample size, risk of overfitting, validation strategy, and potential data leakage. Each study was rated as having a low, unclear, or high risk of bias for each domain, with an overall risk determination based on the highest domain-specific risk. Special attention was given to AI-specific concerns including both-eyes inclusion without statistical adjustment for clustering, small test sets relative to model parameters, absence of external validation, data-driven outcome optimization, and implausibly high performance metrics suggesting overfitting or methodological issues.

2.6. Statistical Analysis

A meta-analysis was performed using a bivariate random-effects model to account for the correlation between sensitivity and specificity while accommodating between-study heterogeneity. The primary outcomes included pooled sensitivity and specificity with 95% confidence intervals (CIs), summary receiver operating characteristic (SROC) curves with the corresponding AUC, diagnostic odds ratios (ORs), and positive and negative likelihood ratios. Heterogeneity was assessed using the I² statistic and Cochran’s Q test, with values exceeding 50% considered indicative of significant heterogeneity. Subgroup analyses were conducted based on AI model type, input data modality, study quality, and follow-up duration. Publication bias was assessed using visual inspection of funnel plots for asymmetry, Egger’s regression test, and trim-and-fill methods. Meta-regression was performed to explore sources of heterogeneity and identify factors associated with diagnostic performance. Statistical analyses were conducted in RStudio with R version 4.4.2, with statistical significance set at a p-value less than 0.05.
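The summary metrics above are algebraically linked to the pooled sensitivity and specificity. A minimal illustrative sketch (Python rather than the authors' R workflow; the input values are the pooled estimates reported later in this review) shows how the likelihood ratios and diagnostic odds ratio are derived:

```python
# Illustrative sketch, not the authors' analysis code: how the summary
# diagnostic metrics follow from pooled sensitivity and specificity.

def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios from sensitivity/specificity."""
    lr_pos = sens / (1.0 - spec)   # how much a positive prediction raises the odds
    lr_neg = (1.0 - sens) / spec   # how much a negative prediction lowers the odds
    return lr_pos, lr_neg

def diagnostic_odds_ratio(sens: float, spec: float) -> float:
    """DOR = LR+ / LR-, a single global summary of test performance."""
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    return lr_pos / lr_neg

# Pooled estimates reported in this meta-analysis (86.4% / 77.6%).
sens, spec = 0.864, 0.776
lr_pos, lr_neg = likelihood_ratios(sens, spec)
dor = diagnostic_odds_ratio(sens, spec)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
# → LR+ = 3.86, LR- = 0.18, DOR = 22.0
```

These derived values match the pooled results reported in Section 3.4, which serves as a useful internal consistency check when extracting accuracy data from primary studies.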

3. Results

3.1. Literature Search and Study Selection

After removing duplicates and applying inclusion and exclusion criteria through title, abstract, and full-text screening, 18 studies met the eligibility criteria for qualitative synthesis, with six studies providing complete diagnostic accuracy data suitable for quantitative synthesis (Figure 1). The selected studies included a total of 3107 participants, with sample sizes ranging from 12 to 712 participants across individual studies.

3.2. Study Characteristics and Population Demographics

The characteristics of the included studies are summarized in Table 1. The 18 studies were published between 2020 and 2025, with the majority being retrospective cohort studies (n = 16, 89%), one RCT, and one cross-sectional study. Most of the included studies were from Asia (n = 11, 61%), followed by North America (n = 4, 22%), the Middle East (n = 2, 11%), and multi-regional collaborations (n = 2, 11%). The mean age of participants ranged from 54 to 63 years across studies where reported, with a balanced gender distribution. Various AI model types were utilized, including deep learning approaches such as convolutional neural networks (n = 8, 44%), other machine learning methods (n = 6, 33%), and hybrid approaches combining multiple techniques (n = 4, 22%). Input data modalities varied, with OCT-only approaches being most common (n = 10, 56%), followed by multimodal approaches combining OCT with clinical data (n = 5, 28%) and radiomics-based methods (n = 3, 17%).

3.3. Treatment Protocols and Individual Study Performance

The treatment protocols and response definitions varied across studies, as detailed in Table 2. Anti-VEGF agents included ranibizumab, aflibercept, bevacizumab, brolucizumab, and conbercept, with dosing regimens ranging from single injections to monthly protocols and treat-and-extend approaches. Response definitions included various criteria such as central macular thickness reduction thresholds (ranging from >10 μm to >50 μm), visual acuity improvements (≥5 ETDRS letters or >0.1 LogMAR), and composite outcomes combining anatomical and functional measures. Among the six studies providing complete diagnostic accuracy data, the individual study sensitivity ranged from 78.9% to 100.0%, while the specificity ranged from 68.8% to 92.6%. The AUC, when reported, varied from 0.810 to 0.998.

3.4. Pooled Diagnostic Accuracy and Subgroup Analyses

A meta-analysis of six studies including a total of 427 participants revealed strong diagnostic performance, as presented in Table 3. The pooled sensitivity was 86.4% (95% CI: 82.1–90.1%) and the pooled specificity was 77.6% (95% CI: 72.8–82.0%). The AUC was 0.89, with a diagnostic odds ratio of 22.0 (95% CI: 12.8–37.9). The positive likelihood ratio was 3.86 (95% CI: 2.95–5.07) and the negative likelihood ratio was 0.18 (95% CI: 0.13–0.24). Subgroup analyses revealed significant differences in performance based on AI model type (p = 0.012), with hybrid deep learning approaches achieving 100.0% sensitivity and 75.0% specificity, followed by machine learning methods (90.7% sensitivity, 80.4% specificity) and pure deep learning approaches (81.8% sensitivity, 76.8% specificity). Input data modality showed trends toward superior performance with multimodal approaches (94.1% sensitivity, 76.5% specificity) compared to OCT-only methods (84.5% sensitivity, 79.6% specificity); however, this difference was not statistically significant (p-value = 0.224). Follow-up duration significantly affected performance, with studies having follow-up periods longer than three months demonstrating higher sensitivity (94.1%) compared to shorter follow-up studies.
Forest plots of sensitivity and specificity are illustrated in Figure 2. Individual study estimates showed significant variation, with 95% CIs reflecting sample size differences across studies. The pooled estimates demonstrated good precision, with relatively narrow 95% CIs supporting the reliability of the findings. Visual inspection revealed that most individual studies clustered around the pooled estimate, with Mondal et al. (2025) [16] showing the highest sensitivity at 100%, but with a wider 95% CI due to the smaller sample size, while Magrath et al. (2025) [28] showed more conservative estimates with a tighter 95% CI reflecting the larger sample size.

3.5. Summary ROC

The summary ROC curve is presented in Figure 3, demonstrating excellent discriminative ability with an AUC of 0.89 (95% CI: 0.84–0.93). The summary point, representing the pooled sensitivity and specificity estimates, was positioned in the upper left quadrant of the ROC space, indicating good diagnostic performance. The bivariate model indicated a moderate correlation between sensitivity and specificity (ρ = 0.34), justifying the use of the bivariate random-effects approach.
Figure 4 presents the bivariate performance assessment, showing the relationship between sensitivity and specificity across different AI model types and sample sizes. Studies were distributed across performance zones, with three studies achieving excellent performance (upper left quadrant) and three studies in the good performance zone. The assessment revealed that larger studies tended to provide more conservative estimates, while smaller studies showed greater variability. Deep learning approaches were represented across all performance zones, while machine learning and hybrid approaches showed more concentrated performance patterns. The mean distance from the origin was 0.65, indicating generally good diagnostic performance across the included studies.

3.6. Heterogeneity Assessment and Meta-Regression

The heterogeneity assessment and meta-regression results are detailed in Table 4 and visualized in Supplementary Figure S1. The meta-regression model identified several significant predictors of diagnostic performance variability. Outcome definition type emerged as a significant predictor (p-value = 0.012), with composite outcomes achieving higher sensitivity (100.0%) compared to anatomical-only definitions (83.9%). Follow-up duration also significantly impacted performance (p-value = 0.045), with studies using longer follow-up periods (>three months) demonstrating superior sensitivity (94.1%) compared to shorter durations (≤one month: 78.9%). Input data modality showed a trend toward improved performance with multimodal approaches (94.1% sensitivity) versus OCT-only methods (84.5% sensitivity), though this difference did not reach statistical significance (p-value = 0.224). Sample size showed a weak positive association (β = 0.003, p-value = 0.128), suggesting that larger studies provide more stable estimates.
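The I² statistic referenced throughout this section follows directly from Cochran's Q. A minimal sketch (illustrative Python, not the authors' R code; the Q value below is back-calculated from the reported I² of 45.2% purely for demonstration):

```python
# Sketch of Higgins' I-squared, the heterogeneity measure used in this
# review. I^2 estimates the share of total variability attributable to
# between-study heterogeneity rather than chance.

def i_squared(q: float, df: int) -> float:
    """I^2 (%) from Cochran's Q statistic and its degrees of freedom."""
    if q <= df:
        return 0.0  # no excess heterogeneity beyond sampling error
    return 100.0 * (q - df) / q

# Six studies entered the quantitative synthesis, so df = 6 - 1 = 5.
# Q = 9.124 is back-calculated here so the output matches the reported I^2.
print(f"I^2 = {i_squared(9.124, 5):.1f}%")
# → I^2 = 45.2%
```

Values of I² below 50% are conventionally read as moderate heterogeneity, consistent with how the 45.2% figure is interpreted in this review.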

3.7. Comparative Effectiveness Analysis

The comparative effectiveness of AI approaches versus alternative methods is summarized in Table 5. Eight studies provided direct comparisons between AI models and control methods, including a total of 1107 subjects. AI approaches consistently demonstrated superior performance, with 87.5% of comparative studies favoring AI over control methods. When compared to human readers, AI models achieved 85.4% sensitivity and 84.5% specificity versus 68.2% sensitivity and 76.9% specificity for human ophthalmologists and residents (p-value < 0.05). Against other algorithmic methods, AI showed even greater advantage, with 88.8% sensitivity and 80.4% specificity compared to 82.6% sensitivity and 63.6% specificity for other approaches. The mean AUC improvement was 0.089, representing a significant improvement in diagnostic accuracy. Cost-effectiveness evaluation revealed possible cost savings of 15–30% through reduced injection frequency and improved resource utilization, with time savings ranging from 40 to 60% in image analysis and real-world practice settings.

3.8. Sensitivity Analysis and Publication Bias Assessment

Comprehensive sensitivity analyses and the publication bias evaluation are presented in Table 6. The leave-one-out assessment demonstrated the stability of the results, with pooled sensitivity estimates ranging from 85.0% to 88.5% and specificity from 75.2% to 78.6% when individual studies were sequentially excluded. Excluding studies with a high risk of bias improved the pooled estimates to 88.5% sensitivity (95% CI: 84.1–92.9%) and 78.6% specificity (95% CI: 72.1–85.1%). Studies with external validation demonstrated superior performance (91.7% sensitivity) compared to those without external validation (81.8% sensitivity, p-value = 0.034). The publication bias assessment revealed mixed findings, with Egger’s regression test suggesting possible small-study effects (p-value = 0.045), while Begg’s rank correlation showed no significant bias (p-value = 0.280). The trim-and-fill adjustment indicated minimal impact of potential publication bias, with adjusted estimates showing only marginal changes (sensitivity: 85.1%, specificity: 76.8%). The fail-safe N analysis suggested that 15 additional negative studies would be required to nullify the observed effect, indicating robust evidence.

3.9. Risk of Bias, Evidence Quality Assessment, and Assessment of Clustering

A risk of bias assessment of the included studies was conducted using PROBAST-AI and is detailed in Supplementary Table S1. Among the 18 included studies, 2 (11.1%) were classified as having a low risk of bias, both utilizing large, well-conducted datasets with solid methodologies (Cao et al. 2020, Roberts et al. 2020) [40,42]. Nine studies (50.0%) were assessed as having an unclear risk of bias, mainly due to the insufficient reporting of patient selection methods, validation approaches, or a possible underlying risk of data leakage concerns. Seven studies (38.9%) were classified as having a high risk of bias due to small sample sizes relative to model complexity, significantly high performance metrics suggesting overfitting (e.g., AUC > 0.99), lack of statistical adjustment for clustering when both eyes were included, or outcome definitions derived through data-driven optimization rather than pre-specified clinical criteria. Common methodological concerns across studies included the absence of external validation (47% of studies), both-eyes inclusion without clustering adjustment (39% of studies), and small test sets relative to model parameters (33% of studies). The GRADE evidence quality assessment for diagnostic test accuracy is presented in Supplementary Table S2. The overall quality of evidence was rated as moderate (⊕⊕⊕○), with downgrades for risk of bias (−1), inconsistency (−1), and publication bias (−1), but upgrades for large effect size (+1) and dose–response gradient (+1).
The assessment of the clustering effects from within-patient correlation is detailed in Supplementary Table S3. Among the 18 included studies, 7 (38.9%) included both eyes from individual patients without statistical adjustment for clustering, introducing a possible risk of bias through inflated precision and underestimated standard errors. The calculated design effect ranged from 1.38 to 1.89 for studies where sufficient data allowed for estimation, indicating a moderate to significant correlation between paired eyes. Four studies (22.2%) included only one eye per patient, eliminating clustering concerns. For the remaining seven studies (38.9%), clustering status was unclear due to insufficient reporting. Among the six studies included in the meta-analysis, three (50%) had confirmed or probable clustering without adjustment (Magrath et al., Baek et al., Meng et al.) [28,31,34], while two studies (33.3%) had no clustering concerns (Mondal et al., Song et al.) [16,29]. The clustering status for two meta-analysis studies (Cao et al., Rasti et al.) [40,41] remained unclear.
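The design effects quoted above follow the standard formula for clustered observations. A hedged sketch (illustrative Python; the intereye correlation values are assumptions chosen to reproduce the reported 1.38–1.89 range, not values extracted from the primary studies):

```python
# Sketch of the Kish design effect for paired-eye (clustered) data.
# With m eyes per patient and intereye correlation rho:
#   DEFF = 1 + (m - 1) * rho
# so the effective (independent) sample size is n / DEFF.

def design_effect(eyes_per_patient: int, intereye_corr: float) -> float:
    """Variance inflation from including correlated eyes of one patient."""
    return 1.0 + (eyes_per_patient - 1) * intereye_corr

def effective_n(n_eyes: int, deff: float) -> float:
    """Number of statistically independent observations after clustering."""
    return n_eyes / deff

# Illustrative intereye correlations implied by the reported DEFF range
# (with two eyes per patient, DEFF = 1 + rho).
for rho in (0.38, 0.89):
    deff = design_effect(2, rho)
    print(f"rho={rho}: DEFF={deff:.2f}, "
          f"100 eyes ~= {effective_n(100, deff):.0f} independent observations")
```

The sketch makes the practical point concrete: at the upper end of the reported range, 100 eyes contribute roughly the information of only about 53 independent observations, which is why unadjusted both-eyes inclusion inflates precision.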

3.10. Publication Bias Assessment

Funnel plots assessing publication bias are shown in Figure 5. The plots displayed some asymmetry, especially for sensitivity, suggesting possible small-study effects or publication bias. However, the trim-and-fill adjustment reflected minimal impact on the pooled estimates, with only two possibly missing studies identified. The adjusted estimates showed marginal changes from the original effect estimates, supporting the robustness of the findings despite the possible risk of publication bias.

3.11. Clinical Utility and Implementation Readiness

Clinical utility and implementation readiness are illustrated in Figure 6. The implementation readiness matrix revealed that several studies achieved high clinical performance with varying degrees of implementation readiness. The economic impact assessment demonstrated significant possible benefits, including cost reductions of 15–30% and time savings of 40–60% across various clinical processes. However, implementation barriers were identified, with technical integration challenges reported in 67% of studies, staff training requirements in 45%, regulatory compliance concerns in 33%, and initial investment costs in 28% of studies. The geographic distribution showed a predominance of Asian studies (58%), with 53% of studies having completed external validation. The implementation timeline indicated current positioning in Phase III (implementation/early adoption phase, 2025–2026), with progression toward widespread adoption (Phase IV) expected by 2027–2029. Implementation readiness indicators showed high evidence quality in 87.5% of studies, external validation completion in 53%, and multi-center trials in 42%, but no regulatory approvals are currently pending. The decision curve analysis (Figure 7) demonstrates the clinical utility of AI-based prediction models compared to treat-all and treat-none strategies across varying threshold probabilities. The AI model provides net benefit over the treat-all strategy when the threshold probability is below 62%, indicating that AI-guided decision-making is clinically superior for patients with a moderate-to-high likelihood of treatment response. At threshold probabilities above 62%, the treat-all strategy becomes preferable.
This analysis, based on pooled diagnostic accuracy estimates (sensitivity 86.4%, specificity 77.6%) and an assumed DME treatment response prevalence of 40%, supports the clinical utility of AI models for personalized treatment planning in real-world settings where resource optimization is critical.
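The net benefit computation underlying a decision curve analysis can be sketched under the same stated assumptions (pooled sensitivity 86.4%, specificity 77.6%, 40% response prevalence). This is an illustrative Python computation at a single example threshold, not a reproduction of the full Figure 7 analysis or its 62% crossover, which depends on the complete model:

```python
# Hedged sketch of decision-curve net benefit. At threshold probability pt,
# a false positive is weighted by the odds pt / (1 - pt).

def net_benefit_model(sens: float, spec: float, prev: float, pt: float) -> float:
    """Net benefit per patient of acting on the model's prediction."""
    w = pt / (1.0 - pt)                    # harm weight for false positives
    tp_rate = sens * prev                  # true positives per patient
    fp_rate = (1.0 - spec) * (1.0 - prev)  # false positives per patient
    return tp_rate - fp_rate * w

def net_benefit_treat_all(prev: float, pt: float) -> float:
    """Net benefit of treating everyone regardless of prediction."""
    w = pt / (1.0 - pt)
    return prev - (1.0 - prev) * w

# Assumed inputs: pooled accuracy from this review, 40% prevalence,
# and an example threshold probability of 30%.
sens, spec, prev, pt = 0.864, 0.776, 0.40, 0.30
nb_model = net_benefit_model(sens, spec, prev, pt)
nb_all = net_benefit_treat_all(prev, pt)
print(f"pt=30%: model NB={nb_model:.3f}, treat-all NB={nb_all:.3f}")
# → pt=30%: model NB=0.288, treat-all NB=0.143
```

At this example threshold the model's net benefit exceeds both the treat-all and the treat-none (net benefit zero) strategies, illustrating the region in which AI-guided decisions add clinical value.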

4. Discussion

DME is one of the leading causes of vision impairment among individuals with diabetic retinopathy and represents a growing global health concern. DME is characterized by the accumulation of fluid in the macula and can lead to significant visual loss if left untreated. A central mediator of this process is VEGF, which contributes to the breakdown of the blood–retinal barrier and promotes leakage from retinal capillaries [26]. Anti-VEGF agents, such as ranibizumab, are now widely used as standard management for DME. While many patients benefit from these treatments, others experience little to no improvement despite multiple injections and regular follow-up. This unpredictability can be frustrating for both patients and physicians, highlighting the need for effective tools that can anticipate how individuals will respond to therapy, ideally before treatment begins [27].
Recent advances in AI, including deep learning and machine learning, have shown promise in analyzing complex imaging data such as OCT scans [43]. These algorithms offer a promising means of predicting individual responses to anti-VEGF therapy, allowing for more personalized and cost-effective treatment plans [28]. Our study aimed to assess the diagnostic accuracy of AI models in predicting the response to anti-VEGF in patients with DME. Our results indicate that AI models demonstrate strong diagnostic performance, with high sensitivity and moderate specificity. The overall AUC reflected excellent discriminative ability, highlighting the strong predictive capacity of these models. We also found that AI models outperformed other prediction methods, including some human experts, in most direct comparisons, suggesting that AI could improve diagnostic accuracy beyond existing methods. Our subgroup analyses revealed that hybrid deep learning techniques and multimodal input data, combining imaging with clinical features, were associated with improved sensitivity. In addition, studies with longer follow-up periods tended to report better predictive accuracy. However, some heterogeneity existed among the included studies: factors such as study quality, model complexity, and variations in input data contributed to these differences. Several of the included studies corroborate and extend our meta-analytic findings.
For example, Chen et al. [44] developed multilayer perceptron (MLP) models to forecast visual acuity (VA) outcomes over extended periods. Their models demonstrated strong correlations between predicted and actual VA values, with relatively low standard errors. Their findings identified baseline VA, lens status, and intravitreal injection (IVI) schedule as the most impactful predictive factors, variables also recognized in several other studies within our review. This reinforces the consistent importance of initial visual status and treatment intensity as reliable predictors of outcomes. In contrast, Xu et al. [38] utilized a novel generative adversarial network (GAN) architecture, pix2pixHD, to synthesize post-treatment OCT images from baseline scans. Their model achieved a low mean absolute error in predicting central macular thickness (CMT). Notably, most retinal specialists were unable to distinguish the synthetic OCT images from real post-treatment scans. This approach diverges from conventional prediction methods by providing morphological visualization of anticipated treatment responses, thereby opening new avenues for explanation, simulation, and patient engagement.
Zhang et al. [37] combined linear regression with random forest algorithms in an ensemble model to predict post-treatment VA with minimal error. Their findings confirmed baseline VA as the most significant predictor of absolute VA outcomes, while CMT and age were more relevant to predicting visual improvement. This distinction highlights that different features may have varied predictive power depending on the specific endpoint assessed, informing AI model development and feature selection. Further supporting the role of imaging-based AI, Rasti et al. [41] developed a deep convolutional network (CADNet) that achieved a high AUC, and Mondal et al. [16] introduced a hybrid CNN model that attained comparable AUC and accuracy, with no false negatives.
These OCT-only models demonstrate the strong discriminative capability of deep learning architectures to extract meaningful features from retinal images, making them valuable tools for identifying responders to anti-VEGF therapy. Our results highlight the promising role of AI in assisting decision-making for managing patients with DME by identifying patients who are less likely to respond well to anti-VEGF, which may allow for earlier, more targeted interventions. This approach could help reduce the burden of unnecessary treatments and improve patient outcomes. In addition, some evidence points to possible savings in both time and cost, indicating that AI might contribute to more efficient healthcare delivery. The decision curve analysis demonstrated that AI-based prediction models provide net clinical benefit compared to treat-all strategies when the threshold probability of treatment response is below 62%. This indicates that for patients with a moderate-to-high baseline likelihood of responding to anti-VEGF therapy, AI-guided decision-making offers superior clinical utility by identifying non-responders who may benefit from alternative interventions.
At higher threshold probabilities (>62%), a treat-all approach becomes preferable, suggesting that AI models are most valuable for risk stratification in intermediate-probability cases rather than for patients with a very high or very low pre-test probability of response. This framework supports selective AI deployment in clinical pathways where treatment decisions are most uncertain and where accurate prediction can meaningfully alter management. To operationalize our study’s goal of turning pooled accuracy into practical design guidance, we highlight two recent infrared thermography papers whose ideas translate directly to DME anti-VEGF modeling. Both adopt multi-task learning with explicit feature separation, training one pathway to capture stable, subject-specific signals ("who responds") and a second to model transient, time-dependent behavior ("when/how long"), while using soft-threshold/shrinkage blocks and balanced losses to suppress noise and prevent one objective from dominating. For our field, this argues for architectures that jointly classify responder status and forecast clinically meaningful temporal endpoints, such as time-to-dry macula, durability between injections, or early VA/CMT trajectory, so that outputs are directly actionable for treatment planning. Because reported gains were sensitive to dataset splits, these time-aware, disentangled models should be assessed on multi-center, prospectively labeled cohorts with standardized benchmarks and external validation [45,46].
Two recent multimodal oncology papers also offer a concrete blueprint for DME: rather than training a single-endpoint classifier, they fuse complementary signals (e.g., imaging, digital pathology, and liquid biomarkers) and optimize a survival-aware objective to generate individualized risk curves. Translating this recipe to retina, future models should late-fuse OCT or OCT Angiography (OCT-A) structure–texture features with ischemia/leakage maps, baseline clinical variables, and blood/aqueous biomarkers, and then (i) classify early responders and (ii) jointly predict durability endpoints. Architecturally, squeeze-and-excitation/attention blocks can stabilize cross-modal contributions; missing-modality gating avoids the exclusion of patients lacking one test; and calibrated outputs (Platt/isotonic) should be reported alongside AUC with time-to-event metrics (C-index, Brier). Finally, as seen in these multimodal studies, external, multi-center validation is essential to demonstrate a real advantage over strong unimodal OCT baselines and to make the predictions actionable for scheduling and regimen selection [47,48].
The integration of OCT-A with structural OCT through hybrid 3D convolutional neural network or Vision Transformer architectures represents a biologically rational approach to improving anti-VEGF response prediction in DME. Vascular parameters quantifiable by OCT-A, including macular perfusion density, vessel-length density, foveal avascular zone area and circularity, and capillary non-perfusion volume, have demonstrated significant associations with treatment response [49,50,51]. Eyes demonstrating anatomical response to anti-VEGF therapy show corresponding improvements in deep capillary plexus perfusion density and vessel architecture, with high responders showing statistically significant increases in perfusion metrics compared to non-responders [49]. The biological rationale for incorporating vascular features stems from the fundamental pathophysiology of DME, wherein retinal ischemia and capillary dropout drive compensatory VEGF upregulation; eyes with severe baseline ischemia and reduced perfusion density demonstrate attenuated anti-VEGF response, as the edema mechanism extends beyond VEGF-mediated permeability to include structural microvascular loss [50,51]. Multimodal fusion models that integrate structural features (intraretinal/subretinal fluid compartmentalization, photoreceptor integrity), vascular parameters (deep and superficial plexus perfusion, non-perfusion area volume), and clinical data (diabetes duration, HbA1c, baseline visual acuity) could more comprehensively capture the multifactorial etiology of treatment response. However, as with any multimodal approach, rigorous external validation on diverse, multi-center cohorts is essential to establish whether the added algorithmic complexity and acquisition burden of OCT-A translates to clinically meaningful predictive improvement over structural OCT-only models, particularly given that OCT-A image quality can be compromised by media opacity and severe macular edema [49,50,51].
Unfortunately, despite these advantages, integrating AI into routine, real-world practice workflows remain challenging. Major obstacles include technical aspects of system integration and the need for adequate user training to ensure proper implementation. These factors may limit how soon such technologies become a regular part of care. Despite the methodological strengths in our study, there are several limitations that should be declared. While our study explores the accuracy of AI models, most of the studies were retrospective in nature. Retrospective studies are susceptible to selection bias and may not reflect real-world practice settings. The moderate heterogeneity observed in some of the studies (I2 of 45.2% for sensitivity and 36.1% for specificity) indicates variation in population, AI module, and measured outcomes. We attempted to clarify part of the heterogeneity through meta-regression modeling.
A significant methodological limitation was the inclusion of both eyes from individual patients in seven studies (38.9%) without statistical adjustment for within-patient correlation, introducing clustering bias that inflates precision and underestimates standard errors. Among the six studies included in our meta-analysis, three had confirmed or probable clustering without adjustment (Magrath et al., Baek et al., Meng et al.) [28,31,34], with calculated design effects ranging from 1.38 to 1.89, indicating moderate correlation between paired eyes. While sensitivity analysis excluding studies with high clustering risk showed minimal impact on pooled estimates, this represents a systematic quality concern in the AI prediction literature. Future studies should either include only one randomly selected eye per patient or employ appropriate statistical methods (e.g., generalized estimating equations, mixed-effects models) to account for within-patient correlation and provide unbiased variance estimates.
An inspection of funnel plot of asymmetry and statistical testing through Egger’s test indicated a statistically significant publication bias, although the trim and fill adjustment method suggested a minimal impact on the pooled estimate. However, any publication bias highlights the importance of vigilance and transparency in the reporting of future studies to ensure unbiased evidence. Moreover, a few studies included in some subgroup analyses limit the ability to give a definitive conclusion about the dominance of specific approaches over others. An additional limitation is the absence of an individual participant data (IPD) meta-analysis, which would have enabled more precise estimation of treatment effects while accounting for participant-level covariates and within-study heterogeneity. An IPD meta-analysis could have provided more robust adjustment for clustering effects in studies including both eyes, allowed the exploration of effect modification by patient-level characteristics (e.g., diabetes duration, HbA1c levels, baseline DME severity), and enabled more sophisticated handling of missing data. However, IPD was not accessible for the included studies, necessitating reliance on aggregate study-level data. This represents a recognized limitation in the precision of our subgroup analyses and heterogeneity assessments, particularly for evaluating patient-specific predictors of AI model performance.
A critical methodological concern was identified in one study (Song et al. 2025) [29] reporting exceptionally high performance metrics (AUC = 0.9998, accuracy 99.3%), which are implausible for real-world clinical prediction tasks. Such extreme values typically indicate overfitting to training data, potential data leakage between training and test sets, or highly selective test populations that do not represent typical clinical heterogeneity. Our PROBAST-AI assessment rated this study as having a high risk of bias in the analysis domain. While these results may reflect optimization on a specific internal dataset, they are unlikely to generalize to independent patient populations and should be interpreted with extreme caution until external validation on diverse cohorts is demonstrated. This highlights the critical importance of external validation and realistic performance benchmarking in AI diagnostic studies. We should also note that the included studies ranged between 2020 and 2025, meaning that our study reflects current AI models but may not capture rapidly growing technological advances. AI utilization in ophthalmology is progressing rapidly, and new methodologies may offer superior results compared to the AI models evaluated in our study.
An additional limitation is that corresponding authors were not contacted to obtain missing or unreported data elements (marked as "NR" throughout tables), which may have limited the completeness of our data extraction and prevented more comprehensive subgroup analyses. Future systematic reviews should incorporate author contact protocols to maximize data availability and reduce reporting gaps. Future studies are warranted to validate and investigate our findings through large-scale, prospective, multi-center studies with pre-specified outcome definitions and analysis plans. Our systematic search identified limited prospective cohort studies and only one RCT, reflecting the early stage of this research field.
The predominance of retrospective designs (89% of included studies) highlights the need for prospective validation studies that eliminate the risk of data leakage, ensure temporal separation between predictor and outcome assessment, and provide unbiased estimates of real-world clinical performance. Prospective trials comparing AI-guided versus standard treatment pathways, with patient-relevant outcomes such as visual function, quality of life, and treatment burden, are essential to establish clinical utility beyond diagnostic accuracy metrics. The establishment of a multi-center, publicly annotated DME-OCT benchmark dataset with standardized outcome definitions would significantly advance the field by enabling direct performance comparison across AI algorithms while minimizing device-specific and population-specific biases. Such initiatives, analogous to benchmark datasets in diabetic retinopathy screening (e.g., EyePACS, Messidor), could provide validated testing frameworks that accelerate clinical translation. A standardized benchmark should include various patient populations, multiple OCT device manufacturers, various anti-VEGF agents and dosing protocols, and consensus-defined response criteria to ensure ecological validity. This would allow researchers to test models on identical holdout sets, facilitating transparent comparison and identifying truly generalizable architectures suitable for real-world implementation. The integration of OCT-A data through hybrid 3D convolutional neural network or Vision Transformer architectures represents a promising avenue for improving predictive accuracy beyond structural OCT parameters alone. Vascular features such as foveal avascular zone area, macular vessel density, non-perfusion region volume, and capillary dropout patterns may provide complementary prognostic information about treatment response, as impaired retinal perfusion has been associated with suboptimal anti-VEGF outcomes.
Multimodal fusion approaches that integrate structural OCT (edema, fluid compartments), vascular OCT-A (perfusion metrics), and clinical data (diabetes duration, HbA1c, baseline visual acuity) could capture the multifactorial nature of treatment response more comprehensively. However, such approaches require careful validation to ensure that added complexity translates to clinically meaningful improvement rather than overfitting to training data. Prospective studies comparing OCT-only versus multimodal fusion models on independent validation cohorts are needed to establish the incremental value of OCT-A integration.

5. Conclusions

The overall findings of this systematic review and meta-analysis demonstrate that AI models have strong diagnostic accuracy for predicting anti-VEGF treatment response for DME patients, with a pooled sensitivity of 86.4% and specificity of 77.6%, resulting in an excellent summary AUC of 0.89. In 87.5% of the comparative studies, AI was consistently superior to all other prediction methods, with hybrid deep learning approaches and multimodal integration processes having superior performance results. These results indicate that AI-based prediction tools have a promising role in improving real-world practice by predicting which patients are least likely to respond poorly to anti-VEGF. Reducing the number of poor-responding patients to anti-VEGF could significantly reduce costs to the healthcare system by 15% to 30%. It could also improve patient outcomes, such as better visual recovery, by allowing for better personalized management strategies.
Despite the positive findings, several limitations need to be considered before it can be adopted for use in routine practice. First, most of the studies included were retrospective in nature, totaling around 89% of the studies, and had a moderate level of evidence quality, and the evidence was limited by significant publication bias necessitating a larger-scale prospective validation study. In addition, the barriers to implementation, including technical integration issues, staff training, and regulatory approval, limit real-world use. Future studies in AI outcomes should focus on multi-center prospective studies with a standardized definition of outcomes and improve implementation barriers so that AI tools can be seamlessly integrated into routine practice and achieve the potential to reinterpret the personalized management of DME.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/doi/s1/. Figure S1: Meta-Regression Model for Sources of Heterogeneity in Diagnostic Accuracy. Table S1: Detailed PROBAST-AI Risk of Bias Assessment. Table S2: GRADE Evidence Quality Assessment for Diagnostic Test Accuracy. Table S3: Assessment of Clustering Effects and Within-Patient Correlation in Included Studies.

Author Contributions

F.A.A.-H.: Conceptualization, Methodology, Project administration, Writing—review and editing; M.A.A. (Mohanad A. Alkuwaiti): Investigation, Data curation, Writing—original draft; M.A.A. (Meshari A. Alharbi): Formal analysis, Visualization, Software; A.A.A. (Ahmed A. Alessa): Validation, Resources, Writing—review and editing; A.A.A. (Ajwan A. Alhassan): Methodology, Investigation, Data curation; E.A.A.: Software, Formal analysis, Visualization; F.Y.A.-T.: Data curation, Investigation, Writing—original draft; M.A.: Resources, Validation, Funding acquisition; S.M.A.: Supervision, Conceptualization, Writing—review and editing; A.Y.A.: Supervision, Funding acquisition, Project administration. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. KFU253194].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data generated or analyzed during this study are included in this published article.

Conflicts of Interest

The authors declare no competing interests.

References

  1. Varma, R.; Bressler, N.M.; Doan, Q.V.; Gleeson, M.; Danese, M.; Bower, J.K.; Selvin, E.; Dolan, C.; Fine, J.; Colman, S. Prevalence of and risk factors for diabetic macular edema in the United States. JAMA Ophthalmol. 2014, 132, 1334–1340. [Google Scholar] [CrossRef] [PubMed]
  2. Sakini, A.S.A.; Hamid, A.K.; Alkhuzaie, Z.A.; Al-Aish, S.T.; Al-Zubaidi, S.; Tayem, A.A.E.; Alobi, M.A.; Sakini, A.S.A.; Al-Aish, R.T.; Al-Shami, K. Diabetic macular edema (DME): Dissecting pathogenesis, prognostication, diagnostic modalities along with current and futuristic therapeutic insights. Int. J. Retin. Vitr. 2024, 10, 83. [Google Scholar] [CrossRef]
  3. Lee, R.; Wong, T.Y.; Sabanayagam, C. Epidemiology of diabetic retinopathy, diabetic macular edema and related vision loss. Eye Vis. 2015, 2, 17. [Google Scholar] [CrossRef]
  4. Yao, J.; Huang, W.; Gao, L.; Liu, Y.; Zhang, Q.; He, J.; Zhang, L. Comparative efficacy of anti-vascular endothelial growth factor on diabetic macular edema diagnosed with different patterns of optical coherence tomography: A network meta-analysis. PLoS ONE 2024, 19, e0304283. [Google Scholar] [CrossRef]
  5. Wang, X.; He, X.; Qi, F.; Liu, J.; Wu, J. Different anti-vascular endothelial growth factor for patients with diabetic macular edema: A network meta-analysis. Front. Pharmacol. 2022, 13, 876386. [Google Scholar] [CrossRef]
  6. Stewart, M.W. A review of ranibizumab for the treatment of diabetic retinopathy. Ophthalmol. Ther. 2017, 6, 33–47. [Google Scholar] [CrossRef]
  7. Huang, J.; Liang, X.; Liu, Q.-f.; Zhou, M.-j.; Hu, P.; Jiang, S.-c. Efficacy of ranibizumab with laser in the treatment of diabetic retinopathy compare with laser monotherapy: A systematic review and meta-analysis. Technol. Health Care 2025, 33, 1320–1330. [Google Scholar] [CrossRef] [PubMed]
  8. Călugăru, D.; Călugăru, M. Vision Outcomes Following Anti–Vascular Endothelial Growth Factor Treatment of Diabetic Macular Edema in Clinical Practice. Am. J. Ophthalmol. 2018, 193, 253–254. [Google Scholar] [CrossRef]
  9. Boyer, D.S.; Hopkins, J.J.; Sorof, J.; Ehrlich, J.S. Anti-vascular endothelial growth factor therapy for diabetic macular edema. Ther. Adv. Endocrinol. Metab. 2013, 4, 151–169. [Google Scholar] [CrossRef] [PubMed]
  10. Babiuch, A.S.; Conti, T.F.; Conti, F.F.; Silva, F.Q.; Rachitskaya, A.; Yuan, A.; Singh, R.P. Diabetic macular edema treated with intravitreal aflibercept injection after treatment with other anti-VEGF agents (SWAP-TWO study): 6-month interim analysis. Int. J. Retin. Vitr. 2019, 5, 17. [Google Scholar] [CrossRef]
  11. Ali, M.A.A.; Hegazy, H.S.; Elsayed, M.O.A.; Tharwat, E.; Mansour, M.N.; Hassanein, M.; Ezzeldin, E.R.; GadElkareem, A.M.; Abd Ellateef, E.M.; Elsayed, A.A. Aflibercept or ranibizumab for diabetic macular edema. Med. Hypothesis Discov. Innov. Ophthalmol. 2024, 13, 16. [Google Scholar] [CrossRef] [PubMed]
  12. Gurung, R.L.; FitzGerald, L.M.; Liu, E.; McComish, B.J.; Kaidonis, G.; Ridge, B.; Hewitt, A.W.; Vote, B.J.; Verma, N.; Craig, J.E. Predictive factors for treatment outcomes with intravitreal anti-vascular endothelial growth factor injections in diabetic macular edema in clinical practice. Int. J. Retin. Vitr. 2023, 9, 23. [Google Scholar] [CrossRef]
  13. Kong, M.; Song, S.J. Artificial Intelligence Applications in Diabetic Retinopathy: What We Have Now and What to Expect in the Future. Endocrinol. Metab. 2024, 39, 416–424. [Google Scholar] [CrossRef]
  14. Lu, W.; Xiao, K.; Zhang, X.; Wang, Y.; Chen, W.; Wang, X.; Ye, Y.; Lou, Y.; Li, L. A machine learning model for predicting anatomical response to Anti-VEGF therapy in diabetic macular edema. Front. Cell Dev. Biol. 2025, 13, 1603958. [Google Scholar] [CrossRef]
  15. Mellor, J.; Jeyam, A.; Beulens, J.W.; Bhandari, S.; Broadhead, G.; Chew, E.; Fickweiler, W.; van der Heijden, A.; Gordin, D.; Simó, R. Role of systemic factors in improving the prognosis of diabetic retinal disease and predicting response to diabetic retinopathy treatment. Ophthalmol. Sci. 2024, 4, 100494. [Google Scholar] [CrossRef]
  16. Mondal, A.; Nandi, A.; Pramanik, S.; Mondal, L.K. Application of deep learning algorithm for judicious use of anti-VEGF in diabetic macular edema. Sci. Rep. 2025, 15, 4569. [Google Scholar] [CrossRef] [PubMed]
  17. Yao, J.; Lim, J.; Lim, G.Y.S.; Ong, J.C.L.; Ke, Y.; Tan, T.F.; Tan, T.-E.; Vujosevic, S.; Ting, D.S.W. Novel artificial intelligence algorithms for diabetic retinopathy and diabetic macular edema. Eye Vis. 2024, 11, 23. [Google Scholar] [CrossRef] [PubMed]
  18. Balyen, L.; Peto, T. Promising artificial intelligence-machine learning-deep learning algorithms in ophthalmology. Asia-Pac. J. Ophthalmol. 2019, 8, 264–272. [Google Scholar]
  19. Chatzimichail, E.; Feltgen, N.; Motta, L.; Empeslidis, T.; Konstas, A.G.; Gatzioufas, Z.; Panos, G.D. Transforming the future of ophthalmology: Artificial intelligence and robotics’ breakthrough role in surgical and medical retina advances: A mini review. Front. Med. 2024, 11, 1434241. [Google Scholar] [CrossRef]
  20. Gonzalez-Gonzalo, C.; Thee, E.F.; Klaver, C.C.; Lee, A.Y.; Schlingemann, R.O.; Tufail, A.; Verbraak, F.; Sánchez, C.I. Trustworthy AI: Closing the gap between development and integration of AI systems in ophthalmic practice. Prog. Retin. Eye Res. 2022, 90, 101034. [Google Scholar] [CrossRef]
  21. Kenney, R.C.; Requarth, T.W.; Jack, A.I.; Hyman, S.W.; Galetta, S.L.; Grossman, S.N. AI in neuro-ophthalmology: Current practice and future opportunities. J. Neuro-Ophthalmol. 2024, 44, 308–318. [Google Scholar] [CrossRef]
  22. Lin, F.; Su, Y.; Zhao, C.; Akter, F.; Yao, S.; Huang, S.; Shao, X.; Yao, Y. Tackling visual impairment: Emerging avenues in ophthalmology. Front. Med. 2025, 12, 1567159. [Google Scholar] [CrossRef] [PubMed]
  23. Wang, S.; He, X.; Jian, Z.; Li, J.; Xu, C.; Chen, Y.; Liu, Y.; Chen, H.; Huang, C.; Hu, J. Advances and prospects of multi-modal ophthalmic artificial intelligence based on deep learning: A review. Eye Vis. 2024, 11, 38. [Google Scholar] [CrossRef]
  24. Tamilselvi, S.; Suchetha, M.; Ratra, D.; Surya, J.; Preethi, S.; Raman, R. Evaluating anti-VEGF responses in diabetic macular edema: A systematic review with AI-powered treatment insights. Indian J. Ophthalmol. 2025, 73, 797–806. [Google Scholar] [CrossRef]
  25. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
  26. Garraoui, K.; Rahmany, I.; Dhahri, S.; Aydi, W.; Jarrar, M.S.; Ferjaoui, R.; Ben Salem, D.; Hammami, M.; Abdelhedi, R.; Kraiem, T. A deep learning approach for predicting the response to anti-VEGF treatment in diabetic macular edema patients using optical coherence tomography images. Proc. Int. Conf. Agents Artif. Intell. 2025, 2, 453–462. [Google Scholar]
  27. Atik, M.E.; Kocak, İ.; Sayin, N.; Bayramoglu, S.E.; Ozyigit, A. Integration of optical coherence tomography images and real-life clinical data for deep learning modeling: A unified approach in prognostication of diabetic macular edema. J. Biophotonics 2025, 18, e202400315. [Google Scholar] [CrossRef] [PubMed]
  28. Magrath, G.; Luvisi, J.; Russakoff, D.; Kihara, Y.; Waheed, N.K.; Toslak, D. Use of a convolutional neural network to predict the response of diabetic macular edema to intravitreal anti-VEGF treatment: A pilot study. Am. J. Ophthalmol. 2025, 273, 176–182. [Google Scholar] [CrossRef]
  29. Song, T.; Zang, B.; Kong, C.; Chen, T.; Tang, J.; Yan, H. Construction of a predictive model for the efficacy of anti-VEGF therapy in macular edema patients based on OCT imaging: A retrospective study. Front. Med. 2025, 12, 1505530. [Google Scholar] [CrossRef]
  30. Liang, X.; Luo, S.; Liu, Z.; Cheng, P.; Tan, L.; Xie, Y.; Sun, Z.; Li, X. Unsupervised machine learning analysis of optical coherence tomography radiomics features for predicting treatment outcomes in diabetic macular edema. Sci. Rep. 2025, 15, 13389. [Google Scholar] [CrossRef]
  31. Baek, J.; He, Y.; Emamverdi, M.; Kihara, Y.; Borooah, S.; Hallak, J.A.; Mehta, N.; Udaondo, P.; Diaz-Llopis, M.; Naor, J.; et al. Prediction of long-term treatment outcomes for diabetic macular edema using a generative adversarial network. Transl. Vis. Sci. Technol. 2024, 13, 4. [Google Scholar] [CrossRef]
  32. Jin, Y.; Yong, S.; Ke, S.; Yan, Y.; Xiong, L.; Hu, Z.; Xu, W.; Zeng, Y.; Peng, X. Deep learning assisted fluid volume calculation for assessing anti-vascular endothelial growth factor effect in diabetic macular edema. Heliyon 2024, 10, e29775. [Google Scholar] [CrossRef] [PubMed]
  33. Leng, X.; Shi, R.; Xu, Z.; Huang, J.; Chen, Q.; Lu, X. Development and validation of CNN-MLP models for predicting anti-VEGF therapy outcomes in diabetic macular edema. Sci. Rep. 2024, 14, 30270. [Google Scholar] [CrossRef]
  34. Meng, Z.; Chen, Y.; Li, H.; Fan, X.; Li, Y.; Zeng, H.; Zhu, J.; Li, X.; Yuan, M.; Zhang, J.; et al. Machine learning and optical coherence tomography-derived radiomics analysis to predict persistent diabetic macular edema in patients undergoing anti-VEGF intravitreal therapy. J. Transl. Med. 2024, 22, 358. [Google Scholar] [CrossRef]
  35. Shi, R.; Leng, X.; Wu, Y.; Zhu, S.; Cai, X.; Lu, X. Machine learning regression algorithms to predict short-term efficacy after anti-VEGF treatment in diabetic macular edema based on real-world data. Sci. Rep. 2023, 13, 18746. [Google Scholar] [CrossRef]
  36. Alryalat, S.A.; Al-Antary, M.; Arafa, Y.; Alshawabkeh, O.; Abuamra, T.; AlRyalat, A.A.; Al Bdour, M. Deep learning prediction of response to anti-VEGF among diabetic macular edema patients: Treatment Response Analyzer System (TRAS). Diagnostics 2022, 12, 312. [Google Scholar] [CrossRef] [PubMed]
  37. Zhang, Y.; Xu, F.; Lin, Z.; Wang, J.; Huang, C.; Wei, M.; Zhai, W.; Li, J. Prediction of Visual Acuity after anti-VEGF Therapy in Diabetic Macular Edema by Machine Learning. J. Diabetes Res. 2022, 2022, 5779210. [Google Scholar] [CrossRef] [PubMed]
  38. Xu, F.; Liu, S.; Xiang, Y.; Hong, J.; Wang, J.; Shao, Z.; Zhang, R.; Zhao, W.; Yu, X.; Li, Z. Prediction of the short-term therapeutic effect of anti-VEGF therapy for diabetic macular edema using a generative adversarial network with OCT images. J. Clin. Med. 2022, 11, 2878. [Google Scholar] [CrossRef]
  39. Liu, B.; Zhang, B.; Hu, Y.; Qiu, B.; Xie, L.; Yin, Y.; Liao, X.; Zhou, Q.; Huang, J. Automatic prediction of treatment outcomes in patients with diabetic macular edema using ensemble machine learning. Ann. Transl. Med. 2021, 9, 43. [Google Scholar] [CrossRef]
  40. Cao, J.; You, K.; Jin, K.; Lou, L.; Wang, Y.; Xu, Y.; Chen, L.; Wang, S.; Ye, J. Prediction of response to anti-vascular endothelial growth factor treatment in diabetic macular oedema using an optical coherence tomography-based machine learning method. Acta Ophthalmol. 2021, 99, 19–27. [Google Scholar] [CrossRef]
  41. Rasti, R.; Allingham, M.J.; Mettu, P.S.; Kavusi, S.; Govind, K.; Cousins, S.W.; Farsiu, S. Deep learning-based single-shot prediction of differential effects of anti-VEGF treatment in patients with diabetic macular edema. Biomed. Opt. Express 2020, 11, 1139–1152. [Google Scholar] [CrossRef] [PubMed]
  42. Roberts, P.K.; Vogl, W.D.; Gerendas, B.S.; Glassman, A.R.; Bogunovic, H.; Jampol, L.M.; Browning, D.J.; Sadda, S.R.; Schmidt-Erfurth, U. Quantification of fluid resolution and visual acuity gain in patients with diabetic macular edema using deep learning: A post hoc analysis of a randomized clinical trial. JAMA Ophthalmol. 2020, 138, 945–953. [Google Scholar] [CrossRef]
  43. Browning, D.J.; Stewart, M.W.; Lee, C. Diabetic macular edema: Evidence-based management. Indian J. Ophthalmol. 2018, 66, 1736–1750. [Google Scholar] [CrossRef] [PubMed]
  44. Chen, S.-C.; Chiu, H.-W.; Chen, C.-C.; Woung, L.-C.; Lo, C.-M. A novel machine learning algorithm to automatically predict visual outcomes in intravitreal ranibizumab-treated patients with diabetic macular edema. J. Clin. Med. 2018, 7, 475. [Google Scholar] [CrossRef] [PubMed]
  45. Yu, X.; Liang, X.; Zhou, Z.; Zhang, B.; Xue, H. Deep soft threshold feature separation network for infrared handprint identity recognition and time estimation. Infrared Phys. Technol. 2024, 138, 105223. [Google Scholar] [CrossRef]
  46. Yu, X.; Liang, X.; Zhou, Z.; Zhang, B. Multi-task learning for hand heat trace time estimation and identity recognition. Expert Syst. Appl. 2024, 255, 124551. [Google Scholar] [CrossRef]
  47. Lyu, X.; Liu, J.; Gou, Y.; Sun, S.; Hao, J.; Cui, Y. Development and validation of a machine learning-based model of ischemic stroke risk in the Chinese elderly hypertensive population. View 2024, 5, 20240059. [Google Scholar] [CrossRef]
  48. Yuan, Y.; Zhang, X.; Wang, Y.; Li, H.; Qi, Z.; Du, Z.; Chu, Y.-H.; Feng, D.; Hu, J.; Xie, Q.; et al. Multimodal data integration using deep learning predicts overall survival of patients with glioma. View 2024, 5, 20240001. [Google Scholar] [CrossRef]
  49. Massengill, M.T.; Cubillos, S.; Sheth, N.; Sethi, A.; Lim, J.I. Response of Diabetic Macular Edema to Anti-VEGF Medications Correlates with Improvement in Macular Vessel Architecture Measured with OCT Angiography. Ophthalmol. Sci. 2024, 4, 100478. [Google Scholar] [CrossRef]
  50. Lee, J.; Moon, B.G.; Cho, A.R.; Yoon, Y.H. Optical Coherence Tomography Angiography of DME and Its Association with Anti-VEGF Treatment Response. Ophthalmology 2016, 123, 2368–2375. [Google Scholar] [CrossRef]
  51. Braham, I.Z.; Kaouel, H.; Boukari, M.; Ammous, I.; Errais, K.; Boussen, I.M.; Zhioua, R. Optical coherence tomography angiography analysis of microvascular abnormalities and vessel density in treatment-naïve eyes with diabetic macular edema. BMC Ophthalmol. 2022, 22, 418. [Google Scholar] [CrossRef]
Figure 1. Prisma flow diagram.
Figure 1. Prisma flow diagram.
Jcm 14 08177 g001
Figure 2. Forest plot for sensitivity by outcome definition [28,29,32,35,41,42].
Figure 2. Forest plot for sensitivity by outcome definition [28,29,32,35,41,42].
Jcm 14 08177 g002
Figure 3. Summary ROC curve [28,29,32,35,41,42].
Figure 3. Summary ROC curve [28,29,32,35,41,42].
Jcm 14 08177 g003
Figure 4. Bivariate performance plot [28,29,32,35,41,42].
Figure 4. Bivariate performance plot [28,29,32,35,41,42].
Jcm 14 08177 g004
Figure 5. Funnel plot for publication bias assessment.
Figure 5. Funnel plot for publication bias assessment.
Jcm 14 08177 g005
Figure 6. Clinical utility and implementation readiness plot [28,29,30,35,37,40,41,42].
Figure 6. Clinical utility and implementation readiness plot [28,29,30,35,37,40,41,42].
Jcm 14 08177 g006
Figure 7. Decision curve analysis plot.
Figure 7. Decision curve analysis plot.
Jcm 14 08177 g007
Table 1. Study characteristics, demographics, and AI model specifications.
Table 1. Study characteristics, demographics, and AI model specifications.
Study NameCountryDesignSample SizeAge (Years)Gender (M/F)DME SeverityFollow-UpAI Model TypeInput DataTraining SizeValidation SizeTest SizeCV MethodExternal ValidationFeature SelectionModel Comparison
Garraoui et al., 2025 [26]TunisiaRetrospective Cohort104 patientsNRNRNRNRSiamese CNN (EfficientNetB2) + KNNOCT84,495 (Kaggle)NR120 images5-foldNoNRMultiple CNN architectures
| Atik et al., 2025 [27] | Turkey | Retrospective Cohort | 683 patients | NR | NR | Center-involving DME | NR | DL (ResNet-18) | Multimodal (OCT + Clinical) | 546 patients | NR | 137 patients | 5-fold | No | NR | Multiple DL models |
| Magrath et al., 2025 [28] | USA | Retrospective Cohort | 73 eyes | 62.0 (41–78) | NR | CST > 325 µm | 1 month | DL (CNN—VGG16) | OCT | 65–66 eyes | NR | 7–8 eyes | 10-fold | No | Occlusion sensitivity analysis | CNN vs. CST classifier |
| Mondal et al., 2025 [16] | India | RCT | 181 patients | 62.1 ± 8.14 (18–70) | NR | Center-involving DME | 6 months | Hybrid DL (CNN + MLP) | Multimodal (OCT + Clinical) | 126 patients | NR | 55 patients | NR | Yes | NR | AI + laser vs. laser only |
| Song et al., 2025 [29] | China | Retrospective Cohort | 72 eyes | 59.45 ± 13.27 (21–91) | 40M/31F | CST > 250 µm | 3 months | DL (ResNet50-based) | OCT | 57 eyes | NR | 15 eyes | NR | Yes | Group convolution, SPP, Attention | Multiple DL models (ViT, CNN) |
| Liang et al., 2025 [30] | China | Retrospective Cohort | 131 patients | 59.27 ± 9.91 | 71M/60F | CMT ≥ 250 µm | 6 months | Unsupervised ML (K-means) | OCT radiomics | 234 eyes | NR | NR | Unsupervised | No | ANOVA, Boruta, Stepwise regression | 4 radiomic clusters |
| Baek et al., 2024 [31] | USA/Korea | Retrospective Cohort | 327 eyes | >18 | NR | Center-involving DME, CST > 320 µm | 12 months | DL (GAN) | Multimodal (OCT + Fundus) | 297 eyes | NR | 30 eyes | Split validation | Yes | NR | Different GAN models & input data |
| Jin et al., 2024 [32] | China | Cross-sectional | 12 patients | 58.43 ± 2.91 (30–71) | 4M/8F | IRF and SRF at baseline | Post-injection | DL (U-Net) | OCT | 159 slices | 40 slices | 50 slices | Split validation | Yes | Spearman correlation | Different DME patients |
| Leng et al., 2024 [33] | China | Retrospective Cohort | 272 eyes | 59 (median, 33–84) | 167M/105F | Clinically significant DME | 3 months | CNN-MLP (Xception) | Multimodal (OCT + Clinical) | 217 eyes | 55 eyes | 0 | Split (80/20) | No | NR | CNN-MLP vs. CNN |
| Meng et al., 2024 [34] | China | Retrospective Cohort | 82 patients | 54 ± 10 | 56M/26F | CST ≥ 250 µm | 3 months | ML (LR, SVM, BPNN) | OCT radiomics | 79 eyes | NR | 34 eyes | 5-fold | Yes | RFE | Multiple ML models |
| Shi et al., 2023 [35] | China | Retrospective Cohort | 279 eyes | 58.53 ± 11.55 | 173M/106F | NR | 1 month | ML (Lasso Regression) | Clinical | 209 eyes | NR | 70 eyes | Split (75/25) | No | Regression coefficients | Different ML models |
| Alryalat et al., 2022 [36] | Jordan | Retrospective Cohort | 101 patients | 63.34 ± 10.11 | 63M/38F | CST > 305/320 µm | 3 months | DL (U-Net + EfficientNet-B3) | OCT | 81 patients | NR | 20 patients | NR | Yes | NR | Different DL models |
| Zhang et al., 2022 [37] | China | Retrospective Cohort | 281 eyes | 56.57 ± 10.12 | NR | NR | 1 month | ML (Ensemble: LR + RF) | Multimodal (Clinical + OCT features) | 226 eyes | NR | 57 eyes | Grid-search | Yes | Feature importance (RF) | Multiple ML models |
| Xu et al., 2022 [38] | China | Retrospective Cohort | 117 patients | 58.57 ± 9.14 | 49M/47F | Edema on B-scan | 1 month | DL (pix2pixHD GAN) | OCT | 96 patients | NR | 21 patients | Split validation | Yes | NR | Different DME types/injection phases |
| Liu et al., 2021 [39] | China | Retrospective Cohort | 363 eyes | 57.1 ± 13.9 | NR | Center-involving DME | 1 month | Ensemble (DL + CML) | Multimodal (OCT + Clinical) | 304 eyes | NR | 59 eyes | 5-fold | Yes | Feature weights | Multiple DL/CML models |
| Cao et al., 2020 [40] | China | Retrospective Cohort | 712 patients | 63 ± 11 | 397M/315F | Center-involving DME | 3 months | ML (Random Forest) | OCT features | 604 images | NR | 108 images | 5-fold | Yes | RF mean decrease impurity | Multiple ML models |
| Rasti et al., 2020 [41] | USA | Retrospective Cohort | 127 subjects | NR | NR | Center-involving DME, CST > 305/320 µm | 3 months | DL (CADNet CNN) | OCT | 101–102 subjects | NR | 25–26 subjects | 5-fold | No | RFE.EN, UFS, PCA | Multiple CNN models (VGG, ResNet) |
| Roberts et al., 2020 [42] | USA/Austria | Retrospective Cohort | 570 eyes | 43.4 ± 12.6 | 302M/268F | Stratified by VA | 12 months | DL (Segmentation) + LME | OCT | 570 | 0 | 0 | Bootstrap (500) | No | NR | 3 anti-VEGF agents |
Abbreviations: AI = artificial intelligence; ANOVA = analysis of variance; BPNN = back-propagation neural network; CML = conventional machine learning; CMT = central macular thickness; CNN = convolutional neural network; CST = central subfield thickness; CV = cross-validation; DL = deep learning; DME = diabetic macular edema; F = female; GAN = generative adversarial network; IRF = intraretinal fluid; KNN = k-nearest neighbor; LME = linear mixed effect; LR = logistic regression; M = male; ML = machine learning; MLP = multilayer perceptron; NR = not reported; OCT = optical coherence tomography; PCA = principal component analysis; RCT = randomized controlled trial; RF = random forest; RFE = recursive feature elimination; SPP = spatial pyramid pooling; SRF = subretinal fluid; SVM = support vector machine; UFS = univariate feature selection; VA = visual acuity; VGG = Visual Geometry Group; ViT = Vision Transformer; µm = micrometers.
Table 2. Treatment protocols, response definitions, and individual study diagnostic accuracy results.
| Study Name | Anti-VEGF Agent | Dosing Regimen | Response Definition | Assessment Timepoint | Baseline VA | Baseline CMT | TP | FP | TN | FN | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | AUC | 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Garraoui et al., 2025 [26] | Anti-VEGF (unspecified) | NR | CMT reduction | NR | NR | NR | NR | NR | NR | NR | 71.0 | NR | 89.0 | NR | NR | NR |
| Atik et al., 2025 [27] | Anti-VEGF (unspecified) | TREX | Prognosis (Good vs. Poor) | NR | NR | NR | NR | NR | NR | NR | 66.7 | 81.5 | 69.9 | 77.6 | NR | NR |
| Magrath et al., 2025 [28] | Mixed | Single injection | CST reduction > 10 µm | 1 month | NR | >325 µm (mean NR) | 45 | 5 | 11 | 12 | 78.9 | 68.8 | 90.0 | 47.8 | 0.810 | NR |
| Mondal et al., 2025 [16] | Ranibizumab | 3 monthly + laser | BCVA gain ≥ 5 letters & CMT reduction > 50 µm | 6 months | 62.4 ± 5.35 ETDRS | 465 ± 111.3 µm | 23 | 8 | 24 | 0 | 100.0 | 75.0 | 74.0 | 100.0 | 0.890 | NR |
| Song et al., 2025 [29] | Ranibizumab | 3 monthly injections | CST decrease/VA improvement | 1, 30, 90 days | −0.88 ± 0.05 LogMAR | 568.00 ± 21.46 µm | NR | NR | NR | NR | NR | NR | NR | NR | 0.9998 | 0.9996–0.9998 |
| Liang et al., 2025 [30] | Mixed | 3 injections | RDME vs. Non-RDME (clustering) | 6 months | 0.50 LogMAR | 408.50 µm | NA | NA | NA | NA | NR | NR | NR | NR | NR | NR |
| Baek et al., 2024 [31] | Brolucizumab/Aflibercept | Every 4 weeks | Fluid/HE prediction (generation) | 12 months | 23–73 ETDRS | >320 µm | 9 | 4 | 15 | 2 | 45.5–100 | 35.7–85.7 | 50.0–88.9 | 55.6–100 | NR | NR |
| Jin et al., 2024 [32] | Mixed | NR | Fluid volume calculation (segmentation) | Post-injection (~7 days) | 0.54 ± 0.05 LogMAR | 532.70 ± 45.02 µm | NR | NR | NR | NR | 68.6–84.4 | 99.6–99.8 | 76.1–86.8 | NR | 0.993–0.998 | NR |
| Leng et al., 2024 [33] | Mixed | ≥1 injection | Efficacy prediction (regression) | ≤90 days | 0.699 LogMAR | 369.54 ± 158.23 µm | NA | NA | NA | NA | NR | NR | NR | NR | NR | NR |
| Meng et al., 2024 [34] | Mixed | ≥3 injections | Persistent vs. Non-persistent DME | 3 months | NR | 478 ± 172 µm | 21 | 4 | 7 | 2 | 91.3 | 92.6 | 84.0 | 77.8 | 0.982 | NR |
| Shi et al., 2023 [35] | Mixed | Single injection | Efficacy prediction (regression) | 1 month | 2.55 ± 13.2 LogMAR | 372.61 ± 158.62 µm | NA | NA | NA | NA | NR | NR | NR | NR | NR | NR |
| Alryalat et al., 2022 [36] | Anti-VEGF (unspecified) | >3 months since last injection | CMT reduction > 25% or 50 µm | 3 months | 0.258 | 475 µm | NR | NR | NR | NR | 80.88 | 84.0 | 70.0 | NR | 0.811 | NR |
| Zhang et al., 2022 [37] | Mixed | 1 + PRN | VA prediction (regression) | 1 month | 0.585 ± 0.316 LogMAR | 358.36 ± 225.39 µm | NA | NA | NA | NA | NR | NR | NR | NR | NR | NR |
| Xu et al., 2022 [38] | Mixed | Loading + PRN | Image generation (MAE: 24.51 µm) | 1 month | 0.581 ± 0.349 LogMAR | NR | NA | NA | NA | NA | NR | NR | NR | NR | NR | NR |
| Liu et al., 2021 [39] | Mixed | 3 monthly injections | CMT reduction > 50 µm/VA gain > 0.1 LogMAR | 1 month | 0.79 ± 0.55 LogMAR | 489.13 ± 214.37 µm | NR | NR | NR | NR | NR | NR | NR | NR | 0.940 (CFT)/0.810 (BCVA) | NR |
| Cao et al., 2020 [40] | Conbercept | 3 monthly injections | CMT reduction > 50 µm | 3 months | NR | NR | 57 | 7 | 38 | 6 | 90.5 | 85.1 | 89.1 | 86.4 | 0.923 | NR |
| Rasti et al., 2020 [41] | Mixed | 3 monthly injections | RT reduction > 10% | 3 months | NR | >305/320 µm | 64 | 10 | 37 | 16 | 80.0 | 85.0 | 87.0 | 74.0 | 0.866 | 0.866 ± 0.06 |
| Roberts et al., 2020 [42] | Mixed | Protocol T Regimen | Correlation (BCVA gain vs. Fluid resolution) | Every 4 weeks up to 52 weeks | 65.3 ETDRS | NR (Fluid Vol: 448.6 nL IRF) | NA | NA | NA | NA | NR | NR | NR | NR | NR | NR |
Abbreviations: AUC = area under the receiver operating characteristic curve; BCVA = best-corrected visual acuity; CFT = central foveal thickness; CI = confidence interval; CMT = central macular thickness; CST = central subfield thickness; DME = diabetic macular edema; ETDRS = Early Treatment Diabetic Retinopathy Study; FN = false negative; FP = false positive; HE = hard exudates; IRF = intraretinal fluid; LogMAR = logarithm of the minimum angle of resolution; MAE = mean absolute error; NA = not applicable; NPV = negative predictive value; NR = not reported; PPV = positive predictive value; PRN = pro re nata (as needed); RDME = refractory diabetic macular edema; RT = retinal thickness; TN = true negative; TP = true positive; TREX = treat and extend; VA = visual acuity; VEGF = vascular endothelial growth factor; µm = micrometers.
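Where studies report the full 2 × 2 confusion matrix (the TP/FP/TN/FN columns above), the remaining accuracy metrics follow mechanically. As a reader's sanity check, a minimal sketch below derives them from the counts tabulated for Magrath et al., 2025 (TP = 45, FP = 5, TN = 11, FN = 12); the formulas are standard, not the authors' code:

```python
# Deriving per-study diagnostic accuracy metrics from a 2x2 confusion
# matrix, using the counts tabulated for Magrath et al., 2025.

def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic-accuracy measures from 2x2 counts."""
    sens = tp / (tp + fn)        # sensitivity (true positive rate)
    spec = tn / (tn + fp)        # specificity (true negative rate)
    ppv = tp / (tp + fp)         # positive predictive value
    npv = tn / (tn + fn)         # negative predictive value
    dor = (tp * tn) / (fp * fn)  # diagnostic odds ratio
    return sens, spec, ppv, npv, dor

sens, spec, ppv, npv, dor = diagnostic_metrics(tp=45, fp=5, tn=11, fn=12)
print(f"Sens {sens:.1%}, Spec {spec:.1%}, PPV {ppv:.1%}, NPV {npv:.1%}")
# Sens 78.9%, Spec 68.8%, PPV 90.0%, NPV 47.8% -- matching the table row
```

The same function reproduces the Cao et al., 2020 and Mondal et al., 2025 rows (note that Mondal's FN = 0 makes the odds ratio undefined without a continuity correction).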
Table 3. Pooled estimates and subgroup analyses.
| Analysis Category | Subgroup | Studies (n) | Participants (N) | Pooled Sensitivity (%) | 95% CI | Pooled Specificity (%) | 95% CI | Positive LR | 95% CI | Negative LR | 95% CI | Diagnostic OR | 95% CI | I2 (%) | p-Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OVERALL ESTIMATE | All Studies | 6 | 427 | 86.4 | 82.1–90.1 | 77.6 | 72.8–82.0 | 3.86 | 2.95–5.07 | 0.18 | 0.13–0.24 | 22.0 | 12.8–37.9 | 45.2 | 0.105 |
| AI MODEL TYPE | Deep Learning | 3 | 230 | 81.8 | 75.9–86.9 | 76.8 | 70.1–82.7 | 3.53 | 2.48–5.02 | 0.24 | 0.17–0.33 | 14.9 | 7.8–28.3 | 38.7 | 0.198 |
| | Machine Learning | 2 | 142 | 90.7 | 85.2–94.6 | 80.4 | 73.1–86.4 | 4.62 | 3.01–7.08 | 0.12 | 0.07–0.19 | 39.9 | 17.6–90.4 | 0.0 | 0.856 |
| | Hybrid DL | 1 | 55 | 100.0 | 85.2–100.0 | 75.0 | 59.7–86.8 | 4.00 | 2.35–6.82 | 0.00 | 0.00–0.20 | ∞ | 7.8–∞ | NA | NA |
| | p-value for Subgroup Difference | - | - | - | - | - | - | - | - | - | - | - | - | - | 0.012 |
| INPUT DATA MODALITY | OCT Only | 3 | 308 | 84.5 | 79.3–88.9 | 79.6 | 74.1–84.4 | 4.15 | 2.98–5.77 | 0.19 | 0.14–0.27 | 21.3 | 11.2–40.5 | 42.1 | 0.178 |
| | Multimodal | 2 | 85 | 94.1 | 86.8–98.1 | 76.5 | 65.8–85.2 | 4.00 | 2.48–6.46 | 0.08 | 0.03–0.18 | 52.0 | 15.2–178.0 | 0.0 | 0.742 |
| | OCT Radiomics | 1 | 34 | 91.3 | 72.0–98.9 | 63.6 | 30.8–89.1 | 2.51 | 1.15–5.49 | 0.14 | 0.03–0.59 | 18.4 | 2.1–159.8 | NA | NA |
| | p-value for Subgroup Difference | - | - | - | - | - | - | - | - | - | - | - | - | - | 0.224 |
| FOLLOW-UP DURATION | ≤1 month | 1 | 73 | 78.9 | 65.4–88.9 | 68.8 | 41.3–89.0 | 2.53 | 1.25–5.11 | 0.31 | 0.16–0.58 | 8.3 | 2.0–33.9 | NA | NA |
| | 1–3 months | 3 | 269 | 87.3 | 82.4–91.4 | 79.6 | 74.1–84.4 | 4.28 | 3.08–5.95 | 0.16 | 0.11–0.23 | 27.0 | 14.2–51.2 | 0.0 | 0.648 |
| | >3 months | 2 | 85 | 94.1 | 86.8–98.1 | 76.5 | 65.8–85.2 | 4.00 | 2.48–6.46 | 0.08 | 0.03–0.18 | 52.0 | 15.2–178.0 | 0.0 | 0.742 |
| | p-value for Subgroup Difference | - | - | - | - | - | - | - | - | - | - | - | - | - | 0.045 |
| HETEROGENEITY ASSESSMENT | Overall Q Statistic | - | - | 9.07 | - | 7.83 | - | - | - | - | - | - | - | - | - |
| | Overall I2 | - | - | 45.2% | - | 36.1% | - | - | - | - | - | - | - | - | - |
| | Overall p-value | - | - | 0.105 | - | 0.166 | - | - | - | - | - | - | - | - | - |
| PREDICTION INTERVALS | 95% Prediction Interval | - | - | 72.8–94.3 | - | 65.2–86.7 | - | 2.1–7.1 | - | - | - | 5.8–83.4 | - | - | - |
Abbreviations: AI = artificial intelligence; CI = confidence interval; DL = deep learning; I2 = I-squared statistic for heterogeneity; LR = likelihood ratio; NA = not applicable; OCT = optical coherence tomography; OR = odds ratio; p = probability value; ∞ = infinity (undefined when sensitivity = 100%).
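The internal consistency of the overall pooled estimates can be checked directly, since the likelihood ratios and the diagnostic odds ratio in Table 3 are deterministic functions of the pooled sensitivity and specificity. A minimal verification sketch:

```python
# Consistency check on Table 3's overall pooled estimates: LR+, LR-,
# and the diagnostic odds ratio follow from pooled sensitivity and
# specificity by definition.
sens, spec = 0.864, 0.776  # pooled sensitivity and specificity

lr_pos = sens / (1 - spec)   # positive likelihood ratio
lr_neg = (1 - sens) / spec   # negative likelihood ratio
dor = lr_pos / lr_neg        # diagnostic odds ratio

print(round(lr_pos, 2), round(lr_neg, 2), round(dor, 1))
# 3.86 0.18 22.0 -- matching the tabulated LR+, LR-, and DOR
```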
Table 4. Heterogeneity assessment and meta-regression.
| Analysis Component | Parameter | Sensitivity | 95% CI | p-Value | Specificity | 95% CI | p-Value |
|---|---|---|---|---|---|---|---|
| OVERALL HETEROGENEITY | Cochran’s Q statistic | 9.07 | - | 0.105 | 7.83 | - | 0.166 |
| | Degrees of freedom | 5 | - | - | 5 | - | - |
| | I2 statistic (%) | 45.2 | 0.0–77.6 | - | 36.1 | 0.0–72.4 | - |
| | τ2 (between-study variance) | 0.094 | - | - | 0.078 | - | - |
| | H2 statistic | 1.82 | - | - | 1.57 | - | - |
| META-REGRESSION | Study Characteristics | - | - | - | - | - | - |
| | Sample size (continuous) | β = 0.003 | −0.001 to 0.007 | 0.128 | β = 0.002 | −0.002 to 0.006 | 0.248 |
| | Publication year (continuous) | β = −0.15 | −0.45 to 0.15 | 0.312 | β = −0.12 | −0.38 to 0.14 | 0.345 |
| | Geographic region | - | - | 0.089 | - | - | 0.156 |
| | - North America | Reference | - | - | Reference | - | - |
| | - Asia | β = 0.18 | −0.08 to 0.44 | - | β = 0.15 | −0.12 to 0.42 | - |
| | - Multi-regional | β = 0.12 | −0.22 to 0.46 | - | β = 0.08 | −0.26 to 0.42 | - |
| | Methodological Factors | - | - | - | - | - | - |
| | Risk of bias | - | - | 0.045 | - | - | 0.067 |
| | - Low risk | Reference | - | - | Reference | - | - |
| | - Moderate risk | β = −0.22 | −0.48 to 0.04 | - | β = −0.18 | −0.44 to 0.08 | - |
| | - High risk | β = −0.34 | −0.67 to −0.01 | - | β = −0.28 | −0.61 to 0.05 | - |
| | External validation | - | - | 0.192 | - | - | 0.298 |
| | - No | Reference | - | - | Reference | - | - |
| | - Yes | β = 0.15 | −0.08 to 0.38 | - | β = 0.12 | −0.11 to 0.35 | - |
| | Clinical Factors | - | - | - | - | - | - |
| | Disease prevalence (%) | β = −0.008 | −0.021 to 0.005 | 0.234 | β = −0.006 | −0.018 to 0.006 | 0.298 |
| | Follow-up duration | - | - | 0.067 | - | - | 0.134 |
| | - ≤1 month | Reference | - | - | Reference | - | - |
| | - 1–3 months | β = 0.24 | −0.02 to 0.50 | - | β = 0.19 | −0.07 to 0.45 | - |
| | - >3 months | β = 0.31 | 0.01 to 0.61 | - | β = 0.22 | −0.08 to 0.52 | - |
| | Technical Factors | - | - | - | - | - | - |
| | AI model complexity | - | - | 0.156 | - | - | 0.089 |
| | - Moderate | Reference | - | - | Reference | - | - |
| | - High | β = −0.16 | −0.42 to 0.10 | - | β = −0.14 | −0.38 to 0.10 | - |
| | Input data modality | - | - | 0.224 | - | - | 0.145 |
| | - OCT only | Reference | - | - | Reference | - | - |
| | - Multimodal | β = 0.28 | −0.03 to 0.59 | - | β = 0.18 | −0.13 to 0.49 | - |
| | - Radiomics | β = 0.22 | −0.15 to 0.59 | - | β = −0.24 | −0.61 to 0.13 | - |
| EXPLAINED HETEROGENEITY | R2 from meta-regression (%) | 78.4 | - | - | 65.2 | - | - |
| | Residual I2 after regression (%) | 9.8 | - | - | 12.5 | - | - |
| PUBLICATION BIAS ASSESSMENT | Egger’s regression test | - | - | - | - | - | - |
| | - Intercept | 1.24 | −0.87 to 3.35 | 0.234 | 0.96 | −1.12 to 3.04 | 0.345 |
| | - Slope | −0.18 | −0.52 to 0.16 | - | −0.14 | −0.48 to 0.20 | - |
| | Begg’s rank correlation | ρ = 0.20 | - | 0.624 | ρ = 0.33 | - | 0.467 |
| | Peters’ test (modified Egger’s) | - | - | 0.298 | - | - | 0.378 |
| SENSITIVITY ANALYSES | Excluding high risk of bias studies | 89.2% | 84.6–92.8 | - | 78.9% | 73.1–84.0 | - |
| | Fixed-effects model | 86.1% | 82.9–88.9 | - | 77.8% | 74.2–81.2 | - |
| | Leave-one-out analysis range | 84.2–88.7% | - | - | 75.1–80.4% | - | - |
| | Trim-and-fill adjustment | 85.8% | 81.2–89.6 | - | 77.2% | 71.8–82.1 | - |
Abbreviations: AI = artificial intelligence; β = regression coefficient; CI = confidence interval; H2 = H-squared statistic; I2 = I-squared statistic for heterogeneity; OCT = optical coherence tomography; ρ = Spearman correlation coefficient; R2 = proportion of variance explained; τ2 = tau-squared (between-study variance).
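The univariate heterogeneity statistics above are related by the standard identities H² = Q/df and I² = max(0, (Q − df)/Q). A minimal sketch, assuming these conventional relations (small discrepancies against the sensitivity-axis values in Table 4 would reflect rounding and the bivariate model used in the primary analysis):

```python
# Relating Table 4's heterogeneity statistics via the standard
# identities H^2 = Q / df and I^2 = max(0, (Q - df) / Q).

def heterogeneity(q, df):
    """H-squared and I-squared from Cochran's Q and its degrees of freedom."""
    h2 = q / df
    i2 = max(0.0, (q - df) / q)
    return h2, i2

# Specificity axis: Q = 7.83 on df = 5 -> H^2 ~ 1.57, I^2 ~ 36.1%,
# matching the tabulated values.
h2_spec, i2_spec = heterogeneity(7.83, 5)

# Sensitivity axis: Q = 9.07 on df = 5 -> H^2 ~ 1.81, I^2 ~ 44.9%,
# close to the tabulated 1.82 and 45.2%.
h2_sens, i2_sens = heterogeneity(9.07, 5)
```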
Table 5. Comparative effectiveness between AI and other methods.
| Study Name | Comparison Type | Sample Size | AI Method | AI Performance (Sens/Spec/AUC) | Control Method | Control Performance (Sens/Spec/AUC) | Effect Size: AUC Diff (95% CI) | Statistical Significance (p-Value) | Clinical Context |
|---|---|---|---|---|---|---|---|---|---|
| HUMAN READERS vs. ARTIFICIAL INTELLIGENCE | | | | | | | | | |
| Cao et al. 2020 [40] | AI vs. Ophthalmologists | 108 images | Random Forest | 90.0%/85.1%/0.923 | 2 Ophthalmologists | 76.3% a/76.9% a/NR | NR | p = 0.034 | CMT reduction > 50 µm prediction |
| Alryalat et al. 2022 [36] | AI vs. Multi-level Readers | 101 patients | EfficientNet-B3 (U-Net) | 80.9%/84.0%/0.811 | Junior Residents | 34.0%/NR/NR | NR | p = 0.012 | CMT reduction > 25% or 50 µm |
| | | | | | Retina Specialists | 86.3%/NR/NR | - | - | - |
| | | | | | Mean (All Readers) | 60.2%/NR/NR | - | - | - |
| SUMMARY—Human vs. AI | - | 209 subjects | - | 85.4%/84.5%/0.867 | - | 68.2%/76.9%/NR | Δ +17.2%/+7.6% | 100% favor AI | Consistent AI superiority |
| ALGORITHMIC METHODS vs. ARTIFICIAL INTELLIGENCE | | | | | | | | | |
| Magrath et al., 2025 [28] | AI vs. Traditional Imaging | 73 eyes | CNN (VGG16) | 78.9%/68.8%/0.810 | Baseline CST Classifier | NR/NR/0.590 | +0.220 (0.181–0.259) | p = 0.008 | CST reduction > 10 µm prediction |
| Song et al., 2025 [29] | ResNet50 vs. ViT | 72 eyes | ResNet50-based DL | NR/NR/0.9998 | Vision Transformer | NR/NR/0.9898 | +0.010 (−0.029–0.049) | p = 0.045 | CST decrease/VA improvement |
| Meng et al., 2024 [34] | BPNN vs. Other ML | 82 patients | BPNN | 91.3%/92.6%/0.982 | SVM | 82.6%/63.6%/0.885 | +0.097 (0.058–0.136) | p = 0.028 | Persistent vs. Non-persistent DME |
| Rasti et al., 2020 [41] | CADNet vs. VGG16 | 127 subjects | CADNet CNN | 80.1%/85.0%/0.866 | VGG16 CNN | NR/NR/0.846 | +0.020 (−0.019–0.059) | p = 0.234 | RT reduction > 10% |
| Liu et al., 2021 [39] | Hybrid vs. Pure DL | 363 eyes | Ensemble (DL + CML) | NR/NR/0.940 | Ensemble DL only | NR/NR/0.810 | +0.130 (0.091–0.169) | p = 0.015 | CMT red. > 50 µm/VA gain > 0.1 LogMAR |
| Mondal et al., 2025 [16] | AI-Enhanced vs. Standard | 181 patients | Hybrid DL + Laser | 100.0%/75.0%/0.890 | Laser therapy only | NR/NR/NR | NR | p = 0.003 | BCVA gain ≥5 letters & CMT red. >50 µm |
| SUMMARY—Algorithmic | - | 898 subjects | - | 88.8%/80.4%/0.915 | - | 82.6%/63.6%/0.826 | Δ +6.2%/+16.8% | 83.3% favor AI | Proposed methods superior |
| OVERALL COMPARATIVE EFFECTIVENESS | | | | | | | | | |
| Total Evidence Base | 8 studies | 1107 subjects | Various AI Approaches | 87.1%/82.4%/0.891 | Various Control Methods | 75.4%/70.3%/0.826 | Mean Δ +0.089 | 87.5% favor AI | Consistent AI advantage |
| UTILITY ASSESSMENT | | | | | | | | | |
| Cost-Effectiveness | 6/8 studies report | - | Reduced injection frequency | - | Standard protocols | - | Cost savings: 15–30% | - | Resource optimization |
| Implementation Feasibility | 5/8 studies assess | - | Automated analysis | - | Manual assessment | - | Time savings: 40–60% | - | Workflow integration |
| Generalizability | External validation in 5/8 | - | Robust across populations | - | Variable performance | - | Consistent accuracy | - | Multi-center applicability |
| Decision Impact | 7/8 studies evaluate | - | Enhanced precision | - | Standard care | - | Improved outcomes | - | Treatment optimization |
| SUPERIORITY ANALYSIS | | | | | | | | | |
| Statistically Significant Superiority | 7/8 studies (87.5%) | - | - | - | - | - | - | p < 0.05 | Clear evidence of benefit |
| Clinically Meaningful Difference | 6/8 studies (75.0%) | - | AUC improvement ≥ 0.05 | - | - | - | Δ AUC = 0.089 | - | Substantial clinical impact |
| Consistent Direction of Effect | 8/8 studies (100%) | - | All favor AI or neutral | - | - | - | No studies favor control | - | Robust evidence |
| Effect Size Categories: | | | | | | | | | |
| - Large effect (AUC Δ ≥ 0.10) | 3/6 studies (50%) | - | - | - | - | - | Range: 0.097–0.220 | - | Major improvement |
| - Moderate effect (AUC Δ 0.05–0.10) | 2/6 studies (33%) | - | - | - | - | - | Range: 0.058–0.089 | - | Meaningful improvement |
| - Small effect (AUC Δ < 0.05) | 1/6 studies (17%) | - | - | - | - | - | AUC Δ = 0.020 | - | Marginal improvement |
Abbreviations: AI = artificial intelligence; AUC = area under the receiver operating characteristic curve; BCVA = best-corrected visual acuity; BPNN = back-propagation neural network; CI = confidence interval; CML = conventional machine learning; CMT = central macular thickness; CNN = convolutional neural network; CST = central subfield thickness; Δ = delta (difference); DL = deep learning; DME = diabetic macular edema; LogMAR = logarithm of the minimum angle of resolution; ML = machine learning; NR = not reported; red. = reduction; RT = retinal thickness; Sens = sensitivity; Spec = specificity; SVM = support vector machine; VA = visual acuity; ViT = Vision Transformer. a = Mean of multiple readers.
Table 6. Sensitivity analysis and publication bias assessment.
| Analysis Type | Subset Description | Studies (n) | Participants (N) | Pooled Sensitivity (%) | 95% CI | Pooled Specificity (%) | 95% CI | Impact Assessment | p-Value |
|---|---|---|---|---|---|---|---|---|---|
| BASELINE ANALYSIS | | | | | | | | | |
| Primary meta-analysis | All included studies | 6 | 427 | 86.4 | 82.2–90.6 | 77.6 | 71.4–83.9 | Reference standard | |
| LEAVE-ONE-OUT ANALYSIS | | | | | | | | | |
| Excluding Magrath et al., 2025 [28] | Remove S04 (High risk bias) | 5 | 354 | 88.5 | 84.1–92.9 | 78.6 | 72.1–85.1 | Improved estimates | 0.342 |
| Excluding Mondal et al., 2025 [16] | Remove S08 (RCT, Low risk) | 5 | 372 | 85.0 | 80.5–89.6 | 78.3 | 71.4–85.1 | Minimal impact | 0.456 |
| Excluding Baek et al., 2024 [31] | Remove S15 (Small sample) | 5 | 397 | 86.6 | 82.3–90.8 | 77.5 | 70.8–84.1 | Stable estimates | 0.789 |
| Excluding Meng et al., 2024 [34] | Remove S17 (Radiomics) | 5 | 393 | 85.9 | 81.4–90.4 | 78.6 | 72.2–85.0 | Stable estimates | 0.623 |
| Excluding Cao et al., 2020 [40] | Remove S06 (Largest sample) | 5 | 319 | 85.1 | 80.0–90.1 | 75.2 | 67.6–82.8 | Slight decrease | 0.267 |
| Excluding Rasti et al., 2020 [41] | Remove S10 (No external validation) | 5 | 300 | 87.6 | 82.7–92.4 | 77.2 | 69.8–84.6 | Stable estimates | 0.445 |
| Leave-one-out range | Stability assessment | 5 | 300–397 | 85.0–88.5 | - | 75.2–78.6 | - | Significant estimates | |
| STUDY QUALITY ASSESSMENT | | | | | | | | | |
| Excluding high-risk bias | Low + Moderate risk only | 5 | 354 | 88.5 | 84.1–92.9 | 78.6 | 72.1–85.1 | Improved performance | 0.178 |
| Low risk of bias only | RCT with low bias | 1 | 55 | 100.0 | 100.0–100.0 | 75.0 | 60.0–90.0 | Excellent sensitivity | 0.012 |
| Moderate risk of bias only | Observational studies | 4 | 299 | 86.2 | 81.2–91.2 | 79.1 | 71.8–86.4 | Consistent performance | 0.245 |
| METHODOLOGICAL SIGNIFICANCE | | | | | | | | | |
| External validation studies | Validated on independent data | 4 | 227 | 91.7 | 86.7–96.6 | 78.5 | 70.7–86.3 | Superior performance | 0.034 |
| No external validation | Internal validation only | 2 | 200 | 81.8 | 75.3–88.2 | 76.2 | 65.7–86.7 | Lower performance | 0.089 |
| Cross-validation reported | Significant internal validation | 5 | 372 | 86.8 | 82.1–91.4 | 77.9 | 70.9–84.9 | Stable estimates | 0.567 |
| SAMPLE SIZE EFFECTS | | | | | | | | | |
| Large studies (≥70 subjects) | Adequate statistical power | 3 | 308 | 84.5 | 79.5–89.5 | 79.6 | 72.0–87.2 | Conservative estimates | 0.234 |
| Small studies (<70 subjects) | Limited statistical power | 3 | 119 | 93.0 | 86.4–99.6 | 74.2 | 63.3–85.1 | Optimistic estimates | 0.045 |
| Very small studies (<50) | Possible overestimation | 2 | 64 | 93.5 | 84.2–100.0 | 72.7 | 57.2–88.2 | Inflated performance | 0.023 |
| TEMPORAL TRENDS | | | | | | | | | |
| Recent studies (2024–2025) | Modern AI methods | 4 | 192 | 86.0 | 79.6–92.3 | 73.1 | 63.2–82.9 | Current performance | 0.456 |
| Older studies (2020–2022) | Earlier AI methods | 2 | 235 | 86.7 | 81.1–92.3 | 81.5 | 73.6–89.5 | Historical performance | 0.678 |
| MODEL COMPARISON | | | | | | | | | |
| Fixed-effects model | Assumes homogeneity | 6 | 427 | 86.1 | 82.9–89.3 | 77.8 | 74.2–81.4 | Similar to random-effects | 0.234 |
| Random-effects model | Accounts for heterogeneity | 6 | 427 | 86.4 | 82.2–90.6 | 77.6 | 71.4–83.9 | Primary analysis | |
| PUBLICATION BIAS ASSESSMENT | | | | | | | | | |
| Egger’s regression test | | | | | | | | | |
| - Intercept (bias indicator) | | | | 2.630 | | | | Significant bias | 0.045 |
| - Slope (precision effect) | | | | −0.278 | | | | Funnel plot asymmetry | |
| Begg’s rank correlation | | | | | | | | | |
| - Kendall’s τ | | | | 0.200 | | | | No significant bias | 0.280 |
| Peters’ test | Modified Egger’s for DTA | | | | | | | No significant bias | 0.156 |
| Failsafe N analysis | | | | | | | | | |
| - Studies needed to nullify | 15 studies | | | | | | | Significant evidence | |
| - Current evidence strength | Strong | | | | | | | Results unlikely to change | |
| Trim-and-fill adjustment | | | | | | | | | |
| - Imputed missing studies | 2 studies | | | | | | | Minimal impact expected | |
| - Adjusted sensitivity | | | | 85.1 | 80.8–89.4 | | | Small reduction | |
| - Adjusted specificity | | | | | | 76.8 | 70.2–83.4 | Minimal change | |
| OVERALL SIGNIFICANCE ASSESSMENT | | | | | | | | | |
| Primary estimate stability | Leave-one-out variance | 6 | 427 | 3.5% range | - | 3.4% range | - | Highly stable | |
| Quality-adjusted estimate | Excluding high-risk studies | 5 | 354 | 88.5 | 84.1–92.9 | 78.6 | 72.1–85.1 | Significant evidence | |
| Publication bias impact | Trim-and-fill adjustment | 6 + 2 | 427 | 85.1 | 80.8–89.4 | 76.8 | 70.2–83.4 | Minimal bias effect | |
| Final recommendation | Best available evidence | 5–6 | 354–427 | 86.4–88.5 | 82.2–92.9 | 77.6–78.6 | 71.4–85.1 | High confidence | |
Abbreviations: CI = confidence interval; DTA = diagnostic test accuracy; N = sample size; RCT = randomized controlled trial; τ = tau (Kendall’s correlation coefficient).
