Deep Learning for the Diagnosis of Esophageal Cancer in Endoscopic Images: A Systematic Review and Meta-Analysis

Simple Summary
Esophageal cancer is the sixth leading cause of cancer-related mortality worldwide, with a 5-year survival rate of around 20%. Recently, deep learning (DL) models have shown strong performance in image-based esophageal cancer diagnosis and prognosis prediction. In this study, a comprehensive literature search was conducted for studies published between 1 January 2012 and 1 August 2022 in the most widely used databases, namely PubMed, Embase, Scopus, and Web of Science. This study systematically summarizes the application of DL models to esophageal cancer diagnosis and discusses the potential limitations and future directions of DL techniques in esophageal cancer management.

Abstract
Esophageal cancer, one of the most common cancers and one with a poor prognosis, is the sixth leading cause of cancer-related mortality worldwide. Early and accurate diagnosis therefore plays a vital role in choosing an appropriate treatment plan and improving patient survival. However, an accurate diagnosis of esophageal cancer requires substantial expertise and experience. Deep learning (DL) models have recently shown promising performance in this task. We therefore conducted an updated meta-analysis to determine the diagnostic accuracy of DL models for esophageal cancer. PubMed, EMBASE, Scopus, and Web of Science were searched for studies published between 1 January 2012 and 1 August 2022 that evaluated the diagnostic performance of DL models for esophageal cancer using endoscopic images. The study was performed in accordance with the PRISMA guidelines. Two reviewers independently assessed potential studies for inclusion and extracted data from the retrieved studies. Methodological quality was assessed using the QUADAS-2 tool.
The pooled accuracy, sensitivity, specificity, positive and negative predictive values, and area under the receiver operating characteristic curve (AUROC) were calculated using a random-effects model. A total of 28 studies involving 703,006 images were included. The pooled accuracy, sensitivity, specificity, and positive and negative predictive values of DL for the diagnosis of esophageal cancer were 92.90%, 93.80%, 91.73%, 93.62%, and 91.97%, respectively. The pooled AUROC was 0.96. There was no evidence of publication bias among the studies. Our findings show that DL models have great potential to diagnose esophageal cancer accurately and quickly. However, most studies developed their models using endoscopic data from Asian populations; we therefore recommend further validation in other populations.


Introduction
Rationale: Esophageal cancer is one of the most commonly diagnosed cancers globally, with an estimated 0.5 million new cases annually [1]. Its prognosis is poor: it is the sixth leading cause of cancer-related mortality worldwide, with over 0.5 million deaths annually [2]. The overall 5-year survival rate of esophageal cancer is only about 20% [3]; however, survival depends on several factors, including the stage of the disease. The 5-year relative survival rate is 46.4% for localized esophageal cancer (confined to the primary site) but only 5% for distant esophageal cancer (spread to distant parts of the body) [4]. The two primary histologic subtypes, esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC), account for approximately 90% of all esophageal cancers [5]. Although ESCC remains the most prevalent subtype, the incidence of EAC has been increasing in recent years in the United States of America (USA) and other Western countries [6,7]. Previous studies reported that patients with EAC have better overall median survival than those with ESCC, particularly in early-stage disease [8,9]. Risk factors for ESCC include smoking, alcohol consumption, diet, and male gender, whereas gastroesophageal reflux disease (GERD) and obesity are two major risk factors for EAC [10,11]. Barrett's esophagus (BE) is an established premalignant lesion [12,13] that increases the risk of EAC up to 40-fold [14].
An early and accurate diagnosis of esophageal cancer is essential for determining the appropriate management of patients and improving their overall survival [15]. Esophageal cancer is often detected at an advanced stage, when highly invasive treatments such as surgical resection and chemoradiotherapy are required [16,17]. Early detection through widely used screening programs has been shown to reduce esophageal cancer-related mortality and improve overall survival. Image-enhanced endoscopy, such as narrow-band imaging (NBI), used alongside conventional white-light imaging (WLI), has improved the early detection rate of esophageal cancer [18][19][20]. However, detecting esophageal cancer remains challenging and depends on substantial expertise and experience [21]. Recently, DL models, especially convolutional neural networks (CNNs), have performed remarkably well in various medical fields, including esophageal cancer diagnosis and prognosis [22].
Goal: Notwithstanding the growing interest in and opportunities for applying DL models to diagnose esophageal cancer from endoscopic images, previous studies have not comprehensively reviewed the existing literature on this application. If DL algorithms are to be used in real-world clinical settings, their performance should first undergo the same degree of assessment as current practice to establish their acceptability. Therefore, we conducted a systematic review and meta-analysis of studies that applied DL to esophageal cancer diagnosis to determine its diagnostic performance.

Research Design
This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) statement [23]. The study protocol was not registered.

Search Methods for Identification of Studies
Electronic Database Search
With the assistance of experts in systematic reviews and meta-analysis, we searched PubMed, EMBASE, Scopus, and Web of Science for studies published between 1 January 2012 and 1 August 2022 that evaluated the diagnostic performance of DL models for esophageal cancer using endoscopic images. We used the following MeSH (Medical Subject Headings) terms and keywords: ("Deep learning" OR "computer-aided system" OR "convolutional neural network/s") AND ("esophageal cancer" OR "esophageal neoplasm" OR "esophageal adenocarcinoma" OR "Barrett's esophagus").
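The search combines a method concept (deep learning and its synonyms) with a disease concept, and the OR groups must be parenthesized because many search engines bind AND more tightly than OR. The sketch below shows one way to compose such a string reproducibly; `build_query` is a hypothetical helper, and database-specific syntax (PubMed field tags, truncation wildcards, Embase Emtree terms) is deliberately not modeled.

```python
def build_query(*concept_groups: list[str]) -> str:
    """Hypothetical helper: OR the synonyms within each concept, then AND the concepts.

    Each group is wrapped in parentheses so operator precedence is explicit.
    """
    blocks = []
    for terms in concept_groups:
        quoted = [f'"{t}"' for t in terms]           # exact-phrase quoting
        blocks.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(blocks)


method_terms = ["Deep learning", "computer-aided system", "convolutional neural network/s"]
disease_terms = [
    "esophageal cancer",
    "esophageal neoplasm",
    "esophageal adenocarcinoma",
    "Barrett's esophagus",
]

query = build_query(method_terms, disease_terms)
print(query)
```

Keeping each concept as a list makes it easy to add a synonym without disturbing the Boolean structure.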

Inclusion and Exclusion Criteria
Studies were included if they met all of the following criteria: (a) they evaluated the diagnostic test accuracy of DL for esophageal cancer using endoscopic images; (b) they reported sensitivity, specificity, and accuracy, or provided sufficient information to calculate these metrics; (c) they had a prospective or retrospective design; (d) they provided appropriate information on inclusion and exclusion criteria; and (e) they were published in English. Studies were excluded if any of the following criteria were met: (a) they were reviews, letters, or case reports; (b) they used the same dataset as another study (in which case only the most recent study was included); or (c) they did not report the number of patients or images. Two authors (M.M.I. and T.N.P.) independently assessed the eligibility of all retrieved studies.

Data Extraction
The same two authors carefully read all selected studies and extracted the following information using a standardized form: (1) study characteristics (authors, country of origin, year of publication, total number of endoscopic images, and study design); (2) demographic characteristics (gender and number of patients); (3) model characteristics (algorithms, data partition, and model description); and (4) results (sensitivity, specificity, area under the receiver operating characteristic curve, positive predictive value, and negative predictive value).

Quality Assessment
The same two authors also evaluated the methodological quality of the selected studies using the criteria of the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [24]. This tool is widely accepted for assessing the risk of bias and the applicability of diagnostic accuracy studies. QUADAS-2 comprises four main domains: (a) patient selection, (b) index test, (c) reference standard, and (d) flow and timing.

Statistical Analysis
The primary outcome of our meta-analysis was the diagnostic performance of DL for detecting esophageal cancer. We calculated the pooled sensitivity and specificity with 95% confidence intervals (CIs) using the bivariate random-effects model [25][26][27], and used the DerSimonian and Laird random-effects model to pool independent proportions and their differences [28]. A summary receiver operating characteristic (SROC) curve with a 95% confidence region was plotted to visualize the study findings. Heterogeneity was assessed using the inconsistency index (I²) and interpreted as follows: 0% to 25%, might not be important; 25% to 50%, low; 50% to 75%, moderate; and 75% to 100%, considerable [29,30]. We also calculated the positive likelihood ratio, negative likelihood ratio, and diagnostic odds ratio (Supplementary Materials S1) [21]. P values < 0.05 were considered statistically significant. All statistical analyses were performed using R (R Core Team and the R Foundation for Statistical Computing, version 4.2.1) and MedCalc (MedCalc Software Ltd, Ostend, Belgium).
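The DerSimonian and Laird approach can be illustrated for a single proportion such as per-study accuracy. The sketch below is not the software used in this analysis; it assumes proportions are pooled on the logit scale with a 0.5 continuity correction for empty cells (neither detail is stated in the text). It estimates the between-study variance τ² by the method of moments, reweights each study by 1/(vᵢ + τ²), and reports I² = max(0, (Q - df)/Q) × 100, labeled with the I² bands given in this section. The study counts are made up.

```python
import math


def expit(x: float) -> float:
    """Inverse of the logit transform."""
    return 1 / (1 + math.exp(-x))


def dl_pool_proportions(studies, z=1.96):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    studies: list of (events, total) pairs, e.g. correctly classified / total images.
    Returns the pooled proportion, its 95% CI, tau^2, and I^2 (%).
    """
    ys, vs = [], []
    for events, total in studies:
        # Assumed 0.5 continuity correction, applied only when a cell is empty
        c = 0.5 if events in (0, total) else 0.0
        a, b = events + c, total - events + c
        ys.append(math.log(a / b))   # logit of the study proportion
        vs.append(1 / a + 1 / b)     # approximate within-study variance of the logit

    w = [1 / v for v in vs]          # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(studies) - 1

    # Method-of-moments estimate of the between-study variance
    c_dl = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c_dl) if c_dl > 0 else 0.0

    w_re = [1 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "pooled": expit(y_re),
        "ci": (expit(y_re - z * se), expit(y_re + z * se)),
        "tau2": tau2,
        "I2": i2,
    }


def describe_i2(i2: float) -> str:
    """Label I^2 using the bands given in this section."""
    if i2 < 25:
        return "might not be important"
    if i2 < 50:
        return "low"
    if i2 < 75:
        return "moderate"
    return "considerable"


# Made-up counts for illustration only: correctly classified images / total images
example = [(460, 500), (182, 200), (935, 1000), (270, 300)]
result = dl_pool_proportions(example)
lo, hi = result["ci"]
print(f"pooled = {result['pooled']:.3f} (95% CI {lo:.3f}-{hi:.3f}), "
      f"I2 = {result['I2']:.1f}% ({describe_i2(result['I2'])})")
```

When all studies report the same proportion, Q is zero, τ² collapses to 0, and the random-effects estimate equals the fixed-effect one. Widely differing studies inflate τ² and widen the interval.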

Study Selection
A total of 2491 studies were retrieved in the initial search, of which 1375 duplicates were excluded. Of the 1116 nonduplicate studies, 1077 were excluded after their titles and abstracts were reviewed, and the remaining 39 studies underwent full-text evaluation. Eleven of these were excluded because they were reviews, did not use a DL model, or reported insufficient data. The remaining 28 studies were included in the meta-analysis [22,31-57] (Supplementary Materials Figure S1).

Study Characteristics
All studies included in this systematic review and meta-analysis are presented in Table 1. The selected studies were published between 2019 [57] and 2022 [31] and comprised a total of 703,006 endoscopic images; each study followed a standard protocol for detecting esophageal cancer. Images were captured using white-light imaging (WLI), narrow-band imaging (NBI), or blue-light imaging (BLI). Expert endoscopists checked the appropriateness of the images and labeled them as cancerous or normal. All but three studies used a retrospective design, and all studies trained and validated CNN models. Twenty-three studies were conducted on Asian populations and only five on Western populations. Esophageal squamous cell carcinoma (ESCC) was the primary target in eighteen studies and Barrett's esophagus (BE) in six, while four studies reported esophageal adenocarcinoma (EAC) together with ESCC as the outcome. Nine studies developed a DL model using WLI, six used NBI, and eleven used both WLI and NBI. Moreover, one study trained a DL model for the diagnosis of esophageal cancer using BLI, while others trained their DL models using volumetric laser endomicroscopy (VLE) [53] or endocytoscopic system (ECS) images [52].

Performance Comparison between DL and Endoscopists
Ten studies also compared the diagnostic performance of the DL model with that of endoscopists. The performance measures are presented in Figure 4. The sensitivity of the DL model for the diagnosis of early esophageal cancer exceeded that of endoscopists (92.87% vs. 80.43%), whereas its specificity was similar to that of endoscopists (74.37% vs. 76.11%).

Publication Bias
Deeks' funnel plot asymmetry test showed no evidence of publication bias (p = 0.15) (Figure 5).
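Deeks' test regresses each study's log diagnostic odds ratio on 1/√ESS, weighting by ESS, where the effective sample size is ESS = 4·n₁·n₂/(n₁ + n₂) for n₁ diseased and n₂ non-diseased cases. A slope significantly different from zero suggests funnel-plot asymmetry. The minimal, standard-library-only sketch below is not the implementation used for this analysis. It assumes each study is supplied as a hypothetical (TP, FP, FN, TN) tuple and applies a 0.5 continuity correction when any cell is empty. It returns the slope, its t statistic, and k - 2 degrees of freedom, leaving the p-value to a t-distribution routine such as `scipy.stats.t.sf`.

```python
import math


def wls_slope(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (b, standard error of b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ym - b * xm
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    s2 = rss / (len(x) - 2)          # residual variance with k - 2 degrees of freedom
    return b, math.sqrt(s2 / sxx)


def deeks_test(studies):
    """Deeks' asymmetry regression. studies: list of (tp, fp, fn, tn).

    Returns (slope, t statistic, degrees of freedom).
    """
    x, y, w = [], [], []
    for tp, fp, fn, tn in studies:
        if 0 in (tp, fp, fn, tn):    # continuity correction for empty cells
            tp, fp, fn, tn = (v + 0.5 for v in (tp, fp, fn, tn))
        n1, n2 = tp + fn, fp + tn    # diseased and non-diseased totals
        ess = 4 * n1 * n2 / (n1 + n2)
        y.append(math.log((tp * tn) / (fp * fn)))   # ln(DOR)
        x.append(1 / math.sqrt(ess))
        w.append(ess)
    b, se = wls_slope(x, y, w)
    return b, b / se, len(studies) - 2


# Made-up 2x2 counts (TP, FP, FN, TN) for four hypothetical studies
studies = [(45, 5, 5, 45), (90, 8, 10, 92), (30, 6, 4, 40), (120, 20, 15, 200)]
slope, t_stat, df = deeks_test(studies)
print(f"slope = {slope:.2f}, t = {t_stat:.2f}, df = {df}")
```

Weighting by ESS rather than by the inverse variance of ln(DOR) is what distinguishes Deeks' test from Egger-type tests, which tend to be misleading for diagnostic odds ratios.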

Discussion
We conducted a meta-analysis of 28 studies to evaluate the performance of DL models for the diagnosis of esophageal cancer. The overall pooled estimates showed that DL achieved high sensitivity, specificity, and AUROC for diagnosing esophageal cancer. DL models also performed well on external datasets, supporting their generalizability beyond the development setting. These findings suggest that accurate DL-based diagnosis may help physicians increase the early diagnosis of esophageal cancer and reduce mortality. Because identifying early esophageal cancer is challenging and currently depends on expert endoscopists, the high sensitivity and specificity of DL could support accurate detection, timely treatment, and improved patient outcomes.
Four previously published studies also assessed AI-assisted models for detecting esophageal cancer using endoscopic images [58][59][60][61]. Zhang et al. [58] included sixteen studies to provide evidence for using AI-assisted models to detect esophageal neoplasms. The pooled sensitivity, specificity, and AUROC of AI-assisted models for esophageal cancer detection were 0.94 (95% CI: 0.92-0.96), 0.85 (95% CI: 0.73-0.92), and 0.97 (95% CI: 0.95-0.98), respectively. They also reported that AI-based models outperformed endoscopists in pooled sensitivity (0.94 [95% CI: 0.84-0.98] vs. 0.82 [95% CI: 0.77-0.86]). Lui et al. [59] conducted a systematic review and meta-analysis to evaluate the diagnostic accuracy of AI models for gastric and esophageal neoplastic lesions and Helicobacter pylori status. Of the 23 studies included, only 10 evaluated early esophageal cancer detection. The pooled sensitivity, specificity, and AUROC for detecting squamous esophageal neoplasms were 0.75 (95% CI: 0.48-0.92), 0.92 (95% CI: 0.66-0.99), and 0.88 (95% CI: 0.82-0.96), respectively. Bang et al. [60] included 21 studies in their systematic review, 19 of which were included in a meta-analysis of the diagnostic test accuracy of deep learning or machine learning models for esophageal cancer. The pooled sensitivity, specificity, and AUROC of DL algorithms for the diagnosis of esophageal cancer were 0.94 (95% CI: 0.89-0.96), 0.88 (95% CI: 0.76-0.94), and 0.97 (95% CI: 0.95-0.99), respectively. Mohan et al. [61] performed a meta-analysis of the pooled performance of CNN-based AI in diagnosing gastrointestinal neoplasia from endoscopic images. Nineteen studies met all inclusion criteria for detecting gastrointestinal neoplasms; however, only five evaluated CNNs for diagnosing esophageal cancer. The pooled sensitivity, specificity, and accuracy of the CNN model for esophageal cancer diagnosis were 0.87 (95% CI: 0.69-0.95), 0.87 (95% CI: 0.74-0.94), and 0.87 (95% CI: 0.76-0.93), respectively.
The present study included more studies than these previous meta-analyses to summarize the available evidence on the accuracy of DL algorithms for diagnosing esophageal cancer. Moreover, it reported not only sensitivity and specificity but also accuracy and positive and negative predictive values, which are essential metrics for clinical decision-making. Our findings suggest that DL models could play a crucial role in diagnosing esophageal cancer once deployed in busy daily clinical practice.
The detection rate of esophageal cancer is relatively poor (more than 40% of patients are diagnosed at a late stage), and the 5-year survival rate is approximately 20% [62]; early diagnosis is therefore important for both clinicians and patients. Early diagnosis could assist clinicians in decision-making, improve patient management, and reduce healthcare costs. Endoscopy is one of the most reliable methods for early esophageal cancer screening because of its high diagnostic performance [63], although its diagnostic accuracy depends on several factors, such as image quality, instruments, and the endoscopist. Traditional statistical models are poorly suited to analyzing endoscopic images, whereas DL models have been applied to esophageal cancer diagnosis with strong performance. To the best of our knowledge, this is the first comprehensive systematic review and meta-analysis of the diagnostic performance of DL-based AI models for the diagnosis of esophageal cancer. Our findings indicate that the diagnostic performance of DL models using endoscopic images was clinically highly satisfactory in terms of sensitivity, specificity, and AUROC: DL achieved a pooled AUROC of 0.96, with a pooled sensitivity of 94% and a specificity of 92%. DL models also performed well in external validation, supporting their generalizability.
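To show how the pooled sensitivity and specificity relate to the likelihood ratios and diagnostic odds ratio listed among the statistical outputs, the sketch below applies the standard formulas LR+ = sens/(1 - spec), LR- = (1 - sens)/spec, and DOR = LR+/LR- to the pooled point estimates reported in the abstract (93.80% and 91.73%). This is only an approximation: the ratios in Supplementary Materials S1 come from the pooled model and may differ from values derived from point estimates. The function name `likelihood_ratios` is hypothetical.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float, float]:
    """Return (LR+, LR-, DOR) from sensitivity and specificity given as fractions."""
    if not (0 < sens < 1 and 0 < spec < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sens / (1 - spec)   # factor by which a positive result raises the odds of cancer
    lr_neg = (1 - sens) / spec   # factor by which a negative result lowers the odds of cancer
    return lr_pos, lr_neg, lr_pos / lr_neg


# Pooled point estimates from the abstract; ratios derived this way are approximate
lr_pos, lr_neg, dor = likelihood_ratios(0.9380, 0.9173)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}, DOR = {dor:.1f}")
```

With these inputs the ratios come out at roughly 11.3, 0.068, and 168. An LR+ above 10 and an LR- below 0.1 are conventionally read as strong evidence for ruling a diagnosis in or out.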
The early detection of esophageal cancer offers a higher chance of survival [64]. Upper gastrointestinal endoscopy is still considered the standard method for the diagnosis of esophageal cancer [65,66], and endoscopic techniques have made significant progress over recent decades [67][68][69]. WLI is the most widely used and most basic endoscopic modality for diagnosing esophageal cancer [70]. However, WLI is limited for detecting early esophageal cancer, and DL models developed on WLI images have performed less well. Therefore, NBI and other image-enhanced modalities have been increasingly used for the early diagnosis of esophageal cancer, and researchers now often combine WLI and NBI images to develop DL models for identifying early esophageal cancer. In our study, DL models developed with NBI images had significantly higher pooled sensitivity and specificity than those developed with WLI images.
This meta-analysis has several strengths and limitations. First, it is the first comprehensive study to summarize the performance of DL models for the diagnosis of esophageal cancer using endoscopic images. Second, it reports clinically important diagnostic metrics such as accuracy, positive predictive value, and negative predictive value, which may help physicians make appropriate decisions for patients at high risk of esophageal cancer. Nevertheless, this study has some limitations. First, most of the included studies were retrospective. Although external validation and prospective evaluation showed good performance, more studies are needed to evaluate DL models using prospectively collected data or in real time. Second, most studies (23 of 28) developed and tested their models in Asian populations; more data from other continents are therefore needed to validate the performance of current models. Finally, heterogeneity among the studies was high, although it may be partially explained by regional effects, image quality, image modality, and histological type.

Conclusions
This is the first comprehensive systematic review and meta-analysis to show that the DL model was able to diagnose early esophageal cancer with high sensitivity and specificity. The performance of the DL model for esophageal cancer diagnosis was higher for NBI than for WLI. Most of the studies were from Asia and used a retrospective design; therefore, more prospective evaluations with various populations are warranted in the future.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers14235996/s1, Figure S1: PRISMA flowchart for identifying and selecting the studies; Table S1: Evaluation metrics.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.