Artificial Intelligence in Gastric Cancer: Identifying Gastric Cancer Using Endoscopic Images with Convolutional Neural Network

Simple Summary

Gastric cancer (GC) is one of the most commonly diagnosed cancers and the fifth leading cause of cancer death globally. Previous studies reported that the detection rate of early gastric cancer (EGC) is low, and that the overall false-negative rate of esophagogastroduodenoscopy (EGD) is up to 25.8%, which often leads to inappropriate treatment. Accurate diagnosis of EGC can reduce unnecessary interventions and benefit treatment planning. Convolutional neural network (CNN) models have recently shown promising performance in analyzing medical images, including endoscopy. This study shows that an automated tool based on a CNN model could improve EGC diagnosis and treatment decisions.

Abstract

Gastric cancer (GC) is one of the most commonly diagnosed cancers and the fifth leading cause of cancer death globally. Identification of early gastric cancer (EGC) can ensure prompt treatment and substantially reduce mortality. We therefore conducted a systematic review and meta-analysis of the current literature to evaluate the performance of the CNN model in detecting EGC. We systematically searched online databases (e.g., PubMed, Embase, and Web of Science) for all relevant original studies on CNNs in EGC published between 1 January 2010 and 26 March 2021. The Quality Assessment of Diagnostic Accuracy Studies-2 tool was used to assess the risk of bias. Pooled sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, and diagnostic odds ratio were calculated, and a summary receiver operating characteristic (SROC) curve was plotted. Of the 171 studies retrieved, 15 met the inclusion criteria. The CNN model achieved an SROC of 0.95 for the diagnosis of EGC, with a corresponding sensitivity of 0.89 (0.88–0.89) and specificity of 0.89 (0.89–0.90). Pooled sensitivity and specificity for expert endoscopists were 0.77 (0.76–0.78) and 0.92 (0.91–0.93), respectively.
The overall SROC was 0.95 for the CNN model versus 0.90 for expert endoscopists. The findings of this comprehensive study show that the CNN model exhibited performance comparable to that of endoscopists in diagnosing EGC from digital endoscopy images. Given its scalability, the CNN model could help endoscopists correctly stratify EGC patients and reduce workload.


Introduction
Gastric cancer (GC) is the fifth most commonly diagnosed cancer and the third leading cause of cancer death worldwide [1]. The overall incidence and global burden of GC are growing rapidly, especially in East Asian countries such as Japan and Korea [2]. The majority of patients remain asymptomatic, and more than 80% are diagnosed with GC at an advanced stage [3]. The five-year overall survival rate of GC patients at pathological stage IA is higher than 90%, whereas it is below 20% at stage IV [4,5]. Therefore, timely identification and referral to gastroenterologists could significantly reduce mortality and disease complications. A recent study also suggests that stratification of GC at an early stage can be clinically efficacious, although it is challenging and often overlooked [6].
Importantly, previous studies showed that the detection rate of early gastric cancer (EGC) is low [7,8], and the overall false-negative rate is up to 25.8% [9][10][11][12]. Endoscopy is now a widely used technique for distinguishing EGC from other gastric conditions (e.g., Helicobacter pylori infection and gastritis) [13]. Several reliable imaging modalities, namely white light imaging (WLI) and narrow-band imaging (NBI) combined with magnifying endoscopy, have been used to clearly visualize and stratify gastric abnormalities such as cancers [14][15][16] and intestinal metaplasia [17]. A meta-analysis of 22 studies reported that the rate of GC missed on endoscopy is only 9.4% [18]. However, grading of endoscopic images is subjective, time-consuming, and labor-intensive, and performance varies among endoscopists, especially novices [19]. Automated grading of EGC would have enormous clinical benefits, such as increasing the efficiency, accessibility, coverage, and productivity of existing resources.
Artificial intelligence (AI) has gained tremendous global attention over the last decade in various healthcare domains, including gastroenterology. AI models have shown robust performance in the diagnosis of gastroesophageal reflux disease [20] and the prediction of colorectal [21] and esophageal squamous cell carcinoma [22]. AI is a broad notion that includes machine learning (ML) and deep learning (DL) (Figure 1). AI denotes computerized techniques that perform complex tasks normally requiring "human judgement/cognition". ML is a branch of AI that allows a computer to become more accurate at prediction, identification, and stratification tasks without explicit programming. ML algorithms nonetheless have several limitations, primarily in image recognition. DL, a subset of ML, has become the de facto standard for recognizing medical images. Recently, CNNs have been applied to detect EGC in endoscopic images, helping physicians to reduce misdiagnosis and make effective clinical decisions. The primary benefits of the CNN model in gastroenterology are earlier detection, more accurate diagnosis, and more timely treatment. A CNN-based automated system could detect EGC faster than endoscopists, with positive effects on clinical workflow and the quality of patient care. However, the overall clinical applicability and reliability of the CNN model for EGC are still debated due to a lack of external validation and comparison with the performance of endoscopists. To our knowledge, no study has summarized the recent evidence on its effectiveness.
Therefore, the aims of this meta-analysis were to critically review the relevant articles of the CNN model for the diagnosis of EGC, evaluate the diagnostic performance in comparison with that of endoscopists, analyze the methodological quality, and explore the applicability of the CNN model in real-world clinical settings.

Study Protocol
We conducted a meta-analysis of diagnostic test accuracy (DTA) studies. The methodology follows the Cochrane Handbook for DTA Reviews, and the Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) was used to report our findings [23].

Electronic Databases Search
We conducted a systematic search of electronic databases such as PubMed, Embase, Scopus, and Web of Science to identify all eligible articles published between January 1, 2010, and March 1, 2021. The following keywords were used: (1) "Deep learning" OR "Convolutional neural network" OR "CNN" OR "Artificial intelligence" OR "Automated technique", (2) "Early gastric cancer", (3) 1 AND 2. The reference list of potential articles was screened for other relevant studies.

Eligibility Criteria
We considered all studies on the diagnostic accuracy of the CNN model for detecting EGC in any setting. Original research studies were included if they were published in English and used a prospective design, a retrospective design, or a secondary analysis of a randomized controlled trial. We excluded reviews, letters to editors, and short reports. We also excluded studies that reported only on invasion of GC or lacked DTA measures, namely sensitivity and specificity. Two authors (M.M.I., T.N.P.) independently reviewed each study for eligibility and data extraction. Any disagreement during study screening was resolved through discussion between the main investigators.

Data Extraction
The same two authors extracted the following data: (a) study characteristics (first and last author names, publication year, country, study design, sample size, total number of endoscopy images, and clinical setting), (b) patient characteristics (inclusion and exclusion criteria, demographics), (c) index test (methods, performer of endoscopy), (d) reference standard (image modality, guidelines), and (e) diagnostic accuracy parameters (accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve).

Quality Assessment and Risk of Bias
The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was used to assess the risk of bias of the included studies [24]. The QUADAS-2 tool covers two domains, namely risk of bias (patient selection, index test, reference standard, and flow and timing) and applicability concerns (patient selection, index test, and reference standard). The risk of bias was categorized into three groups: low, uncertain, and high.

Statistical Analysis
We followed the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy methodology to conduct all statistical analyses. Pooled sensitivity and specificity with corresponding 95% confidence intervals (CIs) were calculated using a random-effects model, and the summary receiver operating characteristic (SROC) curve was computed by bivariate analysis. We also calculated the positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio, and diagnostic odds ratio. The SROC value was considered excellent (≥0.90), good (0.80–0.89), fair (0.70–0.79), poor (0.60–0.69), or worse (<0.50). Statistical heterogeneity among the studies was assessed using the I² value, classified as very low (0–25%), low (25–50%), medium (50–75%), or high (>75%) heterogeneity [25].
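All of these DTA metrics derive from a single 2×2 confusion matrix. A minimal sketch in Python (the counts below are invented for illustration and are not drawn from the included studies; they are chosen so that sensitivity and specificity equal the 0.89 pooled values reported for the CNN model):

```python
def dta_metrics(tp, fp, fn, tn):
    """Diagnostic test accuracy metrics from one 2x2 confusion matrix."""
    sens = tp / (tp + fn)            # sensitivity (true-positive rate)
    spec = tn / (tn + fp)            # specificity (true-negative rate)
    ppv = tp / (tp + fp)             # positive predictive value
    npv = tn / (tn + fn)             # negative predictive value
    lr_pos = sens / (1 - spec)       # positive likelihood ratio
    lr_neg = (1 - sens) / spec       # negative likelihood ratio
    dor = lr_pos / lr_neg            # diagnostic odds ratio = (tp*tn)/(fp*fn)
    return {"sens": sens, "spec": spec, "ppv": ppv, "npv": npv,
            "lr+": lr_pos, "lr-": lr_neg, "dor": dor}

# Hypothetical test set of 200 images (100 EGC, 100 non-EGC)
m = dta_metrics(tp=89, fp=11, fn=11, tn=89)
# sens = spec = 0.89; dor = (89*89)/(11*11) ≈ 65.5
```

The diagnostic odds ratio condenses the whole table into one number: how many times greater the odds of a positive result are in diseased versus non-diseased patients.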

Study Selection
The initial literature search of the electronic databases yielded 171 articles. A total of 101 articles were excluded as duplicates. After reviewing titles and abstracts, we excluded a further 47 articles; therefore, 23 articles proceeded to full-text review. We then screened all reference lists for further relevant articles, but no additional study was found. Based on the full-text review, we excluded eight more studies that did not meet our inclusion criteria. Finally, 15 studies met all inclusion criteria [6,[26][27][28][29][30][31][32][33][34][35][36][37][38][39]. The flow diagram of the systematic search is presented in Figure 2.

Table 1 shows the baseline characteristics of the included studies. Among the 15 included studies, 7 were published in China, 6 in Japan, and 2 in Korea. All included studies collected data retrospectively and developed their models for the diagnosis of EGC. All studies used CNN models for training and validation; GoogLeNet, Inception-v3, VGG-16, Inception-ResNet-v2, and ResNet34 were the most widely used architectures (Table S1). The numbers of patients and images ranged from 69 to 2639 and from 926 to 145,240, respectively. The gold standards for identifying EGC were the World Health Organization (WHO) guidelines, the Japanese classification, and histopathology, as shown in Table 2. White light imaging (WLI), magnifying endoscopy with narrow-band imaging (ME-NBI), and chromoendoscopy images were used to develop and evaluate the performance of the CNN model.

Performance Evaluation in Different Image Modalities
Eight studies used ME-NBI images to develop a CNN model for predicting EGC (Table 3). The pooled sensitivity and specificity of the CNN model for the detection of EGC were both 0.95. The pooled sensitivity and specificity for WLI images (4 studies) were 0.80 and 0.95, respectively. Performance was weaker when mixed image types were used to detect EGC: the pooled sensitivity, specificity, PPV, and NPV were 0.85, 0.89, 0.63, and 0.96, respectively.
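Pooled estimates such as these come from random-effects pooling of per-study results. As a simplified univariate illustration (the analysis itself used a bivariate model), the sketch below applies the DerSimonian–Laird estimator to hypothetical logit-transformed sensitivities and also yields the I² heterogeneity statistic described in the Statistical Analysis section; all numbers are invented:

```python
def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-study effect sizes
    (e.g., logit-transformed sensitivities) with known within-study variances.
    Returns (pooled effect, tau^2 between-study variance, I^2 in percent)."""
    k = len(effects)
    w = [1.0 / v for v in variances]            # inverse-variance (fixed) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q statistic and I^2 heterogeneity
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Between-study variance tau^2 (DerSimonian-Laird estimator)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights and pooled estimate
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2, i2

# Three hypothetical studies on the logit scale
pooled, tau2, i2 = pool_random_effects([2.1, 2.9, 1.4], [0.05, 0.04, 0.06])
```

On the logit scale, a study detecting x of n cancers has effect log(x / (n - x)) with approximate variance 1/x + 1/(n - x); the pooled logit is back-transformed with the inverse logit to recover a pooled sensitivity.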

Deep Learning versus Endoscopists
Five studies compared the performance of the CNN model in detecting EGC with that of a total of 51 expert endoscopists (more than 10 years of working experience). Their pooled sensitivity, specificity, PPV, and NPV were 0.77, 0.92, 0.80, and 0.90, respectively, and their pooled SROC for detecting EGC was 0.90. Five studies also compared the CNN model with 47 senior endoscopists (5–10 years of working experience), whose pooled sensitivity, specificity, PPV, and NPV were 0.73, 0.95, 0.89, and 0.84, respectively, with a pooled SROC of 0.92. Moreover, the pooled sensitivity, specificity, PPV, and NPV of junior endoscopists were 0.69, 0.80, 0.78, and 0.71, respectively (Table 4).

Quality Assessment
In this study, the risk of bias was assessed with the QUADAS-2 tool (Table S2). The risk of bias for patient selection, index test, and reference standard was low, while all studies had an unclear risk of bias for flow and timing. Regarding applicability, all studies raised low concerns for patient selection, index test, and reference standard.

Main Findings
This comprehensive study shows the effectiveness of the CNN model in the automatic diagnosis of EGC from endoscopic digital images. The key findings are that (1) the CNN model can diagnose EGC with performance comparable to or better than that of expert endoscopists, and (2) the CNN model may support existing screening programs with less human effort, reduce misclassification, and assist endoscopists when needed.

Clinical Implications
The number of GC cases and deaths has increased globally. The burden of GC is disproportionately high in developing countries (approximately 70% of cases), and nearly 50% of GC occurs in East Asian countries such as China, Korea, Japan, and Taiwan [40,41]. Previous studies reported that earlier identification and treatment could reduce the overall morbidity and mortality of GC [19,42]. Patients with gastrointestinal disorders such as Helicobacter pylori infection, gastritis, and intestinal metaplasia should be screened for GC at least annually to identify high-risk patients. In practice, the screening strategy relies on visual inspection of the gastric mucosa [43]; gastroenterologists use an endoscope to collect samples from the inner cavity for histopathological evaluation [44]. Endoscopy is considered the standard procedure for the diagnosis of EGC, and its detection rate is higher than that of other screening methods such as UGI series, serum pepsinogen testing, and H. pylori serology [45]. However, endoscopic screening has several limitations, and it requires referral to a gastroenterologist. Patients do not always visit expert gastroenterologists because of logistical barriers, cost, and the limited availability of experts in rural areas [46].
Moreover, manual inspection of endoscopy images for gastric abnormalities is time-consuming, and detection performance depends on the skill of the endoscopist. Previous studies reported that manual inspection increases the false detection rate, especially when the number of patients to be screened is high [47,48]. Our findings demonstrate that the CNN model can improve the detection of EGC, with performance exceeding that of endoscopists. Tang et al. [35] reported that the detection performance for EGC is even higher when endoscopists use the CNN model (Table 4). Obtaining high-quality images to detect EGC is difficult, especially for inexperienced endoscopists. Different imaging techniques have been used to detect gastric tissue abnormalities; however, CNN models trained on the conventional technique, white light endoscopy (WLE), performed worse than those trained on NBI, a newer imaging technique. A previous study noted that the diagnostic accuracy of WLE for EGC is low for flat lesions and minute carcinomas [49], whereas NBI visualizes both the superficial structures and the microvascular architecture of lesions [50,51]. The performance of the CNN was even lower when a mixture of WLI, ME-NBI, and chromoendoscopy images was used to train and test the model.
The findings of our study suggest that the CNN model is clinically effective in detecting EGC. Applying the CNN model to correctly diagnose EGC could provide an alternative route for EGC screening, especially in areas where skilled endoscopists are not always available. In the future, physicians may cooperate with a CNN-based automated system, which would help to increase work efficiency and reduce false detections (Figure 5).

Strengths and Limitations
Our study has several strengths. First, this is the most comprehensive study evaluating the performance of the CNN model in correctly diagnosing EGC. Second, it compares the performance of the CNN model with that of expert, senior, and junior endoscopists, which has great clinical value. Third, we compared the performance of the CNN model across different image modalities. Finally, we calculated overall PPV and NPV values, which may support effective clinical decisions on implementing the CNN model in real-world settings. However, our study also has several limitations. First, our findings are mainly based on retrospective data, although several studies included prospective evaluation; prospective studies are needed to establish the real-world performance of the CNN model. Second, all studies used high-quality images to develop and validate the CNN model, so we cannot assess its performance on lower-quality images. Finally, high heterogeneity exists among the included studies, possibly due to (a) the varied methodologies and training algorithms, (b) differing sample sizes, and (c) the variability of endoscopic images (WLI, NBI, and chromoendoscopy). It could also reflect differences in how strictly experts at the various study centers judged GC patients positive. The findings should therefore be interpreted with caution. Despite these limitations, efforts were made to select high-quality studies, and the current meta-analysis demonstrates the potential of the DL model for detecting GC. These findings warrant further validation in larger prospective studies with different populations.

Conclusions
This study provides a summary of the current state-of-the-art CNN models for the diagnosis of EGC using endoscopic images. The findings of this comprehensive study show that the CNN model had high sensitivity and specificity for stratifying EGC and outperformed endoscopists. A fully automated tool based on a CNN could facilitate EGC screening in a cost-effective and time-efficient manner.
Despite the strong performance of the CNN model, several challenges remain before these findings can be applied in real-world clinical practice. First, the CNN model is often referred to as a "black box" due to the lack of interpretability of its findings [52][53][54][55]; good accuracy alone is therefore not sufficient. Second, comparing CNN algorithms across studies is challenging because different methodologies were applied to different populations with different sample sizes. Third, larger samples drawn from various populations for the development set are likely to improve performance, reduce the risk of bias, and increase the applicability of DL models in real-world clinical settings. Finally, generalizability is another key challenge because the performance of the CNN model can vary when it is tested on unseen datasets, especially those based on low-quality images. Therefore, further evaluation is needed before widely deploying CNN-based tools in real-world clinical practice.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/cancers13215253/s1, Table S1: Description of performance metrics, data and model description, Table S2: Quality Assessment of Diagnostic Accuracy Studies-2 for Included Studies.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest associated with the contents of this article.