Review

Generative Artificial Intelligence in the Early Diagnosis of Gastrointestinal Disease

Kwang-Sig Lee 1,* and Eun Sun Kim 2,*
1 AI Center, Korea University Anam Hospital, Seoul 02841, Republic of Korea
2 Department of Gastroenterology, Korea University Anam Hospital, Seoul 02841, Republic of Korea
* Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(23), 11219; https://doi.org/10.3390/app142311219
Submission received: 1 November 2024 / Revised: 23 November 2024 / Accepted: 28 November 2024 / Published: 2 December 2024
(This article belongs to the Special Issue Novel Approaches for Machine Learning in Healthcare Applications)

Abstract

This study reviews the recent progress of generative artificial intelligence for gastrointestinal disease (GID) from detection to diagnosis. The source of data was 16 original studies in PubMed. The search terms were ((gastro* [title]) or (endo* [title])) and ((GAN [title/abstract]) or (transformer [title/abstract])). The eligibility criteria were as follows: (1) the dependent variable of gastrointestinal disease; (2) the interventions of generative adversarial network (GAN) and/or transformer for classification, detection and/or segmentation; (3) the outcomes of accuracy, intersection over union (IOU), structural similarity and/or Dice; (4) the publication period of 2021–2023; and (5) the publication language of English. Based on the results of this study, different generative artificial intelligence methods would be appropriate for different tasks for the early diagnosis of gastrointestinal disease: the patch GAN performed best for classification (accuracy 91.9%), the bi-directional cycle GAN for data generation (structural similarity 98.8%) and the semi-supervised GAN for segmentation (Dice 89.4%). The reported GAN performance indicators varied within 87.1–91.9% for accuracy, 83.0–98.8% for structural similarity and 86.6–89.4% for Dice. Likewise, the vision transformer performed best for classification (accuracy 96.9%) and the multi-modal transformer for detection (IOU 79.5%) and segmentation (Dice 89.5%). The reported transformer performance measures varied within 85.7–96.9% for accuracy, 79.5% for IOU and 77.8–89.5% for Dice. Synthesizing different kinds of generative artificial intelligence for different kinds of GID data would further the horizon of research on this topic. In conclusion, generative artificial intelligence provides an effective, non-invasive decision support system for the early diagnosis of gastrointestinal disease from detection to diagnosis.

1. Introduction

Gastrointestinal disease (GID), “the disease of the gastrointestinal tract” [1], is a main contributor to the global disease burden [1,2,3,4,5,6]. GID causes 8 million deaths worldwide each year [2] and cost the United States 120 billion dollars in 2018 [3]. The risk factors of GID include unhealthy behavioral patterns, abnormal antacid/anti-diarrheal medication, bad bowel habits and pregnancy [6]. On the other hand, artificial intelligence and machine learning have gained great attention on a global level. Artificial intelligence can be defined as “the capability of a machine to imitate intelligent human behavior” (the Merriam–Webster dictionary). As a branch of artificial intelligence, machine learning can be defined as “extracting knowledge from large amounts of data” [7]. Popular machine learning approaches are the artificial/deep neural network, the decision tree and the random forest (see [7] for a detailed explanation). Specifically, a deep neural network is an artificial neural network with many intermediate layers, e.g., 5, 10 or even 1000 [8]. Conventional studies consider a limited scope of independent variables for the early diagnosis of disease, employing logistic regression with the unrealistic assumption of ceteris paribus, i.e., “all the other variables staying constant”. In this context, emerging literature utilizes artificial intelligence for the early diagnosis of GID, including classification [9], detection [10,11], segmentation [12] and explainable artificial intelligence [13,14,15].
In particular, generative artificial intelligence has garnered great attention in the past decade [16,17,18,19,20]. Generative artificial intelligence, which can be defined as “artificial intelligence generating image or text data”, includes generative adversarial networks (GANs) [16] and transformers [17,18,19,20]. The GAN generates image data by training the generative model against the discriminative model (Figure 1). The generative model tries to generate fake images from random noise so that it can fool the discriminative model. The discriminative model tries to discriminate fake images from real images (input images) so that it can outwit the generative model [16]. The original transformer, which consists of the encoder and the decoder, classifies or generates text data by using a combination of positional information (positional vectors), context information (embedding vectors) and attention mechanisms. Here, the encoder covers classification tasks while the decoder takes generation tasks [17]. Bidirectional encoder representations from transformers (BERT) inherit only the encoder of the transformer, with more focus on classification tasks [18], whereas generative pretrained transformers (GPT) take only the decoder, with more emphasis on generation tasks [19]. The vision transformer classifies or generates image data by considering image patches just as the original transformer considers text tokens [20]. To the best of our knowledge, no review is available on the recent development of generative artificial intelligence for GID from detection to diagnosis. In this context, this study reviews the recent progress of GANs and transformers for the classification, detection, segmentation and generation of GID images.
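To make the adversarial loop of Figure 1 concrete, a minimal training sketch in PyTorch is given below. The network sizes, the 64-dimensional noise vector and the optimizer settings are illustrative assumptions of ours, not details taken from the reviewed studies.

```python
# Minimal GAN training sketch (illustrative assumption: tiny fully connected
# networks over flattened 28x28 images; not the architecture of any cited study).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real_images):                     # real_images: (batch, 784)
    batch = real_images.size(0)
    real, fake = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Discriminative model: learn to separate real images from generated ones.
    fake_images = G(torch.randn(batch, 64)).detach()   # freeze G for this step
    loss_d = bce(D(real_images), real) + bce(D(fake_images), fake)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generative model: produce images that D labels as real ("fool" D).
    fake_images = G(torch.randn(batch, 64))
    loss_g = bce(D(fake_images), real)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```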

2. Methods

Figure 2 shows the flow diagram of this study as a modified version of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses. The source of data was 16 original studies in PubMed [21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36]. The search terms were ((gastro* [title]) or (endo* [title])) and ((GAN [title/abstract]) or (transformer [title/abstract])). The eligibility criteria were as follows: (1) the participants with the dependent variable of gastrointestinal disease; (2) the interventions/comparisons of generative adversarial network (GAN) and/or transformer for classification, detection and/or segmentation; (3) the outcomes of accuracy, intersection over union (IOU), structural similarity and/or Dice; (4) the publication period of 2021–2023; and (5) the publication language of English. Opinions, reports and reviews were excluded (N = 14). Here, accuracy as an evaluation measure of classification denotes the ratio of correct predictions among all observations, whereas intersection over union as an evaluation indicator of detection represents the ratio of overlap over union for predicted and observed objects. Structural similarity as an evaluation measure of data generation denotes similarity in terms of luminance, contrast and structure [37], while Dice as an evaluation indicator of segmentation is the ratio of (2 × overlap) over the summation of predicted and observed objects. In other words, the Dice numerator (or denominator) is the IOU numerator (or denominator) increased by the overlap of predicted and observed objects.
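For concreteness, the evaluation measures defined above can be written as short functions. The following is a toy sketch of ours over binary NumPy masks (assumed non-empty), written for this review rather than taken from any cited study; note how the Dice numerator and denominator each equal the corresponding IOU term increased by the overlap.

```python
# Toy implementations of the review's evaluation measures over binary masks.
import numpy as np

def accuracy(y_true, y_pred):
    return np.mean(y_true == y_pred)               # correct / all observations

def iou(mask_true, mask_pred):
    overlap = np.logical_and(mask_true, mask_pred).sum()
    union = np.logical_or(mask_true, mask_pred).sum()
    return overlap / union                          # overlap / union

def dice(mask_true, mask_pred):
    overlap = np.logical_and(mask_true, mask_pred).sum()
    total = mask_true.sum() + mask_pred.sum()       # equals union + overlap
    return 2 * overlap / total                      # IOU terms, each + overlap
```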

3. Results

3.1. Summary

The summary of the review is shown in Table 1 (or Table 2) for augmentation, classification, detection, generation and segmentation based on the GAN (or transformer). The tables have seven summary measures: (1) sample size (participants); (2–4) deep learning title, innovation and baseline (interventions and comparisons); (5–6) innovation vs. baseline performance outcomes; and (7) dependent variable (participants) for classification, detection and/or segmentation. Based on the results of this study, different generative artificial intelligence methods would be appropriate for different tasks for the early diagnosis of gastrointestinal disease: the patch GAN performed best for classification (accuracy 91.9%), the bi-directional cycle GAN for data generation (structural similarity 98.8%) and the semi-supervised GAN for segmentation (Dice 89.4%). The reported GAN performance indicators varied within 87.1–91.9% for accuracy, 83.0–98.8% for structural similarity and 86.6–89.4% for Dice. Likewise, the vision transformer performed best for classification (accuracy 96.9%) and the multi-modal transformer for detection (IOU 79.5%) and segmentation (Dice 89.5%). The reported transformer performance measures varied within 85.7–96.9% for accuracy, 79.5% for IOU and 77.8–89.5% for Dice. However, artificial intelligence is a data-driven method and more study with more external data is needed for greater external validity.

3.2. GAN

This section summarizes original studies with augmentation, classification, detection, generation and segmentation based on the GAN. As reported above, different GANs would be appropriate for different tasks for the early diagnosis of GID, i.e., the patch GAN for classification (accuracy 91.9%) [21], the bi-directional cycle GAN for data generation (structural similarity 98.8%) [30] and the semi-supervised GAN for segmentation (Dice 89.4%) [27]. Here, the GAN improves the performance of a traditional classification, detection or segmentation model by (1) augmenting data or (2) presenting a feedback loop between the generative model and the discriminative model. Recall that the GAN generates image data by training the generative model against the discriminative model. The generative model strives to generate fake images from random noise so as to fool the discriminative model. The discriminative model strives to discriminate fake images from real images (input images) so as to outwit the generative model [16]. The original GAN maps an input image into a single probability value, while the patch GAN maps an input image into multiple probability values over multiple patches [21]. The bi-directional cycle GAN uses both of the following mappings: (1) mapping a blurred input into a single probability value for a clear sample; and (2) mapping a clear input into a single probability value for a blurred sample. The unidirectional cycle GAN, on the other hand, employs only one of the two mappings [30]. The semi-supervised GAN uses both unlabeled and labeled data to improve its performance [27].
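To visualize how a patch-GAN-style discriminative model differs from the original GAN's, a minimal PyTorch sketch follows: a small convolutional network that maps an input image to a grid of per-patch probabilities rather than one probability per image. The input resolution, channel widths and layer count are illustrative assumptions of ours, not the architecture of [21].

```python
# Patch-GAN-style discriminator sketch: one probability per receptive patch.
import torch.nn as nn

patch_discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(128, 1, kernel_size=4, stride=1, padding=1), nn.Sigmoid(),
)   # input (N, 3, 256, 256) -> output (N, 1, 63, 63): a grid of patch probabilities
```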
The aim of a recent study with the patch GAN was to develop and validate deep learning classification models for gastric cancer [21]. Data came from 630 images at a general hospital. Of these, 100 images were set aside as the test set; the remaining 530 images were augmented to 9360 images, which were split into training and validation sets with a 95:5 ratio (8892:468 images). A major criterion for the test of the trained and validated models was accuracy. The patch GAN and the endoscopist (baseline) were trained, validated, tested and compared. Based on the results of this study, the accuracy of the patch GAN was higher than that of the endoscopist: 91.9% vs. 83.6%. Another recent study used 1290 images and the bi-directional cycle GAN for the generation of GID images [30]. The bi-directional cycle GAN and its unidirectional counterpart (baseline) were trained, validated, tested and compared. According to the findings of this study, the structural similarity of the bi-directional cycle GAN was higher than that of its unidirectional counterpart: 98.8% vs. 96.8%. The purpose of another recent study with the semi-supervised GAN was to develop and validate deep learning segmentation models for GID [27]. The source of data was 4880 images at a general hospital and from public sources (e.g., CVC-ClinicDB, ETIS-LaribPolypDB, Kvasir-SEG). A major criterion for the test of the trained and validated models was Dice. The semi-supervised GAN and its supervised counterpart (baseline) were trained, validated, tested and compared. The Dice score of the semi-supervised GAN was found to be higher than that of its supervised counterpart: 89.4% vs. 82.1%.

3.3. Transformer

This section summarizes original studies with augmentation, classification, detection, generation and segmentation based on the transformer. As stated above, different transformers would be optimal for different tasks for the early diagnosis of GID, i.e., the vision transformer for classification (accuracy 96.9%) [26], the multi-modal transformer for detection (IOU 79.5%) [29] and the multi-modal transformer for segmentation (Dice 89.5%) [35]. Here, the original transformer, which consists of the encoder and the decoder, generates or classifies text data by using a combination of positional information (positional vectors), context information (embedding vectors) and attention mechanisms [17]. The vision transformer generates or classifies image data by considering image patches just as the original transformer considers text tokens [20]. The multi-modal transformer accepts both text and image data [29,35].
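The following sketch illustrates how a vision transformer turns an image into "tokens": the image is cut into fixed-size patches, and each patch is flattened and linearly embedded, just as text tokens are embedded in the original transformer. The patch size, input resolution and embedding width are illustrative assumptions of ours.

```python
# Vision transformer patch tokenization sketch (illustrative sizes).
import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)              # (batch, channels, H, W)
patch = 16
tokens = image.unfold(2, patch, patch).unfold(3, patch, patch)  # 14 x 14 patches
tokens = tokens.reshape(1, 3, -1, patch, patch).permute(0, 2, 1, 3, 4)
tokens = tokens.flatten(2)                       # (1, 196, 768): 196 patch "tokens"
embed = nn.Linear(3 * patch * patch, 512)        # patch embedding, like word embedding
token_embeddings = embed(tokens)                 # (1, 196, 512) -> fed to the encoder
```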
A recent study with the vision transformer (i.e., TransMT-Net) aimed to develop and validate deep learning classification models for GID [26]. A major criterion for the test of the trained models was accuracy. The vision transformer and AlexNet (baseline) were trained, tested and compared. Based on the results of this study, the accuracy of the vision transformer was higher than that of AlexNet: 96.9% vs. 95.7%. Another recent study used 1886 images and the multi-modal transformer (EGCCap) for the detection of gastric cancer [29]. The multi-modal transformer and the endoscopist (baseline) were trained, validated, tested and compared. According to the findings of this study, the IOU of the multi-modal transformer was higher than that of the endoscopist: 79.5% vs. 74.9%. Another recent study with the multi-modal transformer (BiFTransNet) strove to develop and validate deep learning segmentation models for gastrointestinal cancer [35]. The source of data was 41,987 images from public sources (UW-Madison Gastrointestinal Segmentation, Synapse Multi-organ Segmentation). A major criterion for the test of the trained and validated models was Dice. The multi-modal transformer and U-Net (baseline) were trained, validated, tested and compared. The Dice score of the multi-modal transformer was found to be higher than that of U-Net: 89.5% vs. 88.3%.

4. Discussion

4.1. Evaluation Measure of Data Generation

Three suggestions for this line of research are presented here. Firstly, structural similarity was used as the evaluation measure of generative artificial intelligence in the task of data generation. However, it needs to be noted that structural similarity is only one of the twenty-four evaluation indicators summarized in Table 3 [37]. A brief but comprehensive overview of these indicators is presented here, given that such an overview is rare in the existing literature on this topic. Average likelihood is the average likelihood of the model distribution given generated sample data. The coverage metric is the probability mass of generated sample data covered by the model distribution. The Inception score is the quality and overall diversity of generated sample data. Here, quality denotes how easily Inception [38] can classify generated sample data. The modified Inception score is the quality and within-class diversity of generated sample data, i.e., diversity calculated per class category and then averaged over all class categories. The mode score is the Inception score with the consideration of the prior label distribution over sample data, whereas the AM score is the Inception score with the consideration of the training vs. test label distribution over sample data. The Fréchet Inception distance (or the Wasserstein critic) is the Gaussian (or Wasserstein) distance between the generated sample data and real data distributions.
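As a worked example of one of these indicators, the Fréchet Inception distance compares Gaussian summaries of Inception features computed for real and generated data:

```latex
% Fréchet Inception distance between real data (mean \mu_r, covariance \Sigma_r)
% and generated data (\mu_g, \Sigma_g) in Inception feature space:
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 (\Sigma_r \Sigma_g)^{1/2} \right)
```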
Likewise, the maximum mean discrepancy is the discrepancy between model distributions given generated sample data. The birthday paradox test is the support size of a discrete distribution. C2ST (classifier two-sample test) is a classifier of whether two sample data are drawn from the same distribution, the classification metric is a classifier of how well extracted features predict class labels and boundary distortion is a classifier of how much diversity loss boundary distortion causes. The number of statistically different bins denotes the diversity of generated sample data. Image retrieval measures the quality of generated sample data through the nearest neighbors retrieved from the test set. The geometry score summarizes the geometric properties of generated sample data vs. real data. The reconstruction error is the reconstruction error between generated sample data and real data. Structural similarity is similarity in terms of luminance, contrast and structure, while low-level image statistics denote similarity in terms of four low-level image statistics such as the mean power spectrum. Finally, the F1 scores are the F1 scores of two classifiers, one on generated sample data and the other on real data. Here, the F1 score represents the harmonic mean of precision and recall [37].
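Structural similarity, the generation measure reported by the GAN studies reviewed here, admits a compact closed form; for two image windows x and y, it combines the three components named above:

```latex
% Structural similarity of image windows x and y: luminance is compared through
% the means \mu, contrast through the variances \sigma^2 and structure through
% the covariance \sigma_{xy}; C_1 and C_2 are small stabilizing constants.
\mathrm{SSIM}(x, y) =
  \frac{(2 \mu_x \mu_y + C_1)(2 \sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
```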
Table 3 presents a meta-analysis comparing the twenty-four evaluation measures in terms of three characteristics (i.e., discriminability, detection of overfitting and sensitivity to distortions) [37]. For instance, four common evaluation measures of generative artificial intelligence, the IS, the modified IS, the mode score and the AM score, are known to be highly discriminable, moderately capable of detecting overfitting and moderately sensitive to distortions. Another popular evaluation indicator, structural similarity, is reported to have low discriminability, moderate capability of detecting overfitting and high sensitivity to distortions. This meta-analysis reveals that no measure satisfies all three requirements of high discriminability, high capability of detecting overfitting and low sensitivity to distortions. In other words, it is quite challenging to make a rigorous evaluation of generative artificial intelligence for the task of data generation. Evaluating generative artificial intelligence in terms of an appropriate set of data generation indicators is expected to make a great contribution to this line of research.

4.2. Generative Artificial Intelligence with Reinforcement Learning

Secondly, generative artificial intelligence with reinforcement learning now deserves due attention. Reinforcement learning has three elements: an agent takes a series of actions, the environment returns a series of rewards and follows a series of state transitions with transition probabilities, and the agent seeks to maximize the cumulative reward [39]. Two revolutionary notions behind this booming branch of artificial intelligence were that artificial intelligence begins like a human player, i.e., takes a series of actions to maximize the chance of victory (the cumulative reward) from limited data in limited time spans, but that it outperforms the best human player by a great margin with the sheer power of big data synthesizing all human players to date [39]. The popularity of reinforcement learning has become more apparent in finance [40] and healthcare (such as diagnosis automation, resource allocation and treatment recommendation) [41]. A recent review highlights the importance of psychological and social factors behind optimization processes in this direction [42]. In particular, a recent study made a very rare endeavor to apply proximal policy optimization to stomach coverage scanning in wireless capsule endoscopy [43]. Proximal policy optimization is a model-free reinforcement learning approach that finds an optimal course of action (or optimal policy) based on importance sampling and a “clipped” policy update (i.e., one that is neither too big nor too small) [44]. In this study, stomach coverage scanning of wireless capsule endoscopy with reinforcement learning achieved a much higher coverage rate than its manual counterpart, i.e., 98% vs. 87% during 150 s [43]. However, little research has been conducted and more study is needed on generative artificial intelligence with reinforcement learning.
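For reference, the "clipped" policy update of proximal policy optimization [44] can be written explicitly:

```latex
% PPO's clipped surrogate objective [44]: r_t(\theta) is the importance-sampling
% ratio between the new and old policies and \hat{A}_t the advantage estimate;
% clipping to [1-\epsilon, 1+\epsilon] keeps each policy update neither too big
% nor too small.
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
\qquad
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[ \min\!\left( r_t(\theta)\, \hat{A}_t,\;
  \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right) \hat{A}_t \right) \right]
```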

4.3. Synthesizing Different Kinds of Generative Artificial Intelligence

Thirdly, synthesizing different kinds of generative artificial intelligence for different kinds of GID data would further the horizon of research on this topic. In this study, multi-modal or vision transformers were found to be optimal for the early diagnosis of GID, i.e., accuracy 96.9% in the case of classification [26], IOU 79.5% in the case of detection [29] and Dice 89.5% in the case of segmentation [35]. However, it should be noted that these multi-modal or vision transformers for the early diagnosis of GID focused on the synthesis of numeric and image data only. The expanding literature calls for due attention to the synthesis of artificial intelligence for numeric, image and genetic data for the diagnosis, prognosis, prevention and management of other diseases. For example, a recent study reviewed deep learning approaches based on multi-omics data for disease diagnosis, prognosis, prevention and management [45]. This study identified six multi-omics data sources and nine deep learning architectures in this direction: deep auto-encoders for the prediction of Alzheimer’s disease; deep auto-encoders for the prediction of breast cancer; convolutional neural networks for the prediction of breast cancer; convolutional neural networks for the prediction of heart disease; deep auto-encoders for the prediction of Parkinson’s disease; graph neural networks for the prediction of Parkinson’s disease; convolutional neural networks for the prediction of pneumonia; graph neural networks for the prediction of disease prescription; and ensemble machine learning for the prediction of ensemble diseases [45]. However, little examination has been performed so far and more investigation is needed in this direction.

4.4. Rigorous Qualitative Evaluation for Generative Artificial Intelligence

Fourthly, refining qualitative evaluation approaches is essential for systematic reviews of generative artificial intelligence for GID, from detection to diagnosis. A meta-analysis is recommended to incorporate the following information based on the Enhancing the Quality and Transparency of Health Research Network: research question; eligibility and exclusion criteria; flow diagram; and experimental characteristics such as sample size (participants), baseline vs. innovation methods (comparisons vs. interventions), dependent variable (participants), task type and baseline vs. innovation performance outcomes [46,47]. This study followed this recommendation with the following summary measures: research question (Section 1); eligibility and exclusion criteria (Section 2); flow diagram (Figure 2); and experimental characteristics such as sample size, baseline vs. innovation methods, dependent variable, task type and baseline vs. innovation performance outcomes (Table 1 and Table 2). However, this qualitative evaluation approach can be refined further, and such refinement would strengthen the reliability of reviews of generative artificial intelligence for GID.

5. Conclusions

This study reviewed the recent progress of generative artificial intelligence for GID from detection to diagnosis. Based on the results of this study, different generative artificial intelligence methods would be appropriate for different tasks for the early diagnosis of gastrointestinal disease: the patch GAN performed best for classification (accuracy 91.9%), the bi-directional cycle GAN for data generation (structural similarity 98.8%) and the semi-supervised GAN for segmentation (Dice 89.4%). The reported GAN performance indicators varied within 87.1–91.9% for accuracy, 83.0–98.8% for structural similarity and 86.6–89.4% for Dice. Likewise, the vision transformer performed best for classification (accuracy 96.9%) and the multi-modal transformer for detection (IOU 79.5%) and segmentation (Dice 89.5%). The reported transformer performance measures varied within 85.7–96.9% for accuracy, 79.5% for IOU and 77.8–89.5% for Dice. However, it should be noted that the source of data was 16 original studies in PubMed, which places a certain limitation on the validity of the review. Evaluating generative artificial intelligence in terms of an appropriate set of data generation indicators is expected to make a great contribution to this line of research. Generative artificial intelligence with reinforcement learning deserves due attention as well. Synthesizing different kinds of generative artificial intelligence for different kinds of GID data would further the horizon of research on this topic. In conclusion, generative artificial intelligence provides an effective, non-invasive decision support system for the early diagnosis of gastrointestinal disease from detection to diagnosis.

Author Contributions

K.-S.L. and E.S.K. designed the study, collected, analyzed and interpreted the data and wrote and reviewed the manuscript. K.-S.L. and E.S.K. approved the final version of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Korea University College of Medicine grant (K2209721), Korea Health Industry Development Institute grants (No. HI21C156001; HI22C1302 (Korea Health Technology R&D Project)) funded by the Ministry of Health and Welfare of South Korea, and the Technology Innovation Program (20001533) funded by the Ministry of Trade, Industry & Energy of South Korea. The funders had no role in the design of the study, in the collection, analysis and interpretation of the data or the writing and review of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ashwin, N.A.; Ramnik, J.X. Gastrointestinal diseases. In Hunter’s Tropical Medicine and Emerging Infectious Diseases, 20th ed.; Ryan, E.T., Hill, D.R., Solomon, T., Aronson, N., Endy, T.P., Eds.; Elsevier: Amsterdam, The Netherlands, 2020; pp. 16–26.
  2. Milivojevic, V.; Milosavljevic, T. Burden of gastroduodenal diseases from the global perspective. Curr. Treat. Options Gastroenterol. 2020, 18, 148–157.
  3. Peery, A.F.; Crockett, S.D.; Murphy, C.C.; Jensen, E.T.; Kim, H.P.; Egberg, M.D.; Lund, J.L.; Moon, A.M.; Pate, V.; Barnes, E.L.; et al. Burden and cost of gastrointestinal, liver, and pancreatic diseases in the United States: Update 2021. Gastroenterology 2022, 162, 621–644.
  4. Kim, Y.E.; Park, H.; Jo, M.W.; Oh, I.H.; Go, D.S.; Jung, J.; Yoon, S.J. Trends and patterns of burden of disease and injuries in Korea using disability-adjusted life years. J. Korean Med. Sci. 2019, 34 (Suppl. 1), e75.
  5. Jung, H.K.; Jang, B.; Kim, Y.H.; Park, J.; Park, S.Y.; Nam, M.H.; Choi, M.G. Health care costs of digestive diseases in Korea. Korean J. Gastroenterol. 2011, 58, 323–331.
  6. Mills, J.C.; Stappenbeck, T.S. Gastrointestinal disease. In Pathophysiology of Disease: An Introduction to Clinical Medicine, 7th ed.; Hammer, G.D., McPhee, S.J., Eds.; McGraw-Hill Education: New York, NY, USA, 2014; pp. 1–72.
  7. Lee, K.S.; Ahn, K.H. Application of artificial intelligence in early diagnosis of spontaneous preterm labor and birth. Diagnostics 2020, 10, 733.
  8. Lee, K.S.; Ham, B.J. Machine learning on early diagnosis of depression. Psychiatry Investig. 2022, 19, 597–605.
  9. Byeon, S.J.; Park, J.; Cho, Y.A.; Cho, B.J. Automated histological classification for digital pathology images of colonoscopy specimen via deep learning. Sci. Rep. 2022, 12, 12804.
  10. Lee, K.S.; Son, S.H.; Park, S.H.; Kim, E.S. Automated detection of colorectal tumors based on artificial intelligence. BMC Med. Inform. Decis. Mak. 2021, 21, 33.
  11. Yu, T.; Lin, N.; Zhang, X.; Pan, Y.; Hu, H.; Zheng, W.; Liu, J.; Hu, W.; Duan, H.; Si, J. An end-to-end tracking method for polyp detectors in colonoscopy videos. Artif. Intell. Med. 2022, 131, 102363.
  12. Cui, R.; Yang, R.; Liu, F.; Cai, C. N-Net: Lesion region segmentations using the generalized hybrid dilated convolutions for polyps in colonoscopy images. Front. Bioeng. Biotechnol. 2022, 10, 963590.
  13. Esposito, A.A.; Zannoni, S.; Castoldi, L.; Giannitto, C.; Avola, E.; Casiraghi, E.; Catalano, O.; Carrafiello, G. Pseudo-pneumatosis of the gastrointestinal tract: Its incidence and the accuracy of a checklist supported by artificial intelligence (AI) techniques to reduce the misinterpretation of pneumatosis. Emerg. Radiol. 2021, 28, 911–919.
  14. Kang, E.A.; Jang, J.; Choi, C.H.; Kang, S.B.; Bang, K.B.; Kim, T.O.; Seo, G.S.; Cha, J.M.; Chun, J.; Jung, Y.; et al. Development of a clinical and genetic prediction model for early intestinal resection in patients with Crohn’s disease: Results from the IMPACT Study. J. Clin. Med. 2021, 10, 633.
  15. Lipták, P.; Banovcin, P.; Rosoľanka, R.; Prokopič, M.; Kocan, I.; Žiačiková, I.; Uhrik, P.; Grendar, M.; Hyrdel, R. A machine learning approach for identification of gastrointestinal predictors for the risk of COVID-19 related hospitalization. PeerJ 2022, 10, e13124.
  16. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
  17. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
  18. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
  19. OpenAI. GPT-4 technical report. arXiv 2023, arXiv:2303.08774.
  20. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
  21. Sun, Y.; Li, Y.; Wang, P.; He, D.; Wang, Z. Lesion segmentation in gastroscopic images using generative adversarial networks. J. Digit. Imaging 2022, 35, 459–468.
  22. Rau, A.; Edwards, P.J.E.; Ahmad, O.F.; Riordan, P.; Janatka, M.; Lovat, L.B.; Stoyanov, D. Implicit domain adaptation with conditional generative adversarial networks for depth prediction in endoscopy. Int. J. Comput. Assist. Radiol. Surg. 2019, 14, 1167–1176.
  23. Almalioglu, Y.; Bengisu, O.K.; Gokce, A.; Incetan, K.; Irem Gokceler, G.; Ali Simsek, M.; Ararat, K.; Chen, R.J.; Durr, N.J.; Mahmood, F.; et al. EndoL2H: Deep super-resolution for capsule endoscopy. IEEE Trans. Med. Imaging 2020, 39, 4297–4309.
  24. De Souza, L.A., Jr.; Mendel, R.; Ebigbo, A.; Probst, A.; Messmann, H.; Palm, C.; Papa, J.P. Assisting Barrett’s esophagus identification using endoscopic data augmentation based on generative adversarial networks. Comput. Biol. Med. 2020, 126, 104029.
  25. Kumagai, Y.; Takubo, K.; Sato, T.; Ishikawa, H.; Yamamoto, E.; Ishiguro, T.; Hatano, S.; Toyomasu, Y.; Kawada, K.; Matsuyama, T.; et al. AI analysis and modified type classification for endocytoscopic observation of esophageal lesions. Dis. Esophagus 2022, 35, doac010.
  26. Tang, S.; Yu, X.; Cheang, C.F.; Liang, Y.; Zhao, P.; Yu, H.H.; Choi, I.C. Transformer-based multi-task learning for classification and segmentation of gastrointestinal tract endoscopic images. Comput. Biol. Med. 2023, 157, 106723.
  27. Lonseko, Z.M.; Du, W.; Adjei, P.E.; Luo, C.; Hu, D.; Gan, T.; Zhu, L.; Rao, N. Semi-supervised segmentation framework for gastrointestinal lesion diagnosis in endoscopic images. J. Pers. Med. 2023, 13, 118.
  28. Zhou, Y.; Hu, Z.; Xuan, Z.; Wang, Y.; Hu, X. Synchronizing detection and removal of smoke in endoscopic images with cyclic consistency adversarial nets. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 21, 670–680.
  29. Gong, L.; Wang, M.; Shu, L.; He, J.; Qin, B.; Xu, J.; Su, W.; Dong, D.; Hu, H.; Tian, J.; et al. Automatic captioning of early gastric cancer using magnification endoscopy with narrow-band imaging. Gastrointest. Endosc. 2022, 96, 929–942.e6.
  30. Ali, S.; Zhou, F.; Bailey, A.; Braden, B.; East, J.E.; Lu, X.; Rittscher, J. A deep learning framework for quality assessment and restoration in video endoscopy. Med. Image Anal. 2021, 68, 101900.
  31. Govind, D.; Jen, K.Y.; Matsukuma, K.; Gao, G.; Olson, K.A.; Gui, D.; Wilding, G.E.; Border, S.P.; Sarder, P. Improving the accuracy of gastrointestinal neuroendocrine tumor grading with deep learning. Sci. Rep. 2020, 10, 11064.
  32. Shin, K.; Lee, J.S.; Lee, J.Y.; Lee, H.; Kim, J.; Byeon, J.S.; Jung, H.Y.; Kim, D.H.; Kim, N. An image Turing test on realistic gastroscopy images generated by using the progressive growing of generative adversarial networks. J. Digit. Imaging 2023, 36, 1760–1769.
  33. Im, J.E.; Yoon, S.A.; Shin, Y.M.; Park, S. Real-time prediction for neonatal endotracheal intubation using multimodal transformer network. IEEE J. Biomed. Health Inform. 2023, 27, 2625–2634.
  34. Li, J.; Zhang, P.; Wang, T.; Zhu, L.; Liu, R.; Yang, X.; Wang, K.; Shen, D.; Sheng, B. DSMT-Net: Dual self-supervised multi-operator transformation for multi-source endoscopic ultrasound diagnosis. IEEE Trans. Med. Imaging 2023, 43, 64–75.
  35. Jiang, X.; Ding, Y.; Liu, M.; Wang, Y.; Li, Y.; Wu, Z. BiFTransNet: A unified and simultaneous segmentation network for gastrointestinal images of CT & MRI. Comput. Biol. Med. 2023, 165, 107326.
  36. Qi, J.; Ruan, G.; Liu, J.; Yang, Y.; Cao, Q.; Wei, Y.; Nian, Y. PHF3 technique: A pyramid hybrid feature fusion framework for severity classification of ulcerative colitis using endoscopic images. Bioengineering 2022, 9, 632.
  37. Borji, A. Pros and cons of GAN evaluation measures. Comput. Vis. Image Underst. 2019, 179, 41–65.
  38. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
  39. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489.
  40. Hambly, B.; Xu, R.; Yang, H. Recent advances in reinforcement learning in finance. Math. Financ. 2023, 33, 437–503.
  41. Yu, C.; Liu, J.; Nemati, S. Reinforcement learning in healthcare: A survey. ACM Comput. Surv. 2021, 55, 1–36.
  42. Puiutta, E.; Veith, E.M.S.P. Explainable reinforcement learning: A survey. In Proceedings of the International Cross-Domain Conference for Machine Learning and Knowledge Extraction, Dublin, Ireland, 25–28 August 2020.
  43. Zhang, Y.; Bai, L.; Liu, L.; Ren, H.; Meng, M.Q.H. Deep reinforcement learning-based control for stomach coverage scanning of wireless capsule endoscopy. In Proceedings of the 2022 IEEE International Conference on Robotics and Biomimetics (ROBIO), Jinghong, China, 5–9 December 2022.
  44. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
  45. Wekesa, J.S.; Kimwele, M. A review of multi-omics data integration through deep learning approaches for disease diagnosis, prognosis, and treatment. Front. Genet. 2023, 14, 1199087.
  46. Enhancing the Quality and Transparency of Health Research Network. Reporting Guidelines. 2024. Available online: https://www.equator-network.org/reporting-guidelines/ten-simple-rules-for-neuroimaging-meta-analysis/ (accessed on 23 November 2024).
  47. Müller, V.I.; Cieslik, E.C.; Laird, A.R.; Fox, P.T.; Radua, J.; Mataix-Cols, D.; Tench, C.R.; Yarkoni, T.; Nichols, T.E.; Turkeltaub, P.E.; et al. Ten simple rules for neuroimaging meta-analysis. Neurosci. Biobehav. Rev. 2018, 84, 151–161.
Figure 1. Generative adversarial network (GAN). The GAN generates image data by training the generative model vs. the discriminative model. The generative model tries to generate fake images from random noises so that it can fool the discriminative model. The discriminative model tries to discriminate fake images from real images (input images) so that it can outwit the generative model.
Figure 2. Flow diagram.
Table 1. Summary of review—Generative Adversarial Network.
| ID | Sample Size | Title | Innovation | Baseline | Innovation Performance | Baseline Performance | Task | Dependent Variable |
|----|-------------|-------|------------|----------|------------------------|----------------------|------|--------------------|
| [21] | 630 | | Patch | Endoscopist | 91.9 | 83.6 | CL | Gastric cancer |
| [21] | 630 | | Patch | U-Net | 86.6 | 77.6 | SE | Gastric cancer |
| [22] | 16,000 | | Conditional | Conditional | 0.175 | 0.236 | AU | Colorectal cancer |
| [23] | 80,000 | EndoL2H | Conditional & Spatial Attention | Conditional | 83.0 | 69.0 | GE | Small-intestine adenoma |
| [24] | 589 | | Deep Convolutional | AlexNet | 90.0 | 81.0 | AU | Esophageal adenocarcinoma |
| [27] | 4880 | | Semi-Supervised | Supervised | 89.4 | 82.1 | SE | GID |
| [28] | | | Cycle-Desmoking | Cycle | 91.0 | 81.1 | GE | GID |
| [30] | 1290 | | Cycle (Bi-Direction) | Cycle (Uni-Direction) | 98.8 | 96.8 | GE | GID |
| [31] | 1508 | | Cycle & K-Means | K-Means | 87.1 | 84.8 | CL | Gastrointestinal cancer |
| [32] | 107,060 | | Progressive Growing | Endoscopist | 61.3 | 61.3 | GE | GID |
Note: Performance metric by task: AU and CL, accuracy; DE, intersection over union; GE, structural similarity; SE, Dice. Exceptions: study [22] reports root mean squared error (lower is better) and study [32] reports accuracy. Abbreviations: AU—Augmentation; CL—Classification; DE—Detection; GE—Generation; GID—Gastrointestinal Disease; SE—Segmentation.
Table 2. Summary of review—Transformer.
| ID | Sample Size | Title | Innovation | Baseline | Innovation Performance | Baseline Performance | Task | Dependent Variable |
|----|-------------|-------|------------|----------|------------------------|----------------------|------|--------------------|
| [25] | 7983 | | Vision | Pathologist | 91.2 | 91.2 | CL | Esophageal disease |
| [26] | 1645 | TransMT-Net | Vision | AlexNet | 96.9 | 95.7 | CL | GID |
| [26] | 1645 | TransMT-Net | Vision | U-Net | 77.8 | 65.2 | SE | GID |
| [29] | 1886 | EGCCap | Multi-Modal | Endoscopist Only | 79.5 | 74.9 | DE | Gastric cancer |
| [33] | 219 | | Multi-Modal | Numeric | 85.7 | 64.8 | CL | NICU-NEI |
| [34] | 11,500 | DSMT-Net | Vision | Self-Supervised Vision | 89.2 | 79.1 | CL | Breast/pancreatic cancer |
| [35] | 41,987 | BiFTransNet | Multi-Modal | U-Net | 89.5 | 88.3 | SE | Gastrointestinal cancer |
| [36] | 15,120 | PHF3 | Vision | ResNet-100 | 88.9 | 86.5 | CL | Ulcerative colitis |
Note: Performance metric by task: CL, accuracy; DE, intersection over union; SE, Dice. Abbreviations: CL—Classification; DE—Detection; GID—Gastrointestinal Disease; NEI—Neonatal Endotracheal Intubation; NICU—Neonatal Intensive Care Unit; SE—Segmentation.
Table 3. Evaluation measures of generative artificial intelligence in data generation.
| No. | Measure | Description | Discriminable | Detects Overfitting | Sensitive to Distortions |
|-----|---------|-------------|---------------|---------------------|--------------------------|
| 01 | Average Likelihood | Average likelihood of the model distribution given generated sample data | Low | Low | Low |
| 02 | Coverage Metric | Probability mass of generated sample data covered by the model distribution | Low | Low | Low |
| 03 | Inception Score (IS) | Quality and overall diversity of generated sample data | High | Middle | Middle |
| 04 | Modified IS | Quality and within-class diversity of generated sample data | High | Middle | Middle |
| 05 | Mode Score | IS with the consideration of the prior label distribution over sample data | High | Middle | Middle |
| 06 | AM Score | IS with the consideration of the training vs. test label distribution over sample data | High | Middle | Middle |
| 07 | FID | Gaussian distance between generated sample data and real data distributions | High | Middle | High |
| 08 | MMD | Discrepancy between model distributions given generated sample data | High | Low | - |
| 09 | Wasserstein Critic | Wasserstein distance between generated sample data and real data distributions | High | Middle | - |
| 10 | BPT | Support size of a discrete distribution | Low | High | Low |
| 11 | C2ST | Classifier of whether two sample data are drawn from the same distribution | High | Low | - |
| 12 | Classification | Classifier of how well extracted features predict class labels | High | Low | - |
| 13 | Boundary Distortion | Classifier of how much diversity loss boundary distortion causes | Low | Low | - |
| 14 | NSDB | Number of statistically different bins, denoting the diversity of generated sample data | Low | High | Low |
| 15 | Image Retrieval | Quality of the nearest neighbors of generated sample data retrieved from the test set | Middle | Low | - |
| 16 | GAM | Likelihood ratio of two models with generators or discriminators switched | High | Low | - |
| 17 | Tournament Win | Tournament win rates of competing models | High | High | - |
| 18 | NRDS | Number of training epochs needed to distinguish generated sample data from real data | High | Low | - |
| 19 | AAAD | Accuracies of two classifiers on generated sample data and real data | High | Low | - |
| 20 | Geometry Score | Geometric properties of generated sample data vs. real data | Low | Low | Low |
| 21 | Reconstruction Error | Reconstruction error between generated sample data and real data | Low | Low | Middle |
| 22 | Structural Similarity | Similarity in terms of luminance, contrast and structure | Low | Middle | High |
| 23 | LLIS | Similarity in terms of four low-level image statistics such as the mean power spectrum | Low | Low | Low |
| 24 | F1 Score | F1 scores of two classifiers on generated sample data and real data | Low | High | - |

Note: FID—Fréchet Inception Distance; MMD—Maximum Mean Discrepancy; BPT—Birthday Paradox Test; C2ST—Classifier Two-Sample Test; NSDB—Number of Statistically Different Bins; GAM—Generative Adversarial Metric; NRDS—Normalized Relative Discriminative Score; AAAD—Adversarial Accuracy and Adversarial Divergence; LLIS—Low-Level Image Statistics.