Review

Artificial Intelligence and Bioinformatics in the Malignant Progression of Gastric Cancer

by Tasuku Matsuoka 1,2 and Masakazu Yashiro 1,2,*
1 Department of Molecular Oncology and Therapeutics, Osaka Metropolitan University Graduate School of Medicine, 1-4-3 Asahi-machi, Abeno-ku, Osaka 5458585, Japan
2 Institute of Medical Genetics, Osaka Metropolitan University, 1-4-3 Asahi-machi, Abeno-ku, Osaka 5458585, Japan
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(20), 11092; https://doi.org/10.3390/app152011092
Submission received: 30 August 2025 / Revised: 13 October 2025 / Accepted: 14 October 2025 / Published: 16 October 2025
(This article belongs to the Special Issue Research on Computational Biology and Bioinformatics)

Abstract

Gastric cancer (GC) is characterized by heterogeneity and complexity and remains one of the leading causes of cancer-related deaths. The molecular mechanisms underlying carcinogenesis and the progression of GC have been central to scientific research and urgently need to be elucidated. With the rapid development of next-generation sequencing technologies, a vast amount of bioinformatic data (including genomics, epigenomics, transcriptomics, proteomics, and metabolomics) has been accumulated, providing an extraordinary opportunity to explore the heterogeneity and intricacy of GC. Nevertheless, the enormous amount of data created by bioinformatics analyses presents considerable analytical challenges. The application of artificial intelligence (AI), including machine learning and deep learning, has emerged as a powerful solution to these challenges, extracting useful information from exponentially growing omics data, particularly in GC. The integration of AI with multi-omics approaches in GC research offers novel insights and powerful tools for gaining a deeper understanding of cancer’s complexities. This article reviews the latest research and progress of AI and bioinformatics analysis in GC oncology over the past several years, focusing on the landscape of GC carcinogenesis, progression, and metastasis. We also discuss the current challenges for improving performance and highlight future directions for more precise and effective treatments for GC patients.

1. Introduction

Gastric cancer (GC) is a heterogeneous and multifactorial disease whose occurrence and progression involve changes across multiple steps, including genetic [1], protein [2], metabolic [3], and environmental factors [4]. The complexity of the pathogenesis and progression of GC impedes our understanding of cancer’s mechanisms. Furthermore, the precise mechanisms driving GC initiation and progression require further investigation. With remarkable developments in bioinformatics analysis based on multi-omics technologies, abundant omics data can be obtained through high-throughput next-generation sequencing techniques. In the field of cancer research, employing multi-omics analyses is crucial in elucidating the complex mechanisms underlying tumor progression and metastasis [5,6,7]. However, effectively processing, analyzing, integrating, and interpreting these large-scale multi-omics datasets to derive useful information represents a major challenge in current GC research. Over the past decade, artificial intelligence (AI) has shown substantial promise for extensive applications and emerged as a transformative force in a variety of medical fields, particularly in the domain of cancer research. As a result, researchers have begun to incorporate AI into cancer research to address these challenges [8,9,10]. AI enhances the potential for early diagnosis through high-precision pattern recognition and imaging, offering new avenues for predicting disease progression. Additionally, AI-driven approaches are crucial in aiding the identification of novel therapeutic targets and the development of personalized treatment strategies [11,12]. The integration of AI with multi-omics approaches in GC research offers novel insights and powerful tools for gaining a deeper understanding of cancer’s complexities.
This review aims to explore the synergy of AI and bioinformatics in GC research, especially focusing on the landscape of progression, including carcinogenesis, diagnosis, growth, invasion, and metastasis, discussing the current challenges for improving the performance and the limitations of particular methods and highlighting future directions for more precise strategies for GC patients. Figure 1 illustrates GC progression, including carcinogenesis (early detection), progression (growth, migration, and invasion), and metastasis.

2. Methods

A non-systematic review was performed based on an electronic search through the medical literature using PubMed and Google Scholar. The keywords “gastric cancer”, “artificial intelligence”, “machine learning”, “deep learning”, “bioinformatics”, and “multi-omics” were used. In addition to research articles, which were mainly searched to obtain novel findings, review articles and guidelines investigating the roles of invasion for the progression of GC from gastroenterology, oncology, and genetics were included in this review. When more than one guideline concerning the same subject was available, the most up-to-date one was selected. Only full articles in the English language published in the last 10 years were considered for further review. Great importance was also given to “clinical study” and “review” articles dealing with the topic. The exclusion criteria comprised duplicated articles and studies lacking diagnostic outcomes. Case reports, correspondence, letters, and non-human research studies were excluded. First, the titles were screened, and appropriate studies were selected. Of these studies, the full text was acquired. A total of 121 articles were identified.

3. Development of AI and Bioinformatics in the Field of Cancer Research

3.1. AI

As cancer research continues to evolve, the integration of AI technology promises to revolutionize the field, paving the way for more effective and precise interventions. AI focuses on creating systems capable of performing tasks traditionally requiring human intelligence. The overarching goal of AI is to enable machines to emulate and execute functions such as perception, learning, reasoning, planning, and natural language processing. AI encompasses various technologies, including machine learning (ML) and deep learning (DL). ML, a pivotal subset of AI, involves programming computers to optimize a performance criterion using example data, so that models trained on large datasets improve over time and generalize to new cases for prediction [13]. The application of ML in cancer research encompasses a range of areas, including risk assessment, lesion grading and genomics, lesion detection and characterization, imaging, prognosis, staging, therapy response, and other downstream applications [14]. ML can be categorized into supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning, all of which extract patterns from data and make predictions. Supervised learning applies labeled training data to construct associations between inputs and corresponding labels. Conversely, unsupervised learning does not rely on labeled datasets to train machines, which may make it less accurate. DL, a specialized area of ML, uses neural networks with multiple layers to mimic the human brain’s processing system, significantly advancing applications like lung computed tomography (CT) radiomics [15]. The application of AI in GC research includes multiple areas, such as screening, diagnosis, relapse prediction, prognosis, and the evaluation of treatment effectiveness.
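The contrast between supervised and unsupervised learning can be illustrated with a toy sketch. The marker values, labels, and function names below are hypothetical and chosen only for illustration; they are not taken from any cited study. The supervised step learns one centroid per label from labeled examples, whereas the unsupervised step splits the same values into two groups without using the labels.

```python
from statistics import mean

def fit_centroids(values, labels):
    """Supervised: learn one centroid per label from labeled examples."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(vs) for label, vs in groups.items()}

def predict(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def two_means(values, iterations=20):
    """Unsupervised: 1-D k-means with k=2; assumes at least two distinct values."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = mean(left), mean(right)
    return sorted(left), sorted(right)

# Hypothetical expression values of a single marker (arbitrary units).
values = [1.0, 1.2, 0.9, 4.8, 5.1, 5.3]
labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

centroids = fit_centroids(values, labels)
print(predict(centroids, 4.5))   # label predicted for a new sample
print(two_means(values))         # two clusters found without labels
```

In this toy case both approaches recover the same grouping, but only the supervised model can attach the names "normal" and "tumor" to the groups.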

3.2. Bioinformatics Analysis

Bioinformatics analysis is the adaptation of computational tools and statistical systems to analyze and elucidate biological data [16]. This field is essential for understanding the complexity of biological systems and developing new technologies in the field, such as genomics, transcriptomics, proteomics, metabolomics, and drug discovery. A variety of omics technologies have enormous potential in GC research, allowing the quantitative characterization of complicated biological arrangements. Genomics is a field that examines the structure, activity, variation, and categorization of all genes within a living organism and has the capability to analyze DNA sequences and unravel the genetic information, which is the foundation for understanding the important events in the progression of GC [17]. DNA methylation is an epigenetic event that is involved in various biological processes, including gene expression, alternative splicing, and cell differentiation [18]. Aberrant gene methylation, especially DNA methylation, is a common molecular event that happens early in carcinogenesis and can be detected in precancerous lesions, making it a valuable tool for detecting cancer initiation [19]. As a transient intermediate molecule, RNA can translate genetic information stored in DNA into proteins. The development of noncoding RNAs (ncRNAs) enlarges the functional range of the transcriptome [20]. The transcriptome plays an essential role in indicating indirect protein expression and implying real-time genome activity. Transcriptomic analysis can identify gene expression alterations and has been involved in the understanding of GC carcinogenesis. To analyze and quantify the transcriptome, RNA sequencing (RNA-seq) provides an accurate way to measure transcript abundance and is particularly useful for identifying alternative isoforms and gene fusions. 
Proteins perform the great majority of activities of the cell, from copying DNA, enhancing genetic information transfer, and catalyzing metabolic reactions to inducing cellular mobility [21]. Proteins directly shape the phenotype and thus reflect the actual state of cells more faithfully than the genome [22]. Hence, proteomics can offer robust information to evaluate biological processes. Mass spectrometry (MS)-based proteomics has been the most frequently applied method for protein analysis. The metabolome, the complete set of metabolites, can directly reveal functional data on biochemical reactions, hence providing insights into multiple features of cellular physiology [23]. Metabolic deregulation results in distinctive metabolic phenotypes that are useful for earlier detection of cancer and for monitoring its progression. MS, a critical method for the identification and quantification of metabolites, is typically coupled with separation technologies such as liquid chromatography (LC-MS) or gas chromatography (GC-MS) [24]. Metabolomics has emerged as a promising technology to evaluate oncogenic mechanisms underlying GC.
In bioinformatics analysis, the use of ML/DL has become progressively predominant [25]. By integrating multi-omics data, ML/DL supports uncovering the complexity of progressive processes and pathways involved in GC. These applications have significantly facilitated our ability to uncover complex biological data, providing unprecedented insights into the molecular mechanisms underlying GC progression. Figure 2 illustrates how AI and bioinformatics analysis transform diverse biological data to explore the mechanism of GC progression, which may lead to precision oncology.

4. Application of AI and Bioinformatics in the Malignant Progression of GC Research

4.1. Carcinogenesis

The pathogenesis of GC involves rearrangements at the genetic, transcriptomic, proteomic, and metabolic levels that drive oncogenesis [26]. It is now well established that the causes of carcinogenesis are highly complex and encompass multiple layers of regulatory mechanisms. In the past decades, our understanding of cancer has advanced considerably, particularly in relation to carcinogenesis, oncogenesis, and the transformation of cells into malignant metastatic forms. Unraveling the molecular and cellular pathways that underlie carcinogenesis is essential for crafting effective approaches to cancer prevention, accurate diagnosis, and targeted treatment. Recently, various omics tools have prompted the scientific measurement of dissimilar aspects of GC characteristics and developed biological networks at exceptional stages.

4.1.1. Genomics and Transcriptomics

A recent study illustrated useful insights into GC through a comprehensive analysis of gene expression profiles. They integrated the gene expression profiles downloaded from the Gene Expression Omnibus (GEO) database and used the R software and bioinformatics. A total of 34 upregulated genes and 27 downregulated genes were identified [27]. Among them, OLFM4, IGF2BP3, CLDN1, and MMP1 were the most extensively upregulated differentially expressed genes (DEGs). These results may contribute to understanding the carcinogenic mechanism of GC. Another study combining experimental methods and computational approaches obtained from the GEO database identified 83 DEGs between GC and normal gastric tissues [28]. Among these DEGs, COL1A1, COL1A2, COL3A1, and FN1 were identified as hub genes, which exhibited significant up-regulation in GC, and hypomethylation of the promoter regions of these genes was detected, suggesting potential diagnostic implications in GC. In addition, a study using multi-dimensional bioinformatics analysis identified a total of 161 DEGs from GEO datasets [29]. Among them, the top 10 hub genes, COL1A1, TIMP1, SPP1, BGN, MMP3, APOE, LOX, SST, NPY, and ATP4A, were determined. The mRNA expressions of COL1A1, TIMP1, SPP1, BGN, MMP3, and APOE were constantly and remarkably upregulated in gastric tumor tissues compared to normal tissues. These results may be useful for evaluating the mechanism of GC carcinogenesis. A study utilized weighted gene co-expression network analysis (hdWGCNA) on single-cell data to highlight the role of chronic atrophic gastritis in the carcinogenesis of GC [30]. A total of 15 genes (HSP90AB1, FUS, CTSD, KRT8, TALDO1, BTG2, TXNRD1, GADD45B, PSMB3, RPL9, NQO1, MTHFD2, CFL1, PRDX1, and PFDN2) were constantly identified as significant by the least absolute shrinkage and selection operator (LASSO) regression model, Random Forest (RF) analysis, and support vector machines (SVM)-RFE. 
Based on ROC analysis and data from the Human Protein Atlas, GADD45B was identified as a prominent oncogene linked to chronic atrophic gastritis, indicating its crucial role in GC carcinogenesis. Meanwhile, a comprehensive study of single-cell RNA-seq (scRNA-seq) data combined with bulk RNA-seq data demonstrates the potential role of lysine metabolism-related regulatory genes in GC [31]. Solute carrier family 7 member 7 (SLC7A7) and vimentin (VIM) were identified as key lysine metabolism-related genes involved in gastric carcinogenesis by using ML algorithms, specifically gradient boosting (GB). SLC7A7 and VIM are closely associated with the level of immune cell infiltration in GC and play important roles in the immune microenvironment.
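The screening step common to the DEG studies above can be sketched as a fold-change filter. This is a minimal illustration with placeholder gene names and invented expression values, not the pipeline of any cited study; published analyses additionally apply statistical testing and multiple-testing correction (for example with packages such as DESeq2), which this sketch omits. The threshold of |log2 fold change| ≥ 1 and the pseudocount are assumptions.

```python
from math import log2
from statistics import mean

def log2_fold_changes(tumor, normal, pseudocount=1.0):
    """Return {gene: log2(mean tumor / mean normal)}; the pseudocount avoids division by zero."""
    return {
        gene: log2((mean(tumor[gene]) + pseudocount) / (mean(normal[gene]) + pseudocount))
        for gene in tumor
    }

def split_degs(lfc, threshold=1.0):
    """Split genes into up- and downregulated lists by a log2 fold-change threshold."""
    up = sorted(g for g, v in lfc.items() if v >= threshold)
    down = sorted(g for g, v in lfc.items() if v <= -threshold)
    return up, down

# Hypothetical expression values (three samples per group); gene names are placeholders.
tumor = {"GENE_A": [31, 33, 35], "GENE_B": [3, 3, 3], "GENE_C": [10, 11, 9]}
normal = {"GENE_A": [7, 7, 7], "GENE_B": [15, 15, 15], "GENE_C": [10, 10, 10]}

lfc = log2_fold_changes(tumor, normal)
up, down = split_degs(lfc)
print("upregulated:", up)
print("downregulated:", down)
```

Genes that pass such a filter are the candidates that later steps (hub-gene ranking, LASSO, RF, or SVM-RFE) narrow down further.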
Helicobacter pylori (HP) infects about 50% of the world’s population, making it a risk factor for gastritis, peptic ulcers, and GC. HP is also categorized as oncogenic due to its capacity for inducing gene mutations and damage to several HP-related genes [32]. CagA, VacA, BabA, and oipA are representative HP strain-specific virulence factors that increase the risk of GC. Nevertheless, few studies have comprehensively evaluated the exact mechanism underlying carcinogenesis induced by HP infection. A recent study using an integrative approach elucidated the unknown gene profiles correlated with HP-related GC, which were obtained from GEO and The Cancer Genome Atlas (TCGA) database [33]. A total of 73 HP infection-related genes were found from the gene set enrichment analysis (GSEA)-MSigDB website. Two distinct mutation patterns in GC were identified, characterized by distinct tumor microenvironment (TME)-infiltrating immune cell profiles. Among the mutated genes, three core genes, CRTAC1, BATF2, and CTHRC1, were selected to establish the risk signature using LASSO and multivariate Cox regression analyses, which may contribute to the accurate prediction of metastasis [33]. Another study, using an integrative transcriptomic analysis, identified four genes—TPX2, MKI67, EXO1, and CTHRC1—which revealed upregulation from infection to cancer, suggesting molecular shifts that link inflammation-driven infection to malignancy [34]. Network analysis identified four hub genes, CXCL1, CCL20, IL12B, and STAT4, which are enriched in immune processes such as chemotaxis, leukocyte migration, and cytokine signaling. These findings may provide an important basis for understanding the molecular mechanisms underlying HP-related GC. The transcriptomic (including HP-associated RNA-seq and TCGA-STAD) and genomic (single nucleotide variants, copy number variations (CNVs), methylation) analyses identified YWHAE, a ferroptosis-related gene, as being involved in GC carcinogenesis [35].
Epstein–Barr virus (EBV)-associated gastric cancer (EBVaGC) is a unique molecular subtype, accounting for approximately 8.7% of GC cases, featuring a high proportion of tumor-infiltrating lymphocytes and good response to PD-1 inhibitor treatment [36]. Using an integrated bioinformatics analysis of the immune microenvironment of EBVaGC, a study identified EBVaGC-specific immune-related genes (IRGs) and hub genes (CD4, STAT1, FCGR3A, IL10, C1QA, CXCL9, CXCL10, CXCR6, PD-L1, and CCL18) [37]. Among these, they identified C1QA as a differentially expressed gene in EBV-positive GC patients compared with EBV-negative patients. EBV-positive GC showed up-regulation of immune-related genes, including common immune checkpoints and the human leukocyte antigen (HLA) gene. These findings will assist in deepening the understanding of the specific features of EBVaGC. Similarly, a study carried out a transcriptomic differential expression analysis between EBV-positive and EBV-negative GC patients using the DESeq2 package in R [38]. The analyses revealed key genes (LGALS17A, IRF1, TAP1, C1QA, C1QB, CMKLR1, ICAM1, APOE, CXCR2P1, GM2A, C1QC, TNFSF10, CXCL11, GBP5, CD300LF, IK32, FAM3B, and IDO1). These genes distinguished EBV-positive patients with high sensitivity and specificity. Examination of the bacterial genera identified in the samples showed that microbial abundance differed between EBV-positive and EBV-negative GC. The genera Choristoneura and Bartonella were more abundant in EBV-positive GC, while differentially abundant bacteria in EBV-negative GC included Citrobacter, Acidithiobacillus, Biochmannia, Beijerinckia, and Acidaminococcus [38].

4.1.2. Epigenomics

Accumulating evidence demonstrates that epigenetic changes play important roles in the initiation and progression of GC, mainly including epigenomic regulations and chromatin accessibility [19]. A recent study integrated methylation and RNA-seq gene expression data with clinical data of GC from the TCGA database using a hierarchical clustering algorithm and identified a set of eight potential diagnostic methylation probes: cg17105014 (GYPC), cg23273897 (MME), cg22083047 (PRICKLE2), cg09396217 (ANGPT1), cg01049530 (BMP3), cg18237405 (CPNE5), cg12741420 (IRF4), and cg11754206 (KCNB2). The model achieved an area under the curve (AUC) of 0.99 on the training set and 0.97 on an independent validation set from the GEO database [39].
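Because AUC values are reported throughout this review, a short sketch of how the metric is computed may be helpful. The AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The scores and labels below are invented for illustration and do not come from any cited study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores; label 1 = tumor, label 0 = normal.
scores = [0.91, 0.85, 0.40, 0.78, 0.30, 0.22, 0.45, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(roc_auc(scores, labels))  # 15 of 16 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a gap between training and validation AUC is commonly read as a sign of overfitting.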

4.1.3. Liquid Biopsy

Liquid biopsy is a non-invasive technique for detecting circulating tumor cells, circulating tumor DNA (ctDNA), and extracellular vesicles (EVs) in body fluids. This technique has transformed early cancer detection. Among these analytes, ctDNA has demonstrated promising capability for detecting cancer at an early stage [40]. For example, a study by Michel et al. established a novel method, named Detection of Long Interspersed Nuclear Element Altered Methylation ON plasma DNA (DIAMOND), to identify methylation patterns of the internal promoter of long-interspersed element-1 (LINE-1) retrotransposons (L1) [41]. They evaluated circulating DNA methylation changes and developed new highly sensitive approaches to detect cancer-specific signatures in blood. ML-based classifiers demonstrated robust accuracy in distinguishing healthy plasma from tumor plasma across several malignancies, including GC, in two independent cohorts (AUC = 88–100%). Another study showed that methylated ring finger protein 180 (RNF180) and secreted frizzled protein 2 (SFRP2), found in circulating DNA from blood samples, diagnosed GC more accurately when used with an RF model [42]. Exosome transcriptome profiling based on liquid biopsy is a revolutionary tool for early diagnosis of GC [43]. In a multi-cohort study, an ML-based diagnostic model using serum exosomal ncRNAs was constructed, achieving an AUC of 0.94 in the training cohorts for non-invasive early detection and revealing the critical role of DGCR9 in GC progression [44]. Establishing a methylation atlas is an attractive method for analyzing big methylation data. The study by Nguyen et al. developed a comprehensive methylation atlas for cell type deconvolution of ctDNA samples, creating a model that can detect the tumor of origin in low-depth ctDNA samples for early cancer detection, with a graph convolutional neural network (GCNN) using deconvolution scores and genome-wide methylation density (GWMD) features [45]. The deconvolution scores from this atlas correlated well with the tumor fraction in ctDNA samples, and the combination of deconvolution scores and GWMD significantly enhanced tumor-of-origin detection performance when applied to low-depth whole-genome bisulfite sequencing ctDNA samples in five tumor types, including GC. Current relevant studies on the application of AI and bioinformatics analysis, including genomics, epigenomics, transcriptomics, and microbiome in GC carcinogenesis, are summarized in Table 1.
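The idea behind methylation-based deconvolution can be sketched with a two-component mixture. This is a simplified illustration and not the atlas or GCNN model of Nguyen et al.: it assumes the observed methylation level at each site is a linear mix of one tumor reference profile and one normal reference profile, and it estimates the tumor fraction by least squares. The reference values and the function name are hypothetical.

```python
def estimate_tumor_fraction(observed, tumor_ref, normal_ref):
    """Least-squares estimate of f in: observed = f * tumor_ref + (1 - f) * normal_ref."""
    num = sum((o - n) * (t - n) for o, t, n in zip(observed, tumor_ref, normal_ref))
    den = sum((t - n) ** 2 for t, n in zip(tumor_ref, normal_ref))
    if den == 0:
        raise ValueError("reference profiles are identical; fraction is not identifiable")
    # Clip to the valid range because noise can push the estimate outside [0, 1].
    return min(1.0, max(0.0, num / den))

# Hypothetical methylation levels (0 to 1) at four CpG sites.
tumor_ref = [0.9, 0.1, 0.8, 0.2]
normal_ref = [0.1, 0.7, 0.2, 0.6]

# Simulate a plasma sample in which 25% of the DNA is tumor-derived.
true_fraction = 0.25
observed = [true_fraction * t + (1 - true_fraction) * n
            for t, n in zip(tumor_ref, normal_ref)]

fraction = estimate_tumor_fraction(observed, tumor_ref, normal_ref)
print(round(fraction, 3))
```

Real atlases contain many cell types and noisy low-depth measurements, so practical methods solve a constrained multi-component version of this problem.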

4.1.4. Radiomics

Taking into account the intricate, multifactorial nature of cancer etiology, cancer diagnosis requires the integration of data across various biological levels, from molecular phenotypes to clinical pathogenesis. Among these, medical imaging can offer important information for clinical decision-making. In the last few decades, medical imaging techniques have developed towards omics applications, aiming to describe and quantify imaging features [46]. ML/DL algorithms can analyze imaging data, such as endoscopic or computed tomography (CT) images, to improve early detection and accurate diagnosis of GC. Endoscopy is a crucial tool for detecting early gastric lesions. AI, especially DL algorithms trained on patient data, GC endoscopy images, and their morphological features, can detect gastric lesions with enhanced accuracy, consistency, and sensitivity, which can lead to the prediction of specific pathological characteristics. Numerous AI-assisted models have been developed in the diagnostic research on GC. For instance, a recent study built a new AI computer-aided diagnosis (CAD) system, called ALPHAON®, and evaluated its validity and clinical benefit [47]. ALPHAON® demonstrated high validation performance in accuracy, sensitivity, and specificity (accuracy: 0.88, 95% CI: 0.84 to 0.90; sensitivity: 0.93, 95% CI: 0.88 to 0.98; specificity: 0.87, 95% CI: 0.97 to 0.99), with an AUC value of 0.962 in detecting GC, supporting a role for AI models in assisting endoscopists with detecting GC in practical clinical settings. Interestingly, ALPHAON® has been shown to help beginner and trainee endoscopists achieve better outcomes, with a significant result observed in the beginner group [46].
Another study constructed a novel ML-based GC diagnosis model XHGC20 using the extreme gradient boosting (XGB) algorithm, which discriminates GC from precancerous lesions using a large-scale, population-based retrospective dataset [48]. This ensemble learning model demonstrated that the detection of XHGC20-related markers has a useful assisting role in the GC diagnosis. This model diagnosed GC with high sensitivity and specificity (sensitivity: 0.83; specificity: 0.806), suggesting that this model contributes to accurate and convenient decision assistance for early diagnosis of GC.
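The diagnostic metrics quoted for these models all derive from a confusion matrix. The sketch below shows the standard definitions; the counts are invented for illustration and are not the data of ALPHAON® or XHGC20. It assumes both classes contain at least one case.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # fraction of true cancers detected
        "specificity": tn / (tn + fp),            # fraction of non-cancers correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts: 100 cancers (90 detected) and 200 non-cancers (160 cleared).
metrics = diagnostic_metrics(tp=90, fp=40, tn=160, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy depends on the proportion of cancers in the cohort, sensitivity and specificity are usually the more informative pair when comparing models evaluated on different populations.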
Although endoscopy is the most effective tool for early detection of GC, it cannot identify metastatic lesions, which may lead to mismanagement of patients [49]. Thus, CT- and magnetic resonance imaging (MRI)-based radiomics are currently among the most frequently used and practical tools for diagnosis, preoperative staging, and validation of treatment effectiveness for GC. AI applications in CT images have drawn significant attention in numerous fields involving image segmentation, diagnosis, prediction of metastasis, and prognosis [49]. A recent study focused on GRAPE (GC risk assessment procedure with AI), using non-contrast CT and DL to detect GC in large-scale screening cohorts [50]. GRAPE showed improved detection accuracy for GC compared with previous models based on clinical data and serological diagnostic procedures (AUC value: 0.757–0.79). The subgroup investigation revealed that GRAPE sensitivity was superior in the advanced stage (more than 90%) compared with early GC (nearly 50%), suggesting that GRAPE may serve as a novel approach for GC diagnosis and promote cost-effectiveness and compliance with GC screening endeavors. A study by Fan et al. investigated the prediction of lymphovascular invasion (LVI) in GC using eight preoperative clinical variables, enhanced CT- and 18F-fluorodeoxyglucose positron emission tomography/CT (18F-FDG PET/CT)-based radiomics, and their combinations with three different ML classifiers: adaptive boosting, linear discriminant analysis, and logistic regression (LR) [51]. The combined model attained the best performance, with an AUC of 94%.

4.1.5. Pathomics

Cancer pathology, which involves examining the histological changes in cells and organs induced by cancer, is crucial for diagnosis, prognosis, and the evaluation of treatment efficacy. The majority of challenges in cancer pathology are due to the heterogeneity of tumors. The accurate diagnosis and elucidation of malignant progression, which requires pathologists with abundant experience and clinical expertise, is notably complicated by intratumor heterogeneity [52]. AI has transformed cancer pathology by enabling more precise identification of lesion types and detailed evaluation through analysis of advanced histology images [53]. DL algorithms, especially convolutional neural networks (CNNs), have proven to be highly effective in automated pathological image analysis [54]. For instance, a recent study clarified the implementation of DL-assisted GC diagnosis through a comprehensive multi-case examination [55]. The results revealed that pathologists with the aid of DL achieved higher detection sensitivity than those without (90.63% vs. 82.75%) in distinguishing between malignant and benign lesions. Interestingly, a recent study constructed a multi-channel attention mechanism (MCAM)-based framework that can overcome the limitations of conventional DL models by focusing on a multi-channel strategy in medical data. This framework, which applies attention mechanisms with transfer learning to histopathological images to facilitate automatic learning, demonstrated outstanding improvements in GC detection compared with conventional DL models, suggesting that it has potential as a reliable and feasible tool for identifying GC in histopathological images [56].
LVI is a pathological feature indicating the risk of lymph node metastasis in primary GC specimens, and LVI detection is frequently a difficult task for pathologists because it is easy to make a mistake. A recent study utilizing an advanced integrated DL algorithm based on a combination of ConViT (SMALL) patch-level classification model and YOLOX object detection model to identify LVI from GC pathology showed exceptional performance, with an AUC achieving 0.998 [57]. In contrast, a study employing a hard negative mining algorithm and a CNN to advance a DL model for LVI screening with GC digital whole-slide images demonstrated the possibility of developing DL algorithms for various medical imaging applications (AUROC, 0.9738; AUPRC, 0.9501) [57]. In summary, the design of AI in identifying metastatic cells opens avenues for reviewing lymph node metastasis. These models may be used as a useful tool for pathologists in diagnosing GC and may assist in enhancing the accuracy of diagnosis and decreasing discrepancies among pathologists.
Based on the unique molecular features of GC, the TCGA research group has proposed dividing GC into four types, including EBV and microsatellite instability (MSI) [58]. EBV positivity and MSI are closely correlated with a favorable response to immunotherapy. A recent study trained a CNN model based on EfficientNet to evaluate the EBV status of H&E-stained whole slide images from biopsy specimens of GC [59]. The model achieved an accuracy of 0.938, suggesting that AI has the potential to clarify complicated patterns in cancer pathology and to identify virus infection status. Similarly, a study used a human–computer collaborative approach that combined a neural network called EBVNet with pathologists to predict the EBV status of GC [60]. The model attained AUCs of 0.945 in internal cross-validation and 0.969 on an external dataset for predicting GC EBV status, which may improve the efficiency of screening GC patients for immunotherapy in a cost-effective manner.
The application of AI in cancer pathology is evolving the research field, providing more precise diagnostic abilities and treatment approaches. These forward-looking technologies may enhance the efficacy and accuracy of tumor diagnosis and classification and enable the prediction of treatment response.

4.2. Progression

Cell proliferation, migration, and invasion capability are central components of cancer biology, all of which play crucial roles in cancer progression. A recent study established a diagnostic panel based on the dynamic gene expression profiles from patients with precancerous lesions and early GC [61]. Ten genes (AK5, CAST, CPE, MAP6, MRO, NR3C1, PHLDB2, TAGLN3, CPT1C, and SNAI) were identified. They had the best diagnostic ability to discriminate GC from normal tissues, with AUCs achieving 0.95 in the TCGA-STAD cohort. Through unsupervised clustering, the ten-gene panel divided GC patients into distinct subtypes, and these subtypes exhibited transcriptome-based pathway alterations during cancer progression, which were closely related to the dysregulation of the immune microenvironment. These results suggest that the 10-gene panel is a promising tool for developing a novel early detection signature for GC, which may also support progression assessment and facilitate further exploration of heterogeneity in GC [61]. Another study conducted WGCNA, GO and KEGG enrichment analyses, and GSEA using the GEO database to compare early GC tissues with advanced GC tissues [62]. These models indicated that C1QB, FCER1G, FPR3, and TYROBP may be key genes involved in the incidence and progression of GC. Further IHC experiments showed that the levels of C1QB, FCER1G, FPR3, and TYROBP proteins were significantly higher in the advanced stage group, suggesting that these four genes can help identify the interaction of hub genes closely linked to tumor progression. Similarly, the WGCNA network analysis identified three hub genes (FN1, COL1A1, and SERPINE1) that are associated with GC progression related to epithelial–mesenchymal transition (EMT) [63]. To identify key genes in GC progression, a study used bioinformatics analysis to select two datasets from the GEO database (GSE19826 and GSE103236) [64]. 
These datasets identified 123 differentially expressed genes (DEGs), and GO annotation analysis showed that these DEGs were linked to extracellular matrix (ECM)-receptor interaction, protein digestion, tight junctions, and cell adhesion molecules. Twelve key genes were chosen based on the PPI network: CCKBR, COL1A1, COL1A2, COL2A1, COL6A3, COL11A1, MMP1, MMP3, MMP7, MMP10, TIMP1, and SPP1 [64].
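To make AUC values such as those reported for the gene panels above concrete, the following minimal Python sketch computes an ROC AUC from its rank-based definition: the probability that a randomly chosen tumor sample receives a higher panel score than a randomly chosen normal sample. The function name `roc_auc`, the panel scores, and the labels are hypothetical illustrations and are not taken from the cited studies, whose scoring methods are not detailed here.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical panel scores (assumption: one combined score per sample).
panel_scores = [0.9, 0.8, 0.7, 0.35, 0.6, 0.3, 0.2, 0.1]
is_tumor = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = tumor, 0 = normal

auc = roc_auc(panel_scores, is_tumor)
print(f"AUC = {auc:.4f}")
```

In this toy example, 15 of the 16 tumor/normal pairs are ranked correctly, so the AUC is 15/16. The pairwise form is quadratic in the number of samples and is meant only to show the definition; library implementations use sorting instead.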
A study combined histone modification and Capture-based Self-Transcribing Active Regulatory Region sequencing (CapSTARR-seq) functional enhancer data and derived a more accurate metric for predicting enhancer–promoter connections and for isolating the roles of germline single nucleotide polymorphisms (SNPs), somatic copy number alterations (SCNAs), and trans-acting transcription factors (TFs) in driving enhancer heterogeneity [65]. The authors identified cancer-related genes (ING1, ARL4C) whose expression varies between patients according to enhancer differences in genomic copy number and germline SNPs, and identified HNF4α as a master trans-acting factor associated with GC enhancer heterogeneity. These results expand our understanding of the enhancer activities driving GC progression. Similarly, a study combined ATAC-seq with ChIP-seq data and created an integrative classification of mesenchymal subtype GC (Mes-GC) [66]. When stromal proportions estimated from bulk RNA-seq data were compared between Mes-GC and non-Mes-GC subtypes, Mes-GCs showed a higher stromal proportion. Computational motif analysis identified TEAD1 as a key mediator of Mes-GC enhancers and a major TF regulator. NUAK1 was likewise found to be a candidate positive regulator of the Mes-GC enhancers. Notably, a study created a novel ML approach designed to identify the TFs that modulate various subtypes of GC via the analysis of ATAC-seq data [67]. The results showed that the amplification of MAPK9 and the deletion of GATA4 were closely correlated and that both contribute to the dysregulation of TFs in GC. In addition, the authors identified TFs that drive the mesenchymal state, including RUNX2, ZEB1, SNAI2, and the AP-1 dimer, along with those related to the epithelial state: GATA4, GATA6, KLF5, HNF4A, FOXA2, and GRHL2. 
Mes-GCs reveal activated fibroblast-like epigenomic and expression profiles by using comparative analyses with bulk and single-cell RNA-seq datasets. This research provides novel insights into the mechanisms underlying the heterogeneity of GC subtypes [67].
lncRNAs are RNA transcripts longer than 200 nucleotides, transcribed from noncoding regions [68]. A recent study using RNA-seq data identified LOC441461 as one of seven hub genes linked to GC prognosis in the TCGA-STAD database [69]. The expression of LOC441461 decreased with the progression of TNM stages. Additionally, downregulation of LOC441461 promoted the proliferation, motility, and invasion abilities of GC cell lines. Furthermore, LOC441461 influences gene transcription by interacting with GATA1, ESR1, RELA, IRF1, AR, POU5F1, and TRIM28, which contribute to the malignant progression of GC [69]. Similarly, another study conducted a bioinformatics analysis using the TCGA cohort and the “lnCar” algorithm and found that lncRNA LINC00659 was extensively expressed in GC tumor tissues [70]. SP1, a widely reported TF, enhanced the expression of LINC00659 to promote the proliferation and mobility of GC cells through the miR-370-AQP3 axis. These results may help clarify the usefulness of lncRNAs in diagnosing GC. A study using RNA-seq identified a seven-lncRNA signature that was associated with overall survival (OS) in GC patients [71]. Among the lncRNAs in this signature, LINC01614 was the most significant candidate. Consistent with the TCGA analysis, LINC01614 was found to be overexpressed in GC specimens and GC cell lines, implying that LINC01614 may serve as an oncogene in GC. LINC01614 also affects cell cycle distribution and has a marked stimulating effect on the migration, invasion, and EMT of GC cell lines. Enrichment analysis showed that LINC01614 acts by regulating the PI3K-Akt signaling pathway [71]. A study employed gene set variation analysis to evaluate EMT activity in GC; the EMT score was calculated from log2-transformed gene expression values in a pipeline using the R package limma [72]. 
In the TCGA cohort, high EMT scores correlated with gene sets associated with cell growth and with significantly worse OS. Additionally, EMT-high GC was associated with a lower fraction of infiltrating Th1 cells and a higher fraction of infiltrating dendritic cells and M1 macrophages. Moreover, GSEA demonstrated that TGF-β and angiogenesis are closely associated with GC showing high EMT and high expression of cell proliferation-related genes [72].
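The idea of an expression-based EMT score that splits patients into high and low groups can be sketched as follows. This is a deliberately simplified stand-in, not the gene set variation analysis used in [72]: it averages log2-transformed expression over a signature and splits the cohort at the median. The gene list, the expression values, the `log2(x + 1)` transform, and the median cutoff are all assumptions made for illustration.

```python
import math
from statistics import mean, median
from typing import Dict, List


def signature_score(expression: Dict[str, float], genes: List[str]) -> float:
    """Mean log2(x + 1) expression over the signature genes present in a sample."""
    values = [math.log2(expression[g] + 1.0) for g in genes if g in expression]
    if not values:
        raise ValueError("no signature genes found in sample")
    return mean(values)


def split_high_low(scores: Dict[str, float]) -> Dict[str, str]:
    """Label a sample 'high' if its score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {sample: ("high" if s > cutoff else "low") for sample, s in scores.items()}


# Hypothetical signature and expression values, for illustration only.
emt_genes = ["VIM", "ZEB1", "SNAI2"]
cohort = {
    "P1": {"VIM": 7.0, "ZEB1": 3.0, "SNAI2": 1.0},
    "P2": {"VIM": 63.0, "ZEB1": 15.0, "SNAI2": 7.0},
    "P3": {"VIM": 1.0, "ZEB1": 0.0, "SNAI2": 0.0},
    "P4": {"VIM": 31.0, "ZEB1": 7.0, "SNAI2": 3.0},
}

scores = {pid: signature_score(expr, emt_genes) for pid, expr in cohort.items()}
groups = split_high_low(scores)
print(groups)
```

The resulting high and low groups are the kind of strata that would then be compared for survival or immune infiltration. A rank-based enrichment method such as GSVA is more robust to differences in gene-level scale than this simple mean.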
A recent study performed multilayered proteomic analyses, involving proteome, phosphoproteome, and TF activity profiles [73]. Systematic comparison and consensus clustering analysis characterized three subtypes of diffuse GC (DGC) and intestinal GC (IGC), according to distinctive features of the cell cycle, ECM, and immune proteins. Immune and ECM proteins were elevated in DGC, whereas DNA damage was upregulated in IGC. TF activity-based subtypes demonstrated that SWI/SNF and NF-κB complexes regulate the progression of GC.
Genetic alterations are necessary but not sufficient for oncogenesis and cancer progression [73]. The TME refers to the cellular environment surrounding cancer cells and encompasses the surrounding immune cells, blood vessels, fibroblasts, ECM, etc. The nonmalignant cells in the TME often promote cancer at all phases of its progression. Neutrophils, long regarded as innate immune cells with pro-inflammatory and phagocytic functions, play a dual role in the TME. A recent study investigated the alteration of gene expression profiles of tumor-associated neutrophils based on an scRNA-seq database in combination with transcriptomic data from GC specimens [74]. By using the LASSO, Univariate, RF, and Boruta ML algorithms in the GeneSelectR package, the authors identified 22 functional gene sets from the AUCell scores of neutrophils in the GSE163558 dataset. Neutrophils highly expressing CD44 have a critical impact on growth, migration, oxidative stress, and T-cell infiltration. Genes, ncRNAs, and proteins that facilitate GC progression, including proliferation, migration, and invasion, identified by AI and bioinformatics analysis, are listed in Table 2.

4.3. Metastasis

The mortality rate of GC is remarkably high due to metastasis occurring at an advanced stage, which is when most patients are diagnosed. It is generally recognized that mutations overlap in the metastatic process, but the underlying mechanisms remain unknown [76]. Therefore, gaining a deep understanding of how metastasis occurs is critical and could potentially improve survival rates for GC patients. With advancements in AI and bioinformatics analysis, recent studies have identified several multi-omics findings that partially explain metastasis progression [77,78]. A study demonstrated that the molecular landscape of GC patients is characterized by integrated DNA and RNA sequencing [79]. The study reveals that TP53 and MADCAM1 mutations serve as early drivers of metastasis and are strongly associated with high metastatic potential. Additionally, MADCAM1 mutation promotes GC cell migration and metastasis through an immune-suppressive microenvironment. Current research exploring various candidates that could enhance our understanding of GC metastasis using AI and bioinformatics analysis is summarized in Table 3.

4.3.1. Lymph Node Metastasis

The most essential factor in deciding the primary therapy of early GC is lymph node metastasis (LNM). Almost all patients without LNM survive for five years, whereas the OS rate of early GC decreases to 70–80% when LNM occurs [92]. Thus, accurate identification of LNM in GC is essential to avoid unnecessary treatments. A recent study applied multi-regional sequencing of tissues to characterize somatic mutations, called with MuTect, and the immune TME in both normal regions adjacent to tumors (NATs) and matched tumors [80]. Whole-exome, RNA-seq, and TCR-alpha (T-cell receptor) sequencing were carried out. Cancer cells are predominant in NAT, and several driver mutations, including TP53, are also detected in NAT, indicating that field cancerization is a general issue during GC carcinogenesis. A phylogenetic tree for each patient based on somatic single-nucleotide variants revealed that metastatic clones may start seeding in the early stage and then expand to establish lesions at lymph node stations [80]. Another study tested whether proteomics can identify early GCs with LNM [81]. The authors employed proteomic analysis to compare the proteomic differences associated with the incidence of LNM in patients with early GC and subsequently constructed a prediction model of LNM in early GC patients. Two proteins, GABARAPL2 and NAV1, which showed superior predictive value, were identified. These differences may be used to predict preoperatively whether early GC patients have LNM and thereby guide the choice of a suitable surgical indication. In recent years, numerous studies have constructed prediction models for LNM of early GC [93,94]. These models are primarily based on various algorithms, such as logistic models, nomograms, decision trees (DT), and Naive Bayes (NB) methods, among others. 
Studies have shown that ensemble learning techniques can improve performance on this type of dataset. A recent study constructed a prediction model by employing ensemble algorithms, including RF, SVM, classification and regression tree (CART), LR, and XGB, to predict LNM risk [82]. This ensemble learning model exhibited an enhanced level of accuracy, achieving an AUC value of 0.86 on the test set and an AUC value of 0.892 on the external validation set. Similarly, a study performed external validation of a model for predicting LNM in early GC using LR and GB machine (GBM) methods [83]. The accuracy and specificity were 0.803 (95% confidence interval [CI], 0.787 to 0.806) and 0.796 (95% CI, 0.787 to 0.798) in LR and 0.810 (95% CI, 0.794 to 0.814) and 0.803 (95% CI, 0.795 to 0.805) in GBM, respectively. LR, GBM, and LASSO demonstrated higher performance compared with SVM and Elastic Net. Interestingly, a study using a novel liquid biopsy-based transcriptomic panel to detect LNM in early GC patients identified four genes, SDS, TESMIN, NEB, and GRB14, that were differentially expressed in T1 GC with LNM vs. T1 GC without LNM based on high-throughput data from TCGA and GSE246963 [84]. ROC curve analysis showed that the combined four-mRNA panel significantly increased diagnostic value (AUC = 0.838, sensitivity 82.3%, specificity 75.0%). Moreover, transcriptomic liquid biopsies using blood samples can accurately estimate the preoperative risk of LNM. The Risk Stratification Assessment model revealed superior predictive accuracy, with an AUC of 0.812 in ROC curve analysis, and decreased the overtreatment rate from 84.5% to 14.4%, suggesting its capability for better clinical decision-making.
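One common way to combine several classifiers into an ensemble is soft voting, in which the predicted probabilities of the base models are averaged before a threshold is applied. The sketch below illustrates only this combination step. How the cited studies actually combined their base learners is not stated here, so soft voting, the 0.5 threshold, and the probability values are assumptions for illustration.

```python
from typing import Dict, List


def soft_vote(probabilities: Dict[str, List[float]], threshold: float = 0.5) -> List[int]:
    """Average per-patient LNM probabilities across models, then apply a threshold."""
    per_model = list(probabilities.values())
    if not per_model:
        raise ValueError("at least one model is required")
    n_patients = len(per_model[0])
    if any(len(p) != n_patients for p in per_model):
        raise ValueError("all models must score the same patients")
    averaged = [
        sum(p[i] for p in per_model) / len(per_model) for i in range(n_patients)
    ]
    return [1 if a >= threshold else 0 for a in averaged]


# Hypothetical predicted probabilities of LNM for three patients.
model_outputs = {
    "RF": [0.8, 0.4, 0.1],
    "SVM": [0.6, 0.7, 0.2],
    "LR": [0.7, 0.3, 0.3],
}

ensemble_calls = soft_vote(model_outputs)
print(ensemble_calls)
```

For the second patient the SVM alone would predict LNM, but the averaged probability falls below the threshold, which shows how an ensemble can dampen the error of a single model.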

4.3.2. Peritoneal Metastasis

The peritoneum is one of the most common sites of metastasis and a prevalent cause of worsened prognosis in GC [95]. However, the detailed molecular mechanisms involved in the metastatic process to the peritoneum are still poorly understood. Thus, clarifying the molecular mechanisms linked to peritoneal metastasis (PM) is crucial and holds significant clinical promise. A recent study showed that WGCNA using RNA-seq data identified PM-related lncRNAs (lnc-TRIM28-14, lnc-RFNG-1) and genes (CD93, COL3A1, and COL4A1) in GC samples [85]. Expression of lnc-TRIM28-14 is significantly increased, improving the diagnostic sensitivity and specificity in GC with PM (GCPM). Another study developed and validated a multitask ML model, TACSPR, for predicting PM [86]. The TACSPR model accurately predicted GCPM by utilizing a multitask ML approach. Using RNA-seq data from TCGA and GSE15081, a study assessed the diagnostic performance of an mRNA panel for detecting PM [87]. Six genes (BUB1, CKS2, PCNA, CHEK1, NEK2, and NCAPG2) were identified. All of them showed different expression levels in patients with PM compared to those without. Elevated expression of the six mRNAs in PM patients was confirmed through peripheral blood analysis of GC patients with or without PM. The 6-mRNA panel demonstrated high diagnostic performance, with an AUC value of 0.902. Multivariate analysis identified the 6-mRNA classifier as an independent indicator of PM. Additionally, the analysis of the pre-operative peripheral blood serum levels of these six mRNAs indicated that the risk stratification assessment model differentiated between patients with P0CY1 cells and those with detectable tumors, highlighting its potential role in determining cases for conversion therapy [87]. Stimulated Raman Molecular Cytology (SRMC) is an innovative and highly sensitive cytology technique combining stimulated Raman scattering (SRS) microscopy with DL algorithms. A recent study developed SRMC for detecting PM in GC [88]. 
The integration of three-color SRS microscopy with a DL segmentation model provided both morphological and compositional details of individual cells. SRMC demonstrated rapid and accurate detection of PM, achieving a sensitivity of 81.5%, specificity of 84.9%, accuracy of 83.75%, and an AUC of 0.85 in just twenty minutes with cross-validation. This method may facilitate quick and precise identification of GCPM.
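Sensitivity, specificity, and accuracy, as reported for the diagnostic models above, all derive from the same four confusion-matrix counts. The following minimal sketch shows the arithmetic on hypothetical labels; the data are invented for illustration and do not reproduce the results of any cited study.

```python
from typing import Dict, Sequence


def diagnostic_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, and accuracy from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # metastasis correctly detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # metastasis missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly ruled out
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical ground truth (1 = PM present) and model calls for ten patients.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
calls = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = diagnostic_metrics(truth, calls)
print(metrics)
```

Because accuracy depends on the proportion of positive cases in the cohort, sensitivity and specificity are usually reported alongside it, as in the studies summarized above.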

4.3.3. Distant Metastasis

Recent research developed ML/DL models to distinguish distant metastasis in GC samples based on DNA methylation profiles using bioinformatics analysis [89]. The authors constructed various ML/DL models, such as deep neural networks (DNN), SVM, RF, NB, and DT models, to predict distant metastasis in GC. The results show that DNN outperformed other models, with an AUC of 99.9%. Additionally, a weighted random sampling technique helped identify key methylation markers linked to genes relevant to metastatic GC tumors. A study investigated the factors correlated with distant metastasis of GC by applying several ML algorithms to common clinicopathological findings from the SEER database [90]. Among the six ML algorithms, RF was the best at predicting the risk of distant metastasis of GC and may offer adequate data for clinical decision-making. In another study, scRNA-seq data from the GEO database were used to classify six major cell subpopulations, namely, Myeloid cells, B cells, Mast cells, Epithelial cells, Fibroblasts, and TNK cells [91]. Among these, the quantity of TNK cells was remarkably increased in the group of GC patients with liver metastasis. Analysis of differentially enriched pathways between the GC and GC liver metastasis groups revealed that TNK cells were enriched in pathways including the IL-17 and PI3K-Akt signaling pathways. Comparison of cellular infiltration between the samples demonstrated a significantly increased infiltration ratio of CD8 T cells and NKT cells in the GC liver metastasis group. These results suggest that TNK cell subsets play a crucial role in the liver metastasis of GC and may contribute to the development of new therapies to suppress GC liver metastasis [91]. Taken together, these findings suggest that these types of data carry attributes critical to the metastatic process and, consequently, to the prediction task.

5. Discussion

This review highlights the use of AI and bioinformatics in GC progression, including carcinogenesis, proliferation, migration, invasion, and metastasis. It explores how these computational methods—such as various omics technologies, AI algorithms, and clinical prediction models—can assist in early detection, gene mutation prediction, and assessing the risk of metastasis in GC patients. Clinical prediction models are statistical tools built to aid diagnosis and prognosis; for example, nomograms are one type of such models. Recently, the rapid growth of data science and AI has led to a surge in prediction model research and increased discussion on how to properly evaluate these models. Here, we describe both the internal and external validation of the prediction models. Internal validation tests a model’s performance within a population similar to that used for training, while external validation measures performance in different populations, such as across various institutions, countries, or time periods. Incorporating automated analysis of GC progression into predictive models offers significant benefits and may greatly assist clinicians. We summarize these clinical prediction models in Table 4, illustrating how AI tasks relate to clinical decisions.
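The distinction between internal and external validation can be illustrated by how a dataset is partitioned. In the sketch below, all records from one institution are held out as an external validation set, while the remaining records are split at random into training and internal validation sets. The record layout, the site names, the 20% internal fraction, and the function name `split_for_validation` are hypothetical choices made for illustration.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, object]


def split_for_validation(
    records: List[Record],
    external_site: str,
    internal_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Return (train, internal_validation, external_validation)."""
    # External validation: a different population (here, a different institution).
    external = [r for r in records if r["site"] == external_site]
    development = [r for r in records if r["site"] != external_site]

    # Internal validation: a random hold-out from the development population.
    rng = random.Random(seed)
    shuffled = development[:]
    rng.shuffle(shuffled)
    n_internal = max(1, round(len(shuffled) * internal_fraction))
    return shuffled[n_internal:], shuffled[:n_internal], external


# Hypothetical cohort: ten patients from site "A" and five from site "B".
cohort = [{"id": i, "site": "A"} for i in range(10)]
cohort += [{"id": 100 + i, "site": "B"} for i in range(5)]

train, internal_val, external_val = split_for_validation(cohort, external_site="B")
print(len(train), len(internal_val), len(external_val))
```

Holding out by time period or by country follows the same pattern, with the filter applied to a different field. A model that performs well on the internal set but poorly on the external set is likely overfitted to the development population.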
The synergy of AI and multi-omics analysis may decipher complicated biological mechanisms and mutational changes in GC and add a new dimension to cancer therapeutic approaches, such as drug discovery and prediction of treatment efficacy. Drug discovery is a procedure that takes many years and is costly [96]. AI has been shown to be an effective technology for metastatic cancer drug discovery, particularly in areas such as toxicity prediction, drug repositioning, prediction of molecular bioactivity, and cost performance [97]. Several researchers have investigated the potential of AI-based methods for drug discovery. A study constructed a novel model using ML, including XGB, to evaluate the molecules involved in the mTOR signaling pathway [98]. The authors identified five key genes, PPARA, FNIP1, WNT5A, HRAS, and HIF1A, in the SVM ML model that are predicted to be closely related to the GC immune microenvironment and drug sensitivity. According to the investigation into these five genes and immunotherapy, thapsigargin may be a new strategy for the management of GC. Meanwhile, tumor-associated neutrophil signatures were identified as the most significant determinant of treatment response. A recent study found DEGs from the ICBatlas database by comparing responders to immunotherapy with non-responders among GC patients [99]. WGCNA, univariate Cox, RF, XGB, and Boruta were performed on these DEGs from the TCGA-STAD cohort and revealed that three cancer-associated fibroblast-related genes, CDH6, EGFLAM, and RASGRF2, were differentially expressed in patients with GC. Multi-omics analyses demonstrated that these three genes were closely associated with tumor immunity, immune-related genes, and chemotherapeutic drug efficacy. These results suggest that AI-based multi-omics analysis may have an important role in decoding GC drug resistance.
To translate research results into clinical application, clinical trials have an important role in the practice of evidence-based medicine and provide insights into better target selection. The LEGACy2 study, a Horizon 2020-funded multi-institutional research project, was conducted prospectively, collecting tumor tissue samples from European and Latin American countries to characterize advanced GC in these populations by integrating multi-omics analysis (NCT04015466) [100]. Transcriptomic analysis identified four distinct immune clusters with different levels of immune activation by using unsupervised hierarchical clustering based on cell type scores. Cluster 2 revealed higher scores for immune cells, such as B cells, CD8 T cells, cytotoxic cells, dendritic cells, neutrophils, NK cells, T cells, and T regulatory cells, and higher expression of immune checkpoints, such as PDCD1, CD274, CTLA4, and TNFRSF4, and was enriched for Hedgehog, JAK-STAT, MAPK, NF-κB, Notch, and Wnt signaling pathways. Cluster 1 was further divided into three subclasses and contained tumors with more heterogeneous immune infiltration. Furthermore, microbiome sequencing showed that HP in Latin American and Lactobacillus sp. in European GC patients were significantly more abundant in Cluster 2. These results may help to clarify the regional variations in the biological and clinical activities of GC and support precise clinical decision-making worldwide.
Although significant progress has been made with AI-based bioinformatics analysis, its application continues to face many challenges. First, the analysis of big data needs rigorous standardization and quality management to ensure the reliability of results. Owing to variations in experimental approaches and sample handling, bias may occur that can influence the reliability and accuracy of the bioinformatics data. Hence, a series of standardized experimental procedures and processing techniques is required to overcome these problems. To integrate multi-omics data, advanced computational procedures are necessary to enhance data stability and preserve data integrity. Managing data that are missing across different omics layers remains a significant challenge. Data synthesis and augmentation architectures, such as generative adversarial networks [101] and diffusion models [102], can be leveraged to alleviate this obstacle. Data harmonization is crucial for sharing data across diverse research laboratories. While various publicly available bioinformatics databases currently exist, such as the TCGA and Genotype-Tissue Expression, there remains a persistent need for further efforts to facilitate data access through collaborative research systems and enhanced data management. Because AI algorithms are typically trained on single-institution datasets from a limited number of patient records or images, data sharing and algorithm generalizability are also key hurdles for clinical implementation [103]. Transitioning from single-institution datasets to multi-institutional ones is crucial. By adopting common data models, researchers can collaborate across institutions, overcoming barriers to data interoperability and enhancing the generalizability of AI models. Many advanced ML algorithms are available to help researchers integrate various omics data into comprehensive databases that can advance cancer research. 
The lack of interpretability and repeatability is another complex challenge in AI-based multi-omics analysis, and both are critical for facilitating clinical usability [104]. The heterogeneity of cancer, intrinsic discrepancies between modalities, and the black-box nature of AI still impede our understanding of how AI-based models arrive at their predictions [105]. This obstacle hampers the clinical translation of AI-based multi-omics approaches. Growing efforts have been made to increase interpretability, and numerous techniques have been aimed at elucidating the rationale behind AI predictions, including hidden-state analysis, saliency maps, feature visualizations, and variable importance metrics [106]. These approaches will provide meaningful explanations for patient-specific predictions and prompt a mechanistic understanding of complicated AI-based models. In the case of repeatability, several approaches have attempted to resolve this problem, such as the adoption of a more complex algorithm and averaging results from several models [104]. Adherence to standardized guidelines, in turn, can ensure the interpretability, reproducibility, transparency, and methodological rigor of an AI model to a certain extent [107]. With these measures, ML/DL models may ultimately meet expectations for integration into clinical decision-making and for advancing data-driven personalized medicine. Shapley Additive Explanations (SHAP) displayed increased real-time performance compared to existing methods [108]. This technique provides improved performance while preserving interpretability, supporting the requirements of clinicians. SHAP highlighted image features that improve the interpretability and reliability of the results in gastrointestinal cancer [109]. 
Similarly, a study demonstrated the potential and practical value of explainable ML models in the early detection of GC by integrating advanced models such as WeightedEnsemble, CatBoost, and RF [110]. For instance, the highest AUC, 0.94, was achieved by the WeightedEnsemble-L2 model. This study not only enhances the accuracy and efficiency of medical decision-making but also offers a deeper understanding and opportunities for medical research. Another study aimed to establish a more reliable and interpretable AI algorithm for GC detection by integrating the Local Interpretable Model-agnostic Explanations (LIME) technique [111]. The authors sought to increase predictive accuracy by combining three CNN architectures, VGG16, ResNet50, and MobileNetV2, and applying them to the GasHisSDB dataset. The combined model demonstrated an accuracy of 98%. The LIME technique emphasizes the regions within the image that inform the combined model’s decision, so that clinicians receive highlighted sections of the images that support their evaluation of the model’s predictions. This result may enhance their trust in AI-driven cancer detection systems.
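Among the interpretability techniques mentioned above, variable importance metrics are perhaps the simplest to illustrate. The sketch below implements permutation importance, a model-agnostic metric that measures how much accuracy drops when the values of one feature are shuffled across patients. It is not SHAP or LIME, and the toy model, the feature values, and the function names are hypothetical.

```python
import random
from typing import Callable, List

Model = Callable[[List[float]], int]


def accuracy(model: Model, X: List[List[float]], y: List[int]) -> float:
    """Fraction of samples for which the model's prediction matches the label."""
    return sum(1 for row, label in zip(X, y) if model(row) == label) / len(y)


def permutation_importance(
    model: Model, X: List[List[float]], y: List[int], n_repeats: int = 20, seed: int = 0
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: List[float]) -> int:
    """Hypothetical classifier that uses only feature 0 and ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


# Hypothetical data: feature 0 is informative, feature 1 is noise.
X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.3], [0.1, 0.9], [0.7, 0.5], [0.3, 0.2]]
y = [1, 1, 0, 0, 1, 0]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

Shuffling the feature that the model ignores leaves accuracy unchanged, so its importance is zero, whereas shuffling the informative feature degrades accuracy. SHAP and LIME go further by attributing individual predictions to features rather than summarizing importance across a cohort.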
The rapid advancement of AI in cancer research raises significant ethical concerns, including concerns about data privacy and the objectivity of AI models. Data security, transparency, and clinical validation are vital for fostering trust and certifying responsible AI deployment. Bioinformatics analysis often handles sensitive personal data, including gene sequences, health status, and lifestyle choices. Protecting privacy during data sharing and collaboration among multiple institutions is a significant challenge. Ensuring data security is crucial to protect patient confidentiality. Advanced encryption and anonymization techniques are essential for maintaining data privacy and integrity. Several sophisticated computational methods, such as federated learning, swarm learning, and blockchain technology, have been employed to boost privacy. Additionally, it is important to guarantee that all patients have equitable access to AI-driven diagnosis and treatment tools. Addressing algorithmic bias and enhancing explainability will strengthen AI’s role in improving patient care. Overcoming these challenges requires a multidisciplinary effort involving biology, mathematics, medicine, and computer science. The integration of AI and bioinformatics provides valuable applications and benefits in cancer research. This progress aims to bridge the gap between cutting-edge research and real-world healthcare, leading to more precise and effective treatments for GC patients.
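The privacy benefit of federated learning comes from sharing model parameters rather than patient records. A minimal sketch of the aggregation step, a sample-size-weighted average of locally trained parameters in the style of federated averaging, is given below. The parameter vectors and cohort sizes are hypothetical, and real deployments add secure communication, repeated training rounds, and often further privacy safeguards.

```python
from typing import List


def federated_average(site_weights: List[List[float]], site_sizes: List[int]) -> List[float]:
    """Combine per-site model parameters, weighting each site by its sample count."""
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one parameter vector and one size per site")
    n_params = len(site_weights[0])
    if any(len(w) != n_params for w in site_weights):
        raise ValueError("all sites must share the same model shape")
    total = sum(site_sizes)
    return [
        sum(w[k] * n for w, n in zip(site_weights, site_sizes)) / total
        for k in range(n_params)
    ]


# Hypothetical parameters trained locally at two hospitals.
# Only these numbers leave each institution, never the patient data.
hospital_params = [[1.0, 2.0], [3.0, 4.0]]
hospital_sizes = [100, 300]

global_params = federated_average(hospital_params, hospital_sizes)
print(global_params)
```

The larger cohort contributes more to the shared model, which is why the averaged parameters sit closer to those of the second hospital.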

6. Conclusions

AI has become a potent tool for processing large-scale datasets and is widely applied in constructing various omics-based models for GC research. The synergy of AI and multi-omics analysis, including genomics, transcriptomics, metabolomics, proteomics, radiomics, and pathomics, has shown promise in clarifying the complexity and heterogeneity of GC progression. Although many challenges, like data quality management and model interpretability, remain, there are significant ongoing efforts to address these issues and promote the clinical use of AI-driven bioinformatics. We hope AI technologies will offer valuable insights to support clinical decision-making, ultimately transforming the data-driven evolution of personalized treatment and improving outcomes and quality of life for GC patients.

Author Contributions

Conceptualization, T.M. and M.Y.; writing—original draft preparation, T.M. and M.Y.; writing—review and editing, T.M. and M.Y.; supervision, M.Y.; funding acquisition, M.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by KAKENHI (Grant-in-Aid for Scientific Research), No. 21H03008.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wu, S.; Zhu, W.; Thompson, P.; Hannun, Y.A. Evaluating intrinsic and non-intrinsic cancer risk factors. Nat. Commun. 2018, 9, 3490.
  2. Rao, R.; Gulfishan, M.; Kim, M.S.; Kashyap, M.K. Deciphering Cancer Complexity: Integrative Proteogenomics and Proteomics Approaches for Biomarker Discovery. Methods Mol. Biol. 2025, 2859, 211–237.
  3. Newgard, C.B. Metabolomics and Metabolic Diseases: Where Do We Stand? Cell Metab. 2017, 25, 43–56.
  4. Mbemi, A.; Khanna, S.; Njiki, S.; Yedjou, C.G.; Tchounwou, P.B. Impact of Gene-Environment Interactions on Cancer Development. Int. J. Environ. Res. Public Health 2020, 17, 8089.
  5. Akhoundova, D.; Rubin, M.A. Clinical application of advanced multi-omics tumor profiling: Shaping precision oncology of the future. Cancer Cell 2022, 40, 920–938.
  6. Ghaffari, S.; Hanson, C.; Schmidt, R.E.; Bouchonville, K.J.; Offer, S.M.; Sinha, S. An integrated multi-omics approach to identify regulatory mechanisms in cancer metastatic processes. Genome Biol. 2021, 22, 19.
  7. Heo, Y.J.; Hwa, C.; Lee, G.H.; Park, J.M.; An, J.Y. Integrative Multi-Omics Approaches in Cancer Research: From Biological Networks to Clinical Subtypes. Mol. Cells 2021, 44, 433–443.
  8. Acosta, J.N.; Falcone, G.J.; Rajpurkar, P.; Topol, E.J. Multimodal biomedical AI. Nat. Med. 2022, 28, 1773–1784.
  9. Sarker, I.H. Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput. Sci. 2021, 2, 160.
  10. Sarker, I.H. AI-Based Modeling: Techniques, Applications and Research Issues Towards Automation, Intelligent and Smart Systems. SN Comput. Sci. 2022, 3, 158.
  11. Bhinder, B.; Gilvary, C.; Madhukar, N.S.; Elemento, O. Artificial Intelligence in Cancer Research and Precision Medicine. Cancer Discov. 2021, 11, 900–915.
  12. Sarvepalli, S.; Vadarevu, S. Role of artificial intelligence in cancer drug discovery and development. Cancer Lett. 2025, 627, 217821.
  13. Greener, J.G.; Kandathil, S.M.; Moffat, L.; Jones, D.T. A guide to machine learning for biologists. Nat. Rev. Mol. Cell Biol. 2022, 23, 40–55.
  14. Cuocolo, R.; Caruso, M.; Perillo, T.; Ugga, L.; Petretta, M. Machine Learning in oncology: A clinical appraisal. Cancer Lett. 2020, 481, 55–62.
  15. Zhou, J.; Hu, B.; Feng, W.; Zhang, Z.; Fu, X.; Shao, H.; Wang, H.; Jin, L.; Ai, S.; Ji, Y. An ensemble deep learning model for risk stratification of invasive lung adenocarcinoma using thin-slice CT. NPJ Digit. Med. 2023, 6, 119.
  16. Matsuoka, T.; Yashiro, M. Bioinformatics Analysis and Validation of Potential Markers Associated with Prediction and Prognosis of Gastric Cancer. Int. J. Mol. Sci. 2024, 25, 5880.
  17. Sherman, R.M.; Salzberg, S.L. Pan-genomics in the human genome era. Nat. Rev. Genet. 2020, 21, 243–254.
  18. Moore, L.D.; Le, T.; Fan, G. DNA methylation and its basic function. Neuropsychopharmacology 2013, 38, 23–38.
  19. Kiri, S.; Ryba, T. Cancer, metastasis, and the epigenome. Mol. Cancer 2024, 23, 154.
  20. Cilento, M.A.; Sweeney, C.J.; Butler, L.M. Spatial transcriptomics in cancer research and potential clinical impact: A narrative review. J. Cancer Res. Clin. Oncol. 2024, 150, 296.
  21. Muller, J.B.; Geyer, P.E.; Colaco, A.R.; Treit, P.V.; Strauss, M.T.; Oroshi, M.; Doll, S.; Virreira Winter, S.; Bader, J.M.; Kohler, N.; et al. The proteome landscape of the kingdoms of life. Nature 2020, 582, 592–596.
  22. Zare, F.; Fleming, R.M.T. Integration of proteomic data with genome-scale metabolic models: A methodological overview. Protein Sci. 2024, 33, e5150.
  23. Martinez-Reyes, I.; Chandel, N.S. Cancer metabolism: Looking forward. Nat. Rev. Cancer 2021, 21, 669–680.
  24. Alseekh, S.; Aharoni, A.; Brotman, Y.; Contrepois, K.; D’Auria, J.; Ewald, J.; Ewald, J.C.; Fraser, P.D.; Giavalisco, P.; Hall, R.D.; et al. Mass spectrometry-based metabolomics: A guide for annotation, quantification and best reporting practices. Nat. Methods 2021, 18, 747–756.
  25. Zheng, H.; Hu, X. Computational intelligence in bioinformatics and biomedicine. Methods 2024, 227, 58–59.
  26. Ma, A.; McDermaid, A.; Xu, J.; Chang, Y.; Ma, Q. Integrative Methods and Practical Challenges for Single-Cell Multi-omics. Trends Biotechnol. 2020, 38, 1007–1022.
  27. Yang, C.; Gong, A. Integrated bioinformatics analysis for differentially expressed genes and signaling pathways identification in gastric cancer. Int. J. Med. Sci. 2021, 18, 792–800.
  28. Pei, X.; Luo, Y.; Zeng, H.; Jamil, M.; Liu, X.; Jiang, B. Identification and validation of key genes in gastric cancer: Insights from in silico analysis, clinical samples, and functional assays. Aging 2024, 16, 10615–10635.
  29. Yu, C.; Chen, J.; Ma, J.; Zang, L.; Dong, F.; Sun, J.; Zheng, M. Identification of Key Genes and Signaling Pathways Associated with the Progression of Gastric Cancer. Pathol. Oncol. Res. 2020, 26, 1903–1919.
  30. Xu, W.; Jiang, T.; Shen, K.; Zhao, D.; Zhang, M.; Zhu, W.; Liu, Y.; Xu, C. GADD45B regulates the carcinogenesis process of chronic atrophic gastritis and the metabolic pathways of gastric cancer. Front. Endocrinol. 2023, 14, 1224832.
  31. Shao, Y.; Chen, C.; Yu, X.; Yan, J.; Guo, J.; Ye, G. Comprehensive analysis of scRNA-seq and bulk RNA-seq data via machine learning and bioinformatics reveals the role of lysine metabolism-related genes in gastric carcinogenesis. BMC Cancer 2025, 25, 644. [Google Scholar] [CrossRef]
  32. Duan, Y.; Xu, Y.; Dou, Y.; Xu, D. Helicobacter pylori and gastric cancer: Mechanisms and new perspectives. J. Hematol. Oncol. 2025, 18, 10. [Google Scholar] [CrossRef] [PubMed]
  33. Wu, X.; Jian, A.; Tang, H.; Liu, W.; Liu, F.; Liu, S.; Wu, H. A Multi-Omics Study on the Effect of Helicobacter Pylori-Related Genes in the Tumor Immunity on Stomach Adenocarcinoma. Front. Cell. Infect. Microbiol. 2022, 12, 880636. [Google Scholar] [CrossRef]
  34. Mohamed, S.H.; Hamed, M.; Alamoudi, H.A.; Jastaniah, Z.; Alakwaa, F.M.; Reda, A. Multi-omics analysis of Helicobacter pylori-associated gastric cancer identifies hub genes as a novel therapeutic biomarker. Brief. Bioinform. 2025, 26, bbaf241. [Google Scholar] [CrossRef] [PubMed]
  35. Liu, D.; Peng, J.; Xie, J.; Xie, Y. Comprehensive analysis of the function of helicobacter-associated ferroptosis gene YWHAE in gastric cancer through multi-omics integration, molecular docking, and machine learning. Apoptosis 2024, 29, 439–456. [Google Scholar] [CrossRef] [PubMed]
  36. Naseem, M.; Barzi, A.; Brezden-Masley, C.; Puccini, A.; Berger, M.D.; Tokunaga, R.; Battaglin, F.; Soni, S.; McSkane, M.; Zhang, W.; et al. Outlooks on Epstein-Barr virus associated gastric cancer. Cancer Treat. Rev. 2018, 66, 15–22. [Google Scholar] [CrossRef]
  37. Deng, S.Z.; Wang, X.X.; Zhao, X.Y.; Bai, Y.M.; Zhang, H.M. Exploration of the Tumor Immune Landscape and Identification of Two Novel Immunotherapy-Related Genes for Epstein-Barr virus-associated Gastric Carcinoma via Integrated Bioinformatics Analysis. Front. Surg. 2022, 9, 898733. [Google Scholar] [CrossRef]
  38. Carneiro, K.O.; Araujo, T.M.T.; Da Silva Mourao, R.M.; Casseb, S.M.M.; Demachki, S.; Moreira, F.C.; Dos Santos, A.; Ishak, G.; Da Costa, D.S.A.; Magalhaes, L.; et al. Transcriptional and microbial profile of gastric cancer patients infected with Epstein-Barr virus. Front. Oncol. 2025, 15, 1530430. [Google Scholar] [CrossRef]
  39. Hosseini, M.; Lotfi-Shahreza, M.; Nikpour, P. Integrative analysis of DNA methylation and gene expression through machine learning identifies stomach cancer diagnostic and prognostic biomarkers. J. Cell. Mol. Med. 2023, 27, 714–726. [Google Scholar] [CrossRef]
  40. Nikanjam, M.; Kato, S.; Kurzrock, R. Liquid biopsy: Current technology and clinical applications. J. Hematol. Oncol. 2022, 15, 131. [Google Scholar] [CrossRef]
  41. Michel, M.; Heidary, M.; Mechri, A.; Da Silva, K.; Gorse, M.; Dixon, V.; von Grafenstein, K.; Bianchi, C.; Hego, C.; Rampanou, A.; et al. Noninvasive Multicancer Detection Using DNA Hypomethylation of LINE-1 Retrotransposons. Clin. Cancer Res. 2025, 31, 1275–1291. [Google Scholar] [CrossRef]
  42. Dai, Z.; Jiang, J.; Chen, Q.; Bai, M.; Sun, Q.; Feng, Y.; Liu, D.; Wang, D.; Zhang, T.; Han, L.; et al. Combining methylated RNF180 and SFRP2 plasma biomarkers for noninvasive diagnosis of gastric cancer. Transl. Oncol. 2025, 51, 102190. [Google Scholar] [CrossRef]
  43. Zheng, Z.; Zhai, Y.; Yan, X.; Wang, Z.; Zhang, H.; Xu, R.; Liu, X.; Cai, J.; Zhang, Z.; Shang, Y.; et al. Functions and Clinical Applications of Exosomes in Gastric Cancer. Int. J. Biol. Sci. 2025, 21, 2330–2345. [Google Scholar] [CrossRef] [PubMed]
  44. Cai, Z.R.; Zheng, Y.Q.; Hu, Y.; Ma, M.Y.; Wu, Y.J.; Liu, J.; Yang, L.P.; Zheng, J.B.; Tian, T.; Hu, P.S.; et al. Construction of exosome non-coding RNA feature for non-invasive, early detection of gastric cancer patients by machine learning: A multi-cohort study. Gut 2025, 74, 884–893. [Google Scholar] [CrossRef] [PubMed]
  45. Nguyen, T.H.; Doan, N.N.T.; Tran, T.H.; Huynh, L.A.K.; Doan, P.L.; Nguyen, T.H.H.; Nguyen, V.T.C.; Nguyen, G.T.H.; Nguyen, H.N.; Giang, H.; et al. Tissue of origin detection for cancer tumor using low-depth cfDNA samples through combination of tumor-specific methylation atlas and genome-wide methylation density in graph convolutional neural networks. J. Transl. Med. 2024, 22, 618. [Google Scholar] [CrossRef] [PubMed]
  46. Saltz, J.; Almeida, J.; Gao, Y.; Sharma, A.; Bremer, E.; DiPrima, T.; Saltz, M.; Kalpathy-Cramer, J.; Kurc, T. Towards Generation, Management, and Exploration of Combined Radiomics and Pathomics Datasets for Cancer Research. AMIA Jt. Summits Transl. Sci. Proc. 2017, 2017, 85–94. [Google Scholar] [PubMed]
  47. Lee, H.; Chung, J.W.; Yun, S.C.; Jung, S.W.; Yoon, Y.J.; Kim, J.H.; Cha, B.; Kayasseh, M.A.; Kim, K.O. Validation of Artificial Intelligence Computer-Aided Detection on Gastric Neoplasm in Upper Gastrointestinal Endoscopy. Diagnostics 2024, 14, 2706. [Google Scholar] [CrossRef]
  48. Ke, X.; Cai, X.; Bian, B.; Shen, Y.; Zhou, Y.; Liu, W.; Wang, X.; Shen, L.; Yang, J. Predicting early gastric cancer risk using machine learning: A population-based retrospective study. Digit. Health 2024, 10, 20552076241240905. [Google Scholar] [CrossRef]
  49. Ma, T.; Wang, H.; Ye, Z. Artificial intelligence applications in computed tomography in gastric cancer: A narrative review. Transl. Cancer Res. 2023, 12, 2379–2392. [Google Scholar] [CrossRef]
  50. Hu, C.; Xia, Y.; Zheng, Z.; Cao, M.; Zheng, G.; Chen, S.; Sun, J.; Chen, W.; Zheng, Q.; Pan, S.; et al. AI-based large-scale screening of gastric cancer from noncontrast CT imaging. Nat. Med. 2025, 31, 3011–3019. [Google Scholar] [CrossRef]
  51. Fan, L.; Li, J.; Zhang, H.; Yin, H.; Zhang, R.; Zhang, J.; Chen, X. Machine learning analysis for the noninvasive prediction of lymphovascular invasion in gastric cancer using PET/CT and enhanced CT-based radiomics and clinical variables. Abdom. Radiol. 2022, 47, 1209–1222. [Google Scholar] [CrossRef] [PubMed]
  52. Ramon, Y.C.S.; Sese, M.; Capdevila, C.; Aasen, T.; De Mattos-Arruda, L.; Diaz-Cano, S.J.; Hernandez-Losa, J.; Castellvi, J. Clinical implications of intratumor heterogeneity: Challenges and opportunities. J. Mol. Med. 2020, 98, 161–177. [Google Scholar] [CrossRef] [PubMed]
  53. Shmatko, A.; Ghaffari Laleh, N.; Gerstung, M.; Kather, J.N. Artificial intelligence in histopathology: Enhancing cancer research and clinical oncology. Nat. Cancer 2022, 3, 1026–1038. [Google Scholar] [CrossRef]
  54. Baxi, V.; Edwards, R.; Montalto, M.; Saha, S. Digital pathology and artificial intelligence in translational medicine and clinical practice. Mod. Pathol. 2022, 35, 23–32. [Google Scholar] [CrossRef]
  55. Ba, W.; Wang, S.; Shang, M.; Zhang, Z.; Wu, H.; Yu, C.; Xing, R.; Wang, W.; Wang, L.; Liu, C.; et al. Assessment of deep learning assistance for the pathological diagnosis of gastric cancer. Mod. Pathol. 2022, 35, 1262–1268. [Google Scholar] [CrossRef]
  56. Zubair, M.; Owais, M.; Hassan, T.; Bendechache, M.; Hussain, M.; Hussain, I.; Werghi, N. An interpretable framework for gastric cancer classification using multi-channel attention mechanisms and transfer learning approach on histopathology images. Sci. Rep. 2025, 15, 13087. [Google Scholar] [CrossRef]
  57. Lee, J.; Cha, S.; Kim, J.; Kim, J.J.; Kim, N.; Jae Gal, S.G.; Kim, J.H.; Lee, J.H.; Choi, Y.D.; Kang, S.R.; et al. Ensemble Deep Learning Model to Predict Lymphovascular Invasion in Gastric Cancer. Cancers 2024, 16, 430. [Google Scholar] [CrossRef]
  58. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature 2014, 513, 202–209. [Google Scholar] [CrossRef]
  59. Vuong, T.T.L.; Song, B.; Kwak, J.T.; Kim, K. Prediction of Epstein-Barr Virus Status in Gastric Cancer Biopsy Specimens Using a Deep Learning Algorithm. JAMA Netw. Open 2022, 5, e2236408. [Google Scholar] [CrossRef] [PubMed]
  60. Zheng, X.; Wang, R.; Zhang, X.; Sun, Y.; Zhang, H.; Zhao, Z.; Zheng, Y.; Luo, J.; Zhang, J.; Wu, H.; et al. A deep learning model and human-machine fusion for prediction of EBV-associated gastric cancer from histopathology. Nat. Commun. 2022, 13, 2790. [Google Scholar] [CrossRef] [PubMed]
  61. Long, F.; Li, S.; Xu, Y.; Liu, M.; Zhang, X.; Zhou, J.; Chen, Y.; Rong, Y.; Meng, X.; Wang, F. Dynamic gene screening enabled identification of a 10-gene panel for early detection and progression assessment of gastric cancer. Comput. Struct. Biotechnol. J. 2023, 21, 677–687. [Google Scholar] [CrossRef]
  62. Liu, R.; Liu, J.; Cao, Q.; Chu, Y.; Chi, H.; Zhang, J.; Fu, J.; Zhang, T.; Fan, L.; Liang, C.; et al. Identification of crucial genes through WGCNA in the progression of gastric cancer. J. Cancer 2024, 15, 3284–3296. [Google Scholar] [CrossRef]
  63. Zhao, Q.; Xie, J.; Xie, J.; Zhao, R.; Song, C.; Wang, H.; Rong, J.; Yan, L.; Song, Y.; Wang, F.; et al. Weighted correlation network analysis identifies FN1, COL1A1 and SERPINE1 associated with the progression and prognosis of gastric cancer. Cancer Biomark. 2021, 31, 59–75. [Google Scholar] [CrossRef]
  64. Wang, Y.; Li, D.; Li, D.; Wang, H.; Wu, Y. Integrated bioinformatics analysis for exploring hub genes and related mechanisms affecting the progression of gastric cancer. Biotechnol. Genet. Eng. Rev. 2024, 40, 4911–4922. [Google Scholar] [CrossRef]
  65. Sheng, T.; Ho, S.W.T.; Ooi, W.F.; Xu, C.; Xing, M.; Padmanabhan, N.; Huang, K.K.; Ma, L.; Ray, M.; Guo, Y.A.; et al. Integrative epigenomic and high-throughput functional enhancer profiling reveals determinants of enhancer heterogeneity in gastric cancer. Genome Med. 2021, 13, 158. [Google Scholar] [CrossRef] [PubMed]
  66. Ho, S.W.T.; Sheng, T.; Xing, M.; Ooi, W.F.; Xu, C.; Sundar, R.; Huang, K.K.; Li, Z.; Kumar, V.; Ramnarayanan, K.; et al. Regulatory enhancer profiling of mesenchymal-type gastric cancer reveals subtype-specific epigenomic landscapes and targetable vulnerabilities. Gut 2023, 72, 226–241. [Google Scholar] [CrossRef] [PubMed]
  67. Razavi-Mohseni, M.; Huang, W.; Guo, Y.A.; Shigaki, D.; Ho, S.W.T.; Tan, P.; Skanderup, A.J.; Beer, M.A. Machine learning identifies activation of RUNX/AP-1 as drivers of mesenchymal and fibrotic regulatory programs in gastric cancer. Genome Res. 2024, 34, 680–695. [Google Scholar] [CrossRef]
  68. Yao, R.W.; Wang, Y.; Chen, L.L. Cellular functions of long noncoding RNAs. Nat. Cell Biol. 2019, 21, 542–551. [Google Scholar] [CrossRef] [PubMed]
  69. Lee, S.S.; Park, J.; Oh, S.; Kwack, K. Downregulation of LOC441461 Promotes Cell Growth and Motility in Human Gastric Cancer. Cancers 2022, 14, 1149. [Google Scholar] [CrossRef]
  70. Wang, Y.; Guo, Y.; Zhuang, T.; Xu, T.; Ji, M. SP1-Induced Upregulation of lncRNA LINC00659 Promotes Tumour Progression in Gastric Cancer by Regulating miR-370/AQP3 Axis. Front. Endocrinol. 2022, 13, 936037. [Google Scholar] [CrossRef]
  71. Wu, H.; Zhou, J.; Chen, S.; Zhu, L.; Jiang, M.; Liu, A. Survival-Related lncRNA Landscape Analysis Identifies LINC01614 as an Oncogenic lncRNA in Gastric Cancer. Front. Genet. 2021, 12, 698947. [Google Scholar] [CrossRef]
  72. Oshi, M.; Roy, A.M.; Yan, L.; Kinoshita, S.; Tamura, Y.; Kosaka, T.; Akiyama, H.; Kunisaki, C.; Takabe, K.; Endo, I. Enhanced epithelial-mesenchymal transition signatures are linked with adverse tumor microenvironment, angiogenesis and worse survival in gastric cancer. Cancer Gene Ther. 2024, 31, 746–754. [Google Scholar] [CrossRef]
  73. Shi, W.; Wang, Y.; Xu, C.; Li, Y.; Ge, S.; Bai, B.; Zhang, K.; Wang, Y.; Zheng, N.; Wang, J.; et al. Multilevel proteomic analyses reveal molecular diversity between diffuse-type and intestinal-type gastric cancer. Nat. Commun. 2023, 14, 835. [Google Scholar] [CrossRef] [PubMed]
  74. Zhang, W.; Wang, X.; Dong, J.; Wang, K.; Jiang, W.; Fan, C.; Liu, H.; Fan, L.; Zhao, L.; Li, G. Single-cell analysis uncovers high-proliferative tumour cell subtypes and their interactions in the microenvironment of gastric cancer. J. Cell. Mol. Med. 2024, 28, e18373. [Google Scholar] [CrossRef] [PubMed]
  75. Gan, S.; Li, C.; Hou, R.; Tian, G.; Zhao, Y.; Ren, D.; Zhou, W.; Zhao, F.; Lv, K.; Yang, J. Dynamic changes of the immune microenvironment in the development of gastric cancer caused by inflammation. Mol. Ther. Oncol. 2024, 32, 200849. [Google Scholar] [CrossRef] [PubMed]
  76. Comen, E.; Norton, L.; Massague, J. Clinical implications of cancer self-seeding. Nat. Rev. Clin. Oncol. 2011, 8, 369–377. [Google Scholar] [CrossRef] [PubMed]
  77. Siegel, M.B.; He, X.; Hoadley, K.A.; Hoyle, A.; Pearce, J.B.; Garrett, A.L.; Kumar, S.; Moylan, V.J.; Brady, C.M.; Van Swearingen, A.E.; et al. Integrated RNA and DNA sequencing reveals early drivers of metastatic breast cancer. J. Clin. Investig. 2018, 128, 1371–1383. [Google Scholar] [CrossRef]
  78. Zhao, Z.M.; Zhao, B.; Bai, Y.; Iamarino, A.; Gaffney, S.G.; Schlessinger, J.; Lifton, R.P.; Rimm, D.L.; Townsend, J.P. Early and multiple origins of metastatic lineages within primary tumors. Proc. Natl. Acad. Sci. USA 2016, 113, 2140–2145. [Google Scholar] [CrossRef]
  79. Zhang, J.; Liu, F.; Yang, Y.; Yu, N.; Weng, X.; Yang, Y.; Gong, Z.; Huang, S.; Gan, L.; Sun, S.; et al. Integrated DNA and RNA sequencing reveals early drivers involved in metastasis of gastric cancer. Cell Death Dis. 2022, 13, 392. [Google Scholar] [CrossRef]
  80. Zhou, Y.; Li, S.; Hu, Y.; Xu, X.; Cui, J.; Li, S.; Li, Z.; Ji, J.; Xing, R. Multi-regional sequencing reveals the genetic and immune heterogeneity of non-cancerous tissues in gastric cancer. J. Pathol. 2024, 263, 454–465. [Google Scholar] [CrossRef]
  81. Yan, B.; Dong, X.; Wu, Z.; Chen, D.; Jiang, W.; Cheng, J.; Chen, G.; Yan, J. Association of proteomics with lymph node metastasis in early gastric cancer patients. Biochim. Biophys. Acta Mol. Basis Dis. 2025, 1871, 167773. [Google Scholar] [CrossRef] [PubMed]
  82. Song, K.; Wu, J.; Xu, M.; Li, M.; Chen, Y.; Zhang, Y.; Chen, H.; Jiang, C. An ensemble learning model to predict lymph node metastasis in early gastric cancer. Sci. Rep. 2025, 15, 11257. [Google Scholar] [CrossRef]
  83. Lee, H.D.; Nam, K.H.; Shin, C.M.; Lee, H.S.; Chang, Y.H.; Yoon, H.; Park, Y.S.; Kim, N.; Lee, D.H.; Ahn, S.H.; et al. Development and Validation of Models to Predict Lymph Node Metastasis in Early Gastric Cancer Using Logistic Regression and Gradient Boosting Machine Methods. Cancer Res. Treat. 2023, 55, 1240–1249. [Google Scholar] [CrossRef]
  84. Ding, P.; Wu, J.; Wu, H.; Ma, W.; Li, T.; Yang, P.; Guo, H.; Tian, Y.; Yang, J.; Er, L.; et al. Preoperative liquid biopsy transcriptomic panel for risk assessment of lymph node metastasis in T1 gastric cancer. J. Exp. Clin. Cancer Res. 2025, 44, 43. [Google Scholar] [CrossRef]
  85. Dong, C.; Luan, F.; Tian, W.; Duan, K.; Chen, T.; Ren, J.; Li, W.; Li, D.; Zhi, Q.; Zhou, J. Identification and validation of crucial lnc-TRIM28-14 and hub genes promoting gastric cancer peritoneal metastasis. BMC Cancer 2023, 23, 76. [Google Scholar] [CrossRef]
  86. Fu, M.; Lin, Y.; Yang, J.; Cheng, J.; Lin, L.; Wang, G.; Long, C.; Xu, S.; Lu, J.; Li, G.; et al. Multitask machine learning-based tumor-associated collagen signatures predict peritoneal recurrence and disease-free survival in gastric cancer. Gastric Cancer 2024, 27, 1242–1257. [Google Scholar] [CrossRef]
  87. Ding, P.; Wu, H.; Wu, J.; Li, T.; Gu, R.; Zhang, L.; Yang, P.; Guo, H.; Tian, Y.; He, J.; et al. Transcriptomics-based liquid biopsy panel for early non-invasive identification of peritoneal recurrence and micrometastasis in locally advanced gastric cancer. J. Exp. Clin. Cancer Res. 2024, 43, 181. [Google Scholar] [CrossRef]
  88. Chen, X.; Wu, Z.; He, Y.; Hao, Z.; Wang, Q.; Zhou, K.; Zhou, W.; Wang, P.; Shan, F.; Li, Z.; et al. Accurate and Rapid Detection of Peritoneal Metastasis from Gastric Cancer by AI-Assisted Stimulated Raman Molecular Cytology. Adv. Sci. 2023, 10, e2300961. [Google Scholar] [CrossRef] [PubMed]
  89. Shi, J.; Chen, Y.; Wang, Y. Deep learning and machine learning approaches to classify stomach distant metastatic tumors using DNA methylation profiles. Comput. Biol. Med. 2024, 175, 108496. [Google Scholar] [CrossRef]
  90. Qin, X.; Qiu, B.; Ge, L.; Wu, S.; Ma, Y.; Li, W. Applying machine learning techniques to predict the risk of distant metastasis from gastric cancer: A real world retrospective study. Front. Oncol. 2024, 14, 1455914. [Google Scholar] [CrossRef] [PubMed]
  91. Gao, J.; Liu, Y.; Tao, L.; Zeng, P.; Ye, G.; Zheng, Y.; Zhang, N. Single-cell data revealed the regulatory mechanism of TNK cell heterogeneity in liver metastasis from gastric cancer. Discov. Oncol. 2024, 15, 664. [Google Scholar] [CrossRef]
  92. Li, X.; Liu, S.; Yan, J.; Peng, L.; Chen, M.; Yang, J.; Zhang, G. The Characteristics, Prognosis, and Risk Factors of Lymph Node Metastasis in Early Gastric Cancer. Gastroenterol. Res. Pract. 2018, 2018, 6945743. [Google Scholar] [CrossRef] [PubMed]
  93. Zhao, L.; Han, W.; Niu, P.; Lu, Y.; Zhang, F.; Jiao, F.; Zhou, X.; Wang, W.; Luan, X.; He, M.; et al. Using nomogram, decision tree, and deep learning models to predict lymph node metastasis in patients with early gastric cancer: A multi-cohort study. Am. J. Cancer Res. 2023, 13, 204–215. [Google Scholar]
  94. Yue, C.; Xue, H. Construction and validation of a nomogram model for lymph node metastasis of stage II-III gastric cancer based on machine learning algorithms. Front. Oncol. 2024, 14, 1399970. [Google Scholar] [CrossRef]
  95. Sundar, R.; Nakayama, I.; Markar, S.R.; Shitara, K.; van Laarhoven, H.W.M.; Janjigian, Y.Y.; Smyth, E.C. Gastric cancer. Lancet 2025, 405, 2087–2102. [Google Scholar] [CrossRef]
  96. Workman, P.; Antolin, A.A.; Al-Lazikani, B. Transforming cancer drug discovery with Big Data and AI. Expert Opin. Drug Discov. 2019, 14, 1089–1095. [Google Scholar] [CrossRef]
  97. Gupta, R.; Srivastava, D.; Sahu, M.; Tiwari, S.; Ambasta, R.K.; Kumar, P. Artificial intelligence to deep learning: Machine intelligence approach for drug discovery. Mol. Divers. 2021, 25, 1315–1360. [Google Scholar] [CrossRef]
  98. Zhang, H.; Zhuo, H.; Hou, J.; Cai, J. Machine learning models predict the mTOR signal pathway-related signature in the gastric cancer involving 2063 samples of 7 centers. Aging 2023, 15, 6152–6162. [Google Scholar] [CrossRef] [PubMed]
  99. Wang, J.; Feng, J.; Chen, X.; Weng, Y.; Wang, T.; Wei, J.; Zhan, Y.; Peng, M. Integrated multi-omics analysis and machine learning identify hub genes and potential mechanisms of resistance to immunotherapy in gastric cancer. Aging 2024, 16, 7331–7356. [Google Scholar] [CrossRef] [PubMed]
  100. van Schooten, T.S.; Derks, S.; Jimenez-Marti, E.; Carneiro, F.; Figueiredo, C.; Ruiz, E.; Alsina, M.; Molero, C.; Garrido, M.; Riquelme, A.; et al. The LEGACy study: A European and Latin American consortium to identify risk factors and molecular phenotypes in gastric cancer to improve prevention strategies and personalized clinical decision making globally. BMC Cancer 2022, 22, 646. [Google Scholar] [CrossRef]
  101. Masalkhi, M.; Sporn, K.; Kumar, R.; Ong, J.; Nguyen, T.; Waisberg, E.; Zaman, N.; Lee, A.G.; Tavakkoli, A. Ophthalmic Image Synthesis and Analysis with Generative Adversarial Network Artificial Intelligence. J. Imaging Inform. Med. 2025, 38, 1–23. [Google Scholar] [CrossRef]
  102. Ma, Z.; Zhang, Y.; Jia, G.; Zhao, L.; Ma, Y.; Ma, M.; Liu, G.; Zhang, K.; Ding, N.; Li, J.; et al. Efficient Diffusion Models: A Comprehensive Survey From Principles to Practices. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 7506–7525. [Google Scholar] [CrossRef]
  103. Golla, A.K.; Tonnes, C.; Russ, T.; Bauer, D.F.; Froelich, M.F.; Diehl, S.J.; Schoenberg, S.O.; Keese, M.; Schad, L.R.; Zollner, F.G.; et al. Automated Screening for Abdominal Aortic Aneurysm in CT Scans under Clinical Conditions Using Deep Learning. Diagnostics 2021, 11, 2131. [Google Scholar] [CrossRef] [PubMed]
  104. Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 2019, 18, 463–477. [Google Scholar] [CrossRef]
  105. Wang, F.; Kaushal, R.; Khullar, D. Should Health Care Demand Interpretable Artificial Intelligence or Accept “Black Box” Medicine? Ann. Intern. Med. 2020, 172, 59–60. [Google Scholar] [CrossRef]
  106. Novakovsky, G.; Dexter, N.; Libbrecht, M.W.; Wasserman, W.W.; Mostafavi, S. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat. Rev. Genet. 2023, 24, 125–137. [Google Scholar] [CrossRef] [PubMed]
  107. Liu, X.; Faes, L.; Calvert, M.J.; Denniston, A.K.; CONSORT/SPIRIT-AI Extension Group. Extension of the CONSORT and SPIRIT statements. Lancet 2019, 394, 1225. [Google Scholar] [CrossRef] [PubMed]
  108. Ponce-Bobadilla, A.V.; Schmitt, V.; Maier, C.S.; Mensing, S.; Stodtmann, S. Practical guide to SHAP analysis: Explaining supervised machine learning model predictions in drug development. Clin. Transl. Sci. 2024, 17, e70056. [Google Scholar] [CrossRef]
  109. Binzagr, F. Explainable AI-driven model for gastrointestinal cancer classification. Front. Med. 2024, 11, 1349373. [Google Scholar] [CrossRef]
  110. Du, H.; Yang, Q.; Ge, A.; Zhao, C.; Ma, Y.; Wang, S. Explainable machine learning models for early gastric cancer diagnosis. Sci. Rep. 2024, 14, 17457. [Google Scholar] [CrossRef]
  111. Ma, J.; Yang, F.; Yang, R.; Li, Y.; Chen, Y. Interpretable deep learning for gastric cancer detection: A fusion of AI architectures and explainability analysis. Front. Immunol. 2025, 16, 1596085. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Schematic illustration of GC progression, including carcinogenesis (early detection), progression (growth, migration, and invasion), and metastasis. First, genetic and epigenetic alterations dysregulate biochemical signaling pathways associated with protein expression, leading to loss of cell-cycle control, uncontrolled proliferation, and evasion of apoptosis. Subsequently, tumor cells at the primary site acquire proliferative and invasive properties, detach from the primary tumor, and invade the surrounding tissues, becoming CTCs. Cell migration is facilitated by the interaction between tumor cells and the TME. CTCs then undergo EMT, a pivotal change that promotes their migration and invasion. Finally, CTCs spread to nearby tissues or enter the bloodstream or the lymphatic system and form secondary tumors in the peritoneum, lymph nodes, and distant organs (liver and lung). GC, gastric cancer; TME, tumor microenvironment; CTCs, circulating tumor cells; EMT, epithelial–mesenchymal transition; CAF, cancer-associated fibroblast; ECM, extracellular matrix.
Figure 2. AI and multi-omics analysis in the progression of GC. Multi-omics technologies have been developed to profile genomics, epigenomics, transcriptomics, proteomics, metabolomics, microbiomics, and emerging omics such as radiomics and pathomics. AI is used to analyze these omics data to identify hub genes, proteins, and bacteria, and to construct clinical predictive models, which helps uncover the complexity of the processes and pathways involved in GC progression. These data can be used in many clinical applications, including diagnosis, classification, and prediction of gene mutations, risk of metastasis, and drug response. GC, gastric cancer; ML, machine learning; DL, deep learning; KNN, K-nearest neighbor; LASSO, least absolute shrinkage and selection operator; NB, Naïve Bayes; SVM, support vector machine; LR, logistic regression; RF, Random Forest; GB, gradient boosting; DT, decision tree; GMM, Gaussian mixture model; PCA, principal component analysis; LDA, linear discriminant analysis; DNN, deep neural network; CNN, convolutional neural network; RNN, recurrent neural network.
Table 1. Relevant studies on the application of AI and bioinformatics analysis in GC carcinogenesis.
Omics TechnologyDatabaseAI Algorithm (Including Alongside AI Methods)No. of CasesIdentified Gene, Protein, BacteriaCommentsRef.
Genomics
Microarray
TCGA
GEO
N.A.30OLFM4, IGF2BP3, CLDN1, and MMP1These four genes were the most extensively upregulated.[27]
Genomics
Microarray
TCGA
GEO
N.A.39COL1A1, COL1A2, COL3A1, and FN1These four genes exhibited significant up-regulation in GC, and hypomethylation of promoter regions of these genes was detected.[28]
Genomics
Microarray
TCGA
GEO
N.A.161COL1A1, TIMP1, SPP1, BGN, MMP3, and APOEThese five mRNA expressions were constantly and remarkably upregulated in the gastric tumor tissues.[29]
GenomicsTCGA
GEO
Copykat algorithm
hdWGCNA
CellChat
LASSO regression model
SVM-RFE
N.A.HSP90AB1, FUS, CTSD, KRT8, TALDO1, BTG2, TXNRD1, GADD45B, PSMB3, RPL9, NQO1, MTHFD2, CFL1, PRDX1, and PFDN2GADD45B was identified as a prominent oncogene linked to chronic atrophic gastritis.[30]
Transcriptomics
metabolomics
scRNA-seq
bulk RNA-seq
TCGA
GEO
STAD
GDSC
TSNE analysis
CellChat
RFSRC
CIBERSORT,
GSEA
46SLC7A7 and VIMSLC7A7 and VIM were identified as key lysine metabolism-related genes involved in gastric carcinogenesis and closely associated with the level of immune cell infiltration.[31]
Epigenomics
450 K array
TCGA
GEO
STAD
SVM
LR
RF
GaussianNB
AdaBoost
470cg17105014 (GYPC), cg23273897 (MME), cg22083047 (PRICKLE2), cg09396217 (ANGPT1), cg01049530 (BMP3), cg18237405 (CPNE5), cg12741420 (IRF4), and cg11754206 (KCNB2)Eight potential diagnostic methylation probes had an AUC of the model on the training and validation set (0.99 and 0.97).[39]
Genomics
MSigDB
TCGA
GEO
STAD
GDC
GSVA
GSEA
LASSO analysis
N.A.CRTAC1, BATF2, and CTHRC1HP infection contributed to predicting patient prognosis and response to immunotherapy.[33]
Genomics
Transcriptomics
Microarray
RNA-seq
MSigDB
TCGA
GEO
STAD
DGIdb
GeneMANIA
SwissADME
N.A.TPX2, MKI67, EXO1, CTHRC1, CXCL1, CCL20, IL12B, and STAT4These four genes serve as potential biomarkers for early diagnosis, prognostic evaluation, and therapeutic targeting in HP-infected GC.[34]
Transcriptomics
scRNA-seq
qRT-PCR
TCGA
GEO
STAD
GSEA
LASSO
RF
CoxBoost
SVM
GBM
58YWHAEHelicobacter-associated ferroptosis gene YWHAE exhibits high expression in both HP-associated gastritis and GC.[35]
Genomics
Transcriptomics
Microarray
RNA-seq
RT-PCR
TCGA
GEO
STAD
CIBERSORT
GSEA
22CD4, STAT1, FCGR3A, IL10, C1QA, CXCL9, CXCL10, CXCR6, PD-L1, and CCL18C1QA is a differentially expressed gene in EBV-positive GC patients compared with EBV-negative patients.[37]
Transcriptomics
Microbiome
DESeq2 package
LDA
41LGALS17A, IRF1, TAP1, C1QA, C1QB, CMKLR1, ICAM1, APOE, CXCR2P1, GM2A, C1QC, TNFSF10, CXCL11, GBP5, CD300LF, IK32, FAM3B, and IDO1
Citrobacter, Acidithiobacillus, Biochmannia, Beijerinckia, and Acidaminococcus
These transcriptional landscapes and pathogens of EBV-associated GC are enriched, potentially contributing to a pro-inflammatory and tumor-promoting microenvironment.[38]
Genomics (ctDNA)
PCR
WGBS
Zenodo database
European Genome-phenome Archive
RF
RF stack model
N.A.N.A.Circulating DNA methylation changes at retrotransposons are a universal tumor biomarker, including GC.[41]
Genomics (ctDNA)N.A.RF
NB
KNN
Neural network
LR
303RNF180 and SFRP2These two methylated genes were detected in circulating DNA from blood samples and improved the accuracy of GC diagnosis.[42].
Transcriptomics (Exosome, ncRNAs)
qRT-PCR
Retrospective cohortLASSO-LR, LR
XGBoost
KNN
RF
SVM
1595DGCR9Serum exosome ncRNA feature offers an assuring liquid biopsy approach for promoting the early GC diagnosis.[44]
Genomics
(cfDNA)
Epigenomics
Microarray
TCGA
TSMA
GCNN | 88 | N.A. | A GCNN using deconvolution scores and genome-wide methylation density features achieved an accuracy of 69% in low-depth cfDNA samples. | [45]
Abbreviations: GC, gastric cancer; AI, artificial intelligence; TCGA, The Cancer Genome Atlas; GEO, Gene Expression Omnibus; WGCNA, Weighted Gene Co-expression Network Analysis; TSNE, T-distributed stochastic neighbor embedding; RF, Random Forest; scRNA-seq, single-cell RNA-sequencing; STAD, stomach adenocarcinoma; SVM, support vector machine; LR, logistic regression; GB, gradient boosting; AUC, area under the curve; LASSO, least absolute shrinkage and selection operator; PC, principal component; DGIdb, Drug-Gene Interaction Database; RT-PCR, Reverse Transcription-Polymerase Chain Reaction; LDA, Linear Discriminant Analysis; WGBS, whole-genome bisulfite sequencing; NB, Naïve Bayes; XGBoost, extreme gradient boosting; KNN, K-nearest neighbors; ncRNAs, noncoding RNAs; N.A., not applicable.
Table 2. Relevant studies on the application of AI and bioinformatics analysis in GC progression.

| Omics Technology | Database | AI Algorithm (Including Alongside AI Methods) | No. of Cases | Identified Gene, Protein | Comments | Ref. |
|---|---|---|---|---|---|---|
| Transcriptomics, Bulk RNA, Microarray, scRNA-seq | TCGA, GEO, STAD, MSigDB | LASSO regression analysis, GSEA, PAM, GSVA | 407 | AK5, CAST, CPE, MAP6, MRO, NR3C1, PHLDB2, TAGLN3, CPT1C, and SNAI | These ten genes had the best diagnostic ability to discriminate GC from normal tissues, with AUCs reaching 0.95. | [61] |
| Genomics, Microarray | TCGA, GEO, GPL570 | WGCNA | 192 | C1QB, FCER1G, FPR3, and TYROBP | The levels of C1QB, FCER1G, FPR3, and TYROBP proteins were significantly higher in the advanced stage group. | [62] |
| Genomics, Transcriptomics, qRT-PCR, RNA-seq | TCGA, GEO, STAD, GEPIA | WGCNA | 407 | FN1, COL1A1, and SERPINE1 | Three hub genes (FN1, COL1A1, and SERPINE1) are associated with GC progression related to EMT. | [63] |
| Genomics | GEO | N.A. | N.A. | CCKBR, COL1A1, COL1A2, COL2A1, COL6A3, COL11A1, MMP1, MMP3, MMP7, MMP10, TIMP1, and SPP1 | Bioinformatics analysis identified twelve key genes that affected the progression of gastric cancer. | [64] |
| Transcriptomics, RNA-seq, bulk RNA-seq | TCGA, GEO, dbGaP | NEBULA algorithm, KNN | 58 | TNF, IL17RA, IKBKG, TAB2, IL1B, and CASP8 | These six genes associated with interleukin-17 signaling were distinctly expressed. | [75] |
| Transcriptomics, scRNA-seq | TCGA, CellMarker | UMAP method, GRNBoost, AUCell, LASSO regression analysis, CytoTRACE | 27 | CREB3 | The transcription factor CREB3, which is highly active in the UBE2C+ tumor cell subpopulation, is involved in the migration, invasion, and progression of GC. | [74] |
| Transcriptomics, ChIP-seq, RT-PCR, RNA-seq | TCGA, GEO | GLM, ChromHMM | N.A. | ING1, ARL4C, and HNF4α | Combining histone modification and functional assay data provides a more accurate metric to assess enhancer activity and identifies novel genes associated with GC. | [65] |
| Genomics, Transcriptomics, Microarray, RNA-seq | TCGA, ACRG | N.A. | N.A. | TEAD1, NUAK1 | TEAD1 served as a key mediator, and NUAK1 was a candidate positive regulator of the mesenchymal-subtype GC enhancers. | [66] |
| Transcriptomics, ATAC-seq, RNA-seq, ChIP-seq | TCGA, GEO, STAD | gkm-SVM, gkm-PWM, GSEA | N.A. | RUNX2, ZEB1, SNAI2, the AP-1 dimer, GATA4, GATA6, KLF5, HNF4A, FOXA2, and GRHL2 | Activation of a small set of transcriptional factors driving the mesenchymal-subtype GC regulatory program contributes to cancer progression. | [67] |
| Transcriptomics (lncRNA), RNA-seq, qRT-PCR | TCGA, STAD | N.A. | 379 | LOC441461 | Downregulation of LOC441461 enhanced the growth, motility, and invasion of GC cell lines. | [69] |
| Transcriptomics (lncRNA), Microarray | TCGA | N.A. | 120 | LINC00659 | SP1, a widely reported TF, upregulated LINC00659, which promoted the growth and motility of GC via the miR-370-AQP3 axis. | [70] |
| Transcriptomics (lncRNA), qRT-PCR | TCGA | GSEA | 470 | LINC01614 | LINC01614 affects cell cycle distribution and promotes the migration, invasion, and EMT of GC cell lines. | [71] |
| Genomics | TCGA, GEO | GSEA | 807 | CDH1, CDH2, VIM, and FN1 | EMT-high GC showed an inverse correlation with gene sets related to cell proliferation. | [72] |
| Proteomics, LC-MS/MS | Retrospective cohort | MBR, GSEA, CDF | 196 | SWI/SNF and NFKB | Immune and ECM proteins are elevated in diffuse-GC, whereas DNA damage is upregulated in intestinal-GC. SWI/SNF and NFKB complexes regulate the progression of GC. | [73] |
| Transcriptomics, scRNA-seq | TCGA, GEO, STAD | Lasso, Univariate, RF, Boruta | 437 | CD44 | Neutrophils highly expressing CD44 have a critical impact on growth, migration, oxidative stress, and T-cell infiltration. | [74] |

Abbreviations: GC, gastric cancer; AI, artificial intelligence; TCGA, The Cancer Genome Atlas; GEO, Gene Expression Omnibus; scRNA-seq, single-cell RNA-sequencing; STAD, stomach adenocarcinoma; RT-PCR, Reverse Transcription-Polymerase Chain Reaction; PAM, partitioning around medoids; dbGaP, Database of Genotypes and Phenotypes; KNN, K-nearest neighbors; GLM, generalized linear model; LASSO, least absolute shrinkage and selection operator; PWM, position weight matrices; EMT, epithelial–mesenchymal transition; MS, mass spectrometry; LC, liquid chromatography; MBR, match between runs; CDF, cumulative distribution function; N.A., not applicable.
Table 3. Relevant studies on the AI application and bioinformatics analysis in GC metastasis.

| Omics Technology | Database | AI Algorithm (Including Alongside AI Methods) | No. of Cases | Identified Gene, Protein, Bacteria | Comments | Ref. |
|---|---|---|---|---|---|---|
| LNM | | | | | | |
| Transcriptomics, Whole-exome, RNA-seq, TCR-sequencing | TCGA, GEO, STAD, MSigDB | LASSO regression analysis, GSEA, PAM, GSVA | 407 | TP53 and CD274 | A phylogenetic tree showed that metastatic clones may extend further to establish lesions at lymph node stations. | [80] |
| Proteomics, LC-MS/MS | Retrospective case-matching study | PCA, WGCNA, Elastic-net LR, Boruta, SVM | 132 | GABARAPL2 and NAV1 | These two proteins displayed superior predictive value, and these differences may be used to predict GC patients with LNM. | [81] |
| N.A. | Retrospective cohort | LR, RF, SVM, CART, XGB | 1423 | N.A. | This ensemble learning model exhibited an enhanced level of accuracy, achieving an AUC of 0.86 on the test set and 0.892 on the external validation set. | [82] |
| N.A. | Retrospective cohort | LR, GBM, Lasso | 2556 | N.A. | The GBM model may serve as a substitute for the Japanese eCura system in clinical practice. | [83] |
| Transcriptomics, RNA-seq, RT-qPCR | TCGA, GEO | LR | 147 | SDS, TESMIN, NEB, and GRB14 | Transcriptomic liquid biopsies using serum samples can correctly predict the preoperative risk of LNM. | [84] |
| PM | | | | | | |
| Transcriptomics, RNA-seq, qRT-PCR | Retrospective cohort | WGCNA, GSEA | 90 | lncRNAs (lnc-TRIM28-14, lnc-RFNG-1); genes (CD93, COL3A1, and COL4A1) | lnc-TRIM28-14 expression improved the diagnostic sensitivity and specificity in GCPM. | [85] |
| N.A. | Retrospective cohort | Multitask learning | 713 | N.A. | Using a multitask ML approach, the TACSPR model accurately estimated GCPM. | [86] |
| Transcriptomics, RNA-seq, RT-qPCR | TCGA, GEO | LR | 108 | BUB1, CKS2, PCNA, CHEK1, NEK2, and NCAPG2 | Increased expression of the six-mRNA panel was identified through peripheral blood analysis of GC patients with PM. | [87] |
| N.A. | Retrospective cohort | Hybrid PCA and K-means clustering algorithm, SVM, LDA, LR | 491 | N.A. | Stimulated Raman Molecular Cytology demonstrated rapid and accurate detection of PM, achieving a sensitivity of 81.5% and a specificity of 84.9%. | [88] |
| Distant metastasis | | | | | | |
| Epigenomics, RNA-seq | TCGA, STAD | DNN, SVM, RF, NB, DT, LASSO | 398 | GTF2H1, RNF5, SNRNP25, LMO4, NapA, RPL18A, ZNF234, MSTO2P, ZNF761, and TREX1 | The DNN outperformed all other ML methods, achieving an AUC of 0.999. | [89] |
| N.A. | SEER | LR, XGB, RF, KNN, MLP, SVM, LASSO | 1595 | N.A. | Six models were constructed to predict the distant metastasis of GC based on six ML algorithms. The RF algorithm had the highest average AUC value. | [90] |
| Transcriptomics, scRNA-seq, qRT-PCR | GEO | UMAP, SCENIC, GENIE3 | 3627 (cells) | N.A. | Single-cell clustering of GC samples and GC liver metastasis samples classified six major cell subpopulations. Among them, the TNK cell subpopulation showed the highest infiltration in the GC liver metastasis group. | [91] |

Abbreviations: GC, gastric cancer; AI, artificial intelligence; LNM, lymph node metastasis; TCGA, The Cancer Genome Atlas; GEO, Gene Expression Omnibus; scRNA-seq, single-cell RNA-sequencing; STAD, stomach adenocarcinoma; RT-PCR, Reverse Transcription-Polymerase Chain Reaction; MS, mass spectrometry; PCA, principal component analysis; LC, liquid chromatography; LR, logistic regression; CART, classification and regression tree; RF, Random Forest; SVM, support vector machine; XGB, extreme gradient boosting; GBM, gradient boosting machine; PM, peritoneal metastasis; WGCNA, Weighted Gene Co-expression Network Analysis; LDA, linear discriminant analysis; DNN, deep neural networks; NB, Naïve Bayes; DT, decision tree; LASSO, least absolute shrinkage and selection operator; SEER, Surveillance, Epidemiology, and End Results; MLP, Multilayer Perceptron; N.A., not applicable.
Table 4. Relevant studies of clinical prediction models to verify the progression of GC.

| Clinical Prediction Model | AI Algorithm | Clinical Decisions | Typical Metrics (AUC, Sensitivity, Specificity, External Validation Yes/No) | Ref. |
|---|---|---|---|---|
| Diagnostic model | SVM, LR, RF, GaussianNB, AdaBoost | Screening | AUC = 0.99; external validation: no | [39] |
| DIAMOND | RF | Classification | AUC = 88–100%, Sensitivity = 49–99%, Specificity = 49–100%; external validation: no | [41] |
| Deconvolution scores | GCNN | Diagnosis | Highest accuracy = 0.69; external validation: no | [45] |
| ALPHAON® | CAD | Diagnosis | AUC = 0.962, Sensitivity = 0.93, Specificity = 0.87; external validation: yes | [47] |
| XHGC20 | XCB | Diagnosis | AUC = 0.901, Sensitivity = 0.83, Specificity = 0.806; external validation: no | [48] |
| GRAPE | DL framework | Diagnosis | AUC = 0.927, Sensitivity = 0.817, Specificity = 0.905; external validation: yes | [50] |
| ConVit models, YOLO model | CNNs | Diagnosis, Classification | AUROC = 0.988, AUPRC = 0.9769, Accuracy = 0.9514; external validation: yes | [57] |
| EBVnet | CNN | Diagnosis | AUROC = 0.895; external validation: yes | [60] |
| Ensemble learning model | RF, LR, SVM, CART, XGB | Prediction (LNM) | AUC = 0.892, Sensitivity = 0.844, Specificity = 0.768; external validation: no | [82] |
| eCura system | LR, GBM | Prediction (LNM) | AUC = 0.796, Sensitivity = 0.958, Specificity = 0.788; external validation: yes | [83] |
| TACSPR model | Multitask ML | Prediction (PM) | AUC = 0.746; external validation: yes | [86] |
| Prediction model | DNN, SVM, RF, NB, DT, LASSO | Prediction (Distant metastasis) | AUC = 0.999, AUPRC = 0.995; external validation: no | [89] |

Abbreviations: GC, gastric cancer; GCNN, graph convolutional neural network; AUC, area under the curve; AUPRC, area under precision-recall curve; AUROC, area under receiver operating characteristics; CAD, computer-aided detection; DL, deep learning; LR, logistic regression; CART, classification and regression tree; RF, Random Forest; SVM, support vector machine; XGB, extreme gradient boosting; GBM, gradient boosting machine; LNM, lymph node metastasis; PM, peritoneal metastasis; DNN, deep neural networks; NB, Naïve Bayes; DT, decision tree; LASSO, least absolute shrinkage and selection operator.
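
The metrics reported in Table 4 (AUC, sensitivity, specificity) can be illustrated with a minimal sketch. This is not the evaluation code of any cited study: the labels, scores, and the 0.5 decision threshold below are invented toy values, and the function names are hypothetical. AUC is computed here through its rank interpretation, namely the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels where 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration only): 4 positive and 4 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.4f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the rows of Table 4 are not directly comparable when only some of these metrics are reported.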
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Matsuoka, T.; Yashiro, M. Artificial Intelligence and Bioinformatics in the Malignant Progression of Gastric Cancer. Appl. Sci. 2025, 15, 11092. https://doi.org/10.3390/app152011092
