1. Introduction
Gastric cancer (GC) is a highly prevalent malignancy that ranks fifth in global cancer incidence and third in cancer-related mortality, causing approximately 800,000 deaths annually [1]. The discovery and validation of GC biomarkers are therefore essential for precise diagnosis and prognostic stratification of patients. Recent progress in GC biomarker identification has been significant; for instance, the 12-miR analyzer achieved 87% sensitivity and 93.9% specificity in a prospective cohort of 4566 patients [2]. However, identifying novel biomarkers remains challenging because of the intrinsic complexity and heterogeneity of GC [3].
With the advent of advanced molecular biology techniques and a deeper understanding of the mechanisms underlying tumorigenesis, many methods have been developed to identify GC biomarkers. Yang et al. [4] employed Kaplan–Meier and Cox regression analyses to show that over-expression of HAMP could serve as an independent prognostic biomarker for GC patients. Similarly, Azari et al. [5] used a support vector machine (SVM) algorithm to show that elevated expression of Mir21, Mir133a, Mir146b, and Mir29c correlated with higher mortality and could potentially serve as early-detection biomarkers in early-stage GC patients. However, these methods focus on single-class biomarkers and are therefore limited in their ability to capture the interactions among different types of biomarkers. Recent studies have highlighted the potential of competitive endogenous RNA (ceRNA) networks to significantly enhance the predictive power of disease biomarkers [6,7,8]. This approach emphasizes the role of messenger RNAs (mRNAs) and long non-coding RNAs (lncRNAs) in competitively binding microRNAs (miRNAs), forming a complex regulatory network that modulates gene expression through ceRNA interactions. By integrating these interactions, it is possible to identify key regulatory RNAs involved in the pathogenesis of GC, opening new avenues for biomarker discovery and development.
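The competitive-binding idea behind a ceRNA network can be illustrated with a minimal sketch. It assumes a toy list of miRNA-target interactions drawn from relationships mentioned in this paper (Mir181a with MEG3, CCNG1, and BCL2; Mir335 with CRKL). The data structure and the function name `cerna_pairs` are hypothetical and are not part of the GCBRGCN pipeline. The sketch simply pairs each lncRNA with every mRNA that shares at least one miRNA with it.

```python
from collections import defaultdict

# Toy miRNA-target interactions as (miRNA, target, target RNA type).
# Illustrative only: taken from relationships mentioned in the text,
# not from the study's actual ceRNA network.
MIRNA_TARGETS = [
    ("Mir181a", "MEG3", "lncRNA"),
    ("Mir181a", "CCNG1", "mRNA"),
    ("Mir181a", "BCL2", "mRNA"),
    ("Mir335", "CRKL", "mRNA"),
]


def cerna_pairs(interactions):
    """Return {(lncRNA, mRNA): sorted list of shared miRNAs}.

    An lncRNA and an mRNA are treated as a candidate ceRNA pair when
    they are both targets of at least one common miRNA.
    """
    # Group targets under the miRNA that binds them.
    targets_by_mirna = defaultdict(dict)
    for mirna, target, rna_type in interactions:
        targets_by_mirna[mirna][target] = rna_type

    shared = defaultdict(set)
    for mirna, targets in targets_by_mirna.items():
        lncrnas = [t for t, kind in targets.items() if kind == "lncRNA"]
        mrnas = [t for t, kind in targets.items() if kind == "mRNA"]
        for lnc in lncrnas:
            for mrna in mrnas:
                shared[(lnc, mrna)].add(mirna)
    return {pair: sorted(mirnas) for pair, mirnas in shared.items()}


pairs = cerna_pairs(MIRNA_TARGETS)
```

In this toy input, MEG3 is paired with CCNG1 and BCL2 through Mir181a, whereas CRKL forms no pair because no lncRNA shares Mir335 with it. A real analysis would additionally require expression-correlation or statistical filtering, which this sketch omits.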
Furthermore, artificial intelligence, particularly graph neural networks (GNNs), has provided a powerful tool for the discovery of network-based biomarkers [9,10,11]. For instance, EMOGI [12] integrates protein–protein interaction (PPI) networks with multi-omics data to identify cancer driver genes. Similarly, MOGONET [13] applies GNNs to multi-omics data to identify cancer biomarkers. However, these methods predominantly use homogeneous graphs derived from PPI networks, which limits their applicability to heterogeneous graphs such as ceRNA networks. For heterogeneous graphs, Gao’s graph autoencoder has shown potential in predicting associations between lncRNA-protein coding gene pairs [14], while Peng et al. [15] constructed three heterogeneous networks to identify cancer driver genes using graph convolutional networks. Nevertheless, these methods are limited in their capacity to uncover biomarkers spanning multiple RNA types, which indicates a need for approaches that can fully exploit the complexity and heterogeneity of biological networks.
In this study, we introduce the GC biomarker relation graph convolutional network (GCBRGCN) model, which integrates ceRNA networks with clinical information and whole-transcriptome data specific to GC. The model employs a relational graph convolutional network (RGCN) to predict GC biomarkers with improved accuracy. By consolidating multiple RNA types within the biomarker identification process, our approach addresses the limitations of current methods in handling the complexity of biological networks. We thus propose a novel and effective strategy for detecting potential biomarkers in GC research. This strategy highlights the potential of multi-RNA type analyses in oncological studies and underscores the importance of integrating diverse data types to achieve a more comprehensive understanding of the molecular underpinnings of GC.
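To make the RGCN idea concrete, the following is a minimal sketch of a single generic RGCN propagation step, in which each node combines a self-loop transform with relation-specific, mean-normalized messages from its neighbours, followed by a ReLU. It is not the GCBRGCN implementation: the relation names, feature vectors, and weight matrices are hypothetical values chosen for illustration, and a practical model would learn the weights and use a tensor library.

```python
def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def rgcn_layer(features, edges, rel_weights, self_weight):
    """One relational graph convolution step with ReLU activation.

    features:    {node: feature vector}
    edges:       list of (source, relation, destination) triples
    rel_weights: {relation: weight matrix}, one matrix per edge type
    self_weight: weight matrix applied to the node's own features
    """
    # Collect incoming neighbours per (destination, relation) so that
    # messages can be averaged separately for each relation type.
    incoming = {}
    for src, rel, dst in edges:
        incoming.setdefault((dst, rel), []).append(src)

    updated = {}
    for node, feat in features.items():
        total = matvec(self_weight, feat)  # self-loop term
        for (dst, rel), sources in incoming.items():
            if dst != node:
                continue
            for src in sources:
                message = matvec(rel_weights[rel], features[src])
                total = [t + m / len(sources) for t, m in zip(total, message)]
        updated[node] = [max(0.0, t) for t in total]  # ReLU
    return updated


# Hypothetical toy graph: one miRNA regulating one lncRNA and one mRNA.
features = {"Mir181a": [1.0, 0.0], "MEG3": [0.0, 1.0], "CCNG1": [1.0, 1.0]}
edges = [
    ("Mir181a", "miRNA-lncRNA", "MEG3"),
    ("Mir181a", "miRNA-mRNA", "CCNG1"),
]
rel_weights = {
    "miRNA-lncRNA": [[0.5, 0.0], [0.0, 0.5]],
    "miRNA-mRNA": [[-2.0, 0.0], [0.0, 1.0]],
}
self_weight = [[1.0, 0.0], [0.0, 1.0]]

hidden = rgcn_layer(features, edges, rel_weights, self_weight)
```

Because each edge type has its own weight matrix, the same miRNA feature vector influences the lncRNA and the mRNA differently, which is the property that distinguishes a relational model from a homogeneous graph convolution.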
4. Discussion
The significance of different RNA-type biomarkers in GC diagnostics and prognostics is garnering increasing attention within the scientific community. However, the development of comprehensive, multi-molecular biomarker prediction models in this field remains a less-researched area. Addressing this research gap, our study introduces the GCBRGCN model, which has shown superior performance in the discovery of novel GC biomarkers, outperforming both heterogeneous and traditional GNNs as well as conventional machine learning algorithms.
The GCBRGCN model constructs a network based on ceRNA interactions and incorporates critical transcriptomic features alongside pertinent clinical information, including tumor stage and survival duration. By analyzing these interactions with a heterogeneous GNN, GCBRGCN can uncover potential novel biomarkers and help decipher their biological significance. This approach offers new insights into the identification of GC diagnostic and prognostic biomarkers.
We identified Mir335, CCNG1, HMGA2, SNHG14, and CITED2 as potential new diagnostic biomarkers for GC. Mir335, recognized as a tumor suppressor in GC, is significantly down-regulated in GC compared to normal gastric tissue [31,32]. CCNG1, which is up-regulated in various tumor tissues [33,34], has not previously been identified as a GC diagnostic biomarker. However, our analysis revealed that CCNG1 is regulated by Mir181a and MEG3, which are well-established GC biomarkers [35]. Notably, the expression of MEG3 is decreased in GC patients, and MEG3 can up-regulate BCL2 by competitively binding to the Mir181a family, thereby inhibiting the onset of GC [36,37]. Additionally, high levels of HMGA2 have been reported to correlate significantly with lymphatic vessel invasion, perineural invasion, and TNM stage [38]. In another study, Dai et al. showed that elevated levels of SNHG14, confirmed by sequencing and qRT-PCR, enhance GC proliferation, invasion, and migration in vitro and in animal models [39]. Lastly, resistance to anthracycline chemotherapy drugs can be overcome by reactivating the epigenetically silenced CITED2 gene, which provides a new strategy for enhancing chemotherapy sensitivity and drug response in GC treatment [40].
SRF, CRKL, CYP1B1, and CDH2 were identified as potential new prognostic biomarkers for GC. First, SRF, a downstream target of the MAPK/ERK signaling pathway, promotes cell proliferation and GC metastasis [41]. Additionally, SRF fosters the proliferation and invasion of GC cells by suppressing the expression of HOTAIR [29]. Second, CRKL, a substrate of the BCR-ABL tyrosine kinase, is involved in BCR-ABL-mediated transformation of fibroblasts [42]. Mir335 targets CRKL, thereby inhibiting the migration, invasion, and proliferation of tumor cells, arresting the cell cycle at the G0/G1 phase, and promoting apoptosis in GC cells [43]. Third, CYP1B1, a member of the cytochrome P450 superfamily, plays a role in the regulation of several crucial transcription factors through the oxidation and metabolism of various carcinogenic precursors and anticancer drugs [26]. Finally, MFGE8 induces the expression of SNHG14, which in turn promotes the epithelial–mesenchymal transition by stabilizing CDH2 [39].
Notably, FOXC1 and LINC00324 could serve as both prognostic and diagnostic biomarkers. Jiang’s research has shown that increased FOXC1 expression in GC patients is associated with a poor prognosis. At the molecular level, FOXC1 promotes the nuclear translocation of unphosphorylated β-catenin, which then up-regulates c-MYC expression, thereby driving GC cell proliferation [44]. Additionally, Zou found that the expression level of LINC00324 is significantly higher in GC tissues than in corresponding normal tissues. Over-expression of LINC00324 correlates with advanced TNM stage, larger tumor size, lymph node metastasis, and a poor prognosis [45].
Our study has certain limitations. First, integrating multiple RNA types with clinical information introduces complexities that may affect the performance of our model. Second, discrepancies between whole-transcriptome data and cancer parameters make it difficult to source external test datasets for validation, which constrains the model's generalizability.