Identification of a Two-Gene Signature and Establishment of a Prognostic Nomogram Predicting Overall Survival in Diffuse-Type Gastric Cancer

Background: The molecular biological characteristics of diffuse-type gastric cancer are widely acknowledged to differ from those of intestinal-type gastric cancer. Although high-throughput sequencing technology has progressed significantly, effective prognostic biomarkers for diffuse gastric cancer remain scarce in clinical practice. Methods: We downloaded four GEO datasets (GSE22377, GSE38749, GSE47007 and GSE62254) to establish and validate a prognostic two-gene signature for diffuse gastric cancer. The TCGA-STAD dataset was used for external validation. The optimal gene signature was established using Cox regression analysis. Receiver operating characteristic (ROC) methodology was used to find the best prognostic model. Gene set enrichment analysis was used to analyze the possible signaling pathways of the two genes (MEF2C and TRIM15). Results: A total of four differentially expressed genes (DEGs) (two upregulated and two downregulated) were identified. After a comprehensive analysis, two DEGs (MEF2C and TRIM15) were used to construct a prognostic prediction model based on T stage, N stage, M stage and the expression of MEF2C and TRIM15. The area under the time-dependent receiver operating characteristic curve was used to evaluate the performance of the prognostic model in the GSE62254 dataset. Conclusions: We demonstrated that MEF2C and TRIM15 might be key genes. We also established a prognostic nomogram based on the two-gene signature that performed well in predicting overall survival in diffuse-type gastric cancer.


Introduction
Gastric cancer (GC) is the third most common cause of cancer-related death globally, with a low 5-year survival rate [1]. In recent years, GC treatment has improved substantially with the establishment of multidisciplinary team (MDT) care. Although multiple therapeutic approaches are currently available, it is difficult to determine the optimal treatment for an individual GC patient because of clinical and genetic heterogeneity [2]. The Lauren classification is an internationally recognized histopathological classification that sorts GC into three subtypes: intestinal type, diffuse type and mixed type [3]. Several studies have shown significant heterogeneity in biological behavior among the three subtypes of gastric cancer [4][5][6][7]. Importantly, diffuse-type

Clinical Correlation Analysis and Biological Process Prediction
We extracted the clinical information from the GSE62254 dataset. SPSS 24.0 was used to perform a Chi-square test between the expression of each gene and the clinicopathological characteristics. A p value less than 0.05 was considered statistically significant. To understand the biological processes associated with the identified prognostic gene signatures, gene set enrichment analysis (GSEA) was performed using the Java GSEA desktop application (downloaded from http://www.broad.mit.edu/gsea (accessed on 1 June 2022)). The GSE62254 samples were divided into high- and low-expression groups according to the median expression value. The GSE62254 dataset was analyzed with the GMT file (c2.KEGG.v6.2) to identify enriched KEGG pathways. Four files containing expression datasets, gene sets, phenotype labels and chip platforms were required for running GSEA. |NES| > 1 and FDR < 0.25 were considered statistically significant.
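The median split and the GSEA significance thresholds described above can be illustrated with a short Python sketch. It is not the authors' pipeline (they used the GSEA desktop application); the function names, sample identifiers, expression values and gene-set results are hypothetical, and the handling of samples lying exactly at the median is an assumption.

```python
from statistics import median

def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cutoff = median(expression.values())
    # Assumption: samples exactly at the median are placed in the 'low' group.
    return {s: ("high" if v > cutoff else "low") for s, v in expression.items()}

def significant_gene_sets(results, nes_cutoff=1.0, fdr_cutoff=0.25):
    """Keep gene sets with |NES| > nes_cutoff and FDR < fdr_cutoff."""
    return [r["name"] for r in results
            if abs(r["nes"]) > nes_cutoff and r["fdr"] < fdr_cutoff]

# Hypothetical expression values for one gene across four samples
expression = {"S1": 5.2, "S2": 7.9, "S3": 6.1, "S4": 8.4}
groups = split_by_median(expression)

# Hypothetical GSEA output rows (name, normalized enrichment score, FDR)
results = [
    {"name": "SET_A", "nes": 1.8, "fdr": 0.05},
    {"name": "SET_B", "nes": -1.3, "fdr": 0.20},
    {"name": "SET_C", "nes": 0.9, "fdr": 0.01},   # fails the |NES| threshold
    {"name": "SET_D", "nes": 2.1, "fdr": 0.40},   # fails the FDR threshold
]
kept = significant_gene_sets(results)
```

In this toy input only SET_A and SET_B satisfy both thresholds; negative NES values are retained because the criterion is applied to the absolute value.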

The Establishment of the Predictive Nomogram
After testing for collinearity, prognostic gene signatures and relevant clinical parameters were included to establish a prognostic nomogram via a stepwise Cox regression model to predict the 1-, 3- and 5-year overall survival of diffuse-type gastric cancer patients in the GSE62254 dataset. A time-dependent ROC curve, Harrell's concordance index and a calibration curve were used to assess the performance of the prognostic nomogram. Decision curve analysis was used to evaluate the net benefit of the nomogram compared with TNM staging alone.
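As an illustration of Harrell's concordance index, the following Python sketch counts concordant pairs directly. It is a simplified version under stated assumptions: tied survival times are not treated as comparable, tied risk scores count as half-concordant, and the follow-up times, event indicators and risk scores are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the patient with the shorter
    observed survival also has the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical follow-up (months), event indicator (1 = death) and risk score
times = [5, 10, 15, 20, 25]
events = [1, 1, 0, 1, 0]
risk = [2.0, 1.5, 1.8, 0.5, 0.7]
c_index = harrell_c_index(times, events, risk)  # 6 of 8 comparable pairs
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.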
Based on the prognostic nomogram, the diffuse-type gastric cancer samples from the GSE62254 dataset were assigned to high-risk and low-risk groups according to the median risk score. Kaplan-Meier analysis was performed with the "survival" package to show the relationship between risk score and overall survival time. A log-rank test was used to compare survival between the groups.
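The Kaplan-Meier estimate and the two-group log-rank test can be sketched in plain Python as follows. This is an illustrative re-implementation, not the "survival" package used in the study; the function names and the survival times are hypothetical, and the p value relies on the one-degree-of-freedom chi-square distribution.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    pooled = zip(times_a + times_b, events_a + events_b)
    event_times = sorted({t for t, e in pooled if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for a single subject
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical survival times (months) and event indicators (1 = death)
high_t, high_e = [3, 5, 7, 9], [1, 1, 1, 0]
low_t, low_e = [10, 12, 14, 16], [0, 1, 0, 1]
km_high = kaplan_meier(high_t, high_e)
chi2, p_value = log_rank(high_t, high_e, low_t, low_e)
```

With these toy data the high-risk curve drops to 0.25 by month 7 and the log-rank p value falls below 0.05.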

External Validation of Two-Gene Signature-Based Nomogram by TCGA Dataset
To further confirm the prediction value of the two-gene signature nomogram, we performed ROC analysis to show the predictive performance of TNM staging and the nomogram-based model. Kaplan-Meier analysis and the log-rank test were applied to demonstrate the survival difference between the high-risk group and low-risk group. Decision curve analysis was also utilized to quantify the clinical benefits of the nomogram at different threshold probabilities. The above analyses of external validation were performed by using the TCGA-STAD dataset.
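The net benefit that decision curve analysis reports at a given threshold probability can be computed as in the sketch below. It uses the standard definition (true positives minus false positives weighted by the odds of the threshold, both divided by the sample size) and, as a simplifying assumption, treats the outcome as a binary status at a fixed time point without handling censoring. The predicted risks and outcomes are hypothetical.

```python
def net_benefit(predicted_risk, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(outcomes)
    pairs = list(zip(predicted_risk, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: every patient is classified as high risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical predicted risks and observed outcomes (1 = death by the time point)
predicted = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
observed = [1, 1, 0, 1, 0, 0]
nb_model = net_benefit(predicted, observed, threshold=0.5)
nb_all = net_benefit_treat_all(observed, threshold=0.5)
```

Repeating the calculation over a range of thresholds for the nomogram and for TNM staging yields the curves that are compared in a decision curve plot.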

Identification of Diffuse-Type Gastric Cancer-Specific Gene Signatures
The flowchart of the screening process used in our study to identify diffuse-type gastric cancer gene signatures is shown in Figure 1. The details of the GEO datasets included in this study are displayed in Table 1. A total of 991 (GSE22377), 166 (GSE38749) and 171 (GSE47007) DEGs were identified between the diffuse-type and intestinal-type gastric cancer samples in the respective datasets. Two genes (COL4A3, MEF2C) were highly expressed in diffuse-type gastric cancer, whereas two (TRIM15, MMP12) showed low expression in diffuse-type gastric cancer.
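One plausible way to arrive at a short list of genes from three DEG lists is to keep only genes that are differentially expressed in every dataset with the same direction. The sketch below illustrates that idea; the text does not state the exact overlap rule, so this is an assumption, and GENE_A and GENE_B are hypothetical fillers. The directions of the four named genes follow the text.

```python
def shared_degs(deg_tables):
    """Return {gene: direction} for genes present in every DEG table
    with the same direction ('up' or 'down')."""
    first = deg_tables[0]
    shared = {}
    for gene, direction in first.items():
        if all(table.get(gene) == direction for table in deg_tables[1:]):
            shared[gene] = direction
    return shared

# Toy DEG lists (diffuse-type versus intestinal-type); fillers are hypothetical
core = {"COL4A3": "up", "MEF2C": "up", "TRIM15": "down", "MMP12": "down"}
gse22377 = dict(core, GENE_A="up")
gse38749 = dict(core, GENE_B="down")
gse47007 = dict(core, GENE_A="down")   # inconsistent direction for GENE_A
result = shared_degs([gse22377, gse38749, gse47007])
```

Only genes with a consistent direction in all three lists survive, so the fillers are dropped.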

Validation of the Expression Level of Four Differentially Expressed Genes
In this study, the expression level of the four DEGs identified was validated in a large dataset (GSE62254). Two upregulated genes and one downregulated gene were identified (Figure 2B).

Clinical Correlation Analysis of Three DEGs
Detailed clinical information of 134 patients from the GSE62254 dataset was extracted. A Chi-square test was used to evaluate the relationship between the three DEGs and clinicopathological characteristics (Table 2). In brief, the expression levels of COL4A3 and TRIM15 were significantly correlated with T stage and age, respectively, while the expression level of MEF2C was significantly correlated with age, T stage and TNM stage.
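For a two-by-two table (for example, high versus low expression against two age groups), the Pearson Chi-square test can be reproduced with the short sketch below. The counts are hypothetical and are not taken from Table 2; no continuity correction is applied, and the p value uses the one-degree-of-freedom chi-square distribution.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p value for a 2x2 contingency table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row_total in enumerate(row_totals):
        for j, col_total in enumerate(col_totals):
            expected = row_total * col_total / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: rows = low/high expression, columns = two age groups
table = [[30, 37], [45, 22]]
chi2, p_value = chi_square_2x2(table)
```

Characteristics with more than two categories, such as T stage, would need the general r × c form with (r − 1)(c − 1) degrees of freedom.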

Kaplan-Meier Analysis and Evaluation of Prognostic Factors in Diffuse-Type Gastric Cancer
The survival information of COL4A3, MEF2C and TRIM15 was obtained from Kaplan-Meier Plotter, which is freely available. In this study, we assessed the association between the expression level of the three DEGs and overall survival in diffuse-type gastric cancer. In Kaplan-Meier Plotter, we used the best cutoff value of COL4A3, MEF2C and TRIM15 expression to divide diffuse-type gastric cancer patients into high-expression and low-expression groups for each gene. High expression of COL4A3 and MEF2C and low expression of TRIM15 were associated with worse OS in diffuse-type gastric cancer patients (Figure 3). However, the false discovery rate (FDR) was 50%, 50% and over 50% for the survival difference of COL4A3, MEF2C and TRIM15, respectively. These high FDR values made it difficult to evaluate the prognostic value of these genes from this analysis alone.
Table 2 (excerpt, stage rows; column headers not recovered). II: 8, 26, 11, 23, 21, 13. III: 23, 26, 33, 16, 27, 22. IV: 23, 23, 29, 17, 23, 23. * p value < 0.05; ** p value < 0.01.

A univariate Cox regression analysis was performed to evaluate the prognostic value and identify the risk factors in diffuse-type gastric cancer. The results demonstrated that T stage (p < 0.01), N stage (p < 0.01), M stage (p < 0.001) and the expression levels of MEF2C (p < 0.01) and TRIM15 (p < 0.05) were significantly correlated with overall survival in diffuse-type gastric cancer (Table 3).

Establishment of the Prognostic Nomogram of Diffuse-Type Gastric Cancer
The clinical information of the 134 diffuse-type gastric cancer patients from the GSE62254 dataset was used to construct a prognostic nomogram for predicting 1-, 3- and 5-year overall survival based on a stepwise Cox regression model (Figure 4A). T stage, N stage, M stage and the expression of TRIM15 and MEF2C were the parameters included in the nomogram. The calibration curves showed good consistency between the actual and the nomogram-predicted 1-, 3- and 5-year overall survival probabilities (Figure 4B). The risk score was calculated according to Formula (1), in which the stage coefficients take the following values. When T stage is T2, T3 or T4, the value of β_T is 0, 0.3812 or −0.2084, respectively. When N stage is N0, N1, N2 or N3, the value of β_N is 0, 0.8741, 1.4293 or 2.7244, respectively. When M stage is M0 or M1, the value of β_M is 0 or 0.994, respectively.
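A minimal Python sketch of how such a risk score could be evaluated is given below. It assumes the score is the usual Cox linear predictor, that is, the sum of the stage coefficients listed above plus the gene expression values weighted by their own coefficients. The coefficients for MEF2C and TRIM15 are not given in this section, so they are passed as arguments, and the values used in the example are placeholders rather than the fitted ones.

```python
# Stage coefficients as reported in the text
BETA_T = {"T2": 0.0, "T3": 0.3812, "T4": -0.2084}
BETA_N = {"N0": 0.0, "N1": 0.8741, "N2": 1.4293, "N3": 2.7244}
BETA_M = {"M0": 0.0, "M1": 0.994}

def risk_score(t_stage, n_stage, m_stage, mef2c, trim15,
               beta_mef2c, beta_trim15):
    """Linear predictor: stage coefficients plus weighted gene expression.

    beta_mef2c and beta_trim15 are hypothetical parameters; the fitted
    values must come from the authors' Formula (1).
    """
    return (BETA_T[t_stage] + BETA_N[n_stage] + BETA_M[m_stage]
            + beta_mef2c * mef2c + beta_trim15 * trim15)

# Placeholder expression values and gene coefficients, for illustration only
score = risk_score("T3", "N2", "M0", mef2c=6.0, trim15=4.0,
                   beta_mef2c=0.1, beta_trim15=-0.1)
```

Patients could then be assigned to high-risk and low-risk groups by comparing each score with the cohort median.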

Evaluation of the Predictive Performance of the Nomogram and Its External Validation
The AUC values of the predicted 1-, 3- and 5-year overall survival were 0.82, 0.84 and 0.87, respectively (Figure 5A). When the seventh AJCC TNM stage was used, the AUC values for the 1-, 3- and 5-year overall survival predictions were 0.79, 0.79 and 0.84, respectively (Figure 5A). As seen in Figure 4A, the calculated overall score could estimate the survival prognosis (1-, 3- and 5-year survival probabilities), and the C-index of our nomogram model was 0.766 (95% CI = 0.711~0.821). The Kaplan-Meier analysis showed a significant difference in prognosis between the high-risk and low-risk groups (Figure 5B). Decision curve analysis showed that the nomogram performed better than the seventh AJCC TNM staging system.
For external validation using the TCGA-STAD dataset, the results of the ROC curves, Kaplan-Meier analysis and decision curve analysis were similar to those of the training set (GSE62254). The AUC values of the predicted 1-, 3- and 5-year overall survival were 0.61, 0.69 and 0.77, respectively, which were better than those of the TNM staging system (Figure 5D) in the TCGA-STAD dataset. In addition, patients in the low-risk group had a more favorable overall prognosis than those in the high-risk group in the validation set (Figure 5E). Similarly, decision curve analysis demonstrated that the net benefit of the nomogram was better than that of the TNM staging system in the TCGA-STAD dataset (Figure 5F). Therefore, the nomogram showed better discriminatory ability than the seventh AJCC TNM classification.

Curr. Oncol. 2023, 30

Scatter Point
The patients were divided into two groups according to the nomogram score (Figure 6A). We used a scatter plot to show the relationship between the risk score and the overall survival of diffuse-type GC patients. The high-risk group exhibited significantly poorer overall survival (Figure 6B).

Discussion
Compared with intestinal-type gastric cancer, diffuse-type gastric cancer exhibits a more aggressive phenotype with a relatively poor prognosis and a 5-year overall survival rate of 32.1% [15]. Treatment failure in diffuse-type gastric cancer is widely attributed to drug resistance and disease progression, including tumor recurrence and metastasis. A prognostic model helps clinicians individualize treatment by determining which patients would benefit most from a particular treatment approach or combination of approaches, including radical surgery, adjuvant chemotherapy, neoadjuvant chemotherapy, targeted molecular medicine or immunotherapy. However, prognostic models based on clinical and histopathological characteristics alone are not accurate [16,17]. Accordingly, it is important to develop a novel prognostic model that improves patient management by stratifying patients according to their characteristics.
In the present study, we constructed a nomogram that incorporated a two-gene (MEF2C and TRIM15) signature and clinicopathological parameters to assist clinicians in determining the prognosis of individual diffuse-type GC patients. The sensitivity and specificity of our prognostic model were better than those of the TNM staging system (Figure 5). Gene set enrichment analysis showed that MEF2C and TRIM15 were closely related to invasion and metastasis signaling pathways in diffuse-type gastric cancer. The MAPK signaling pathway was the most significantly enriched pathway in the high MEF2C expression group of diffuse-type gastric cancer (Figure 7A). Moreover, the glycosaminoglycan biosynthesis chondroitin sulfate signaling pathway was the most enriched in the low TRIM15 expression group of diffuse-type gastric cancer (Figure 7B). MEF2C upregulation and TRIM15 downregulation in diffuse-type gastric cancer were related to poor prognoses.
In recent years, multiple gene signatures, mRNAs or non-coding RNAs have been used to evaluate the prognosis of gastric cancer patients [18][19][20]. Nevertheless, few studies have focused on Lauren subtype-specific gene signatures to evaluate the prognosis of diffuse-type gastric cancer. Moreover, few studies have sought to combine the TNM stage with a multi-gene signature to assess the prognosis of diffuse-type gastric cancer. A previous study reported a three-gene signature to predict the prognosis of diffuse-type gastric cancer [21]. However, that prognostic model considered only the expression level of the three genes and lacked the clinical parameters of the diffuse-type gastric cancer patients. The TNM staging system only considers tumor invasion depth, lymph node metastasis and distant metastasis. The biological characteristics of the tumors, such as the immune infiltration status, drug response and intracellular signaling pathways, are not reflected in the TNM staging system. In contrast, genomic sequencing of the tumor is an effective tool to uncover tumor heterogeneity [22,23]. Several tumor biomarkers can help guide treatment decisions, including Human Epidermal Growth Factor Receptor-2 (HER2), Programmed Cell Death-Ligand 1 (PDL1) and Vascular Endothelial Growth Factor Receptor (VEGFR) [24][25][26]. Accordingly, in the current study, we identified risk factors, including age, T stage, N stage, M stage and the expression level of TRIM15 and MEF2C, and established a prognostic model. Finally, a nomogram integrating a two-gene signature and clinicopathologic features was constructed and yielded an accurate prediction of overall survival. In the external validation with the TCGA-STAD dataset, the ROC curves, Kaplan-Meier analysis and decision curve analysis showed that, as a supplement to AJCC staging, our two-gene signature and nomogram achieved a predictive performance similar to that in the training set (Figure 5).
Our predictive nomogram may prove valuable for diffuse-type GC in future clinical practice. Similarly, several previous studies integrated clinical features and risk scores based on the expression level of risk genes into novel prognostic nomograms [27][28][29]. The predictive value of their integrated nomograms was also better than that of any risk factor alone. These studies and the present study provide a reference for future clinical research.
MEF2C and TRIM15 have previously been reported to be associated with gastric cancer. Myocyte Enhancer Factor 2C (MEF2C) has been documented in pathways of organelle biogenesis and maintenance and transcriptional misregulation in cancer, which involve DNA-binding transcription factor activity and protein heterodimerization activity. MEF2C has been associated with DNA methylation and enhanced PD-L1 expression in gastric cancer [30,31]. Recent studies have also shown that MEF2C plays an important role in myocilin mediating cancer-induced muscle wasting and cachexia in cancer patients [32] and regulates chemotherapeutic resistance [33] and the disease progression of acute myeloid leukemia [34]. TRIM15 is a member of the tripartite motif (TRIM) family. The protein encoded by TRIM15 has a TRIM motif, including three zinc-binding domains, a RING, a B-box type 1, a B-box type 2 and a coiled-coil region. However, the biological function of TRIM15 remains unknown. Our GSEA results showed that TRIM15 was correlated with the glycosaminoglycan biosynthesis chondroitin sulfate signaling pathway in diffuse-type gastric cancer. Importantly, a recent study has found that the expression of TRIM15 is an independent risk factor of prognosis in gastric cancer patients [35]. However, the roles of the MEF2C and TRIM15 genes in diffuse-type gastric cancer are still unclear. Our current study suggested that MEF2C and TRIM15 may promote invasion and metastasis through cancer-related signaling pathways: the MEF2C-activated MAPK signaling pathway and the TRIM15-activated glycosaminoglycan biosynthesis chondroitin sulfate signaling pathway (Figure 7). Moreover, our study revealed poor prognoses associated with upregulated MEF2C and downregulated TRIM15 expression in diffuse-type gastric cancer.
Accordingly, the present research revealed the roles of these two genes in diffuse-type gastric cancer and established a risk model to complement the AJCC staging system to improve the outcomes of diffuse-type gastric cancer. However, further in vivo studies are needed to explore the molecular mechanism underlying the oncological function of these two genes in diffuse-type gastric cancer.
This study has several limitations. First, our study was based on RNA expression data rather than proteomics, which could have affected the accuracy of our prediction model. Accordingly, the expression of these two genes should be analyzed in another study with a large sample of diffuse-type gastric cancer patients to validate the predictive performance of our model. Furthermore, promoting multi-gene sequencing in clinical practice may be difficult because of its high cost and limited practicability. With the development of sequencing technology and precision medicine, the identified two-gene signature is expected to become clinically feasible. Moreover, our predictive model should be externally validated with another large sample of diffuse-type gastric cancer patients.

Conclusions
In summary, we constructed a nomogram that incorporated a two-gene signature and clinicopathological parameters to assist clinicians in determining the prognosis of individual diffuse-type GC patients. Our nomogram is simple to use and may help guide treatment selection and medical decision-making. To the best of our knowledge, the two-gene prognostic signature described and the nomogram constructed have not been reported previously. The current study provides a new perspective on the molecular mechanisms underlying prognosis prediction in diffuse-type gastric cancer. In addition, MEF2C and TRIM15 were obtained by a pooled analysis of multiple datasets and are therefore considered reliable. Importantly, these two genes may be potential molecular targets for the treatment of diffuse-type gastric cancer.