Extracellular Matrix-Related Hubs Genes Have Adverse Effects on Gastric Adenocarcinoma Prognosis Based on Bioinformatics Analysis

Gastric adenocarcinoma (GAC) is the most frequent type of stomach cancer, characterized by high heterogeneity and phenotypic diversity. Although many novel strategies have been developed for treating GAC, recurrence and metastasis rates are still high. Therefore, it is necessary to screen new potential biomarkers correlated with prognosis and novel molecular targets. Gene expression profiles were obtained from the from NCBI Gene Expression Omnibus (GEO) database. We conduct an integrated analysis using the online Venny website to explore candidate hub genes between differentially expressed genes (DEGs) of two datasets. Gene ontology (GO) and Kyoto Encyclopedia 18 of Genes and Genomes (KEGG) pathway enrichment analysis found that extracellular matrix plays an important role in GAC. In addition, we applied protein-protein interaction (PPI) network analysis by using the Search Tool for the Retrieval of Interacting Genes (STRING) and visualized with Cytoscape software. Furthermore, we employed Cytoscape software to analyze the interactive relationship of candidate gene for further analysis. We found that ECM related proteins played an important role in GAC, and 15 hub genes were extracted from 123 DEGs genes. There were four hub genes (bgn, vcan, col1a1 and timp1) predicted to be associated with poor prognosis among the 15 hub genes.


Introduction
Stomach cancer, one of the most common malignancies, is the third leading cause of cancer-related death [1]. As the most frequent type of stomach cancer, gastric adenocarcinoma (GAC) causes a significant public health burden with a 5-year overall survival (OS) rate of less than 30% [2]. There are four subtypes of gastric cancer identified by TCGA Project: tumors positive for Epstein-Barr virus, microsatellite unstable tumors, genomically stable tumours and tumors with chromosomal instability [3]. To date, various genetic factors have been reported to be associated with the pathogenesis of GAC. There are several treatment options for GAC including anti-her2 therapy, anti-VEGF therapy, anti-EGFR therapy and anti-FGFR-2 therapy; however, their efficiency is hampered by toxicities [4]. A number of studies have been conducted to investigate the pathogenesis of GAC. Nevertheless, the exact mechanisms are still not well defined. Using bioinformatics analysis techniques, some potential biomarkers can be found through big database analysis.
Public bioinformatics databases such as GEO and the cancer genome atlas (TCGA) allow the acquisition of gene expression profiles, which can be followed by functional analysis to screen significant genetic or epigenetic variations occurring in carcinogenesis. Indeed, bioinformatic methods are promising for the screening of potential biomarkers that are feasible for clinical diagnosis and prognosis evaluation. In the past five years, Genes 2021, 12, 1104 2 of 12 many researchers have studied gastric cancer through multiomics data [5][6][7][8][9]. Through the analysis of multi-omics data, tumor immune signals and microenvironment can be reshaped during neoadjuvant chemotherapy. C10ORF71 mutation affects the efficacy of neoadjuvant chemotherapy for gastric cancer [5]. The efficacy of pembrolizumab in GC patients can be assessed using tumor microenvironment evaluation through multiomics data analysis [6]. The expression of some extracellular matrix (ECM) related genes has been associated with poor prognosis in many cancers [10,11]. Unfortunately, few studies have reported the effects of ECM on the pathogenesis and prognosis of GAC.
In this study, we investigated the molecular signatures and transcription networks of differentially expressed genes (DEGs). Additionally, we explored the correlation and overall survival (OS) of stomach adenocarcinoma (STAD) based on hub genes of the TCGA database at GEPIA website. The results may provide new light on the biomarkers of GAC derived from bioinformatics analyses.

Microarray Data Collection
Two gene expression profiles (i.e., GSE103236 and GSE96668) were downloaded from the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/, accessed on 27 July 2020). The array data of GSE103236 consisted of 11 cancer samples and nine normal adjacent tissue samples from GAC patients [12]. GSE96668 was consisted of 60 samples including 49 gastro-esophageal adenocarcinoma samples and 11 noncancer control samples [13].

Data Processing
For the data comparison between GSE103236 and GSE96668, an interactive web tool, GEO2R, was utilized based on the R programming language, and we selected the Benjamini & Hochberg (false discovery rate) as the p-value adjustment option [14]. The adjusted p-value < 0.05 and |log Fold Change (FC)| ≥ 1 were used as the cutoff criteria for statistical analysis of each dataset.

Identification of Coexpression Modules
In order to identify the intersection of nodes among the GSE103236 and GSE96668 DEGs genes, we further employed an online Venn diagram (http://bioinfogp.cnb.csic.es/ tools/venny/, accessed on 27 July 2020). As a result, the DEGs, including coupregulated genes and t codownregulated genes, were identified between the two databases.

Functional and Pathway Enrichment Analysis
To better explore the biological significance of these co-DEGs, we performed the Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis by using the online Database for Annotation, Visualization and Integrated Discovery (DAVID; https://david.ncifcrf.gov/summary.jsp, accessed on 27 September 2020). The results of Biological Process (BP), cellular component (CC), molecular function (MF) and KEGG pathway were download and further visualized by using R software 3.6.3 (https://www.r-project.org/, accessed on 27 September 2020).

PPI Network Construction and Analysis
The Search Tool for the Retrieval of Interacting Genes (STRING) (version 11.0) (http://www.string-db.org/, accessed on 27 September 2020) was applied to analyze the network among the DEGs proteins. Then, the PPI networks for the DEGs genes were visualized by Cytoscape software (version 3.8.0). Genes with degrees >10 were selected as hub genes among the PPI networks. The molecular complex detection (MCODE) plugin was used to create the modules in Cytoscape with the following parameters: Degree cutoff of 2, node score cutoff of 0.2, k-core of 3, and max. depth of 100.

The Expression and Survival Analysis of Hub Genes
Gene Expression Profiling Interactive Analysis (GEPIA) (http://gepia.cancer-pku.cn, accessed on 27 September 2020) was utilized to validate the expression of hub genes in TCGA-STAD tumor samples [15]. To further analyze the effects of the hub genes on GAC patient prognosis survival, the GEPIA website was selected to draw overall survival (OS) curves. Meanwhile, we selected quartile as the group cutoff. Genes that significantly affect the prognosis survival of GAC patients were further analyzed for correlation with each other in the GEPIA website. The protein levels of these genes were obtained from the Human Protein Atlas (https://www.proteinatlas.org/, accessed on 26 January 2021).

DEGs and Clusters
In total, 501 genes were extracted from the GSE103236 dataset and 904 genes were extracted from the GSE96668 datasets, respectively ( Figure 1). In the GSE103236 data, 331 genes were upregulated and 170 genes were downregulated. In GSE96668 dataset, 416 genes were upregulated and 488 genes were downregulated. A total of 123 DEGs were identified in the two datasets, including 75 upregulated and 48 downregulated DEGs (Table 1, Figure 2).
(http://www.string-db.org/, accessed on 27 September 2020) was applied to analyze the network among the DEGs proteins. Then, the PPI networks for the DEGs genes were visualized by Cytoscape software (version 3.8.0). Genes with degrees >10 were selected as hub genes among the PPI networks. The molecular complex detection (MCODE) plugin was used to create the modules in Cytoscape with the following parameters: Degree cutoff of 2, node score cutoff of 0.2, k-core of 3, and max. depth of 100.

The Expression and Survival Analysis of Hub Genes
Gene Expression Profiling Interactive Analysis (GEPIA) (http://gepia.cancer-pku.cn, accessed on 27 September 2020) was utilized to validate the expression of hub genes in TCGA-STAD tumor samples [15]. To further analyze the effects of the hub genes on GAC patient prognosis survival, the GEPIA website was selected to draw overall survival (OS) curves. Meanwhile, we selected quartile as the group cutoff. Genes that significantly affect the prognosis survival of GAC patients were further analyzed for correlation with each other in the GEPIA website. The protein levels of these genes were obtained from the Human Protein Atlas (https://www.proteinatlas.org/, accessed on 26 January 2021).

DEGs and Clusters
In total, 501 genes were extracted from the GSE103236 dataset and 904 genes were extracted from the GSE96668 datasets, respectively ( Figure 1). In the GSE103236 data, 331 genes were upregulated and 170 genes were downregulated. In GSE96668 dataset, 416 genes were upregulated and 488 genes were downregulated. A total of 123 DEGs were identified in the two datasets, including 75 upregulated and 48 downregulated DEGs (Table 1, Figure 2).

Functional and Pathway Enrichment Analyses
The top 10 significant enrichment results of the GO term were classified into three parts: biological process group, molecular function group and cellular component group ( Figure 3A-C). A p-value < 0.05 was regarded as the threshold value. The enriched GO terms and gene lists with the top 10 significant enrichment results are shown in Table 2. The first three terms sorted by count are marked in black. The biological processes GO categories were mainly involved in cell adhesion, positive regulation of cell proliferation

Functional and Pathway Enrichment Analyses
The top 10 significant enrichment results of the GO term were classified into three parts: biological process group, molecular function group and cellular component group ( Figure 3A-C). A p-value < 0.05 was regarded as the threshold value. The enriched GO terms and gene lists with the top 10 significant enrichment results are shown in Table 2. The first three terms sorted by count are marked in black. The biological processes GO categories were mainly involved in cell adhesion, positive regulation of cell proliferation and negative regulation of apoptotic process ( Figure 3A). The GO molecular function analysis revealed enrichment in protein binding, calcium ion binding and chromatin binding ( Figure 3B). Moreover, in the cellular components category, the co-DEGs were mainly enriched in extracellular exosome, extracellular space and extracellular region ( Figure 3C). The top three enriched KEGG pathways of DEGs were the PI3K-Akt signaling pathway, ECM-receptor interaction and focal adhesion ( Figure 3D, Table 3).

PPI Network and Module Analyses
In order to understand the relationship and interaction between the co-DEGs of the two datasets, we used the online STRING website to analyze the PPI network of co-DEGs-

PPI Network and Module Analyses
In order to understand the relationship and interaction between the co-DEGs of the two datasets, we used the online STRING website to analyze the PPI network of co-DEGsencoded proteins, which was visualized by Cytoscape software. Based on the STRING website, 123 DEGs were filtered into the DEGs PPI networks complex consisting of 88 nodes and 230 edges ( Figure 4A). Fifteen hub genes with a degree of > 10 were selected from those networks. Subsequently, we performed module analysis by MCODE, a plugin using scoring and finding parameters for the best results, and two clusters ( Figure 4B,C) were screened out from the PPI networks of co-DEGs. Eventually, 15 hub genes were included in the two clusters (Table 3).
Genes 2021, 12, x FOR PEER REVIEW 7 of 13 encoded proteins, which was visualized by Cytoscape software. Based on the STRING website, 123 DEGs were filtered into the DEGs PPI networks complex consisting of 88 nodes and 230 edges ( Figure 4A). Fifteen hub genes with a degree of > 10 were selected from those networks. Subsequently, we performed module analysis by MCODE, a plugin using scoring and finding parameters for the best results, and two clusters ( Figure 4B,C) were screened out from the PPI networks of co-DEGs. Eventually, 15 hub genes were included in the two clusters (Table 3).

Hub Genes Survival Analysis
The mRNA expression levels of 15 hub genes were further analyzed on GEPIA website and it was found that those genes were significantly upregulated ( Figure 5). The survival situations of 15 hub genes were analyzed by using the GEPIA website. The results indicated that four genes (i.e., bgn, vcan, col1a1 and timp1) were closely related with poor prognosis (Figure 6) in TCGA-STAD samples. We further analyzed the correlation between the four genes. The outcome illustrated that bgn was significantly correlated with vcan (R = 0.78), col1a1 (R = 0.8) and timp1 (R = 0.68, Figure 7). In gastric cancer tissues where protein expression was detected, the levels of the protein (BGN, VCAN, COL1A1 and TIMP1) were higher than in normal tissues ( Figure 8). Therefore, the expression levels and translation levels of these four genes in cancer tissues were higher than those in normal tissues.

Hub Genes Survival Analysis
The mRNA expression levels of 15 hub genes were further analyzed on GEPIA website and it was found that those genes were significantly upregulated ( Figure 5). The survival situations of 15 hub genes were analyzed by using the GEPIA website. The results indicated that four genes (i.e., bgn, vcan, col1a1 and timp1) were closely related with poor prognosis (Figure 6) in TCGA-STAD samples. We further analyzed the correlation between the four genes. The outcome illustrated that bgn was significantly correlated with vcan (R = 0.78), col1a1 (R = 0.8) and timp1 (R = 0.68, Figure 7). In gastric cancer tissues where protein expression was detected, the levels of the protein (BGN, VCAN, COL1A1 and TIMP1) were higher than in normal tissues ( Figure 8). Therefore, the expression levels and translation levels of these four genes in cancer tissues were higher than those in normal tissues.

Discussion
The age-standardized 5-year OS of GAC patients is less than 30% due to poor prognosis in the Chinese mainland [16]. Therefore, it is urgent to understand the molecular basis of GAC based on bioinformatics analysis of microarray data. Our results revealed that four ECM related hub genes predicted poor prognosis in GAC based on bioinformatics analysis. Three ECM related hub genes were well correlated with bgn. Therefore, bgn, vcan, col1a1 and timp1 associated with the extracellular matrix may serve as new biomarkers for clinical diagnosis and prognosis of GAC.
The immune tumor microenvironment could affect the survival of cancer patients [17]. ECM is a key component of the microenvironment around the cells, playing important roles during embryonic development and organ homeostasis. ECM is common in the abnormal state and reconstituted in diseases such as cancer [18]. In the present study, GO and KEGG pathways enrichment analyses demonstrated that extracellular environment proteins were crucial for GAC cancer maintenance. Among the coexpression module, the top three terms enriched in CC were all related to the extracellular environment. Cell adhesion is an essential process for cell migration. Directional cell migration is initiated by extracellular cues including ECM proteins and mechanical forces [18]. Cell adhesion is regulated by cellular contact inhibition at the earlier stage of the neoplastic process [19]. Moreover, cell adhesion was the term with the largest number related to BP enrichment in our analysis, while calcium ion binding was the term with the largest number and smallest p-value in MF enrichment. The ECM-receptor interaction term enriched in the KEGG pathway had the smallest p-value. Therefore, ECM proteins may play an important part in GAC, and further studies are required to investigate the specific mechanisms.
Based on the PPI network using the Cytoscape software, we screened 15 hub genes, among which four (i.e., BGN, VCAN, COL1A1 and TIMP1) were associated with poor prognosis in TCGA-STAD samples. To further analyze the correlation between the four genes, we found that VCAN, COL1A1 and TIMP1 genes were closely correlated with biglycan (BGN) serving as an important component of ECM protein belonging to the small leucine-rich proteoglycans family [20]. Currently, a number of studies have indicated that the expression level of BGN is significantly higher in tumor tissues than in adjacent normal tissues [21][22][23]. In addition, it could act as an oncogenic by activating the FAK signaling pathway in metastasis of gastric cancer [24]. In the present study, VCAN was well correlated with BGN. As an ECM proteoglycan, VCAN could interact with other ECM components, and be implicated in regulating cell proliferation, differentiation, apoptosis, migration and adhesion in a variety of malignancies. The expression of versican in some tumor cells showed an increase in bladder cancer [25], colon carcinoma [26], ovarian cancer [27] and hepatocellular carcinoma [28]. In a previous study, the density of VCAN expression was stronger in metastatic tumors compared to the primary tumor [29]. Additionally, the expression of versican in tumors was associated with cancer grade and adverse outcome [30]. Collagen type I α 1 (COL1A1) is the pro-α 1 chain of type I collagen protein, as an essential component of the ECM, involved in many biological processes including ECM remodeling, tumor cell adhesion and cell migration [11,31]. Moreover, COL1A1 accelerated intraperitoneal metastasis of ovarian cancer xenograft by intraperitoneally injection in mice [10]. Previous studies indicated that as a part of ECM, TIMP1 could led to accelerated differentiation and hypertrophy of adipocytes, which contributed to the pathogenesis of cancer [32,33]. A recent study showed that high TIMP1 mRNA was associated with a lower OS in clear cell renal cell carcinoma [34]. This is consistent with our results that a higher TIMP1 level was associated with poor OS in GAC patients. In summary, BGN, VCAN, COL1A1 and TIMP1 may be helpful in revealing pathogenic mechanism of GAC.

Conclusions
In summary, four hub genes in the two datasets (i.e., GSE103236 and GSE96668) predicted poor prognosis in TCGA-STAD samples. Among these four hub genes, the expression of VCAN, COL1A1 and TIMP1 was well correlated with BGN. Moreover, all four genes were significantly upregulated in the mRNA level, which plays an important role in the tumor microenvironment. Eventually, our findings suggested that ECM-related proteins including BGN, VCAN, COL1A1 and TIMP1 may serve as potential candidate biomarkers for the detection and prognosis prediction of GAC.