CAST as a Potential Oncogene, Identified by Machine Search, in Gastric Cancer Infiltrated with Macrophages and Associated with Lgr5

Background: Gastric cancer (GC) is one of the leading malignant diseases worldwide, especially in Asia. CAST is a potential oncogene in GC carcinogenesis, and the character of macrophage infiltration in the GC microenvironment remains unaddressed. Methods: We first applied machine searching to evaluate gene candidates for GC. CAST expression and its pan-cancer profile were analyzed using the Human Protein Atlas (HPA) and Gene Expression Profiling Interactive Analysis 2 (GEPIA2) databases. The protein–protein interaction (PPI) network was downloaded from STRING. We investigated the impact of CAST on clinical prognosis using the Kaplan–Meier plotter. The correlations of CAST with Lgr5 and with macrophage infiltration in GC were determined using TIMER 2.0. Finally, GeneMANIA was used to evaluate possible functional linkages between genes. Results: The machine-assisted search showed that CAST expression significantly influenced the overall survival of GC patients. STRING revealed CAST-related proteomic and transcriptomic associations, mainly involving the CAPN family. Validation in other datasets confirmed that CAST significantly affects the prognosis of GC. Notably, high CAST expression was correlated with worse overall survival in GC patients (hazard ratio = 1.59; log-rank P = 9.4 × 10⁻⁸). CAST and Lgr5 expression were both positively correlated with WNT2 and WNT2B. Across several GC datasets, CAST expression and macrophage infiltration evaluated together showed no consistent association with poor overall survival. Conclusions: CAST plays an important role in the clinical prognosis of GC and is associated with WNT2/WNT2B/Lgr5. Our study suggests that CAST's influence on overall survival in GC is modulated by macrophage infiltration.


Introduction
Gastric cancer (GC) is one of the most prevalent malignancies worldwide, with more than 1 million new cases estimated to be diagnosed each year. GC is the fourth most common cancer and the second most common cause of cancer death worldwide [1]. Globally, one in 33 men and one in 78 women will develop GC in their lifetimes [2,3]. Because GC is often diagnosed at an advanced stage, its mortality rate is high. In 2018, 784,000 people died from GC worldwide, twice as many men as women, with East Asia, Eastern Europe, and South America having the highest GC incidence and mortality [4]. Population aging is expected to increase the number of GC cases in the future, and in recent years an increase in the incidence of GC in young people has also been observed [5].
Approximately 10% of GC cases show familial clustering, and approximately 1–3% of cases are hereditary, arising from germline mutations [6]. Familial GC comprises at least three major syndromes: hereditary diffuse GC (HDGC), gastric adenocarcinoma and proximal polyposis of the stomach (GAPPS), and familial intestinal GC (FIGC) [7][8][9]. In exploring the mechanisms of gastric carcinogenesis, recent studies have identified Lgr5 as an activator of the WNT signaling pathway that promotes the proliferation of gastric adenocarcinoma cells. Stem cells expressing the marker Lgr5 have been found in the stomach, kidneys, colon, hair follicles, and mammary glands [10]. Wu et al. found Lgr5 expression at the base of normal gastric gland units and differential Lgr5 expression among GCs of varying differentiation. Furthermore, Lgr5 and Bmi1 were found to mark the same stem-cell population, and the Lgr5-associated markers CD133, CD26, CD44, and ALDH1 may be related to GC growth [11].
Calpastatin (CAST) is usually found at the plasma membrane and around the nucleus [12]. CAST inhibits calpains, which can translocate into the nucleus and regulate the WNT/β-catenin pathway [13]. The single CAST gene can encode eight or more CAST polypeptides, ranging from 17 to 85 kDa in molecular weight, that bind calpain molecules in a Ca²⁺-dependent manner. The CAST/calpain system regulates a variety of cellular processes, including the remodeling of cytoskeletal/membrane attachments, multiple signal-transduction pathways, and apoptosis. It also participates in numerous membrane-fusion events, such as neural vesicle exocytosis and platelet aggregation [14]. CAST has previously been reported as a possible novel marker of GC development: Liu et al. found decreased calpastatin levels in GC, and the ratio (calpain 1 (CAPN1) × calpain 2 (CAPN2))/(calpastatin × calmodulin (CaM)) has been proposed as a potential index for GC diagnosis [15].
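For clarity, the diagnostic index proposed by Liu et al. [15] can be written as a single expression. The symbol R and the square-bracket notation for expression level are introduced here for illustration only and are not taken from that study.

```latex
R = \frac{[\mathrm{CAPN1}] \times [\mathrm{CAPN2}]}{[\mathrm{CAST}] \times [\mathrm{CaM}]}
```

Because CAST appears in the denominator, the decreased calpastatin levels reported in GC would, with the other three terms unchanged, increase R.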
In recent years, tumor-associated macrophages (TAMs) have been recognized as key components of the tumor microenvironment, with both tumor-promoting and tumor-suppressing functions [16]. TAMs are categorized into the anti-tumor M1 phenotype (classically activated state) and the protumorigenic M2 phenotype (alternatively activated state), mirroring the Th1–Th2 polarization of T cells [17]. TAMs participate in innate host defenses and can kill tumor cells. Meanwhile, TAMs also play a critical regulatory role in epithelial–mesenchymal transition, angiogenesis, and immunosuppression, reducing the efficacy of chemotherapy [18,19].
However, the relationship between CAST and macrophage immune responses, and its relevance to Lgr5, remain unaddressed. We aimed to explore the possible interactions among these factors.

The Cancer Genome Atlas (TCGA) Program Analysis Using Machine Searching
The expression levels of the CAST gene in various types of cancer were obtained from the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/, accessed on 1 September 2021). We used Python Selenium (version 3.8) to search the TCGA database automatically by entering different gene candidates, and we recorded all candidate genes associated with the overall survival (OS) rate for GC. The most relevant genes, including CAST and WNT (p-value < 0.001), were then selected.

Biomolecules 2022, 12, 670
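The final selection step can be sketched as a simple threshold filter over the recorded survival p-values. This is a minimal illustration, assuming the automated search has already produced one OS p-value per gene; the browser-automation part (Selenium) is omitted because the exact query workflow is not described, and the function name, gene names, and p-values below are hypothetical placeholders, not study results.

```python
from typing import Dict, List

# Selection threshold from the text (p-value < 0.001).
P_THRESHOLD = 0.001


def select_candidates(os_pvalues: Dict[str, float],
                      threshold: float = P_THRESHOLD) -> List[str]:
    """Return genes whose OS p-value is strictly below `threshold`,
    ordered from most to least significant (hypothetical helper)."""
    hits = sorted((p, gene) for gene, p in os_pvalues.items() if p < threshold)
    return [gene for _, gene in hits]


if __name__ == "__main__":
    # Placeholder p-values for illustration only, NOT results from the study.
    recorded = {"GENE_A": 0.2, "GENE_B": 1e-5, "GENE_C": 4e-4, "GENE_D": 0.001}
    print(select_candidates(recorded))  # ['GENE_B', 'GENE_C']
```

Note that a gene exactly at p = 0.001 is excluded, matching the strict "< 0.001" criterion stated above.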

Protein-Protein Interaction (PPI) Network from STRING
The STRING database (version 11.5) [20] is used to search for PPIs of interest. Proteins related to the same topic can be linked through direct (physical) and indirect (functional) associations and mapped onto a weighted network; STRING covers 14,094 organisms, 67.6 million proteins, and >20 billion interactions. Proteins are represented as nodes, and each association between two proteins is represented as an edge annotated with a confidence score. A higher confidence score indicates stronger evidence that the association is genuine [21].
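The node/edge/score representation described above can be illustrated with a small sketch that filters edges by a minimum confidence score and counts the resulting nodes and edges. The function name build_network is hypothetical, and the 0.4 default mirrors STRING's commonly used "medium confidence" setting; this is an assumption, since the text does not state which threshold was applied. In the example, the three CAPN scores are those reported for the CAST network, while GENE_X is a made-up low-confidence partner.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str, float]  # (protein A, protein B, confidence score 0-1)


def build_network(edges: Iterable[Edge], min_score: float = 0.4
                  ) -> Tuple[Set[str], Dict[FrozenSet[str], float]]:
    """Keep undirected edges with score >= min_score (hypothetical helper).

    Returns the set of nodes touched by kept edges and a map from each
    unordered protein pair to its (highest) confidence score."""
    nodes: Set[str] = set()
    kept: Dict[FrozenSet[str], float] = {}
    for a, b, score in edges:
        if score < min_score:
            continue  # below the confidence cutoff
        pair = frozenset((a, b))
        # A-B and B-A describe the same undirected edge; keep the higher score.
        kept[pair] = max(score, kept.get(pair, 0.0))
        nodes.update((a, b))
    return nodes, kept


if __name__ == "__main__":
    example = [
        ("CAST", "CAPN2", 0.999),
        ("CAST", "CAPN1", 0.999),
        ("CAST", "CAPNS1", 0.986),
        ("CAPN1", "CAST", 0.999),   # reversed duplicate, collapsed
        ("CAST", "GENE_X", 0.20),   # hypothetical low-confidence edge, dropped
    ]
    nodes, edges = build_network(example)
    print(len(nodes), len(edges))  # 4 3
```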

CAST Bioinformatics Analysis Using Gene Expression Profiling Interactive Analysis 2 (GEPIA2) Datasets
We compared CAST mRNA levels between tumor and matched normal samples using the GEPIA2 database, which provides cancer genomic data based on TCGA and the Genotype-Tissue Expression (GTEx) project [22].

Using Human Protein Atlas (HPA) for Further Validation of CAST in Different Human Tissues
We used the HPA, one of the most robust and comprehensive databases of protein and RNA expression in tissues and cells. The goal of the HPA Cell Atlas is to map the subcellular distribution of all human proteins in a canonical human cell, including over the course of the cell cycle. The HPA covers over 85% of all human protein-coding genes. Furthermore, both immunohistochemistry (IHC) scoring parameters and subcellular localization classifications have been refined to increase the numbers of cell types and organelles covered, providing clinicians with bioinformatic information on intraorganellar locations. The HPA can support deeper investigations in both basic and clinical research [23]. We used transcriptomic and proteomic expression data to characterize CAST in different tumor tissues (Table S1).

Survival Analysis Using Kaplan-Meier (KM) Plotter
Survival information and CAST expression data for the GC patients in the KM plotter database were derived from the Gene Expression Omnibus (GEO), the Cancer Biomedical Informatics Grid, and The Cancer Genome Atlas. The following GC datasets from GEO were included: GSE62254, GSE22377, GSE51105, GSE14210, GSE29272, and GSE15459 (https://kmplot.com/analysis/index.php?p=service&cancer=gastric, accessed on 1 September 2021) [24]. We generated KM survival plots comparing survival over time between patient subgroups with different gene-expression statuses. We determined the hazard ratios (HRs), 95% confidence intervals (CIs), and log-rank p-values. A p-value < 0.05 was considered statistically significant.
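The median-cutoff comparison can be sketched in plain Python as a two-group log-rank test. This is a minimal sketch under stated assumptions: patients are split at the median expression (ties go to the low group), the test statistic is compared with a chi-square distribution with one degree of freedom, and hazard-ratio estimation (typically by Cox regression) is omitted. It is not the KM plotter's implementation, which the text does not describe, and the toy cohort is hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple


def split_by_median(expr: Sequence[float]) -> List[bool]:
    """True marks the 'high' group (expression above the median);
    values equal to the median fall into the 'low' group."""
    cut = median(expr)
    return [x > cut for x in expr]


def logrank_test(times: Sequence[float], events: Sequence[int],
                 high: Sequence[bool]) -> Tuple[float, float]:
    """Two-group log-rank test.

    times: follow-up time per patient; events: 1 = death, 0 = censored;
    high: group membership. Returns (chi-square statistic, p-value, 1 df)."""
    o_minus_e = 0.0  # observed minus expected deaths in the high group
    var = 0.0        # hypergeometric variance, summed over death times
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if high[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and high[i])
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # Upper tail of chi-square with 1 df equals P(|Z| > sqrt(chi2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical toy cohort: months of follow-up, event flags, expression.
    times = [1, 2, 3, 10, 11, 12]
    events = [1, 1, 1, 1, 1, 1]
    expr = [9.0, 8.0, 7.0, 1.0, 2.0, 3.0]
    chi2, p = logrank_test(times, events, split_by_median(expr))
    print(round(chi2, 3), round(p, 4))
```

In this toy cohort all three high-expression patients die before any low-expression patient, so the high group has more deaths than expected and the test rejects equal survival at the 0.05 level.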

TIMER 2.0 Database for Genes and Infiltrating Immune Cells
The TIMER 2.0 database (http://timer.cistrome.org/, accessed on 1 September 2021) is a web resource that integrates extensive immune and gene-expression data, which can be used to analyze and summarize tumor immune-infiltration scores for cell types such as neutrophils, macrophages, T cells, B cells, and NK cells. TIMER 2.0 can also analyze specific oncogene mutation groups, allowing genes to be examined in the context of well-known oncogenic mutations in specific tumors [25][26][27]. The correlations among the Lgr family, CAST, the WNT family, and macrophages were surveyed using TCGA data, and the results were downloaded for analysis. The relationship between CAST and well-known immune-cell infiltrates in tumors was also analyzed using TIMER 2.0 for confirmation. A p-value < 0.05 was considered statistically significant.

Gene and Protein Networks Analysis
GeneMANIA (http://genemania.org/, accessed on 15 August 2021, version 3.6.0) is a real-time multiple association network integration algorithm for predicting gene function [28]. We used it to extract gene–gene interactions (GGIs). Because previous studies have linked the WNT family to gastric cancer development, we surveyed the relationships among the WNT, CAST, and Lgr5 genes. Moreover, we analyzed functions involving G-protein-coupled receptor binding, the canonical WNT signaling pathway, stem-cell differentiation, and the positive regulation of the WNT signaling pathway to characterize the GGIs.

Statistical Analysis
Results from the KM plotter and TIMER 2.0 are reported as hazard ratios (HRs) with the corresponding p-values (Cox regression or log-rank test). Correlations between gene-expression levels were evaluated using Spearman's rank correlation, and the sign of rho indicated whether a correlation in protein/RNA expression was positive or negative.
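Spearman's rho is Pearson's correlation computed on ranks, with tied values given their average rank. The following stdlib-only sketch illustrates the calculation; the function names are hypothetical, it is not the TIMER 2.0 implementation, and the p-value computation is omitted.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # A monotonic but nonlinear relationship still gives rho = 1.
    print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

A positive rho indicates that the two genes tend to rise together, and a negative rho that one falls as the other rises; because only ranks are used, the measure is insensitive to the scale of expression values.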

CAST-Centered Network Interaction and Clustering Analysis
CAST was entered into the STRING database to obtain the functional protein-correlation network. The resulting PPI network of proteins functionally related to CAST contained 11 nodes and 40 edges, with confidence scores of 0.999, 0.999, and 0.986 for CAPN2, CAPN1, and CAPNS1, respectively. The PPI enrichment p-value was 1.12 × 10⁻¹¹. K-means clustering of the constructed interaction network yielded three distinct clusters, as shown in Figure 1. In addition, the gene ontology (GO) annotations related to CAST are shown in Table 1.

CAST Expression in Different Tissues
We extracted CAST RNA-sequencing expression levels from the GEPIA2 database. Figure 2 shows CAST expression in transcripts per million (TPM). Glioblastoma (GBM), pancreatic adenocarcinoma (PAAD), and stomach adenocarcinoma (STAD) showed prominent CAST expression, whereas testicular germ-cell tumors (TGCT), uterine corpus endometrial carcinoma (UCEC), and uterine carcinosarcoma (UCS) showed lower expression. CAST expression in different tissues is shown in Table 1.
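TPM normalizes read counts first by transcript length and then by the total across all transcripts in a sample, so every sample sums to one million and expression values are comparable between samples. The sketch below applies this standard definition to hypothetical counts; the function name is illustrative, GEPIA2 serves precomputed TPM values, and its upstream processing pipeline is not described here.

```python
from typing import Dict


def tpm(read_counts: Dict[str, float], lengths_kb: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million for one sample.

    Step 1: divide each count by transcript length (kb) -> reads per kilobase.
    Step 2: scale so the per-kilobase rates sum to 1,000,000."""
    rpk = {g: read_counts[g] / lengths_kb[g] for g in read_counts}
    scale = sum(rpk.values()) / 1e6
    return {g: r / scale for g, r in rpk.items()}


if __name__ == "__main__":
    # Hypothetical sample with two transcripts; not GEPIA2 data.
    print(tpm({"A": 100, "B": 100}, {"A": 1.0, "B": 4.0}))
```

Transcript B has the same read count as A but is four times longer, so it receives one quarter of A's TPM (about 200,000 versus 800,000).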

Validation of CAST Expression in GC
To further validate the association between CAST and GC, we mined the HPA, in which cancer types are color-coded according to the normal organ of origin, using three antibodies (HPA036881, HPA036882, and CAB009491; Figure 3A–C, respectively). With HPA036881, CAST staining was high in no patients, medium in six, low in three, and not detected in three. With HPA036882, it was high in no patients, medium in two, low in two, and not detected in eight. With CAB009491, it was high in four patients, medium in five, low in one, and not detected in two. An overview of RNA expression is shown in Figure 3D.

Table 2 shows significant differences in survival between the low-expression and high-expression cohorts. In every dataset except GSE62254, the low-CAST-expression cohort had longer median survival than the high-CAST-expression cohort. The p-value was statistically significant (<0.05) for the combined cohort ("all") and for GSE22377, GSE14210, GSE29272, and GSE15459 (Table 3). In the KM plotter GC-cohort analyses, with the median CAST expression as the cutoff for stratifying patients, CAST was significantly related to patient survival (all patients: HR, 1.59; 95% confidence interval (CI), 1.34–1.88; log-rank p = 9.4 × 10⁻⁸; Figure 4A). In Figure 4B, most subgroups showed significantly lower survival in the high-CAST-expression cohorts than in the low-CAST-expression cohorts.
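As a rough internal consistency check, the standard error of log(HR) can be recovered from the reported 95% CI and used to form an approximate Wald z-test. This sketch assumes a CI that is symmetric on the log scale; because the reported values are rounded and the KM plotter reports a log-rank rather than a Wald p-value, the result should only agree in order of magnitude with 9.4 × 10⁻⁸. The function name is hypothetical.

```python
import math


def wald_from_ci(hr: float, lower: float, upper: float, z_crit: float = 1.959964):
    """Approximate SE of log(HR), Wald z, and two-sided p from a 95% CI,
    assuming the CI is symmetric on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


if __name__ == "__main__":
    # HR and 95% CI reported above for all GC patients.
    se, z, p = wald_from_ci(1.59, 1.34, 1.88)
    print(f"SE={se:.3f}, z={z:.2f}, p={p:.1e}")  # roughly SE 0.086, z 5.4, p 8e-8
```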
Figure 5 shows the association of CAST expression alone with GC overall survival (HR: 1.22; p = 0.0415), consistent with the results retrieved from the KM plotter. In this 5-year survival analysis, the low-CAST-expression cohort had significantly higher cumulative survival than the high-CAST-expression cohort.

CAST and Macrophages in GC
TIMER 2.0 provides immune-infiltration estimates from several algorithms, including TIMER, EPIC, XCELL, CIBERSORT-ABS, and QUANTISEQ. As shown in Figure 8, the TIMER estimates indicated that high CAST expression combined with high macrophage infiltration was significantly associated with lower cumulative survival than high CAST expression combined with low macrophage infiltration (HR, 2.08; p = 0.00927). However, no significant difference in cumulative survival was found when macrophage infiltration was estimated with EPIC or XCELL. The evaluation of M1 and M2 macrophages also showed no significant difference in cumulative survival.
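The four expression/infiltration groups compared above can be formed as in the sketch below. It assumes a median split on both CAST expression and the macrophage infiltration estimate; the cutoffs actually used in TIMER 2.0 are not stated in the text, and the function name and group labels are hypothetical. Each group's survival would then be compared with a survival test such as the log-rank test.

```python
from statistics import median
from typing import List, Sequence


def stratify(cast_expr: Sequence[float], macro_score: Sequence[float]) -> List[str]:
    """Label each patient by CAST expression and macrophage infiltration,
    using the median of each variable as the cutoff (values at the median
    count as 'low')."""
    cut_c = median(cast_expr)
    cut_m = median(macro_score)
    labels = []
    for c, m in zip(cast_expr, macro_score):
        c_lab = "CAST-high" if c > cut_c else "CAST-low"
        m_lab = "Mac-high" if m > cut_m else "Mac-low"
        labels.append(f"{c_lab}/{m_lab}")
    return labels


if __name__ == "__main__":
    # Hypothetical values for four patients.
    print(stratify([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]))
```

The comparison reported for the TIMER estimates corresponds to contrasting the "CAST-high/Mac-high" group with the "CAST-high/Mac-low" group.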

CAST-WNT2/WNT2B-Lgr5 Linkages Associated with Gastric Carcinogenesis
Using GeneMANIA with CAST, WNT2, WNT2B, and Lgr5 as input, we found that CAST was linked to the WNT and Lgr families, as shown in Figure 9. WNT2 and WNT2B were linked to G-protein-coupled receptor binding. WNT2 was linked to the canonical WNT signaling pathway, but WNT2B was not. Lgr5 was associated with the positive regulation of the WNT signaling pathway and with the canonical WNT signaling pathway.

Discussion
In the present study, we showed that CAST is a potential oncogene associated with Lgr5 in gastric cancer via the WNT signaling pathway. The expression of WNT2 and WNT2B showed a significant positive correlation with both CAST and Lgr5, which warrants further study of the molecular biochemistry, transcriptomics, and proteomics of GC. Although CAST had a prominent impact on the survival of GC patients, analyses of multiple databases after multivariate adjustment suggested that macrophages might play a key role in immune regulation in the GC microenvironment, promoting tumor suppression.
Our research identified CAST as a potential oncogene promoting GC formation, an issue that previous studies have seldom addressed. Liu et al. [15] proposed that, in addition to CAST, CAPN1, CAPN2, and CaM might contribute to GC formation, which is partially consistent with our results. The calpain system has also been associated with colorectal adenocarcinoma and prostate cancer, suggesting that calpains might be important in tumor progression [29,30]. The calpain system is also relevant to human epidermal growth factor receptor 2 and E-cadherin in breast cancer [31,32]. Meanwhile, calpain-2 was shown to promote methylation of the CRMP4 promoter, repressing its transcription and thereby promoting prostate cancer metastasis by enhancing the expression of vascular endothelial growth factor C [33].
The mechanism by which CAST promotes GC remains unclear, so we sought relevant gene-expression patterns and possible pathways. Database mining suggested that Lgr5 and CAST may regulate GC formation via the same pathway, namely WNT-family signaling, especially WNT2 and WNT2B, representing a novel finding regarding the signature of GC formation. The WNT/β-catenin pathway has been shown to be important in regulating proliferation, stem-cell maintenance, and homeostasis in the gastric mucosa [34,35]. Activated WNT/β-catenin signaling can be observed in more than 30% of GCs, and its fundamental role in the self-renewal of GC stem cells has been demonstrated [36][37][38]. The WNT/β-catenin signaling paradox was recently discussed with regard to tumorigenesis driven by hyperactivation of WNT signaling through mutations in components of the β-catenin destruction complex or in β-catenin itself [39]. β-catenin activity is further controlled by additional layers of regulation, highlighting the complexity of WNT signaling deregulation in cancer [40][41][42]. The dual function (tumorigenesis or tumor suppression) of the WNT/β-catenin system was reflected in the survival follow-up of our clinicopathological datasets.
Recently, TAMs were found to be associated with WNT signaling in the tumor microenvironment. Wu et al. [43] demonstrated that macrophages play a protumorigenic role in GC patients. The underlying mechanisms could involve tumor-microenvironment-related inflammation, matrix remodeling, angiogenesis, seeding at distant sites, intravasation, or tumor-cell invasion [44]. These studies also suggest that macrophages may play either a helpful or a harmful role in the GC microenvironment. Huang et al. further demonstrated that macrophage heterogeneity within tumors is present at both the macro- and microlevels owing to gradient changes in different markers [45]. In our study, the role of CAST-associated macrophage infiltration in GC formation and survival remained unclear. We hypothesize that macrophage infiltration could modulate specific signaling pathways in GC carcinogenesis; further in vitro research is needed to determine the mechanism.
We had confidence in database mining for genes and macrophages relevant to GC because of its high reproducibility, high convenience, and lack of need to inform patients and obtain their consent. The analytical methodology of our article is well suited to establishing a precise, personalized evaluation in the molecular investigation of GC. Although we found a novel marker and immune infiltration to be correlated with GC, our study has some limitations. First, although the online databases contain a large amount of bioinformatic information, further experiments are needed for external validation of the results. Second, the details of the mechanisms by which these genes (CAST, WNT, and Lgr5) induce GC carcinogenesis remain to be elucidated; however, these database analyses provide preliminary findings that can support future models of GC carcinogenesis. Third, confirmation in tissue samples is needed because of potential errors related to tumor purity.

Conclusions
Our study explored CAST as a signature oncogene in GCs. The CAST gene in gastric carcinogenesis was found to be regulated by macrophages in our OS analyses. The details of the mechanism of CAST-gene-related GC formation require further investigation; the mechanism is probably associated with Lgr5-related pathways and WNT/β-catenin cellular signaling.