Molecular Network Profiling in Intestinal- and Diffuse-Type Gastric Cancer

Simple Summary
Cancer has several phenotypic subtypes that differ in drug responsiveness and in their capacity for migration and recurrence. Molecular networks are dynamically altered across these cancer phenotypes. To reveal the network pathways in epithelial-mesenchymal transition (EMT), we have profiled gene expression in mesenchymal stem cells and diffuse-type gastric cancer (GC), as well as in intestinal-type GC. Gene expression signatures revealed that the molecular pathway networks were altered between intestinal- and diffuse-type GC. Artificial intelligence (AI) recognized the differences between the molecular network pictures of intestinal- and diffuse-type GC.
Abstract
Epithelial-mesenchymal transition (EMT) plays an important role in the acquisition of cancer stem cell (CSC) features and drug resistance, which are the main hallmarks of cancer malignancy. Although previous findings have shown that several signaling pathways are activated in cancer progression, the precise mechanisms of the signaling pathways in EMT and CSCs are not fully understood. In this study, we focused on intestinal- and diffuse-type gastric cancer (GC) and analyzed the gene expression of public RNAseq data to understand the molecular pathway regulation in different subtypes of gastric cancer. Network pathway analysis was performed with Ingenuity Pathway Analysis (IPA). A total of 2815 probe set IDs differed significantly between intestinal- and diffuse-type GC data in the cBioPortal for Cancer Genomics. Our analysis uncovered 10 genes whose expression differed between intestinal- and diffuse-type GC: male-specific lethal 3 homolog (Drosophila) pseudogene 1 (MSL3P1), CDC28 protein kinase regulatory subunit 1B (CKS1B), DEAD-box helicase 27 (DDX27), golgi to ER traffic protein 4 (GET4), chromosome segregation 1 like (CSE1L), translocase of outer mitochondrial membrane 34 (TOMM34), YTH N6-methyladenosine RNA binding protein 1 (YTHDF1), ribonucleic acid export 1 (RAE1), par-6 family cell polarity regulator beta (PARD6B), and MRG domain binding protein (MRGBP). A total of 463 direct relationships with three molecules (MYC, NTRK1, UBE2M) were found in the biomarker-filtered network generated by network pathway analysis. The networks and features of intestinal- and diffuse-type GC were investigated and profiled using bioinformatics. Our results revealed the signaling pathway networks in intestinal- and diffuse-type GC, shedding new light on the drug resistance mechanisms in CSCs.


Introduction
Different cell types show a variety of molecular networks. Gastric cancer (GC) has several subtypes, including intestinal- and diffuse-type GC [1,2]. Intestinal-type GC tends to be more rigid. In contrast, diffuse-type GC tends to be looser or sparser, which confers on diffuse-type GC its malignant property and its capacity to migrate to a secondary site. It is essential to distinguish the subtypes of GC, since the prognosis is different, and anti-cancer drug resistance may also be involved in diffuse-type GC [3]. Thus, the therapeutic strategies may differ for each subtype of GC. Although gene mutations of CDH1 and RHOA distinguished GC from colorectal and esophageal tumors, and these mutations were specific to diffuse-type GC, it is still challenging to discriminate intestinal-type from diffuse-type GC on the basis of molecular gene expression networks [4]. We have previously revealed that the mRNA ratio of CDH2 to CDH1 distinguishes intestinal- and diffuse-type GC [2]. The precise molecular mechanisms behind the differences between intestinal- and diffuse-type GC are still under investigation.
Epithelial-mesenchymal transition (EMT) is associated with the malignancy of GC and diffuse-type GC [5]. EMT is one of the critical features of cancer stem cells (CSCs) and plays an essential role in cancer metastasis and drug resistance; it is therefore an important therapeutic target [6][7][8].
The EMT program contributes to development as well as to several pathological conditions such as wound healing, tissue fibrosis and cancer progression [7]. Many molecules and networks are involved in the EMT process, while the core EMT transcription factors have been defined as SNAI1/2, ZEB1/2 and TWIST2 [8,9]. The EMT mechanism has many aspects and layers, spanning morphological changes and the cancer microenvironment [10]. To reveal the network pathways in EMT, we have profiled gene expression and networks in mesenchymal stem cells and diffuse-type GC, as well as in intestinal-type GC [2,11]. To better understand the pathogenesis of GC and to treat EMT-like malignant diffuse-type GC, it is essential to know and predict the differences in network pathways between intestinal- and diffuse-type GC.
The importance and potential of using molecular network profiles to distinguish diffuse- and intestinal-type GC, and thereby to reveal the EMT mechanism, are increasing in the digital era [10]. A previous study demonstrated that gene regulatory network construction identified nuclear transcription factor Y subunit alpha (NFYA) as a prognostic factor in diffuse-type GC [12]. Recent progress in computational analysis and public databases enables multi-disciplinary assessment of big data, including network analysis of the RefSeq data. In this study, the open-source RefSeq data of intestinal- and diffuse-type GC were compared, followed by molecular network analysis and gene ontology analysis [13]. In addition, prediction modeling of the molecular networks using artificial intelligence (AI) was established. This research integrates gene expression, molecular networks and AI as a basis for future network studies.

Networks Generated from Genes Altered in Intestinal- and Diffuse-Type GC
Networks of genes altered in intestinal- and diffuse-type GC were analyzed using Ingenuity Pathway Analysis (IPA). A total of 2815 IDs that differed significantly between intestinal- and diffuse-type GC were analyzed in IPA (t-test, p < 0.00001). A total of 25 networks generated from these genes are shown in Table 2. Network #1, which is related to cancer, gastrointestinal disease, and organismal injury and abnormalities, is shown in Figure 2.
Figure 2. Networks generated from genes altered in intestinal- and diffuse-type gastric cancer (GC). A total of 2815 IDs that differed significantly between intestinal- and diffuse-type GC were analyzed in Ingenuity Pathway Analysis (IPA); Network 1, related to cancer, gastrointestinal disease, and organismal injury and abnormalities, is shown. (a) Network in intestinal-type GC; (b) Network in diffuse-type GC. A total of 463 direct relationships with three molecules (MYC, NTRK1, UBE2M) are shown in the network of biomarker-filtered genes in intestinal-type GC (c) and diffuse-type GC (d). Of the 613 biomarker-filtered genes (human, blood, cancer), 285 genes, including MYC, NTRK1 and UBE2M, are included in the network. The network contains 609 relationships in total.
Table 2. Networks generated from genes that differ significantly between intestinal- and diffuse-type gastric cancer (GC). The networks were generated from a total of 2815 probe set IDs differentiated between intestinal-type (CIN; chromosomal instability) and diffuse-type (GS; genomically stable) GC (Student's t-test, p < 0.00001). Columns: ID; Focus Molecules; Top Diseases and Functions.

Regulator Effect Networks Related to Cancer in Intestinal- and Diffuse-Type GC
Regulator effects were analyzed by IPA, with cancer selected as the target disease. The types of regulators analyzed include biological drugs, canonical pathways, and chemical drugs (Figure 3). Regulator effect networks related to cancer were generated for each subtype: Table 3 shows those in intestinal-type GC, and Table 4 shows those in diffuse-type GC.

Upstream Regulators in Intestinal- and Diffuse-Type GC
Upstream regulators of genes altered in intestinal- and diffuse-type GC were identified by IPA. The top 25 upstream regulators of the altered genes in intestinal- and diffuse-type GC are shown in Table 7. The top 25 upstream regulators include NUPR1, CSF2, PTGER2, TP53, EGFR, let-7

Prediction Model for Molecular Networks of Intestinal- and Diffuse-Type GC
The results of the upstream analysis of intestinal- and diffuse-type GC data were analyzed in DataRobot Automated Machine Learning version 6.0 to create prediction models. The list of upstream regulators was uploaded and linked with the network picture data, and the prediction target was set to the subtype (intestinal- or diffuse-type GC) (Figure 5). Among the prediction models DataRobot created, the Elastic-Net Classifier (mixing alpha = 0.5/Binomial Deviance) had the highest predictive accuracy, with a cross-validation AUC of 0.7185. For this model, the feature impact chart using Permutation Importance showed that the most important features for accurately predicting the subtype of GC ("Analysis" values) were the upstream network pictures (NWpic) (Figure 5a,b). Figure 5c shows the Partial Dependence Plot for the Predicted Activation State of the upstream network. Figure 5d shows the Word Cloud of the target molecules: the size of each molecule name indicates how often it appears in the dataset, and the color shows the coefficient. Figure 5e shows the activation maps, in which the regions that the AI attends to are highlighted. Figure 5f shows an example Receiver Operating Characteristic (ROC) curve for the model.
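The cross-validation AUC reported for the model can be read as the probability that a randomly chosen network of one subtype receives a higher score than a randomly chosen network of the other. The following minimal Python sketch illustrates that definition only; it is not DataRobot's implementation, and the function name, labels, and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for the positive class (e.g. diffuse-type), 0 for the negative class.
    scores: model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four networks (not data from the study).
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(example_labels, example_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so a value near 0.72 indicates moderate discrimination between the two subtypes.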

EMT Molecular Pathway and Diffuse-Type GC Mapping
The canonical pathways for Regulation of the EMT pathway include the TGF-beta pathway, Wnt pathway, Notch pathway, and Receptor Tyrosine Kinase pathway (Figure 6). In each EMT-related pathway, genes whose expression was altered in diffuse-type GC compared to intestinal-type GC are mapped in pink (up-regulated) or green (down-regulated). The activation states of the pathways were predicted with IPA and are shown in orange (activation) or blue (inhibition). RNA-RNA interaction analysis identified the interacting miRNAs as let-7, mir-10, mir-126, mir-181, mir-26, mir-515, MIR100-LET7A2-MIR125B1, MIR124, MIR99A-LET7C-MIR125B2, and MIRLET7.

Discussion
It is critical to distinguish intestinal- and diffuse-type GC for effective therapeutic strategies, since the pathogenesis and prognosis differ considerably between these subtypes. We previously reported a gene signature of intestinal- and diffuse-type GC, indicated by the ratio of CDH2 to CDH1 gene expression [2]. CDH1 and CDH2 are thus important signature genes for distinguishing the subtypes of GC. Since our previous reports, abundant and useful open-source data, including RefSeq data for intestinal- and diffuse-type GC, have become publicly available [13][14][15][16]. Our current study highlights the relevance of using open-source data for human health. In the current study, the RefSeq data of intestinal- and diffuse-type GC were analyzed to explore molecular networks and the application of AI modeling. The top 10 genes whose expression was altered between intestinal- and diffuse-type GC in the RefSeq data were CKS1B, CSE1L, DDX27, GET4, MRGBP, MSL3P1, PARD6B, RAE1, TOMM34, and YTHDF1. The network analysis of altered genes in intestinal- and diffuse-type GC generated networks related to cancer, gastrointestinal disease, organismal injury and abnormalities, amino acid metabolism, molecular transport, and small molecule biochemistry, among others. Several miRNAs, including miR-205-5p, miR-21-5p, let-7a-5p, let-7, miR-24-3p, and miR-291a-3p, were identified as regulators of networks involved in intestinal- and diffuse-type GC. Since previous studies have revealed the involvement of miR-200s in promoting metastatic colonization by inhibiting EMT and promoting mesenchymal-epithelial transition (MET), revealing miRNA networks in EMT may be an intriguing approach [17,18]. Several miRNAs are involved in and regulated during EMT and MET, which would be critical for the progression and metastasis process [19][20][21].
DataRobot Automated Machine Learning created prediction models to distinguish intestinal- and diffuse-type GC from the results of the upstream analysis and the network picture data. Image recognition of molecular networks by AI may therefore distinguish intestinal- and diffuse-type GC. The Predicted Activation State feature anticipated the subtypes of GC with a partial dependence of approximately 0.5, which suggests that the predicted activation state of the molecular networks might distinguish the subtypes of GC.
Intestinal- and diffuse-type GC can be distinguished by the mRNA ratio of CDH2 to CDH1, as previously shown [2]. Molecular network profiling is vital to reveal the mechanisms behind the differences between intestinal- and diffuse-type GC, such as EMT and drug resistance in CSCs. Research exploring the differences between the molecular networks of intestinal- and diffuse-type GC may reveal mechanisms that lead to the identification of therapeutic targets. It is easier to detect miRNAs in the blood than to analyze tissues. The current study of miRNA regulation in intestinal- and diffuse-type GC might identify miRNAs involved in EMT in diffuse-type GC, and these miRNAs might be detectable in the blood. Profiling the molecular networks of RNAs detected in blood is a next step for future research.

Data Collection
The RefSeq data of intestinal- and diffuse-type GC are publicly available as The Cancer Genome Atlas (TCGA) data in the cBioPortal for Cancer Genomics database [13][14][15] and in the NCI Genomic Data Commons (GDC) Data Portal [22]. From the stomach adenocarcinoma data (TCGA, PanCancer Atlas), intestinal- and diffuse-type GC data, which are noted as chromosomal instability (CIN) and genomically stable (GS), respectively, in the TCGA Research Network publication, were compared [13].
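As a rough illustration of the comparison described above, the sketch below selects CIN and GS samples from a clinical table and relabels them as intestinal- and diffuse-type GC. The record layout, the field names `sample_id` and `subtype`, and the label strings are assumptions made for this example, not the actual cBioPortal export format.

```python
# Mapping used in this study: CIN -> intestinal-type, GS -> diffuse-type.
SUBTYPE_TO_GROUP = {"CIN": "intestinal", "GS": "diffuse"}


def group_samples(records):
    """Return {group: [sample_id, ...]} for CIN and GS samples only."""
    groups = {"intestinal": [], "diffuse": []}
    for rec in records:
        group = SUBTYPE_TO_GROUP.get(rec["subtype"])
        if group is not None:  # skip other subtypes (e.g. MSI, EBV)
            groups[group].append(rec["sample_id"])
    return groups


# Hypothetical records; identifiers are placeholders, not TCGA barcodes.
records = [
    {"sample_id": "S1", "subtype": "CIN"},
    {"sample_id": "S2", "subtype": "GS"},
    {"sample_id": "S3", "subtype": "MSI"},
    {"sample_id": "S4", "subtype": "CIN"},
]
groups = group_samples(records)
```

Samples of any other molecular subtype are dropped rather than forced into one of the two groups, which mirrors the two-group comparison used here.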

Network Analysis
Data of intestinal- and diffuse-type GC in TCGA cBioPortal Cancer Genomics were uploaded and analyzed using Ingenuity Pathway Analysis (IPA) (QIAGEN Inc., Hilden, Germany) [23].

Gene Ontology (GO) Analysis
Gene Ontology (GO) was analyzed in the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources 6.8 (Laboratory of Human Retrovirology and Immunoinformatics) [24,25].
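GO enrichment in this study is judged by a modified Fisher exact p value. The Python sketch below shows one way such a value can be computed from a 2x2 table. It assumes the commonly described EASE construction, in which one is subtracted from the count of list genes carrying the term before a one-sided Fisher exact test is computed; the function names and any counts passed to them are hypothetical, and DAVID's own implementation may differ in detail.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value for over-representation in a 2x2 table.

    a: list genes with the term      b: list genes without the term
    c: background genes with term    d: background genes without term
    """
    row1 = a + b          # size of the gene list
    col1 = a + c          # genes annotated with the term
    total = a + b + c + d
    denom = comb(total, row1)
    p = 0.0
    # Sum hypergeometric probabilities for a or more annotated list genes.
    for x in range(a, min(row1, col1) + 1):
        p += comb(col1, x) * comb(total - col1, row1 - x) / denom
    return p


def ease_score(a, b, c, d):
    """Modified (more conservative) Fisher exact p-value: penalize a by one."""
    if a < 1:
        return 1.0
    return fisher_right_tail(a - 1, b, c, d)
```

Because one hit is removed before testing, the modified value is never smaller than the plain Fisher exact value, so terms supported by very few genes are less likely to pass the cutoff.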

AI Prediction Modeling
To create a prediction model from multi-modal data, including images and text descriptions of molecular networks, an enterprise AI platform (DataRobot Automated Machine Learning version 6.0; DataRobot Inc., Boston, MA, USA) was used. For the modeling, the 116 molecular networks from the IPA upstream analysis of intestinal- and diffuse-type GC were collected and input as image data into DataRobot (58 images for each subtype), which automatically created and tuned prediction models using various machine learning algorithms (e.g., eXtreme gradient-boosted trees, random forest, regularized regression such as Elastic Net, Neural Networks) [26,27]. Finally, the AI model with the highest predictive accuracy in DataRobot was identified, and various insights obtained from the model (such as Permutation Importance and the Partial Dependence Plot) were reviewed.
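For orientation, the following Python sketch writes out a penalized objective of the kind an Elastic-Net classifier minimizes: binomial deviance plus a mix of L1 and L2 penalties. It assumes the glmnet-style convention in which the mixing parameter alpha weights the L1 term and (1 - alpha)/2 weights the squared L2 term; whether DataRobot uses exactly this parameterization is an assumption, and all function names and values are hypothetical.

```python
import math


def binomial_deviance(y_true, p_pred, eps=1e-12):
    """-2 * log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total


def elastic_net_penalty(weights, lam, alpha=0.5):
    """lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)."""
    l1 = sum(abs(w) for w in weights)
    l2_sq = sum(w * w for w in weights)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_sq)


def objective(y_true, p_pred, weights, lam, alpha=0.5):
    """Penalized loss: mean deviance plus the elastic-net penalty."""
    mean_deviance = binomial_deviance(y_true, p_pred) / len(y_true)
    return mean_deviance + elastic_net_penalty(weights, lam, alpha)
```

Under this convention, alpha = 1 gives a pure L1 (lasso) penalty and alpha = 0 a pure L2 (ridge) penalty, so alpha = 0.5 sits midway between sparse and shrunken coefficient estimates.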

Data Visualization
The results of gene expression data of RefSeq and network analysis were visualized by Tableau software.

Statistical Analysis
The RefSeq data were analyzed by Student's t-test. Z-scores of intestinal- and diffuse-type GC samples were compared, and differences were considered significant at p < 0.00001. For DAVID Gene Ontology (GO) enrichment analysis, data were analyzed with the default settings. GO enrichment was considered significant at a modified Fisher exact p value < 10^-6. Bonferroni-corrected p values were < 0.005.
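The per-gene comparison described above can be sketched in plain Python as follows. This is an illustrative implementation of a pooled-variance Student's t-test with the study's cutoff of p < 0.00001, not the code used for the analysis; the two-sided p value is obtained by numerically integrating the t density (Simpson's rule), and the function names and the step-size parameter `max_h` are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance (Student's) t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, df


def two_sided_p(t, df, max_h=0.001):
    """Two-sided p-value by Simpson integration of the t density on [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    if upper == 0.0:
        return 1.0
    # Even number of intervals, chosen so the step is at most max_h.
    steps = 2 * max(500, math.ceil(upper / (2 * max_h)))
    h = upper / steps
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3.0
    return max(0.0, 1.0 - 2.0 * area)


def is_significant(a, b, threshold=1e-5):
    """Apply the cutoff p < 0.00001 to two groups of Z-scores."""
    t, df = student_t(a, b)
    return two_sided_p(t, df) < threshold
```

A call such as `is_significant(intestinal_z, diffuse_z)`, with one list of Z-scores per subtype for a given gene, returns whether that gene would pass the threshold. In practice a statistics library would normally be preferred over hand-written integration.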

Conclusions
The regulatory molecular networks differ between intestinal- and diffuse-type GC. Networks generated from genes altered in intestinal- and diffuse-type GC included a network related to cancer, gastrointestinal disease, and organismal injury and abnormalities. We demonstrated that several miRNAs regulate the networks in intestinal- and diffuse-type GC. Machine learning on network image data created prediction models to distinguish the subtypes of GC. The molecular mapping of intestinal- and diffuse-type GC may reveal the EMT mechanism. The miRNAs identified in the study may be regulated in EMT, which would be critical for the progression and metastasis process. Our results support further identification of GC subtypes through visual changes in molecular networks.