Identification of Key Genes and Pathways in Pancreatic Cancer Gene Expression Profile by Integrative Analysis

Background: Pancreatic cancer is one of the malignant tumors that threaten human health. Methods: The gene expression profiles of GSE15471, GSE19650, GSE32676 and GSE71989 were downloaded from the gene expression omnibus database including pancreatic cancer and normal samples. The differentially expressed genes between the two types of samples were identified with the Limma package using R language. The gene ontology functional and pathway enrichment analyses of differentially-expressed genes were performed by the DAVID software followed by the construction of a protein–protein interaction network. Hub gene identification was performed by the plug-in cytoHubba in cytoscape software, and the reliability and survival analysis of hub genes was carried out in The Cancer Genome Atlas gene expression data. Results: The 138 differentially expressed genes were significantly enriched in biological processes including cell migration, cell adhesion and several pathways, mainly associated with extracellular matrix-receptor interaction and focal adhesion pathway in pancreatic cancer. The top hub genes, namely thrombospondin 1, DNA topoisomerase II alpha, syndecan 1, maternal embryonic leucine zipper kinase and proto-oncogene receptor tyrosine kinase Met were identified from the protein–protein interaction network. The expression levels of hub genes were consistent with data obtained in The Cancer Genome Atlas. DNA topoisomerase II alpha, syndecan 1, maternal embryonic leucine zipper kinase and proto-oncogene receptor tyrosine kinase Met were significantly linked with poor survival in pancreatic adenocarcinoma. Conclusions: These hub genes may be used as potential targets for pancreatic cancer diagnosis and treatment.


Introduction
In modern medicine, pancreatic cancer is one of the most difficult diseases to diagnose because of the early development of systemic metastatic disease. Although the incidence of pancreatic cancer is increasing, awareness of pancreatic cancer is relatively low. The 5-year survival rate of pancreatic carcinoma was about 8%, much less than that of other cancers [1]. The perspective of pancreatic cancer patients has not elevated notably although surgical methods and pharmaceuticals have enhanced treatment to some extent. Moreover, lower enrollment on clinical trials has resulted in a decrease of new therapy development [2]. Generally, some factors that may increase the risk of pancreatic carcinoma including pancreatitis, family history of pancreatic cancer, obesity, and older age so on. One of the main challenges of pancreatic carcinoma chemotherapy is the development of new therapeutic ways affording the elimination of tumors cells while sparing normal tissues. It may ameliorate the depressing outcome of pancreatic carcinoma by molecularly targeted therapeutic approaches used for aberrant signaling pathway in pancreatic cancer cells. Consequently, a relevant molecular target needs to be identified.
Microarray is one of the most recent advances being used for cancer research. Tumor formation involves aberrant changes in numerous cells and variations in genes. Microarray can help peculiarly in the identification of target genes of tumor suppressors and cancer biomarkers, and classification of tumors [3]. In recent investigations, numerous differentially expressed genes (DEG) have been identified through microarray in pancreatic carcinoma, and several potentially pivotal biomarkers were disclosed [4][5][6]. For instance, some key biomarkers have been exposed in pancreatic carcinoma, namely intercellular adhesion molecule 2 (ICAM2), anoctamin 9 (ANO9), proline-rich tyrosine kinase 2 (PYK2) and cyclin-dependent kinase 9 (CDK9) [7][8][9][10]. However, a different biomarker was uncovered in different research lab. Accordingly, there was no responsible biomarker in gene expression profile research of pancreatic carcinoma. The integrative bioinformatics method connecting with gene expression profiling technology might solve the deficiencies.
In this study, we used the publicly available microarray data sets of human pancreatic tissue and performed integrative analysis on DEG by bioinformatics analysis. Our results will disclose the particular biomarker and the underlying therapeutic target for pancreatic carcinoma.

Microarray Data
Four publicly available gene expression profiles (GSE15471, GSE19650, GSE32676 and GSE71989) were downloaded from the Gene Expression Omnibus (GEO) database and used in this study. Criteria of the selected dataset was as follows: (1) the GEO platform (GPL) is GPL570 (Affymetrix Human Genome U133 Plus_2.0 Array); (2) the number of samples is more than 20 containing normal and cancer tissues; (3) the samples are human pancreatic cancer tissue. The dataset of GSE15471 contained pancreatic tissue samples of 39 cancer patients and 39 healthy subjects. The dataset of GSE19650 contained pancreatic tissue samples of 15 cancer patients and 7 healthy subjects. The dataset of GSE32676 contained pancreatic tissue samples of 25 cancer patients and 7 healthy subjects. The dataset of GSE71989 contained pancreatic tissue samples of 13 cancer patients and 8 healthy subjects. Data of chronic pancreatitis tissue samples in GSE71989 were not included in this study. These four datasets were chosen for integrative analysis in this study including 92 pancreatic cancer samples and 61 healthy subjects.

Data Preprocessing and DEG Screening
Affy package of R language was used for manipulating the raw data following a 3-step process: background-adjusted, normalized, and log-transformed the raw data values [11]. Afterwards, the expression matrix with gene level was gained by transforming the expression matrix with probe level grounded on annotation files. DEG analysis was performed with multiple linear regression Limma package [12]. It estimates the fold changes and standard errors by fitting a linear model for each gene by lmFit and the empirical Bayes statistics implemented by eBayes, topTable etc. Statistical significance was set at p value <0.01 and log2-fold change (log2|FC|) > 1 for each dataset. In the following study, intersection of the 4-dataset DEG was defined as common DEG. A Venn diagram was used for showing the common DEG by VennDiagram package of R language. We further analyzed the DEG of intraductal papillary-mucinous adenoma (IPMA), intraductal papillary-mucinous carcinoma (IPMC) and intraductal papillary-mucinous neoplasm (IPMN) for the GSE19650 dataset by the same method.

Hierarchical Clustering Analysis
Gene expression values were extracted from the expression profile for each dataset. A bidirectional hierarchical clustering heatmap was constructed using gplots package of R language for DEG in every dataset. Besides, the hierarchical clustering was performed by limiting the analysis only to the 138 common DEG obtained from the 4 datasets. We used heatmap.2 function in gplots package of R to draw the heat map. In heatmap.2, the expression value of gene is in the row and the sample is in the

Protein-Protein Interaction Network Construction and Hub Gene Analysis
The Search Tool for the Retrieval of Interacting Genes (STRING) version 10.5 (http://www.stringdb.org/) was used for constructing the protein-protein interaction (PPI) networks [14]. The PPI network was constructed and visualized using cytoscape software version 3.5.0 (California, USA) for the common DEG [15]. The plug-in cytoHubba was used for exploring key nodes and fragile motifs in the PPI network by some topological algorithms including Degree, Edge Percolated Component, Maximum Neighborhood Component, Density of Maximum Neighborhood Component, Clustering Coefficient, Maximal Clique Centrality, Bottleneck, EcCentricity, Closeness, Radiality, Stress, and Betweenness [16]. A definition of 12 topological algorithms is described in the Supplementary Table S1. The top 30 nodes were considered as notable genes in the network for every topological analysis method. The intersected genes of top 30 nodes of every topological algorithm were regarded as the most important hub genes in the network.

Validation and Survival Analysis of the Hub Genes in The Cancer Genome Atlas (TCGA) Dataset
UALCAN is an interactive web-portal to perform to in-depth analyses of The Cancer Genome Atlas (TCGA) gene expression data (http://ualcan.path.uab.edu/index.html) and it uses TCGA level 3 RNA-seq and clinical data from 31 cancer types [17]. The correlation between hub genes expression and survival in pancreatic adenocarcinoma was analyzed by UALCAN. The patient objects with pancreatic adenocarcinoma were split into two groups according to the expression of a particular gene (high vs. low/medium expression).

Identification of DEG
A total of 138 common DEG were identified from the intersected parts of the four profile datasets including 93 up-regulated genes and 45 down-regulated genes in the pancreatic carcinoma samples compared to normal samples, which was exhibited by a Venn diagram ( Figure 1). The gene expression value was extracted from every profile dataset and a hierarchical clustering heat map was plotted to show the DEG (Figure 2). In Figure 2, it can be seen that some cancer GSE samples (GSM) of the GSE15471 dataset were not classified as the cancer group. Similarly, this phenomenon also presents in the datasets of GSE32676 and GSE71989. Additionally, it shows that a clearer separation between cancer and normal samples (Supplementary Figure S1), which partly support the idea that these 138 genes can act as a pancreatic cancer signature.

GO Functional and Pathway Enrichment Analysis
As shown in Table 1, the DEG was significantly enriched in biological processes containing cell migration, cell adhesion, cell-cell adhesion, extracellular matrix disassembly and hemidesmosome assembly. GO analysis exhibited that the DEG was obviously enriched in extracellular exosome, plasma membrane, bicellular tight junction, focal adhesion for cell component. In addition, GO analysis also displayed that the DEG was markedly enriched in laminin binding, protein homodimerization activity, protein phosphatase binding and cadherin binding involved in cell-cell adhesion for molecular function. However, there are considerable differences in the ranked GO term by p-value between DAVID and KOBAS. For example, the term of cell migration is ranked first in the results of DAVID but it ranked 60 in the KOBAS analysis (Supplementary Table S2).

PPI Network Construction and Hub Genes Identification
The PPI relationship was displayed in Figure 3. It was apparent from the figure that very few down-regulated genes are there. These genes were thrombospondin 1 (THBS1), coagulation factor VIII (F8) and suppressor of cytokine signaling 3 (SOCS3). In order to identify the key genes in the PPI relationship, 12 topological algorithms were carried out. As shown in Table 2, the top 30 genes of Degree topological algorithm included DNA topoisomerase II alpha (TOP2A), THBS1, and so on. Closer inspection of the table showed TOP2A and proto-oncogene receptor tyrosine kinase Met (MET) are top-ranked in the most topological algorithms. Maternal embryonic leucine zipper kinase (MELK), MET, THBS1, TOP2A and syndecan 1 (SDC1) were considered as common hub genes of 12 topological algorithms analysis in the further statistical tests.

PPI Network Construction and Hub Genes Identification
The PPI relationship was displayed in Figure 3. It was apparent from the figure that very few down-regulated genes are there. These genes were thrombospondin 1 (THBS1), coagulation factor VIII (F8) and suppressor of cytokine signaling 3 (SOCS3). In order to identify the key genes in the PPI relationship, 12 topological algorithms were carried out. As shown in Table 2

Survival Analysis
To validate the reliability of the identified hub genes from the four datasets, UALCAN was used to analyze the hub genes transcript expression and survival in the 182 samples which is derived from the TCGA project. The statistical samples included four normal and 178 pancreatic adenocarcinoma samples. As shown in Figure 4, there was a clear trend of increasing gene expression levels of MET, MELK, SDC1 and TOP2A in primary tumor compared to normal samples. On the contrary, THBS1 was under-expressed in primary tumor. These findings suggested the results of the identified candidate hub genes are reliable. The survival analysis results showed that MELK and TOP2A were linked with poor survival in pancreatic adenocarcinoma (p < 0.01). MET was also related with poor survival in the cancer (p = 0.013). However, expression levels of SDC1 and THBS1 were not significantly associated with survival probability in the samples, respectively (p = 0.12, Figure 5).

Survival Analysis
To validate the reliability of the identified hub genes from the four datasets, UALCAN was used to analyze the hub genes transcript expression and survival in the 182 samples which is derived from the TCGA project. The statistical samples included four normal and 178 pancreatic adenocarcinoma samples. As shown in Figure 4, there was a clear trend of increasing gene expression levels of MET, MELK, SDC1 and TOP2A in primary tumor compared to normal samples. On the contrary, THBS1 was under-expressed in primary tumor. These findings suggested the results of the identified candidate hub genes are reliable. The survival analysis results showed that MELK and TOP2A were linked with poor survival in pancreatic adenocarcinoma (p < 0.01). MET was also related with poor survival in the cancer (p = 0.013). However, expression levels of SDC1 and THBS1 were not significantly associated with survival probability in the samples, respectively (p = 0.12, Figure 5).

Discussion
During the past decades, many studies have been performed to disclose the causes and underlying mechanisms of pancreatic adenocarcinoma formation and progression. However, the 5year survival rate for sufferers has only seen slight improvement. Additionally, trustworthy molecular marker with high prognostic value has not yet been determined in pancreatic adenocarcinoma treatment. Due to this disappointing outcome, development of a specific biomarker is urgently needed to detect early pancreatic adenocarcinoma which has a central role in affording patients the best possible outcome. Of equal importance, a new molecular target of drug needs to be identified and validated in order to develop underlying drugs that may be successful in pancreatic adenocarcinoma treatment. Many studies concentrate on an independent genetic event, or the result is generated from independent studies which are inconsistent with each other by microarray analysis [18][19][20]. In this study, the number of up-regulated genes was significantly more than the downregulated genes (93 vs. 45) for DEG. Previous research reported that the over-expressed genes were markedly more than the under-expressed genes in DEG [21]. Some GEO samples (GSM) of pancreatic adenocarcinoma were not grouped as the cancer group in the cluster analysis. This result may be

Discussion
During the past decades, many studies have been performed to disclose the causes and underlying mechanisms of pancreatic adenocarcinoma formation and progression. However, the 5-year survival rate for sufferers has only seen slight improvement. Additionally, trustworthy molecular marker with high prognostic value has not yet been determined in pancreatic adenocarcinoma treatment. Due to this disappointing outcome, development of a specific biomarker is urgently needed to detect early pancreatic adenocarcinoma which has a central role in affording patients the best possible outcome. Of equal importance, a new molecular target of drug needs to be identified and validated in order to develop underlying drugs that may be successful in pancreatic adenocarcinoma treatment. Many studies concentrate on an independent genetic event, or the result is generated from independent studies which are inconsistent with each other by microarray analysis [18][19][20]. In this study, the number of up-regulated genes was significantly more than the down-regulated genes (93 vs. 45) for DEG. Previous research reported that the over-expressed genes were markedly more than the under-expressed genes in DEG [21]. Some GEO samples (GSM) of pancreatic adenocarcinoma were not grouped as the cancer group in the cluster analysis. This result may be explained by the fact that these patients were diagnosed as pancreatic cancer patients, and there must be a reason for the lack of clustering such as types of pancreatic adenocarcinoma, different disease subtypes and disease activity or disease stages [22]. Another possible explanation for this is that pancreatic adenocarcinoma pathogenesis in different patients may depend on common changes of the expression of particular critical genes, and rather on personal particular changes of different genes.
There is a growing interest in finding for gene network replace alone genes, contributing to the etiology of complicated diseases, because changes in biological characteristics need interaction in expression of gene sets. The enrichment analysis tool is a beneficial step in this direction for estimating overrepresentation of specific gene category or pathway in a gene list [23]. The results of our analysis showed that DEG was significantly enriched in biological processes including cell migration, cell adhesion, cell-cell adhesion, and extracellular matrix disassembly. One study on gene ontology analysis of the 98 DEG showed that cell adhesion was the main enriched process by genome-scale analysis in patients with pancreatic adenocarcinoma [24]. Moreover, the enriched KEGG pathways of DEG included extracellular matrix receptor interaction and focal adhesion. In human pancreatic adenocarcinoma, both stromal and cancer cells present to be the source of extracellular matrix-degrading metalloproteinase and tissue inhibitor of metalloproteinase [25]. According to KEGG pathway analysis, extracellular matrix (ECM)-receptor interaction was the most enriched pathway in this study. The result was in line with those of a previous study [24].
Hub genes, namely MELK, MET, THBS1, TOP2A and SDC1, were selected with the common parts of 12 topological algorithms in analyzing the DEG PPI network. Notably, only three down-expressed genes, namely SOCS3, F8 and THBS1 were displayed in the PPI network. THBS1 acts as an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interaction. In the invasion of human pancreatic adenocarcinomas, THBS1 was implicated in regulating matrix remodeling and played pivotal role in cancer cell growth and metastasis [26]. Hence, stromal THBS1 immunoreactivity and expression was considered as a prognostic marker and a new indicator of invasiveness in patients with pancreatic adenocarcinomas [27]. Simultaneously, one observation suggested that metronomic ceramide analogs (C2 and AL6) inhibited angiogenesis in pancreatic cancer through up-regulation of THBS1 [28]. THBS1 was significantly decreased in pancreatic cancer patients compared with healthy controls, and low levels of THBS1 were markedly correlated with poorer survival, preclinically and at clinical diagnosis [29]. Surprisingly, while the expression level of THBS1 was lower in pancreatic cancer samples than normal, no significant difference was found in the results of this analysis. Survival analysis on individual hub genes disclosed that the survival probability was not obvious between the high expression of THBS1 and the medium/low expression group. It seems possible that these results are due to multifaceted and sometimes opposing effects of THBS1 on tumor progression depending on the molecular and cellular composition of the microenvironment [30]. Therapeutic strategy targeting THBS1 has been widely explored and plentiful peptides and modified structural agents derived from THBS1 have been developed. For example, compounds of ABT898, and CVX-045 were severally conducted in clinical trials but they were no longer in clinical development due to the adverse events of low objective response rate and slow clearance [31,32]. Trabectedin is a marine natural product. It has been approved for the treatment of advanced or metastatic soft tissue sarcoma and relapsed ovarian cancer. One report indicated that trabectedin displayed anti-angiogenic activity related to the up-regulation of THBS1 [33]. Trabectedin is currently undergoing phase II clinical trials for several other tumors.
SDC1 functions as a transmembrane receptor and engages in cellular proliferation, cell transplantation and cell-matrix interaction. It has previously been observed that SDC1 expression may play an important role in the pathobiology of pancreatic cancer cell, which is different from that in other gastrointestinal cancers [34]. Notably, while the expression level of SDC1 was higher in pancreatic cancer samples than normal, survival analysis on separate hub genes revealed that the survival probability was not obvious between the high expression of SDC1 and the medium/low expression group. These results were likely to be related to the cellular localization of SDC1 as cell membrane anchored and/or shed, soluble SDC1 with stromal or nuclear accumulation in individual tumor types [35]. In a human melanoma and ovarian cancer experimental model, the human antibody OC-46F2, specific for the extracellular domain of SDC1, blocked vessel maturation and tumor development [36]. The unusual tumorigenic phenotypes resulting from varied SDC1 expression make it appealing for therapeutic targets. For example, indatuximab ravtansine is a monoclonal antibody-related drug that particularly aims SDC1-expressing cells, pre-clinical research corroborated the activity of indatuximab ravtansine in combination with lenalidamide and examethasone for plasma cell myeloma, and a clinical research is continuous [37]. This achievement seems to promote the hopeful results from pre-clinical and clinical researches that studied the chance of therapeutically targeting SDC1.
TOP2A is one of DNA topoisomerases. Accumulating evidence indicated that it can lead to cancer progression in diverse cancer types, and it has been the certified therapeutic target of anti-cancer and anti-bacterial pharmaceuticals. Recent advances in the field have indicated the feasibility of devising specific-isoform human topoisomerase II poisons, which may be grew as safer anti-cancer pharmaceuticals [38]. In research of forecasting gemcitabine sensitivity with pancreatic adenocarcinomas patient objects, a different expression of TOP2A was discovered in gemcitabine sensitive tumors which authenticated as one of potential genes linked with resistance to drug [39]. Survival analysis results of this study showed that TOP2A was associated with poor survival in pancreatic adenocarcinoma and the survival probability was obvious between the high expression of TOP2A and the medium/low expression group. These results were consistent with another study that discovered that the up-regulation of TOP2A was markedly linked with cancer metastasis and smaller survival in adenocarcinomas patient objects [40]. In pancreatic neuroendocrine tumors, TOP2A was also identified as one of hub genes by gene microarray analysis [41]. Thereby, TOP2A remains as the vital therapeutic target of anti-cancer drugs.
MELK was revealed to be commonly up-regulated in differing types of solid tumor, with crucial roles in formation and maintenance of tumor stem cells. In small cell lung cancer and hepatocellular carcinoma, MELK expression is consistently elevated in cancer relative to normal tissues [42,43]. This study analysis also exhibited similar results, and MELK was linked with poor survival in pancreatic adenocarcinoma. Interesting, survival probability was obviously different between the high expression of MELK and the medium/low expression group. Previous results implicated MELK can control normal and transformed pancreatic duct cell migration [44]. Although the increase of MELK expression has been elucidated in many tumors, no oncogenic variation in the MELK gene has been picked out to date. Thus, a small molecule inhibitor of MELK that particularly suppresses MELK activity may suffer an undesired off-target effect in both normal and tumor cells [45]. Orally administrative MELK-targeting compound OTSSP167 inhibited the growth of different types of human cancer including breast, lung, prostate, and pancreas cancer [46]. Hence, inactivation of MELK may be therapeutically beneficial.
MET protein is a receptor tyrosine kinase and encoded by MET proto-oncogene, also called tyrosine-protein kinase Met (c-Met). The c-Met kinase has appeared as an appealing target for developing anti-cancer drugs because of its close connection with the generation of diversified human tumors, dismal clinical results and even drug resistance. Active human cancer-linked pancreatic stellate cells caused proliferation and microtube formation of microvascular endothelial cells by c-MET signal pathway, which exert a primary effect in human pancreatic adenocarcinoma progression [47]. In stage I-II pancreatic cancer, high MET expression was correlated with dismal prognosis and assisted in identifying patients with a high-risk of cancer recurrence and depressing survival prognoses [48]. One report provided evidence that targeting MET in combination with gemcitabine may be effective in human pancreatic adenocarcinoma and ensured further clinical evaluation [49]. Blocking the activity of c-Met in tumor cells, in combination with other ways for diminishing desmoplasia in the cancer microenvironment, might notably elevate the success of chemotherapy [50]. The mounting evidence demonstrated that MET is regarded as a novel therapeutic approach in pancreatic cancer and as a target for personalized therapy [51]. Crizotinib is a tyrosine kinase inhibitor that it can block peritoneal diffusion in pancreatic adenocarcinoma through inhibiting cancer cell proliferation and invasion, at least in part by the suppression of MET signal [52]. Hence, it can be speculated that MET is a candidate therapeutic target in pancreatic adenocarcinoma and highlighted a collaborative combination of drugs warranting clinical evaluation for pancreatic adenocarcinoma treatment.

Conclusions
In conclusion, our work has identified 138 DEG in the four profile datasets. DEG significant enriched in biological processes including cell migration, cell adhesion, cell-cell adhesion, extracellular matrix disassembly and several pathways, mainly associated with ECM-receptor interaction, proteoglycans in cancer and focal adhesion pathway in pancreatic cancer. These findings could significantly improve our understanding of the cause and underlying molecular events in pancreatic cancer, these promising molecular markers identified that gene expression profiling studies including MET, MELK, SDC1, THBS1 and TOP2A and pathways could be new effective therapeutic targets for pancreatic cancer.

Conflicts of Interest:
The authors declare no conflict of interest.