Tissue-Specific Down-Regulation of the Long Non-Coding RNAs PCAT18 and LINC01133 in Gastric Cancer Development

Gastric cancer (GC) is the fifth most common cancer and the third most frequent cause of cancer deaths worldwide. The high death rate associated with GC, and lack of appropriate biomarkers for diagnosis, prognosis, and treatment emphasize the need for identification of novel molecules. Given the emerging roles for long non-coding RNAs (lncRNAs) in cancer development, we studied novel lncRNA candidates involved in gastric carcinogenesis. LncRNA candidate discovery was performed using analyses of available datasets and literature. Validation was done using an internal sample set of GC/normal tissues, and external independent datasets. Network analysis and functional annotation of co-expressed protein coding genes were performed using the weighted gene correlation network analysis (WGCNA) and ingenuity pathway analysis. Two novel lncRNAs, PCAT18 and LINC01133, associated with GC development were identified by analysis of the discovery Gene Expression Omnibus (GEO) datasets. The down-regulation of these genes in GC tissues was successfully validated internally and externally. The results showed a tissue-specific down-regulation of PCAT18 and LINC01133 in gastrointestinal tissues. WGCNA and ingenuity pathway analyses revealed that the genes co-expressed with the two lncRNAs were mostly involved in metabolic pathways and networks of gastrointestinal disease and function. Our findings of a tissue-specific down-regulation of PCAT18 and LINC01133 in gastric and other gastrointestinal cancers imply that these lncRNAs may have a tumor suppressive function in the development of these tumor entities. The two lncRNA biomarkers may contribute to a better understanding of the complex mechanisms of gastric carcinogenesis.


Introduction
Gastric cancer (GC), or stomach cancer, is the fifth most common cancer in the world, with almost one million new cases reported in 2012 [1]. More than 70% of GCs occur in developing countries, and half of the world's total occurs in Eastern Asia. GC is the third leading cause of cancer-related death, with the highest estimated mortality rate observed in Eastern Asia. Despite advances in diagnosis, [...]
[Figure caption, beginning missing: [...] showing the number of common and unique lncRNA genes across the GEO datasets. The blue circle represents the lncRNAs found in GSE79973, the pink circle those found in GSE19826, the green circle those found in GSE54129, and the yellow circle the 13 lncRNAs common to the four-GEO set (four Agilent microarray expression datasets of gastric cancer (GC)).]
Next, we analyzed the expression of the five lncRNA candidates in the 25 GC/normal tissue sample set using real-time quantitative PCR (qPCR). Expression data were validated for PCAT18, DANCR, and LINC01133 (validation 2). All three lncRNAs were down-regulated in GC tissues (PCAT18: p ≤ 0.0001; DANCR: p ≤ 0.001; and LINC01133: p ≤ 0.001) (Figure 3A). No dysregulated expression was observed for EWSAT1 and GAS6-AS1 (p = 0.1034 and p = 0.3049, respectively), and these two lncRNAs were therefore excluded from further analyses.
Table 2 shows the distribution of the 25 GCs according to selected clinical, histopathological, and epidemiological parameters and their associations with PCAT18, DANCR, and LINC01133 expression levels. With the exception of gender and smoking status, none of the parameters was associated with expression levels. DANCR expression was associated with gender (p = 0.006) and smoking status (p = 0.005), whereas LINC01133 expression was associated with gender (p = 0.002). Both LINC01133 and DANCR expression levels were higher in females than in males. DANCR expression was higher in non-smokers than in smokers. PCAT18 expression was not associated with any of the investigated parameters.
For further validation, TCGA RNA-seq expression data for PCAT18, DANCR, and LINC01133 in stomach cancer and normal tissues were analyzed. As shown in Figure 3B, the expression of PCAT18 and LINC01133 was down-regulated in stomach cancer tissues (p = 6.35 × 10⁻¹⁹ and p = 3.40 × 10⁻⁷, respectively; validation 3). No deregulated expression was observed for DANCR.
The relationship between the expression of PCAT18 and LINC01133 and overall survival and tumor grade/stage was investigated using The Cancer Genome Atlas-Stomach Adenocarcinoma (TCGA-STAD) dataset. No statistically significant associations were identified.
Further analysis of RNA-seq expression data of these genes in six other cancer entities, including four gastrointestinal cancers (esophageal, colon, rectum, and liver), prostate, and breast cancers, showed an up-regulation of PCAT18 in prostate and breast cancers (p = 2.96 × 10⁻¹¹ and p = 6.33 × 10⁻⁸, respectively), and a down-regulation in all gastrointestinal cancers (Figure 4). LINC01133 was also down-regulated in esophageal, colon, and rectal cancers as well as in prostate cancer. In contrast, DANCR was up-regulated in breast, prostate, liver, and colon cancers.

Weighted Gene Correlation Network Analysis and Functional Annotation of Co-Expressed Genes
WGCNA identified 17 modules of co-expressed genes in the TCGA-STAD RNA-seq dataset. The two down-regulated lncRNAs, PCAT18 and LINC01133, clustered in the red module (Figure 5A). This module contained 218 other genes, which are listed in Table S3. The relation between the co-expressed genes of the red module and the lncRNAs is shown in Figure 5B.
To assess how lncRNA genes may contribute to GC development, functional enrichment analysis of the 218 co-expressed genes was performed using Ingenuity Pathway Analysis (IPA). The top ten canonical pathways and the top ten diseases and functions related to the co-expressed genes are shown in Figure 5C,D. The top canonical pathways included different metabolic pathways which play a role in degradation of glycolysis side-product, amino acid biosynthesis, and retinoate biosynthesis. The biological functions of these genes were associated with molecular transport, organismal development, digestive system development and function, organ morphology, and tissue morphology.

Discussion
Despite advances in diagnosis, prognosis, and treatment, GC remains a worldwide public health concern. Our contribution to this area of investigation was to identify novel lncRNAs involved in gastric carcinogenesis by analyzing gene expression datasets obtained from GC and normal tissues from the GEO and TCGA databases in addition to another sample set of 25 GC and paired normal tissues. While some lncRNAs dysregulated in GC and their clinical value as potential biomarkers for diagnosis and prognosis have been previously reported [22], this study provides additional data on two novel lncRNAs contributing to GC and their potential functions.
LncRNAs are non-coding transcripts longer than 200 nucleotides that do not overlap with annotated coding genes. These transcripts are involved in chromatin remodeling and genome architecture, RNA stabilization, and transcriptional regulation [23]. In the present study, two lncRNAs down-regulated in GC compared to normal tissues were identified: PCAT18 and LINC01133. The expression data of these lncRNAs are robust, as they were validated in three independent sample sets. These data imply that the two lncRNAs may act as tumor suppressors during the development of GC.
The prostate cancer associated transcript 18 (PCAT18) gene, located at 18q11.2, is highly expressed in prostate cancer. Its silencing in prostate cancer cells leads to the inhibition of cell proliferation, migration, and invasion [24]. In the present study we observed a down-regulation of PCAT18 in GC/stomach cancer tissues, pointing to a role of this lncRNA in GC development. A down-regulation was also observed in other gastrointestinal cancers, including cancers of the esophagus, colon, rectum, and liver, while an up-regulation was observed in breast and prostate cancers. These findings imply that PCAT18 down-regulation is specific for gastrointestinal tumors. Furthermore, based on normal tissue RNA-seq expression data from the Genotype-Tissue Expression database (GTEx), PCAT18 is highly expressed in normal stomach tissue [25]. This high expression suggests a potential regulatory function in this tissue, and its down-regulation may be a cause or a consequence of stomach cancer development.
The long intergenic non-coding RNA 1133 (LINC01133) gene on chromosome 1q23.2 is down-regulated in colorectal cancer [26], while an up-regulation was reported in different types of lung cancer [27,28]. In non-small cell lung cancer (NSCLC), its expression correlated inversely with the expression of KLF, P21, and E-cadherin, suggesting an oncogenic function in NSCLC. Furthermore, LINC01133 was shown to sponge miR-422a and thereby aggravate the tumorigenesis of human osteosarcoma [29]. Our data on a role for LINC01133 in GC are in line with those from a recent study that showed an association of reduced LINC01133 expression with aggressive tumor phenotypes [30]. The authors also showed that this lncRNA inhibits GC progression and metastasis, implying its potential use as an anti-metastatic therapeutic target for this disease. Similar to PCAT18, LINC01133 is highly expressed in normal stomach tissues based on GTEx data, implying that its deregulation in normal stomach tissue may play a role in the fate of cells and cancer progression.
Differentiation antagonizing non-protein coding RNA (DANCR) located at 4q12 is up-regulated in various cancers including hepatocellular carcinoma [31,32], colorectal cancer [33], prostate cancer [34], osteosarcoma [35], and stomach cancer [36]. In contrast to the previous stomach cancer study, a down-regulation of DANCR in GC was found in the present study using the GEO and qPCR data, which however was not validated in the TCGA dataset. Thus, further studies on the expression of this lncRNA in GC and its function are warranted.
Gene co-expression network analysis using WGCNA was performed to identify modules containing PCAT18, LINC01133, and their co-expressed genes. Seventeen modules were identified, one of which contained both lncRNAs and 218 co-expressed genes. Pathway analysis revealed that the top canonical pathways were mostly related to various cell metabolic pathways. Given that tumor cells often have an altered metabolism to cope with the demand of cell-mass increase during growth [37], these lncRNAs may be involved in the control of some metabolic pathways in GC cells. Among the networks, the top ones were linked to molecular transport, organismal development, and gastrointestinal disease. The functional annotations of these top networks were associated with various biofunctions and diseases of the stomach. Merging of the top four networks identified the extracellular-signal-regulated kinase/mitogen-activated protein kinase (ERK/MAPK) pathway as a hub with more connections to other co-expressed genes. Abnormal activation and mutations of genes involved in the ERK/MAPK pathway have been reported to occur in more than 50% of human cancer types [38]. Recently, various studies have shown that the ERK/MAPK pathway is involved in regulating cellular mobility in GC cell lines, suggesting that this pathway influences GC cell migration and invasion [39]. In addition, among the top upstream regulators, the homeobox genes CDX1 and CDX2 have been reported to be crucial players in stomach carcinogenesis [40], while XBP1 was shown to control the maturation of gastric zymogenic cells [41].
To elucidate how these lncRNAs exert their functions, lncRNA-lncRNA interactions were predicted in silico using the LncRNA2Target database. An interaction between PCAT18 and BANCR was predicted: PCAT18 expression was increased following knock-down of BANCR [42]. Interestingly, a recent study on GC reported that BANCR was significantly up-regulated in GC tissues and cell lines, and that its down-regulation led to the inhibition of GC cell proliferation [43]. Accordingly, the down-regulation of PCAT18 along with the up-regulation of BANCR in GC tissues suggests a possible regulatory interaction between these two lncRNAs. Further exploration using AnnoLnc [44], a web server that provides systematic annotation of newly identified human lncRNAs, predicted an interaction of PCAT18 with C15orf57 (now called CCDC32) and the GABRR3 protein.
Our study has some limitations, which relate to the small size of the internal validation set and to the retrospective study cohorts. Another limitation is the lack of functional analyses, which should be performed to yield detailed insight into the mechanism of down-regulation of PCAT18 and LINC01133 in gastric carcinogenesis.
Altogether, we showed a decreased tissue-specific expression of the lncRNAs PCAT18 and LINC01133 in GC and other gastrointestinal tumor tissues, suggesting a role of these lncRNAs in the development of gastrointestinal tumors. The reduced lncRNA expression levels may disturb the balance of gene regulation in normal gastric cells and predispose them to GC development and progression, possibly via a gene regulation process leading to metabolic adaptation in tumor cells. The two lncRNA biomarkers may contribute to a better understanding of the complex mechanisms of gastric carcinogenesis. The reported data should guide future studies on the associations of PCAT18 and LINC01133 with GC and on their functions.

Data Extraction from the GEO Database and Literature Review
Ten GC datasets were retrieved from GEO (http://www.ncbi.nlm.nih.gov/geo/) using the keywords "lncRNA stomach cancer" (study keyword), "homo sapiens" (organism), "expression profiling by array" (study type), and "tissue" (attribute name). Seven datasets fulfilling the following criteria were selected for expression analyses: (1) availability of data on GC and adjacent normal tissues; (2) inclusion of expression data of lncRNA genes; and (3) availability of minimum information about the microarray experiment. Four datasets obtained using the Agilent platform, GSE70880 (20 tumor and 20 adjacent normal tissues), GSE51308 (5 tumor and 5 adjacent normal tissues), GSE84787 (10 tumor and 10 adjacent normal tissues), and GSE50710 (10 tumor and 10 adjacent normal tissues), were selected for the discovery of lncRNA candidates. These sets contained data from 45 GCs and their paired normal tissues. In addition, three datasets obtained using the Affymetrix platform, GSE79973 (10 tumor and 10 adjacent normal tissues), GSE19826 (12 tumor and 12 adjacent normal tissues plus 3 normal gastric tissues), and GSE54129 (111 tumor and 21 noncancerous gastric tissues), were used for data validation. These sets contained data from 133 GCs and 46 noncancerous normal tissues.
The comparison between tumor and adjacent normal tissues allowed the identification of differentially expressed genes in the GEO datasets. p values were adjusted (p adj) using the Benjamini and Hochberg method. A p adj < 0.05 and a |logFC| ≥ 1 were set as cut-off criteria [45–47]. Among the top candidates, those already known to be associated with GC were excluded.
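The selection rule described above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names `benjamini_hochberg` and `select_candidates`, the gene labels, and the toy numbers are hypothetical, and the only elements taken from the text are the Benjamini and Hochberg adjustment and the two cut-offs (adjusted p < 0.05 and |logFC| ≥ 1).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, idx in enumerate(order):
        rank = m - offset  # 1-based rank of this p value in ascending order
        # Scale by m / rank and enforce monotonicity with a running minimum.
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_candidates(genes, p_values, log_fcs, alpha=0.05, min_abs_logfc=1.0):
    """Keep genes with adjusted p < alpha and |logFC| >= min_abs_logfc."""
    p_adj = benjamini_hochberg(p_values)
    return [
        gene
        for gene, q, lfc in zip(genes, p_adj, log_fcs)
        if q < alpha and abs(lfc) >= min_abs_logfc
    ]


# Hypothetical toy input, not data from the study.
genes = ["geneA", "geneB", "geneC", "geneD"]
p_values = [0.001, 0.04, 0.03, 0.20]
log_fcs = [-2.1, -1.5, 0.4, 3.0]

p_adj = benjamini_hochberg(p_values)
selected = select_candidates(genes, p_values, log_fcs)
print(selected)
```

In this toy input, geneB has a raw p value below 0.05 and a large fold change, but it no longer passes after adjustment, which is the effect the multiple-testing correction is meant to have.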
Moreover, to identify novel lncRNAs associated with other cancers but not yet reported in GC, a comprehensive PubMed literature search was performed. The following keywords, selected from the medical subject headings (MeSH) database, were used: ("Neoplasms") AND ("RNA, Long Noncoding").

Patient Samples
Fifty tissues comprising 25 GC and paired normal tissues (25 GC/normal tissue sample set) were obtained from the Iran National Tumor Bank (INTB, Tehran, Iran). All tissues were collected during surgical resection of patients diagnosed with primary GC at the Imam Khomeini Hospital, Tehran, Iran from 02/2009 to 11/2014. Adjacent normal tissues were obtained from areas at least 6 cm away from the tumor site. None of the patients received radiation and/or chemotherapy treatment before surgery. Tissues were stored in liquid nitrogen until nucleic acid extraction.
The study was approved by the Ethical Committee of the Shahroud University of Medical Sciences (9559, 08/10/2016). All study participants provided written informed consent.

RNA Extraction and cDNA Synthesis
Tissues were ground in liquid nitrogen using a mortar and pestle, immediately transferred into the lysis buffer, and homogenized using a needle and syringe. Total RNA was extracted using the AllPrep DNA/RNA Mini kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions.
The quantity and quality of the isolated RNA samples were determined using a Picodrop microliter spectrophotometer (OEM, Hinxton, UK) and electrophoresis on a 0.8% agarose gel. Afterwards, 1 µg of total RNA was converted into cDNA using the PrimeScript™ RT reagent kit (TaKaRa Bio, Shiga, Japan) according to the manufacturer's instructions.
[...] Table S4. Conditions for amplification were 95 °C for 15 min, followed by 40 cycles of 95 °C for 5 s and 60 °C for 30 s. Melting curves were obtained by slow heating (0.5 °C/s) over the range of 65 to 95 °C. All samples were run in duplicate.

Data Extraction from TCGA and Data Analyses
For further data validation, RNA-sequencing (RNA-seq) data of stomach (gastric) adenocarcinoma (STAD) from TCGA were analyzed and used for comparison. Moreover, the expression levels of the lncRNA candidates were analyzed in other cancer entities, including cancers of the colon, esophagus, rectum, liver (hepatocellular carcinoma), prostate, and breast, and in the corresponding adjacent normal tissues. The RNA-seq raw data files were downloaded from the TCGA GDC data portal, normalized, and filtered using the R/Bioconductor software package TCGAbiolinks (R version 3.4.2, http://www.r-project.org/) [48]. Differential expression analysis of lncRNA genes in tumor and normal tissues was performed using the edgeR package [49]. This package implements the trimmed mean of M-values (TMM) method to obtain normalized read counts.
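To make the idea behind TMM normalization concrete, the following Python sketch computes a scaling factor for one sample relative to a reference sample. It is a simplified illustration, not edgeR's implementation: the function name `tmm_factor` and the toy counts are hypothetical, the trimming fractions (30% of M-values and 5% of A-values on each side) are assumed defaults, and the choice of reference sample and the rescaling of factors across samples are omitted.

```python
import math


def tmm_factor(sample, reference, trim_m=0.30, trim_a=0.05):
    """Simplified TMM scaling factor of `sample` relative to `reference`."""
    n_s, n_r = sum(sample), sum(reference)  # library sizes
    stats = []  # (M value, A value, precision weight) for each usable gene
    for y_s, y_r in zip(sample, reference):
        if y_s == 0 or y_r == 0:
            continue  # log ratios are undefined for zero counts
        p_s, p_r = y_s / n_s, y_r / n_r
        m_value = math.log2(p_s / p_r)        # log fold change
        a_value = 0.5 * math.log2(p_s * p_r)  # average log abundance
        variance = (n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r)
        stats.append((m_value, a_value, 1.0 / variance))

    n = len(stats)
    by_m = sorted(range(n), key=lambda i: stats[i][0])
    by_a = sorted(range(n), key=lambda i: stats[i][1])
    cut_m, cut_a = int(n * trim_m), int(n * trim_a)
    # Keep genes that survive trimming on both the M and the A scale.
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])

    total_weight = sum(stats[i][2] for i in keep)
    mean_m = sum(stats[i][0] * stats[i][2] for i in keep) / total_weight
    return 2.0 ** mean_m


# Hypothetical toy counts for ten genes, not data from the study.
reference = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
deeper = [2 * y for y in reference]          # same composition, twice the depth
skewed = reference[:-1] + [1000]             # one gene strongly over-expressed

factor_deeper = tmm_factor(deeper, reference)
factor_skewed = tmm_factor(skewed, reference)
print(factor_deeper, factor_skewed)
```

A sample that differs from the reference only in sequencing depth gets a factor of 1, whereas a sample dominated by one highly expressed gene gets a factor below 1, because the outlier gene is trimmed and the remaining genes all appear under-represented relative to the inflated library size.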

Weighted Gene Correlation Network Analysis
Given that the functions of most lncRNAs are unknown, prediction of their functions mostly relies on the analysis of their co-expressed genes. Network analysis was performed using the WGCNA package in R as described previously [50,51]. To identify modules of highly correlated genes, WGCNA was performed on RNA-seq data of the TCGA-STAD dataset obtained from 407 GC and normal tissues. To identify modules with different expression patterns, a soft threshold power was assigned to create co-expression networks. The networks were built by merging genes with highly similar co-expression patterns into modules and the eigengenes of these modules were determined. Finally, the module with the key lncRNAs and their co-expressed genes was obtained. The reconstructed co-expression network was visualized using the Cytoscape software (version 3.5.1 (http://www.cytoscape.org)) [52].
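The role of the soft threshold power can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the WGCNA package itself: an unsigned network is assumed, in which the adjacency between two genes is the absolute Pearson correlation raised to the power β; the value β = 6, the function names, and the toy expression values are hypothetical, since the power used in the study is not given in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expression, beta):
    """Unsigned adjacency |cor(i, j)| ** beta for every ordered gene pair."""
    genes = list(expression)
    return {
        (g, h): abs(pearson(expression[g], expression[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Hypothetical expression values across four samples, not data from the study.
expression = {
    "lncA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}

adjacency = soft_threshold_adjacency(expression, beta=6)
print(adjacency[("lncA", "geneB")], adjacency[("lncA", "geneC")])
```

Raising the correlation to a power keeps strong correlations close to 1 while pushing moderate ones toward 0 (here a correlation of 0.8 becomes about 0.26), so that modules are driven by tightly co-expressed genes without imposing a hard cut-off.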

Functional Annotation of the Co-Expressed Genes in the Module
To investigate the potential functions of PCAT18 and LINC01133 and their associated biological pathways, a functional enrichment analysis of their co-expressed genes was performed using Ingenuity Pathway Analysis (IPA; Ingenuity Systems, Mountain View, CA, USA) software. IPA provides a graphical representation of the molecular relationships between genes in networks. Functional analysis identified statistically significant (Fisher's exact test p value < 0.05) over-represented Canonical Pathways, Molecular and Cellular Functions, Physiological System Development and Function, upstream regulators, and Diseases and Bio Functions in the imported data sets.
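The over-representation test mentioned above can be illustrated as follows. The sketch assumes a right-tailed Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution; IPA's internal reference set and implementation are not described in the text, so the function name `enrichment_p_value` and all counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_selected, n_overlap):
    """Right-tailed Fisher's exact test p value for over-representation.

    n_universe: number of genes in the reference set
    n_pathway:  number of reference genes annotated to the pathway
    n_selected: number of genes in the module being tested
    n_overlap:  number of module genes annotated to the pathway
    """
    total = comb(n_universe, n_selected)
    upper = min(n_pathway, n_selected)
    # Probability of seeing n_overlap or more pathway genes by chance.
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy counts, not data from the study.
p_value = enrichment_p_value(n_universe=20, n_pathway=5, n_selected=5, n_overlap=3)
p_trivial = enrichment_p_value(n_universe=20, n_pathway=5, n_selected=5, n_overlap=0)
print(p_value, p_trivial)
```

With these toy counts the p value is about 0.07, so three pathway genes among five selected would not pass the 0.05 threshold; an overlap of zero or more is certain, which gives a p value of 1.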

Statistical Analyses
Statistical analyses were performed using GraphPad Prism 6 software (GraphPad Software Inc., San Diego, CA, USA) and SPSS 24.0 (SPSS Inc., Chicago, IL, USA). Differences between the means of two groups were determined using Student's t-test. All p values were two-sided, and a p value of less than 0.05 was considered statistically significant. All results are presented as the mean ± standard deviation (SD) of the experiments.
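For readers who want to see the arithmetic behind the two-group comparison, the sketch below computes the mean ± SD of each group and the Student's t statistic with pooled variance. It assumes an unpaired test with equal variances, which the text does not specify; the function name `student_t` and the toy values are hypothetical. Converting the statistic into a two-sided p value requires the t distribution, which is not part of the Python standard library and is therefore left out.

```python
import math
from statistics import mean, stdev, variance


def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for an unpaired Student's t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled sample variance under the equal-variance assumption.
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical toy expression values, not data from the study.
tumor = [1.0, 2.0, 3.0, 4.0, 5.0]
normal = [3.0, 4.0, 5.0, 6.0, 7.0]

t_stat, df = student_t(tumor, normal)
print(f"tumor: {mean(tumor):.2f} ± {stdev(tumor):.2f}")
print(f"normal: {mean(normal):.2f} ± {stdev(normal):.2f}")
print(t_stat, df)
```

A negative t statistic here indicates a lower mean in the first group, which is the direction expected for a down-regulated transcript when tumor values are passed first.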

Conclusions
The down-regulation of PCAT18 and LINC01133 in GC implies that these lncRNAs may have a tumor suppressive function in the development of gastric tumors. The two lncRNA biomarkers may contribute to a better understanding of the complex mechanisms of gastric carcinogenesis.