Large-Scale Transcriptomics-Driven Approach Revealed Overexpression of CRNDE as a Poor Survival Prognosis Biomarker in Glioblastoma

Simple Summary Most glioblastoma patients succumb to the disease within 12 to 18 months, and only 9% are alive 2 years after diagnosis. Even with extensive research, the life expectancy of glioblastoma patients has not changed in decades. We aimed to identify differences in the transcriptomic profiles of glioblastoma patients with long and short survival. With large-scale transcriptomic analysis, we examined information from publicly available datasets (TCGA and CGGA) in combination with FFPE patient tissue samples. We identified one gene, the long noncoding RNA CRNDE, whose overexpression is directly correlated with poor patient survival. Therefore, we suggest its further confirmation as a negative prognostic glioblastoma biomarker. Glioblastoma management still lacks suitable diagnostic, predictive, and prognostic biomarkers for early disease diagnosis, and treatment follow-up. We believe our findings can serve as the basis for identification of new and potential suitable disease biomarkers by looking beyond the classical molecules (DNA, RNA, and proteins) into the noncoding genome. Abstract Glioblastoma is the most common and malignant brain malignancy worldwide, with a 10-year survival of only 0.7%. Aggressive multimodal treatment is not enough to increase life expectancy and provide good quality of life for glioblastoma patients. In addition, despite decades of research, there are no established biomarkers for early disease diagnosis and monitoring of patient response to treatment. High throughput sequencing technologies allow for the identification of unique molecules from large clinically annotated datasets. Thus, the aim of our study was to identify significant molecular changes between short- and long-term glioblastoma survivors by transcriptome RNA sequencing profiling, followed by differential pathway-activation-level analysis. We used data from the publicly available repositories The Cancer Genome Atlas (TCGA; number of annotated cases = 135) and Chinese Glioma Genome Atlas (CGGA; number of annotated cases = 218), and experimental clinically annotated glioblastoma tissue samples from the Institute of Pathology, Faculty of Medicine in Ljubljana corresponding to 2–58 months overall survival (n = 16). We found one differential gene for long noncoding RNA CRNDE whose overexpression showed correlation to poor patient OS. Moreover, we identified overlapping sets of congruently regulated differential genes involved in cell growth, division, and migration, structure and dynamics of extracellular matrix, DNA methylation, and regulation through noncoding RNAs. Gene ontology analysis can provide additional information about the function of protein- and nonprotein-coding genes of interest and the processes in which they are involved. In the future, this can shape the design of more targeted therapeutic approaches.


Introduction
Glioma incidence has been steadily rising over recent decades, and the mortality curve follows the same trend [1].World Health Organization (WHO) grade IV astrocytoma otherwise known as glioblastoma is the most common and accounts for 60-70% of all glioma cases [2].Because of unspecific symptoms such as headache, confusion, and hearing or vision problems, glioblastoma is usually diagnosed in advanced stage.
Clinical presentation is typically short and ranges from 3 to 6 months before diagnosis [3].The most widely accepted standard of care is the Stupp protocol that consists of maximal surgical resection, 6 weeks radiation and daily oral temozolomide chemotherapy followed by six subsequent cycles of adjuvant temozolomide [4].Such aggressive treatment prolongs patient survival up to 12-18 months after diagnosis [5].As low as 9% of patients are alive 2 years after diagnosis [6], while only 0.7% of all glioblastoma patients survive more than 10 years after diagnosis [7].
Glioblastomas are divided into two main groups primary or isocitrate dehydrogenase (IDH) wild-type and secondary or IDH mutant [8,9].The Cancer Genome Atlas (TCGA) project provided the first comprehensive analysis of the molecular profile of glioblastoma [10].Later, by molecular profiling, three main glioblastoma subtypes were defined [11].Moreover, glioblastoma presents with a whole spectrum of genetic and epigenetic changes as well as whole or partial chromosome gains or losses, and transcriptional interference [12].
Currently there are no available biomarkers for diagnosing glioblastoma in early disease stages or for monitoring patient performance [13].In recent years, several different molecules have been proposed as putative imaging or molecular biomarkers [14][15][16][17][18][19].So far, only two predictive biomarkers have been identified, namely O 6 -methylguanine DNA methyltransferase (MGMT) promoter methylation [20,21] and chromosome 1p/19q codeletion [22,23], that predict positive response to therapy and longer survival in elderly patients.Dismal patient prognosis means new methods need to be applied for identification of promising molecular biomarkers for diagnosis and prediction of treatment response.
In this study, we performed large-scale transcriptomic profiling to identify differences between glioblastoma patients with relatively short-and long-term overall and progressionfree survival (OS and PFS, respectively).We found the long noncoding RNA CRNDE as differentially expressed i.e., overexpressed in all investigated glioblastoma datasets.Because the overexpression of CRNDE was significantly associated with poor OS and PFS, we conclude it shows the potential to be used as a negative prognostic biomarker for glioblastoma.Still, its possible implication in clinical practice as a prognostic biomarker needs to be further experimentally elaborated.

Ethics Statement
This study was carried out in accordance with The Code of Ethics of the World Medical Association (Declaration of Helsinki) for experiments involving human samples.The manuscript is in line with the Recommendations for the Conduct, Reporting, Editing and Publication of Scholarly Work in Medical Journals and aim for the inclusion of representative human populations (sex, age and ethnicity) as per those recommendations.The use of biological tissue samples in the study was approved by the National Medical Ethics Committee of the Republic of Slovenia approval numbers 0120-196/2017/7, 0120-190/2018/4 and 0120-190/2018/11.All patients signed written informed consent.Samples used in this study are anonymous.

Biosamples
In this study, we used 16 formalin-fixed, paraffin-embedded (FFPE) primary tumor tissue samples from glioblastoma patients.FFPE samples were obtained from the Institute of Pathology, Faculty of medicine, University of Ljubljana.FFPE samples contained from 50% (1 sample), 70% (3 samples), 80% (5 samples) and 90% (7 samples) cancer cells in the specimens.Patients were treated at the Institute of Oncology, Ljubljana.Samples were collected retrospectively from patients admitted between the years 2012 and 2017.The patient's age at diagnosis ranged from 22 to 70 years old, 11 (68.75%) of patients were man, 5 (31.25%) were women.The patient's PFS defined as the time from first diagnosis to glioblastoma recurrence varied in the range of 2-37 months, OS varied between 2 and 58 months.More detailed clinicopathological information is shown in Table 1.

RNA Isolation and Sequencing
RNA libraries were generated, sequenced, and processed as previously described [24].RNA extraction was performed using RecoverAll™ Total Nucleic Acid Isolation Kit (Invitrogen, New York, NY, USA), and RNA Integrity Number (RIN) was measured with Agilent 2100 bioanalyzer (Agilent Technologies, Inc., Santa Clara, CA, USA).RNA concentrations were measured using Qubit RNA Assay Kit (Life Technologies Limited, Paisley, UK), and ribosomal RNA was depleted using RNA Hyper Kit with RiboErase (KAPA Biosystem, Roche, Cape Town, South Africa).Processed library concentrations and length distributions were measured using Qubit ds DNA HS Assay kit (Life Technologies Limited, Paisley, UK) and Agilent Tapestation (Agilent Technologies, Inc., Santa Clara, CA, USA), respectively.The samples were sequenced using Illumina HiSeq 3000 engine (Illumina Inc., Hayward, CA, USA) for single end sequencing, 50 bp read length, for approximately 30 million raw reads per sample.Data quality check was conducted using Illumina SAV.De-multiplexing was performed using Illumina Bcl2fastq2 v 2.17 software.

RNA Sequencing Data Collection and Processing
Non-experimental RNA sequencing profiles of primary glioblastoma specimens annotated with known overall survival (OS) and progression-free survival (PFS) time were collected from TCGA [25] and CGGA [26] databases.In total we took 135 samples from TCGA and 218 samples from CGGA database.Data from TCGA, CGGA, and current experimental dataset were used in raw gene counts format.Totally, expression levels were established for 36,596 annotated human genes in TCGA, 23,271 genes in CGGA and 37,002 genes in the experimental dataset.Raw gene counts were normalized using quantile normalization method [27] and log 10 -transformed for further analyses.

Quantization of Molecular Pathway Activation
Pathway activation levels (PALs) were established using Oncobox pathway analysis method [28] for 1611 molecular pathways containing 10 or more gene products extracted from the public databases [29] using the original software [29].For PAL calculations, each sample expression profile was normalized on mean geometrical levels of gene expression for all samples in the dataset under investigation.

Survival Analysis
Each gene and molecular pathway were assessed separately to interrogate their possible impact on PFS and OS.For the analysis of each database, we included patients of the following two groups based on their gene expression or PAL: (i) group with gene expression or PAL higher than 66th-percentile, (ii) group with gene expression or PAL lower than 33rd-percentile.
Cox survival analysis for OS and PFS was performed between those two groups using R packages survival [30] and survminer [31].Then log-rank p-value and hazard ratio p-value were calculated.Genes and molecular pathways with both p-values below 0.05 threshold were selected for further analyses.

Testing of Intersections Significance
To test whether a given number of common differential genes or pathways between the two of three intersecting datasets is significant, 1000 random intersections were performed according to [33].In every case, two/three random samples from two/three corresponding gene sets of the respective datasets were taken.Then these random samples were intersected for each iteration and 1000 numbers of random common genes were obtained.p-value of intersection significance was calculated as a fraction of random numbers equal or higher than the experimentally observed number of common genes.

Data and Code Availability
Sequencing data were deposited in NCBI Sequencing Read Archive (SRA) under accession ID PRJNA742887.Code for the data analysis can be found on Gitlab [35].

RNA Sequencing of Experimental Glioblastoma Samples
In this study, samples from 16 glioblastoma patients (Table 1) were enrolled to obtain experimental RNA sequencing profiles annotated with survival data.In this group there were 11 males, 5 females, age range was 22-70 y.o., mean age was 55.75 y.o.The minimal Karnofsky Performance Status (KPS) at the time of diagnosis in this group was 50, median KPS was 90.Overall survival (OS) data were available for 15 patients (range 2-58 months, mean OS 21.1 months) after diagnosis i.e., the date of the surgical resection.Progression-free survival (PFS) data were available for 14 patients (range 2-37 months, mean PFS 13.2 months).The tumor samples were formalin-fixed, paraffin-embedded tumor tissue blocks with at least 50% of tumor cells.We then profiled gene expression in these biosamples by RNA sequencing (RNAseq) and obtained 26.36-38.99 million raw reads per library (mean 32.68 million reads).A range of 5.53-11.09 million reads (mean 7.29) was uniquely mapped to Ensembl genes using STAR aligner (Table 2).Two samples, NB-00339/13 and NB-00046/12, were not of enough quality.

Primary Comparison of RNAseq Profiles among the Experimental and Literature Datasets
To overall characterize the data obtained, we compared by principal component analysis (PCA) distributions of RNAseq profiles among the experimental dataset (n = 16) and publicly available datasets from The Cancer Genome Atlas (TCGA) project database; number of annotated cases = 135, and from Chinese Glioma Genome Atlas (CGGA) project database; n = 218.TCGA profiles were annotated with OS and PFS data, and CGGA profiles with only OS data.
PCA was performed in the space of log 10 transformed quantile normalized gene counts.We found that CGGA-derived samples formed two overall distinct clusters that corresponded to two different batches with different biomaterial treatment and sequencing protocols (Figure 1A).TCGA and experimental (termed as "Slovenia") samples also formed distinct clusters according to each experimental platform (Figure 1A).PCA was performed in the space of log10 transformed quantile normalized gene counts.We found that CGGA-derived samples formed two overall distinct clusters that corresponded to two different batches with different biomaterial treatment and sequencing protocols (Figure 1A).TCGA and experimental (termed as "Slovenia") samples also formed distinct clusters according to each experimental platform (Figure 1A).
We then built PCA plot based on 1611 molecular pathway activation profiles [29], where pathway activation level (PAL) of a pathway is calculated using transcriptomic data (Figure 1B).PAL can take positive or negative values in the case of pathway up-or downregulation, respectively, and positively reflects the extent of a pathway activation, and thus can be used as the quantitative characteristic of the interactome under study [36].
On the 1611-pathway PCA plot, we observed the same trend as on the gene-based figure, thus demonstrating that CGGA-derived samples from different batches (CGGA_325 and CGGA_693) should be regarded as two separate datasets (Figure 1B).

Survival-Linked Differential Gene Analysis
For each dataset under investigation we performed Cox survival analysis and extracted relevant differential genes [37].To this end, for every gene we identified two respective patient groups by this gene expression levels: (i) patients with the expression level higher than 66th-percentile among all patients (in the top third), and (ii) patients We then built PCA plot based on 1611 molecular pathway activation profiles [29], where pathway activation level (PAL) of a pathway is calculated using transcriptomic data (Figure 1B).PAL can take positive or negative values in the case of pathway up-or downregulation, respectively, and positively reflects the extent of a pathway activation, and thus can be used as the quantitative characteristic of the interactome under study [36].
On the 1611-pathway PCA plot, we observed the same trend as on the gene-based figure, thus demonstrating that CGGA-derived samples from different batches (CGGA_325 and CGGA_693) should be regarded as two separate datasets (Figure 1B).

Survival-Linked Differential Gene Analysis
For each dataset under investigation we performed Cox survival analysis and extracted relevant differential genes [37].To this end, for every gene we identified two respective patient groups by this gene expression levels: (i) patients with the expression level higher than 66th-percentile among all patients (in the top third), and (ii) patients with the expression level less than 33rd-percentile (in the bottom third).We then performed Coxregression analysis for OS and PFS data between these groups according to the previous protocol [38,39].
For each gene, log-rank p-value and hazard ratio (HR) p-value were obtained and then genes with both p-values less than 0.05 were considered differential [40] (Figures 2 and 3 and Supplementary Table S1).The differential genes obtained were then intersected between the TCGA, CGGA (separately CGGA_325 and CGGA_693 batches), and the experimental (Slovenia) datasets.
Cancers 2021, 13, x FOR PEER REVIEW 4 of 20 The genes that were statistically significantly positively (HR > 0), and negatively (HR < 0) associated with OS (termed plus and minus genes, respectively) were considered separately.Totally, we found 1003 plus/413 minus genes for the TCGA, 1331/1670 genes for the CGGA, and 502/377 genes for the experimental datasets, respectively.We then intersected the gene sets obtained and found only one common gene that was differentially regulated (overexpressed) in all four datasets (Figures 4 and 5, Supplementary Table S2).This was the gene for noncoding RNA CRNDE (ColoRectal Neoplasia Differentially Expressed) that was previously associated with many cancers [41,42] including glioblastoma [43,44].To test if a given number of common differential genes or pathways between intersecting datasets is statistically significant, perturbation test for randomness was performed according to [33] (see Materials and Methods).Briefly, we selected random sets of genes/pathways from all available genes/pathways and repeated this operation 1000 The genes that were statistically significantly positively (HR > 0), and negatively (HR < 0) associated with OS (termed plus and minus genes, respectively) were considered separately.Totally, we found 1003 plus/413 minus genes for the TCGA, 1331/1670 genes for the CGGA, and 502/377 genes for the experimental datasets, respectively.We then intersected the gene sets obtained and found only one common gene that was differentially regulated (overexpressed) in all four datasets (Figures 4 and 5, Supplementary Table S2).This was the gene for noncoding RNA CRNDE (ColoRectal Neoplasia Differentially Expressed) that was previously associated with many cancers [41,42] including glioblastoma [43,44].To test if a given number of common differential genes or pathways between intersecting datasets is statistically significant, perturbation test for randomness was performed according to [33] (see Materials and Methods).Briefly, we selected random sets of genes/pathways from all available genes/pathways and repeated this operation 1000 times.The percentile of the observed intersection in the distribution of random intersections was considered to be a measure of statistical significance.For OS data, we found that the Cancers 2021, 13, 3419 9 of 18 intersections of the plus genes in TCGA vs. CGGA comparison, and in CGGA vs. Slovenia dataset comparison were non-random, because they returned statistically significantly greater number of intersecting genes as for the random intersection model (Figure 4 and Supplementary Table S2).To test if a given number of common differential genes or pathways between int secting datasets is statistically significant, perturbation test for randomness was p formed according to [33] (see Materials and Methods).Briefly, we selected random sets genes/pathways from all available genes/pathways and repeated this operation 10 We then investigated these non-random intersecting gene sets using Gene Ontology (GO) terms analysis [45] and identified enriched biological processes associated with these gene sets (Figure 6).For statistical estimates, we used Benjamini-Hochberg method for false discovery rate (FDR) correction [46] and p-value threshold 0.05 [40].
For plus genes that were in statistically significant intersection of TCGA, CGGA_325, and CGGA_693 datasets, the most strongly enriched GO terms were linked with endoplasmic reticulum lumen and structure of extracellular matrix (Figure 6A and Supplementary Table S3).In turn, for the statistically significant intersection of CGGA_325, CGGA_693 and Slovenia plus genes, the most strongly enriched terms dealt with the processes of meiosis and chromosomal segregation, DNA demethylation, fibroblast growth factor receptor signaling, and tRNA transport from the nucleus (Figure 6B and Supplementary Table S4).
We then assessed associations of gene expression biomarkers with progression-free survival (PFS) data.PFS was annotated only for the TCGA and Slovenia datasets, but no information could be extracted for both CGGA batches.Similar to OS expression biomarker analysis, we screened for differentially regulated genes (Figures 3 and 5) using the same analytic approaches and settings.However, for both plus and minus genes, the intersections observed did not pass the randomness permutation test, as reflected by high p-values (Figure 5).

Differential Pathway Activation Analysis
When performing differential pathway-activation-level (PAL) analysis for the same three datasets and overlapping the results with the same settings as for the individual genes, we identified non-random intersection pattern only for the plus pathways between CGGA and TCGA datasets (p = 0.042; Figure 7A, Supplementary Tables S2, S4 and S5).
venia dataset comparison were non-random, because they returned statistically significantly greater number of intersecting genes as for the random intersection model (Figure 4 and Supplementary Table S2).
We then investigated these non-random intersecting gene sets using Gene Ontology (GO) terms analysis [45] and identified enriched biological processes associated with these gene sets (Figure 6).For statistical estimates, we used Benjamini-Hochberg method for false discovery rate (FDR) correction [46] and p-value threshold 0.05 [40].For plus genes that were in statistically significant intersection of TCGA, CGGA_325, and CGGA_693 datasets, the most strongly enriched GO terms were linked with endoplasmic reticulum lumen and structure of extracellular matrix (Figure 6A and Supplementary Table S3).In turn, for the statistically significant intersection of CGGA_325, Figure 6.Gene Ontology analysis of differential plus genes (linked with hazard ratio > 0 in overall survival analysis) that were statistically significantly intersected between (A) CGGA_325, CGGA_693, and TCGA; (B) CGGA_325, CGGA_693, and Slovenia datasets.
We totally identified 47 (CGGA, TCGA) common differential plus pathways (Supplementary Table S2).In good agreement with the results of individual gene-level analysis (Figure 4), they were mostly dealing with the regulation of extracellular matrix maintenance (Supplementary Table S5).However, there were no triple intersections, and we could not identify consensus biomarker molecular pathway(s) for OS in glioblastoma patients.There were also no minus pathways that would pass the randomness permutation test (Figure 7B and Supplementary Table S2).
When screening for the PFS biomarkers, we also could not identify statistically significant interactions, for both plus and minus pathways (Figure 8).
(Figure 4), they were mostly dealing with the regulation of extracellular matrix maintenance (Supplementary Table S5).However, there were no triple intersections, and we could not identify consensus biomarker molecular pathway(s) for OS in glioblastoma patients.There were also no minus pathways that would pass the randomness permutation test (Figure 7B and Supplementary Table S2).
When screening for the PFS biomarkers, we also could not identify statistically significant interactions, for both plus and minus pathways (Figure 8).

CRNDE Overexpression Is Associated with Poor Patient Overall Survival
Thus, we identified that CRNDE was the unique consensus OS expression biomarker among the glioblastoma datasets tested; however, for PFS no common expression biomarkers were identified.We then separately analyzed the impact of CRNDE expression on glioblastoma patient OS using the clinical information from the same annotated expression datasets (Figure 9).High CRNDE expression was significantly associated with poor OS in CGGA_693, TCGA, and Slovenia datasets (Figure 9B-D).The same trend was observed for CGGA_325 dataset; however, the survival difference did not reach p-value threshold of 0.05 (Figure 9A).
We observed a similar trend of CRNDE expression being positively associated with poor survival also for the PFS (Figures 3 and 10).Log-rank p-values were good enough for both TCGA (p = 0.0016) and experimental (Slovenia; p = 0.027) datasets (Figure 10).Hazard ratio p-value was good for the TCGA (p = 0.002) but borderline (p = 0.059) for the experimental dataset (Figure 10).
Thus, we conclude that increased CRNDE expression can be indicative of poor OS and PFS in glioblastoma.

CRNDE Overexpression Is Associated with Poor Patient Overall Survival
Thus, we identified that CRNDE was the unique consensus OS expression biomarker among the glioblastoma datasets tested; however, for PFS no common expression biomarkers were identified.We then separately analyzed the impact of CRNDE expression on glioblastoma patient OS using the clinical information from the same annotated expression datasets (Figure 9).High CRNDE expression was significantly associated with poor OS in CGGA_693, TCGA, and Slovenia datasets (Figure 9B-D).The same trend was observed for CGGA_325 dataset; however, the survival difference did not reach p-value threshold of 0.05 (Figure 9A).We observed a similar trend of CRNDE expression being positively associated with poor survival also for the PFS (Figures 3 and 10).Log-rank p-values were good enough for both TCGA (p = 0.0016) and experimental (Slovenia; p = 0.027) datasets (Figure 10).Hazard ratio p-value was good for the TCGA (p = 0.002) but borderline (p = 0.059) for the experimental dataset (Figure 10).
Thus, we conclude that increased CRNDE expression can be indicative of poor OS and PFS in glioblastoma.

Discussion
In years of life lost, primary glioblastomas are ranked first among cancer types-on average 20.1 years compared to 11.8 years for lung cancer and 6.8 years for prostate cancer [12].This is partially because of the lack of molecular biomarkers for early disease diagnosis and treatment follow-up.Because of the high mortality rate, discovery of molecular biomarkers for glioblastoma is one of the priorities in neuro-oncology research.With our study we aimed to exploit the potential of high throughput in silico tools to identify putative biomarkers to clinically manage glioblastoma.
To obtain more information about the molecular differences between glioblastoma patients with relatively long-and short-term survival, we performed transcriptomic analysis.We combined data already available from public repositories of TCGA and CGGA projects with the original data obtained for clinically annotated experimental FFPE tissue samples from glioblastoma patients.As shown on Table 1, the patients that were included in this study had represented both short-and long-term survival glioblastoma groups.We could identify one gene, CRNDE, which expression has the potential to be used as a negative prognostic biomarker of patient survival, for both OS and PFS.Specifically, Kaplan-Meier analysis revealed that patients who had increased CRNDE expression levels generally had shorter survival times.Moreover, Gene Ontology and pathway-activation-level analysis revealed cell migration, reshaping of extracellular matrix, meiosis, and chromosomal segregation, FGFR signaling, tRNA/noncoding RNA transfer from nucleus, and DNA demethylation as the major differential processes between the long and poor survivors in glioblastoma.

CRNDE
Colorectal neoplasia differentially expressed (CRNDE) was discovered first as overexpressed gene in colorectal adenomas and adenocarcinomas [47].Since that it was associated with different human malignancies [48] including glioblastoma [49,50].CRNDE is located on chromosome 16 [41] and is transcribed to form multiple RNA transcripts, some of which function as noncoding regulatory RNAs.Among them CRNDE-a, -b, -d, -e, -f,h and -j are thought to be major transcripts in different cancers [42].These noncoding RNAs can regulate gene expression [51,52].
Most of the transcribed human genome is noncoding and contains small (sRNAs) and long noncoding RNAs (lncRNAs) [53].lncRNAs vary in length from 200 bp to over

Discussion
In years of life lost, primary glioblastomas are ranked first among cancer types-on average 20.1 years compared to 11.8 years for lung cancer and 6.8 years for prostate cancer [12].This is partially because of the lack of molecular biomarkers for early disease diagnosis and treatment follow-up.Because of the high mortality rate, discovery of molecular biomarkers for glioblastoma is one of the priorities in neuro-oncology research.With our study we aimed to exploit the potential of high throughput in silico tools to identify putative biomarkers to clinically manage glioblastoma.
To obtain more information about the molecular differences between glioblastoma patients with relatively long-and short-term survival, we performed transcriptomic analysis.We combined data already available from public repositories of TCGA and CGGA projects with the original data obtained for clinically annotated experimental FFPE tissue samples from glioblastoma patients.As shown on Table 1, the patients that were included in this study had represented both short-and long-term survival glioblastoma groups.We could identify one gene, CRNDE, which expression has the potential to be used as a negative prognostic biomarker of patient survival, for both OS and PFS.Specifically, Kaplan-Meier analysis revealed that patients who had increased CRNDE expression levels generally had shorter survival times.Moreover, Gene Ontology and pathway-activation-level analysis revealed cell migration, reshaping of extracellular matrix, meiosis, and chromosomal segregation, FGFR signaling, tRNA/noncoding RNA transfer from nucleus, and DNA demethylation as the major differential processes between the long and poor survivors in glioblastoma.

CRNDE
Colorectal neoplasia differentially expressed (CRNDE) was discovered first as overexpressed gene in colorectal adenomas and adenocarcinomas [47].Since that it was associated with different human malignancies [48] including glioblastoma [49,50].CRNDE is located on chromosome 16 [41] and is transcribed to form multiple RNA transcripts, some of which function as noncoding regulatory RNAs.Among them CRNDE-a, -b, -d, -e, -f, -h and -j are thought to be major transcripts in different cancers [42].These noncoding RNAs can regulate gene expression [51,52].
Because of their tissue specificity lncRNAs are excellent candidates for the design and development of targeted therapeutics.However, in order to achieve this one has to identify disease-relevant signaling pathways and their druggable targets, and then to find a way to genetically manipulate them as well as to find an effective delivery system [69].
Other studies have also established a correlation between CRNDE and glioblastoma.For example, by analyzing a cohort of previously published glioma gene expression profiles from the Gene Expression Omnibus (GEO) Zhang et al. [70] showed that expression of lncRNAs CRNDE and HOTAIR increases with ascending glioma malignancy grade.A small scale preliminary study by Kiang et al. [71] showed that different transcript variants of CRNDE have clinical impact in glioblastoma.Although the authors did not show a direct link between a specific CRNDE transcript variant and patient survival, they observed a trend between the ratio of two variants and patient survival.
By analyzing 19 astrocytomas, of which 5 were low grade and 14 were high grade tumors, Kiang et al. [72] showed that CRNDE is strongly up-regulated in gliomas and positively correlates with epidermal growth factor receptor (EGFR) amplification.Moreover, the authors showed that CRNDE knockdown suppresses glioma cell growth in vitro and in vivo, and is associated with decreased Bcl2/Bax ratio.Its potential to be used as a predictive marker was published by Jing et al. [73] in a retrospective study.The authors performed qRT-PCR to examine CRNDE expression and established association between high CRNDE expression and clinicopathological features such as larger tumor size, higher WHO grade, and recurrence.
It is clear now that lncRNAs play critical roles in many pathologies and they are the forefront molecules for translational research.Because of its obvious overexpression in tumors, CRNDE is one of the most extensively investigated lncRNAs in cancer research.The major novelty of our study is that we performed RNA sequencing profiling instead of the most commonly used microarray gene expression analysis.Additionally, we used a combination of data from previous repositories and from the experimental cohort consisting of short-and long-term glioblastoma survivors.Our methodology proved successful in the identification of the lncRNA CRNDE as a negative prognostic biomarker.Although other studies linked CRNDE with increased malignancy grade, we confirmed here for the first time that CRNDE overexpression directly correlates with shorter patient survival times.Thus, we propose its further detailed investigation as a prognostic biomarker for life expectancy of glioblastoma patients.

Limitations
The relatively small number (n = 16) of experimental clinically annotated samples we used in our study can be considered to be a limitation.Our initial aim was to collect samples from long-term glioblastoma survivors.Because of the low incidence of glioblastoma and poor patient survival it is difficult to obtain large number of such samples in countries with small population.Still, to provide more scientifically solid results, the experimental data were complemented by publicly available transcriptomic data.The results may vary between the databases due to different patient ethnicities and/or countries' clinical guidelines for diagnosis and therapy.
Another limitation is that we did not include other potential prognostic factors such as extent of the surgical resection in our analysis.Therefore, we reason it is crucial to additionally investigate the role of CRNDE in respect to other prognostic factors in glioblastoma before determining its potential clinical application.

Conclusions
Despite decades of extensive research and implementation of unconventional therapeutic approaches, long-term survival is not common for glioblastoma patients.Identification of diagnostic as well as prognostic and predictive therapeutic biomarkers is therefore crucial.This can be initially achieved by large-scale profiling combined with extensive bioinformatics analysis, and further confirmed experimentally before it reaches clinical settings.As shown in our study, transcriptome profiling by RNA sequencing is an appropriate method for identification of biomarkers with potential clinical utility.We analyzed pathway activation levels of intersecting genes between TCGA, CCGA, and experimental dataset to determine their possible impact on OS and PFS.Using Cox survival analysis and intersection of positive and negative differential genes between datasets, we identified only one common gene, CRNDE, that was negatively correlated with OS.With Kaplan-Meier analysis we confirmed that overexpression of CRNDE can be used as an indication of poor OS and PFS.As suggested, lncRNA CRNDE can be used as a negative prognostic marker for glioblastoma patients.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/cancers13143419/s1,Table S1: HR and log-rank p-value for gene expression levels in CGGA, TCGA, and Slovenia datasets, Table S2: Intersections of differentially expressed genes and pathways, Table S3: Common plus genes that were in statistically significant intersection of TCGA, CGGA_325, and CGGA_693 datasets, Table S4: Common plus genes that were in statistically significant intersection of Slovenia, CGGA_325, and CGGA_693 datasets, Table S5: Common plus pathways that were in statistically significant intersection of CGGA_325 and TCGA datasets.Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Cancers 2021 ,
13, x FOR PEER REVIEW 3 of 20 database; n = 218.TCGA profiles were annotated with OS and PFS data, and CGGA profiles with only OS data.

Figure 1 .
Figure 1.Principal Component Analysis (PCA) of (A) gene expression and (B) pathway activation levels (PALs) for experimental (Slovenia) clinically annotated glioblastoma tissue samples and publicly available glioblastoma samples from CGGA (batches CGGA_325 and CGGA_693) and TCGA.

Figure 1 .
Figure 1.Principal Component Analysis (PCA) of (A) gene expression and (B) pathway activation levels (PALs) for experimental (Slovenia) clinically annotated glioblastoma tissue samples and publicly available glioblastoma samples from CGGA (batches CGGA_325 and CGGA_693) and TCGA.

Figure 2 .
Figure 2. Distribution of differential genes by log-rank p-value and log hazard ratio for overall survival (OS) data across: (A) CGGA_325, (B) CGGA_693, (C) TCGA and (D) Slovenia glioblastoma datasets.

Figure 2 .
Figure 2. Distribution of differential genes by log-rank p-value and log hazard ratio for overall survival (OS) data across: (A) CGGA_325, (B) CGGA_693, (C) TCGA and (D) Slovenia glioblastoma datasets.

Figure 3 .
Figure 3. Distribution of differential genes by log-rank p-value and log hazard ratio for progression-free survival (PFS) data across: (A) TCGA and (B) Slovenia glioblastoma datasets.

Figure 4 .
Figure 4. Intersections of differential (A) plus (hazard ratio > 0) and (B) minus (hazard ratio < 0) genes for overall survival (OS) analysis between CGGA_325, CGGA_693, TCGA and the experimental (Slovenia) glioblastoma samples; p-values for intersection significance that are less than 0.05 are highlighted in bold.

Figure 3 .
Figure 3. Distribution of differential genes by log-rank p-value and log hazard ratio for progression-free survival (PFS) data across: (A) TCGA and (B) Slovenia glioblastoma datasets.

20 Figure 3 .
Figure 3. Distribution of differential genes by log-rank p-value and log hazard ratio for progression-free survival (PFS) data across: (A) TCGA and (B) Slovenia glioblastoma datasets.

Figure 4 .
Figure 4. Intersections of differential (A) plus (hazard ratio > 0) and (B) minus (hazard ratio < 0) genes for overall survival (OS) analysis between CGGA_325, CGGA_693, TCGA and the experimental (Slovenia) glioblastoma samples; p-values for intersection significance that are less than 0.05 are highlighted in bold.

Figure 4 .
Figure 4. Intersections of differential (A) plus (hazard ratio > 0) and (B) minus (hazard ratio < 0) genes for overall survival (OS) analysis between CGGA_325, CGGA_693, TCGA and the experimental (Slovenia) glioblastoma samples; p-values for intersection significance that are less than 0.05 are highlighted in bold.

Figure 4 .
Figure 4. Intersections of differential (A) plus (hazard ratio > 0) and (B) minus (hazard ratio < 0) genes for overall survival (OS) analysis between CGGA_325, CGGA_693, TCGA and the experimental (Slovenia) glioblastoma samples; p-values for intersection significance that are less than 0.05 are highlighted in bold.

Figure 7 .
Figure 7. Intersections of differential (A) plus (hazard ratio > 0) and (B) minus (hazard ratio < 0) molecular pathways for overall survival (OS) analysis between CGGA_325, CGGA_693, TCGA and the experimental (Slovenia) glioblastoma expression datasets; p-values for intersection significance that are less than 0.05 are highlighted in bold.

Figure 7 .Figure 8 .
Figure 7. Intersections of differential (A) plus (hazard ratio > 0) and (B) minus (hazard ratio < 0) molecular pathways for overall survival (OS) analysis between CGGA_325, CGGA_693, TCGA and the experimental (Slovenia) glioblastoma expression datasets; p-values for intersection significance that are less than 0.05 are highlighted in bold.Cancers 2021, 13, x FOR PEER REVIEW 8 of 20

Figure 9 .
Figure 9. Kaplan-Meier plot and hazard ratio for CRNDE expression in glioblastoma samples across (A) CGGA_325, (B) CGGA_693, (C) TCGA and (D) Slovenia datasets for overall survival (OS) analysis.These results demonstrate that in there out of four tested datasets with available overall survival data (CGGA_693, TCGA, Slovenia), glioblastoma patients with overexpressed CRNDE had significantly lower overall survival (p = 0.013-0.046).

Figure 9 .
Figure 9. Kaplan-Meier plot and hazard ratio for CRNDE expression in glioblastoma samples across (A) CGGA_325, (B) CGGA_693, (C) TCGA and (D) Slovenia datasets for overall survival (OS) analysis.These results demonstrate that in there out of four tested datasets with available overall survival data (CGGA_693, TCGA, Slovenia), glioblastoma patients with overexpressed CRNDE had significantly lower overall survival (p = 0.013-0.046).

Figure 10 .
Figure 10.Kaplan-Meier plot and hazard ratio for CRNDE expression in glioblastoma samples across (A) TCGA and (B) Slovenia datasets for progression-free survival (PFS) analysis.

Figure 10 .
Figure 10.Kaplan-Meier plot and hazard ratio for CRNDE expression in glioblastoma samples across (A) TCGA and (B) Slovenia datasets for progression-free survival (PFS) analysis.

Author
Contributions: M.S. (Maxim Sorokin), M.R., D.V.K., A.B., M.S. (Maria Suntsova) -transcriptomic analysis; M.S. (Maxim Sorokin), M.R., A.B.-survival analysis; A.Z. (Alja Zottel), N.Š.-critical reading of the manuscript; J.M.-provided FFPE tissue samples; A.M., A.Z. (Andrej Zupan)-methylation analysis; M.S.V.-provided patient information; I.J. and A.B.-conceptualized and designed the study; I.J.-wrote the manuscript.All authors read and approved the final version of the manuscript.Funding: The study was financially supported by the postdoctoral research project Z3-1869 (IJ) and the bilateral project Slovenia-Russian federation BI-RU 19/20-005 from the Slovenian Research Agency (SRA), by the OmicsWay research program in oncology.M.S. and A.B. were supported by the Ministry of Science and Higher Education of the Russian Federation within the framework of state support for the creation and development of World-Class Research Centers "Digital biodesign and personalized healthcare" No 075-15-2020-926.Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the National Medical Ethics Committee of the Republic of Slovenia approval numbers 0120-196/2017/7 (15 November 2017), 0120-190/2018/4 (16 April 2018) and 0120-190/2018/11 (15 May 2019).

Table 1 .
Clinicopathological information about experimental glioblastoma patients involved in this study.