Identification of Two Novel Circular RNAs Deriving from BCL2L12 and Investigation of Their Potential Value as a Molecular Signature in Colorectal Cancer

The utility of circular RNAs (circRNAs) as molecular biomarkers has recently emerged. However, only a handful of them have already been studied in colorectal cancer (CRC). The purpose of this study was to identify new circRNAs deriving from BCL2L12, a member of the BCL2 apoptosis-related family, and investigate their potential as biomarkers in CRC. Total RNA extracts from CRC cell lines and tissue samples were reversely transcribed. By combining PCR with divergent primers and nested PCR followed by Sanger sequencing, we were able to discover two BCL2L12 circRNAs. Subsequently, bioinformatical tools were used to predict the interactions of these circRNAs with microRNAs (miRNAs) and RNA-binding proteins (RBPs). Following a PCR-based pre-amplification, real-time qPCR was carried out for the quantification of each circRNA in CRC samples and cell lines. Biostatistical analysis was used to assess their potential prognostic value in CRC. Both novel BCL2L12 circRNAs likely interact with particular miRNAs and RBPs. Interestingly, circ-BCL2L12-2 expression is inversely associated with TNM stage, while circ-BCL2L12-1 overexpression is associated with shorter overall survival in CRC, particularly among TNM stage II patients. Overall, we identified two novel BCL2L12 circRNAs, one of which can further stratify TNM stage II patients into two subgroups with substantially distinct prognosis.


Introduction
Colorectal cancer (CRC) is a malignancy developed in the colon or rectum. Of all annually diagnosed cancers and cancer-related deaths, CRC represents approximately 10%, as it is the second most common cancer diagnosed in women and the third most common one in men [1]. Due to the high mortality rate of CRC, several attempts have been made towards the elucidation of its molecular background. Three specific gene categories are considered to play a basic role in CRC development: tumor suppressor genes such as APC, DCC, TP53, and the SMAD protein family; oncogenes such as KRAS and NRAS; and DNA repair genes [2]. Moreover, CRC cells are characterized by great molecular heterogeneity: their development is followed by loss of genomic integrity, accumulation of mutations, chromosomal and microsatellite instability, methylation of the DNA, and DNA repair defects. Moreover, signaling pathways that are vital for the cells, such as WNT/β-catenin, PI3K/AKT, and 4, and the extended exon 5, which was previously identified by members of our research group [11]; the length of this circRNA is 501 nucleotides (nt) and the back-splice event takes place between exons 6 and 1. The second one, which will be termed as circ-BCL2L12-2, consists of truncated exon 6 and whole exons 2, 4, and 5. It is important to note that the second circRNA contains a longer part of exon 6, compared to the first one. The length of this circRNA is 442 nt and the back-splice event takes place between exons 6 and 2. The exon structures and sequences of both circRNAs are shown in Figure 1.

Putative Interactions of BCL2L12 circRNAs with miRNAs and RBPs
In silico analysis showed that circ-BCL2L12-1 is predicted to sponge seven miRNAs, while circ-BCL2L12-2 is predicted to sponge five miRNAs. Out of these 12 miRNAs, only miR-1915-5p had a high prediction score for binding to circ-BCL2L12-1. These miRNAs, the prediction scores, and the binding motifs of the circRNAs are shown in Table 1. The targets of miR-1915-5p, which were found in at least two of the four tools and have been reported to play a role in CRC, are shown in Table S1. In both circRNAs, various protein-binding sites were detected and many RBPs were predicted to bind to them. RBPmap provided more information about the number of RBPs and their binding sites compared to beRBP; thus, we chose to use the results observed by this tool. We selected those RBPs with more than five binding sites, accompanied by a high probability score; these are presented in Table 2. circ-BCL2L12-2 is also predicted to have an open reading frame (ORF) and be subjected to translation. This ORF is marked in Figure 1f. However, no internal ribosomal entry sites (IRES) were predicted.

Standardization of Real-Time qPCR Assays for BCL2L12 circRNA Quantification
A real-time qPCR assay was standardized for the specific BCL2L12 circRNA quantification. Aiming to assure the quantitative results of each assay, a standard curve was generated for each amplicon, using serial dilutions of PCR products, deriving from a 25-cycle PCR assay. For circ-BCL2L12-1 and circ-BCL2L12-2 standard curves, PCR products from pre-amplification of Caco-2 cell line cDNA were serially diluted. For the HPRT1 standard curve, cDNA from a fresh frozen tissue specimen was pre-amplified and serially diluted. The efficiencies of the standard curves for HPRT1, circ-BCL2L12-1, and circ-BCL2L12-2 were 91%, 96%, and 107%, respectively, and therefore, the prerequisites for the quantification of our results were fulfilled. The standard curves of circ-BCL2L12-1 and circ-BCL2L12-2 are shown in Figure 2a,b, respectively. Moreover, each melt curve was unique, indicating the specificity of the amplicon. The melt curves of circ-BCL2L12-1 and circ-BCL2L12-2 are shown in Figure 2c,d, respectively.

Expression Analysis of BCL2L12 circRNAs in CRC Cell Lines
Through the pre-amplification and qPCR assays, we relatively quantified the expression levels of both circRNAs in Caco-2, HCT 116, HT-29, COLO 205, SW 620, DLD-1, and RKO CRC cell lines. circ-BCL2L12-1 is present in all CRC cell lines, while circ-BCL2L12-2 was detected in two of them, namely Caco-2 and HCT 116. The expression levels of circ-BCL2L12-1 in these two cell lines were higher than those of circ-BCL2L12-2. The amplification plots of the qPCR assays for both circRNAs are shown in Figure 3.

Expression of BCL2L12 circRNAs in Malignant Tumors and Non-Cancerous Tissues
The distributions of BCL2L12 circRNAs did not differ significantly between malignant tumors and paired non-cancerous tissues ( Figure S1). The mean expression values of both BCL2L12 circRNAs and mean values of other scale variables are shown in Table 3.

Association of Circ-BCL2L12-2 Expression with TNM II and III Stages
The CRC patients were stratified into two distinct subgroups (high vs. low), according to the expression levels of each circRNA, determined in relative quantification units (RQUs). The median expression of each circRNA was used as a cut-off point. The frequencies of circ-BCL2L12-2-high and -low patients varied significantly between TNM II and TNM III stages. Specifically, the frequencies of circ-BCL2L12-2-low patients were significantly higher in the TNM III stage compared to the TNM II stage, in contrast to circ-BCL2L12-2-high patients (p = 0.007) ( Figure S2).

Circ-BCL2L12-1 as a Potential Molecular Indicator of Poor Prognosis in CRC
Survival data were available for 151 out of the 168 CRC patients. Kaplan-Meier survival analysis revealed the prognostic potential of circ-BCL2L12-1, concerning overall survival (OS); specifically, CRC patients with higher levels of circ-BCL2L12-1 showed significantly shorter OS intervals, compared to those with lower levels (p = 0.036; Figure 4a). Additionally, similar results regarding OS were shown in the subgroup of TNM stage II patients (p = 0.045; Figure 4b). These results were strengthened by univariate Cox regression analysis, which showed a hazard ratio (HR) of 1.92 for patients with high circ-BCL2L12-1 levels [bias-corrected and accelerated (BCa) 95% confidence interval (CI) = 1.01-3.76, bootstrap p = 0.035; Table 4].

Discussion
By virtue of the advances in RNA sequencing (RNA-seq) and other new technological approaches, circRNAs have gained ground in molecular and cellular research, as they exert various biological functions. Some of these functions, such as transcription regulation, protein binding, and miRNA-sponging, can be crucial for cancer development. Various studies have implied their role in cancer [32], and particularly, in CRC, a deregulated expression pattern of circRNAs has been reported, compared to normal tissues [33].
Prompted by our interest in circRNAs, we first attempted to seek, in the circBase, RNA-seq data that prove the existence of circRNAs deriving from BCL2L12. Indeed, seven entries concerning BCL2L12 circRNAs were available [34]; however, none of these circRNAs have been experimentally validated yet, while the circRNAs identified in the current study have not been submitted to any of the available databases. Thus, no prediction data were available for the circRNAs identified in this study.
Many studies have declared that circRNAs are formed by exons that are abundant in the mRNA transcripts derived by the same gene [35]. However, the present study showed that both novel EcircRNAs include truncated exons, which have not been detected in the mRNAs transcribed from BCL2L12, despite the great number of existing splice variants [11,12]. Interestingly the truncated exons of both circRNAs are located in the back-splice site (exons 6 and 1 in circ-BCL2L12-1; exon 6 in circ-BCL2L12-2). Moreover, the back-splice site in both cases is not a canonical one; this could be attributed either to the cancer state of the cells [36], or to a circRNA biogenesis mechanism, which has not been elucidated yet.
Attempting to predict the interactions of the aforementioned circRNAs with other molecules and understand their function, we concluded that both of them are predicted to bind to splicing factors. This is expectable, due to the implication of specific splicing factors, including MNBL1, in circRNA biogenesis [37]. Interestingly, a linkage between circRNAs and splicing factors has previously emerged, affecting cancer development. More specifically, circSMARCA5 has been reported to inhibit the migration of glioblastoma multiforme cells through a molecular axis including SRFS1/SFFS3/PTB [38], indicating that the binding of splicing factors may be involved in other functions besides biogenesis. Moreover, a deregulated expression pattern of splice factors has been reported in CRC and in cancer generally, for instance, PTB1 and CELF1 have been reported to play a role in CRC proliferation [39,40]. Since all the aforementioned factors are predicted to bind to these novel BCL2L12 circRNAs, it is important that these interactions are experimentally verified and further elucidated.
As far as miRNAs are concerned, miR-1915-5p, which is predicted to be sponged by circ-BCL2L12-1 as shown in Table 1, is also predicted to target IGF2BP1, PRDX3, HDAC2, DAPK1, and EAF1 (Table S1). Particularly, HDAC2 was reported to facilitate tumorigenesis in CRC via chromatin structure regulation, while PRDX3 is overexpressed in CRC stem cells, and is involved in tumorigenesis as well. It seems to exert its function by maintaining the proper mitochondrial function in CRC stem cells and subsequently, promotes their survival [41,42]. On the other hand, DAPK1 acts as a tumor suppressor, through the inhibition of TACSDT2, a receptor that transduces Ca2+ signals and subsequently, leads to enhanced proliferation, invasion, and self-renewal signals [43]. EAF1 serves as a tumor suppressor as well, via the regulation of the canonical WNT/β-catenin pathway, which plays a key role in CRC development [44,45]. A tumor-suppressive function is exerted by IGF2BP1 too, which inhibits KRAS, CDC34, and MYC expression and promotes apoptosis [46]. As miR-1915-5p is predicted to be sponged by circ-BCL2L12-1, all the aforementioned targets are likely to be upregulated, due to the lack of miRNA-mRNA binding and the subsequent inhibition of the mRNA expression. This information suggests a multifarious impact of circ-BCL2L12-1 on CRC cells; however, further investigation is required.
In the current study, the prognostic potential of these circRNAs in CRC was also assessed. Due to their increased stability, which is attributed to their circular structure, circRNAs could serve as great molecular biomarkers. Although circ-BCL2L12-2 was not proved to have prognostic value and was not detected in many CRC samples and most cell lines, it was inversely associated with the TNM stage; thus, its biological function merits further examination. On the contrary, circ-BCL2L12-1 was more abundant and was also shown to predict the OS of the patients in our cohort (Figure 4a).
Interestingly, circ-BCL2L12-1 was shown to predict the OS of TNM II patients. The survival intervals of TNM II stage CRC patients often differ significantly. Therefore, studies have focused on the molecular mechanisms underlying CRC, in order to establish a stratification system that describes the clinical stage and predicts the outcome of these patients more efficiently. One step towards the establishment of a new stratification system was made with the formation of consensus molecular subtyping (CMS). CMS is a stratification system based on the biological characteristics of CRC patients that has recently emerged [47]. Recently, Purcell et al. suggested that consensus molecular subtyping is used, as it improves the prognosis of TNM II stage CRC patients [48]. Our results show that TNM II stage patients can be sub-grouped into two distinct strata, with different survival probability: patients with lower circ-BCL2L12-1 levels show longer OS intervals, close enough to those of TNM I stage patients, while patients with higher circ-BCL2L12-1 levels show shorter OS intervals, even shorter than those of TNM III stage patients (Figure 4b). Thus, circ-BCL2L12-1 could be included in a molecular stratification of CRC patients.
Summarizing the key findings of this study, the existence of circRNAs deriving from BCL2L12 was revealed; both circRNAs consist of known exons and truncated ones. circ-BCL2L12-2 is predicted to have one functional ORF, but no IRES are predicted. Additionally, both circRNAs are predicted to bind miRNAs and RBPs, as revealed by the miRDB and RBPmap tools, and shown in Tables 1 and 2, respectively. The deciphering of their interactions can provide new insights and be a step towards the understanding of the molecular background of CRC. The identification of peptides translated by these molecules is also a putative study field, as it would be interesting to examine functional similarities between those peptides and proteins encoded by linear transcripts. The biostatistical analysis conducted revealed that circ-BCL2L12-1 could be used as a molecular biomarker of poor prognosis for the OS of CRC patients and can provide a better stratification for TNM II patients based on their OS intervals. Moreover, it would be interesting to assess putative associations of these novel circRNAs and other molecular characteristics of the patients, for instance, RAS or BRAF or TP53 mutations, and MMR deficiency. Additionally, the existence of these molecules can be investigated in other tissues, to examine if their expression pattern is tissue-specific or not.
Our future goals include the elucidation of the biological functions of the circRNAs identified and the investigation of other circRNAs deriving from apoptosis-related genes. Moreover, their dynamic as molecular biomarkers has not been fully elucidated yet; as circRNAs have been detected in exosomes, it would be interesting to investigate the abundancy of BCL2L12 circRNAs in these vesicles and evaluate their screening potency. As exosomes are easily accessible in biological fluids, screening molecular biomarkers could gradually replace the highly invasive colonoscopy.

Cell Culture
Seven CRC cell lines, namely Caco-2, HCT 116, HT-29, COLO 205, SW 620, DLD-1, and RKO, were cultivated. The mutational status of the CRC cell lines is shown in Table S2 [49]. All cells were seeded at a concentration of 1 × 10 5 cells/mL and incubated in a humidified atmosphere for 24 h. The CO 2 concentration was adjusted to 5% and the temperature to 37 • C. The appropriate medium was chosen for each cell line, following the guidelines of ATCC ® .

Tissue Sample Collection
One hundred and sixty-eight malignant colorectal tumors and 63 paired non-cancerous colonic fresh frozen tissue specimens were provided by the University General Hospital "Attikon". The patients were followed up for 52 months (median time) and information regarding disease-free (DFS) and OS was collected, for 151 out of the 168 patients. The median age of the CRC patients was 69 years (range: 35-93). The clinicopathological characteristics of the patients are described in Table 5. This research study was conducted in compliance with the 1964 Declaration of Helsinki and its later amendments and was approved by the institutional Ethics Committee of the University General Hospital "Attikon", Athens, Greece (approval number: 31; 29 January 2009). Moreover, written informed consent was obtained from all participants.

RNA Extraction and Reverse Transcription
Total RNA extraction from CRC cell lines and samples was carried out using the TRIzol ® Reagent (Ambion™, Thermo Fisher Scientific Inc., Waltham, MA, USA). The concentration and purity of each RNA sample were evaluated spectrophotometrically. Subsequently, 2 µg of each RNA sample was subjected to reverse transcription using M-MLV reverse transcriptase (Invitrogen™, Thermo Fisher Scientific Inc.) and random hexamers (New England Biolabs Ltd., Hitchin, UK), following the manufacturers' instructions.

Primer Designing and PCR
The CRC cell lines were mixed in equal volumes to create a CRC pool. This pool was generated for experimental purposes, as this way, we were able to identify circRNAs expressed in any of the CRC cell lines. Divergent primers were designated for BCL2L12, so that only circular and not linear RNAs would be amplified during the PCR assay (Figure 1a,b) [50]. The CRC pool was subjected to a first-round PCR assay, which was conducted using KAPA Taq DNA Polymerase (KAPA Biosystems Inc., Woburn, MA, USA) in a MiniAmp Thermal Cycler (Applied Biosystems™, Thermo Fisher Scientific Inc.). The reaction mix contained 19.4 µL nuclease-free H 2 O, 2.5 µL 10x KAPA Taq Buffer, 0.5 µL of a dNTP mix (containing each dNTP at an initial concentration 10 mM), 1 µL of each primer (initial concentration 10 µM), 0.1 µL KAPA Taq DNA Polymerase (initial concentration 5 U/µL), and 0.5 µL of cDNA pool template. The thermal protocol was conducted as follows: An initial denaturation step at 95 • C for 3 min, followed by a cycling step, carried out for 35 cycles, consisting of a denaturation step at 95 • C for 30 s, an annealing step at 60 • C for 30 s, and an elongation step at 72 • C for 30 s. A final elongation step was carried out at 72 • C for 1 min.
Due to the low expression levels of these circRNAs compared to the linear RNAs, second-round (semi-nested) PCR assays were conducted, after diluting the PCR products at a ratio of 1:50 in nuclease-free H 2 O (Figure 1c,d) [51]. The semi-nested PCR assays were conducted as described above. The sequences of the primers used for both assays are listed in Table S3, while the primer pairs and sizes of the amplicons observed are listed in Table S4.

Agarose Gel Electrophoresis and Sanger Sequencing
After the PCR assays, the PCR products observed through the amplification of the cDNA pool were loaded on a 2% agarose gel, stained with ethidium bromide. After visualizing in UV light, the desired bands were excised, and DNA extraction was carried out using spin columns (MACHEREY-NAGEL GmbH & Co. KG, Düren, Germany). The concentration of the extracted PCR product was evaluated using a Qubit fluorometer (Thermo Fisher Scientific, Inc., Waltham, MA, USA). Subsequently, the sequence was identified by Sanger sequencing (Supplementary Materials, Data Set 1).

Bioinformatical Analysis for Prediction of circRNA Interactions with miRNAs and RBPs
In order to elucidate the interactions between circRNAs and miRNAs, miRDB was used [52]. This tool allows the user to submit a custom RNA sequence and provides a list of miRNAs that bind to this sequence, as well as a probability score. After choosing those miRNAs showing a high probability score (≥70) we searched for other putative targets, using miRDB, miRWalk, TargetRank, and TarBase (v.8) [53][54][55]. We then selected the targets that were identified in at least 2 of the tools and have been reported to play a role in CRC, as shown in Table S1.
Additionally, 2 more tools were used to investigate the interactions of these circRNAs with RBPs: RBPmap and beRBP [56,57]. Both tools use a custom RNA sequence as an input and provide a list of RBPs, the binding sites, and a probability score. Subsequently, we search for ORFs in the circRNAs to investigate if they could be translated into peptides. Due to the lack of 5'cap in circRNAs, we also searched for the existence of IRES. ORFs were investigated using ORFfinder, while IRESpy was utilized to search for IRES [58].
In order to carry out these predictions, the sequence of the circRNA should be submitted as linear. Therefore, we should "cut" the sequence in a stochastic position. Aiming to assure that no information for binding sites will be lost, due to this decision, we conducted all of the predictions twice, following the cleavage of circRNA sequence at a different point. More specifically, the last exon of the sequence which was submitted the first time in each prediction was transferred at the beginning of the sequence for the following predictions.

Pre-Amplification and qPCR
After the identification of the circRNAs, we aimed to quantify them in cell lines and human tissue sample cDNAs. Due to the low expression levels of the circRNAs, a PCR assay was conducted for the pre-amplification of these molecules and of HPRT1 mRNA, which was used as a reference for the qPCR assay. After the execution of the PCR assay in 15, 20, 25, 30, and 35 cycles, 25 cycles were chosen as the optimal pre-amplification condition. The thermal protocol was carried out as described above. Prior to the qPCR assay, the PCR products were diluted at a ratio of 1:50 in nuclease-free H 2 O.
A real-time qPCR assay was developed. In brief, the reaction, which contained KAPA SYBR FAST qPCR Master Mix (2X) Kit (KAPA Biosystems Inc.), was performed in an ABI 7500 Fast Real-Time PCR System (Applied Biosystems™), following a standard thermal protocol for cycling and melting, as previously described [59,60]. A standard curve was generated for each amplicon, using serial dilutions of PCR products. A graph was built by plotting the threshold cycle (C T ) versus the quantity. Each qPCR reaction was performed in duplicates, to assure the reproducibility of the obtained data. The expression levels of each circRNA were calculated using the comparative C T (2 −∆∆CT ) method [61,62]. The relative circRNA expression of each sample was determined in RQUs by calculating the ratio of each circRNA to HPRT1 molecules divided by the same ratio calculated for the calibrator (Caco-2 cells). The sequences of the primers used are listed in Table S5, while the primer pairs and sizes of the amplicons observed are listed in Table S6.

Biostatistical Analysis
Due to the fact that the distribution in our cohort was not normal, non-parametric tests such as the Wilcoxon signed-rank and Mann-Whitney U tests were used. Both of them do not take into consideration the distribution in the cohort. After performing descriptive statistics, the Wilcoxon signed-rank test was used to evaluate the difference in the expression of the circRNAs between cancer and non-cancerous samples. The CRC patients were then designated as "high" or "low", depending on the expression of each circRNA, calculated in RQUs; the separation of the two groups was based on the median value of each circRNA expression. Associations between the expression of each circRNA and other categorical variables were evaluated, using a two-tailed chi-square test.
Funding: This research received no external funding. Serine and arginine-rich splicing factor 1 SRSF2